12/11/2018
This work describes an advanced text categorization procedure developed and successfully
used in the aerospace industry, especially for safety assessment, analysis and improvement. Its purpose is
the computerized analysis and interpretation of human-reported free-text aviation safety records, in order to
automatically “read”, discover and treat anomalies that have occurred in the field. The methodology and algorithms
were verified on actual, significant and appropriate data from the ASRS (Aviation Safety Reporting System) database
(http://asrs.arc.nasa.gov/index.html), as well as on other similar databases containing millions of unprocessed safety
and reliability reports. One of the most important applications and goals of the research is to assign new incoming
safety event reports to one or more of several predefined categories on the basis of their textual content.
Optimal categorization functions can be constructed from labeled training examples (i.e., reports labeled by human experts)
by means of a supervised learning algorithm and cross-validation. Numerous methods for text categorization
have been developed previously, such as Neural Networks, Naive Bayes, AdaBoost, Linear Discriminant
Analysis, Logistic Regression and Support Vector Machines (SVM). SVM has become a popular
learning algorithm, used in particular for large, high-dimensional classification problems; it has been shown
to give the most accurate classification results in a variety of applications. However, the direct application of
these methods to Aerospace Anomaly Discovery is restricted for the following reasons:
a) a fully automatic procedure achieves only moderate values of Recall and Precision (50-75%);
b) the statistical parameters of the reports are unstable, i.e. the frequency of words in the reports
changes from year to year.
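
To make the Recall and Precision figures in item a) concrete, the following minimal sketch shows one common way these two measures can be computed when each report may belong to several categories. It is an illustration only, not the evaluation procedure used in this work: the function name micro_precision_recall, the report identifiers and the category names are all hypothetical, and micro-averaging over (report, category) pairs is an assumed choice.

```python
from typing import Dict, Set, Tuple


def micro_precision_recall(
    predicted: Dict[str, Set[str]],
    actual: Dict[str, Set[str]],
) -> Tuple[float, float]:
    """Micro-averaged precision and recall over (report, category) pairs.

    Hypothetical helper: each dictionary maps a report identifier to the
    set of categories assigned to that report.
    """
    tp = fp = fn = 0
    for report_id, true_labels in actual.items():
        pred_labels = predicted.get(report_id, set())
        tp += len(pred_labels & true_labels)   # categories correctly assigned
        fp += len(pred_labels - true_labels)   # categories assigned in error
        fn += len(true_labels - pred_labels)   # categories that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented example data: expert labels versus automatic assignments.
actual = {
    "r1": {"engine", "weather"},
    "r2": {"runway_incursion"},
    "r3": {"engine"},
}
predicted = {
    "r1": {"engine"},
    "r2": {"runway_incursion", "weather"},
    "r3": {"weather"},
}

precision, recall = micro_precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this invented example two of the four assigned categories are correct and two of the four expert-assigned categories are found, so both measures equal 0.5, the lower end of the range quoted in item a).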