Text mining is an emerging technology that can augment existing data in corporate databases by making unstructured text available for analysis. The rapid growth in online documents, driven largely by the expansion of the internet, has renewed interest in automated document classification and data mining, and the demand for text classification to aid the analysis and management of text is increasing. Text is cheap, but information, in the form of knowing which classes a text belongs to, is expensive. Text classification is the process of assigning documents to predefined categories based on their content. Automatic classification can provide this information at low cost, but the classifiers themselves must either be built with expensive human effort or trained from texts that have been manually classified. Both classification and association rule mining are indispensable to practical applications. In association rule mining the target of discovery is not predetermined, whereas in classification rule mining there is one and only one predetermined target. Integrating the two mining techniques could therefore bring considerable savings and convenience to the user. In this paper, such an integrated framework, called associative classification, is used for text categorization. The algorithm presented here uses words as features and derives the feature set from preclassified text documents. A Naïve Bayes classifier is then applied to the derived features for final classification.
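The two-stage idea described above (derive word features from preclassified documents, then classify with Naïve Bayes on those features) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes single-word rules of the form "word → class" filtered by support and confidence, whereas a full associative classifier would typically mine multi-word itemsets. The function names, thresholds, and whitespace tokenization are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def mine_class_features(docs, labels, min_support=0.5, min_confidence=0.8):
    """Keep words w for which some rule 'w -> class' passes both thresholds.

    Assumption: only single-word antecedents are considered.
    support    = fraction of the class's documents that contain w
    confidence = fraction of documents containing w that belong to the class
    """
    class_sizes = Counter(labels)
    in_class = defaultdict(Counter)  # label -> word -> docs containing it
    overall = Counter()              # word -> docs containing it
    for text, label in zip(docs, labels):
        for word in set(text.lower().split()):
            in_class[label][word] += 1
            overall[word] += 1
    features = set()
    for label, counts in in_class.items():
        for word, n in counts.items():
            support = n / class_sizes[label]
            confidence = n / overall[word]
            if support >= min_support and confidence >= min_confidence:
                features.add(word)
    return features


def train_naive_bayes(docs, labels, features):
    """Count class frequencies and per-class occurrences of the kept words."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(docs, labels):
        for word in text.lower().split():
            if word in features:
                word_counts[label][word] += 1
    return class_docs, word_counts


def classify(text, model, features):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    class_docs, word_counts = model
    total_docs = sum(class_docs.values())
    vocab_size = len(features)
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        score = math.log(class_docs[label] / total_docs)
        class_total = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in features:
                score += math.log(
                    (word_counts[label][word] + 1) / (class_total + vocab_size)
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

In this sketch the mining step acts only as a feature filter: words that are not strongly associated with any class are ignored by the Naïve Bayes stage, which shrinks the vocabulary the classifier has to model.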
Shrivastava, Padmavati and Ansari, Uzma
"Text Categorization based on Associative Classification,"
International Journal of Computer and Communication Technology: Vol. 1, Article 4.
Available at: https://www.interscience.in/ijcct/vol1/iss2/4