Feature Selection And Vectorization In Legal Case Documents Using Chi-Square Statistical Analysis And Naïve Bayes Approaches

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2015, Vol 17, Issue 2

Abstract

Most machine learning techniques used for text classification require effective selection of document features, owing to the large volume of data encountered in the classification process, and require term weights built from document vectors that can be fed into the respective classifier algorithms. Effective selection of the most important features from the raw documents is achieved by implementing more extensive pre-processing techniques, and the features obtained are ranked using the chi-square statistical approach to eliminate irrelevant features and select the more relevant features across the entire corpus. The most relevant ranked features are then converted to word vectors, based on the number of occurrences of words in the documents or categories concerned, using the probabilistic characteristics of Naïve Bayes as a vectorizer for machine learning classifiers. This hybrid vector space model was tested on legal text categories. The study found that the pre-processing and ranking technique discovered better features, and that better term weights were successfully built from the documents for the machine learning classifiers used in the text classification process.
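The two steps named in the abstract, chi-square ranking followed by count-based Naïve Bayes weighting, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the standard 2x2 document-count form of the chi-square statistic and Laplace-smoothed multinomial likelihoods, and the function names (`chi_square`, `rank_terms`, `nb_weights`) and the four-document toy corpus are hypothetical, invented here for illustration.

```python
from collections import Counter

# Toy corpus (invented for illustration): tokenized documents and their categories.
docs = [
    ["court", "appeal", "land"],
    ["court", "murder", "accused"],
    ["land", "tenant", "court"],
    ["accused", "robbery", "court"],
]
labels = ["civil", "criminal", "civil", "criminal"]


def chi_square(docs, labels, term, category):
    """Chi-square score of `term` for `category` from a 2x2 document-count table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1  # term present, document in category
        elif has_term:
            b += 1  # term present, document in another category
        elif in_cat:
            c += 1  # term absent, document in category
        else:
            d += 1  # term absent, document in another category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    # A term that occurs in every document (or in none) carries no signal.
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def rank_terms(docs, labels, k):
    """Keep the k terms whose best per-category chi-square score is highest."""
    vocab = sorted({t for tokens in docs for t in tokens})
    cats = sorted(set(labels))
    scores = {t: max(chi_square(docs, labels, t, c) for c in cats) for t in vocab}
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


def nb_weights(docs, labels, terms):
    """Laplace-smoothed P(term | category) built from occurrence counts."""
    counts = {c: Counter() for c in set(labels)}
    for tokens, label in zip(docs, labels):
        counts[label].update(t for t in tokens if t in terms)
    weights = {}
    for cat, cnt in counts.items():
        total = sum(cnt.values())
        weights[cat] = {t: (cnt[t] + 1) / (total + len(terms)) for t in terms}
    return weights


selected = rank_terms(docs, labels, k=2)
weights = nb_weights(docs, labels, selected)
```

On this toy corpus, "court" appears in every document and scores zero, so it is eliminated, while "land" and "accused" separate the two categories perfectly and are retained. The resulting per-category weights are the kind of term weights that could then be passed to a downstream classifier.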

Authors and Affiliations

Obasi, Chinedu Kingsley, Ugwu, Chidiebere

How To Cite

Obasi, Chinedu Kingsley, Ugwu, Chidiebere (2015). Feature Selection And Vectorization In Legal Case Documents Using Chi-Square Statistical Analysis And Naïve Bayes Approaches. IOSR Journals (IOSR Journal of Computer Engineering), 17(2), 42-50. https://www.europub.co.uk/articles/-A-137568