Extracting Multiwords From Large Document Collection Based N-Gram

Abstract

Multiword terms (MWTs) are relevant strings of words in text collections. Once automatically extracted, they can be used by an Information Retrieval system to suggest conceptually interesting refinements of its users' information needs. These multiword terms point to relevant information, often corresponding to topics and subtopics in the text collection, and may be especially useful for refining generic queries. A new approach is proposed to find collocations in text documents. A collocation is a set of words that occur together more often than would be expected by chance in a corpus. Collocations are extracted based on the frequency of the joint occurrence of the words as well as the frequency of each individual word in the whole text. Intuitively, when a set of words is extracted as a collocation, the joint occurrence of the words must be high relative to what the frequencies of the constituent individual words alone would predict.
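The abstract does not name the exact measure used to compare joint and individual frequencies. As an illustration only, the sketch below scores candidate n-grams with pointwise mutual information (PMI), one common way to test whether words co-occur more often than chance. It is not presented as the authors' method. The function names `ngrams` and `extract_collocations`, the whitespace tokenization, and the `min_count` threshold are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_collocations(documents, n=2, min_count=2):
    """Hypothetical sketch: rank candidate n-grams by PMI.

    PMI compares the joint probability of an n-gram with the product of the
    individual word probabilities (i.e. what independence would predict).
    A high score means the words co-occur more often than chance.
    """
    word_counts = Counter()
    gram_counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()  # naive whitespace tokenization (assumption)
        word_counts.update(tokens)
        gram_counts.update(ngrams(tokens, n))

    total_words = sum(word_counts.values())
    total_grams = sum(gram_counts.values())
    scores = {}
    for gram, count in gram_counts.items():
        if count < min_count:
            continue  # rare n-grams give unreliable PMI estimates
        p_joint = count / total_grams
        # Probability of the n-gram if its words occurred independently.
        p_indep = math.prod(word_counts[w] / total_words for w in gram)
        scores[gram] = math.log2(p_joint / p_indep)
    # Highest-scoring candidates first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = [
    "information retrieval system returns documents",
    "an information retrieval system ranks documents",
    "the user refines the query",
    "the query is refined by the information retrieval system",
]
for gram, score in extract_collocations(docs):
    print(" ".join(gram), round(score, 2))
```

On this toy corpus, "information retrieval" and "retrieval system" rank above "the query", because their joint counts are high relative to their word frequencies. Calling `extract_collocations(docs, n=3)` keeps only the trigram "information retrieval system". For n > 2 the sketch uses the product of all word probabilities as the independence baseline, which is one common generalization of PMI among several.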

Authors and Affiliations

M. Nirmala, Dr. E. Ramaraj

Related Articles

 Communication and Implementation of Plug and Play Enabled Devices in Sensor Network

 Most of sensor networks are static in nature and do not provide standardized and systematic network management capabilities. Intelligence is not associated with the sensors. In order to make a sensor network more...

 Improving Accuracy in Decision Making for Detecting Intruders

 Normal host based Intrusion detection system provides us some alerts of data integrity breach on the basis of policy violation and unauthorized access. There are some factors responsible if any employee of the ente...

Compressed Sensing Based Image Encoding Technique for Wireless Sensor Networks

The Wireless Sensor Network (WSN) is the one, which generally consists of cameras themselves, which have some local image processing, communication and storage capabilities, and one or more central computers, where image...

 Classification algorithm in Data mining: An Overview

 Data Mining is a technique used in various domains to give meaning to the available data Classification is a data mining (machine learning) technique used to predict group membership for data instances. In this pap...

Information at Your Fingertips Anywhere Anytime Anyway (A3) MCC – Survey

The cloud means total cost of ownership to build and maintain the datacenter infrastructure which includes both hard and soft related costs. An accurate comparison requires knowledge of all available over the life of the...

How To Cite

M. Nirmala, Dr. E. Ramaraj (2013). Extracting Multiwords From Large Document Collection Based N-Gram. International Journal of P2P Network Trends and Technology (IJPTT), 3(5), 282-285. https://www.europub.co.uk/articles/-A-130998