Urdu Word Segmentation using Machine Learning Approaches
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2018, Vol 9, Issue 6
Abstract
Word segmentation is a basic NLP task that plays a significant role in diverse NLP areas. The main areas that benefit from word segmentation include information retrieval (IR), part-of-speech (POS) tagging, named entity recognition (NER), and sentiment analysis. Urdu word segmentation is a challenging task. There are a number of reasons for this, but the space insertion problem and the space omission problem are the major ones. Compared to Urdu, the tools and resources developed for word segmentation of English and similar Western languages achieve very high performance. Some languages, such as English, mark word boundaries clearly with spaces or with capitalization of the first character of a word. Many other languages, however, do not have proper delimitation between words, e.g. Thai, Lao, and Urdu. The objective of this research work is to present a machine learning based approach for Urdu word segmentation. We use conditional random fields (CRF) for this task. Further challenges in Urdu text are compound words and reduplicated words. In this paper, we try to overcome these challenges in Urdu text with a machine learning methodology.
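The abstract does not state how the segmentation task is encoded for the CRF. One common way to frame word segmentation as sequence labeling is to tag every character as the beginning (B) or inside (I) of a word. The Python sketch below illustrates only that framing: the B/I label scheme, the function names, and the three context features are assumptions for illustration, not the authors' actual setup, and no CRF training is shown.

```python
from typing import Dict, List, Tuple


def words_to_char_labels(words: List[str]) -> Tuple[List[str], List[str]]:
    """Turn gold-segmented words into a character sequence with B/I labels.

    Assumed scheme: "B" marks the first character of a word,
    "I" marks every following character of the same word.
    """
    chars: List[str] = []
    labels: List[str] = []
    for word in words:
        for i, ch in enumerate(word):
            chars.append(ch)
            labels.append("B" if i == 0 else "I")
    return chars, labels


def char_labels_to_words(chars: List[str], labels: List[str]) -> List[str]:
    """Rebuild words from characters and (predicted) B/I labels."""
    words: List[str] = []
    for ch, label in zip(chars, labels):
        if label == "B" or not words:
            words.append(ch)   # start a new word
        else:
            words[-1] += ch    # extend the current word
    return words


def char_features(chars: List[str], i: int) -> Dict[str, str]:
    """Hypothetical per-character features a sequence labeler could consume."""
    return {
        "char": chars[i],
        "prev": chars[i - 1] if i > 0 else "<s>",
        "next": chars[i + 1] if i < len(chars) - 1 else "</s>",
    }


# Example sentence, already segmented into words.
words = ["اردو", "ایک", "زبان", "ہے"]
chars, labels = words_to_char_labels(words)
restored = char_labels_to_words(chars, labels)
```

Under this framing, space omission corresponds to a missing B label in the raw text and space insertion to a space that falls inside a single word, so a labeler that predicts B/I from character context would not depend on the spaces in the input.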
Authors and Affiliations
Sadiq Nawaz Khan, Khairullah Khan, Wahab Khan, Asfandyar Khan, Fazali Subhan, Aman Ullah Khan, Burhan Ullah
A Survey on Location Privacy-Preserving Mechanisms in Mobile Crowdsourcing
Mobile Crowdsourcing (MCS) surfaced as a new affluent method for data collection and processing as a result of the boom of sensor-rich mobile devices popularity. MCS still has room for improvement, particularly in protec...
Creating and Protecting Password: A User Intention
Students Academic Information System (SAIS) is an application that provides academic information for the students. The security policy applied by our university requires the students to renew their SAIS password based on...
Solving Word Tile Puzzle using Bee Colony Algorithm
In this paper, an attempt has been made to solve the word tile puzzle with the help of Bee Colony Algorithm, in order to find maximum number of words by moving a tile up, down, right or left. Bee Colony Algorithm is a ty...
Study of the Performance of Multi-hop Routing Protocols in Wireless Sensor Networks
Currently in the literature, there are quite a number of multi-hop routing algorithms, some of which are subject to normalization. Routing protocols based on clustering provide an efficient method for extending the life...
An Evaluation of IFC-CityGML Unidirectional Conversion
Interoperability between building information models (BIM) and geographic information models has a strong potential to bring benefit to different demands in construction analysis, urban planning, homeland security...