Vectorization of Text Documents for Identifying Unifiable News Articles
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2019, Vol 10, Issue 7
Abstract
Vectorization is essential for processing textual data in natural language processing applications: it lets machines work with text by converting it into meaningful numerical representations. This work aims to identify unifiable news articles for multi-document summarization. A framework is introduced that identifies news articles related to top trending topics/hashtags and performs multi-document summarization of the unifiable articles for each trending topic, in order to capture the diversity of opinion on those topics. Text clustering is applied to the corpus of news articles related to each trending topic to obtain smaller unifiable groups. The effectiveness of several text vectorization methods, namely bag-of-words representations with tf-idf scores, word embeddings, and document embeddings, is investigated for clustering news articles with k-means. The paper presents a comparative analysis, in terms of purity, of the results these vectorization methods obtain on documents from the DUC 2004 benchmark dataset.
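As a rough illustration of the evaluation described in the abstract, the sketch below builds tf-idf vectors, clusters them with k-means, and scores the result by purity. It is not the authors' implementation. The six-document toy corpus, the whitespace tokenizer, the smoothed idf formula, the farthest-first seeding, and all function names are assumptions made for this example. It covers only the tf-idf variant, not word or document embeddings, and it does not use DUC 2004 data.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return dense, L2-normalised tf-idf vectors, one list per document."""
    tokenised = [doc.lower().split() for doc in docs]  # assumed: whitespace tokens
    vocab = sorted({t for tokens in tokenised for t in tokens})
    n_docs = len(docs)
    doc_freq = Counter(t for tokens in tokenised for t in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Assumed smoothed idf: log((1 + N) / (1 + df)) + 1, always positive.
        vec = [counts[t] * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
               for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Plain k-means; returns one cluster label per input vector."""
    # Assumed deterministic farthest-first seeding (not taken from the paper).
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def purity(labels, classes):
    """Fraction of documents that belong to the majority class of their cluster."""
    majority_total = 0
    for cluster in set(labels):
        members = [c for lab, c in zip(labels, classes) if lab == cluster]
        majority_total += Counter(members).most_common(1)[0][1]
    return majority_total / len(labels)


# Hypothetical toy corpus with two topics; not DUC 2004 data.
docs = [
    "election vote parliament minister",
    "minister vote election campaign",
    "parliament election campaign vote",
    "football match goal striker",
    "striker goal match league",
    "league football goal match",
]
topics = ["politics", "politics", "politics", "sport", "sport", "sport"]

labels = kmeans(tfidf_vectors(docs), k=2)
score = purity(labels, topics)
print(labels, score)
```

Purity only rewards clusters that are dominated by a single class, so it rises as the number of clusters grows. It is therefore best compared across vectorization methods at a fixed k, as the comparison in the abstract implies.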
Authors and Affiliations
Anita Kumari Singh, Mogalla Shashi