Efficient Way to Identify User Aware Rare Sequential Patterns in Document Streams

Abstract

Documents created and distributed on the Internet are ever-changing and take various forms. Most existing work is devoted to topic modeling and the evolution of individual topics, while the sequential relations among topics in successive documents published by a specific user are ignored. To characterize and detect personalized and abnormal behaviors of Internet users, we propose Sequential Topic Patterns (STPs) and formulate the problem of mining User-aware Rare Sequential Topic Patterns (URSTPs) in document streams on the Internet. These patterns are rare on the whole but relatively frequent for specific users, so they can be applied in many real-life scenarios, such as real-time monitoring of abnormal user behaviors. We present a solution to this mining problem in three phases: pre-processing to extract probabilistic topics and identify sessions for different users; generating all STP candidates with (expected) support values for each user by pattern-growth; and selecting URSTPs by user-aware rarity analysis on the derived STPs. Experiments on both real (Twitter) and synthetic datasets show that our approach can discover special users and interpretable URSTPs effectively and efficiently, and that these patterns reflect users' characteristics.

Swati V. Mengje | Prof. Rajeshri R. Shelke, "Efficient Way to Identify User Aware Rare Sequential Patterns in Document Streams", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-1 | Issue-4, June 2017. URL: http://www.ijtsrd.com/papers/ijtsrd101.pdf http://www.ijtsrd.com/engineering/computer-engineering/101/efficient-way-to-identify-user-aware-rare-sequential-patterns-in-document-streams/swati-v-mengje
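The three phases named in the abstract can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the authors' algorithm: each document is assumed to carry a single deterministic topic label (the paper works with probabilistic topics and expected support), candidates are enumerated by brute force rather than pattern-growth, and rarity is taken to be a user's local support minus the support over all sessions. The names `is_subsequence`, `support`, `candidate_patterns`, `mine_rare_patterns`, and the `min_rarity` threshold are all hypothetical.

```python
from itertools import combinations


def is_subsequence(pattern, session):
    """True if the topics in `pattern` occur in `session` in the same order."""
    remaining = iter(session)
    # `topic in remaining` consumes the iterator, which enforces the ordering.
    return all(topic in remaining for topic in pattern)


def support(pattern, sessions):
    """Fraction of sessions that contain `pattern` as a subsequence."""
    if not sessions:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in sessions) / len(sessions)


def candidate_patterns(sessions, max_len=2):
    """Brute-force stand-in for pattern-growth: all ordered topic
    subsequences of length 1..max_len that occur in some session."""
    candidates = set()
    for session in sessions:
        for n in range(1, max_len + 1):
            for idx in combinations(range(len(session)), n):
                candidates.add(tuple(session[i] for i in idx))
    return candidates


def mine_rare_patterns(user_sessions, min_rarity=0.6, max_len=2):
    """Assumed rarity measure: local support minus support over all sessions."""
    all_sessions = [s for sessions in user_sessions.values() for s in sessions]
    result = {}
    for user, sessions in user_sessions.items():
        hits = []
        for pattern in sorted(candidate_patterns(sessions, max_len)):
            local = support(pattern, sessions)
            overall = support(pattern, all_sessions)
            if local - overall >= min_rarity:
                hits.append((pattern, local, overall))
        if hits:
            result[user] = hits
    return result


# Toy input: user -> list of sessions, each session a topic sequence.
user_sessions = {
    "alice": [["sports", "news"], ["sports", "news"]],
    "bob": [["news", "sports"], ["news"]],
    "carol": [["news"], ["news", "music"]],
}
rare = mine_rare_patterns(user_sessions)
```

On this toy input, only the ordered pattern `("sports", "news")` is flagged, and only for `alice`: it appears in all of her sessions but in just two of the six sessions overall. The single topic `("sports",)` is not flagged, because `bob` also posts about sports, which shows why the order of topics matters in this setting.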

How To Cite

(2017). Efficient Way to Identify User Aware Rare Sequential Patterns in Document Streams. International Journal of Trend in Scientific Research and Development, 1(4), -. https://www.europub.co.uk/articles/-A-357279