Aggregating textual and video data from movies

Journal Title: Romanian Journal of Human - Computer Interaction - Year 2016, Vol 9, Issue 3

Abstract

In this paper, we present an automatically annotated corpus built from movie screenplays (scripts) and subtitles. We extract the relevant textual information from both sources using regular expressions. We then synchronize each screenplay with its subtitles using a matching algorithm, so that every sentence in the script is bounded by two timestamps. We also developed an application that uses the corpus, both to test our approach and to illustrate practical situations in which the corpus is useful. The application performs topic detection: it searches the movie text for a specified topic and labels that topic as non-existent, episodic, or primary for the analyzed text. The main problem we faced was the unpredictable structure of screenplay files, which are not written in a fully standardized format that can be parsed and structured automatically. Some types of errors can be handled with regular expressions, but others require a machine learning approach.
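
The synchronization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes subtitles in the common SRT layout and swaps in a simple fuzzy string comparison from Python's standard `difflib` module as the matching step. The function names (`parse_srt`, `normalize`, `align`) and the 0.6 similarity threshold are hypothetical choices made for illustration only.

```python
import re
from difflib import SequenceMatcher

# One SRT cue: an index line, a "start --> end" timestamp line, then the
# subtitle text up to the next blank line (or the end of the file).
CUE_RE = re.compile(
    r"\d+\s*\n"
    r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\s*\n"
    r"(.*?)(?:\n\s*\n|\Z)",
    re.DOTALL,
)


def parse_srt(srt_text):
    """Return a list of (start, end, text) tuples extracted with a regex."""
    return [
        (start, end, " ".join(text.split()))
        for start, end, text in CUE_RE.findall(srt_text)
    ]


def normalize(sentence):
    """Lowercase and drop punctuation so small differences do not block a match."""
    return re.sub(r"[^a-z0-9 ]+", "", sentence.lower()).strip()


def align(script_sentences, cues, threshold=0.6):
    """Bound each script sentence by the timestamps of its most similar cue.

    Sentences with no cue at or above the (assumed) threshold get (None, None).
    """
    aligned = []
    for sentence in script_sentences:
        best_cue, best_score = None, 0.0
        for cue in cues:
            score = SequenceMatcher(
                None, normalize(sentence), normalize(cue[2])
            ).ratio()
            if score > best_score:
                best_cue, best_score = cue, score
        if best_cue is not None and best_score >= threshold:
            aligned.append((sentence, best_cue[0], best_cue[1]))
        else:
            aligned.append((sentence, None, None))
    return aligned


if __name__ == "__main__":
    sample_srt = (
        "1\n00:00:01,000 --> 00:00:03,500\nWhere are you going?\n\n"
        "2\n00:00:04,000 --> 00:00:06,000\nI'm going home.\n"
    )
    script = ["Where are you going?", "I am going home."]
    for row in align(script, parse_srt(sample_srt)):
        print(row)
```

Comparing every sentence with every cue is quadratic in the length of the movie text. Since dialogue appears in the same order in both sources, a practical system would likely exploit that ordering to narrow the search.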

Authors and Affiliations

Alexandru Hulea, Traian Rebedea

Related Articles

Mood and Sentiment Assessment Using Latent Semantic Analysis

The analysis of written communication can reveal subtle information, such as speaker’s emotional state, attitude and intentions. However, these cannot always be extracted accurately, at a level comparable to humans’ abil...

Named entities identification

An important topic in natural language processing is represented by named entities recognition inside texts. This article describes a novel approach used for detecting named entities that tries to improve the results obt...

Mirrors of the World – Supporting Situational Awareness with Computer Screens

In this paper we develop a notion of support for social and situational awareness. Our initial ideas are based on the metaphor of using a mirror to see what you are not looking at. We provide two studies that, for differ...

Evaluation of an augmented reality based learning platform using the think aloud protocol an peer tutoring

In this paper are presented the evaluation result of an augmented reality platform for teaching chemistry based on a methodology which combines the think aloud protocol – TAP and peer-tutoring method. During TAP, the stu...

Romanian dependency parser developed based on parsers for other Romanic languages

Determining the syntactic dependencies is an important task in natural language processing, as it is useful for improving the results of a wide range of applications, such as machine translation, opinion mining, question...

How To Cite

Alexandru Hulea, Traian Rebedea (2016). Aggregating textual and video data from movies. Romanian Journal of Human - Computer Interaction, 9(3), -. https://www.europub.co.uk/articles/-A-28990