Content Evocation Using Web Scraping and Semantic Illustration

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2014, Vol 16, Issue 3

Abstract

  Web scraping is the process of automatically collecting information from the World Wide Web. It is a field under active development that shares a common goal with the semantic web vision, an ambitious initiative that still requires breakthroughs in text processing, artificial intelligence and human-computer interaction. This work concerns the extraction of content from different web pages using web scraping and semantic illustration. Web scraping extracts content from HTML pages and is related to web indexing. A commonly used measure of tree similarity is the tree edit distance, which can easily be extended to measure how well a pattern matches in a tree. An obstacle to this approach is its time complexity, so we consider whether faster algorithms for constrained tree edit distances are usable for web scraping, and how to reduce the size of the tree representing the web page. Web scraping is applied in the current market under names such as web data extraction, data collection and screen scraping. Many different algorithms are used for web scraping, such as "tree pattern matching", "tree mapping" and "approximate tree matching", but in general the "tree edit distance" algorithm is used. With this algorithm, however, problems of incorrect data, low efficiency and high time complexity have been observed. This research focuses on improving the performance of the tree edit distance computation, and in particular on the upper bound of its time complexity.
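
To make the notion of tree edit distance concrete, the following is a minimal sketch in Python. It is not the algorithm proposed in the paper; it is the textbook unit-cost recurrence for ordered trees (delete a node, insert a node, or relabel a node), memoized for convenience, and it makes no attempt at the efficiency the abstract is concerned with. The tuple-based tree encoding and the names `size`, `forest_distance`, `tree_edit_distance`, `page_a`, `page_b`, `page_c` and `page_d` are assumptions chosen for illustration.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), where children is a tuple
# of trees; a forest is a tuple of trees. Tuples are hashable, so results
# of the recursion can be cached.


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1:
        return size(f2)  # insert every remaining node of f2
    if not f2:
        return size(f1)  # delete every remaining node of f1

    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]

    # Delete the rightmost root of f1; its children take its place.
    delete = forest_distance(f1[:-1] + kids1, f2) + 1
    # Insert the rightmost root of f2; its children take its place.
    insert = forest_distance(f1, f2[:-1] + kids2) + 1
    # Match the two rightmost roots, relabeling if the labels differ.
    relabel = (
        forest_distance(f1[:-1], f2[:-1])
        + forest_distance(kids1, kids2)
        + (label1 != label2)
    )
    return min(delete, insert, relabel)


def tree_edit_distance(t1, t2):
    """Edit distance between two single trees."""
    return forest_distance((t1,), (t2,))


# Hypothetical page fragments, written as label trees.
page_a = ("div", (("p", ()), ("span", ())))
page_b = ("div", (("p", ()), ("a", ())))
page_c = ("div", (("ul", (("li", ()), ("li", ()))),))
page_d = ("div", (("li", ()), ("li", ())))
```

Here `tree_edit_distance(page_a, page_b)` is 1 (relabel `span` to `a`), and `tree_edit_distance(page_c, page_d)` is also 1 (delete the `ul` node, promoting its children). In a scraping setting the trees would come from a parsed HTML document rather than hand-written tuples.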

Authors and Affiliations

Vasani Krunal A

EP ID: EP116095
DOI: 10.9790/0661-16395460

How To Cite

Vasani Krunal A (2014). Content Evocation Using Web Scraping and Semantic Illustration. IOSR Journals (IOSR Journal of Computer Engineering), 16(3), 54-60. https://www.europub.co.uk/articles/-A-116095