Disambiguation of Lithuanian Homographs Based on the Frequencies of Lexemes and Morphological Tags

Journal Title: Kalbu studijos / Studies about Languages - Year 2009, Vol 14, Issue 0

Abstract

Text-to-speech synthesis requires the input text to be marked for stress. The main problem is that existing stress-assignment algorithms for Lithuanian return more than one possible stressing for some words (homographs). This work proposes a disambiguation method based on the frequencies of occurrence of lexemes and morphological tags; such a method has not previously been applied to Lithuanian. The frequencies were calculated from a text corpus of 1 million words that was stressed automatically and then corrected manually. Homographs are disambiguated by removing the less frequently used grammatical forms and lexemes. Additional problems arise because a single word can correspond to more than two grammatical forms; for such cases, a method based on the frequencies of pairs of grammatical forms is proposed. The frequencies of morphological tags were shown to play a more important role than the frequencies of lexemes. The proposed method disambiguates homographs with an accuracy of 85.01%. Although it does not use contextual information, its results are comparable with those of the ID3 algorithm, which does use context.
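The core idea of the abstract, keeping the most frequent reading of a homograph without looking at context, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Analysis` record, the tag strings, the toy corpus, and the rule of ranking by tag frequency first and lexeme frequency second are all assumptions made for illustration, and the pair-of-forms variant mentioned in the abstract is not modelled.

```python
from collections import Counter
from typing import NamedTuple


class Analysis(NamedTuple):
    """One candidate reading of a word (hypothetical record layout)."""
    stressed: str  # stressed form the reading would produce
    lexeme: str    # dictionary form
    tag: str       # morphological tag


def count_frequencies(corpus):
    """Count how often each tag and each lexeme occurs in an annotated corpus."""
    tag_freq = Counter(a.tag for a in corpus)
    lexeme_freq = Counter(a.lexeme for a in corpus)
    return tag_freq, lexeme_freq


def disambiguate(candidates, tag_freq, lexeme_freq):
    """Keep the reading with the most frequent tag; ties fall back to lexeme frequency.

    Ranking tags before lexemes is an assumption motivated by the abstract's
    remark that tag frequencies matter more than lexeme frequencies.
    """
    return max(candidates, key=lambda a: (tag_freq[a.tag], lexeme_freq[a.lexeme]))


# Invented toy corpus: placeholder forms, lexemes, and tags, not real data.
corpus = [
    Analysis("w1", "lex_a", "NOUN.sg.nom"),
    Analysis("w2", "lex_b", "NOUN.sg.nom"),
    Analysis("w3", "lex_c", "VERB.prs.3"),
]
tag_freq, lexeme_freq = count_frequencies(corpus)

# Two competing readings of one hypothetical homograph.
candidates = [
    Analysis("reading-1", "lex_c", "VERB.prs.3"),
    Analysis("reading-2", "lex_a", "NOUN.sg.nom"),
]
best = disambiguate(candidates, tag_freq, lexeme_freq)
print(best.stressed)
```

In this toy corpus the noun tag occurs twice and the verb tag once, so the noun reading is kept. A real system would take its counts from the manually corrected corpus described above.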

Authors and Affiliations

Tomas Anbinderis, Pijus Kasparaitis

Related Articles

Case Study: English for Specific Purposes in Moodle Area

This article examines the application of Moodle tasks for vocabulary revision in English for Specific Purposes (ESP). The study is based on the analysis of data obtained from a survey of students’ attitudes to Moodle task...

Elements of Patriotism in Political Speeches in Blogs

Blogs increasingly attract attention not only as personal diaries but also as writings that reflect public life, for example, texts published during an election campaign. However, linguistic blog...

Integrating Content and Language in Higher Education: A Case of KTU

Content and language integrated learning (CLIL) is a good way to develop both language and content skills. Language, meaning and content are integrated, and by extending language, meaning and content resources extend ac...

Semantic Pleonasms in English and Lithuanian and Their Translation

The article examines English semantic pleonasms and their translation into Lithuanian. Pleonasm is often regarded as an error or as a strange, absurd phenomenon. However, translators are often influenced by the source-language pleonas...

Lithuanian Simple Past Tenses and Their Equivalents in English

an action that took place before the moment of speech, and what distinguishes them is the basic meaning of their relation to the moment of speech: the simple past tense can denote an action very close to the reference point, whether that is the present or another pas...

How To Cite

Tomas Anbinderis, Pijus Kasparaitis (2009). Disambiguation of Lithuanian Homographs Based on the Frequencies of Lexemes and Morphological Tags. Kalbu studijos / Studies about Languages, 14(0), 25-31. https://www.europub.co.uk/articles/-A-85952