Development of the algorithm of keyword search in the Kazakh language text corpus

Abstract

Semantic text analysis occupies a special place in computational linguistics. Researchers in the field are increasingly interested in algorithms that improve the quality of text corpus processing and the probabilistic determination of text content. A review of the methods, approaches, and algorithms for semantic text analysis used in international and Kazakhstani research led to the development of an algorithm for keyword search in Kazakh text. The first step of the algorithm is to compile a reference dictionary of keywords for the Kazakh-language text corpus. This was done by applying the Porter stemming algorithm to the corpus: the stemmer extracts the unique word stems, which form the reference dictionary, and the dictionary is then indexed. The next step is to collect training data from the text corpus. To calculate the degree of semantic proximity between words, each word is assigned a vector of the corresponding word forms from the reference dictionary, producing pairs of a keyword and a vector. The last step is neural network training. Training uses error backpropagation, which makes it possible to analyze the text corpus semantically and to obtain a probabilistic number of words close to the expected number of keywords. This process automates the processing of text material by creating digital learning models of keywords. The algorithm is used to develop a neurocomputer system that will automatically check the written work of online learners. What distinguishes the keyword search algorithm is its use of neural network training for texts in the Kazakh language.
In Kazakhstan, researchers in computational linguistics have conducted a number of studies based on morphological analysis, lemmatization, and other approaches, and have implemented linguistic tools (mainly translation dictionaries). The use of neural network learning for parsing the Kazakh language remains an open question in Kazakhstani science. The developed algorithm addresses one of the problems of effective semantic analysis of Kazakh-language text.
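
The three steps named in the abstract (stemming into an indexed reference dictionary, encoding text as vectors over that dictionary, and training a network with error backpropagation) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the suffix list is a tiny hypothetical subset of Kazakh affixes rather than a full Porter-style stemmer, the binary encoding is one possible reading of "a vector of the corresponding word forms", and the network size, training target, and sample texts are invented.

```python
import math
import random

# Hypothetical, heavily simplified suffix list (plural and locative endings only).
# A real Kazakh stemmer needs many more affixes and ordered stripping rules.
SUFFIXES = ["лар", "лер", "дар", "дер", "тар", "тер", "да", "де", "та", "те"]

def stem(word):
    """Strip the longest matching suffix, keeping at least two characters."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word

def build_dictionary(keywords):
    """Indexed reference dictionary: each unique stem gets an integer index."""
    index = {}
    for kw in keywords:
        index.setdefault(stem(kw), len(index))
    return index

def encode(text, index):
    """Binary vector with 1.0 wherever a dictionary stem occurs in the text."""
    vec = [0.0] * len(index)
    for token in text.split():
        pos = index.get(stem(token))
        if pos is not None:
            vec[pos] = 1.0
    return vec

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w1, b1, w2, b2):
    """Forward pass of a network with one sigmoid hidden layer and one output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return h, sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)

def train(samples, n_in, n_hidden=3, epochs=2000, lr=0.5, seed=0):
    """Train with stochastic gradient descent and error backpropagation."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in samples:
            h, out = forward(x, w1, b1, w2, b2)
            total += (out - y) ** 2
            # Backward pass: output error first, then propagate it to the hidden layer.
            d_out = (out - y) * out * (1.0 - out)
            d_hid = [d_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            for j in range(n_hidden):
                w2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_hid[j]
                for i in range(n_in):
                    w1[j][i] -= lr * d_hid[j] * x[i]
            b2 -= lr * d_out
        losses.append(total / len(samples))
    return (lambda x: forward(x, w1, b1, w2, b2)[1]), losses

# Invented toy data: target 1.0 when a text covers the keyword topic, else 0.0.
index = build_dictionary(["кітаптар", "мектептер", "қалалар"])
texts = [("мектепте кітаптар", 1.0), ("қалада мектептер кітаптар", 1.0),
         ("қалада", 0.0), ("бала ойнайды", 0.0)]
samples = [(encode(t, index), y) for t, y in texts]
predict, losses = train(samples, n_in=len(index))
print(index)
print(losses[0], losses[-1])
```

The stemmer collapses inflected forms such as "кітаптар" and "мектепте" onto shared stems, so the dictionary stays small and each text maps to a fixed-length vector. The printed mean squared error should fall between the first and last epoch, which is the only behaviour this sketch is meant to demonstrate.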

Authors and Affiliations

Akerke Akanova, Nazira Ospanova, Yevgeniya Kukharenko, Gulmira Abildinova

  • EP ID EP667099
  • DOI 10.15587/1729-4061.2019.179036

How To Cite

Akerke Akanova, Nazira Ospanova, Yevgeniya Kukharenko, Gulmira Abildinova (2019). Development of the algorithm of keyword search in the Kazakh language text corpus. Восточно-Европейский журнал передовых технологий, 5(2), 26-32. https://www.europub.co.uk/articles/-A-667099