Development of the algorithm of keyword search in the Kazakh language text corpus
Journal Title: Восточно-Европейский журнал передовых технологий (Eastern-European Journal of Enterprise Technologies), Year 2019, Vol 5, Issue 2
Abstract
Semantic text analysis occupies a special place in computational linguistics. Researchers in this field are increasingly interested in algorithms that improve the quality of text corpus processing and the probabilistic determination of text content. A study of the methods, approaches, and algorithms for semantic text analysis used in international and Kazakhstani computational linguistics led to the development of a keyword search algorithm for Kazakh text. The first step of the algorithm is to compile a reference dictionary of keywords for the Kazakh-language text corpus. This is done by applying the Porter stemming algorithm to the corpus: the stemmer extracts unique word stems, which form a reference dictionary that is subsequently indexed. The next step is to collect training data from the text corpus. To calculate the degree of semantic proximity between words, each word is assigned a vector of the corresponding word forms from the reference dictionary, which yields pairs consisting of a keyword and a vector. The last step of the algorithm is neural network training. Training uses the error backpropagation method, which enables semantic analysis of the text corpus and yields a probabilistic number of words close to the expected number of keywords. This process automates the processing of text material by creating digital learning models of keywords. The algorithm is used to develop a neurocomputer system that will automatically check the written work of online learners. What distinguishes the keyword search algorithm is its use of neural network training for texts in the Kazakh language.
In Kazakhstan, researchers in computational linguistics have conducted a number of studies based on morphological analysis, lemmatization, and other approaches, and have implemented linguistic tools (mainly translation dictionaries). The use of neural network training for parsing the Kazakh language remains an open issue in Kazakhstani science. The developed algorithm addresses one of the problems of effective semantic analysis of Kazakh-language text.
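The steps named in the abstract (stemming, an indexed reference dictionary, keyword vectors, and training by error backpropagation) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the suffix list is a tiny hypothetical sample rather than a full Porter-style stemmer for Kazakh, the vectors are assumed to be stem counts over the dictionary, and the network size, labels, and sample sentences are invented for the example.

```python
import math
import random

# Hypothetical, deliberately tiny suffix list (plural and locative endings only).
# A real Kazakh stemmer needs a far larger rule set.
SUFFIXES = sorted(
    ["лар", "лер", "дар", "дер", "тар", "тер", "да", "де", "та", "те"],
    key=len, reverse=True,
)

def stem(word, min_stem=3):
    """Strip known suffixes repeatedly, longest first, keeping >= min_stem letters."""
    word = word.lower()
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                word = word[: -len(suffix)]
                stripped = True
                break
    return word

def build_index(keywords):
    """Reference dictionary: unique stems mapped to integer positions."""
    return {s: i for i, s in enumerate(sorted({stem(w) for w in keywords}))}

def vectorize(text, index):
    """Assumed representation: count of each dictionary stem in the text."""
    vec = [0.0] * len(index)
    for token in text.split():
        position = index.get(stem(token))
        if position is not None:
            vec[position] += 1.0
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(model, x):
    """Return hidden activations and the output of a one-hidden-layer network."""
    w1, b1, w2, b2 = model
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return h, sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)

def train(samples, n_in, n_hidden=3, epochs=2000, lr=0.5, seed=0):
    """Per-sample gradient descent on squared error via backpropagation."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, y in samples:
            h, out = forward((w1, b1, w2, b2), x)
            d_out = (out - y) * out * (1.0 - out)  # error signal at the output
            # Propagate the error back to the hidden layer before updating w2.
            d_h = [d_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            for j in range(n_hidden):
                w2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * d_h[j] * x[i]
            b2 -= lr * d_out
    return w1, b1, w2, b2

def predict(model, x):
    return forward(model, x)[1]

# Invented toy data: label 1.0 marks texts that contain the expected keywords.
index = build_index(["кітаптар", "мектепте", "оқушылар"])
texts = [
    ("оқушылар мектепте кітап оқиды", 1.0),
    ("кітаптар мектепте", 1.0),
    ("бүгін ауа райы жақсы", 0.0),
    ("қала үлкен", 0.0),
]
samples = [(vectorize(text, index), label) for text, label in texts]
model = train(samples, n_in=len(index))
for text, _ in texts:
    print(text, round(predict(model, vectorize(text, index)), 3))
```

The script prints one score per sample text; on this toy data, texts containing dictionary stems should score higher than unrelated ones. Stemming before indexing means that inflected forms such as "кітаптар" and "кітаптарда" fall on the same dictionary position, which keeps the input vectors short.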
Authors and Affiliations
Akerke Akanova, Nazira Ospanova, Yevgeniya Kukharenko, Gulmira Abildinova
Development of a method for triangulation of inhomogeneous regions represented by functions
In the process of designing structures from inhomogeneous materials, there is the need to build discrete models that consider the peculiarities of the geometrical shape of subdomains from different materials. The firs...
Analysis of correlation dimensionality of the state of a gas medium at early ignition of materials
We have considered the application of the method of nonlinear dynamic systems in order to analyze and detect the structural patterns in the dynamics of increments in the state of a gas medium generated...
Development of the method for the formation of one-dimensional contours by the assigned interpolation accuracy
The purpose of the study is to develop a method for the formation of a one-dimensional contour with provision of a given accuracy of interpolation. Determination of the accuracy of interpolation relies on the formatio...
Development of the method to control telecommunication network congestion based on a neural model
The circuit of congestion control using feedback by the sign of function of sensitivity to telecommunications network performance was considered. To determine a given function, the use of a simple neural network model...
Improving efficiency for ensuring data group anonymity by developing an information technology
Widespread introduction of methods that ensure anonymity of information about individual groups (teams) of respondents in the field of official statistics is restrained by the lack of relevant industrial information t...