Formulaicity in Turkish: Evidence from the Turkish National Corpus

Journal Title: Mersin Üniversitesi Dil ve Edebiyat Dergisi - Year 2016, Vol 13, Issue 2

Abstract

Formulaic sequences are the most frequently occurring forms in a language. Identifying them is useful in a wide range of areas, including linguistics, second language learning, and natural language processing. The preferred method for identifying formulaic sequences is to use a corpus, which may be built from written texts or tape-recorded conversations in the language, and to count the frequencies of sequences in that corpus. The most frequently occurring sequences are then examined to find formulas. Numerous studies have identified formulas in languages such as English. Only a few studies address formulaicity in Turkish, and most of these focus on identifying formulas in the form of multi-word units. Turkish, however, is an agglutinating language with a rich and complex morphology, so formulaic sequences in affixation should also be discovered. Very few studies of formulaicity in Turkish affixation exist in the literature. In this study, we try to discover formulaic sequences in Turkish affixation by counting frequent suffix n-grams in written and spoken Turkish using the Turkish National Corpus, which is a balanced, large-scale, general-purpose corpus of contemporary Turkish. We list the most frequent suffix combinations not only for verbs but for all lexical categories, such as noun, adjective, verb, and adverb, in both the written and spoken parts of the Turkish National Corpus, and we discuss similarities and differences in affixation between written and spoken usage of Turkish. We observe that shorter suffix sequences are preferred in spoken Turkish than in written Turkish, and that as the length of the suffix n-grams increases, different formulaic sequences are used in written and spoken Turkish.
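
The counting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each word has already been morphologically analyzed into an ordered list of suffix labels; the labels, the toy data, and the function names `suffix_ngrams` and `count_suffix_ngrams` are hypothetical and do not come from the Turkish National Corpus.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def suffix_ngrams(suffixes: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous n-gram of one word's suffix sequence."""
    return [tuple(suffixes[i:i + n]) for i in range(len(suffixes) - n + 1)]


def count_suffix_ngrams(words: Iterable[Sequence[str]], n: int) -> Counter:
    """Count suffix n-grams over all words in a (sub)corpus."""
    counts: Counter = Counter()
    for suffixes in words:
        counts.update(suffix_ngrams(suffixes, n))
    return counts


# Hypothetical toy input: one list of suffix labels per word token.
# A real study would obtain these from a morphological analyzer.
written = [
    ["PL", "POSS3", "LOC"],
    ["PL", "POSS3"],
    ["PAST", "1SG"],
    ["PL", "POSS3", "DAT"],
]

bigrams = count_suffix_ngrams(written, 2)
print(bigrams.most_common(1))  # [(('PL', 'POSS3'), 3)]
```

Running the same function separately over the written and spoken subcorpora, for several values of n, would give ranked lists that can be compared across the two modes.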

Authors and Affiliations

Selma Ayşe Özel, Yasin Bektaş, Hakan Yılmazer

Related Articles

Corpus Linguistics Studies: Intergenerational Solidarity Scale Development

Human interaction could be a focus of linguistics or sociology. When it is considered from a social perspective and the data is collected from language, the concepts reflected in language are examined. In such cases, sem...

Spoken Turkish Corpus in Its Present Form: A Technical and Statistical Analysis

The primary goal of this article is to explain the technologies and workflows used to build the METU Spoken Turkish Corpus (STC), which was pioneered by the late Prof. Dr. Şükriye Ruhi. The Web Based Corpus Management Sys...

Formulaicity within Turkish Words

One of the main insights to emerge from the last fifty years of corpus linguistics has been a greater understanding of the pervasiveness of formulaic language. Rather than exercising the full generative capacity of langu...

Colligational Patterns of Turkish Multi-Word Units

In multi-word unit (MWU) extraction studies, most of the challenges for rich morphology languages like Turkish can be overcome by the study of how colligational filtering works in our minds, along with how statistical an...

A Pragmatic Analysis of (U)Lan in Spoken Turkish Corpus and Turkish National Corpus

In this study, the Turkish interjection "UlAn" and its derivatives, which can be classified as face-threatening according to current (im)politeness theories, have been examined with a corpus-based, descriptive approach....

How To Cite

Selma Ayşe Özel, Yasin Bektaş, Hakan Yılmazer (2016). Formulaicity in Turkish: Evidence from the Turkish National Corpus. Mersin Üniversitesi Dil ve Edebiyat Dergisi, 13(2), 1-33. https://www.europub.co.uk/articles/-A-198086