Spoken Turkish Corpus in Its Present Form: A Technical and Statistical Analysis

Journal Title: Mersin Üniversitesi Dil ve Edebiyat Dergisi - Year 2017, Vol 14, Issue 2

Abstract

The primary goal of this article is to explain the technologies and workflows used to build the METU Spoken Turkish Corpus (STC), which was pioneered by the late Prof. Dr. Şükriye Ruhi. The Web-Based Corpus Management System, which is crucial to the building of the STC, contains a set of workflows, data formats and export options that make it easy to transcribe, check and publish corpus data. The Corpus Management System was developed by STC project members in the Python programming language, and it enables remote project members with different roles to collaborate through an online interface. Within the STC, speech totaling 286,391 words has been transcribed and checked; in addition, recordings totaling 79,189 words have been prepared for publication. The article presents general statistics about the recordings in the STC and discusses what needs to be done to publish a large-scale version of the STC.

Authors and Affiliations

Güneş Acar

How To Cite

Güneş Acar (2017). Spoken Turkish Corpus in Its Present Form: A Technical and Statistical Analysis. Mersin Üniversitesi Dil ve Edebiyat Dergisi, 14(2), 1-14. https://www.europub.co.uk/articles/-A-273735