Tatar Socio-Political Terminology in a Bilingual Thesaurus

Authors

  • Alfiya Galieva
  • Olga Nevzorova
  • Dzhavdet Suleymanov

Keywords:

Bilingual Thesaurus, The Tatar Language, Socio-political Vocabulary, Lexical Synonymy

Abstract

This paper discusses the general methodology of compiling the Russian-Tatar Socio-Political Thesaurus (http://tattez.antat.ru/), which is being developed as a hierarchy of concepts on the basis of the Russian RuThes thesaurus format (http://www.labinform.ru/pub/ruthes/index.htm). It touches upon some important practical aspects of implementing this project and describes its current status. The project aims to compile the whole body of modern Tatar vocabulary related to different aspects of the socio-political sphere, such as state government, economy, social life, justice, warfare, culture, and religion. The Tatar Thesaurus also comprises some general lexicon branches representing lexical items that can be found in various domain-specific texts. Each concept is linked with a set of lexical entries, that is, language expressions (single words and multiword expressions) that refer to it in texts. The Tatar component is based on the list of concepts of RuThes, so the basic structure of the conceptual relations of RuThes is preserved. In compiling the Tatar Thesaurus, data from the following available Tatar corpora are used:

  1. Tatar National Corpus (http://tugantel.tatar/?lang=en);
  2. Corpus of Written Tatar (http://www.corpus.tatar/en).
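
The link between a concept and its lexical entries in both languages can be pictured with a minimal Python sketch. It is an illustration only and does not reproduce the RuThes format or the project's software: the class `Concept`, its field names, the function `build_index`, and the identifier values are all hypothetical, and the sample words are illustrative rather than taken from the thesaurus.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Concept:
    """One thesaurus concept with its lexical entries (hypothetical layout)."""
    concept_id: int                                           # assumed numeric identifier
    name: str                                                 # concept label
    russian_entries: List[str] = field(default_factory=list)  # words and multiword expressions
    tatar_entries: List[str] = field(default_factory=list)    # synonyms and variants
    broader: List[int] = field(default_factory=list)          # ids of parent concepts


def build_index(concepts: List[Concept]) -> Dict[str, Set[int]]:
    """Map each lowercased lexical entry to the ids of the concepts it refers to."""
    index: Dict[str, Set[int]] = {}
    for concept in concepts:
        for entry in concept.russian_entries + concept.tatar_entries:
            index.setdefault(entry.lower(), set()).add(concept.concept_id)
    return index


# Illustrative data only: one concept with a Russian entry and two Tatar synonyms.
law = Concept(
    concept_id=1,
    name="LAW",
    russian_entries=["закон"],
    tatar_entries=["канун", "закон"],
)
index = build_index([law])
```

Keeping every synonym as a separate entry under one concept lets a text occurrence of any variant resolve to the same concept identifier.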

The article also discusses terminological gaps in the Tatar language and explores some differences between Russian and Tatar in the semantic relations used to construct terms.

The main challenge of this project is acquiring lexical data and representing Tatar socio-political vocabulary as fully as possible, including the large number of synonymous items in actual use. The location of Tatar culture at the intersection of Occidental and Oriental civilizations leads to active lexical borrowing from both the Arab-Muslim and the European cultural areas. Vocabulary from European languages is borrowed through the mediation of Russian, from which a large number of words and constructions are taken. In addition, a significant share of the synonyms are formed from Turkic and Tatar lexical material. As a result, modern Tatar has synonyms of various origins (Turkic, Russian, Arabic, Persian, Greek, Latin, and English), which provide rich lexical material. Moreover, the grammatical system of Tatar allows terms to be coined with different derivational and syntactic structures. Socio-political terms therefore have variants and synonyms that differ in lexical composition and structure, all of which are to be recorded in the Tatar Thesaurus.

Currently, the Russian-Tatar Socio-Political Thesaurus contains 9,000 concepts and is continually extended using special software designed for this project.

Author Biographies

  • Alfiya Galieva

    Tatarstan Academy of Sciences

  • Olga Nevzorova

    Tatarstan Academy of Sciences

  • Dzhavdet Suleymanov

    Tatarstan Academy of Sciences

Published

01/20/2020

Section

Articles

How to Cite

Galieva, A., Nevzorova, O., & Suleymanov, D. (2020). Tatar Socio-Political Terminology in a Bilingual Thesaurus. Terminology Issues, 4, 80-99. https://terminology.ice.tsu.ge/terminology/article/view/49