Corpora
Our linguistic corpora are available both on this website (minlang.iling-ran.ru/corpora) and on an independent platform at corpora.iling-ran.ru. Most of the corpora are based on the Tsakorpus interface. However, some of them are only integrated…
Some of the corpora are currently available only as SIL FieldWorks files or PDF files. We are working on making them available on dedicated software platforms with a user-friendly search interface. The list of available corpora includes the corpus of Kullui, a minority language spoken in India. Further development of our corpus project aims at wider coverage beyond the languages of Russia, and the Kullui corpus can be regarded as the first step in this direction.
If you have any questions or suggestions for improvement of our corpus platform, or if you would like your corpus to be integrated into the platform, please contact us at minlanglab@iling-ran.ru.
The corpus consists of a number of legal texts translated into the Upper Taz dialect of the North Selkup language: the Charter (Basic Law) of the Yamalo-Nenets Autonomous Okrug, as well as federal laws and laws of the Yamalo-Nenets Autonomous Okrug relating to the Indigenous Small-Numbered Peoples of the North. The translations were produced as part of a project by the YNAO government and published in two books:
- Charter (Basic Law) of the Yamalo-Nenets Autonomous Okrug of December 28, 1998, No. 56-zao (In the Selkup language). Salekhard, 2008.
- Federal Laws and Laws of the Yamalo-Nenets Autonomous Okrug (In the Selkup language). Salekhard, 2008.
The corpus includes texts in the northern, southern, and eastern dialects of Evenki from the multimedia archive of the Laboratory for Computational Lexicography (Scientific Research Computer Center, Moscow State University) / Laboratory for Study and Preservation of Minority Languages (Institute of Linguistics, Russian Academy of Sciences). The texts were recorded in 1998–2021 during Evenki documentation expeditions led by O. A. Kazakevich. The corpus also includes archival Evenki texts recorded by G. M. Vasilevich in the 1930s–1950s and by E. A. Lebedeva in the 1950s–1960s. Morphological annotation was performed mainly by E. L. Klyachko with the participation of N. K. Mitrofanova.
The corpus was created by E. L. Klyachko on the basis of Timofey Arkhangelskiy's platform (Tsakorpus).
The corpus data were collected by the participants of the MSU field project on Hill Mari. The project is carried out at the Department of Theoretical and Applied Linguistics (Faculty of Philology, Lomonosov Moscow State University). It has been supported by the RSSF grant №16-04-18 037е and the RFBR grants №17-04-18 036е, 16-06-00 536а and 19-012-00 627.
The corpus structure and annotation are likewise the result of joint work by the participants of the MSU field project on Hill Mari.
The corpus consists of 15 archival Itelmen texts recorded by V. I. Jochelson in 1910–1911 and by A. P. Volodin in 1962–1973. Morphological annotation was performed by K. O. Sheifer, S. K. Ganieva, and M. R. Plugaryov.
The software was developed by Maksim Bazhukov.
The corpus includes texts in all three dialects of Ket, recorded during fieldwork in 2002–2014 under the direction of O. A. Kazakevich and archived in the Laboratory for Computational Lexicography (Scientific Research Computer Center, Moscow State University) / Laboratory for Study and Preservation of Minority Languages (Institute of Linguistics, Russian Academy of Sciences), as well as archival texts recorded by G. M. Korsakov in 1937. Morphological annotation was performed by Yu. E. Galyamina and E. M. Budyanskaya.
Minority languages of the world
The corpus of Kullui, one of the Indo-Aryan languages of North India, was created by a team of scholars documenting the language: E. Renkovskaya (Institute of Linguistics, RAS), J. Mazurova (Institute of Linguistics, RAS) and A. Krylova (Institute of Oriental Studies, RAS). The software for the corpus was developed by E. Korovina (Institute of Linguistics, RAS). Currently, the corpus includes spontaneous and elicited texts in the central dialect of Kullui, recorded in 2014–2017 during field trips to the Kullu district (the villages of Naggar, Bashing, Thava, and Suma). The project was supported by the RFBR grant №19-012-00 355 (2019–2021).