TriMCo Dialectal Corpus

Data: The empirical basis of the TriMCo project will be the TriMCo Dialectal Corpus. This corpus is based on first-hand data obtained during fieldwork in the respective areas.

Technicalities: We use ELAN, software developed by the Max Planck Institute for Psycholinguistics in Nijmegen. ELAN "is a professional tool for the creation of complex annotations on video and audio resources".

Currently, Ilya Khait (assisted by Liudmila Radchankava) is developing a semi-automatic tagging tool for the morphological annotation of the West Russian subcorpus. This work is nearly complete, and Ilya plans to develop similar tools for Belarusian, Lithuanian and, finally, Latgalian.


Structure of the Corpus:


[Figure: The Proportional Amount of Records and Their Geographic Distribution in the Corpus (21.05.2015)]

The TriMCo Dialectal Corpus has the following subcorpora:

    • Belarusian Dialectal Corpus: A considerable collection of records made by prof. dr. Björn Wiemer, dr. Aksana Erker and others during several fieldwork expeditions is already in our possession; the creation of this subcorpus is being carried out under the supervision of dr. Aksana Erker.
    • Latgalian Dialectal Corpus: this subcorpus is being created under the supervision of dr. Ilja A. Seržant, in collaboration with prof. dr. Nicole Nau (University of Poznań), Anna Putāne (Rīga), dr. Klinta Ločmele (University of Latvia) and dr. Ilga Šuplinska (University of Rezekne). The main data processing work is being done by Anna Putāne. Former transcribers are dr. Klinta Ločmele (University of Latvia) and Evika Muizniece, BA (Latvian Academy of Culture). Fieldwork is being carried out by dr. Klinta Ločmele (University of Latvia). The transcriptions have been proofread by prof. dr. Vytautas Kardėlis (University of Vilnius). Unlike the other subcorpora, this subcorpus is funded by the project "Diachronic Typology of Differential Argument Marking" (Incoming Fellowship Programme, Grant Agreement Number: 291784, Zukunftskolleg, University of Konstanz).
    • Eastern Lithuanian Dialectal Corpus: this subcorpus will be based on data obtained by prof. dr. Vytautas Kardėlis (University of Vilnius). On the basis of these records, the Eastern Lithuanian Dialectal Corpus is being created under the supervision of prof. dr. Vytautas Kardėlis at the University of Vilnius.
    • Western Russian (Pskov Group) Dialectal Corpus: the work is being organized in collaboration with dr. Igor' Isaev (V. V. Vinogradov Institute of the Russian Language, Russ. Acad. of Science, Division for Dialectology and Linguistic Geography). Furthermore, dr. Zep Honselaar has made his recordings from the Pskov area (funded by the NWO) available to our corpus. The records will be entered into the corpus under the supervision of dr. Aksana Erker.