Incipient diffusion of lexical innovations (2017 - )

Linguistics has not yet answered the question of which factors determine the degree to which lexical innovations (neologisms) are adopted by the members of a speech community, begin to spread, and become established in the lexicon of a language. The Incipient diffusion of lexical innovations project addresses this question with reference to English. It does so by collecting large amounts of data on the use and spread of very recent neologisms on the Internet, and by processing and analyzing these data automatically.

A detailed description of the project can be found here.

BDPA (2014 - )

The Benchmark Database of Phonetic Alignments in Historical Linguistics and Dialectology (BDPA) is a publicly available benchmark database of manually edited phonetic alignments that can serve as a platform for testing the performance of automatic alignment algorithms. The database consists of a wide variety of alignments drawn from a large number of different sources. The data is arranged in such a way that typical problems encountered in phonetic alignment analyses (metathesis, diversity of phonetic sequences) are represented and can be tested directly.
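To illustrate what testing an automatic alignment against a manually edited one can look like, here is a minimal sketch of a column score: the share of gold-standard columns that the automatic alignment reproduces exactly. This is one common evaluation measure, not necessarily the one used with BDPA. The data layout (lists of equal-length rows with "-" as the gap symbol), the function names, and the toy sequences are all assumptions made for this example.

```python
GAP = "-"  # assumed gap symbol; not taken from the BDPA file format


def columns(alignment):
    """Return the set of columns of an alignment.

    An alignment is a list of equal-length rows. Each column is identified
    by the index of its segment in every unaligned sequence (None for a gap),
    so two alignments of the same sequences can be compared column by column.
    """
    counters = [0] * len(alignment)
    cols = set()
    for j in range(len(alignment[0])):
        col = []
        for i, row in enumerate(alignment):
            if row[j] == GAP:
                col.append(None)
            else:
                col.append(counters[i])
                counters[i] += 1
        cols.add(tuple(col))
    return cols


def column_score(gold, test):
    """Fraction of gold-standard columns that also occur in the test alignment."""
    gold_cols = columns(gold)
    return len(gold_cols & columns(test)) / len(gold_cols)


# Toy sequences, invented for illustration only.
gold = [["a", "b", "c"],
        ["a", "-", "c"]]
test = [["a", "b", "c"],
        ["-", "a", "c"]]

score = column_score(gold, test)  # only the last column matches
```

A score of 1.0 means the automatic alignment is identical to the manual one; lower values show how many columns were placed differently.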

Alignments of the Phonetischer Atlas von Deutschland (2013 - )

In this project all words from the original data set are divided into 187 cognate sets, tokenized, automatically multi-aligned and later manually corrected. More information can be found here.

Quantitative Historical Linguistics (2010 - 2014)

This project aimed to uncover and clarify phylogenetic relationships between native South American languages using quantitative methods. Its two main objectives were the digitization of lexical resources on native South American languages and the development of new computer-assisted methods to analyze this information quantitatively. More information on the project can be found here.

Buldialect - Measuring Linguistic Unity and Diversity in Europe (2006 - 2010)

Buldialect was a joint project between the University of Tübingen, the University of Groningen and the Bulgarian Academy of Sciences. The aim of this project was to a) create a digital database of Bulgarian phonetic and lexical dialect data; b) analyze the data using existing statistical methods from dialectometry and compare the results to traditional scholarship; c) develop new quantitative methods that can be used to analyze dialect and language data. During the project, pronunciation variants of 156 words from 197 villages in Bulgaria were compiled and digitized. The lexical database consists of lexical variants of 110 words collected from the same set of villages. Applying different quantitative techniques, both existing ones and those developed during the course of this project, to the Bulgarian pronunciation data has shown that some of the traditional divisions of this area have to be questioned and that they were not based purely on linguistic criteria. More on the data and the techniques used in this project can be found in my PhD thesis. More information can be found here.
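As a rough illustration of the kind of dialectometric measurement involved, the sketch below computes a length-normalized Levenshtein distance between pronunciation variants and averages it over the words two sites share. This is a generic textbook approach, not the project's actual pipeline. The function names, the dictionary layout, and the toy transcriptions are assumptions and are not taken from the project data.

```python
def levenshtein(a, b):
    """Plain edit distance between two segment sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def site_distance(site_a, site_b):
    """Mean normalized edit distance over the words recorded at both sites.

    Each site is a dict mapping a word label to a non-empty list of segments.
    """
    shared = sorted(set(site_a) & set(site_b))
    if not shared:
        raise ValueError("the two sites have no words in common")
    total = 0.0
    for word in shared:
        a, b = site_a[word], site_b[word]
        total += levenshtein(a, b) / max(len(a), len(b))
    return total / len(shared)


# Hypothetical transcriptions, invented for this example.
village_1 = {"milk": list("mleko"), "bread": list("hljab")}
village_2 = {"milk": list("mlako"), "bread": list("ljab")}

distance = site_distance(village_1, village_2)
```

Computing this value for every pair of sites yields a site-by-site distance matrix, which can then be passed to clustering or multidimensional scaling to look for dialect areas.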

P.P.Njegoš: Collected Works (in Serbian) (2005 - 2010)

More information can be found here.

Danilo Kiš: Collected Works (in Serbian) (2001 - 2003)

More information can be found here.