The schedule of Methods XV can be viewed here. This schedule includes all abstracts!
Abstracts of keynote speakers
Dialect variation in online social media
Jacob Eisenstein (Georgia Institute of Technology, USA)
While social media language is sometimes described as a dialect in itself, in fact it displays a remarkable amount of internal variation, aligning with both geography and ethnicity. Such variation can be revealed by computational statistical methods that search for patterns of association between language and geography from large corpora of unannotated text. I will also discuss the relationship between written language in social media and traditional dialect variation. It is well known that online writing contains phonetically-inspired spellings, but perhaps more surprising is that these spellings reproduce some of the systematic context-sensitivity of the spoken language variables that they transcribe. Finally, I will present new research on the social properties of online dialect variation, with evidence that authors modulate their use of social media variables depending on both the context and their audience.
[This is joint work with David Bamman, Brendan O'Connor, Umashanthi Pavalanathan, Tyler Schnoebelen, Noah Smith, and Eric P. Xing.]
A matter of scale only?
Frans Gregersen (University of Copenhagen, Denmark)
In discussing methods of analysis in historical linguistics, dialectology and sociolinguistics, it is a commonplace to note that the historical time frame is much larger and that we are blessed with the hindsight of actually knowing which processes of variation have gone to completion, i.e. which sounds/letters/constructions/words have in fact changed during the period we are looking at. On the other hand, sociolinguistics may study in detail how the candidates for change are launched and how these sounds/letters/constructions/words vary during the small slice of time we are able to study with the recordings we have at our disposal. Dialectology is placed in the middle here. Vis-à-vis historical linguistics, the dialectologists focus on variation inside the national languages. In contrast to sociolinguistics, however, the dialectologists are often focused on the language (system) and not on portraying the speech communities inside which the variation is found. In my paper I will exemplify the problem we have of fitting all these types of evidence, stemming from the use of different methods, into a coherent picture of language change and variation. The data will come from the LANCHART Centre study of changes in Danish.
The Dialectology of the Future
Mark Liberman (University of Pennsylvania, USA)
New data sources and new analysis methods promise astonishing opportunities for linguistic research at all levels of analysis. We can often now do in hours what would have taken decades for scholars and scientists of earlier times. But these methodological innovations are still far from reaching their potential: We need several additional technical and cultural steps in order to turn the promise into (more than a partial and occasional) reality.
This talk will focus on two key innovations that will have an especially big impact on the dialectology of the future. One is technical: Universal phonetic analysis and pronunciation modeling. The other is cultural: The Open Data movement, combined with new kinds of crowd-sourcing. In both cases, there are hopeful early developments, and momentum in positive directions. These trends will transform the dialectology of the future, in ways that we should be preparing for today.
Heritage Languages as new dialects
Naomi Nagy (University of Toronto, Canada)
Members of Toronto's Heritage Language Variation and Change Project (http://projects.chass.utoronto.ca/ngn/HLVC/0_0_home.php) collaboratively designed a multilingual corpus to allow inter-generational, cross-linguistic, and diatopic (heritage vs. homeland varieties) comparisons. The aim is to develop generalizations about the types of variable features, structures or rules that are borrowed earlier and more often in contact contexts, using consistent methods across studies of different languages and variables. While it is debatable whether these heritage varieties constitute new and distinctive dialects of Cantonese, Faetar, Italian, Korean, Russian and Ukrainian, it is certain that we have needed to implement innovative methods in order to efficiently compare heritage varieties and their potential input languages. Innovations include:
- integrating transcription, coding and extraction of sociolinguistic variables in ELAN;
- automated forced alignment of transcription and audio at the phonemic level for languages beyond English;
- use of online formant extractors, again for languages beyond English;
- an interactive web map with animated voice clips;
- multivariate regression techniques for comparing across variables, languages, locations, and speakers spanning three generations since immigration;
- integration of research conducted at many levels: in undergraduate and graduate courses, by paid and volunteer research assistants, and by students and professors in (so far) nine countries;
- online, public sharing of our methods, tools, instruments and controlled sharing of data.
It is hoped that this work may help predict the future of (these) dialects and advance the study of dialects.
Dialectal variation and population genetics in Siberia
Brigitte Pakendorf (Université de Lyon, France)
Several of the ethnolinguistic groups of Siberia are settled over large territories with restricted possibilities of communication and interaction between individual settlements, resulting in large-scale dialectal diversification. Among the territorially most widespread and dialectally most fragmented groups are the North Tungusic Evens, who are settled over northeastern Siberia, from the Lena-Jana watershed in the west to the Chukotka and Kamchatka Peninsulas in the east. This geographical spread has resulted in substantial linguistic fragmentation, with Burykin (2004: 85) distinguishing 13 dialects and up to 24 subdialects (dialekt and govor, respectively, in Russian); mutual intelligibility is severely restricted between the peripheral dialects. In contrast, the Turkic language Sakha (Yakut) is relatively homogeneous, notwithstanding the fact that it, too, is spoken by people who are settled over a vast territory. This holds even when Dolgan - classified as a separate language mainly on sociopolitical grounds - is included in the survey. Thus, territorial dispersal alone cannot account for strong dialectal diversification; rather, other factors such as the duration of the dispersal as well as contact influence must also play a role.
The factors that lead to increased dialectal variation, namely restricted communication within the speech group and contact with outside groups, are also expected to increase genetic variability among the speakers of such dialects. This should therefore result in a correlation between dialectal diversity and genetic diversity. In order to better understand the factors at play in processes of dialectal diversification, this hypothesized correlation will here be explored using both linguistic data from different dialects and molecular genetic data from their speakers, contrasting Even with Sakha and Dolgan.