Project Description | Triangulation Approach for Modelling Convergence with a High Zoom-In Factor

1. State of research and preliminary work

From the geographic viewpoint, the object of our proposed project is the Baltic-Slavic Contact Zone (BSCZ). The BSCZ forms part of the Circum-Baltic Area (CBA). It is defined as the region of overlap between Slavic and Baltic dialects (with only a few blurred edges). It runs (with an average width of 100-130 km) on both sides of the contemporary northern state border of Belarus, starting from the “triangle” between Poland, Belarus and Lithuania (around Hrodna) and extending for some 480 km to the NE into SE Latvia, where Belarusian interferes with Latgalian. This territory reaches slightly north of the “triangle” between Latvia, Russia and Belarus (around Rēzekne, Ludza on the Latvian and Sebež on the Russian side); see map.

The project aims to systematically balance approaches used in various disciplines dealing with the description and/or explanation of structural variation and change in language: typology, areal and contact linguistics, historical comparative linguistics and dialectology (dialect geography). Our main focus will be an areal linguistic one. Although it starts from some well-motivated top-down assumptions based on the latest findings, our endeavor is primarily conceived of as a bottom-up verification on a fine-grained level of data-driven differentiation. For this purpose, data mining methods that have recently started to be applied in linguistic research will be used as more powerful exploratory devices (see section 2). The notions and methods involved, as well as the novelty of their application in the proposed project, are first grounded in this section.

Modern areal linguistics began as a natural response to the observation that typological features are unevenly distributed over continents and smaller regions of the world (cf., e.g., Nichols 1992), and attempts have been made to establish typological and areal profiles of particular areas on the basis of salient features considered to be outstanding (cf. Sandfeld 1926, Masica 1976, Haspelmath 2001, Thomason ²2007: ch. 5, Heine/Kuteva 2005: ch. 5; 2006, among many others). Only some 20 years ago, however, did typological research start to abandon its avoidance of areal biases (cf. Wiemer/Wälchli 2012: 6-9). In the meantime it seems to have been accepted that, wherever we observe an outstanding number of structural features in a distinct linguistic area (see below), we have to systematically reckon with the interplay of at least three kinds of factors, namely whether and to which degree these features are (A) inherited from common ancestors (“genealogical”), (B) typologically frequent (unmarked), (C) contact-induced (cf., inter alia, Campbell et al. 1986; Dahl 2001). They will henceforth be referred to as ‘factors (A-C)’.

Before continuing to survey the disciplines named in the second paragraph, a clarification of our understanding of some basic notions seems appropriate. First, we assume that ‘varieties’ in linguistic ‘space’ arise from “clustering tendencies within a continuum”, which consists of a diatopic, a diastratic and a diachronic dimension (the diaphasic and the diamesic dimension can be neglected). Any variety can be discerned only against the background of areal and socially relevant continua as “concentration areas (…) identified by a particular frequency of certain variants, by the co-occurrence of several features and possibly by some diagnostic traits, which appear in that variety only” (Berruto 2010: 236); the particular concentration and combination of features is felt by speakers to be distinct enough to set these structures and rules apart from other sets of rules on all structural levels (cf. the notion ‘Vollvarietät’ in Schmidt/Herrgen 2011: 51 and passim). We will speak of ‘variety’ whenever we are aware of significant and systematic structural differences between subsets (along the aforementioned dimensions), which traditionally refer to the same language, and whenever we do not want to ascribe specific structural properties to all varieties of a language as a whole. ‘Dialects’, in turn, are diatopically defined varieties which typically show a great structural distance from standard (or roof) varieties (Schmidt/Herrgen 2011: 59; cf. also Berruto 2010: 230). This encompasses ‘primary’ as well as ‘secondary’, but not ‘tertiary’ dialects (in Coseriu’s sense), for reasons given in Krefeld (2011). Although we are aware of the possible impact of social networks and standard varieties, we want to base our understanding of ‘linguistic space’ basically on relative geographic distance. 
This simple notion of ‘space’ needs to be “corrected” only (i) in case of traffic routes and/or (notoriously scarce) data on migrations, or (ii) if influence from standard varieties is obvious. It can be allowed insofar as diastratic diversification in the BSCZ is considerably lower than in Western Europe (e.g., in zones of overlap or influence within or between Germanic and Romance varieties), and the target of our investigation is varieties that are only or predominantly spoken and generally betray a very local range of networks. Used as a kind of null hypothesis, this simple notion of space also bears a methodological advantage: given the comparatively “flat” sociolinguistic diversification in the BSCZ, any significant deviations from correlations between geographic distance and distance in terms of linguistic structure can be used as indicators of directions of (ancient or recent) spread. Such “mismatches” between geographic and linguistic distance have to be expected, and it is exactly these which reflect convergence and for which we will subject specific features to triangulation (see below). The grid of spots from which field records have been collected and from where they still will have to be provided, in order to attain a sufficiently dense coverage of comparable primary data, corresponds to these assumptions.
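The null hypothesis described above (linguistic distance grows with geographic distance, and deviations point to convergence or spread) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the spot names, coordinates and binary feature vectors are invented, the share of differing features (Hamming distance) stands in for whatever structural distance measure the project will adopt, and a simple least-squares line stands in for the expected relation. Pairs of spots with large residuals are the “mismatches” that would then be candidates for triangulation.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

# Hypothetical survey spots: name -> (latitude, longitude, binary feature vector).
# All values are invented for illustration; they are not project data.
SPOTS = {
    "spot_A": (53.7, 23.8, [1, 0, 0, 1, 0, 0]),
    "spot_B": (53.9, 25.3, [1, 0, 1, 1, 0, 0]),
    "spot_C": (55.6, 27.0, [1, 0, 0, 1, 0, 0]),
    "spot_D": (56.5, 27.3, [0, 1, 1, 0, 1, 1]),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi, dlmb = p2 - p1, radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def hamming(u, v):
    """Share of features on which two varieties differ (0 = identical)."""
    return sum(a != b for a, b in zip(u, v)) / len(u)


def pairwise(spots):
    """One row per pair of spots: (name1, name2, geographic km, linguistic distance)."""
    rows = []
    for (n1, (la1, lo1, f1)), (n2, (la2, lo2, f2)) in combinations(spots.items(), 2):
        rows.append((n1, n2, haversine_km(la1, lo1, la2, lo2), hamming(f1, f2)))
    return rows


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mismatches(spots):
    """Pairs sorted by how far their linguistic distance departs from the fitted line."""
    rows = pairwise(spots)
    a, b = fit_line([r[2] for r in rows], [r[3] for r in rows])
    scored = [(n1, n2, ling - (a + b * geo)) for n1, n2, geo, ling in rows]
    return sorted(scored, key=lambda r: abs(r[2]), reverse=True)


if __name__ == "__main__":
    for n1, n2, residual in mismatches(SPOTS):
        print(f"{n1}-{n2}: residual {residual:+.2f}")
```

A negative residual marks a pair that is structurally closer than its geographic distance would suggest, a positive one a pair that is more distant. With real data, a permutation-based test would be needed before calling any such deviation significant, since pairwise distances are not independent observations.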

Second, as for ‘linguistic areas’, we are aware that a methodologically impeccable definition of this notion is troublesome (cf. Bickel/Nichols 2006; Wiemer/Wälchli 2012: 14-18), if not impossible (Bisang 2010). Bisang, however, admits that the “problems with the exact definition of a linguistic area do not exclude the relevance of contact-induced change within a certain geographic area” (2010: 428). He proposes “a less rigid concept [than ‘linguistic area’; BW] in terms of zones of convergence” (2010: 429). The BSCZ can be understood in just this sense; it has arisen from an accumulation of ‘areal patterns’. These, in turn, should be understood as “a spatial constellation of linguistic features across languages which is significantly different from a random distribution and which cannot fully be explained by other factors than areality such as genealogic relatedness or universal principles. Areal patterns are cumulative or, put differently, epiphenomena for which it is extremely unlikely that they could have developed without any contact.” (Wälchli 2012: 233f.) The distribution of such patterns should, thus, differ significantly from their distribution among both genealogically related and unrelated languages (or varieties) within a larger, but compact region. The significance of a feature depends on (i) its typological markedness and salience on a larger areal background, (ii) the internal structure of the feature (scalar vs. categorical, binary vs. multiple values, etc.), (iii) its token-frequency and (iv) the nature of constraints it shows within the stock of lexical stems (= ‘lexical expansion’) of the relevant varieties. As a rule of thumb, the more we analyze features that are salient in comparison to the immediate environment of a postulated area, and the more they are heterogeneous and do not show one- or double-sided implicational relations, the more robust the assumed area is.

Notwithstanding these insights, typology, areal and contact linguistics, historical comparative linguistics and dialect geography have only quite recently begun to systematically realize the advantages they can gain from interdisciplinary exchange. Thus, more deliberate cross-fertilization between typology and dialectology only started at the turn of the third millennium (cf., e.g., Anderwald/Kortmann 2002: 160, Kortmann (ed.) 2004, Anderwald/Szmrecsanyi 2009). Likewise, the rather recent interest of typologists in areal and, thus, contact-conditioned variation of grammatical phenomena has only lately been mirrored by more serious attempts at renewing dialect geography; admittedly, these attempts have already yielded some methodological stimuli for areal typology (Goebl 2001; Glaser 2008; forthc.). In general, however, the neglect of dialectological research not only in general linguistics (Chambers/Trudgill ²1998: 15), but also in other disciplines concerned with “raumzeitlich verankerter Sprachverschiedenheit” (‘linguistic diversity anchored in space and time’; Oesterreicher 2007: 54) has only changed very recently. With only a few exceptions, the dialectology of Germanic, Romance, let alone Slavic and Baltic languages has neglected issues of language (or dialect) contact beyond the respective family. Additionally, it has not taken due account of areal clines and similar questions going beyond that borderline (see fn. 2). Likewise, systematic research in dialectal syntax is recent; cf., for instance, investigations into the dialectal syntax of German (Seiler 2005; Glaser 2008) or of English (Kortmann 2002).

First systematic attempts at integrating two or three of the factors (A-C) have been made, inter alia, by Kortmann (ed., 2004), Matras et al. (eds., 2006), Ramat/Roma (eds., 2007); for an overview cf. also Wiemer/Wälchli (2012: 9-18). Among publications most relevant for the present project are Nau (1995) and Koptjevskaja-Tamm/Wälchli (2001). They account for a number of the Circum-Baltic features selected on the micro-level involving a nuanced analysis, much in the spirit of dialectology, linguistic geography, historical linguistics, and on the macro-level emphasizing the typological (un)markedness of features.

Let us now turn to models that help make structural variation in space visible (for a partial overview cf. Britain 2010: 148-151). Most of those that have hitherto been widely applied show shortcomings which make them rather inconvenient for the object of our study, namely an empirically reliable assessment of the motives of convergence phenomena among (non-standard) varieties of different language groups within larger areal clines. The shortcomings of known models are as follows: they do not account for diastratic diffusion (wave models); they rely too heavily on the assumption of variety-internal homogeneity, do not permit correlating features to be weighted and/or are not suited for scalar features (isopleth maps). All models that rest on the notion of the traditional isogloss are confined to only one language group or even one language (dialect atlases and approaches based on them, like dialectometry and other taxonomic models). Many models and projects have been restricted to specific categories (e.g., nominal ones, as with the MDABJa 2005 for the Balkan languages), and most of them do not seem to allow for the differentiation of hotbeds, or rather they create the impression of linear development in space and time (isopleth and wave models).

Fallacies arising from such an impression have been discussed in Wiemer/Wälchli (2012: 12). In order to avoid them, we need to discern multiple hotbeds and, for this purpose, have to work on a much higher zoom-in level. The necessity and fruitfulness of such an approach can be inferred, for instance, from a comparison of the results in Seržant (2012) and Wiemer (forthc.) concerning the areal diffusion of predication patterns based on non-agreeing anteriority participles (perfect, passive and impersonals, including evidential extensions). Moreover, we need to look at heterogeneous phenomena on the basis of corpora of natural speech in order to get at “aggregate variability” in a way similar to the impressive study of Szmrecsanyi (2011); however, this investigation again did not cross the border of one language (namely British English). In turn, Seiler (2005) has demonstrated how to do without the notion of isoglosses if one wants to adequately capture more complicated and diatopically layered grammatical constraints. In his investigation of Swiss German dialects he introduced syntactic variables and replaced isoglosses and all derivative notions by ‘inclined planes’ (“Prominenz über schiefe Ebenen”; 2005: 330f.). His proposal is particularly worth applying in our project, since, for some features, it allows us to demonstrate very elegantly whether and how two (or more) alternative realizations of grammatical categories (or notional concepts) show inverted directions of increase and decrease; in other words, how they run into one another geographically as complementary realizations (with a smaller zone of overlap demonstrating clear hierarchies of constraints). For instance, on the Baltic side of the BSCZ, verbal particles and prefixes occur as equivalent means of modifying verbal action, and they show an areally inverted functional load (N-S cline), manifested in both type and token frequency (see feature F 8); cf. Wälchli (2001), Wiemer (2013).
Based on this, let us now turn to our proposal: triangulation. This cover term is meant to comprise procedures by which phenomena of structural convergence in a specific area are systematically assessed by cross-verification of complementary approaches that concentrate on at least one of the three kinds of factors (A-C). Weaknesses of one approach are to be counterbalanced by the strengths of complementary ones. We “borrow” the term ‘triangulation’ from the empirical social sciences, where analogous procedures have been widely applied for some decades already (Olson 2004). In linguistics, less systematic attempts at triangulation have been undertaken implicitly, for instance, in Balkan linguistics; but, as discussed above, a systematic account of factors (A-C) was, until recently, outside the focus of interdisciplinary cross-fertilization. We want to make the concept of triangulation explicit in linguistics and have already elaborated on it in Wiemer/Erker (2011) and Wiemer et al. (2012).

Likewise, multivariate, quantifying approaches used to model the profile and rise of convergence zones have begun to be exploited only recently, and they have mostly concerned very large areas (cf. Koptjevskaja-Tamm/Wälchli 2001; Wälchli 2012; Cysouw, forthc.). Neighbor-Nets (NNs) and Multidimensional Scaling (MDS) have turned out to be very appropriate tools for making such areal distributions visible. They are free of the disadvantages of the methods mentioned above: they do not rely upon taxonomies of features, nor do they require the notion of isoglosses; instead, they permit the representation and analysis of interrupted implicational clines (Wälchli 2012: 250), they allow for bundling (clines of) features, and they help visualize relative distances between features, varieties, or speakers. These procedures are also in line with the importance (and descriptive adequacy) of “intra-domain implications”, i.e. of the cross-linguistic analysis of more fine-grained hierarchies within restricted grammatical domains (for which cf. Haspelmath 2008). Of course, cumulative effects of areal patterns are generally easier to disclose for larger areas, for which genealogical relatedness diminishes. However, an increase in the fine-grained character of features and a detailed account of their variation, of the history of the grams involved and their distribution, as well as of restrictions among lexical stems, counterbalance this disadvantage. NNs and MDS can visualize (dis)similarities even across family boundaries and, since the data are to be analyzed on a fine-grained level, they help detect different hotbeds on this level. NNs and MDS have been applied both in areal linguistics (Cysouw 2005; forthc.) and dialect geography (Szmrecsanyi 2011; Streck/Auer, forthc.). 
However, to our knowledge, this project is the first which will attempt to use NNs and MDS with respect to a comparatively small contact zone and with genealogically close varieties of different languages, but with a considerably higher zoom-in factor.
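Both NNs and MDS take a matrix of pairwise distances as their input, so the step from heterogeneous features (categorical and scalar) to such a matrix can be sketched as follows. This is only an illustration under stated assumptions: the variety names and feature values are invented, scalar features are assumed to be pre-scaled to the range 0-1 (e.g., relative token frequencies), and the unweighted, Gower-style averaging is one possible choice, not the measure adopted in the project.

```python
# Hypothetical feature table: variety -> list of feature values.
# Strings are categorical values, floats are scalar values in [0, 1],
# None marks a feature not attested in the records of that variety.
# All names and values are invented for illustration.
VARIETIES = {
    "variety_1": ["prefix", "nom_obj", 0.10],
    "variety_2": ["prefix", "acc_obj", 0.35],
    "variety_3": ["particle", "nom_obj", 0.80],
    "variety_4": ["particle", None, 0.90],
}


def feature_distance(a, b):
    """0/1 mismatch for categorical values, absolute difference for scalar ones."""
    if isinstance(a, str) or isinstance(b, str):
        return 0.0 if a == b else 1.0
    return abs(a - b)


def variety_distance(u, v):
    """Mean per-feature distance over the features attested in both varieties."""
    shared = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if not shared:
        raise ValueError("the two varieties share no attested feature")
    return sum(feature_distance(a, b) for a, b in shared) / len(shared)


def distance_matrix(table):
    """Return the variety names and the symmetric matrix of pairwise distances."""
    names = list(table)
    matrix = [[variety_distance(table[r], table[c]) for c in names] for r in names]
    return names, matrix


if __name__ == "__main__":
    names, matrix = distance_matrix(VARIETIES)
    for name, row in zip(names, matrix):
        print(name, " ".join(f"{d:.2f}" for d in row))
```

The resulting matrix could then be passed to an MDS routine or exported to a Neighbor-Net tool. Feature weights (e.g., by typological markedness or token frequency) would enter in `variety_distance`, which here deliberately treats all features as equal.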

Now, as for the current research status concerning the BSCZ, it more or less reflects the general situation of a still low degree of cross-fertilization among the disciplines discussed above. In contrast to a long tradition of areal research in Slavic linguistics, especially for the Balkans (cf., among many others, Topolinjska et al. (eds.) 2005; Mišeska Tomić 2006; Friedman 2008), the BSCZ has remained remarkably understudied in terms of areal linguistics proper. Dialectologists working in this region have hardly ever looked at their data from this perspective or thoroughly taken contact varieties into account; for a few exceptions cf. articles in Toporov (ed., 1972), Sudnik (ed., 1980) and Jankowiak (2009), cf. also Nevskaja/Sudnik (1978), Smułkowa (1988), Grek-Pabisowa/Maryniakowa (1993). Comprehensive accounts of features converging across family boundaries are the exception (cf. Lekomceva 1972a, b; Sudnik 1975). This also holds for the investigation of regional Polish (Pol. polszczyzna kresowa), which otherwise is among the best-studied varieties of the BSCZ. Likewise, in the important double volume Dahl/Koptjevskaja-Tamm (eds., 2001), which does focus on areal and contact linguistic issues of the broader area, Polish varieties and Belarusian rural dialects received hardly any attention; it does, however, contain a valuable digest of areally relevant properties of the Russian dialects of Old Believers in the Baltics (Čekmonas 2001). On the other hand, the BSCZ is a particularly well-suited test case for triangulation because it is comparatively small and dominated by varieties of only two language groups: Slavic and Baltic. Moreover, the areal intersection of Baltic and Slavic allows for a sharp geographic delineation (see very first paragraph), leaving aside a few insular dialects. 
At the same time, different studies have already shown that salient features of the BSCZ can and should be conceived of as being embedded in larger areal continua, among which the most prominent one runs in NE-SW direction, roughly from Lake Ladoga (NE) toward Mazowsze and Podlasie (SW). Cf., for instance, Wiemer/Giger (2005: ch. 12), Maryniakowa (1976) regarding resultatives (see feature F 10), Ambrazas (2001) on nominative objects (F 12), Wiemer (2012a) on clausal syntax without agreement-triggering NPs, Wiemer (2006b; 2012a; forthc.) and Seržant (2012) on the relation between Actor-demoting participial constructions, the perfect and its evidential reinterpretation (F 9-10), Wälchli (2000) and Wiemer (2006a) on evidentiality, Wälchli (2001) and Wiemer (2013) on verbal particles (F 8), Seržant (2005; 2008; 2010) on phonetics and phonology, Seržant (forthc. a, d) on non-canonical alignment patterns (F 9), Seržant (forthc. b) on object and subject marking (F 11) as well as Koptjevskaja-Tamm/Wälchli (2001) for a general survey (including F 4, 6, 9-12), partially cf. also Wiemer (2003b) and specifically Wiemer/Erker (2011) from the perspective of particular dialects. These continua create something similar to concentric circles in which the number of other families (genera) increases (first of all on account of Finnic). Most of the features F 1-12 chosen for systematic investigation in the project were not present in proto-stages of Slavic or Baltic. We thus have a better chance of discerning the role of (mutual) contact influence between both groups. First, their diachronic development, though still known quite fragmentarily, has been investigated much better than for most languages outside Western and Central Europe. Second, we can allow for a much more fine-grained analysis of specific features than has been possible for most languages beyond the Germanic and Romance ones.

Why are Belarusian rural vernaculars of central significance? The pervasiveness and intensity of convergent development in the BSCZ can be observed most clearly in Belarusian mixed vernaculars of the countryside. They are the most widespread variety of this region. For centuries, Belarusian rural vernaculars have played a key role in sociolinguistic terms since they have served as the main and constant transmitter in communication and language shift (Wiemer 2003c: 109-119). This factor, together with their low social prestige, a practically absent roof variety and, thus, a low degree of “filtering” norms, has probably allowed contact phenomena that are typical for the whole BSCZ to enter into these Belarusian varieties in the most unhampered way. Generally, Belarusian dialects of the BSCZ have been subject to both inner-Slavic and Baltic influences and, in the last decades, to increasing dialect leveling and mixing with Russian (Wiemer 2003a). These mixed vernaculars have often been referred to by the Polish pseudo-term mowa prosta (lit. ‘simple speech’) or język tutejszy (‘hereish language’). They differ from the so-called ‘trasjanka’, which is an urban “melange” of Belarusian with Russian that arose in the post-war Soviet Union (cf., e.g., Hentschel/Tesch 2006; Kittel et al. 2010). Recently Lithuanian influence seems to have receded even in phonetics (Wiemer 2006c). Leveling and mixing have only been described by Smułkowa (2010). Despite their key role, Belarusian rural vernaculars have remained the least studied type of variety in the BSCZ, even from a dialectological perspective. For an early case study in the phonetics of one [!] speaker cf. Broch (1958); the first detailed accounts of the variation in nominal inflection based on two geographically distant spots of the BSCZ (around Braslaŭ and Lida) have been written by Erker (2009; forthc.). 
Jankowiak (2009) contains the only systematic account of structural features (and their sociolinguistic background) of Belarusian dialects in the northeast corner of the BSCZ (= Latgalia). Woolhiser (2005) is the only study investigating the impact of new political borders on the divergence of Belarusian dialects (regions of Białystok, Poland, and Hrodna, Belarus). Hardly any studies exist on Belarusian spoken on either side of the border region with Lithuania and Latgalia that would inquire into the type and range of grammatical variation sub specie language contact and/or areal significance. Exceptions are Wiemer et al. (2004) and, first of all, Wiemer (2003a) where, mainly on the basis of personal field records, it is shown that structural variation in the Belarusian rural vernaculars of the BSCZ does not differ in principle from known features of ‘polszczyzna kresowa’. The hypothesis that this Polish variety and rural Belarusian differ not so much in the types of individual features and their variation as in their combination and the proportion of their token frequency was further developed in Wiemer/Erker (2011). Moreover, this study shows that the Belarusian vernaculars are to be considered core varieties of the BSCZ reflecting its overall structural convergence, since they are the most obvious manifestation of almost all prominent features of this region. Aksana Erker’s dissertation project on the Belarusian vernacular of two selected spots in Belarus (around Braslaŭ and Lida; see map) has recently been finished. In addition to a structural description, this thesis contains an assessment of dialect features in terms of a broader East Slavic background.