Corpus linguistics is a methodology in linguistics that involves computer-based empirical analyses (both quantitative and qualitative) of actual patterns of language use, drawing on large, electronically available collections of naturally occurring spoken and written texts, so-called corpora. Corpus-based and other types of empirical linguistic research have shown that speakers' intuitions often provide only limited access to the open-ended nature of language. This can cause problems when examining unexpected or infrequent linguistic structures, e.g. lexical co-occurrence patterns, variation between grammatical constructions, word meaning, or idioms and metaphorical language.
The findings of corpus-based research have been widely applied in fields such as lexicography, most notably in the form of corpus-informed dictionaries such as the Longman Dictionary of Contemporary English; grammar (corpus-informed descriptive grammars such as the Longman Grammar of Spoken and Written English); foreign language teaching (learner dictionaries, teaching materials and classroom methodology, e.g. in the form of Data-Driven Learning (DDL) activities); and language testing and assessment.
Many English linguistics classes in Bremen at some point involve students in the collection, processing and analysis of their own empirical data, often using electronic corpora. In advanced classes in particular, students are asked to carry out corpus-based projects, sometimes replicating and extending earlier case studies. We offer students a wide range of computerized corpora covering the two main varieties of English, British and American English (Mark Davies' BYU corpus portal), other varieties of English (various subcorpora of the International Corpus of English, ICE), and learner corpora.
List of corpora available in the CIP-Labor (GW 2, A 3.390) at FB 10
Anderson, W. & J. Corbett (2009), Exploring English with Online Corpora: An Introduction. Basingstoke: Palgrave Macmillan.
Hoffmann, Sebastian et al. (2008), Corpus Linguistics with BNCweb - a Practical Guide. Frankfurt/Main: Peter Lang.
Lindquist, Hans (2009), Corpus Linguistics and the Description of English. Edinburgh: Edinburgh University Press.
McEnery, Tony & Andrew Wilson (²2001), Corpus Linguistics. Edinburgh: Edinburgh University Press.
McEnery, Tony, Richard Xiao & Yukio Tono (2006), Corpus-Based Language Studies: An Advanced Resource Book. London: Routledge.
Mukherjee, Joybrato (2009), Anglistische Korpuslinguistik. Eine Einführung. Berlin: Erich Schmidt.
English language corpora in the foreign language classroom: What does corpus linguistics have to offer to foreign language teaching?
The link between the findings of corpus-based research and (foreign) language teaching is that corpus evidence indicates which language items and processes language users are most likely to encounter (what is frequent and typical); these items may therefore deserve more time in classroom instruction. Corpora and corpus data:
- help teachers and students make better-informed decisions and make teaching materials more authentic, i.e. more representative of contemporary usage. Traditional textbooks often include simplified, non-authentic English and invented sentences which rarely, if ever, occur in natural speech situations.
- provide "real English" and reveal what native speakers typically write or say in natural discourse with regard to
  - lexical co-occurrence patterns (collocation, colligation, semantic prosody)
  - the most common meaning if a word has several senses
  - items that are frequent in or across different text types
- help students develop their own descriptive and analytical skills, which raises their language awareness.
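Lexical co-occurrence patterns such as collocations are typically found by counting which words occur near a given node word. The following minimal Python sketch illustrates the idea under simplifying assumptions: naive regular-expression tokenisation, a fixed window of two words to either side, raw frequency counts, and a tiny invented sample text. The function names `tokenize` and `collocates` are hypothetical.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Naive tokenisation (assumption): lowercase word forms, apostrophes kept
    return re.findall(r"[a-z']+", text.lower())

def collocates(tokens: list[str], node: str, span: int = 2) -> Counter:
    """Count words occurring within `span` tokens to the left or right of `node`."""
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - span):i]      # up to `span` words before the node
        right = tokens[i + 1:i + 1 + span]     # up to `span` words after the node
        counts.update(left + right)
    return counts

# Invented sample text, for illustration only
sample = ("She made a decision. They made a quick decision. "
          "He took a decision at last.")
tokens = tokenize(sample)
print(collocates(tokens, "decision", span=2).most_common(3))
# → [('a', 3), ('made', 2), ('took', 2)]
```

With raw counts, the function word *a* comes out on top, and the window ignores sentence boundaries (so *they* and *he* are counted as well). This is one reason why corpus tools usually rank collocates by association measures such as mutual information or log-likelihood rather than by raw frequency.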
Corpora in the classroom: Data-Driven Learning (DDL)
Corpora and corpus material can be used in the classroom in several ways. For instance, teachers can use computer-generated concordances and develop activities and exercises to have students explore regularities of patterning in the target language. DDL activities can range from teacher-led and relatively closed concordance-based exercises to entirely learner-centred corpus-browsing projects which involve a high degree of learner autonomy.
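Computer-generated concordances are usually presented in KWIC (keyword-in-context) format: every occurrence of a search word is printed on its own line with a fixed amount of surrounding text, and the search word is aligned in one column so that recurring patterns become visible when the lines are read vertically. The sketch below illustrates the idea in Python. The function `kwic`, the 25-character context width and the sample sentence are hypothetical, and concordancers used in practice typically offer more, e.g. sorting and lemma or part-of-speech searches.

```python
import re

def kwic(text: str, keyword: str, width: int = 25) -> list[str]:
    """Return one keyword-in-context line per occurrence of `keyword` in `text`."""
    # \b word boundaries: "decision" matches, but "decisions" does not;
    # re.IGNORECASE also catches sentence-initial capitals
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        # up to `width` characters of context on each side, line breaks flattened
        left = text[max(0, m.start() - width):m.start()].replace("\n", " ")
        right = text[m.end():m.end() + width].replace("\n", " ")
        # right-align the left context so all hits line up in one column
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Invented sample text, for illustration only
sample = "I made a decision. Decisions were made. We must make a decision soon."
for line in kwic(sample, "decision"):
    print(line)
```

Sorting the lines alphabetically by the words to the left or right of the search word would make patterns such as *make/made a decision* easier to spot in a teacher-prepared concordance exercise.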
Aijmer, Karin, ed. (2009), Corpora and Language Teaching. Amsterdam: Benjamins.
Aston, Guy et al., eds. (2004), Corpora and Language Learners. Amsterdam: Benjamins.
Braun, Sabine et al., eds., Corpus Technology and Language Pedagogy: New Resources, New Tools, New Methods. Frankfurt/Main: Lang.
Campoy-Cubillo, M.C., B. Bellés-Fortuño & M.L. Gea-Valor, eds. (2010), Corpus-Based Approaches to English Language Teaching. London & New York: Continuum.
Flowerdew, John (2009), “Corpora in language teaching”, in Long, Michael H. & Catherine J. Doughty (eds.), The Handbook of Language Teaching. Oxford: Blackwell, 327-350.
Mukherjee, Joybrato (2002), Korpuslinguistik und Englischunterricht. Eine Einführung. Frankfurt/Main: Lang.
O'Keeffe, Anne & Michael McCarthy, eds. (2010), The Routledge Handbook of Corpus Linguistics. New York: Routledge. [section V "Using a corpus for language pedagogy and methodology"; section VI "Designing corpus-based materials for the language classroom"]
Römer, Ute (2008), “Corpora and language teaching”, in Lüdeling, Anke & Merja Kytö (eds.), Corpus Linguistics. An International Handbook (volume 1). Berlin: Mouton de Gruyter, 112-130.
Römer, Ute (2011), “Corpus research applications in second language teaching”, Annual Review of Applied Linguistics 31, 205-225.