This guide is an in-depth listing of resources on Linguistics available to students and faculty at the University of Minnesota.
Select List of Corpora
- Linguistic Data Consortium: LDC is a repository for linguistic datasets, corpora, and other resources for research on human language. Use the link above to view descriptions and sign up for access to the corpora and datasets available to current UMN faculty, students, and staff. Quick overview of corpora available.
- BAS CLARIN Repository: Corpora of spoken language archived in the Bavarian Archive for Speech Signals (BAS). Most resources marked with 'free for science' (ACA) or 'public' (PUB) can be downloaded for free by academic users.
- English-Corpora.org: Formerly the "BYU Corpora," a portal to a number of English-language corpora, including iWeb: The Intelligent Web-based Corpus, News on the Web (NOW), Global Web-Based English, Wikipedia Corpus, Corpus of Contemporary American English (COCA), Corpus of Historical American English (COHA), The TV Corpus, The Movie Corpus, and the Corpus of American Soap Operas.
- International Corpus of English: Each ICE corpus consists of one million words of spoken and written English produced after 1989.
- OLAC (Open Language Archives Community): "OLAC, the Open Language Archives Community, is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources."
- UCLA Phonetics Lab Archive: The materials on this site comprise audio recordings illustrating phonetic structures from over 200 languages with phonetic transcriptions, plus scans of original field notes where relevant.
- PHOIBLE: A repository of cross-linguistic phonological inventory data, which have been extracted from source documents and tertiary databases and compiled into a single searchable convenience sample.
For assistance in finding corpora, contact Brian Vetruba (bvetruba@umn.edu) or book an appointment.
Last Updated: Sep 30, 2024 10:53 AM
URL: https://libguides.umn.edu/linguisticsadvanced