The Center for Applied Linguistics Collection contains 118 hours of recordings documenting North American English dialects. The recordings include speech samples, linguistic interviews, oral histories, conversations, and excerpts from public speeches. They were drawn from various archives, and from the private collections of fifty collectors, including linguists, dialectologists, and folklorists. They were submitted to the Center for Applied Linguistics as part of a project entitled "A Survey and Collection of American English Dialect Recordings," which was funded by the Center for Applied Linguistics and the National Endowment for the Humanities.
The corpora are supported by hundreds of universities and thousands of individuals throughout the world, and by organizations like the German Research Foundation. The corpus data (e.g. full-text, word frequency) has been used by many companies, including Amazon, Google, Microsoft, IBM, Sony, Disney, Intel, Adobe, Samsung; several AI companies; language-related companies like Merriam-Webster, Dictionary.com, Grammarly, Duolingo, TurnItIn, Oxford University Press, Sketch Engine; and many more.
Archive of primary-source recordings of English-language dialects and accents as heard around the world. Includes roughly 1400 samples from 120 countries and territories. Recordings are primarily in English, are of native speakers, and include both English-language dialects and English spoken in the accents of other languages.
The LINGUIST List is operated at Indiana University, Department of Linguistics. The aim of the list is to provide a forum where academic linguists can discuss linguistic issues and exchange linguistic information. LINGUIST List offers support to graduate students in linguistics and summer interns, who serve in return as editors of the list and help with the development and maintenance of the list server and website.
The speech accent archive uniformly presents a large set of speech samples from a variety of language backgrounds. Native and non-native speakers of English read the same paragraph, and their readings are carefully transcribed. The archive is used by people who wish to compare and analyze the accents of different English speakers.
The World Atlas of Languages is an interactive and dynamic online tool that documents different aspects and features of language status in countries and languages around the world. It aims to provide a detailed record of languages as communicative tools and knowledge resources in their sociocultural and socio-political contexts.
The Linguistic Data Consortium (LDC) is an open consortium of universities, libraries, corporations and government research laboratories. LDC was formed in 1992 to address the critical data shortage then facing language technology research and development.
OLAC, the Open Language Archives Community, is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources.
PHOIBLE is a repository of cross-linguistic phonological inventory data, which have been extracted from source documents and tertiary databases and compiled into a single searchable convenience sample. Release 2.0 from 2019 includes 3020 inventories that contain 3183 segment types found in 2186 distinct languages. A bibliographic record is provided for each source document; note that some languages in PHOIBLE have multiple entries based on distinct sources that disagree about the number and/or identity of that language's phonemes.
The Endangered Languages Archive (ELAR) is a digital repository for preserving multimedia collections of endangered languages from all over the world, making them available for future generations.
The Endangered Languages Project puts technology at the service of the organizations and individuals working to confront language endangerment by documenting, preserving, and teaching endangered languages. Through this website, users can not only access the most up-to-date and comprehensive information on endangered languages, as well as language resources provided by partners, but also play an active role in putting their languages online by submitting information or samples in the form of text, audio, or video files.
The World Language Mapping System (WLMS) has been a project of Global Mapping International (GMI) and several contributors for over 25 years. The data is provided in Esri shapefile (.shp) and file geodatabase formats for GIS systems. The source for language names and codes is the 19th edition (2016) ISO 639-3 standard. ArcGIS layer files are included to help with symbology. Linguists, cultural geographers, and other researchers will find this data valuable in understanding the locations and distribution of languages throughout the world.
The World Atlas of Language Structures (WALS) is a large database of structural (phonological, grammatical, lexical) properties of languages gathered from descriptive materials (such as reference grammars) by a team of 55 authors.