Until recently, corpus studies of natural bilingual speech and, more specifically, of codeswitching in bilingual speech have relied on manual glossing, part-of-speech tagging, and clause-splitting to prepare the data for analysis. In this article, we present innovative tools developed for the first large-scale corpus study of codeswitching triggered by cognates. A study of this size was possible only because several steps were automated, such as morpheme-by-morpheme glossing, splitting complex clauses into simple clauses, and the analysis of internal and external codeswitching, through the use of database tables, algorithms, and a scripting language.
ASJC Scopus subject areas
- Information Systems
- Language and Linguistics
- Linguistics and Language
- Computer Science Applications