This article provides an account of the steps involved in adapting IBM's Languageware natural language processing software to a large corpus of highly non-standard 17th-century documents. It examines the challenges encountered during this process and outlines the approach adopted to provide a robust, reusable tool for the linguistic analysis of early modern source texts.
- digital humanities
- 1641 depositions
- corpus linguistics
Sweetnam, M. S., & Fennell, B. A. (2012). Natural Language Processing and Early-Modern Dirty Data: Applying IBM Languageware to the 1641 Depositions. Literary and Linguistic Computing, 27(1), 39-54. https://doi.org/10.1093/llc/fqr050