Natural Language Processing and Early-Modern Dirty Data: Applying IBM Languageware to the 1641 Depositions

Mark S. Sweetnam, Barbara A. Fennell

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

Abstract

This article provides an account of the steps involved in adapting IBM's Languageware natural language processing software to a large corpus of highly non-standard 17th-century documents. It examines the challenges encountered during this process and outlines the approach adopted to provide a robust and reusable tool for the linguistic analysis of early modern source texts.
Original language: English
Pages (from-to): 39-54
Number of pages: 16
Journal: Literary and Linguistic Computing
Volume: 27
Issue number: 1
Early online date: 15 Dec 2011
DOIs
Publication status: Published - Apr 2012

Keywords

  • linguistics
  • digital humanities
  • 1641 depositions
  • corpus linguistics
