Natural Language Processing and Early-Modern Dirty Data

Applying IBM Languageware to the 1641 Depositions

Mark S. Sweetnam, Barbara A. Fennell

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

This article provides an account of the steps involved in adapting IBM's Languageware natural language processing software to a large corpus of highly non-standard 17th century documents. It examines the challenges encountered as part of this process, and outlines the approach adopted to provide a robust and reusable tool for the linguistic analysis of early modern source texts.
Original language: English
Pages (from-to): 39-54
Number of pages: 16
Journal: Literary and Linguistic Computing
Volume: 27
Issue number: 1
Early online date: 15 Dec 2011
DOIs: 10.1093/llc/fqr050
Publication status: Published - Apr 2012

Keywords

  • linguistics
  • digital humanities
  • 1641 depositions
  • corpus linguistics

Cite this

Natural Language Processing and Early-Modern Dirty Data: Applying IBM Languageware to the 1641 Depositions. / Sweetnam, Mark S.; Fennell, Barbara A.

In: Literary and Linguistic Computing, Vol. 27, No. 1, 04.2012, p. 39-54.

Research output: Contribution to journal › Article

@article{3a1d3bfde0e248fb93d34667c8994586,
  title = "Natural Language Processing and Early-Modern Dirty Data: Applying IBM Languageware to the 1641 Depositions",
  abstract = "This article provides an account of the steps involved in adapting IBM's Languageware natural language processing software to a large corpus of highly non-standard 17th century documents. It examines the challenges encountered as part of this process, and outlines the approach adopted to provide a robust and reusable tool for the linguistic analysis of early modern source texts.",
  keywords = "linguistics, digital humanities, 1641 depositions, corpus linguistics",
  author = "Sweetnam, {Mark S.} and Fennell, {Barbara A.}",
  year = "2012",
  month = apr,
  doi = "10.1093/llc/fqr050",
  language = "English",
  volume = "27",
  pages = "39--54",
  journal = "Literary and Linguistic Computing",
  issn = "0268-1145",
  publisher = "Oxford University Press",
  number = "1",
}

TY - JOUR

T1 - Natural Language Processing and Early-Modern Dirty Data

T2 - Applying IBM Languageware to the 1641 Depositions

AU - Sweetnam, Mark S.

AU - Fennell, Barbara A.

PY - 2012/4

Y1 - 2012/4

N2 - This article provides an account of the steps involved in adapting IBM's Languageware natural language processing software to a large corpus of highly non-standard 17th century documents. It examines the challenges encountered as part of this process, and outlines the approach adopted to provide a robust and reusable tool for the linguistic analysis of early modern source texts.

AB - This article provides an account of the steps involved in adapting IBM's Languageware natural language processing software to a large corpus of highly non-standard 17th century documents. It examines the challenges encountered as part of this process, and outlines the approach adopted to provide a robust and reusable tool for the linguistic analysis of early modern source texts.

KW - linguistics

KW - digital humanities

KW - 1641 depositions

KW - corpus linguistics

U2 - 10.1093/llc/fqr050

DO - 10.1093/llc/fqr050

M3 - Article

VL - 27

SP - 39

EP - 54

JO - Literary and Linguistic Computing

JF - Literary and Linguistic Computing

SN - 0268-1145

IS - 1

ER -