An Architecture for Language Processing for Scientific Texts

Ann Copestake, Peter Corbett, Peter Murray-Rust, CJ Rupp, Advaith Siddharthan, Simone Teufel, Ben Waldron

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We describe the architecture for language processing adopted on the eScience project 'Extracting the Science from Scientific Publications' (nicknamed SciBorg). In this approach, papers from different sources are first processed to give a common XML format (SciXML). Language processing modules operate on the SciXML in an architecture that allows for (partially) parallel deep and shallow processing and for a flexible combination of domain-independent and domain-dependent techniques. Robust Minimal Recursion Semantics (RMRS) acts both as a language for representing the output of processing and as an integration language for combining different modules. Language processing produces RMRS markup represented as standoff annotation on the original SciXML. Information extraction (IE) of various types is defined as operating on RMRSs. Rhetorical analysis of the texts also partially depends on IE-like patterns and supports novel methods of information access.

Original language: English
Title of host publication: Proceedings of the UK e-Science Programme All Hands Meeting 2006 (AHM2006)
Publisher: National e-Science Centre
Number of pages: 8
ISBN (Electronic): 0955398800
Publication status: Published - 2006
Event: UK e-Science All Hands Meeting 2006 - Nottingham, United Kingdom
Duration: 18 Sep 2006 - 21 Sep 2006

Conference

Conference: UK e-Science All Hands Meeting 2006
Country: United Kingdom
City: Nottingham
Period: 18/09/06 - 21/09/06

Fingerprint

Processing
XML
Semantics

Cite this

Copestake, A., Corbett, P., Murray-Rust, P., Rupp, CJ., Siddharthan, A., Teufel, S., & Waldron, B. (2006). An Architecture for Language Processing for Scientific Texts. In Proceedings of the UK e-Science Programme All Hands Meeting 2006 (AHM2006). National e-Science Centre.

An Architecture for Language Processing for Scientific Texts. / Copestake, Ann; Corbett, Peter; Murray-Rust, Peter; Rupp, CJ; Siddharthan, Advaith; Teufel, Simone; Waldron, Ben.

Proceedings of the UK e-Science Programme All Hands Meeting 2006 (AHM2006). National e-Science Centre, 2006.

Copestake, A, Corbett, P, Murray-Rust, P, Rupp, CJ, Siddharthan, A, Teufel, S & Waldron, B 2006, An Architecture for Language Processing for Scientific Texts. in Proceedings of the UK e-Science Programme All Hands Meeting 2006 (AHM2006). National e-Science Centre, UK e-Science All Hands Meeting 2006, Nottingham, United Kingdom, 18/09/06.
Copestake A, Corbett P, Murray-Rust P, Rupp CJ, Siddharthan A, Teufel S et al. An Architecture for Language Processing for Scientific Texts. In Proceedings of the UK e-Science Programme All Hands Meeting 2006 (AHM2006). National e-Science Centre. 2006
Copestake, Ann ; Corbett, Peter ; Murray-Rust, Peter ; Rupp, CJ ; Siddharthan, Advaith ; Teufel, Simone ; Waldron, Ben. / An Architecture for Language Processing for Scientific Texts. Proceedings of the UK e-Science Programme All Hands Meeting 2006 (AHM2006). National e-Science Centre, 2006.
@inproceedings{47e74e549d6148bd933ca983c648589a,
title = "An Architecture for Language Processing for Scientific Texts",
abstract = "We describe the architecture for language processing adopted on the eScience project `Extracting the Science from Scientific Publications' (nicknamed SciBorg). In this approach, papers from different sources are first processed to give a common XML format (SciXML). Language processing modules operate on the SciXML in an architecture that allows for (partially) parallel deep and shallow processing and for a flexible combination of domain-independent and domain-dependent techniques. Robust Minimal Recursion Semantics (RMRS) acts both as a language for representing the output of processing and as an integration language for combining different modules. Language processing produces RMRS markup represented as standoff annotation on the original SciXML. Information extraction (IE) of various types is defined as operating on RMRSs. Rhetorical analysis of the texts also partially depends on IE-like patterns and supports novel methods of information access.",
author = "Ann Copestake and Peter Corbett and Peter Murray-Rust and CJ Rupp and Advaith Siddharthan and Simone Teufel and Ben Waldron",
year = "2006",
language = "English",
booktitle = "Proceedings of the UK e-Science Programme All Hands Meeting 2006 (AHM2006)",
publisher = "National e-Science Centre",

}

TY - GEN

T1 - An Architecture for Language Processing for Scientific Texts

AU - Copestake, Ann

AU - Corbett, Peter

AU - Murray-Rust, Peter

AU - Rupp, CJ

AU - Siddharthan, Advaith

AU - Teufel, Simone

AU - Waldron, Ben

PY - 2006

Y1 - 2006

N2 - We describe the architecture for language processing adopted on the eScience project `Extracting the Science from Scientific Publications' (nicknamed SciBorg). In this approach, papers from different sources are first processed to give a common XML format (SciXML). Language processing modules operate on the SciXML in an architecture that allows for (partially) parallel deep and shallow processing and for a flexible combination of domain-independent and domain-dependent techniques. Robust Minimal Recursion Semantics (RMRS) acts both as a language for representing the output of processing and as an integration language for combining different modules. Language processing produces RMRS markup represented as standoff annotation on the original SciXML. Information extraction (IE) of various types is defined as operating on RMRSs. Rhetorical analysis of the texts also partially depends on IE-like patterns and supports novel methods of information access.

AB - We describe the architecture for language processing adopted on the eScience project `Extracting the Science from Scientific Publications' (nicknamed SciBorg). In this approach, papers from different sources are first processed to give a common XML format (SciXML). Language processing modules operate on the SciXML in an architecture that allows for (partially) parallel deep and shallow processing and for a flexible combination of domain-independent and domain-dependent techniques. Robust Minimal Recursion Semantics (RMRS) acts both as a language for representing the output of processing and as an integration language for combining different modules. Language processing produces RMRS markup represented as standoff annotation on the original SciXML. Information extraction (IE) of various types is defined as operating on RMRSs. Rhetorical analysis of the texts also partially depends on IE-like patterns and supports novel methods of information access.

M3 - Conference contribution

BT - Proceedings of the UK e-Science Programme All Hands Meeting 2006 (AHM2006)

PB - National e-Science Centre

ER -