An Architecture for Language Processing for Scientific Texts

Ann Copestake, Peter Corbett, Peter Murray-Rust, CJ Rupp, Advaith Siddharthan, Simone Teufel, Ben Waldron

Research output: Chapter in Book/Report/Conference proceeding › Published conference contribution


We describe the architecture for language processing adopted in the eScience project 'Extracting the Science from Scientific Publications' (nicknamed SciBorg). In this approach, papers from different sources are first processed into a common XML format (SciXML). Language processing modules operate on the SciXML in an architecture that allows for (partially) parallel deep and shallow processing and for a flexible combination of domain-independent and domain-dependent techniques. Robust Minimal Recursion Semantics (RMRS) acts both as a language for representing the output of processing and as an integration language for combining different modules. Language processing produces RMRS markup represented as standoff annotation on the original SciXML. Information extraction (IE) of various types is defined as operating on RMRSs. Rhetorical analysis of the texts also partially depends on IE-like patterns and supports novel methods of information access.
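Here is a minimal Python sketch of standoff annotation in general. It assumes a toy SciXML-like fragment with an `ID` attribute on paragraphs. The element names, the `StandoffAnnotation` class, the `resolve` helper and the predicate-style labels are all hypothetical. None of them come from the SciBorg system or the real SciXML schema. The sketch shows only one idea: annotations are stored outside the XML and point back into it by character offsets, so the original document is never modified.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Toy SciXML-like fragment (hypothetical; the real SciXML schema is richer).
SCIXML = "<PAPER><P ID='p1'>The solution was heated to 80 degrees.</P></PAPER>"


@dataclass(frozen=True)
class StandoffAnnotation:
    """An annotation kept apart from the document, pointing into it by offsets."""
    element_id: str  # value of the ID attribute of the annotated element
    start: int       # character offset (inclusive) within the element's text
    end: int         # character offset (exclusive)
    label: str       # illustrative, hypothetical predicate-style label


def element_text(root: ET.Element, element_id: str) -> str:
    """Return the full text content of the element with the given ID."""
    for el in root.iter():
        if el.get("ID") == element_id:
            return "".join(el.itertext())
    raise KeyError(element_id)


def resolve(root: ET.Element, ann: StandoffAnnotation) -> str:
    """Return the surface string an annotation covers; the XML is only read."""
    return element_text(root, ann.element_id)[ann.start:ann.end]


root = ET.fromstring(SCIXML)
annotations = [
    StandoffAnnotation("p1", 4, 12, "_solution_n"),
    StandoffAnnotation("p1", 17, 23, "_heat_v"),
]

if __name__ == "__main__":
    for ann in annotations:
        print(ann.label, "->", resolve(root, ann))
```

Offsets are used here instead of inline tags. With this design, several modules (for example, a deep and a shallow parser) could each annotate the same text without competing over the markup.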
Original language: English
Title of host publication: Proceedings of the UK e-Science Programme All Hands Meeting 2006 (AHM2006)
Publisher: National e-Science Centre
Number of pages: 8
ISBN (Electronic): 0955398800
Publication status: Published - 2006
Event: UK e-Science All Hands Meeting 2006 - Nottingham, United Kingdom
Duration: 18 Sep 2006 to 21 Sep 2006


Conference: UK e-Science All Hands Meeting 2006
Country/Territory: United Kingdom


