Abstract
We describe the architecture for language processing adopted on the eScience project 'Extracting the Science from Scientific Publications' (nicknamed SciBorg). In this approach, papers from different sources are first processed to give a common XML format (SciXML). Language processing modules operate on the SciXML in an architecture that allows for (partially) parallel deep and shallow processing and for a flexible combination of domain-independent and domain-dependent techniques. Robust Minimal Recursion Semantics (RMRS) acts both as a language for representing the output of processing and as an integration language for combining different modules. Language processing produces RMRS markup represented as standoff annotation on the original SciXML. Information extraction (IE) of various types is defined as operating on RMRSs. Rhetorical analysis of the texts also partially depends on IE-like patterns and supports novel methods of information access.
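To illustrate what standoff annotation means in general, the following is a minimal sketch, not the SciBorg implementation. The `<P>` element, the `StandoffAnnotation` class, the helper functions, and the labels `entity` and `event` are all hypothetical simplifications; they do not reproduce the real SciXML schema or RMRS structures. The point shown is only that annotations refer to the source text by character offsets and are stored separately, so the original document is never modified.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for a SciXML paragraph
# (assumption: the real SciXML schema is richer than this).
SOURCE_XML = "<P>The compound was heated to 80 degrees.</P>"


@dataclass(frozen=True)
class StandoffAnnotation:
    """An annotation kept apart from the text it describes."""
    start: int   # character offset into the paragraph text (inclusive)
    end: int     # character offset into the paragraph text (exclusive)
    label: str   # illustrative label only; not an actual RMRS structure


def paragraph_text(xml_source: str) -> str:
    """Return the plain text content of the XML fragment."""
    return "".join(ET.fromstring(xml_source).itertext())


def annotate(text: str, phrase: str, label: str) -> StandoffAnnotation:
    """Create an annotation pointing at the first occurrence of phrase."""
    start = text.index(phrase)
    return StandoffAnnotation(start, start + len(phrase), label)


def resolve(text: str, annotation: StandoffAnnotation) -> str:
    """Recover the annotated span from the unmodified source text."""
    return text[annotation.start:annotation.end]


text = paragraph_text(SOURCE_XML)
annotations = [
    annotate(text, "compound", "entity"),
    annotate(text, "heated", "event"),
]

for annotation in annotations:
    print(annotation.label, repr(resolve(text, annotation)))
```

Because each annotation holds only offsets and a label, several independent modules could in principle attach their output to the same source without interfering with one another, which is the property the abstract relies on when combining deep and shallow processing.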
Original language | English |
---|---|
Title of host publication | Proceedings of the UK e-Science Programme All Hands Meeting 2006 (AHM2006) |
Publisher | National e-Science Centre |
Number of pages | 8 |
ISBN (Electronic) | 0955398800 |
Publication status | Published - 2006 |
Event | UK e-Science All Hands Meeting 2006 - Nottingham, United Kingdom; Duration: 18 Sep 2006 → 21 Sep 2006 |
Conference
Conference | UK e-Science All Hands Meeting 2006 |
---|---|
Country/Territory | United Kingdom |
City | Nottingham |
Period | 18/09/06 → 21/09/06 |