Knowledge extraction from Chinese wiki encyclopedias

Zhi Chun Wang*, Zhi Gang Wang, Juan Zi Li, Jeff Z. Pan

*Corresponding author for this work

Research output: Contribution to journal › Article

17 Citations (Scopus)

Abstract

The vision of the Semantic Web is to build a 'Web of data' that enables machines to understand the semantics of information on the Web. The Linked Open Data (LOD) project encourages people and organizations to publish open data sets as Resource Description Framework (RDF) on the Web, which promotes the development of the Semantic Web. Among the LOD datasets, DBpedia has proved to be a successful structured knowledge base and has become the central interlinking hub of the Web of data in English. In Chinese, however, little linked data has been published and linked to DBpedia, which hinders structured knowledge sharing for both Chinese and cross-lingual resources. This paper presents an approach for building a large-scale Chinese structured knowledge base from Chinese wiki resources, including Hudong and Baidu Baike. The proposed approach first builds an ontology based on the wiki category system and infoboxes, and then extracts instances from wiki articles. Using Hudong as the source, the approach builds an ontology containing 19 542 concepts and 2381 properties. 802 593 instances are extracted and described using the concepts and properties in the extracted ontology, and 62 679 of them are linked to equivalent instances in DBpedia. From Baidu Baike, the approach builds an ontology containing 299 concepts, 37 object properties, and 5590 data type properties. 1 319 703 instances are extracted from Baidu Baike, and 84 343 of them are linked to instances in DBpedia. We provide RDF dumps and a SPARQL endpoint to access the resulting Chinese knowledge bases. The knowledge bases built using our approach can be used not only for building Chinese linked data, but also in applications of large-scale knowledge bases such as question answering and semantic search.
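
As a rough illustration of what "described using the concepts and properties in the extracted ontology" and "linked to equivalent instances in DBpedia" could look like in RDF, the following minimal sketch emits N-Triples for a single instance. The `http://example.org/zhkb/` namespace, the `describe_instance` helper, and the example instance are hypothetical and are not taken from the paper; `owl:sameAs` is assumed here only because it is the conventional linked-data predicate for equivalence links.

```python
# Minimal sketch: describe one extracted instance as N-Triples lines.
# The KB namespace and the example instance are hypothetical placeholders.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

KB = "http://example.org/zhkb/"           # hypothetical knowledge-base namespace
DBPEDIA = "http://dbpedia.org/resource/"  # DBpedia resource namespace


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text, lang):
    """Build a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def describe_instance(local_id, concept, label, dbpedia_name=None):
    """Return N-Triples lines typing an instance with an ontology concept,
    labelling it in Chinese, and optionally linking it to DBpedia."""
    subject = iri(KB + "instance/" + local_id)
    triples = [
        (subject, iri(RDF_TYPE), iri(KB + "concept/" + concept)),
        (subject, iri(RDFS_LABEL), literal(label, "zh")),
    ]
    if dbpedia_name is not None:
        # Equivalence link to the corresponding DBpedia resource (assumed predicate).
        triples.append((subject, iri(OWL_SAME_AS), iri(DBPEDIA + dbpedia_name)))
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = describe_instance("Beijing", "City", "北京", dbpedia_name="Beijing")
print("\n".join(lines))
```

Instances without a DBpedia counterpart would simply omit the last triple, which matches the abstract's observation that only a fraction of the extracted instances are linked.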

Original language: English
Pages (from-to): 268-280
Number of pages: 13
Journal: Journal of Zhejiang University: Science C
Volume: 13
Issue number: 4
Early online date: 3 Apr 2012
DOI: 10.1631/jzus.C1101008
Publication status: Published - Apr 2012

Fingerprint

Ontology
Semantic Web
Semantics

Keywords

  • Knowledge base
  • Linked Data
  • Ontology
  • Semantic Web

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Knowledge extraction from Chinese wiki encyclopedias. / Wang, Zhi Chun; Wang, Zhi Gang; Li, Juan Zi; Pan, Jeff Z.

In: Journal of Zhejiang University: Science C, Vol. 13, No. 4, 04.2012, p. 268-280.


@article{c1962bb1295c48d49e53b1e7a2b62e63,
title = "Knowledge extraction from Chinese wiki encyclopedias",
keywords = "Knowledge base, Linked Data, Ontology, Semantic Web",
author = "Wang, {Zhi Chun} and Wang, {Zhi Gang} and Li, {Juan Zi} and Pan, {Jeff Z.}",
year = "2012",
month = "4",
doi = "10.1631/jzus.C1101008",
language = "English",
volume = "13",
pages = "268--280",
journal = "Journal of Zhejiang University: Science C",
issn = "1869-1951",
publisher = "Zhejiang University Press",
number = "4",

}

TY - JOUR

T1 - Knowledge extraction from Chinese wiki encyclopedias

AU - Wang, Zhi Chun

AU - Wang, Zhi Gang

AU - Li, Juan Zi

AU - Pan, Jeff Z.

PY - 2012/4

Y1 - 2012/4

KW - Knowledge base

KW - Linked Data

KW - Ontology

KW - Semantic Web

UR - http://www.scopus.com/inward/record.url?scp=84860744109&partnerID=8YFLogxK

U2 - 10.1631/jzus.C1101008

DO - 10.1631/jzus.C1101008

M3 - Article

AN - SCOPUS:84860744109

VL - 13

SP - 268

EP - 280

JO - Journal of Zhejiang University: Science C

JF - Journal of Zhejiang University: Science C

SN - 1869-1951

IS - 4

ER -