Building a large scale knowledge base from Chinese Wiki Encyclopedia

Zhichun Wang, Zhigang Wang, Juanzi Li, Jeff Z. Pan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

DBpedia has proved to be a successful structured knowledge base, and large-scale Semantic Web data has been built by using DBpedia as the central interlinking hub of the Web of Data in English. In Chinese, however, because the Chinese Wikipedia is far smaller than the English one (no more than one tenth of its size), little Chinese linked data has been published and linked to DBpedia, which hinders structured knowledge sharing both within Chinese resources and across languages. This paper aims to build a large-scale Chinese structured knowledge base from Hudong, one of the largest Chinese wiki encyclopedia websites. An upper-level ontology schema in Chinese is first learned from the category system and Infobox information in Hudong. In total, 19542 concepts are inferred and organized in a hierarchy of at most 20 levels, and 2381 properties with domain and range information are learned from the attributes in the Hudong Infoboxes. Then, 802593 instances are extracted and described using the concepts and properties of the learned ontology. These instances cover a wide range of things, including persons, organizations, and places. Of these instances, 62679 are linked to identical instances in DBpedia. The established Chinese knowledge base can be accessed through an RDF dump or SPARQL. Its general upper-level ontology and wide coverage make the knowledge base a valuable Chinese semantic resource. It can be used not only for building Chinese linked data, the fundamental work for building a multilingual knowledge base across heterogeneous resources in different languages, but also to facilitate many applications of large-scale knowledge bases, such as question answering and semantic search.
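
As a rough illustration of the kind of data the abstract describes (a concept hierarchy, properties with domain and range, instances typed by concepts, and links to identical DBpedia instances), the following minimal Python sketch emits N-Triples lines. It is not the authors' implementation. The `example.org` namespaces, the concept and property names (`City`, `Place`, `province`), and the helper functions are all hypothetical, chosen only for illustration; the RDF, RDFS, OWL and XSD vocabulary IRIs are the standard ones.

```python
# Illustrative sketch only: the example.org namespaces and all concept,
# property and instance names below are hypothetical, not from the paper.
KB = "http://example.org/hudong/resource/"
ONT = "http://example.org/hudong/ontology/"
DBPEDIA = "http://dbpedia.org/resource/"

# Standard W3C vocabulary IRIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal (escapes backslashes and quotes only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(subject, predicate, obj):
    """Format one N-Triples statement."""
    return f"{subject} {predicate} {obj} ."


def describe_schema():
    """A tiny hypothetical schema: one subclass link, one property."""
    return [
        triple(iri(ONT + "City"), iri(RDFS_SUBCLASS_OF), iri(ONT + "Place")),
        triple(iri(ONT + "province"), iri(RDFS_DOMAIN), iri(ONT + "City")),
        triple(iri(ONT + "province"), iri(RDFS_RANGE), iri(XSD_STRING)),
    ]


def describe_instance(name, concept, attributes, dbpedia_name=None):
    """Type an instance, attach its attribute values, optionally link it."""
    subject = iri(KB + name)
    lines = [triple(subject, iri(RDF_TYPE), iri(ONT + concept))]
    for prop, value in attributes.items():
        lines.append(triple(subject, iri(ONT + prop), literal(value)))
    if dbpedia_name is not None:
        lines.append(
            triple(subject, iri(OWL_SAME_AS), iri(DBPEDIA + dbpedia_name))
        )
    return lines


schema_lines = describe_schema()
instance_lines = describe_instance(
    "Hangzhou", "City", {"province": "Zhejiang"}, dbpedia_name="Hangzhou"
)
for line in schema_lines + instance_lines:
    print(line)
```

The sketch uses `owl:sameAs` for the cross-knowledge-base link because that is the usual Linked Data convention for identical resources; the abstract itself does not name the predicate the authors used.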

Original language: English
Title of host publication: The Semantic Web - Joint International Semantic Technology Conference, JIST 2011, Proceedings
Pages: 80-95
Number of pages: 16
DOI: https://doi.org/10.1007/978-3-642-29923-0_6
Publication status: Published - 18 Jun 2012
Event: Joint International Semantic Technology Conference, JIST 2011 - Hangzhou, China
Duration: 4 Dec 2011 to 7 Dec 2011
https://dblp.org/db/conf/aswc/jist2011 (Link to Conference papers)

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7185 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Joint International Semantic Technology Conference, JIST 2011
Country: China
City: Hangzhou
Period: 4/12/11 to 7/12/11

Keywords

  • Knowledge base
  • Linked Data
  • Ontology
  • Semantic Web

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Wang, Z., Wang, Z., Li, J., & Pan, J. Z. (2012). Building a large scale knowledge base from Chinese Wiki Encyclopedia. In The Semantic Web - Joint International Semantic Technology Conference, JIST 2011, Proceedings (pp. 80-95). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7185 LNCS). https://doi.org/10.1007/978-3-642-29923-0_6

@inproceedings{010a3fcb323d46f38146d935fd4ab2b0,
title = "Building a large scale knowledge base from Chinese Wiki Encyclopedia",
keywords = "Knowledge base, Linked Data, Ontology, Semantic Web",
author = "Zhichun Wang and Zhigang Wang and Juanzi Li and Pan, {Jeff Z.}",
year = "2012",
month = "6",
day = "18",
doi = "10.1007/978-3-642-29923-0_6",
language = "English",
isbn = "9783642299223",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "80--95",
booktitle = "The Semantic Web - Joint International Semantic Technology Conference, JIST 2011, Proceedings",

}

TY - GEN

T1 - Building a large scale knowledge base from Chinese Wiki Encyclopedia

AU - Wang, Zhichun

AU - Wang, Zhigang

AU - Li, Juanzi

AU - Pan, Jeff Z.

PY - 2012/6/18

Y1 - 2012/6/18

KW - Knowledge base

KW - Linked Data

KW - Ontology

KW - Semantic Web

UR - http://www.scopus.com/inward/record.url?scp=84862206329&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-29923-0_6

DO - 10.1007/978-3-642-29923-0_6

M3 - Conference contribution

AN - SCOPUS:84862206329

SN - 9783642299223

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 80

EP - 95

BT - The Semantic Web - Joint International Semantic Technology Conference, JIST 2011, Proceedings

ER -