Transfer learning based cross-lingual knowledge extraction for Wikipedia

Zhigang Wang, Zhixing Li, Juanzi Li, Jie Tang, Jeff Z. Pan

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

12 Citations (Scopus)

Abstract

Wikipedia infoboxes are a valuable source of structured knowledge for global knowledge sharing. However, infobox information is very incomplete and imbalanced among the Wikipedias in different languages. It is a promising but challenging problem to utilize the rich structured knowledge from a source language Wikipedia to help complete the missing infoboxes for a target language. In this paper, we formulate the problem of cross-lingual knowledge extraction from multilingual Wikipedia sources, and present a novel framework, called WikiCiKE, to solve this problem. An instance-based transfer learning method is utilized to overcome the problems of topic drift and translation errors. Our experimental results demonstrate that WikiCiKE outperforms the monolingual knowledge extraction method and the translation-based method.

Original language: English
Title of host publication: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Publisher: Association for Computational Linguistics (ACL)
Pages: 641-650
Number of pages: 10
Volume: 1
ISBN (Print): 9781937284503
Publication status: Published - Aug 2013
Event: 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013 - Sofia, Bulgaria
Duration: 4 Aug 2013 - 9 Aug 2013

Conference

Conference: 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013
Country: Bulgaria
City: Sofia
Period: 4/08/13 - 9/08/13

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

Wang, Z., Li, Z., Li, J., Tang, J., & Pan, J. Z. (2013). Transfer learning based cross-lingual knowledge extraction for Wikipedia. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Vol. 1, pp. 641-650). Association for Computational Linguistics (ACL).

@inproceedings{ffa6c37b90654b1e8fe3c3c32725b98c,
title = "Transfer learning based cross-lingual knowledge extraction for Wikipedia",
abstract = "Wikipedia infoboxes are a valuable source of structured knowledge for global knowledge sharing. However, infobox information is very incomplete and imbalanced among the Wikipedias in different languages. It is a promising but challenging problem to utilize the rich structured knowledge from a source language Wikipedia to help complete the missing infoboxes for a target language. In this paper, we formulate the problem of cross-lingual knowledge extraction from multilingual Wikipedia sources, and present a novel framework, called WikiCiKE, to solve this problem. An instance-based transfer learning method is utilized to overcome the problems of topic drift and translation errors. Our experimental results demonstrate that WikiCiKE outperforms the monolingual knowledge extraction method and the translation-based method.",
author = "Zhigang Wang and Zhixing Li and Juanzi Li and Jie Tang and Pan, {Jeff Z.}",
year = "2013",
month = "8",
language = "English",
isbn = "9781937284503",
volume = "1",
pages = "641--650",
booktitle = "Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
publisher = "Association for Computational Linguistics (ACL)",

}

TY - GEN

T1 - Transfer learning based cross-lingual knowledge extraction for Wikipedia

AU - Wang, Zhigang

AU - Li, Zhixing

AU - Li, Juanzi

AU - Tang, Jie

AU - Pan, Jeff Z.

PY - 2013/8

Y1 - 2013/8

N2 - Wikipedia infoboxes are a valuable source of structured knowledge for global knowledge sharing. However, infobox information is very incomplete and imbalanced among the Wikipedias in different languages. It is a promising but challenging problem to utilize the rich structured knowledge from a source language Wikipedia to help complete the missing infoboxes for a target language. In this paper, we formulate the problem of cross-lingual knowledge extraction from multilingual Wikipedia sources, and present a novel framework, called WikiCiKE, to solve this problem. An instance-based transfer learning method is utilized to overcome the problems of topic drift and translation errors. Our experimental results demonstrate that WikiCiKE outperforms the monolingual knowledge extraction method and the translation-based method.

UR - http://www.scopus.com/inward/record.url?scp=84907368159&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9781937284503

VL - 1

SP - 641

EP - 650

BT - Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

PB - Association for Computational Linguistics (ACL)

ER -