A multi-strategy learning approach to competitor identification

Tong Ruan; Yeli Lin; Haofen Wang; Jeff Z. Pan

doi:10.1007/978-3-319-15615-6_15

Tong Ruan^*, Yeli Lin, Haofen Wang, Jeff Z. Pan

^*Corresponding author for this work

Computing Science

East China University of Science and Technology

Research output: Chapter in Book/Report/Conference proceeding › Published conference contribution

9 Citations (Scopus)

Abstract

Competitor identification aims to find the competitors of an entity in a given field, and is key to the success of market intelligence. Collecting competitors manually is labor-intensive and time-consuming, so automatic approaches have been proposed for this purpose. However, these approaches face two main challenges. First, competitor information may be contained not only in semi-structured sources such as lists or tables, but also in free text; this diversity of sources makes competitor identification difficult. Second, competitors do not always occur in the form of their full names; name variants further increase the diversity and make the task more challenging. In this paper, we propose a novel unsupervised approach that identifies competitors from prospectuses using a multi-strategy learning algorithm. More precisely, we first extract competitors from lists using predefined heuristic rules. By leveraging redundancies among competitor information in lists, tables, and texts, these competitors are fed as seeds to distantly supervise a learning process that finds table columns and text patterns containing competitors. The whole process is performed iteratively: in each iteration, newly discovered high-confidence competitors from the various sources are treated as new seeds for bootstrapping. The experimental results show the effectiveness of our approach without human intervention or external knowledge bases. Moreover, the approach significantly outperforms traditional named entity recognition approaches.
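
The iterative procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the document layout (`lists`, `tables`, `texts`), the company-suffix heuristic, the column-overlap threshold, the three-word left-context patterns, and all function names are assumptions made purely for illustration. In particular, the overlap threshold stands in for the paper's notion of "high confidence", which the abstract does not specify.

```python
import re

# Assumed heuristic rule: a list item ending in a corporate suffix is a competitor.
SUFFIX = re.compile(r"\b(Inc|Ltd|Corp|Co)\.?$")
# Assumed name shape: one or more capitalized tokens separated by single spaces.
NAME = r"((?:[A-Z][\w&]*)(?:\s[A-Z][\w&]*)*)"


def seeds_from_lists(lists):
    """Step 1: extract initial seeds from list items using the heuristic rule."""
    return {item.strip() for lst in lists for item in lst
            if SUFFIX.search(item.strip())}


def competitors_from_tables(tables, known, min_overlap=0.5):
    """Accept a column if enough of its cells are known competitors,
    then take every cell of that column as a competitor."""
    found = set()
    for table in tables:              # a table is a list of columns
        for column in table:          # a column is a list of cell strings
            cells = [c.strip() for c in column if c.strip()]
            if not cells:
                continue
            overlap = sum(c in known for c in cells) / len(cells)
            if overlap >= min_overlap:
                found.update(cells)
    return found


def learn_patterns(sentences, known, width=3):
    """Distant supervision: the words just left of a known name become a pattern."""
    patterns = set()
    for sentence in sentences:
        for name in known:
            idx = sentence.find(name)
            if idx > 0:
                left = sentence[:idx].split()[-width:]
                if len(left) == width:
                    patterns.add(" ".join(left))
    return patterns


def competitors_from_text(sentences, patterns):
    """Apply learned patterns to pull out the capitalized name that follows."""
    found = set()
    for sentence in sentences:
        for pattern in patterns:
            match = re.search(re.escape(pattern) + r"\s+" + NAME, sentence)
            if match:
                found.add(match.group(1))
    return found


def bootstrap(doc, max_iters=5):
    """Iterate until no new competitors are discovered (or max_iters is hit)."""
    known = seeds_from_lists(doc["lists"])
    for _ in range(max_iters):
        patterns = learn_patterns(doc["texts"], known)
        new = (competitors_from_tables(doc["tables"], known)
               | competitors_from_text(doc["texts"], patterns))
        if new <= known:              # nothing new: fixed point reached
            break
        known |= new                  # new finds become seeds for the next round
    return known


if __name__ == "__main__":
    # Toy, made-up document; the company names are placeholders.
    doc = {
        "lists": [["Acme Corp", "Globex Ltd", "Total revenue"]],
        "tables": [[["Acme Corp", "Globex Ltd", "Initech"], ["12%", "9%", "4%"]]],
        "texts": [
            "Our main competitors include Acme Corp in the domestic market.",
            "Our main competitors include Umbrella Group in overseas markets.",
        ],
    }
    print(sorted(bootstrap(doc)))
```

In this toy run the list supplies two seeds, the table column that mostly overlaps with the seeds contributes a name that lacks a corporate suffix, and the sentence pattern learned from one seed mention recovers a competitor that appears only in free text. A fuller treatment would also score text patterns and match name variants, which this sketch omits.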

Original language	English
Title of host publication	Joint International Semantic Technology Conference
Subtitle of host publication	JIST 2014, Semantic Technology
Editors	T Supnithi, T Yamaguchi, J Pan, V Wuwongse, M Buranarach
Publisher	Springer-Verlag
Pages	197-212
Number of pages	16
Volume	8943
ISBN (Electronic)	9783319156149
DOIs	https://doi.org/10.1007/978-3-319-15615-6_15
Publication status	Published - 2015
Event	4th Joint International Conference on Semantic Technology, JIST 2014, Chiang Mai, Thailand. Duration: 9 Nov 2014 → 11 Nov 2014

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	8943
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	4th Joint International Conference on Semantic Technology, JIST 2014
Country/Territory	Thailand
City	Chiang Mai
Period	9/11/14 → 11/11/14

Bibliographical note

This work is funded by the National Key Technology R&D Program through project No. 2013BAH11F03

Keywords

Competitor mining
Distant supervision
Unsupervised learning
Wrapper induction

Access to Document

10.1007/978-3-319-15615-6_15
Licence: Unspecified

Cite this

Ruan, T., Lin, Y., Wang, H., & Pan, J. Z. (2015). A multi-strategy learning approach to competitor identification. In T. Supnithi, T. Yamaguchi, J. Pan, V. Wuwongse, & M. Buranarach (Eds.), Joint International Semantic Technology Conference: JIST 2014, Semantic Technology (Vol. 8943, pp. 197-212). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8943). Springer-Verlag. https://doi.org/10.1007/978-3-319-15615-6_15

A multi-strategy learning approach to competitor identification. / Ruan, Tong; Lin, Yeli; Wang, Haofen et al.
Joint International Semantic Technology Conference: JIST 2014, Semantic Technology. ed. / T Supnithi; T Yamaguchi; J Pan; V Wuwongse; M Buranarach. Vol. 8943 Springer-Verlag, 2015. p. 197-212 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8943).

Ruan, T, Lin, Y, Wang, H & Pan, JZ 2015, A multi-strategy learning approach to competitor identification. in T Supnithi, T Yamaguchi, J Pan, V Wuwongse & M Buranarach (eds), Joint International Semantic Technology Conference: JIST 2014, Semantic Technology. vol. 8943, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8943, Springer-Verlag, pp. 197-212, 4th Joint International Conference on Semantic Technology, JIST 2014, Chiang Mai, Thailand, 9/11/14. https://doi.org/10.1007/978-3-319-15615-6_15

Ruan T, Lin Y, Wang H, Pan JZ. A multi-strategy learning approach to competitor identification. In Supnithi T, Yamaguchi T, Pan J, Wuwongse V, Buranarach M, editors, Joint International Semantic Technology Conference: JIST 2014, Semantic Technology. Vol. 8943. Springer-Verlag. 2015. p. 197-212. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). Epub 2015 Feb 21. doi: 10.1007/978-3-319-15615-6_15

Ruan, Tong ; Lin, Yeli ; Wang, Haofen et al. / A multi-strategy learning approach to competitor identification. Joint International Semantic Technology Conference: JIST 2014, Semantic Technology. editor / T Supnithi ; T Yamaguchi ; J Pan ; V Wuwongse ; M Buranarach. Vol. 8943 Springer-Verlag, 2015. pp. 197-212 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{8ae4c1581566475b9ab262cdc804a54d,

title = "A multi-strategy learning approach to competitor identification",

abstract = "Competitor identification tries to find competitors of some entity in a given field, which is the key to the success of market intelligence. Manually collecting competitors is labor-intensive and time consuming. So automatic approaches are proposed for this purpose. However, these approaches suffer from the following two main challenges. Competitor information might not only be contained in semi-structured sources like lists or tables, but also be mentioned in free texts. The diversity of its sources make competitor identification quite difficult. Also, these competitors might not always occur in form of their full names. The occurrences of name variants further increase the diversity, and make the task more challenging. In this paper, we propose a novel unsupervised approach to identify competitors from prospectuses based on a multi-strategy learning algorithm. More precisely, we first extract competitors from lists using some predefined heuristic rules. By leveraging redundancies among competitor information in lists, tables, and texts, these competitors are fed as seeds to distantly supervise the learning process to find table columns and text patterns containing competitors. The whole process is iteratively performed. In each iteration, the newly discovered competitors of high confidence from various sources are treated as new seeds for bootstrapping. The experimental results show the effectiveness of our approach without human intentions and external knowledge bases. Moreover, the approach significantly outperforms traditional named entity recognition approaches.",

keywords = "Competitor mining, Distant supervision, Unsupervised learning, Wrapper induction",

author = "Tong Ruan and Yeli Lin and Haofen Wang and Pan, {Jeff Z.}",

note = "This work is funded by the National Key Technology R&D Program through project No. 2013BAH11F03 ; 4th Joint International Conference on Semantic Technology, JIST 2014 ; Conference date: 09-11-2014 Through 11-11-2014",

year = "2015",

doi = "10.1007/978-3-319-15615-6_15",

language = "English",

volume = "8943",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer-Verlag",

pages = "197--212",

editor = "T Supnithi and T Yamaguchi and J Pan and V Wuwongse and M Buranarach",

booktitle = "Joint International Semantic Technology Conference",

}

TY - GEN

T1 - A multi-strategy learning approach to competitor identification

AU - Ruan, Tong

AU - Lin, Yeli

AU - Wang, Haofen

AU - Pan, Jeff Z.

N1 - This work is funded by the National Key Technology R&D Program through project No. 2013BAH11F03

PY - 2015

Y1 - 2015

AB - Competitor identification tries to find competitors of some entity in a given field, which is the key to the success of market intelligence. Manually collecting competitors is labor-intensive and time consuming. So automatic approaches are proposed for this purpose. However, these approaches suffer from the following two main challenges. Competitor information might not only be contained in semi-structured sources like lists or tables, but also be mentioned in free texts. The diversity of its sources make competitor identification quite difficult. Also, these competitors might not always occur in form of their full names. The occurrences of name variants further increase the diversity, and make the task more challenging. In this paper, we propose a novel unsupervised approach to identify competitors from prospectuses based on a multi-strategy learning algorithm. More precisely, we first extract competitors from lists using some predefined heuristic rules. By leveraging redundancies among competitor information in lists, tables, and texts, these competitors are fed as seeds to distantly supervise the learning process to find table columns and text patterns containing competitors. The whole process is iteratively performed. In each iteration, the newly discovered competitors of high confidence from various sources are treated as new seeds for bootstrapping. The experimental results show the effectiveness of our approach without human intentions and external knowledge bases. Moreover, the approach significantly outperforms traditional named entity recognition approaches.

KW - Competitor mining

KW - Distant supervision

KW - Unsupervised learning

KW - Wrapper induction

UR - http://www.scopus.com/inward/record.url?scp=84928902021&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-15615-6_15

DO - 10.1007/978-3-319-15615-6_15

M3 - Published conference contribution

AN - SCOPUS:84928902021

VL - 8943

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 197

EP - 212

BT - Joint International Semantic Technology Conference

A2 - Supnithi, T

A2 - Yamaguchi, T

A2 - Pan, J

A2 - Wuwongse, V

A2 - Buranarach, M

PB - Springer-Verlag

T2 - 4th Joint International Conference on Semantic Technology, JIST 2014

Y2 - 9 November 2014 through 11 November 2014

ER -
