DAGGER: Instance Selection for Combining Multiple Models Learnt from Disjoint Subsets

W. H. E. Davies, Peter Edwards

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

Abstract

We introduce a novel instance selection method for combining multiple learned models. Unlike current methods, which typically combine models by voting, this technique produces a single comprehensible model. At its core, the DAGGER (Disjoint Aggregation using Example Reduction) algorithm selects training instances that provide evidence for each decision region within each local model. A single model is then learned from the union of the selected examples. We describe experiments on models learned from disjoint training sets, which show that DAGGER performs as well as weighted voting on this task and that the examples it extracts are more informative than examples selected at random. The experiments were conducted on models learned from disjoint subsets generated with a uniform random distribution. DAGGER, however, is designed for naturally distributed tasks, in which the data are not randomly distributed. We discuss how one view of the experimental results suggests that DAGGER should work well on this type of problem.
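
The pipeline described in the abstract (learn local models on disjoint subsets, select instances that support each decision region, learn one model from the union) can be illustrated with a toy sketch. The abstract does not give DAGGER's actual selection criterion, so everything below is an assumption made for illustration: the one-dimensional threshold learner `fit_stump`, the helper `select_instances`, and the rule of keeping the two extreme correctly classified examples in each region are hypothetical and are not the authors' algorithm.

```python
import random

def fit_stump(examples):
    """Fit a 1-D threshold classifier; returns (threshold, left_label, right_label)."""
    xs = sorted({x for x, _ in examples})
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])] or [xs[0]]
    best = None
    for t in candidates:
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in examples)
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    return best[1:]

def predict(stump, x):
    t, left, right = stump
    return left if x <= t else right

def select_instances(stump, examples):
    """Assumed rule: per decision region, keep the two extreme examples
    that the local model classifies correctly."""
    t = stump[0]
    selected = []
    for in_region in (lambda x: x <= t, lambda x: x > t):
        region = sorted((x, y) for x, y in examples
                        if in_region(x) and predict(stump, x) == y)
        if region:
            # dict.fromkeys removes the duplicate when the region has one example.
            selected.extend(dict.fromkeys([region[0], region[-1]]))
    return selected

# Synthetic task (hypothetical): label is 1 when x > 0.5, otherwise 0.
rng = random.Random(0)
data = [(x, 1 if x > 0.5 else 0) for x in (rng.random() for _ in range(300))]

# Three disjoint subsets drawn uniformly at random, as in the reported experiments.
rng.shuffle(data)
subsets = [data[0:100], data[100:200], data[200:300]]

# Learn one local model per subset and pool the selected instances.
selected = []
for subset in subsets:
    local_model = fit_stump(subset)
    selected.extend(select_instances(local_model, subset))

# Learn a single model from the union of the selected examples.
final_model = fit_stump(selected)
accuracy = sum(predict(final_model, x) == y for x, y in data) / len(data)
print(len(selected), "selected examples; accuracy on all data:", accuracy)
```

In this toy setting the examples nearest each local decision boundary are always among those selected, so the final model recovers the boundary a learner would find on the full data while seeing only a handful of examples. Real decision regions are multi-dimensional, and the selection rule in the chapter may differ substantially from this one.
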
Original language: English
Title of host publication: Instance Selection and Construction for Data Mining
Editors: Huan Liu, Hiroshi Motoda
Publisher: Springer
Pages: 319-336
Number of pages: 18
ISBN (Electronic): 978-1-4757-3359-4
ISBN (Print): 978-1-4419-4861-8
DOI: https://doi.org/10.1007/978-1-4757-3359-4_18
Publication status: Published - 2001

Publication series

Name: The Springer International Series in Engineering and Computer Science
Publisher: Springer
Volume: 608
ISSN (Print): 0893-3405

Keywords

  • sampling
  • data-mining
  • distributed learning

Cite this

Davies, W. H. E., & Edwards, P. (2001). DAGGER: Instance Selection for Combining Multiple Models Learnt from Disjoint Subsets. In H. Liu & H. Motoda (Eds.), Instance Selection and Construction for Data Mining (pp. 319-336). (The Springer International Series in Engineering and Computer Science; Vol. 608). Springer. https://doi.org/10.1007/978-1-4757-3359-4_18