Enhancing music information retrieval by incorporating image-based local features

Leszek Kaliciak*, Ben Horsburgh, Dawei Song, Nirmalie Wiratunga, Jeff Pan

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents a novel approach to music genre classification. Having represented music tracks as two-dimensional images, we apply the "bag of visual words" method from visual IR to classify the songs into 19 genres. By switching to the visual domain, we can abstract away from musical concepts such as melody, timbre and rhythm. We obtained a classification accuracy of 46% (against a 5% theoretical baseline for random classification), which is comparable with existing state-of-the-art approaches. Moreover, the novel features characterize different properties of the signal than standard methods, so combining them should further improve the performance of existing techniques. The motivation behind this work was the hypothesis that 2D images of music tracks (spectrograms) perceived as similar would correspond to the same music genres. Conversely, it is possible to treat real-life images as spectrograms and use music-based features to represent these images in vector form. This points to an interesting interchangeability between visual and music information retrieval.
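The pipeline described in the abstract can be sketched as follows: render each track as a spectrogram image, cut the image into local patches, quantise the patches against a k-means "visual vocabulary", and classify the resulting bag-of-visual-words histograms. The sketch below is illustrative only and assumes the librosa and scikit-learn libraries; the patch size, vocabulary size, classifier, and file paths are hypothetical choices, not the parameters reported in the paper.

# Illustrative sketch (not the authors' implementation): spectrogram +
# bag-of-visual-words genre classification. Assumes librosa and scikit-learn.
import numpy as np
import librosa
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def spectrogram_image(path):
    # Load the track and return its log-magnitude spectrogram as a 2D "image".
    y, sr = librosa.load(path, sr=22050, mono=True)
    spec = np.abs(librosa.stft(y, n_fft=1024, hop_length=512))
    return librosa.amplitude_to_db(spec, ref=np.max)

def local_patches(spec, patch=16, stride=16):
    # Cut the spectrogram into square patches; each flattened patch is a local descriptor.
    return np.array([
        spec[i:i + patch, j:j + patch].ravel()
        for i in range(0, spec.shape[0] - patch + 1, stride)
        for j in range(0, spec.shape[1] - patch + 1, stride)
    ])

def bovw_histogram(patches, vocabulary):
    # Assign every patch to its nearest visual word and return a normalised histogram.
    words = vocabulary.predict(patches)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Hypothetical file lists and genre labels; in the paper the tracks span 19 genres.
train_paths, train_genres = ["track_a.wav", "track_b.wav"], ["jazz", "metal"]
test_paths = ["track_c.wav"]

# 1. Learn the visual vocabulary by clustering local patches from the training tracks.
train_patch_sets = [local_patches(spectrogram_image(p)) for p in train_paths]
vocabulary = KMeans(n_clusters=200, n_init=10, random_state=0).fit(np.vstack(train_patch_sets))

# 2. Represent every track as a bag-of-visual-words histogram and train a classifier.
X_train = np.array([bovw_histogram(p, vocabulary) for p in train_patch_sets])
classifier = SVC(kernel="linear").fit(X_train, train_genres)

# 3. Predict genres for unseen tracks.
X_test = np.array([bovw_histogram(local_patches(spectrogram_image(p)), vocabulary) for p in test_paths])
print(classifier.predict(X_test))

On a real collection the vocabulary and classifier would of course be trained on many tracks per genre; the 46% accuracy reported in the abstract refers to the authors' own features and setup, not to this sketch.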

Original language: English
Title of host publication: Information Retrieval Technology - 8th Asia Information Retrieval Societies Conference, AIRS 2012, Proceedings
Pages: 226-237
Number of pages: 12
ISBN (Print): 9783642353406
DOI: https://doi.org/10.1007/978-3-642-35341-3_19
Publication status: Published - 2012
Event: 8th Asia Information Retrieval Societies Conference, AIRS 2012 - Tianjin, China
Duration: 17 Dec 2012 - 19 Dec 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7675 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th Asia Information Retrieval Societies Conference, AIRS 2012
Country: China
City: Tianjin
Period: 17/12/12 - 19/12/12

Keywords

  • Co-occurrence matrix
  • Colour moments
  • Fourier transform
  • K-means algorithm
  • Local features

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Kaliciak, L., Horsburgh, B., Song, D., Wiratunga, N., & Pan, J. (2012). Enhancing music information retrieval by incorporating image-based local features. In Information Retrieval Technology - 8th Asia Information Retrieval Societies Conference, AIRS 2012, Proceedings (pp. 226-237). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7675 LNCS). https://doi.org/10.1007/978-3-642-35341-3_19
