TY - JOUR
T1 - A topological data analysis based classification method for multiple measurements
AU - Riihimäki, Henri
AU - Chachólski, Wojciech
AU - Theorell, Jakob
AU - Hillert, Jan
AU - Ramanujam, Ryan
N1 - HR was partly supported by a collaboration agreement between the University of Aberdeen and EPFL. WC was partially supported by VR 2014-04770 and Wallenberg AI, Autonomous System and Software Program (WASP) funded by Knut and Alice Wallenberg Foundation, Göran Gustafsson Stiftelse. JT is fully funded by the Wenner-Gren Foundation. JH is partially supported by VR K825930053. RR is partially supported by MultipleMS. The collaboration agreement between EPFL and University of Aberdeen played a role in the design of the neuron spiking analysis and in providing the data required, i.e. the neuronal network and the spiking activity. Open access funding provided by Karolinska Institute.
PY - 2020/7/29
Y1 - 2020/7/29
N2 - Background: Machine learning models for repeated measurements are limited. Using topological data analysis (TDA), we present a classifier for repeated measurements which samples from the data space and builds a network graph based on the data topology. A machine learning model with cross-validation is then applied for classification. When tested on three case studies, accuracy exceeds an alternative support vector machine (SVM) voting model in most situations tested, with additional benefits such as reporting data subsets with high purity along with feature values. Results: For 100 examples of 3 different tree species, the model reached 80% classification accuracy after 30 datapoints, which was improved to 90% after increased sampling to 400 datapoints. The alternative SVM classifier achieved a maximum accuracy of 68.7%. Using data from 100 examples from each class of 6 different random point processes, the classifier achieved 96.8% accuracy, vastly outperforming the SVM. Using two outcomes in neuron spiking data, the TDA classifier was similarly accurate to the SVM in one case (both converged to 97.8% accuracy), but was outperformed in the other (relative accuracies 79.8% and 92.2%, respectively). Conclusions: This algorithm and software can be beneficial for repeated measurement data common in biological sciences, as both an accurate classifier and a feature selection tool.
AB - Background: Machine learning models for repeated measurements are limited. Using topological data analysis (TDA), we present a classifier for repeated measurements which samples from the data space and builds a network graph based on the data topology. A machine learning model with cross-validation is then applied for classification. When tested on three case studies, accuracy exceeds an alternative support vector machine (SVM) voting model in most situations tested, with additional benefits such as reporting data subsets with high purity along with feature values. Results: For 100 examples of 3 different tree species, the model reached 80% classification accuracy after 30 datapoints, which was improved to 90% after increased sampling to 400 datapoints. The alternative SVM classifier achieved a maximum accuracy of 68.7%. Using data from 100 examples from each class of 6 different random point processes, the classifier achieved 96.8% accuracy, vastly outperforming the SVM. Using two outcomes in neuron spiking data, the TDA classifier was similarly accurate to the SVM in one case (both converged to 97.8% accuracy), but was outperformed in the other (relative accuracies 79.8% and 92.2%, respectively). Conclusions: This algorithm and software can be beneficial for repeated measurement data common in biological sciences, as both an accurate classifier and a feature selection tool.
KW - Topological data analysis
KW - machine learning
KW - multiple measurement analysis
KW - Trees/anatomy & histology
KW - Humans
KW - Rats
KW - Support Vector Machine
KW - Machine Learning
KW - Algorithms
KW - Animals
KW - Lasers
KW - Computer Simulation
KW - Data Analysis
UR - http://www.scopus.com/inward/record.url?scp=85088852643&partnerID=8YFLogxK
U2 - 10.1186/s12859-020-03659-3
DO - 10.1186/s12859-020-03659-3
M3 - Article
C2 - 32727348
VL - 21
JO - BioMed Central Bioinformatics
JF - BioMed Central Bioinformatics
SN - 1471-2105
M1 - 336
ER -