LogEvent2vec: LogEvent-to-vector based anomaly detection for large-scale logs in internet of things

Jin Wang, Yangning Tang, Shiming He*, Changqing Zhao, Pradip Kumar Sharma, Osama Alfarraj, Amr Tolba

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

65 Citations (Scopus)
8 Downloads (Pure)

Abstract

Log anomaly detection is an efficient method for managing modern large-scale Internet of Things (IoT) systems. A growing number of works apply natural language processing (NLP) methods, in particular word2vec, to log feature extraction. Word2vec can extract the relevance between words and vectorize the words. However, the computational cost of training word2vec is high. Moreover, anomalies in logs depend not only on an individual log message but also on the log message sequence. Therefore, the word vectors produced by word2vec cannot be used directly; they must first be transformed into log event vectors and then into log sequence vectors. To reduce the computational cost and avoid multiple transformations, in this paper we propose an offline feature extraction model, named LogEvent2vec, which takes the log event as the input of word2vec to extract the relevance between log events and vectorize log events directly. LogEvent2vec can work with any coordinate transformation method and any anomaly detection model. After obtaining the log event vectors, we transform them into log sequence vectors by bary or tf-idf, and train three kinds of supervised models (Random Forests, Naive Bayes, and Neural Networks) to detect anomalies. We have conducted extensive experiments on a real public log dataset from BlueGene/L (BGL). The experimental results demonstrate that, compared with word2vec, LogEvent2vec can reduce computational time by a factor of 30 and improve accuracy. LogEvent2vec with bary and Random Forest achieves the best F1-score, and LogEvent2vec with tf-idf and Naive Bayes needs the least computational time.
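The step from log event vectors to log sequence vectors can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "bary" denotes the barycenter (unweighted mean) of the event vectors in a sequence, and that the tf-idf variant sums event vectors weighted by term frequency times inverse document frequency. The event IDs, the 2-dimensional vectors, the sequences, and all function names are hypothetical; in practice the event vectors would come from a word2vec-style model trained on sequences of log events.

```python
import math
from collections import Counter

# Hypothetical 2-d event vectors (assumed to be already trained).
event_vectors = {
    "E1": [1.0, 0.0],
    "E2": [0.0, 1.0],
    "E3": [1.0, 1.0],
}

# Hypothetical log sequences, each a list of log event IDs.
sequences = [
    ["E1", "E2", "E1"],
    ["E2", "E3"],
    ["E1", "E1", "E1", "E3"],
]


def bary_vector(seq, vectors):
    """Barycenter (assumed meaning of 'bary'): mean of the event vectors."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for event in seq:
        for i, x in enumerate(vectors[event]):
            total[i] += x
    return [t / len(seq) for t in total]


def idf_weights(all_sequences):
    """idf(e) = log(N / number of sequences that contain event e)."""
    n = len(all_sequences)
    doc_freq = Counter(e for seq in all_sequences for e in set(seq))
    return {e: math.log(n / df) for e, df in doc_freq.items()}


def tfidf_vector(seq, vectors, idf):
    """Sum of event vectors, each weighted by tf(e) * idf(e)."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for event, count in Counter(seq).items():
        weight = (count / len(seq)) * idf[event]
        for i, x in enumerate(vectors[event]):
            total[i] += weight * x
    return total


idf = idf_weights(sequences)
bary_features = [bary_vector(s, event_vectors) for s in sequences]
tfidf_features = [tfidf_vector(s, event_vectors, idf) for s in sequences]
```

Either feature list could then be passed, together with per-sequence anomaly labels, to a supervised classifier such as a random forest or naive Bayes model.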

Original language: English
Article number: 2451
Number of pages: 19
Journal: Sensors (Switzerland)
Volume: 20
Issue number: 9
Early online date: 26 Apr 2020
DOIs
Publication status: Published - May 2020

Bibliographical note

Funding: This work was funded by the National Natural Science Foundation of China (No. 61802030), the Research Foundation of Education Bureau of Hunan Province, China (No. 19B005), the International Cooperative Project for “Double First-Class”, CSUST (No. 2018IC24), the open research fund of Key Lab of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education (No. JZNY201905), and the Open Research Fund of the Hunan Provincial Key Laboratory of Network Investigational Technology (No. 2018WLZC003). This work was also funded by the Researchers Supporting Project No. (RSP-2019/102), King Saud University, Riyadh, Saudi Arabia.

Acknowledgments: We thank Researchers Supporting Project No. (RSP-2019/102), King Saud University, Riyadh, Saudi Arabia, for funding this research. We thank Francesco Cauteruccio for proofreading this paper.

Keywords

  • Device management
  • IoT
  • Log anomaly detection
  • Log event
  • Log template
  • Word2vec
