{"id":3621,"date":"2022-08-30T13:56:08","date_gmt":"2022-08-30T11:56:08","guid":{"rendered":"https:\/\/blog.outscale.com\/utsep-a-log-parsing-algorithm-for-cloud-infrastructures\/"},"modified":"2023-01-04T18:23:46","modified_gmt":"2023-01-04T16:23:46","slug":"utsep-a-log-parsing-algorithm-for-cloud-infrastructures","status":"publish","type":"post","link":"https:\/\/blog.outscale.com\/en\/utsep-a-log-parsing-algorithm-for-cloud-infrastructures\/","title":{"rendered":"USTEP: A Log Parsing Algorithm for Cloud Infrastructures"},"content":{"rendered":"<p><strong>In this blog post, we will talk about automatic log parsing and its relevance in anomaly detection. This project is the result of a fruitful collaboration between the LISITE-ISEP research laboratory<\/strong>\u00a0<strong>and 3DS OUTSCALE. Recently, I had the chance to present my work at\u00a0the last edition of the <\/strong><a href=\"https:\/\/icdm2021.auckland.ac.nz\/\" target=\"_blank\" rel=\"noopener\"><em><strong>IEEE International Conference on Data Mining.<\/strong><\/em><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>As we have <a href=\"https:\/\/blog.outscale.com\/en\/automated-anomaly-detection-in-cloud-infrastructure\/\" target=\"_blank\" rel=\"noopener\">previously<\/a> seen, logs are generated by computer systems to record the history of events. Thus, event information, such as the executed tasks, the system involved, the severity level, or a part of the system state, is timestamped and archived. For example, the following log line was generated by the serviceManager regarding a new process:<\/p>\n<pre style=\"font-size: 0.8em; line-height: 1em;\"><span style=\"font-weight: 300;\">2020-03-19T15:38:55,977 - serviceManager - INFO - New process started: process x92 started on port 42<\/span><\/pre>\n<p>Traditionally, logs were used to detect computer failures or to fix development bugs. 
In recent years, due to the growth of <em>Big Data<\/em>, logs have shown potential for application latency monitoring, security audit reinforcement, or customer journey understanding for <em>Business Intelligence<\/em>.<\/p>\n<p>In our context, <em>Big Data<\/em> enables efficient <em>log<\/em> analysis because of a significant increase in the logs produced and stored in our infrastructure (75% growth in 2018, 50% in 2019). This is caused both by the natural growth of the infrastructure and by the interest sparked by the results: end users encourage higher log production once they grasp how useful the analysis is.<\/p>\n<p>To be ready for real-time analysis, logs need to be structured so that comparable elements can be identified. In the example above, we observe some easily recognizable and well-structured fields such as the date (<span style=\"font-weight: 300; font-size: 0.8em;\"><code>2020-03-19T15:38:55,977<\/code><\/span>), the application (<span style=\"font-weight: 300; font-size: 0.8em;\"><code>serviceManager<\/code><\/span>) and the severity (<span style=\"font-weight: 300; font-size: 0.8em;\"><code>INFO<\/code><\/span>). 
However, we also have a free-text message (<span style=\"font-weight: 300; font-size: 0.8em;\"><code>New process started: process x92 started on port 42<\/code><\/span>)\u00a0generated in the code that could change with future software versions.<\/p>\n<p>Let\u2019s see other examples of logs coming from our partners\u2019 solutions:<\/p>\n<p><span style=\"font-weight: 300;\">NetApp<sup><a href=\"#netapp\">1<\/a><\/sup> <\/span><span style=\"font-weight: 300;\">:<\/span><\/p>\n<pre style=\"font-size: 0.8em; line-height: 1em;\"><span style=\"font-weight: 300;\">monitor.globalStatus.ok: The system's global status is normal.<\/span>\r\n\r\n<span style=\"font-weight: 300;\">vifmgr.portup: A link up event was received on node node1, port e0c.<\/span><\/pre>\n<p><span style=\"font-weight: 300;\">Cisco<sup><a href=\"#cisco\">2<\/a><\/sup><\/span><span style=\"font-weight: 300;\"> :<\/span><\/p>\n<pre style=\"font-size: 0.8em; line-height: 1em;\"><span style=\"font-weight: 300;\">%LINK-3-UPDOWN: Interface Port-channel1, changed state to up<\/span>\r\n\r\n<span style=\"font-weight: 300;\">%LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan1, changed state to down<\/span>\r\n\r\n<span style=\"font-weight: 300;\">%SYS-5-CONFIG_I: Configured from console by vty2 (10.34.195.36)<\/span><\/pre>\n<p><span style=\"font-weight: 300;\">First, we observe that the message format depends on the solution. Second, messages coming from the same solution may match different templates (i.e., an expression such as <\/span><span style=\"font-weight: 300; font-size: 0.8em;\"><code>New process started: process * started on port *<\/code><\/span><span style=\"font-weight: 300;\">, derived from the example showcased earlier). Finally, variables can be numbers or words, and a word that is fixed in one template can be a variable in another.<\/span><\/p>\n<p>In the OUTSCALE infrastructure, logs are produced by the different systems of the stack: our cloud operating system TINA OS, various middleware, physical equipment, etc. 
The structure of log messages is therefore beyond our control. Furthermore, we don\u2019t want to restrict such a rich source of information with formatting constraints that would hinder log generation. We therefore need a parsing algorithm that is robust enough to handle the behaviors of different systems and their evolution.<\/p>\n<p><span style=\"font-weight: 300;\">Log parsing deals with the real-time identification of the template and variables of each incoming log, at high volume and with logs coming from multiple systems. Although this topic has already been studied, state-of-the-art solutions do not meet our requirements in terms of accuracy, robustness, latency, and scalability.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><b>USTEP: An Evolving Search Tree for Parsing Logs<\/b><\/p>\n<p><span style=\"font-weight: 300;\">To meet these accuracy and robustness requirements, we have developed USTEP, a <\/span><i><span style=\"font-weight: 300;\">log<\/span><\/i><span style=\"font-weight: 300;\"> parsing algorithm based on an evolving tree structure. Tree leaves store templates, and logs descend the tree to find the most suitable leaf. USTEP then selects the template that best represents the processed message. If none fits, a new template is created from the current log. In this process, two aspects are key: the tree descent rules and the evolution of the tree\u2019s structure.<\/span><\/p>\n<p><span style=\"font-weight: 300;\">For the descent, USTEP first exploits the assumption that logs with the same template have the same number of words. The rest of the search process is determined by rules, discovered by the system, about which word positions matter most for distinguishing templates. 
Our experimental evaluation supports the relevance of our assumptions: USTEP is more accurate and robust than state-of-the-art algorithms, with an average accuracy of 93% compared to 90% for Drain<sup><a href=\"#drain\">3<\/a><\/sup><\/span><span style=\"font-weight: 300;\">, a solution proposed by researchers at the Chinese University of Hong Kong in 2017 and currently considered a reference.<\/span><\/p>\n<p style=\"text-align: left;\"><span style=\"font-weight: 300;\"><img fetchpriority=\"high\" decoding=\"async\" class=\" wp-image-2645 alignright\" src=\"https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/02\/parser_boxplot-300x200.png\" alt=\"\" width=\"327\" height=\"219\" srcset=\"https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/02\/parser_boxplot-300x200.png 300w, https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/02\/parser_boxplot-768x512.png 768w, https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/02\/parser_boxplot-585x390.png 585w, https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/02\/parser_boxplot-263x175.png 263w, https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/02\/parser_boxplot.png 900w\" sizes=\"(max-width: 327px) 100vw, 327px\" \/><\/span><\/p>\n<p><span style=\"font-weight: 300;\">Our evaluation was conducted on 13 freely available data sets coming from different systems (Android, HDFS, OpenStack, etc.). State-of-the-art algorithms show high variability depending on the data set. This variability is a significant issue in a situation like ours, where different <\/span><i><span style=\"font-weight: 300;\">log<\/span><\/i><span style=\"font-weight: 300;\"> systems coexist and evolve outside of our control. In the figure on the right, we present the spread of accuracy of 5 algorithms across the studied data sets. The best possible scenario occurs when the box is short (i.e., little impacted by the data set\u2019s characteristics) and in a high position (best average accuracy). 
Based on the results, USTEP appears to be the most robust algorithm and the least affected by the nature of the data sets. To make our work easier to reproduce and to share it with the community, the source code of USTEP is available on <\/span><a href=\"https:\/\/github.com\/outscale-dev\/ustep-online-log-parser\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 300;\">GitHub<\/span><\/a><span style=\"font-weight: 300;\">.\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><b>A Distributed Version for Scaling<\/b><\/p>\n<p><span style=\"font-weight: 300;\">Processing time is particularly important when there is a large volume to process in a short time, as in our case. At best, USTEP and Drain need 5 to 6 hours to parse 30 minutes of logs from our cloud infrastructure, making them impossible to use at 3DS OUTSCALE. This limitation inspired us to propose a distributed version of our work.<\/span><\/p>\n<p><span style=\"font-weight: 300;\">USTEP-UP is a <\/span><i><span style=\"font-weight: 300;\">framework<\/span><\/i><span style=\"font-weight: 300;\"> that can run several USTEP instances simultaneously. USTEP-UP uses a <\/span><i><span style=\"font-weight: 300;\">load balancer<\/span><\/i><span style=\"font-weight: 300;\"> to distribute the workload between instances and a <\/span><i><span style=\"font-weight: 300;\">knowledge manager<\/span><\/i><span style=\"font-weight: 300;\"> to homogenize the trees of the instances. These two components prevent interactions between instances, making it possible to <\/span><i><span style=\"font-weight: 300;\">scale up<\/span><\/i><span style=\"font-weight: 300;\"> by adding new instances. 
When the number of instances decreases (<\/span><i><span style=\"font-weight: 300;\">scale-down<\/span><\/i><span style=\"font-weight: 300;\">), a tree-merging method is available.\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><b>Log Parsing at the Service of Anomaly Detection<\/b><\/p>\n<p><span style=\"font-weight: 300;\">Parsing is key for <\/span><i><span style=\"font-weight: 300;\">log<\/span><\/i><span style=\"font-weight: 300;\">-based applications such as search and indexing tools (<\/span><i><span style=\"font-weight: 300;\">ElasticSearch<\/span><\/i><span style=\"font-weight: 300;\">). Our research focuses on anomaly detection, where a single log (or sequence of logs) can indicate that the system is malfunctioning or that there is a software bug or a security threat.<\/span><\/p>\n<p><i><span style=\"font-weight: 300;\">DeepLog<\/span><\/i><span style=\"font-weight: 300;\"> <sup><a href=\"#deeplog\">4 <\/a><\/sup><\/span><span style=\"font-weight: 300;\">is a <\/span><i><span style=\"font-weight: 300;\">Deep Learning<\/span><\/i><span style=\"font-weight: 300;\"> algorithm based on <\/span><i><span style=\"font-weight: 300;\">Long Short-Term Memory<\/span><\/i><span style=\"font-weight: 300;\"> (LSTM) networks introduced by researchers at the University of Utah in 2017. Those neural networks are well suited to logs because they process log sequences like text. 
<\/span><i><span style=\"font-weight: 300;\">DeepLog<\/span><\/i><span style=\"font-weight: 300;\"> offers a mechanism that detects abnormal sequences in both templates and variables.\u00a0<\/span><\/p>\n<p><img decoding=\"async\" class=\" wp-image-2647 alignright\" src=\"https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/02\/deeplog_boxplot-300x150.png\" alt=\"\" width=\"340\" height=\"170\" srcset=\"https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/02\/deeplog_boxplot-300x150.png 300w, https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/02\/deeplog_boxplot-1024x512.png 1024w, https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/02\/deeplog_boxplot-768x384.png 768w, https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/02\/deeplog_boxplot-1536x768.png 1536w, https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/02\/deeplog_boxplot-2048x1024.png 2048w, https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/02\/deeplog_boxplot-1920x960.png 1920w, https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/02\/deeplog_boxplot-1170x585.png 1170w, https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/02\/deeplog_boxplot-585x293.png 585w\" sizes=\"(max-width: 340px) 100vw, 340px\" \/><\/p>\n<p><span style=\"font-weight: 300;\">We have used <\/span><i><span style=\"font-weight: 300;\">DeepLog<\/span><\/i><span style=\"font-weight: 300;\">, frequently cited in state-of-the-art research, to study the impact of parsing on anomaly detection. The figure on the right shows anomaly detection accuracy (AD accuracy) as a function of parsing accuracy (PA). We can observe a strongly negative impact on detection when parsing accuracy is under 80%.<\/span><\/p>\n<p><span style=\"font-weight: 300;\">In conclusion, parsing methods are crucial to enhancing anomaly detection. 
Methods like USTEP and USTEP-UP, which are scalable, highly accurate and robust to log system evolution, are the first step towards an anomaly detector based on<\/span><span style=\"font-weight: 300;\">\u00a0<a href=\"http:\/\/fr.outscale.com\/\" target=\"_blank\" rel=\"noopener\">3DS OUTSCALE<\/a>\u00a0logs.<\/span><\/p>\n<p><span style=\"font-weight: 300;\">For further information, we recommend reading the research article by <\/span><i><span style=\"font-weight: 300;\">Arthur Vervaet, Raja Chiky and Mar Callau-Zori:<\/span><\/i><i> <\/i><a href=\"https:\/\/ieeexplore.ieee.org\/document\/9679005\" target=\"_blank\" rel=\"noopener\"><i>USTEP: Unfixed Search Tree for Efficient Log Parsing. Proceedings of the 21st IEEE International Conference on Data Mining (ICDM&#8217;21).<\/i><\/a><\/p>\n<p>Also, watch Arthur&#8217;s <a href=\"https:\/\/outscale.tv\/recherche-scientifique-autour-de-la-structuration-automatique-de-logs-cloud-days\/\" target=\"_blank\" rel=\"noopener\">video<\/a> presentation to learn more!<\/p>\n<h5>References<\/h5>\n<ol>\n<li id=\"netapp\"><span style=\"font-weight: 300;\">NetApp log examples extracted from: https:\/\/docs.netapp.com\/ontap-9\/index.jsp?topic=%2Fcom.netapp.doc.dot-cm-cmpr-930%2Fevent__log__show.html<\/span><\/li>\n<li id=\"cisco\"><span style=\"font-weight: 300;\">Cisco log examples extracted from: https:\/\/www.cisco.com\/c\/en\/us\/td\/docs\/switches\/lan\/catalyst3750x_3560x\/software\/release\/12-2_53_se\/system\/message\/3750x\/overview.html<\/span><\/li>\n<li id=\"drain\"><span style=\"font-weight: 300;\">He, J. Zhu, Z. Zheng, and M. R. Lyu, \u201cDrain: An online log parsing approach with fixed depth tree\u201d, in 2017 IEEE International Conference on Web Services\u00a0<\/span><\/li>\n<li id=\"deeplog\"><span style=\"font-weight: 300;\"> Du, F. Li, G. Zheng, and V. 
Srikumar, \u201cDeepLog: Anomaly detection and diagnosis from system logs through deep learning\u201d, in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security<\/span><\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<p><i><span style=\"font-weight: 400;\">Arthur Vervaet has been a Big Data PhD student at 3DS OUTSCALE since early 2020.\u00a0<\/span><\/i><\/p>\n<p><i><span style=\"font-weight: 400;\">Raja Chiky is head of innovation and entrepreneurship at the Institut Sup\u00e9rieur d\u2019Electronique de Paris (ISEP).<\/span><\/i><\/p>\n<p><i><span style=\"font-weight: 300;\">Mar Callau-Zori has a PhD in computer science and is data lake officer at 3DS OUTSCALE.<\/span><\/i><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this blog post, we will talk about automatic log parsing and its relevance in anomaly&hellip;<\/p>\n","protected":false},"author":13,"featured_media":2625,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[],"tags":[],"class_list":["post-3621","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/posts\/3621","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/users\/13"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/comments?post=3621"}],"version-history":[{"count":3,"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/posts\/3621\/revisions"}],"predecessor-version":[{"id":3647,"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/posts\/3621\/revisions\/3647"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/media\/2625"}],"wp:attachment":[{"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/media?parent=3621"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/categories?post=3621"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/tags?post=3621"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}