Automated Anomaly Detection in Cloud Infrastructure

    Cloud Computing | Highlights | AI | data | Infrastructure | Data Lake - Posted on 10/12/2020 by Arthur Vervaet (3DS OUTSCALE)

    In line with our commitment to supporting the education and development of young talent, we regularly carry out initiatives in schools. Visits by 3DS OUTSCALE employees to schools are a way of presenting career opportunities in the digital industry. Our awareness program aimed at young women informs them about career opportunities in the digital sector and thus promotes diversity in employment. In addition, our collaborative projects aim to offer students a high-quality practical framework for conducting research.

    This is the case for the collaborative project between 3DS OUTSCALE and the ISEP engineering school, led by Arthur Vervaet, who presents his research topic in this article along with the benefits it brings to both the industry and the higher education institution.

     


    Historically, a logbook refers to a set of registers used by ship crews to chronologically record a variety of events, such as a change of course or the loading of goods.

    Inspectors then used these registers to trace operations in search of potential fraud or errors.

     

    Ubiquitous Traceability in IT

    Event logging also exists in the IT world: a timestamped record is produced each time a significant action occurs. The resulting records, called log files, are a valuable source of information for system analysis and monitoring. Analyzing these files makes it possible to retrace the successive stages of a process in search of an anomaly that may be at the root of a system failure, an application crash or any other event affecting quality of service, and thus to improve performance and customer experience.
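    As a minimal illustration of retracing a process from log files, the Python sketch below parses timestamped lines and returns the events of one source in chronological order. The line format (`<timestamp> <source> <level> <message>`), the `parse_line` and `retrace` helpers and the sample lines are all hypothetical; real Cloud logs come in many different formats.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class LogEvent(NamedTuple):
    timestamp: datetime
    source: str
    level: str
    message: str


def parse_line(line: str) -> LogEvent:
    # Hypothetical format: "<ISO-8601 timestamp> <source> <level> <message>"
    ts, source, level, message = line.split(" ", 3)
    return LogEvent(datetime.fromisoformat(ts), source, level, message)


def retrace(lines: Iterable[str], source: str) -> List[LogEvent]:
    """Return the events emitted by one source, sorted by timestamp."""
    events = (parse_line(line) for line in lines)
    return sorted((e for e in events if e.source == source), key=lambda e: e.timestamp)


# Hypothetical sample: lines from several sources, received out of order.
LINES = [
    "2020-12-10T08:15:05 vm-42 ERROR Disk attach failed",
    "2020-12-10T08:15:01 vm-42 INFO Instance start requested",
    "2020-12-10T08:15:03 vm-7 INFO Snapshot created",
    "2020-12-10T08:15:02 vm-42 INFO Volume attach requested",
]

if __name__ == "__main__":
    for event in retrace(LINES, "vm-42"):
        print(event.timestamp.time(), event.level, event.message)
```

    Sorting by timestamp reconstructs the sequence of steps that led to the error, which is exactly what an operator does by hand when investigating a single incident.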

     

    Processing Hundreds of Thousands of Events per Second: Mission (Almost) Impossible

    In the context of Cloud computing, processing log files can be challenging. First, because of the sheer volume: hundreds of thousands of log lines are generated every second. Second, because of the heterogeneity of messages: each source has its own log structure, and the types of messages it generates are likely to change over time.
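    One common pre-processing step for heterogeneous messages is to mask their variable fields (addresses, identifiers, numbers) so that lines emitted by the same piece of code collapse into a single template. The sketch below is a minimal example under stated assumptions: the regular expressions, placeholder names and sample messages are hypothetical, and a real deployment would need rules adapted to each log source.

```python
import re
from collections import Counter

# Hypothetical masking rules, applied in order: identifiers are masked before
# bare numbers so that "vm-91" becomes "<ID>" rather than "vm-<NUM>".
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b(?:vm|vol)-[0-9a-f]+\b"), "<ID>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(message: str) -> str:
    """Replace the variable fields of a log message with placeholders."""
    for pattern, placeholder in MASKS:
        message = pattern.sub(placeholder, message)
    return message


# Hypothetical sample messages from two different sources.
MESSAGES = [
    "Attached vol-3f2a to vm-91 in 120 ms",
    "Attached vol-77c0 to vm-12 in 95 ms",
    "Connection from 10.0.3.17 refused",
]

templates = Counter(to_template(m) for m in MESSAGES)
print(templates.most_common())
```

    Counting templates rather than raw lines turns a stream of free-form text into a much smaller vocabulary of event types, which is also a convenient input for sequence models.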

     

    A Vast Technical and Technological Challenge

    Because these constraints make purely manual inspection impossible, administrators search the logs for predefined patterns corresponding to known abnormal behaviors. However, this approach requires prior knowledge of those errors, which by definition is not available for new anomalies.
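    The known-pattern approach can be sketched as a set of regular expressions, each tied to an already-identified failure. The signature names, expressions and sample lines below are hypothetical; the last sample line illustrates the limitation of this approach, since an anomaly without a predefined signature goes unnoticed.

```python
import re
from typing import List

# Hypothetical signatures of known abnormal behaviors; in practice these would
# be written by administrators from past incidents.
KNOWN_PATTERNS = {
    "disk_failure": re.compile(r"I/O error on device \S+"),
    "oom_kill": re.compile(r"Out of memory: Killed process \d+"),
}


def match_known(line: str) -> List[str]:
    """Return the names of all known signatures found in a log line."""
    return [name for name, pattern in KNOWN_PATTERNS.items() if pattern.search(line)]


# Hypothetical sample lines.
for line in [
    "kernel: I/O error on device sdb, sector 1024",
    "kernel: Out of memory: Killed process 4242 (qemu)",
    "hypervisor: unexpected state transition for vm-7",  # new anomaly: no match
]:
    print(match_known(line), line)
```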

    Hence the importance of developing autonomous systems capable of detecting anomalies accurately, in near-real time (on the order of milliseconds), while maintaining their performance over time. This last criterion is particularly crucial to guarantee the autonomy of the solution and its long-term suitability. Such systems must also identify the context and type of each anomaly with high accuracy so that the appropriate team can be notified and take action.

    As part of my dissertation, I am working on log-based anomaly detection in Cloud infrastructure. My aim is to contribute to the design of such an autonomous system, in particular by ensuring that it can adapt to the high volume and variability of 3DS OUTSCALE's logs.

     

    Deep Learning and Neural Networks at the Service of Anomaly Detection

    To achieve this, I am taking a close look at deep learning. Convolutional neural networks have paved the way for facial recognition and the development of autonomous cars, while LSTM (long short-term memory) neural networks have been instrumental in the rise of spell checkers that take into account the context of a word or sentence.

    Deep learning applied to anomaly detection has produced its first results with stacked LSTM networks. However, this work is relatively recent and leaves room for improvement: in my opinion, log pre-processing is still underexploited, as is the analysis of an anomaly's context.
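    For illustration, several LSTM-based detectors (DeepLog, for example) treat logs as sequences of event templates: the network predicts the next event from a window of preceding ones, and an event is flagged if it is not among the top-k predictions. To keep the sketch dependency-free, the Python code below replaces the LSTM with a simple frequency table of observed successors; the window size, `TOP_K` and the event names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

WINDOW = 2  # number of preceding events used as context (hypothetical value)
TOP_K = 1   # number of predicted next events considered normal (hypothetical value)

Model = Dict[Tuple[str, ...], Counter]


def train(sequences: Sequence[Sequence[str]]) -> Model:
    """Count which event follows each window of WINDOW events in normal logs.

    This frequency table stands in for the LSTM's next-event prediction.
    """
    model: Model = defaultdict(Counter)
    for seq in sequences:
        for i in range(WINDOW, len(seq)):
            model[tuple(seq[i - WINDOW:i])][seq[i]] += 1
    return model


def detect(model: Model, seq: Sequence[str]) -> List[int]:
    """Return the positions of events that are not among the TOP_K predictions."""
    anomalies = []
    for i in range(WINDOW, len(seq)):
        counts = model.get(tuple(seq[i - WINDOW:i]), Counter())
        candidates = [event for event, _ in counts.most_common(TOP_K)]
        if seq[i] not in candidates:
            anomalies.append(i)
    return anomalies


# Hypothetical event-template sequences observed during normal operation.
NORMAL = [
    ["start", "attach", "boot", "ready"],
    ["start", "attach", "boot", "ready"],
    ["start", "attach", "boot", "ready", "stop"],
]
model = train(NORMAL)
print(detect(model, ["start", "attach", "boot", "ready"]))
print(detect(model, ["start", "attach", "ready", "boot"]))
```

    A stacked LSTM plays the role of `train` here, but it learns a probability distribution over next events and can therefore generalize to contexts that never appeared verbatim in the training logs, whereas the frequency table simply flags them.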

     

    A Winning Collaboration for the Industry and the Scientific World

    The practical framework provided by 3DS OUTSCALE is ideal for conducting research. Access to large volumes of data and substantial computing power is key in deep learning research, as is regular contact with domain experts whose knowledge can be put to good use. The work will be carried out as part of a collaboration between 3DS OUTSCALE and the LISITE-ISEP research laboratory, which had already made anomaly detection one of its main research topics and will be able to leverage the partnership to develop its research activity.

     

    Author: Arthur Vervaet (3DS OUTSCALE)

    Arthur Vervaet has been a Big Data doctoral student at 3DS OUTSCALE since the beginning of 2020. In collaboration with the LISITE-ISEP research laboratory, he conducts research on the automated detection of anomalies in massive data flows. His position allows him to act as a bridge between the worlds of research and business.
