{"id":3512,"date":"2022-07-28T10:45:19","date_gmt":"2022-07-28T08:45:19","guid":{"rendered":"https:\/\/blog.outscale.com\/?p=3512"},"modified":"2022-08-11T14:56:13","modified_gmt":"2022-08-11T12:56:13","slug":"improving-virtualized-storage-performance","status":"publish","type":"post","link":"https:\/\/blog.outscale.com\/en\/improving-virtualized-storage-performance\/","title":{"rendered":"Improving Virtualized Storage Performance at the Scale of a Datacenter"},"content":{"rendered":"<p><strong>At the end of 2021, I started a collaborative PhD, based on a partnership between 3DS OUTSCALE and\u00a0 the \u00c9cole Normale Sup\u00e9rieure de Lyon (ENS de Lyon), represented by my advisor Alain TCHANA, who was awarded a <a href=\"http:\/\/www.ens-lyon.fr\/actualite\/recherche\/alain-tchana-recoit-le-prix-de-la-francophonie-pour-jeunes-chercheurs\" target=\"_blank\" rel=\"noopener\">prize dedicated to young French-speaking researchers in 2022<\/a> and head of the Computer Science Department at the ENS de Lyon. In this article, I would like to give an introduction to\u00a0 my thesis topic on storage virtualization.<\/strong><\/p>\n<p>&nbsp;<\/p>\n<p><strong>What is Computer Storage?<\/strong><\/p>\n<p><span style=\"font-weight: 300;\">There are mainly two types of memory in computers: <\/span><i><span style=\"font-weight: 300;\">volatile<\/span><\/i><span style=\"font-weight: 300;\"> and <\/span><i><span style=\"font-weight: 300;\">persistent<\/span><\/i><span style=\"font-weight: 300;\"> memory. Volatile memory (or just <\/span><i><span style=\"font-weight: 300;\">memory<\/span><\/i><span style=\"font-weight: 300;\">) is extremely fast; thus, it stores temporary data while running a program: the program itself, the manipulated data or some metadata. 
Persistent memory (or <\/span><i><span style=\"font-weight: 300;\">storage<\/span><\/i><span style=\"font-weight: 300;\">)<\/span> <span style=\"font-weight: 300;\">is slower, but its contents survive after the computer is powered off. Files stored in persistent memory include both user files (documents, pictures, etc.) and system files, for example the operating system and all installed programs.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 300;\">In the case of this PhD thesis, we focus on the latter, persistent storage. Let\u2019s see why.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><strong>How Do We Know if a System Needs Improvement?<\/strong><\/p>\n<p><span style=\"font-weight: 300;\">Before even talking about improving a system (here, storage at a datacenter scale), we need to make sure that the system requires improvement. This can be done by running a <\/span><i><span style=\"font-weight: 300;\">benchmark<\/span><\/i><span style=\"font-weight: 300;\"> (or multiple ones), a program that puts the studied system under stress and then measures how well the system behaves under that stress. Comparing the performance of different systems (different hardware, applications, operating systems, versions of the same application, etc.) within one benchmark allows us to determine which one performs best.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 300;\">There are two kinds of benchmarks: <\/span><i><span style=\"font-weight: 300;\">micro-<\/span><\/i><span style=\"font-weight: 300;\"> and <\/span><i><span style=\"font-weight: 300;\">macro<\/span><\/i><span style=\"font-weight: 300;\">-benchmarks. 
The first category includes relatively basic programs that focus on stressing a single component of the system; for instance, <\/span><a href=\"https:\/\/fio.readthedocs.io\/en\/latest\/fio_doc.html\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 300;\">fio<\/span><\/a><span style=\"font-weight: 300;\"> is a simple program that generates stress on the system\u2019s storage. The second category, macro-benchmarks, aims at testing the overall performance of the system and therefore tends to offer a more realistic picture. These benchmarks generally consist of a real application run under a reproducible workload.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 300;\">Thus, macro-benchmarks can help make sure that the overall system works fine, and micro-benchmarks help point out the problems more precisely. One of the problems we tackle in this PhD research is the lack of realistic benchmarks for storage in datacenters. The only one to our knowledge is <\/span><a href=\"https:\/\/www.usenix.org\/system\/files\/fast21-merenstein.pdf\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 300;\">CNSBench<\/span><\/a><span style=\"font-weight: 300;\">, but it was designed for clients of a cloud who want to compare providers. However, in order to improve our systems, we need to benchmark from the provider\u2019s point of view.<\/span><\/p>\n<p><span style=\"font-weight: 300;\">Now let\u2019s dive into the core of the topic: storage performance in modern systems.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><strong>Storage is a Bottleneck in Modern Systems<\/strong><\/p>\n<p><span style=\"font-weight: 300;\">In recent years, tremendous progress has been made in computer systems performance. 
<\/span><a href=\"https:\/\/www.businessinsider.com\/infographic-how-computing-power-has-changed-over-time-2017-11?op=1&amp;r=US&amp;IR=T\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 300;\">This article <\/span><\/a><span style=\"font-weight: 300;\">provides a great insight into how computing power has evolved as of 2017. By looking at data provided by Intel about their Core processors, one can guess that the growth in computing power is not even close to stopping. For instance, in the graph below, we see that the most performant (thus expensive) processors launched by Intel in 2013 are slower than the less performant ones released recently (late 2021 \/ early 2022). Note that the data used for this graph only includes the frequency of a single core, and does not even take into account the fact that the number of cores in processors has also been increasing.<\/span><\/p>\n<p><img fetchpriority=\"high\" decoding=\"async\" class=\"alignnone wp-image-3513 aligncenter\" src=\"https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/07\/Max-Frequency-GHz-of-Intel-processors-over-the-years-300x186.png\" alt=\"\" width=\"414\" height=\"257\" srcset=\"https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/07\/Max-Frequency-GHz-of-Intel-processors-over-the-years-300x186.png 300w, https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/07\/Max-Frequency-GHz-of-Intel-processors-over-the-years-585x362.png 585w, https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/07\/Max-Frequency-GHz-of-Intel-processors-over-the-years.png 657w\" sizes=\"(max-width: 414px) 100vw, 414px\" \/><\/p>\n<p><span style=\"font-weight: 300;\">Unfortunately, modern cloud systems are not able to fully benefit from these improvements. In fact, due to the overhead of virtualization, there is a slowdown when it comes to running a benchmark in the cloud compared to the same benchmark run on bare metal (which means on the computer without virtualization). 
In the following figure, we show the results of an experiment we ran to guide our work. We ran different micro-benchmarks in 4 different environments: bare metal, a custom private cloud (VPC), a Microsoft Azure VM and an AWS VM. We express the results of the latter three in terms of slowdown compared to the bare metal experiment. The benchmarks are NPB (CPU intensive), Stream (memory intensive), Netperf (network intensive), dd and fio (both storage intensive; more precisely, throughput intensive and latency intensive, respectively). For fairness, the Microsoft Azure and AWS virtual machine specifications were chosen to be as close as possible to our server\u2019s specifications.<\/span><\/p>\n<p><img decoding=\"async\" class=\" wp-image-3515 aligncenter\" src=\"https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/07\/perfomance-slowdown-1-300x180.png\" alt=\"\" width=\"401\" height=\"241\" srcset=\"https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/07\/perfomance-slowdown-1-300x180.png 300w, https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/07\/perfomance-slowdown-1-768x461.png 768w, https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/07\/perfomance-slowdown-1-585x351.png 585w, https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/07\/perfomance-slowdown-1.png 1000w\" sizes=\"(max-width: 401px) 100vw, 401px\" \/><\/p>\n<p><span style=\"font-weight: 300;\">The interpretation of the results is quite straightforward: in public clouds, the slowdown of CPU and memory intensive benchmarks is almost negligible, while storage intensive benchmarks are between 10 and 1000 times slower.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 300;\">Therefore, as soon as an application is <\/span><i><span style=\"font-weight: 300;\">storage intensive<\/span><\/i><span style=\"font-weight: 300;\">, its overall performance is limited by storage capabilities. 
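<\/span><\/p>\n<p><span style=\"font-weight: 300;\">As a concrete illustration, storage micro-benchmarks of this kind are typically launched with commands along the following lines. This is a minimal sketch with illustrative parameters and file names, not the exact commands used in our experiment.<\/span><\/p>

```shell
# Sequential write throughput with dd: 256 MiB in 1 MiB blocks;
# conv=fdatasync flushes data to the device so the page cache
# does not inflate the measured throughput
dd if=/dev/zero of=bench-testfile bs=1M count=256 conv=fdatasync

# Random 4 KiB reads with fio for ~10 seconds (synchronous I/O engine for
# portability; add --direct=1 to bypass the page cache on filesystems
# that support O_DIRECT)
fio --name=randread --filename=bench-testfile --rw=randread --bs=4k --size=256M --ioengine=psync --runtime=10 --time_based
```

<p><span style=\"font-weight: 300;\">dd reports an aggregate throughput, while fio reports IOPS and completion latency percentiles, which is why we use them as our throughput and latency micro-benchmarks, respectively.<\/span><\/p>\n<p><span style=\"font-weight: 300;\">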
This is quite problematic when you realize that many applications you would want to run in the cloud are storage intensive: web servers (Apache, Nginx, \u2026), mail servers (Zimbra, Roundcube, \u2026), data analysis (MapReduce, Spark, Kafka, \u2026), databases (MySQL, MongoDB, RocksDB, Redis, \u2026), etc.<\/span><\/p>\n<p><span style=\"font-weight: 300;\">Now let\u2019s see how storage is handled in modern clouds.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><strong>The Storage Stack<\/strong><\/p>\n<p><img decoding=\"async\" class=\" wp-image-3517 aligncenter\" src=\"https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/07\/storage_stack_public.drawio-300x220.png\" alt=\"\" width=\"371\" height=\"272\" srcset=\"https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/07\/storage_stack_public.drawio-300x220.png 300w, https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/07\/storage_stack_public.drawio-585x430.png 585w, https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/07\/storage_stack_public.drawio.png 641w\" sizes=\"(max-width: 371px) 100vw, 371px\" \/><\/p>\n<p><span style=\"font-weight: 300;\">The storage stack is defined as the set of elements that take action one after the other when processing an I\/O request (I\/O stands for Input\/Output). In the case of a bare metal computer, the storage stack is basically composed of a program that requests the I\/O operation, the operating system, the filesystem, and the hardware device. In the cloud, the structure is much more complex, as you can see in the figure above. 
Requests issued by the <\/span><i><span style=\"font-weight: 300;\">guest OS<\/span><\/i><span style=\"font-weight: 300;\"> (the OS of the VM) are intercepted by the hypervisor (the program responsible for managing the virtual machines), then are handled by the <\/span><i><span style=\"font-weight: 300;\">host OS<\/span><\/i><span style=\"font-weight: 300;\"> (the OS of the server on which the VMs are running), and then are sent to the host filesystem to be written on the drive. However, the storage devices are located on a different server, which means that the request must go through an extra step, the network, before finally reaching the storage server (in our case, a <\/span><a href=\"https:\/\/www.netapp.com\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 300;\">NetApp<\/span><\/a><span style=\"font-weight: 300;\"> storage array) and the drive on which it will be written.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 300;\">On the other hand, when a VM uses its CPU, it actually uses a virtual CPU (vCPU) provided by the hypervisor. The hypervisor can emulate a CPU, even one that is different from the actual CPU of the host. In that case, the emulation adds extra overhead. For more efficiency, the vCPU can also be directly <\/span><a href=\"https:\/\/qemu.readthedocs.io\/en\/latest\/system\/qemu-cpu-models.html#two-ways-to-configure-cpu-models-with-qemu-kvm\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 300;\">mapped to the actual CPU<\/span><\/a><span style=\"font-weight: 300;\">, without emulation or extra steps. This is what we call <\/span><i><span style=\"font-weight: 300;\">passthrough<\/span><\/i><span style=\"font-weight: 300;\">, because it acts as if the CPU were directly used by the VM. In that case, the vCPU has almost the same speed as the actual CPU.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 300;\">However, for storage, passthrough is often not desirable. 
First, you might not want the VM to have access to the entire storage, so extra isolation is required. Second, virtual disks are often stored in files using a special format, such as QCOW2 for Qemu. This format allows lightweight snapshotting, which improves efficiency and eases storage management. But it also has drawbacks, one of which is the need to remap addresses from the guest OS to addresses in the QCOW2 file. Thus, Qemu\u2019s intervention is necessary to perform the translation. Furthermore, the storage is distant (not on the same server) and shared, so storage passthrough is in fact not even possible in this type of datacenter.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 300;\">This difference largely explains why storage virtualization is so much harder and less efficient than CPU or memory virtualization.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><strong>Then, How do We Improve Storage Performance?<\/strong><\/p>\n<p><span style=\"font-weight: 300;\">As you can guess, the high complexity of the storage stack in a cloud provider\u2019s architecture leaves room for improvement. We can choose to focus on any part of this stack, for instance, the hypervisor, or the storage server. We can also try to be more disruptive and propose changes that could be applied to the whole stack, while imagining new architectures. For instance, a new datacenter architecture is becoming popular: the <\/span><a href=\"https:\/\/www.gflesch.com\/elevity-it-blog\/hci-hyperconverged-infrastructure\" target=\"_blank\" rel=\"noopener\"><i><span style=\"font-weight: 300;\">hyper-converged<\/span><\/i><\/a><span style=\"font-weight: 300;\"> architecture, in which the storage is placed closer to the processor. 
Still, the main reason hyper-converged architectures are gaining popularity is that they are easier to maintain; there is no clear evidence yet of a positive or negative impact on the overall system&#8217;s performance.<\/span><\/p>\n<p><span style=\"font-weight: 300;\">In the specific context of my PhD, I will be working on both the current architecture and existing alternatives. The three main components of my PhD are:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 300;\" aria-level=\"1\"><span style=\"font-weight: 300;\">Create a benchmarking system to measure the performance of a cloud storage infrastructure. This should allow us to shed light on the different bottlenecks that can exist in the whole storage stack.<\/span><\/li>\n<li style=\"font-weight: 300;\" aria-level=\"1\"><span style=\"font-weight: 300;\">Focus on the problems we have detected in Qemu. We have already noticed that Qemu and the QCOW2 format (which is used to store the drives of the VMs) were not designed to be performant at the scale of a cloud infrastructure.\u00a0<\/span><\/li>\n<li style=\"font-weight: 300;\" aria-level=\"1\"><span style=\"font-weight: 300;\">Explore and compare different storage stack architectures, to see if one performs better than the others.\u00a0<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 300;\">These three parts are far from being isolated from each other: the benchmarking system will be useful for detecting issues in Qemu, and, vice versa, searching for issues in Qemu will give us ideas for improving the benchmark. The third part will require a deep understanding of existing storage architectures, expertise that the first two parts of the PhD will help build.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><strong>The Final Word<\/strong><\/p>\n<p><span style=\"font-weight: 300;\">In conclusion, we noticed through our experiments that storage is a bottleneck in many applications running in the cloud. 
The goal of this project is to find out how to optimize storage performance in a cloud environment, and to provide the tools required to carry out this analysis beforehand.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 300;\">I presented this project at the <\/span><a href=\"https:\/\/sites.google.com\/view\/eurodw22\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 300;\">EuroSys Doctoral Workshop<\/span><\/a><span style=\"font-weight: 300;\">, where I discussed it with other PhD students from all over the world and their supervisors, as well as other experts on the topic. You can find the <a href=\"https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/07\/EURODW22-Paper12-Elevator-Pitch-Slides.pdf\" target=\"_blank\" rel=\"noopener\">slides<\/a> of the presentation attached here, <\/span><span style=\"font-weight: 300;\">as well as the video presentation <a href=\"https:\/\/outscale.tv\/these-la-virtualisation-du-stockage-cloud-days\/\" target=\"_blank\" rel=\"noopener\">in French<\/a> and in English (<\/span><span style=\"font-weight: 300;\">below).<\/span><\/p>\n<div style=\"width: 840px;\" class=\"wp-video\"><!--[if lt IE 9]><script>document.createElement('video');<\/script><![endif]-->\n<video class=\"wp-video-shortcode\" id=\"video-3512-1\" width=\"840\" height=\"480\" poster=\"https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/07\/EURODW22-Presentation-Storage-Virtualization-PhD.png\" preload=\"metadata\" controls=\"controls\"><source type=\"video\/mp4\" src=\"https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/07\/EURODW22-Paper12-Elevator-Pitch-Video.m4v?_=1\" \/><a href=\"https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/07\/EURODW22-Paper12-Elevator-Pitch-Video.m4v\">https:\/\/blog.outscale.com\/wp-content\/uploads\/2022\/07\/EURODW22-Paper12-Elevator-Pitch-Video.m4v<\/a><\/video><\/div>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>At the end of 2021, I started a collaborative PhD, based on 
a partnership between 3DS&hellip;<\/p>\n","protected":false},"author":23,"featured_media":3535,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[],"tags":[],"class_list":["post-3512","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/posts\/3512","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/users\/23"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/comments?post=3512"}],"version-history":[{"count":8,"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/posts\/3512\/revisions"}],"predecessor-version":[{"id":3543,"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/posts\/3512\/revisions\/3543"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/media\/3535"}],"wp:attachment":[{"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/media?parent=3512"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/categories?post=3512"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.outscale.com\/en\/wp-json\/wp\/v2\/tags?post=3512"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}