This week was mostly dedicated to deploying a two-node OpenStack testing environment. I wanted to evaluate this famous cloud platform that so many people are talking about. My first impressions were quite positive. Unfortunately, I am relatively disappointed with the network part of the IaaS platform, due to a large number of open issues.
This article relates a number of issues the author experienced with the 'Grizzly' release of OpenStack. Since this article was written, a large number of fixes and improvements have been made to the project. Therefore, please take the opinions outlined in this article with care.
If you already know a little bit about OpenStack, you may be aware that it glues together a number of projects to create an open, flexible and, at the same time, very complex Infrastructure as a Service (IaaS) platform. If you ever deploy it, you will likely encounter software such as RabbitMQ (or any other AMQP messaging server), a relational database like MySQL, network technologies like GRE and NAT, and probably hypervisors like Xen or KVM.
OpenStack is very flexible, as it allows you to run only the services you need, where you need them, so you can get the most out of your infrastructure in terms of features, performance and scalability. You can run Apache, nginx or any other WSGI-capable web server for your dashboard, or run no dashboard at all if, for example, you are allergic to GUIs like me.
From a network engineer's perspective, OpenStack is just as attractive, because the plugin-based system of the Neutron project (previously known as Quantum) provides a wide variety of networking options and features. At the time of writing, the Neutron project already hosts no fewer than 16 plugins. Among these, we can note the proprietary Cisco UCS/Nexus plugin, the native Linux Bridge plugin and the Open vSwitch plugin. The latter seems to be considered the current standard for production installations, and provides GRE tunneling and basic VLAN support.
For the sake of evaluating OpenStack, I deployed both a Controller and a Compute node on a Supermicro 1U Twin server. Each node runs Ubuntu 12.04 (Precise Pangolin) on SSD-based disks. Each node hosts two dual-port Intel I350 Gigabit network interface cards, thus providing four independent Ethernet ports to the operating system (or hypervisor). All ports are wired to a Cisco Catalyst 3750 series switch. The first two ports are part of an active/backup Ethernet bond for the management network. The lab switch's trunk ports were configured to allow all VLANs, as is common practice nowadays for cloud deployments.
The Linux distribution installations went without a glitch. I quickly set up the LVM volume groups and the logical volumes to host the nodes' root partitions. The installation of all OpenStack 'Grizzly' components from the Ubuntu Cloud Archive went just as smoothly.
The manual configuration and tuning of the OpenStack services were not so trivial, though. The official project wiki has a large collection of configuration guides, but unfortunately some of them contain deprecated information. So if you ever follow them, be careful to understand what you enter in your config files and in the client CLIs. Browsing some developers' notes may also prove quite useful, in particular to understand how the pieces fit together. I actually regret that there is no high-level documentation covering the platform's most important components, so that system administrators and even network engineers could learn in a matter of minutes what the purpose of each one is and how they work together. In the meantime, you can figure that out by yourself, possibly by making a few mistakes along the way.
During my evaluation, I encountered no major issues at all until I launched my first instance. It got IP connectivity to the VLAN I provisioned through the Open vSwitch plugin, but I was unable to SSH into it. I rapidly figured out that the guest instance was unable to auto-provision itself using the metadata server. The metadata server allows guests to download provisioning data from a REST API (e.g. public SSH keys, iptables rules and so on). The Ubuntu cloud images use a special IP address to communicate with the metadata server. Guests and the metadata server don't share native connectivity, mostly to allow for overlapping IP addresses between tenants. Instead, the compute nodes use Network Address Translation (NAT) to provide IP connectivity between the metadata server and the local instances, which is unfortunately not always supported, depending on the plugin configuration used. In my case, I chose direct L2 connectivity instead of the GRE tunneling overlay network, because I didn't want to take myself out of the process. The VLAN tenant network type also fitted my requirements pretty well.
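To make the metadata exchange a bit more concrete, here is a minimal Python sketch of the kind of request a guest issues at boot. It is not taken from cloud-init or from my lab. It assumes the EC2-style layout, where the service answers on the link-local address 169.254.169.254 and lists public keys as `index=name` lines under `public-keys/`. The function names are hypothetical and only there for illustration.

```python
from typing import Dict
from urllib.request import urlopen

# Assumption: the EC2-style metadata service on its usual link-local address.
METADATA_BASE = "http://169.254.169.254/latest/meta-data/"


def metadata_url(path: str) -> str:
    """Build the full URL for a metadata path such as 'public-keys/'."""
    return METADATA_BASE + path.lstrip("/")


def parse_key_index(listing: str) -> Dict[int, str]:
    """Parse an 'index=name' listing (one key per line) into a dict."""
    keys: Dict[int, str] = {}
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines in the listing
        index, _, name = line.partition("=")
        keys[int(index)] = name
    return keys


def fetch(path: str, timeout: float = 2.0) -> str:
    """Fetch one metadata document; only meaningful from inside a guest."""
    with urlopen(metadata_url(path), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

From inside an instance, `parse_key_index(fetch("public-keys/"))` would list the injected key names. When the NAT path between the guest and the metadata server is broken, as it was in my setup, the `fetch` call simply times out, and the instance never receives its SSH key.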
With such a configuration, the traffic over the OVS bridge was not NATed, even though the iptables rules were present. I tested various combinations of workaround rules using both the DNAT and REDIRECT targets, without any success. It looks like there is actually an Open vSwitch limitation with virtual interfaces and NAT. My last attempt was to use the native Linux Bridge plugin, but I faced another bug that prevented me from configuring Quantum for a physical network. These two issues have been widely reported by OpenStack users, and they will probably be fixed in the near future. In the meantime, I am left with no option other than removing Quantum from the evaluation environment, which is quite disappointing.
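For readers unfamiliar with the two iptables targets mentioned above, the following Python sketch builds the general shape of such rules as argument lists, without executing anything. These are not the exact rules I tried. The destination host `192.0.2.10` is a placeholder from the documentation address range, port 8775 is assumed to be where the Nova metadata API listens, and the helper names are hypothetical.

```python
from typing import List

# Assumption: guests reach the metadata service on this address, TCP port 80.
METADATA_IP = "169.254.169.254"


def _match_metadata_traffic() -> List[str]:
    """Common prefix: match guest traffic sent to the metadata address."""
    return ["iptables", "-t", "nat", "-A", "PREROUTING",
            "-d", METADATA_IP + "/32", "-p", "tcp", "--dport", "80"]


def dnat_rule(host: str, port: int = 8775) -> List[str]:
    """DNAT: rewrite the destination to a metadata server on another host."""
    return _match_metadata_traffic() + [
        "-j", "DNAT", "--to-destination", f"{host}:{port}"]


def redirect_rule(port: int = 8775) -> List[str]:
    """REDIRECT: hand the traffic to a port on the local machine instead."""
    return _match_metadata_traffic() + [
        "-j", "REDIRECT", "--to-ports", str(port)]
```

Calling `" ".join(dnat_rule("192.0.2.10"))` gives a command line that can be reviewed before being run as root. The difference between the two targets is only where the packet ends up: DNAT can point at any reachable host, while REDIRECT always delivers to the machine applying the rule.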
I also think the Quantum client CLI is hard to get to grips with, even for a network engineer with a decade of experience with multi-vendor equipment. The terminology used is not really what we are used to in the industry, and it may also be a bit intimidating for a system administrator with limited knowledge of network technologies. I would be happy to hear what people think about it, and whether they felt comfortable with it. A few months ago, I had the pleasure of evaluating VMware vSphere along with the Cisco Nexus 1000v. I was quite impressed by the features of the VSM, and by how Cisco envisioned the way the cloud should be administered. I would really love to see Cisco or any other network vendor release a similar product as a replacement for Quantum's OVS plugin. Don't get me wrong, I am not against the use of NAT, but its use here sounds like a hack to me. I would have preferred to see native connectivity (which is already technically possible with some tweaks) between the guests and the metadata server(s), even if it brings new challenges in supporting overlapping addresses. The good news is that the development of all those plugins may drive some innovation over the years. I keep dreaming about an MPLS/VPN Provider Edge running at the hypervisor or compute node level.
I will probably continue my tests in the coming days or weeks. OpenStack is still a young project, and quite an immature one, but it opens up plenty of opportunities for testing new protocols, APIs and services.