ACROFAN

[OSS Austin 2016] Walmart Deployment Case Presentation

Published : Friday, April 29, 2016, 11:17 pm
ACROFAN=Yong-Man Kwon | yongman.kwon@acrofan.com
The OpenStack Foundation is holding ‘OpenStack Summit Austin 2016’ from April 25 to 29 at the Austin Convention Center in Austin, Texas. The event introduces the latest OpenStack release, ‘Mitaka’, along with a variety of OpenStack deployment cases.

More than 7,500 cloud computing developers and users from more than 60 countries are expected to attend. The keynote covers the link between next-generation cloud-native applications and the need to manage traditional business applications, as well as the use of OpenStack as an ‘integration engine’ for new cloud technologies. It also highlights four main markets: enterprise private cloud, public cloud providers, telecommunications and NFV, and university research and government.

This session introduced Walmart's OpenStack Neutron deployment. Although the L2 networks are separated per rack, BGP-EVPN and VXLAN tunneling on the ToR switches extend a network across multiple racks. This made it possible to run a configuration of 300 nodes and more than 30,000 VMs in a simple OpenStack environment, and to bridge bare-metal hosts natively onto a single provider network.

 
▲ This is a unique deployment designed to meet fairly complex security requirements.

A problem commonly faced at Walmart's scale is that building new infrastructure takes too long. From finding a solution through to facility and security reviews, it normally takes 6 to 12 months because of excessively complicated and wasteful processes. The existing design is the traditional core, distribution, and access model, which is nearly fifteen years old. It also has a large spanning tree domain and requires a lot of staff to manage.

The requirements cited for the new environment were: reducing human-caused problems; supporting virtualization and bare metal for automation; removing the spanning tree structure; PCI compliance for security on East-West connections; a standard control plane to improve recovery time; addressing cost and vendor lock-in; and supporting jumbo frames in the network to resolve technical issues. The reasons given for choosing a CLOS topology were that it scales out well, offers high bandwidth of 320Gb per cabinet, has fewer VLAN restrictions, and handles multipath effectively.
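One common reason jumbo frames matter in a VXLAN fabric is encapsulation overhead. The sketch below is not from the talk; it simply adds up the standard header sizes for VXLAN over IPv4 (RFC 7348, no outer 802.1Q tag) to show how much room the underlay needs. The function names and the sample MTU values are illustrative assumptions.

```python
# Header sizes in bytes for VXLAN over IPv4 without an outer VLAN tag (RFC 7348).
OUTER_ETHERNET = 14
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14


def vxlan_overhead() -> int:
    """Bytes that encapsulation adds in front of the original Ethernet frame."""
    return OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER


def required_underlay_mtu(inner_mtu: int) -> int:
    """Underlay IP MTU needed to carry a tenant packet of size inner_mtu."""
    # The outer IP packet holds: outer IP + UDP + VXLAN + inner Ethernet + inner IP packet.
    return OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET + inner_mtu


def max_inner_mtu(underlay_mtu: int) -> int:
    """Largest tenant IP MTU that fits in the given underlay IP MTU."""
    return underlay_mtu - (OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET)


if __name__ == "__main__":
    print("overhead per frame:", vxlan_overhead())
    print("underlay MTU for a 1500-byte tenant MTU:", required_underlay_mtu(1500))
    print("tenant MTU on a 9000-byte underlay (assumed value):", max_inner_mtu(9000))
```

With a standard 1500-byte underlay, tenants would be limited to a smaller MTU, which is why a jumbo-frame underlay is a typical prerequisite when tenant networks should keep a full-size MTU.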

In the high-level topology, the underlay uses OSPF and was built with a fairly high level of automation. In the overlay, MP-BGP is used as the control plane for VXLAN. The fabric has 12,288 compute nodes and a maximum of 12 OpenStack fault domains. L2 and L3 traffic are blocked at the rack level by default, a configuration chosen with security in mind. L2 traffic is blocked at the top of rack, while L3 traffic is routed out of the VRF or the fabric along the internal boundary formed by the SVI.
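As a rough sizing check on the figures above, the sketch below divides the fabric's compute count by the maximum number of fault domains. The even split across fault domains is an assumption made for illustration only; the talk summary does not say how computes are distributed, and the function name is hypothetical.

```python
FABRIC_COMPUTES = 12_288   # figure quoted in the article
MAX_FAULT_DOMAINS = 12     # figure quoted in the article


def computes_per_fault_domain(total_computes: int, fault_domains: int) -> int:
    """Computes per fault domain, ASSUMING an even split (not stated in the source)."""
    if fault_domains <= 0:
        raise ValueError("fault_domains must be positive")
    if total_computes % fault_domains != 0:
        raise ValueError("computes do not divide evenly across fault domains")
    return total_computes // fault_domains


if __name__ == "__main__":
    per_domain = computes_per_fault_domain(FABRIC_COMPUTES, MAX_FAULT_DOMAINS)
    print(f"{per_domain} computes per fault domain under an even split")
```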

 
▲ The network hierarchy on a single set of hardware consists of three classes, depending on security and other factors.

The security and PCI requirements mentioned were: deploying multiple security classes on the same physical infrastructure, much as a public cloud does, to make the infrastructure more flexible; aligning security policy with the application lifecycle process; and integrating auditing and logging tools. For network segmentation, VRFs serve as the security boundary, and control is divided into three classes by application, level of control, and so on: a class for physical and virtual firewalls, a class for IaaS-layer security groups, and a class for host- and agent-based access control.

Depending on the traffic type, the network hierarchy is divided into management, inside, and web classes of application traffic, each playing a different role under the security policy even on the same hardware. Neutron is applied to provider or tenant networks so that those networks or VLANs can be connected at the appropriate security layer. These options enable fine-grained access control similar to a physical firewall, while host- and agent-based control is used to enforce several security policies beyond access control.

Three challenges were pointed out, all of them concerning management. The first is how to manage the overall environment, including environments where IP addresses change rapidly. The second is traffic visibility, which calls for a centralized tool that covers both the application and security classes. The last is auditing policy across the overall infrastructure. Technologies anticipated for the future include FWaaS (Firewall as a Service), TaaS (Tap as a Service), IDS/IPS, and VNFs for service chaining.

 
▲ Composition of provider networking with BGP-EVPN applied

Walmart chose BGP-EVPN and VXLAN for several reasons. First, they are standards-based protocols, which leaves a choice among multiple vendors. Other points cited were a control plane that keeps network flooding low, hardware acceleration, and high scalability. Through Neutron provider and external networks, a network can span racks over the L3 CLOS fabric.
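To make the VXLAN side more concrete, the sketch below packs and parses the 8-byte VXLAN header defined in RFC 7348 and compares the 12-bit VLAN ID space with the 24-bit VNI space. It is a generic illustration, not Walmart's implementation; the function names and the sample VNI are hypothetical.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08   # "I" flag: the VNI field carries a valid value

USABLE_VLAN_IDS = 2**12 - 2   # 12-bit VLAN ID; 0 and 4095 are reserved
VNI_SPACE = 2**24             # 24-bit VXLAN Network Identifier


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for the given VNI."""
    if not 0 <= vni < VNI_SPACE:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the valid-VNI flag."""
    if len(header) < 8:
        raise ValueError("VXLAN header is 8 bytes")
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8


if __name__ == "__main__":
    header = pack_vxlan_header(5000)  # 5000 is an arbitrary sample VNI
    print(header.hex(), unpack_vni(header))
    print("usable VLAN IDs:", USABLE_VLAN_IDS, "VNI space:", VNI_SPACE)
```

The much larger identifier space is one generally cited reason VXLAN overlays face fewer segment-count limits than plain VLANs.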

Inside the compute node, an instance reaches the network interface through its tap port, a Linux bridge, and a VLAN subinterface. Linux bridge was chosen for the Neutron provider network because it is already well tested. Other advantages cited were that VXLAN traffic performs well because it is offloaded to the ToR switch, that data plane code on the compute node can be kept to a minimum, and that distributed routing is possible at the ToR, so no L3 agent is required.

The test results were positive. L2 VXLAN performance was nearly the same as native L2, with latency up only 10% because of the added network hop. L3 VXLAN performed so well that there was almost no difference from L2 VXLAN. Inter-VRF performance was also not much different from L3 VXLAN, although latency increased 20% because of the added network hop. It was also noted that the AppMix and 50K table size tests did not affect performance.

 
▲ The actual test diagram presented by Walmart Technology.


Copyright © acrofan All Right Reserved

