NSX Design Sessions – Lessons Learned – Two Separate Design Groups – Functional before Physical

This may not be for everyone, but if you’re facilitating NSX design sessions and haven’t solved these situations by other means, it’s been very evident to me that holding separate design sessions can be more effective than a “let’s get everyone together” strategy.

I do a fairly large number of NSX designs a year for a pretty wide client base, and regardless of industry, no two organizations are alike. The vast majority of clients have similar infrastructure, but oftentimes very different functional and operational requirements, in both their current and future design goals.

After experiencing this for a number of years and at times, allowing the client to drive who the stakeholders are and who would be contributing, I came to the realization that there really needs to be two key design stakeholders and two separate design groups. Each of the design sessions needs to be performed in order to facilitate the actual business outcomes and to ensure that design features and functional requirements are met.

The infrastructure teams of compute, networking and storage need physical design meetings for connectivity, capacity and resiliency planning, while the application stakeholders need functional use-case design sessions to determine NSX feature use, options and functional requirements.

Performing functional before physical design is probably the single most important factor in determining the success of your client design and how long the design process will take.

It’s extremely important to have the client spell out any current and future functional use-cases. Each use-case should be documented, with dependencies and requirements, including the outcomes desired. Going through each scenario in detail with each dependency will likely add requirements that were unknown or undiscovered.

As to who should be in each group, the answer is simple, but it runs against many organizations’ natural instinct to include everyone in both. Requirements gathering needs to be done without interjection and competition between groups. The vast majority of enterprises have histories of clashes and competing agendas between infrastructure (compute, network, storage) and application owners, developers, and even business units in larger organizations. To avoid these unproductive scenarios, we remove the infrastructure “power players” from the Functional Design Group, and the inverse from the Physical Design Group.

The Functional Design Group should include developer system architects, at least one hands-on technical developer, the application owner or owners, a security architect and security owner, and any business sponsors who are responsible for the intended business outcomes. To support the Functional Design Group, infrastructure should provide a technical engineer from compute, networking and storage, each with senior-level knowledge of current operations and architecture for their respective domain, to answer technical questions on current capabilities and operations only. The business sponsors are the key stakeholders for the Functional Design Group.

The Physical Design Group attendees are the infrastructure architects; a technical engineer from compute, networking and storage, each with senior-level knowledge of current operations and architecture for their respective domain; and the stakeholders for overall infrastructure. The key stakeholders for infrastructure are preferably not the managers from compute, network and storage, but rather an overarching director, a technical director or even a vice president of infrastructure, if that role exists. In support of the infrastructure team, the application team should provide a senior DevOps member, a security architect and a technical representative from the application team, who are to provide answers on current and future design operations only. Again, we keep application “power players” out of this group to avoid any historically combative scenarios.

Stakeholders need only attend the first and last of their respective design sessions and workshops to ensure that they’re getting what they asked for in the business outcomes they owe back to the business. Exposing stakeholders to “how the sausage is made,” or to any design challenges, can affect their confidence in the overall initiative and be counterproductive.

Again, these are recommendations only, and different engagements and clients will at times call for adjustments to group membership. The idea is to keep the infrastructure teams from telling the application teams “you can’t do that,” and likewise in the other direction.

Lesson 1: Keep each design group’s membership weighted so they can express their desires and concerns without conflict. The majority of infrastructure and app teams enjoy defeating each other too much and already have a taste for blood. Be the peacekeeper, the broker, the communicator for the groups and ensure any supporting staff from other groups understand their roles in only providing answers, without adding push-back.

Lesson 2: Don’t try to hold a physical design session first. Even though most infrastructure teams are anxious to get to their connectivity, capacity and redundancy requirements and options, explain the need for an NSX functional use-case design first, and how those NSX functional use-cases will supply the requirements for the physical design. If you start with physical design, you will very likely end up redesigning (and rebuilding) it after the functional use-cases are determined.

It’s not easy to tell infrastructure teams that they can’t start with connectivity and physical design, but everyone will be better off in the end, once you know what the functional use-case requirements are.

Lastly, always listen to exactly what the client is telling you, don’t make assumptions, and ask questions. I had this beaten into my head by Paul Mancuso at VMware a few years back in a mock NSX design defense, and I’ve never forgotten it since. There is probably no more sage advice regarding requirements gathering than that one point alone.

Special Guest Appearance on the official VMware Community Podcast

That’s right! I’ll be appearing on this Wednesday’s episode of the VMware Community Podcast, March 6th at 12:00 PST, to talk about NSX-T 2.4, some of the recent gotchas and trends that I’m seeing in the field, and all things NSX.

You can watch the live video stream on the VMTN community page via Facebook, so make sure to join us there! Until then, #runNSX!

VMware Community Podcast Livestream: https://www.facebook.com/VMTNcommunity/


VMware NSX-T and Kubernetes Overview

As a Networking and Security Technical Account Specialist for VMware, I get a lot of questions regarding NSX and container integration with Kubernetes. Many network and security professionals are not aware of the underlying Kubernetes container architecture, services and communications paths, so before I start into how NSX works with containers, let’s examine why container development simplifies inter-process and application programming and how containers communicate via Kubernetes services.

The development of containers is driven greatly by the programmatic advantages they offer over server-based application development. A Kubernetes Pod is a group of one or more containers. Containers in a pod share an IP address and port space, and can communicate over localhost within the pod. Containers can also communicate with each other using standard inter-process communication mechanisms like System V semaphores or POSIX shared memory. These capabilities give developers much tighter and quicker development methods, and a large amount of abstraction from what they have to contend with in server-based application development.
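As a sketch of that shared networking model, here’s a minimal, hypothetical two-container Pod (all names and images are illustrative, not from any real deployment): because both containers share one network namespace, the sidecar can reach the web server simply at localhost.

```yaml
# Hypothetical Pod: two containers sharing one IP address and port space.
apiVersion: v1
kind: Pod
metadata:
  name: shared-net-demo
spec:
  containers:
  - name: web
    image: nginx:alpine
    ports:
    - containerPort: 80
  - name: sidecar
    image: busybox
    # Polls the web container over the pod-local loopback interface;
    # no Service or cluster networking is involved for this traffic.
    command: ["sh", "-c", "while true; do wget -qO- http://localhost:80 >/dev/null; sleep 5; done"]
```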

Containers in a pod also share data volumes. Colocation (co-scheduling), shared fate, coordinated replication, resource sharing and dependency management are managed automatically in a pod.

One of the key internal Kubernetes services for NSX-T integration is the kube-proxy service. The kube-proxy service watches the Kubernetes master for the addition and removal of Service and Endpoints objects. For each Service, it creates iptables rules that capture traffic destined for the Service and direct it to the Service’s back-end set. For each Endpoints object, it creates iptables rules which select a back-end Pod.
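To make that concrete, here’s a minimal ClusterIP Service (names and ports are hypothetical). In iptables mode, kube-proxy on each node programs chains along the lines sketched in the comments; the exact rule names vary by version, so treat the comment as a rough shape rather than literal output.

```yaml
# Minimal ClusterIP Service. kube-proxy watches this object and its
# Endpoints, then programs iptables rules on every node, roughly:
#   KUBE-SERVICES  -> matches traffic to the Service's clusterIP:port
#   KUBE-SVC-...   -> probabilistically picks one back-end
#   KUBE-SEP-...   -> DNATs to that back-end Pod's IP:port
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
```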

With NSX-T and the NSX Container Plugin (NCP), we leverage the NSX kube-proxy, a daemon running on each of the Kubernetes nodes (which most refer to as “minions” or “workers”). It replaces the native distributed east-west load balancer in Kubernetes (kube-proxy and iptables) with Open vSwitch (OVS) load-balancing services.

Now that we’ve covered east-west communications in Kubernetes, I’ll address ingress and egress to Kubernetes clusters.

The Kubernetes Ingress is an API object that manages external access to services in a cluster. By default, and in typical scenarios, Kubernetes Services and Pods have IPs that are only routable within the cluster network. Traffic that reaches an edge router is either dropped or forwarded elsewhere. An Ingress is a collection of rules that allow inbound connections to reach the cluster services.
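A minimal Ingress rule might look like the following (hostname, Service name and API version are illustrative; older clusters of this era used the `extensions/v1beta1` API instead): it routes HTTP requests for one host to a back-end Service inside the cluster.

```yaml
# Hypothetical Ingress: send requests for shop.example.com to the "web" Service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  rules:
  - host: shop.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web
            port:
              number: 80
```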

Kubernetes Ingress can be configured to give services externally reachable URLs, load-balance traffic, terminate SSL and offer name-based virtual hosting. The most common open-source options used in non-NSX environments are NGINX and HAProxy, which many of you may be familiar with from supporting server-based application operations. There’s also an external load balancer object, not to be confused with the Ingress object.

When creating a Kubernetes Service, you have the option to automatically create a cloud network load balancer. It provides an externally accessible IP address that forwards traffic to the correct port on the assigned minion/cluster node.
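That option is simply the Service’s `type` field. A hedged sketch (names and ports are hypothetical): setting `type: LoadBalancer` asks the cloud provider integration to provision an external load balancer and publish its IP in the Service’s status.

```yaml
# Hypothetical LoadBalancer Service: the cloud integration allocates an
# externally accessible IP and forwards its traffic to the selected Pods.
apiVersion: v1
kind: Service
metadata:
  name: web-lb
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - port: 443
    targetPort: 8443
```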

Now, let’s add NSX to the picture… When we install NSX-T in a Kubernetes environment, we replace the Kubernetes Ingress object with the NSX-T native layer-7 load balancer that performs these functions.

Since we’ve reviewed how traffic gets into a Kubernetes cluster, let’s take a look at how network security is handled.

A Kubernetes Network Policy is a security construct that defines how groups of pods are allowed to communicate among themselves and with other network endpoints. Kubernetes Network Policies are implemented by the network plugin, so you must use a networking solution which supports NetworkPolicy; simply creating the resource without a controller to implement it will have no effect. By default, Kubernetes pods are non-isolated and accept traffic from any source. Pods become isolated by having a Kubernetes Network Policy applied to them; once a pod is selected by one or more policies, it accepts only the traffic those policies allow and rejects the rest.
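As an illustrative sketch (namespace, labels and port are hypothetical), this policy isolates the `api` pods so that only `frontend` pods in the same namespace may reach them, and only on TCP 8080:

```yaml
# Hypothetical NetworkPolicy: once applied, "api" pods reject all ingress
# except TCP 8080 from pods labeled app=frontend in the same namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-only
  namespace: shop
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
```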

And finally, let’s review NSX-T networking components for Kubernetes. As you can see in the graphic below, NSX-T components are deployed to support both the Kubernetes Master management network and the Kubernetes Minion nodes. This diagram depicts the use of a non-routable, “black-holed” network on a logical switch that is not connected to any logical router.

The Kubernetes Master management network and logical switch are uplinked to the NSX-T tier-0 router in the center of the diagram. The tier-0 router is also providing NAT for the Kubernetes cluster. eBGP, the only dynamic routing protocol supported by NSX-T at this time, will be configured to peer with the top-of-rack switches or even back to the core.

NSX tier-1 routers are instantiated for each Kubernetes cluster node, and one is deployed for the ingress and load-balancing services we discussed previously.

For those who are unfamiliar with the difference between NSX-v Edges and NSX-T Edges, see the “High-Level View of NSX-T Edge Within a Transport Zone” at docs.vmware.com. If you’re an NSX engineer or work with NSX as part of a team, I would highly recommend the VMware NSX-T: Install, Configure, Manage [V2.2] course. NSX-T is a major architecture change from NSX-v, and there are too many changes in each component to even begin to list here. With that being said, in a simplistic view, NSX-T tier-0 routers serve as provider routers and NSX-T tier-1 routers serve as tenant routers. Each has different capabilities, so be sure to read up on the features and on the Service Router (SR) and Distributed Router (DR) components for a better understanding if needed.

Wrapping it up, Kubernetes with VMware NSX-T provides a much richer set of networking and security capabilities than native Kubernetes does. It simplifies operations and provides automation and orchestration via a CMP or the REST API, for K8s DevOps engineers and container developers alike. Add in that NSX-T is hybrid-cloud and multi-cloud capable, with greatly simplified networking and security, and Kubernetes users should be very excited once they see what happens when they #runNSX.