Software defined networking with BGP and Quagga

At some point, we needed to let our apps talk to each other over an overlay network instead of the usual Docker port binding.

Arnaud Bawol

Site Reliability Engineer @ Batch

What we needed

Most of our infrastructure runs on Docker, on bare-metal clusters. At some point, we needed to let our apps talk to each other over an overlay network instead of the usual Docker port binding. We could have used service discovery and port binding, but that would have been more complicated than we needed: to run several replicas of an app on a single host, we would have had to bind a separate port for each replica, then change our load-balancing configuration every time the topology changed. Plus, setting up an overlay network is a nice preamble to a migration to a Kubernetes cluster.
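To make the difference concrete, here is a minimal sketch of the backend list a load balancer would have to track in each model. The host addresses, the base port, the application port and the overlay addresses are all hypothetical values chosen for illustration; they do not come from our actual setup.

```python
def port_binding_backends(hosts, replicas_per_host, base_port=31000):
    """Port binding: each replica on a host needs its own host port.

    base_port is an arbitrary, hypothetical starting port.
    """
    backends = []
    for host in hosts:
        for replica in range(replicas_per_host):
            backends.append(f"{host}:{base_port + replica}")
    return backends


def overlay_backends(container_ips, app_port=8080):
    """Overlay: each container has its own IP, so the port never changes."""
    return [f"{ip}:{app_port}" for ip in container_ips]


# Hypothetical hosts (documentation range) and overlay container addresses.
hosts = ["192.0.2.11", "192.0.2.12"]
containers = ["10.32.0.2", "10.32.0.3", "10.32.1.2", "10.32.1.3"]

print(port_binding_backends(hosts, replicas_per_host=2))
print(overlay_backends(containers))
```

With port binding, adding a replica means picking a free port on the host and teaching the load balancer about it. With the overlay, only a new IP address appears, and the port stays the same everywhere.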

Why did we choose BGP?

Our software choice comes from a simple observation: the Internet relies on a protocol that few people know about, BGP. Its purpose is to provide dynamic routing built on a few basic principles. A lot of software-defined networking tools rely on either OSPF or BGP. We chose BGP for its simplicity and proven robustness. A lot of open-source projects have made similar design choices.

Why did we choose Quagga?

Quagga is quite reliable and is often seen in networking stacks alongside proprietary hardware such as Cisco or Juniper. It is still in active development and has received a lot of updates over time, even though Zebra (the routing program behind Quagga) seems a bit outdated from an outside perspective.

A bit of tech

Since Kubernetes requires overlay networking, its documentation suggests a variety of tools to achieve it. A lot of those tools are either black boxes or impractical for our use case. We thought that this network abstraction would also be nice for our "not yet Kubernetes-compliant" applications, so we did a bit of testing and got a satisfying result.

A cluster, a lot of networks.

The very principle of the overlay network is to give each node in a cluster its own network, reachable from every other node. From a logical point of view, our stack looked like this:

As you can see, each Docker container needs an individual port binding to work properly. Those ports then have to be mapped in our load balancers to receive traffic. With the overlay network, we created a basic abstraction over this bare-metal entanglement:

Now each container has its own routable IP address and can use the same port even on the same host.
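The "one network per node" idea can be sketched with Python's standard ipaddress module. This is only an illustration under stated assumptions: the 10.32.0.0/16 supernet, the /24 size per node and the node names are hypothetical, and the helper names are invented for this example.

```python
import ipaddress


def allocate_node_subnets(supernet, hosts, new_prefix=24):
    """Give each host its own subnet carved out of a shared supernet."""
    subnets = ipaddress.ip_network(supernet).subnets(new_prefix=new_prefix)
    allocation = dict(zip(hosts, subnets))
    if len(allocation) < len(hosts):
        raise ValueError("supernet too small (or duplicate host names)")
    return allocation


def bridge_and_first_container(subnet):
    """Return the first two usable addresses of a node subnet.

    Assumption: the first address goes to the docker0 bridge and the
    next one to the first container started on that node.
    """
    addresses = subnet.hosts()
    return next(addresses), next(addresses)


# Hypothetical supernet and node names.
allocation = allocate_node_subnets("10.32.0.0/16", ["node-a", "node-b", "node-c"])
for host, subnet in allocation.items():
    bridge, container = bridge_and_first_container(subnet)
    print(f"{host}: subnet={subnet} bridge={bridge} first container={container}")
```

Because every node owns a distinct subnet, two containers on the same host can both listen on the same port: they are told apart by their IP address, not by a host port.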

This topology change is provided by bgpd, Quagga's BGP daemon. A sample configuration snippet could look like this:

! Ansible managed
log file /var/log/quagga/bgpd.log
!debug bgp events
!debug bgp filters
!debug bgp fsm
!debug bgp keepalives
!debug bgp updates
router bgp 65500
  bgp router-id
! # This is the ipaddress of the observed host
  timers bgp 30 90
  redistribute static
! # we want to send away our static routes
  network mask 
! # This is the docker0 network, so we need to append 
! # the "bip": "" flag to docker's daemon.json

! # Following : a description of our neighbors in the same AS
  neighbor remote-as 65500
  neighbor route-map foo out
  neighbor route-map foo in
  neighbor activate
  neighbor remote-as 65500
  neighbor route-map bar out
  neighbor route-map bar in
  neighbor activate

! # We set the same preference to each router
route-map foo permit 10
  set local-preference 222

route-map bar permit 10
  set local-preference 222
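Since the file is templated, the per-host values can be derived from a small inventory. The sketch below is not our Ansible template: it is a hypothetical Python equivalent that fills in the router ID, the docker0 network statement and one neighbor line per peer, plus the matching "bip" entry for Docker's daemon.json. The host addresses and subnets are invented, the route-map lines are left out for brevity, and it assumes every host peers with every other host in the same AS (plain iBGP without route reflectors needs such a full mesh).

```python
import ipaddress
import json

AS_NUMBER = 65500  # private AS number, as in the snippet above


def docker_daemon_json(subnet):
    """Build daemon.json content pinning docker0 to the node's subnet.

    Assumption: the bridge takes the first usable address of the subnet.
    """
    net = ipaddress.ip_network(subnet)
    bridge_ip = next(net.hosts())
    return json.dumps({"bip": f"{bridge_ip}/{net.prefixlen}"}, indent=2)


def bgpd_router_stanza(host_ip, subnet, all_hosts):
    """Render the 'router bgp' stanza for one host of the cluster."""
    net = ipaddress.ip_network(subnet)
    lines = [
        f"router bgp {AS_NUMBER}",
        f"  bgp router-id {host_ip}",
        "  timers bgp 30 90",
        "  redistribute static",
        f"  network {net.network_address} mask {net.netmask}",
    ]
    for peer in all_hosts:
        if peer != host_ip:  # a router does not peer with itself
            lines.append(f"  neighbor {peer} remote-as {AS_NUMBER}")
    return "\n".join(lines)


# Hypothetical inventory: underlay host IP -> docker0 subnet.
inventory = {
    "192.0.2.11": "10.32.0.0/24",
    "192.0.2.12": "10.32.1.0/24",
    "192.0.2.13": "10.32.2.0/24",
}

host = "192.0.2.11"
print(bgpd_router_stanza(host, inventory[host], list(inventory)))
print(docker_daemon_json(inventory[host]))
```

Keeping the subnet as the single source of truth means the "network" statement announced over BGP and the "bip" value used by Docker cannot drift apart.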

The resulting routing table is quite straightforward:

$ ip r | grep -i zebra
via dev eth1  proto zebra
via dev eth1  proto zebra
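If you want to check this table from a script rather than by eye, something along these lines could work. It is a rough sketch: the sample output is invented (the prefixes and gateways are hypothetical), and the parser only understands the simple "prefix via gateway dev interface proto zebra" form shown above.

```python
def value_after(fields, key):
    """Return the token following `key` in a split route line, or None."""
    if key in fields:
        index = fields.index(key)
        if index + 1 < len(fields):
            return fields[index + 1]
    return None


def parse_zebra_routes(ip_route_output):
    """Extract routes installed by zebra from `ip route` text output."""
    routes = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields or value_after(fields, "proto") != "zebra":
            continue
        routes.append({
            "prefix": fields[0],
            "via": value_after(fields, "via"),
            "dev": value_after(fields, "dev"),
        })
    return routes


# Hypothetical output of `ip r` on one node of a three-node cluster.
sample = """\
10.32.0.0/24 dev docker0 proto kernel scope link src 10.32.0.1
10.32.1.0/24 via 192.0.2.12 dev eth1 proto zebra
10.32.2.0/24 via 192.0.2.13 dev eth1 proto zebra
"""

for route in parse_zebra_routes(sample):
    print(route["prefix"], "->", route["via"], "on", route["dev"])
```

Each remote node's container subnet should show up once, with that node's address as the next hop; a missing line is a hint that the corresponding BGP session is down.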

We now have a fully operational overlay network. At this point, you may think that a classic SDN tool like Calico would be easier to manage. Beyond a certain scale that could be true, but we also need to take into account the main constraint of our environment: we are not on a public cloud. Therefore, we need to manage some things manually. Fortunately for us, Ansible came along a long time ago to make our lives easier. Today, rolling out a topology change in our overlay stack is a matter of seconds, with no service interruption whatsoever.

Since Quagga is not self-sufficient, we added a home-brewed service discovery tool to ensure that all of our live apps can communicate with each other and receive traffic from our load balancers. This networking feature has since let us enable automated gossiping between our apps and do a lot of other fun stuff.

For Kubernetes: Load balancers, ingresses and stuff

Since there is a lot to read on the Internet about these, I think it's better to point out the "good ones" rather than poorly paraphrase them:

Our load balancers are aware of every app in every container, and they are able to send packets to each application based on our ACLs. Ingresses will be covered in another article, yet to be written.

Paving the way to multihomed infrastructure

Since we have a reproducible network model, why not apply it to an L2/L3 interconnection? You can check why and how here.


Even though the setup is quite simple once it is running, starting it from scratch was quite a ride. We have to acknowledge here the help Paul Jakma provided at several steps of debugging our BGP setup.