Tuesday, 30 November 2021

My Fiber7-X VyOS Config

Updated Jul 2022: Following an exchange on Twatter it was clear to me that my explanation of the IPv6-PD usage was not very good, so I updated this section to clarify the prefix usage. I also feel that there is a gap here: the VyOS config should be able to pool the PD assignment, instead of me assigning it somewhere stupid like I did here.

As I discussed in my 10/25Gbit internet at home post, I recently moved away from an appliance based router at home, to instead use a fancy(ish) NIC passed through directly to a VM. The point of this was to try and increase the throughput, whilst maintaining a low footprint in terms of power, complexity and cost. We can argue about the complexity bit, but it wasn't complex for me. In this post I will break down the various sections of config to explain how they work and what they do.

The VM

First, we have to deal with the VM itself. My hypervisor is VMware ESXi (free licence) running on an HP z620 tower workstation. There are ten-a-penny guides on how to setup ESXi and all that, plus I am sure someone is yelling PROXMOX at their screen right now. I have that running in a hyperconverged PVE/Ceph NUC cluster btw, I just don't use it for this.

In ESXi I have a Mellanox ConnectX-4 dual port 25Gbit NIC for VyOS, and an Intel X710-DA2 10G card for the Hypervisor vSwitch uplinks. The Mellanox NIC is handed over fully to the VyOS VM. For that I also had to reserve the memory assignment for the VyOS VM. Given I have a lot of spare resource, I chose to assign 4x single core sockets and 8GB RAM. I am pretty certain I can cut this to just 1GB based on the historical consumption figures, but the CPU seems about right: during heavy downloading the CPU spikes up to at least one full core, and heavily multi threaded downloads at line rate can push it to the limit. I chose single core virtual sockets following a discussion with the Netgate performance engineers. They specifically advise people with bare metal installs to disable Hyper-Threading and to pin receive queues to specific cores as well. More on that later.

For VyOS, I chose to build my own "production image" using the docker based build tool. There are a few different opinions on this, but since they started offering the Enterprise Edition, obtaining the free version became a little janky. The website offers the so called "rolling" release, which is built every day that new commits to master are made. This means it is bleeding edge feature wise, and should be test complete, but not necessarily stable. Rather than try a bunch and work out what my stable would be, I chose to follow the community recommendation and build an image from the "current" 1.4 branch. The README was all the instructions I needed. I have since done a bunch of upgrades by building newer images, uploading them via scp, and simply activating them. It's very easy to maintain.

So having configured the Mellanox NIC for passthrough and rebooted my ESXi box, then configuring the VM and adding a basic VMXNET3 for MGMT plus the PCI device for the WAN and inside trunk interface, I booted my ISO and ran the basic install process. 

The Initial Config

Trying to type in loads of things to the VMware Remote Console without copy/paste and all that stuff is sort of annoying, so I always go with a basic config that allows me to SSH in and complete the config in a terminal. Firstly, I double checked the MAC addresses on the interfaces so that I knew which one was which. Thankfully the MGMT interface came up as eth0, and the CX4 ports were eth1 and eth2. Easy peasy.

Here are the basic commands I ran to sort out remote access:

set interfaces ethernet eth0 address ''
set interfaces ethernet eth0 description 'MGMT'
set system domain-name 'mydomain.co.uk'
set system host-name 'vyos01'
set service ssh listen-address
set service ssh port 22

Commit and Save and then swap to a terminal attached to the MGMT LAN

The Network Design

My home network is quite simple really (for a network specialist). There is a MGMT interface (eth0), a WAN interface (eth1) and an inside interface (eth2). WAN is a routed port and MGMT is an untagged member of VLAN10. eth2 is a dot1q trunk and carries two VLANs. All VLANs are represented on the switch, and most host ports are untagged members of VLAN9 for Home use. VLAN8 is there for the Lab to be able to use the 10G directly. This isn't possible outside of the Hypervisor yet, but I'm looking into alternatives to that CSS610 with at least 4 SFP+ ports to help with this. Since the hardware side of the lab is in the basement, and I presently only have Powerline access there, it's all moot.

Note: I like to use asciiflow for markdown friendly diagrams. Blogger seems to have a fixed width here, and this broke the ASCIIArt, so here you have a screenshot.

Physically, there is quite the sprawl of stuff. I laid out all the assets that use a cabled connection only. The Wifi attached stuff (notably the tons of IoT things for HomeAssistant) would just clutter it all up even further.

When you see it on a picture, it's kinda obvious that whilst logically simple, my network is far from simple in practical terms. A lot of it is accumulated crap I should probably destroy, but for now, this is what I am working with.

The Core Config

Here we look at the minimum viable product configs. This is what you need to make VyOS work as a firewall/router in the home setting.

WAN Interface

set interfaces ethernet eth1 address 'dhcp'
set interfaces ethernet eth1 description 'Init7'
set protocols static route dhcp-interface 'eth1'
set system name-server 'eth1'

That is all you need to have the router itself talk to the internet. Now we need to setup the client side.

LAN Interface

My config has the inside configured as a trunk port and then a subinterface (vif) is configured for the VLAN:

set interfaces ethernet eth2 vif 9 address ''

Comically that is pretty much all you need to make the most simple router of all time. We don't want that tho. We want something a bit more useful than this!

Router Services

In my network, I delegate the internal DHCP Server role to a PiHole that runs on a Pi3 in the closet. I do use DHCP in the Management Zone tho, and that lives here on the VyOS box.

set service dhcp-server listen-address ''
set service dhcp-server shared-network-name mgmt authoritative
set service dhcp-server shared-network-name mgmt description 'MGMT'
set service dhcp-server shared-network-name mgmt name-server ''
set service dhcp-server shared-network-name mgmt name-server ''
set service dhcp-server shared-network-name mgmt ping-check
set service dhcp-server shared-network-name mgmt subnet default-router ''
set service dhcp-server shared-network-name mgmt subnet range scope1 start ''
set service dhcp-server shared-network-name mgmt subnet range scope1 stop ''

Let's assume you want some specific IPs on things in that management zone so you can have polling in the future. Here are the couple of lines you need for a reservation in that segment:

set service dhcp-server shared-network-name mgmt subnet static-mapping core-sw ip-address ''
set service dhcp-server shared-network-name mgmt subnet static-mapping core-sw mac-address '2c:c8:1b:6a:c8:8d'

Logging is important, so let's ensure it's all enabled

set system syslog global facility all level 'info'
set system syslog global facility protocols level 'debug' 

Next up, to ensure our logs are helpful, we set up NTP to ensure the clock is synced at all times

set system ntp server 0.ch.pool.ntp.org pool
set system ntp server 1.ch.pool.ntp.org pool

And to ensure that LLDP works both ways, we set this up on our internal interfaces:

set service lldp interface eth0
set service lldp interface eth2.9
set service lldp management-address ''

For reasons of laziness, I would also like SSH to work over the Home interface

set service ssh listen-address ''

Finally, we want the OS to be tuned for performance, cos we are going to use a fast feed (may require reboot). A description of what this does is in the docs.

set system option performance 'throughput'

NAT Setup

There is not a lot of point having an IPv4 firewall/router if it doesn't do NAT. Here we do all the fun things and stuff to make IPv4 source NAT work. I use 77 as a rule ID 'prefix' and then have dedicated rules for each source subnet. There are other ways to do this, but this is more surgical, and verbose is my preference when it comes to things like NAT and Firewalls.

set nat source rule 771 outbound-interface 'eth1'
set nat source rule 771 source address ''
set nat source rule 771 translation address 'masquerade'
set nat source rule 772 outbound-interface 'eth1'
set nat source rule 772 source address ''
set nat source rule 772 translation address 'masquerade'
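
The numbering scheme above is regular enough to generate. As a rough sketch only, here is one way it could be scripted in Python, assuming one rule per source subnet and the 77 prefix described above. The function name source_nat_commands is made up for this example, and the two subnets are documentation ranges standing in for the real ones, which are not shown in this post.

```python
import ipaddress


def source_nat_commands(subnets, outbound="eth1", prefix=77):
    """Build 'set nat source rule' lines, one rule per source subnet.

    Rule IDs are the prefix followed by a running index: 771, 772, ...
    """
    commands = []
    for index, subnet in enumerate(subnets, start=1):
        # Validate the subnet early so a typo fails here, not on the router.
        network = ipaddress.ip_network(subnet)
        rule = f"{prefix}{index}"
        commands += [
            f"set nat source rule {rule} outbound-interface '{outbound}'",
            f"set nat source rule {rule} source address '{network}'",
            f"set nat source rule {rule} translation address 'masquerade'",
        ]
    return commands


# Placeholder subnets (documentation ranges), NOT the ones used in this network.
for line in source_nat_commands(["192.0.2.0/24", "198.51.100.0/24"]):
    print(line)
```

The output can be pasted into a configure session like any other block in this post. It only mirrors the three commands shown above and does nothing clever beyond that.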

Now assuming we want to allow some inbound services from the internet, we need some destination NAT as well... I picked the rule number for a reason, but I guess it's obvious this idea doesn't scale well. I kinda don't care that much tho ;)

set nat destination rule 443 description 'HTTPS to Ingress'
set nat destination rule 443 destination port '443'
set nat destination rule 443 inbound-interface 'eth1'
set nat destination rule 443 protocol 'tcp_udp'
set nat destination rule 443 translation address ''
set nat destination rule 443 translation port '443'

Zone Based Firewall

This is going to be a bit more chunky. ZBF is universally agreed to be more scalable, but it is also universally agreed to suck balls to configure. The first build is annoyingly slow, but once it is in, adding interfaces into zones is trivial and policy can be inherited/deduplicated, which I think is a big win. I spent a little bit of time on automating this in Nornir, which I will publish soon. I think I could probably write an entire essay on ZBF concepts, so I will save you my ramblings and refer you to the Docs instead: VyOS Zone-Based Firewall Guide

ZBF - Basics

A lot of this is probably default config, but since VyOS configdb is idempotent, if you paste a command that already exists, it will just skip on past. 

set firewall all-ping 'enable'
set firewall broadcast-ping 'disable'
set firewall config-trap 'disable'
set firewall ip-src-route 'disable'
set firewall log-martians 'enable'
set firewall receive-redirects 'disable'
set firewall send-redirects 'enable'
set firewall source-validation 'disable'
set firewall syn-cookies 'enable'
set firewall twa-hazards-protection 'disable'
set system conntrack modules ftp
set system conntrack modules h323
set system conntrack modules nfs
set system conntrack modules pptp
set system conntrack modules sip
set system conntrack modules sqlnet
set system conntrack modules tftp 

I also defined a group for inside networks that can be used later on. Feel free to define as many network and service groups as you like to clean up your config.

set firewall group network-group inside-nets network ''
set firewall group network-group inside-nets network ''

ZBF - Policies

So we need policies before we can apply those to zones. The Zone assignments refer to the policy names, so let's make them first.

In each policy I use a default-action drop, which is a sort of best practice. In more trusted settings, you might want a default permit (outbound from LAN to WAN for example), and typically people will use a default-action of accept. I prefer to put a drop policy in and then a single rule to permit all at rule 1, since if something horrible happens and I need to block something outbound (malware C2 or somesuch), I can insert a rule above to drop something specific. I can also change that rule 1 to only permit certain ports or protocols and drop all others. Basically, better set a sane default and override it, than have to change the whole policy in an emergency, which might have undesired consequences, thus compounding an already shitty situation. 

I like to put a description in, but let's face it, it's pretty self describing.

I also enable the default log statement so all firewall rule hits are logged. This is again a good best practice. You won't care until you need this, and then future you will thank you for it.

ZBF - Policies - Local

Local policies are those that originate or terminate on the VyOS instance directly. We need traffic in both directions (inbound and outbound) from and to the router, so we have four policies here.

set firewall name lan-local default-action 'drop'
set firewall name lan-local description 'LAN to This Router IPv4'
set firewall name lan-local enable-default-log
set firewall name lan-local rule 1 action 'accept'
set firewall name lan-local rule 1 description 'Better this than default allow and change later!'

set firewall name local-lan default-action 'drop'
set firewall name local-lan description 'This Firewall to LAN IPv4'
set firewall name local-lan enable-default-log
set firewall name local-lan rule 1 action 'accept'
set firewall name local-lan rule 1 description 'Better this than default allow and want to change later!'

set firewall name wan-local default-action 'drop'
set firewall name wan-local description 'WAN to This Device IPv4'
set firewall name wan-local enable-default-log
set firewall name wan-local rule 1 action 'accept'
set firewall name wan-local rule 1 state established 'enable'
set firewall name wan-local rule 1 state related 'enable'
set firewall name wan-local rule 2 action 'drop'
set firewall name wan-local rule 2 state invalid 'enable'
set firewall name wan-local rule 3 action 'accept'
set firewall name wan-local rule 3 description 'DHCP Replies'
set firewall name wan-local rule 3 destination port '67,68'
set firewall name wan-local rule 3 protocol 'udp'
set firewall name wan-local rule 3 source port '67,68'
set firewall name local-wan default-action 'drop'
set firewall name local-wan description 'This Router to WAN IPv4'
set firewall name local-wan enable-default-log
set firewall name local-wan rule 1 action 'accept'

Inbound policies (lan-local and wan-local) are all about things talking to the router directly. LAN side I allow anything and WAN side we block everything not related to an existing "inflight" outbound conn, but I also had to enable the inbound DHCP flows since these are stateless. 

Outbound I again just permit all the things.
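
To make the evaluation order concrete, here is a minimal Python sketch of a policy like wan-local, assuming rules are checked in ascending rule number, the first match wins, and the default-action applies when nothing matches. Rule, evaluate and the packet dictionaries are names invented for this illustration, not anything VyOS exposes, and the model only looks at state, protocol and ports.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    number: int
    action: str                      # 'accept' or 'drop'
    states: frozenset = frozenset()  # empty means "any connection state"
    protocol: Optional[str] = None   # None means "any protocol"
    ports: frozenset = frozenset()   # checked against source AND destination port

    def matches(self, pkt: dict) -> bool:
        if self.states and pkt["state"] not in self.states:
            return False
        if self.protocol and pkt["protocol"] != self.protocol:
            return False
        if self.ports and not (pkt["sport"] in self.ports and pkt["dport"] in self.ports):
            return False
        return True


def evaluate(rules, default_action, pkt):
    """Return the action of the lowest-numbered matching rule, else the default."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.matches(pkt):
            return rule.action
    return default_action


# Simplified model of the wan-local policy from this section.
WAN_LOCAL = [
    Rule(1, "accept", states=frozenset({"established", "related"})),
    Rule(2, "drop", states=frozenset({"invalid"})),
    Rule(3, "accept", protocol="udp", ports=frozenset({67, 68})),
]

dhcp_reply = {"state": "new", "protocol": "udp", "sport": 67, "dport": 68}
ssh_probe = {"state": "new", "protocol": "tcp", "sport": 50000, "dport": 22}

print(evaluate(WAN_LOCAL, "drop", dhcp_reply))  # accept (rule 3)
print(evaluate(WAN_LOCAL, "drop", ssh_probe))   # drop (default-action)
```

The DHCP reply is not part of any tracked connection, so rule 1 never sees it as established, which is why the explicit rule 3 is needed.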

ZBF - Policies - Transit 

Transit policies are ones where the flow is designed to transit through the router. In iptables world these are FORWARD rules.

set firewall name lan-wan default-action 'drop'
set firewall name lan-wan description 'LAN to WAN IPv4'
set firewall name lan-wan enable-default-log
set firewall name lan-wan rule 1 action 'accept'
set firewall name lan-wan rule 1 description 'better this than default accept and then you change your mind!'

set firewall name wan-lan default-action 'drop'
set firewall name wan-lan description 'WAN to LAN IPv4'
set firewall name wan-lan enable-default-log
set firewall name wan-lan rule 1 action 'accept'
set firewall name wan-lan rule 1 state established 'enable'
set firewall name wan-lan rule 1 state related 'enable'
set firewall name wan-lan rule 2 action 'drop'
set firewall name wan-lan rule 2 state invalid 'enable'
set firewall name wan-lan rule 443 action 'accept'
set firewall name wan-lan rule 443 description 'HTTPS to ingress'
set firewall name wan-lan rule 443 destination address ''
set firewall name wan-lan rule 443 destination port '443'
set firewall name wan-lan rule 443 protocol 'tcp_udp'

As well as the default log and description, I have a rule that matches the destination NAT rule we defined previously. Notice how we use the "real" IP of the host on the inside. This can catch out people who are used to firewall rules referring to the mapped IP on the outside of the firewall. I chose the same rule ID as the NAT rule ID, but that is my personal choice; there is no requirement to synchronise these rule IDs.

ZBF - Zones

Now that we have policies, we can assign these policies to zones. Zones and interfaces are a one-to-many mapping, i.e. one Zone can contain many interfaces, but for the avoidance of doubt, an interface can only exist in one Zone ;)

set zone-policy zone wan default-action 'drop'
set zone-policy zone wan from lan firewall name 'lan-wan'
set zone-policy zone wan from local firewall name 'local-wan'
set zone-policy zone wan interface 'eth1'

set zone-policy zone lan default-action 'drop'
set zone-policy zone lan from local firewall name 'local-lan'
set zone-policy zone lan from wan firewall name 'wan-lan'
set zone-policy zone lan interface 'eth2.9'
set zone-policy zone lan interface 'eth0'
set zone-policy zone local default-action 'drop'
set zone-policy zone local from lan firewall name 'lan-local'
set zone-policy zone local from wan firewall name 'wan-local'
set zone-policy zone local local-zone

Here we created our three zones called wan, lan and local. We then assigned the firewall policies "inbound", and finally assigned interfaces to the zones. Note the special "local-zone" flag for that local zone.

A lot of people then go on to ask what happens to traffic between interfaces within one zone (in our example, eth0 and eth2.9). These flows are unrestricted - think of them as a plain routed interface.

Note: MGMT and User traffic? Routed? No FW? WHATTTT? - This is my house, I don't care. Don't @ me.
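
The zone logic can be summed up as a lookup: map each interface to its zone, then pick the policy by the (from, to) zone pair. This is a small Python sketch of that idea using the names from the config above. The policy_for helper and the strings it returns are invented for illustration, and None stands in for the router itself.

```python
# Interface to zone membership, as assigned above.
ZONE_OF = {"eth1": "wan", "eth2.9": "lan", "eth0": "lan"}

# (from_zone, to_zone) -> firewall policy name.
POLICY = {
    ("lan", "wan"): "lan-wan",
    ("local", "wan"): "local-wan",
    ("local", "lan"): "local-lan",
    ("wan", "lan"): "wan-lan",
    ("lan", "local"): "lan-local",
    ("wan", "local"): "wan-local",
}


def policy_for(in_iface, out_iface):
    """Pick the policy for a flow; None as an interface means the router itself."""
    src = ZONE_OF[in_iface] if in_iface else "local"
    dst = ZONE_OF[out_iface] if out_iface else "local"
    if src == dst:
        # Both interfaces sit in the same zone, so no policy is consulted.
        return "intra-zone: no policy applied"
    return POLICY.get((src, dst), "zone default-action: drop")


print(policy_for("eth1", "eth2.9"))  # wan-lan
print(policy_for("eth0", "eth2.9"))  # intra-zone: no policy applied
print(policy_for("eth1", None))      # wan-local
```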

At this point you should have a fully operational internet connection on eth0 and eth2.9, providing you legacy internet access. What's legacy internet access? IPv4 only of course!

Enabling IPv6

First, I should point out that my approach here is rather opinionated. Some will argue it is literally stupid. I also do not care about this ;)

The point of IPv6 was to deliver enough internet addressing that we could never run out like we have with IPv4. A side benefit of this was that NAT was essentially deprecated, since we don't need to share one public IP with many clients inside a LAN zone. For many, the obvious benefit is that all devices in your LAN get a public routable IPv6 address, and you control the flows on the border with a firewall, just like normal. In my workplace we have dual internet feeds, and we get IPv6 addressing from both. Which IP address pool should we use to assign IPv6 addresses to clients? If both, which should a client machine use to route outbound?

In some settings, the SLAAC addressing system makes a lot of sense, but in an enterprise, I still need concrete services, which means predictable, reliable DNS/IPs. Ultimately, I will find myself using some form of static addressing. Now, what happens when I change ISP? That /48 I got from the first one is now not mine, and I have to re-address my entire LAN. This is unsustainable. 

Thankfully, someone thought of all of these use cases and came up with Unique Local Addressing, and Prefix Translation (sometimes known as NAT66) as a solution. With ULA, I can generate (and optionally register) a ULA prefix that should be globally unique for my site, and use that for all internal addressing purposes. I then configure NAT66 to swap the first 48 or 64 bits of my 128 bit IPv6 address for those of the IPv6 prefix delegation we received from upstream. If the ISP assigns us something new, or we change ISP (via failover or physically replacing a supplier), the prefix substitution just does the work for me. So inside my LAN, I have a private address, from a single site specific prefix, and then as traffic heads outside my LAN I have an otherwise identical address, but with an internet routable prefix.

e.g. Inside my LAN my PC looks like this:

% ifconfig
        ether 00:3e:e1:c9:28:10
        inet6 fe80::ca2:34c4:8b00:daa8%en0 prefixlen 64 secured scopeid 0x4
        inet6 fda4:7911:df45:9:4cc:2a5a:8ee5:a917 prefixlen 64 autoconf secured
        inet netmask 0xffffff00 broadcast
        nd6 options=201<PERFORMNUD,DAD>
        media: autoselect (1000baseT <full-duplex>)
        status: active

but on the internet, I am seen as:

% curl -6 https://ifconfig.co

All that happened is we swapped those first 48 bits.
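
As a worked example of that swap, here is a small Python sketch using the standard ipaddress module. It does a plain prefix replacement (keep the host bits, replace the network bits), which matches the "otherwise identical address" description above. It makes no attempt at the checksum-neutral adjustment described in RFC 6296, and swap_prefix is just a name picked for this example.

```python
import ipaddress


def swap_prefix(addr: str, new_prefix: str) -> str:
    """Replace the leading bits of addr with new_prefix, keeping the rest."""
    ip = ipaddress.IPv6Address(addr)
    net = ipaddress.IPv6Network(new_prefix)
    # hostmask has ones in every bit position NOT covered by the prefix.
    kept_bits = int(ip) & int(net.hostmask)
    return str(ipaddress.IPv6Address(int(net.network_address) | kept_bits))


inside = "fda4:7911:df45:9:4cc:2a5a:8ee5:a917"
print(swap_prefix(inside, "2a02:168:4047::/48"))
# 2a02:168:4047:9:4cc:2a5a:8ee5:a917
```

With a /48 the subnet ID (the 9) and the interface identifier survive untouched. Passing a /64 such as 2a02:168:4047:9::/64 gives the same result for this address, since the fourth group already matches.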

Now some gamers might be screaming at me right now, I have no idea if this is good or bad for you guys, but given it is a 1:1 mapping, it is probably ok tbh. We will see in the comments I guess.

IPv6 - Interfaces

Init7 offer a /48 and in my case, whilst it is assigned via DHCPv6, this assignment is reserved for me in their system. It's logically static. Your mileage may vary, and I know of at least two other Init7 customers who had their /48 change, seemingly at random. This in essence is why I follow this NAT process btw. So first we need to enable DHCPv6 on our internet interface and request an IPv6 prefix delegation that we will anchor to an inside interface for operational reasons. If there were a better way to do this, I would use it, since we don't actually use this IP on this interface in daily use, but we need to anchor it somewhere. If you are concerned, you could maybe bind this to a non-existent VLAN.

set interfaces ethernet eth1 ipv6 address autoconf 
set interfaces ethernet eth1 address 'dhcpv6'
set interfaces ethernet eth1 dhcpv6-options pd 0 interface eth2.9 address '9'
set interfaces ethernet eth1 dhcpv6-options pd 0 length '48'

So our WAN interface uses SLAAC for the interface IP, getting an address in a /64 owned and operated by Init7. We then use DHCPv6 to request an IPv6 /48 that we can assign to the inside interfaces of our router. 

Here I am cheating and statically assigning the :9::1/64 from our site /48 to the vlan 9 under the eth2 interface. I follow this scheme for all VLANs since I am unimaginative. 

I assume you have gone and got your own ULA prefix and registered it. Don't use mine!

set interfaces ethernet eth2 vif 9 address 'fda4:7911:df45:9::1/64' 
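
If you still need a ULA prefix, here is a rough Python sketch of the two steps: pick a random /48 under fd00::/8, then carve out the per-VLAN /64 following the "VLAN ID as subnet ID" scheme used above. Two assumptions to flag loudly: RFC 4193 suggests deriving the 40 bit global ID from a hash of a timestamp and an EUI-64, whereas this just uses random bits, and the VLAN ID is treated as a plain integer, so VLAN 10 would land in :a:: rather than :10::. Both function names are made up for this example.

```python
import ipaddress
import secrets


def random_ula_site_prefix() -> ipaddress.IPv6Network:
    """Return a random /48 inside fd00::/8 (simplified take on RFC 4193)."""
    global_id = secrets.randbits(40)
    # Layout: 8 bits 'fd', 40 bits global ID, then 80 bits left as zero.
    base = (0xFD << 120) | (global_id << 80)
    return ipaddress.IPv6Network((base, 48))


def vlan_subnet(site: str, vlan_id: int) -> ipaddress.IPv6Network:
    """Return the /64 for a VLAN, using the VLAN ID as the 16 bit subnet ID."""
    site_net = ipaddress.IPv6Network(site)
    if site_net.prefixlen != 48:
        raise ValueError("expected a /48 site prefix")
    if not 0 <= vlan_id <= 0xFFFF:
        raise ValueError("subnet ID must fit in 16 bits")
    base = int(site_net.network_address) | (vlan_id << 64)
    return ipaddress.IPv6Network((base, 64))


print(random_ula_site_prefix())  # different on every run

home = vlan_subnet("fda4:7911:df45::/48", 9)
print(home)     # fda4:7911:df45:9::/64
print(home[1])  # fda4:7911:df45:9::1, the address given to eth2.9 above
```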

When you commit this, you can query your eth2.9 interface to see the PD you got. Here you see I got 2a02:168:4047::9/64. I am hopeful there is a better way to do this. Right now I can't see it tho.

$ show interfaces ethernet eth2.9
eth2.9@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 50:6b:4b:1c:09:fb brd ff:ff:ff:ff:ff:ff
inet brd scope global eth2.9
valid_lft forever preferred_lft forever
inet6 2a02:168:4047::9/64 scope global
valid_lft forever preferred_lft forever
inet6 fda4:7911:df45:9::1/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::526b:4bff:fe1c:9fb/64 scope link
valid_lft forever preferred_lft forever

IPv6 - NAT66

Here we do the magic prefix switcheroo. I have a rule for each vlan ID so I can swap the /64 in a granular way. Realistically, I could compress this to a /48 on a single rule, but again, I like to be verbose so people know exactly what is happening at each stage.  

set nat66 destination rule 9 destination address '2a02:168:4047:9::/64'
set nat66 destination rule 9 inbound-interface 'eth1'
set nat66 destination rule 9 translation address 'fda4:7911:df45:9::/64'
set nat66 source rule 9 outbound-interface 'eth1'
set nat66 source rule 9 source prefix 'fda4:7911:df45:9::/64'
set nat66 source rule 9 translation address '2a02:168:4047:9::/64'

IPv6 - Router Advertisements

Set up Router Advertisements on your inside interface, ensuring internal clients get a random SLAAC address from my ULA pool, and DNS Servers.

set service router-advert interface eth2.9 name-server '2606:4700:4700::1111'
set service router-advert interface eth2.9 name-server '2606:4700:4700::1001'
set service router-advert interface eth2.9 prefix fda4:7911:df45:9::/64

IPv6 - Firewalling

Here it is essentially a copy paste of the v4 rules from earlier but edited to a v6 approach.

set firewall ipv6-receive-redirects 'disable'
set firewall ipv6-src-route 'disable'
set firewall ipv6-name lan-local-6 default-action 'drop'
set firewall ipv6-name lan-local-6 description 'LAN to This Router IPv6'
set firewall ipv6-name lan-local-6 enable-default-log
set firewall ipv6-name lan-local-6 rule 1 action 'accept'
set firewall ipv6-name local-lan-6 default-action 'drop'
set firewall ipv6-name local-lan-6 description 'This router to LAN IPv6'
set firewall ipv6-name local-lan-6 enable-default-log
set firewall ipv6-name local-lan-6 rule 1 action 'accept'
set firewall ipv6-name local-lan-6 rule 1 description 'better this than default allow and want to change later!'

set firewall ipv6-name local-wan-6 default-action 'drop'
set firewall ipv6-name local-wan-6 description 'This Router to WAN IPv6'
set firewall ipv6-name local-wan-6 enable-default-log
set firewall ipv6-name local-wan-6 rule 1 action 'accept'

set firewall ipv6-name wan-local-6 default-action 'drop'
set firewall ipv6-name wan-local-6 description 'WAN to This Device IPv6'
set firewall ipv6-name wan-local-6 rule 1 action 'accept'
set firewall ipv6-name wan-local-6 rule 1 state established 'enable'
set firewall ipv6-name wan-local-6 rule 1 state related 'enable'
set firewall ipv6-name wan-local-6 rule 2 action 'accept'
set firewall ipv6-name wan-local-6 rule 2 protocol 'icmpv6'
set firewall ipv6-name wan-local-6 rule 3 action 'accept'
set firewall ipv6-name wan-local-6 rule 3 description 'DHCPv6 Replies'
set firewall ipv6-name wan-local-6 rule 3 destination port '546'
set firewall ipv6-name wan-local-6 rule 3 protocol 'udp'
set firewall ipv6-name wan-local-6 rule 3 source port '547'

set firewall ipv6-name wan-lan-6 default-action 'drop'
set firewall ipv6-name wan-lan-6 description 'WAN to LAN IPv6'
set firewall ipv6-name wan-lan-6 enable-default-log
set firewall ipv6-name wan-lan-6 rule 1 action 'accept'
set firewall ipv6-name wan-lan-6 rule 1 state established 'enable'
set firewall ipv6-name wan-lan-6 rule 1 state related 'enable'
set firewall ipv6-name wan-lan-6 rule 2 action 'accept'
set firewall ipv6-name wan-lan-6 rule 2 protocol 'icmpv6'
set firewall ipv6-name lan-wan-6 default-action 'drop'
set firewall ipv6-name lan-wan-6 description 'LAN to WAN IPv6'
set firewall ipv6-name lan-wan-6 enable-default-log
set firewall ipv6-name lan-wan-6 rule 1 action 'accept'
set firewall ipv6-name lan-wan-6 rule 1 description 'better this than default accept and then you change your mind!'

I think most of this is self explanatory with the one small exception being we had to swap our DHCPv4 rule for a DHCPv6 rule in the wan-local-6 policy. Same problem, different execution. 

IPv6 - Zones

set zone-policy zone lan from local firewall ipv6-name 'local-lan-6'
set zone-policy zone lan from wan firewall ipv6-name 'wan-lan-6'
set zone-policy zone local from lan firewall ipv6-name 'lan-local-6'
set zone-policy zone local from wan firewall ipv6-name 'wan-local-6'
set zone-policy zone wan from lan firewall ipv6-name 'lan-wan-6'
set zone-policy zone wan from local firewall ipv6-name 'local-wan-6'

As before, we assign our policies to our zones. 

At this point you should now have a fully operational dual-stack router.

TV7 - Multicast

Last on my list here is the multicast work to allow TV7 to work on wired clients in vlan9. You can add other vlans if you like.

First we need an IGMP Proxy. I translated these settings from the EdgeRouter configs shared by Init7 themselves. The EdgeRouter is a fork of Vyatta, just like VyOS, so it should be identical. I found I didn't need the alt-subnet on the downstream side tho.

set protocols igmp-proxy interface eth1 alt-subnet ''
set protocols igmp-proxy interface eth1 role 'upstream'
set protocols igmp-proxy interface eth2.9 role 'downstream'

Secondly we need firewall rules. The rules from Init7 are pretty broad, so I went more specific.

set firewall name wan-local rule 771 action 'accept'
set firewall name wan-local rule 771 description 'Allow tv7 streams'
set firewall name wan-local rule 771 destination address ''
set firewall name wan-local rule 771 destination port '5000'
set firewall name wan-local rule 771 protocol 'udp'
set firewall name wan-local rule 772 action 'accept'
set firewall name wan-local rule 772 description 'Allow tv7 IGMP'
set firewall name wan-local rule 772 protocol 'igmp'

And that is it. A fully operational VyOS config for Init7. This gist contains the full config (less user accounts, and some more specific things I have just in my environment).

Please let me know your comments and thoughts. I am always willing to learn.

Sunday, 28 November 2021

10/25 Gbit Internet at Home - a very 21st Century Problem

When I first arrived in Switzerland in 2017, aside from the clean air, beautiful countryside, and the promise of a highly paid job with which to support my family, I was transfixed by something that seemed completely alien to me - Fibre to the Home. Yes, I am a nerd, and yes, right alongside a comfortable living space and good local schools, I made sure each house viewing included a search for an OTO. No Fibre, no way.

When choosing an ISP, I had been recommended by colleagues to only consider Init7, since it was an ISP by nerds, for nerds. No brainer I guess? On top of the promise of "internet as it should have always been", coupled with a highly technical service desk that talks my language, they also offer something they call the "MaxFix guarantee". In short, they will offer you a service for a fixed price (77CHF/month), and for this miserly sum, you get the fastest speeds possible to your location. If you are in a weird offnet place, this might mean DSL or PPPoE, which as a Brit I am extremely used to already. In FTTH locations, that means 1 Gbit symmetrical, which I only ever observed in Datacentre settings for thousands of pounds per month. 

Words cannot describe the childish levels of excitement on that first day when I ran a speedtest at 980 Mbit/s up and down. It's entirely pathetic, and I sincerely do not care either.

In the buildup to that moment, I had a few weeks where I was shopping around for 1Gbit capable hardware, and I suppose I wasn't that shocked to find it was all pretty expensive. Four years ago there was basically nothing that handled gigabit symmetric plus FW/NAT and IGMP. Ubiquiti had their Edgerouter4 which had varying reviews at these speeds, and Init7 themselves recommended the Turris Omnia (OpenWRT) for tinkerers and the Mikrotik CCR for the more business end of the market. I flirted with the idea of pfsense/opnsense and even raw debian with nftables, but on the basis it was for "production" use at home, I felt this was likely to fail the wife test. If it went wrong, and I wasn't home, it was likely not something they could fix alone. 

Since I wanted to make use of the tv7 offering as well, which means multicast, I chose to stick to the examples on the Init7 site, which were focussed on the Mikrotiks. I bought a CCR1009 for about 400 bucks and the Media Converter plus optic/fibre set from Init7 for 111. This way if the MT didn't work out, I always had a copper connection option to fall back on.

I last worked with Mikrotiks maybe 15 years ago in a Rural Wireless ISP setting where they were the main option for people doing community funded projects like ours was. Whilst the feature list has grown with all the modern bells and whistles, the interface is still trash, and whilst it has an API now, that is also pretty garbage. I disliked it greatly, but I put up with it because it was relatively cheap, and did the job I needed it to. I was able to use the 7 copper ports to connect the rooms of my house over the built in CAT6 patching, and use the single 10G SFP+ port to connect to my ESXi hypervisor. 

So, for the last 4 years or so this has been my setup - the CCR is the core, ESXi for all the other things, and then a few Netgear GS108Ev3 switches out in the rooms to connect all the various crap I have around the house. Wifi comes from old Cisco 1702 (802.11ac) APs at each end of the Apartment. This managed to survive Covid WFH, my wife with her UK TV obsession, and my daughter growing up to be a Netflix addict - always providing a reliable, stable, performant internet service.

Then, as the Fresh Prince of Bel-Air once said, my life got flip-turned upside down. On Init7's 7th birthday, they decided to drop the option of 10 and 25 Gbit upgrades to their existing 1G fiber7 service, all within MaxFix guarantee, and thus no additional monthly cost. So as a reminder, I am a nerd, but specifically, I am a networking nerd. I run global scale network infrastructure for a major privacy focussed email provider. I live and breathe huge connectivity, and this was too tantalising to ignore. And so began the race to 25Gbit at home.


Firstly, I have a hardware lab in the basement that I collated during the Covid period for testing random things without having to visit a DC. Among three servers I found Mellanox ConnectX-4 dual port cards, which means I was all good for 25G at a physical layer. These can be bought for anything between 100 and 350 CHF, depending on the generation and source (CX-4s can be found for 100ish bucks online, CX-5s can be bought brand new from fs.com). In the 10G spec I found a very desirable Intel X710-DA2 and a bunch of X520-DA2s as well. I was spoiled for choice I guess you could say. 

Secondly, Init7 use Bidi optics so that they only occupy one single fibre in the ground. 10G bidis are about 100 bucks, and 25G bidis are about 350 bucks. This ended up being the top cost for me. 

Lastly, I needed some compute. I'm a big fan of the HP Z series workstations because they are powerful (Xeon), but quiet. For this job, I found the z220 SFF box on Ricardo for 139 CHF which appears to have two PCI-e x16 slots. When I started to dig into the details however, I found that one is "limited" to GFX workloads, and the second is a x16 "mechanical" but only x4 "electrical". In other words, it's a lie. I must have then spent dozens of evenings digging into the spec sheets of god knows how many systems, hoping to find a SFF box that provided two x8+ slots. The best options I found were Supermicro boxes that were hard to source second hand and, when bought new, cost over 1000CHF. I also tend to find Supermicros are noisy since they expect to be in a computer room, not the home.
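To put numbers on why an x4 "electrical" slot is a problem for a dual port 25G card, here is a rough Python sketch of the lane arithmetic. The per-lane figures are the standard PCIe raw rates after line encoding and ignore protocol overhead, so real throughput is somewhat lower. The post does not say which PCIe generation that slot runs at, so the loop covers Gen2 and Gen3; the function names are my own.

```python
# Usable PCIe bandwidth per lane, in Gbit/s, after line encoding.
# Gen1 and Gen2 use 8b/10b encoding, Gen3 uses 128b/130b.
PCIE_LANE_GBITS = {
    1: 2.5 * 8 / 10,     # 2.5 GT/s -> 2.0 Gbit/s
    2: 5.0 * 8 / 10,     # 5.0 GT/s -> 4.0 Gbit/s
    3: 8.0 * 128 / 130,  # 8.0 GT/s -> roughly 7.88 Gbit/s
}


def slot_gbits(gen, lanes):
    """Approximate one-direction slot bandwidth, before protocol overhead."""
    return PCIE_LANE_GBITS[gen] * lanes


def fits(gen, lanes, ports, port_gbits):
    """True if the slot can carry every NIC port at line rate at once."""
    return slot_gbits(gen, lanes) >= ports * port_gbits


if __name__ == "__main__":
    for gen, lanes in [(2, 4), (3, 4), (3, 8)]:
        print(
            f"Gen{gen} x{lanes}: {slot_gbits(gen, lanes):.1f} Gbit/s, "
            f"dual 10G ok: {fits(gen, lanes, 2, 10)}, "
            f"dual 25G ok: {fits(gen, lanes, 2, 25)}"
        )
```

Under these assumptions an x4 link cannot feed two 25G ports at once on either generation, while a Gen3 x8 link has headroom, which is why the hunt was for two x8+ slots.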

At this point I decided that a dedicated "router" system was not going to pass the wife test, so I decided to utilise PCI passthrough and place my routing system on a VM within my existing HP Z620 tower running ESXi free edition. This beast of a box now has the second CPU riser hosting a pair of E5-2670s meaning it boasts 32 cores @ 2.6GHz and 192GB ECC RAM in a very quiet case that generates relatively little heat in my hall closet. Compared to the DL360s I have of equal power that sound like jet fighters most of the time, it's fabulous. A similar system can be obtained from Ricardo for around 400CHF. Mine was much less, and the second CPU unit was maybe 150 on eBay in the UK. I don't need all this grunt for this job, and realistically all you need here is two PCI-e x8 slots. I have this system also running TrueNAS for all my media, and tons of networking lab stuff.

I also did an Init7 referral recently, which meant I had a credit outstanding with them for 111CHF, which is exactly the one off upgrade price for the service. In the end, my total out of pocket expense for this upgrade was that Bidi Optic. Everything else was covered*.
(* except for a switch change, but I will cover that later)

25G - an attempt was made

As any networking nerd can attest, MOAR SPEED is always important, even when it isn't. We have simple pleasures in life and for me, getting 25Gb connectivity into my house was a badge of honour I wanted to wear. Of course there are nigh on ZERO internet properties that can and will service you as a client at 25Gbit. SO WHAT? I WANT IT!

Over the following 3 months I literally went over every possible permutation for delivering 25Gbit into the house, WITHOUT failing the wife test. I failed. The problem is not the routing part, or even the compute to handle the NAT/FW features. I was able to prove that all of that is relatively trivial if you pass the 25Gb NIC into the VM, mainly because the NIC does most of the work, and the CPU is now so fast, and the Linux kernel is so quick, that you can do this with modest CPU/RAM. I was able to do all this in VyOS with 4 cores and 8GB RAM. 

I should also do a shout out to Netgate, who as part of this reddit thread, reached out to me directly and offered me a Lab licence for their VPP enhanced router called TNSR, and even spent a bunch of ticket time with me tuning the shit out of my install. Unfortunately their product is still targeting the DC/Edge use cases, and right now it's not really able to do all the things a home user wants, and the lack of native support for IGMP/Multicast was a deal breaker for me. Their base is CentOS8, so I could have fixed that myself, but I just felt it was too snowflakey to last. I will track the project and may even use it in the work setting one day too. 

Ultimately the problem with all these VM solutions was getting that 25Gb out of the VM, and back to the Hypervisor where other 25Gb connected VMs can consume it. Without a 25 Gbit switch, I was screwed. 

As it stands today, there is only one consumer friendly device that offers at least two 25Gb ports - the same device that Init7 and other customers have field tested for 25 Gbit connections, and verified to NOT be able to handle 25G flow rates: the Mikrotik CCR2004. At 500 bucks, to deploy this in a simple switching role would actually be really expensive, since I would need a bunch of 1G copper SFPs as well. 

I looked around and an interesting option was this HPE 4x25G QSFP28 card which seems to lurk online at £65(!). What put me off was my experience of cards like this (the CX5-100G specifically) where the breakout feature doesn't work "outbound" and so I decided against buying one. I still hold a candle for the hope of setting this up with a 4 way passive breakout box. I might revisit this at some point when I have more fun money to spare tho. Outside of this, the only options were ex-DC switching which is still quite expensive both to buy and to run 24x7, and obviously extremely noisy. 

I will return to this when some more 25G gear appears on the market, and my hope is I can configure something like this breakout card with something like an ONL/OVS to switch 25G again via a VM with a passthrough card. Until then....

10G line rate for shits and giggles

Having realised 25G was going nowhere, and this 111CHF credit was burning a hole in my pocket, I switched off the 25G obsession and focussed on something more plausible. I had 10G working in the house already, and I could see the 10G optics were in stock at the supplier. I pulled the trigger and it arrived 3 days later. I was finally ready. 

I decided the best route forward was to migrate my existing config off the CCR1009 to a VyOS VM, and then move the 1G bidi over so that it could be tested and left to run in situ, such that on activation day, all I had to do was swap the optic over. It then gave the added benefit of proving anything that was broken was broken by the optic swap and not the swap from MT to VyOS. As it happens, this was wise because I had to fix a bunch of things in the process. 

The day before I wanted to swap I had a realisation that I needed a second 10G port. The CX4 card is passed in fully to VyOS, and the x710 sits as the Hypervisor uplinks. Originally I wanted to use a VMXNET3 to connect the "inside" to the rest of the network (via the hypervisor uplinks). This way the VMs would get the full 25G and the rest of the network (which is mostly 1G connected) gets to share the 10G port between the H/V and the MT1009. That's when the story gets weird. 

During my 25G tests I was able to switch within the ESXi box, VM to VM, within a single VLAN at 22-24Gbit/s. That is to say VM1 could iperf3 client to VM2 as a server within the same vSwitch Port Group. Unfortunately, when you try to route over a VM from a passed NIC or any other VMXNET3 NIC, your single-thread performance is limited to somewhere between 6 and 8Gbit/s. That is to say the exact same VMs running iperf3 being separated by another VyOS VM using VMXNET3s/X710 or a mixture of those over a 10G switch (a borrowed MT CRS305 in this situation), would always get stuck around the 6-8Gbit/s range. 
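For anyone wanting to repeat this kind of comparison, a minimal Python sketch for pulling the headline number out of an iperf3 run is below. It assumes the client was run with the `-J` flag so that it prints a JSON report, and it reads the receiver-side summary of a TCP test. The sample report is trimmed and entirely made up for illustration; it is not one of my measurements.

```python
import json


def received_gbits(report_text):
    """Return receiver-side throughput in Gbit/s from an `iperf3 -J` TCP report."""
    report = json.loads(report_text)
    return report["end"]["sum_received"]["bits_per_second"] / 1e9


# Trimmed, invented sample of what `iperf3 -c <server> -J` prints.
SAMPLE_REPORT = """
{
  "end": {
    "sum_sent":     {"bits_per_second": 7250000000.0},
    "sum_received": {"bits_per_second": 7200000000.0}
  }
}
"""

if __name__ == "__main__":
    print(f"{received_gbits(SAMPLE_REPORT):.2f} Gbit/s received")
```

Saving one report per topology (same port group, routed via VMXNET3, routed via the passed NIC) makes it easy to line the results up side by side.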

Now. That's not awful, but it felt like I was leaving too much on the shelf here. So I found a very very interesting MT switch, known as the CSS610, which has 8x 1G copper ports and 2x 10G ports, for 100 bucks. This carbon copy replaced the CCR1009, and was cheeeeeeeeeap. 

Therefore, my final hardware installation now is a CSS610 in the middle of the network, with copper out to the rooms as before, one 10G port faces the Hypervisor vSwitch, and one faces the VyOS Passthrough NIC as an "inside" trunk port. The other port of that NIC faces Init7. Having it arrive as a greenfield box also meant I could do a lot of config in parallel, without bothering the family who were using the network all day every day. This was also a good thing, because if I thought the RouterOS interface was crap, the SwitchOS is raw sewage...

The Build

As a somewhat seasoned VyOS user, and coming from a Juniper shop, the configuration started off quickly, and I was able to deploy the core config within an hour. We had a stateful FW, NAT, DHCP and two internal interfaces on the trunk port - "MGMT" and "Home". Having spent another hour shouting at the firewall policies, I took a hint from the VyOS subreddit and swapped all of that stateful FW stuff for Zone-Based Firewalling. At first I hated it, mostly because it's very cumbersome to frame it all up, but having now added another internal interface for my lab space, and a few rules, all with little to no effort, I'm already appreciating it all. It reminds me of the configs we use on our SRXs in the office, and generally, it feels easier to use now that I have figured out the syntax. 
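The "cumbersome to frame up, easy to extend" trade-off comes from the fact that every ordered pair of zones wants its own ruleset, so n zones means n x (n - 1) policies. The Python sketch below generates the scaffolding to show the shape of it. The zone names, interface names and the `SRC-DST` ruleset naming are hypothetical, and the command strings assume the 1.3-style `zone-policy` tree; newer VyOS releases restructured the firewall config, so treat the output as illustrative rather than something to paste in.

```python
from itertools import permutations


def zone_policy_commands(zones):
    """Build zone-policy scaffolding for a {zone_name: [interfaces]} mapping."""
    cmds = []
    for zone, interfaces in zones.items():
        cmds.append(f"set zone-policy zone {zone} default-action drop")
        for iface in interfaces:
            cmds.append(f"set zone-policy zone {zone} interface {iface}")
    # One ruleset per ordered (source, destination) pair of zones.
    for src, dst in permutations(zones, 2):
        cmds.append(
            f"set zone-policy zone {dst} from {src} firewall name {src}-{dst}"
        )
    return cmds


if __name__ == "__main__":
    # Hypothetical layout: a WAN port plus VLAN sub-interfaces on the trunk.
    zones = {"WAN": ["eth0"], "MGMT": ["eth1.10"], "HOME": ["eth1.20"]}
    zones["LAB"] = ["eth1.30"]  # adding a zone is a one-line change here
    for line in zone_policy_commands(zones):
        print(line)
```

Going from three zones to four takes the pair count from 6 to 12, which is painful by hand but trivial once the scaffolding is generated.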

When I took the plunge and moved the optic over, the internet worked instantly, but I had some weird issues with my firewall policies that needed a tickle, and IPv6 was just dead. I eventually figured that out and now we have a nice, clean, basic config. 

Over the next couple of days I tuned out some of the tv7/IGMP things, and then had a minimum viable product that far exceeds my old config. I was able to do an iperf3 towards the Init7 speedtest box (which is now 100g attached - thanks Pascal!), and got line rate 10G both ways. 

I then tried a bunch of other things and typically I see 6-8Gbits on most internet properties. This was bugging me and I went back to some of my tests. The ones where I saw 22Gbits/s were also capping out at 6ish. It then dawned on me. I started all this work around the early summer time. I then spent much of the late summer in the UK since they lifted all the travel restrictions, and when I returned, one thing I brought back was that second CPU riser for the z620. To fit that riser in, I had to remove an NVMe card I had in there, and move all the VMs over to the built in SSD. That SSD is connected with 6Gbit SATA. Inevitably, I now see that download and write to disk caps out at 6ish Gbit/s. I will probably have to buy a dedicated device for either the TrueNAS or the router so that I can put the NVMe back into the hypervisor, and restore the fast writes. My concern is that moving the NAS out will require me to add more 10G ports to the mix, and this makes things even more costly. 
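The bottleneck logic is simple enough to sketch in a few lines of Python: a download that is written to disk can only go as fast as the slower of the network path and the storage link. The one fact baked in here is that SATA uses 8b/10b line encoding, so a 6Gbit/s link carries at most 4.8Gbit/s of data. The function names are my own and the model ignores caching, filesystem overhead and the SSD's own write speed.

```python
def sata_payload_gbits(line_rate_gbits=6.0):
    """Data rate of a SATA link after 8b/10b encoding, in Gbit/s."""
    return line_rate_gbits * 8 / 10


def download_ceiling_gbits(network_gbits, disk_gbits):
    """A download written to disk runs at the slower of the two paths."""
    return min(network_gbits, disk_gbits)


if __name__ == "__main__":
    disk = sata_payload_gbits()
    for network in (1, 10, 25):
        ceiling = download_ceiling_gbits(network, disk)
        print(f"{network:>2}G link, SATA disk: ceiling {ceiling:.1f} Gbit/s")
```

That puts the theoretical ceiling in the same ballpark as the "6ish" cap described above, and it explains why the problem was invisible on the old 1G service.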

So here we are at the end of the story. I now have a theoretical 10Gbit capability in my home, which far exceeds quite a few of the DCs I have worked in over the years. You can argue with me about contention rates and blah, but I trust the information I have from Init7 about their NewCore7 scheme, which for at least a very short period of time I studied quite closely, since I actually applied to work there during the early phases of the project. Yes it's contended, but likely not like mainstream ISPs are. The loop design they have and the way they deploy their backhaul 100G links mean I have confidence it will take a while before I see contention in my service caused by something on the Init7 side of things. Couple that with their peering being immaculate, and most of the congestion I experience is likely at the remote end. As a guy who builds those networks for a living, I know acutely how the compromises are made and in many cases, I don't care that things go "slow" now and again. Everything I experience here is orders of magnitude faster than I used to see in the UK, where the best I ever got was 300Mbit from Virgin Media, which at best, was 50Mbit usable, and regularly in the low ADSL service bracket. 

Somewhat unrelatedly, having observed the battles that Fredy and the team have taken on with Swisscom and the monopolistic behaviours here in Switzerland, I appreciate their efforts, just like ours at Proton, to make the internet a better place. It's great to have a job, but it's even better if your job can be doing something bigger for the greater good. Net Neutrality is of critical importance, and the Init7 team are fighting the good fight. We should support them however we can. For me, I give them my money, and I advocate that anyone else here in CH do the same if they can. 

For those that are interested, I will shortly follow this post with a breakdown of the VyOS config, and when I get time, I will clean up the nornir project I wrote to do the config management and open source it for the community. Maybe it's too niche for some people, but I'll try to keep it as generic as possible and easy to extend. I am also hoping to add VyOS support to Google's Capirca project, as I think generating ACLs is extremely boring, and Capirca abstracts the domain specific nature (i.e. vendor) away to make a more readable source material for firewall rules.

node_exporter in VyOS 1.4

So it turns out that if you want metrics from VyOS, your two options are SNMP or Telegraf (towards InfluxDB).  SNMP is one of those things t...