As I discussed in my 10/25Gbit internet at home post, I recently moved away from an appliance-based router at home to a fancy(ish) NIC passed through directly to a VM. The point of this was to increase throughput whilst keeping a low footprint in terms of power, complexity and cost. We can argue about the complexity bit, but it wasn't complex for me. In this post I will break down the various sections of config to explain how they work and what they do.
First, we have to deal with the VM itself. My hypervisor is VMware ESXi (free licence) running on an HP z620 tower workstation. Guides on how to set up ESXi are ten-a-penny, plus I am sure someone is yelling PROXMOX at their screen right now. I do have that running in a hyperconverged PVE/Ceph NUC cluster btw, I just don't use it for this.
In ESXi I have a Mellanox ConnectX-4 dual-port 25Gbit NIC for VyOS, and an Intel X710-DA2 10G card for the hypervisor vSwitch uplinks. The Mellanox NIC is handed over fully to the VyOS VM, which meant I also had to reserve the VM's full memory assignment. Given I have a lot of spare resource, I chose to assign 4x single-core sockets and 8GB RAM. I am pretty certain I could cut the RAM to just 1GB based on the historical consumption figures, but the CPU seems about right: during heavy downloading the CPU spikes to at least one full core, and heavily multi-threaded downloads at line rate can push it to the limit. I chose single-core virtual sockets following a discussion with the Netgate performance engineers, who specifically advise people with bare-metal installs to disable Hyper-Threading and to pin receive queues to specific cores as well. More on that later.
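On a Linux box, pinning receive queues to cores usually comes down to writing a hexadecimal CPU bitmask to /proc/irq/<N>/smp_affinity for each queue's interrupt. The sketch below only computes those masks. It assumes one queue per core, assigned round-robin. The function name is hypothetical, and the sketch does not look up IRQ numbers or write anything. A running irqbalance daemon may also overwrite manual masks.

```python
def affinity_masks(num_queues: int, num_cpus: int) -> list[str]:
    """Return one smp_affinity-style hex mask per receive queue.

    Queue q is pinned to CPU (q % num_cpus), so every mask has exactly one bit set.
    Limited to 32 CPUs so each mask fits in a single 32-bit hex group.
    """
    if num_queues < 0 or not 1 <= num_cpus <= 32:
        raise ValueError("this sketch supports 0+ queues and 1 to 32 CPUs")
    return [format(1 << (q % num_cpus), "x") for q in range(num_queues)]


if __name__ == "__main__":
    # Four vCPUs, as in this VM, and six hypothetical receive queues.
    print(affinity_masks(6, 4))  # ['1', '2', '4', '8', '1', '2']
```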
For VyOS, I chose to build my own "production image" using the Docker-based build tool. There are a few different opinions on this, but since they started offering the Enterprise Edition, obtaining the free version has become a little janky. The website offers the so-called "rolling" release, which is built every day when newer commits are made to master. This means it is bleeding edge feature-wise and should pass the tests, but it is not necessarily stable. Rather than try a bunch of them and work out what my stable would be, I chose to follow the community recommendation and build an image from the "current" 1.4 branch. The README was all the instructions I needed.
So, having configured the Mellanox NIC for passthrough and rebooted my ESXi box, I configured the VM with a basic VMXNET3 adapter for MGMT plus the passed-through PCI device for the WAN and inside trunk interfaces, then booted my ISO and ran the basic install process.
The Initial Config
Typing loads of things into the VMware Remote Console without copy/paste is sort of annoying, so I always start with a basic config that lets me SSH in and complete the rest in a terminal. First, I double-checked the MAC addresses on the interfaces so that I knew which one was which. Thankfully the MGMT interface came up as eth0, and the CX4 ports were eth1 and eth2. Easy peasy.
Here are the basic commands I ran to sort out remote access:
set interfaces ethernet eth0 description 'MGMT'
set system domain-name 'mydomain.co.uk'
set system host-name 'vyos01'
set service ssh listen-address 10.31.74.1
set service ssh port 22
Commit and save, then swap to a terminal attached to the MGMT LAN. (eth0 also needs its 10.31.74.1 address assigned, otherwise SSH has nothing to listen on.)
The Network Design
Physically, there is quite the sprawl of stuff. I laid out only the assets that use a cabled connection. The Wi-Fi-attached stuff (notably the tons of IoT things for HomeAssistant) would just clutter it all up even further.
When you see it in a picture, it's kinda obvious that whilst logically simple, my network is far, far from simple in practical terms. A lot of it is accumulated crap I should probably destroy, but for now, this is what I am working with.
The Core Config
Here we look at the minimum viable configs: what you need to make VyOS work as a firewall/router in a home setting.
That is all you need to have the router itself talk to the internet. Now we need to set up the client side.
My config has the inside interface configured as a trunk port, with a subinterface (vif) configured for the VLAN:
Comically, that is pretty much all you need to make the simplest router of all time. We don't want that tho. We want something a bit more useful than this!
In my network, I delegate the internal DHCP Server role to a PiHole that runs on a Pi3 in the closet. I do use DHCP in the Management Zone tho, and that lives here on the VyOS box.
Let's assume you want specific IPs on things in that management zone so you can set up polling in the future. Here are the couple of lines you need for a reservation in that segment:
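A reservation only really makes sense if the address sits inside the subnet and, ideally, outside the dynamic range, so the server never leases it to someone else. Below is a minimal sanity check with Python's ipaddress module. It assumes a hypothetical 10.31.74.0/24 MGMT subnet with a .100 to .200 pool. The post itself only shows the router's own 10.31.74.1 address.

```python
import ipaddress


def check_reservation(subnet: str, pool_start: str, pool_stop: str, reserved: str) -> str:
    """Classify a proposed static mapping against its subnet and dynamic pool."""
    net = ipaddress.ip_network(subnet)
    start = ipaddress.ip_address(pool_start)
    stop = ipaddress.ip_address(pool_stop)
    ip = ipaddress.ip_address(reserved)

    if ip not in net:
        return "outside subnet"
    if ip in (net.network_address, net.broadcast_address):
        return "network or broadcast address"
    if start <= ip <= stop:
        return "inside dynamic pool"
    return "ok"


if __name__ == "__main__":
    for candidate in ("10.31.74.20", "10.31.74.150", "10.31.75.20"):
        print(candidate, check_reservation("10.31.74.0/24", "10.31.74.100", "10.31.74.200", candidate))
```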
Logging is important, so let's ensure it's all enabled.
Next up, to make sure our logs are useful, we set up NTP so the clock is synced at all times.
And to ensure that LLDP works both ways, we set this up on our internal interfaces:
I, for reasons of laziness, would also like SSH to work over the Home interface.
Finally, we want the OS to be tuned for performance, cos we are going to use a fast feed (this may require a reboot). A description of what this does is in the docs.
There is not a lot of point having an IPv4 firewall/router if it doesn't do NAT. Here we do all the fun things to make IPv4 source NAT work. I use 77 as a rule ID 'prefix' and then have dedicated rules for each source subnet. There are other ways to do this, but this is more surgical, and verbose is my preference when it comes to things like NAT and firewalls.
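Source NAT rules are evaluated in rule-number order, and the first match decides the translation. Below is a toy model of a per-subnet rule set. It assumes a hypothetical numbering of "77" followed by a two-digit VLAN ID, so VLAN 9 becomes rule 7709, plus a hypothetical second VLAN 20. The post only says that 77 is used as a prefix.

```python
import ipaddress
from typing import Optional


def build_snat_rules(prefix: int, subnets: dict[int, str]) -> dict[int, ipaddress.IPv4Network]:
    """Map hypothetical rule IDs ('77' + two-digit VLAN ID) to source subnets."""
    return {int(f"{prefix}{vlan:02d}"): ipaddress.IPv4Network(net) for vlan, net in subnets.items()}


def matching_rule(rules: dict[int, ipaddress.IPv4Network], src: str) -> Optional[int]:
    """Return the first rule (lowest ID) whose source subnet contains src."""
    ip = ipaddress.IPv4Address(src)
    for rule_id in sorted(rules):
        if ip in rules[rule_id]:
            return rule_id
    return None  # no match: the packet leaves without source NAT


if __name__ == "__main__":
    rules = build_snat_rules(77, {9: "192.168.99.0/24", 20: "192.168.20.0/24"})
    print(sorted(rules))                           # [7709, 7720]
    print(matching_rule(rules, "192.168.99.111"))  # 7709
```

With VLAN IDs above 99 the two-digit suffix no longer lines up (VLAN 100 would give 77100), which is one more way a numbering scheme like this stops scaling.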
Now, assuming we want to allow some inbound services from the internet, we need some destination NAT as well. I picked the rule number for a reason, but I guess it's obvious this idea doesn't scale well. I kinda don't care that much tho ;)
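Destination NAT is the mirror image: a packet arriving on the WAN for a forwarded port has its destination rewritten to the inside host. Below is a minimal model. It assumes eth1 is the WAN port (the CX4 port that isn't the eth2 trunk), with a hypothetical public address 203.0.113.10 and a hypothetical inside host 192.168.99.10 on TCP 443.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    in_iface: str
    proto: str
    dst_ip: str
    dst_port: int


# Hypothetical port-forward table: (inbound interface, protocol, public port) -> (inside host, inside port)
FORWARDS = {
    ("eth1", "tcp", 443): ("192.168.99.10", 443),
}


def apply_dnat(pkt: Packet) -> Packet:
    """Rewrite the destination of a matching inbound packet; leave others untouched."""
    target = FORWARDS.get((pkt.in_iface, pkt.proto, pkt.dst_port))
    if target is None:
        return pkt
    return pkt._replace(dst_ip=target[0], dst_port=target[1])


if __name__ == "__main__":
    print(apply_dnat(Packet("eth1", "tcp", "203.0.113.10", 443)))
```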
Zone Based Firewall
This is going to be a bit more chunky. ZBF is universally agreed to be more scalable, but it is also universally agreed to suck balls to configure. The first build is annoyingly slow, but once it is in, adding interfaces into zones is trivial and policy can be inherited/deduplicated, which I think is a big win. I spent a little bit of time on automating this in Nornir, which I will publish soon. I think I could probably write an entire essay on ZBF concepts, so I will save you my ramblings and refer you to the Docs instead: VyOS Zone-Based Firewall Guide
ZBF - Basics
A lot of this is probably default config, but since VyOS configdb is idempotent, if you paste a command that already exists, it will just skip on past.
I also defined a group for inside networks that can be used later on. Feel free to define as many network and service groups as you like to clean up your config.
ZBF - Policies
So we need policies before we can apply them to zones. The zone assignments refer to the policy names, so let's make them first.
In each policy I use a default-action of drop, which is sort of best practice. In more trusted settings (outbound from LAN to WAN, for example) you might want a default permit, and typically people will use a default-action of accept. I prefer to put a drop default in and then a single permit-all at rule 1. That way, if something horrible happens and I need to block something outbound (malware C2 or somesuch), I can insert a more specific drop rule above it (after renumbering the permit-all, since 1 is the lowest rule ID). I can also change that rule 1 to only permit certain ports or protocols and drop all others. Basically, it is better to set a sane default and override it than to have to change the whole policy in an emergency, which might have undesired consequences, thus compounding an already shitty situation.
I like to put a description in, but let's face it, it's pretty self-describing.
I also enable the default log statement so all firewall rule hits are logged. This is again good best practice. You won't care until you need it, and then future you will thank you.
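The default-drop plus rule 1 pattern can be modelled as a first-match evaluator. Rules are checked in ascending order, and the default-action only applies when nothing matches. The policy name lan-wan and the match lambdas are hypothetical. In VyOS the matching is done by the firewall itself, so this only shows the logic of tightening rule 1 in an emergency.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    action: str                    # "accept" or "drop"
    match: Callable[[dict], bool]  # predicate over a packet description


def evaluate(rules: dict[int, Rule], default_action: str, pkt: dict) -> str:
    """First matching rule (lowest number) wins; otherwise use the default-action."""
    for rule_id in sorted(rules):
        if rules[rule_id].match(pkt):
            return rules[rule_id].action
    return default_action


if __name__ == "__main__":
    smtp = {"proto": "tcp", "dport": 25}

    lan_wan = {1: Rule("accept", lambda p: True)}  # permit-all at rule 1
    print(evaluate(lan_wan, "drop", smtp))         # accept

    # Emergency: narrow rule 1 to web traffic; everything else hits default-action drop.
    lan_wan[1] = Rule("accept", lambda p: p["proto"] == "tcp" and p["dport"] in (80, 443))
    print(evaluate(lan_wan, "drop", smtp))         # drop
```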
ZBF - Policies - Local
Local policies are those that originate or terminate on the VyOS instance directly. We need traffic in both directions (inbound and outbound) from and to the router, so we have four policies here.
Inbound policies (lan-local and wan-local) are all about things talking to the router directly. LAN side I allow anything. WAN side I block everything not related to an existing "in-flight" outbound connection, but I also had to allow the inbound DHCP flows, since these are stateless.
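The wan-local behaviour described above boils down to three checks in order. The sketch below assumes packets are described as dicts carrying a connection-tracking state. The field names are hypothetical, and DHCPv4 is matched on its well-known ports (server 67 to client 68).

```python
def wan_local_action(pkt: dict) -> str:
    """Decide what the wan-local policy does with a packet addressed to the router."""
    # 1. Replies belonging to connections the router itself opened.
    if pkt.get("state") in ("established", "related"):
        return "accept"
    # 2. DHCPv4 from the ISP, matched explicitly by port rather than by state.
    if pkt.get("proto") == "udp" and pkt.get("sport") == 67 and pkt.get("dport") == 68:
        return "accept"
    # 3. Everything else falls through to default-action drop.
    return "drop"


if __name__ == "__main__":
    print(wan_local_action({"state": "new", "proto": "tcp", "sport": 40000, "dport": 22}))  # drop
```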
Outbound I again just permit all the things.
ZBF - Policies - Transit
Transit policies are ones where the flow is designed to transit through the router. In the iptables world these are FORWARD rules.
As well as the default log and description, I have a rule that matches the destination NAT rule we defined previously. Notice how we use the "real" IP of the host on the inside. This can catch out people who are used to firewall rules referring to the mapped IP on the outside of the firewall. I chose the same rule ID as the NAT rule ID, but that is my personal choice; there is no requirement to synchronise these rule IDs.
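The reason the transit rule uses the inside address is ordering: destination NAT rewrites the packet before the transit (forward) filter sees it. Below is a small model of that ordering. It uses hypothetical addresses: public 203.0.113.10 and an inside host 192.168.99.10 on TCP 443.

```python
# Hypothetical DNAT table: (public IP, port) -> (inside IP, port)
DNAT = {("203.0.113.10", 443): ("192.168.99.10", 443)}

# Transit rules keyed on the *inside* address, as described above.
ALLOW_INSIDE = {("192.168.99.10", 443)}


def forward_decision(dst_ip: str, dst_port: int, allow: set[tuple[str, int]] = ALLOW_INSIDE) -> str:
    # Step 1: destination NAT runs first (prerouting) and may rewrite the destination.
    dst_ip, dst_port = DNAT.get((dst_ip, dst_port), (dst_ip, dst_port))
    # Step 2: the transit filter only ever sees the rewritten destination.
    return "accept" if (dst_ip, dst_port) in allow else "drop"


if __name__ == "__main__":
    print(forward_decision("203.0.113.10", 443))                                # accept
    # A rule written against the public address never matches after DNAT.
    print(forward_decision("203.0.113.10", 443, allow={("203.0.113.10", 443)}))  # drop
```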
ZBF - Zones
Now that we have policies, we can assign these policies to zones. Zones and interfaces are a one-to-many mapping, i.e. one Zone can contain many interfaces, but for the avoidance of doubt, an interface can only exist in one Zone ;)
Here we created our three zones, called wan, lan and local. We then assigned the firewall policies for traffic coming inbound into each zone from the others, and finally assigned interfaces to the zones. Note the special "local-zone" flag on the local zone.
A lot of people then ask what happens to traffic between interfaces within one zone, in our example eth0 and eth2.9. These flows are unrestricted; think of them as plain routed interfaces.
Note: MGMT and User traffic? Routed? No FW? WHATTTT? - This is my house, I don't care. Don't @ me.
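The one-zone-per-interface rule and the intra-zone behaviour can be captured in a few lines. The sketch below assumes eth1 is the WAN port and that inter-zone policies follow the source-destination naming used in this post (wan-lan, lan-local and so on). The local zone is left out because it has no member interfaces.

```python
from typing import Optional


def build_iface_map(zones: dict[str, set[str]]) -> dict[str, str]:
    """Invert zone -> interfaces, refusing any interface listed in two zones."""
    iface_zone: dict[str, str] = {}
    for zone, ifaces in zones.items():
        for iface in ifaces:
            if iface in iface_zone:
                raise ValueError(f"{iface} is already in zone {iface_zone[iface]}")
            iface_zone[iface] = zone
    return iface_zone


def policy_for(in_iface: str, out_iface: str, iface_zone: dict[str, str]) -> Optional[str]:
    src, dst = iface_zone[in_iface], iface_zone[out_iface]
    if src == dst:
        return None        # intra-zone: plain routing, no policy applied here
    return f"{src}-{dst}"  # e.g. traffic from wan into lan uses "wan-lan"


if __name__ == "__main__":
    zone_map = build_iface_map({"lan": {"eth0", "eth2.9"}, "wan": {"eth1"}})
    print(policy_for("eth1", "eth2.9", zone_map))  # wan-lan
    print(policy_for("eth0", "eth2.9", zone_map))  # None
```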
At this point you should have a fully operational internet connection on eth0 and eth2.9, providing you with legacy internet access. What's legacy internet access? IPv4 only, of course!
IPv6
First, I should point out that my approach here is rather opinionated. Some will argue it is literally stupid. I also do not care about this ;)
The point of IPv6 was to deliver enough internet addressing that we could never run out like we have with IPv4. A side benefit was that NAT was essentially deprecated, since we don't need to share one public IP among many clients inside a LAN zone. For many, the obvious benefit is that all devices in your LAN get a publicly routable IPv6 address, and you control the flows at the border with a firewall, just like normal. But in my workplace we have dual internet feeds, and we get IPv6 addressing from both. Which IP address pool should we use to assign IPv6 addresses to clients? If both, which should a client machine use to route outbound?
In some settings, the SLAAC addressing system makes a lot of sense, but in an enterprise, I still need concrete services, which means predictable, reliable DNS/IPs. Ultimately, I will find myself using some form of static addressing. Now, what happens when I change ISP? That /48 I got from the first one is now not mine, and I have to re-address my entire LAN. This is unsustainable.
Thankfully, someone thought of all of these use cases and came up with Unique Local Addressing (ULA) and Prefix Translation (sometimes known as NAT66) as a solution. With ULA, I can generate (and optionally register) a ULA prefix that should be globally unique for my site, and use that for all internal addressing purposes. I then configure NAT66 to swap the first 48 or 64 bits of my 128-bit IPv6 address for the corresponding bits of the IPv6 prefix delegation we received from upstream. If the ISP assigns us something new, or we change ISP (via failover or physically replacing a supplier), the prefix substitution just does the work for me. So inside my LAN I have a private address from a single site-specific prefix, and as traffic heads out of my LAN it gets an otherwise identical address, but with an internet-routable prefix.
For example, inside my LAN my PC looks like this:
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet6 fe80::ca2:34c4:8b00:daa8%en0 prefixlen 64 secured scopeid 0x4
inet6 fda4:7911:df45:9:4cc:2a5a:8ee5:a917 prefixlen 64 autoconf secured
inet 192.168.99.111 netmask 0xffffff00 broadcast 192.168.99.255
media: autoselect (1000baseT <full-duplex>)
but on the internet, I am seen as:
% curl -6 https://ifconfig.co
All that happened is we swapped that first 48 bits.
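The substitution itself is simple bit arithmetic: keep the low 80 bits of the address and replace the top 48. Below is a sketch with Python's ipaddress module. It uses the documentation prefix 2001:db8:1234::/48 as a hypothetical stand-in for the delegated /48. It does the plain swap described here. Strict RFC 6296 NPTv6 additionally adjusts one 16-bit word to stay checksum-neutral, so an implementation that follows that RFC may not leave the rest of the address untouched.

```python
import ipaddress


def swap_prefix(addr: str, inside: str, outside: str) -> ipaddress.IPv6Address:
    """Replace the inside prefix of addr with the outside prefix, keeping the host bits."""
    inside_net = ipaddress.IPv6Network(inside)
    outside_net = ipaddress.IPv6Network(outside)
    if inside_net.prefixlen != outside_net.prefixlen:
        raise ValueError("prefix lengths must match for a 1:1 mapping")
    ip = ipaddress.IPv6Address(addr)
    if ip not in inside_net:
        raise ValueError(f"{ip} is not in {inside_net}")
    host_bits = int(ip) & int(inside_net.hostmask)  # the low (128 - prefixlen) bits
    return ipaddress.IPv6Address(int(outside_net.network_address) | host_bits)


if __name__ == "__main__":
    print(swap_prefix("fda4:7911:df45:9:4cc:2a5a:8ee5:a917",
                      "fda4:7911:df45::/48", "2001:db8:1234::/48"))
```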
Now, some gamers might be screaming at me right now. I have no idea if this is good or bad for you guys, but given it is a 1:1 mapping, it is probably OK tbh. We will see in the comments, I guess.
IPv6 - Interfaces
So first we need to enable DHCPv6 on our internet interface and request an IPv6 prefix delegation that we will refer to later on our inside interface. Init7 offer a /48, and in my case, whilst it is assigned via DHCPv6, the assignment is reserved for me in their system. It's logically static.
So the WAN uses SLAAC for the interface IP, and on top of that it requests a /48 from Init7's DHCPv6 servers.
I assume you have gone and got your own ULA prefix and registered it. Don't use mine!
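RFC 4193 builds a ULA /48 from the fd00::/8 prefix plus a 40-bit pseudo-random Global ID. Below is a minimal generator. It uses Python's secrets module as the random source instead of the SHA-1-based example algorithm described in the RFC.

```python
import ipaddress
import secrets


def random_ula_48() -> ipaddress.IPv6Network:
    """Return an fdXX:XXXX:XXXX::/48 prefix with a random 40-bit Global ID."""
    global_id = secrets.randbits(40)
    prefix = (0xFD << 120) | (global_id << 80)  # 8-bit fd prefix, then the 40-bit Global ID
    return ipaddress.IPv6Network((prefix, 48))


if __name__ == "__main__":
    print(random_ula_48())
```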
Here I am cheating and statically assigning the :9::1/64 from our site /48 to the vlan 9 interface. I follow this scheme for all VLANs since I am unimaginative.
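The :9::1/64 scheme puts the VLAN ID into the 16-bit subnet field of the /48 and uses ::1 for the router. Below is a sketch of that calculation, using the ULA prefix shown in this post. The function name is hypothetical.

```python
import ipaddress


def vlan_gateway(site_prefix: str, vlan_id: int) -> ipaddress.IPv6Interface:
    """Router address (::1/64) for a VLAN, with the VLAN ID as the subnet field."""
    site = ipaddress.IPv6Network(site_prefix)
    if site.prefixlen != 48:
        raise ValueError("expected a /48 site prefix")
    if not 0 <= vlan_id <= 0xFFFF:
        raise ValueError("VLAN ID must fit in the 16-bit subnet field")
    subnet = int(site.network_address) | (vlan_id << 64)
    return ipaddress.IPv6Interface((subnet + 1, 64))


if __name__ == "__main__":
    print(vlan_gateway("fda4:7911:df45::/48", 9))  # fda4:7911:df45:9::1/64
```

One thing to watch: the VLAN ID lands in that field as a number, so VLAN 10 becomes :a::1. If you'd rather the hextet read 10, pass int(str(vlan_id), 16) instead.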
IPv6 - NAT66
Here we do the magic prefix switcheroo. I have a rule for each vlan ID so I can swap the /64 in a granular way. Realistically, I could compress this to a /48 on a single rule, but again, I like to be verbose so people know exactly what is happening at each stage.
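Since each /64 keeps the same subnet ID on both sides, a set of per-VLAN rules gives exactly the same result as one /48 rule. The per-VLAN form just spells each mapping out. Below is a sketch that generates those pairs. It uses the ULA prefix from this post and the documentation prefix 2001:db8:1234::/48 as a hypothetical stand-in for the delegated /48.

```python
import ipaddress


def per_vlan_mappings(ula48: str, public48: str, vlans: list[int]) -> dict[int, tuple[str, str]]:
    """Build one inside/outside /64 pair per VLAN, using the VLAN ID as the subnet field."""
    ula = ipaddress.IPv6Network(ula48)
    pub = ipaddress.IPv6Network(public48)
    if ula.prefixlen != 48 or pub.prefixlen != 48:
        raise ValueError("expected two /48 prefixes")
    mappings = {}
    for vlan in vlans:
        if not 0 <= vlan <= 0xFFFF:
            raise ValueError(f"VLAN {vlan} does not fit in the 16-bit subnet field")
        offset = vlan << 64  # the subnet ID occupies bits 64..79 of the address
        inside = ipaddress.IPv6Network((int(ula.network_address) + offset, 64))
        outside = ipaddress.IPv6Network((int(pub.network_address) + offset, 64))
        mappings[vlan] = (str(inside), str(outside))
    return mappings


if __name__ == "__main__":
    print(per_vlan_mappings("fda4:7911:df45::/48", "2001:db8:1234::/48", [9]))
```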
IPv6 - Router Advertisements
Set up Router Advertisements on your inside interface, so that internal clients get a random SLAAC address from the ULA pool, plus DNS servers.
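With SLAAC, the client takes the /64 from the RA and appends a 64-bit interface identifier of its own choosing. Below is a rough sketch of the random case, assuming the VLAN 9 ULA /64 from this post. Real operating systems use algorithms such as those in RFC 7217 or RFC 8981 and avoid reserved identifiers, which this sketch does not.

```python
import ipaddress
import secrets


def random_slaac_address(prefix64: str) -> ipaddress.IPv6Address:
    """Append a random 64-bit interface identifier to an advertised /64."""
    net = ipaddress.IPv6Network(prefix64)
    if net.prefixlen != 64:
        raise ValueError("SLAAC needs a /64 prefix")
    iid = secrets.randbits(64)
    return ipaddress.IPv6Address(int(net.network_address) | iid)


if __name__ == "__main__":
    print(random_slaac_address("fda4:7911:df45:9::/64"))
```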
IPv6 - Firewalling
Here it is essentially a copy-paste of the v4 rules from earlier, adapted for IPv6.
set firewall ipv6-name local-wan-6 default-action 'drop'
I think most of this is self-explanatory, the one small exception being that we had to swap our DHCPv4 rule for a DHCPv6 rule in the wan-local-6 policy. Same problem, different execution.
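The swap is mostly a change of port numbers. DHCPv4 servers use UDP 67 and clients 68, while DHCPv6 servers and relay agents listen on UDP 547 and clients on 546. Below is a sketch of a port-based match for both families. The function name is hypothetical. It assumes replies are sourced from the server port, which is typical but worth checking against your ISP.

```python
# Well-known DHCP ports per address family: (server port, client port).
DHCP_PORTS = {4: (67, 68), 6: (547, 546)}


def is_dhcp_to_client(family: int, proto: str, sport: int, dport: int) -> bool:
    """True if a packet looks like DHCP traffic from a server/relay to this client."""
    server_port, client_port = DHCP_PORTS[family]
    return proto == "udp" and sport == server_port and dport == client_port


if __name__ == "__main__":
    print(is_dhcp_to_client(6, "udp", 547, 546))  # True
```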
IPv6 - Zones
As before, we assign our policies to our zones.
At this point you should now have a fully operational dual-stack router.
TV7 - Multicast
Last on my list here is the multicast work to allow TV7 to work on wired clients in VLAN 9. You can add other VLANs if you like.
First we need an IGMP proxy. I translated these settings from the EdgeRouter configs shared by Init7 themselves. EdgeOS, the EdgeRouter's OS, is a fork of Vyatta just like VyOS, so the config should translate almost directly. I found I didn't need the alt-subnet on the downstream side tho.
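Conceptually, an IGMP proxy (RFC 4605) tracks which groups hosts on the downstream interfaces have joined. It reports the combined membership on the upstream interface and forwards each group only where it has members. Below is a toy model. It assumes eth1 is upstream and eth2.9 downstream, with a hypothetical group 239.1.1.1 that is not one of Init7's actual TV7 groups.

```python
import ipaddress
from collections import defaultdict


class IgmpProxy:
    """Toy IGMP proxy: aggregate downstream joins and mirror them upstream."""

    def __init__(self, upstream: str, downstreams: set[str]) -> None:
        self.upstream = upstream
        self.downstreams = downstreams
        self.members = defaultdict(set)  # group -> downstream interfaces with members

    def join(self, iface: str, group: str) -> None:
        if iface not in self.downstreams:
            raise ValueError(f"{iface} is not a downstream interface")
        if not ipaddress.IPv4Address(group).is_multicast:
            raise ValueError(f"{group} is not a multicast group")
        self.members[group].add(iface)

    def leave(self, iface: str, group: str) -> None:
        members = self.members.get(group)
        if members is None:
            return
        members.discard(iface)
        if not members:
            del self.members[group]  # last member gone: stop subscribing upstream

    def upstream_groups(self) -> set[str]:
        """Groups the proxy reports as joined on the upstream interface."""
        return set(self.members)

    def forward_to(self, group: str) -> set[str]:
        """Downstream interfaces that should receive traffic for this group."""
        return set(self.members.get(group, set()))


if __name__ == "__main__":
    proxy = IgmpProxy("eth1", {"eth2.9"})
    proxy.join("eth2.9", "239.1.1.1")
    print(proxy.upstream_groups(), proxy.forward_to("239.1.1.1"))
```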
Secondly we need firewall rules. The rules from init7 are pretty broad, so I went more specific.
And that is it. A fully operational VyOS config for Init7.
Please let me know your comments and thoughts. I am always willing to learn.