Sunday, 28 November 2021

10/25 Gbit Internet at Home - a very 21st Century Problem

When I first arrived in Switzerland in 2017, aside from the clean air, beautiful countryside, and the promise of a highly paid job with which to support my family, I was transfixed by something that seemed completely alien to me - Fibre to the Home. Yes, I am a nerd, and yes, somewhat in parallel with a comfortable living space and good local schools, I made sure each house viewing included a search for an OTO (the optical telecom outlet). No fibre, no way.

When choosing an ISP, colleagues had recommended I only consider Init7, since it was an ISP by nerds, for nerds. No-brainer, I guess? On top of the promise of "internet as it should have always been", coupled with a highly technical service desk that talks my language, they also offer something they call the "MaxFix guarantee". In short, they will offer you a service for a fixed price (77 CHF/month), and for this miserly sum, you get the fastest speeds possible at your location. If you are in a weird off-net place, this might mean DSL or PPPoE, which as a Brit I am extremely used to already. In FTTH locations, it means 1 Gbit symmetrical, something I had only ever observed in datacentre settings for thousands of pounds per month.

Words cannot describe the childish levels of excitement on that first day when I ran a speedtest at 980 Mbit/s up and down. It's entirely pathetic, and I sincerely do not care either.

In the buildup to that moment, I spent a few weeks shopping around for 1 Gbit-capable hardware, and I suppose I wasn't that shocked to find it was all pretty expensive. Four years ago there was basically nothing that handled symmetric gigabit plus FW/NAT and IGMP. Ubiquiti had their EdgeRouter 4, which had mixed reviews at these speeds, and Init7 themselves recommended the Turris Omnia (OpenWRT) for tinkerers and the MikroTik CCR for the more business end of the market. I flirted with the idea of pfSense/OPNsense and even raw Debian with nftables, but given it was for "production" use at home, I felt this was likely to fail the wife test. If it went wrong and I wasn't home, it was likely not something they could fix alone.

Since I wanted to make use of the tv7 offering as well, which means multicast, I chose to stick to the examples on the Init7 site, which focussed on MikroTik. I bought a CCR1009 for about 400 bucks and the media converter plus optic/fibre set from Init7 for 111 CHF. This way, if the MikroTik didn't work out, I always had a copper connection to fall back on.

I last worked with MikroTiks maybe 15 years ago in a rural wireless ISP setting, where they were the main option for community-funded projects like ours. Whilst the feature list has grown with all the modern bells and whistles, the interface is still trash, and whilst it has an API now, that is also pretty garbage. I disliked it greatly, but I put up with it because it was relatively cheap and did the job I needed it to. I was able to use the 7 copper ports to connect the rooms of my house over the built-in Cat6 patching, and use the single 10G SFP+ port to connect to my ESXi hypervisor.

So, for the last 4 years or so this has been my setup - the CCR as the core, ESXi for all the other things, and then a few Netgear GS108Ev3 switches out in the rooms to connect all the various crap I have around the house. Wifi comes from old Cisco 1702 (802.11ac) APs at each end of the apartment. This managed to survive Covid WFH, my wife with her UK TV obsession, and my daughter growing up to be a Netflix addict - always providing a reliable, stable, performant internet service.

Then, as the Fresh Prince of Bel-Air once said, my life got flipped, turned upside down. On Init7's 7th birthday, they dropped the option of 10 and 25 Gbit upgrades to their existing 1G Fiber7 service, all within the MaxFix guarantee, and thus at no additional monthly cost. So, as a reminder, I am a nerd, but specifically, I am a networking nerd. I run global-scale network infrastructure for a major privacy-focussed email provider. I live and breathe huge connectivity, and this was too tantalising to ignore. And so began the race to 25 Gbit at home.

Hardware

Firstly, I have a hardware lab in the basement that I collated during the Covid period for testing random things without having to visit a DC. Among three servers I found Mellanox ConnectX-4 dual-port cards, which meant I was all good for 25G at the physical layer. These can be bought for anything between 100 and 350 CHF, depending on the generation and source (CX-4s can be found for 100-ish bucks online, CX-5s can be bought brand new from fs.com). In the 10G space I found a very desirable Intel X710-DA2 and a bunch of X520-DA2s as well. I was spoiled for choice, I guess you could say.

Secondly, Init7 use BiDi optics so that each subscriber occupies only a single strand of fibre in the ground. 10G BiDis are about 100 bucks, and 25G BiDis are about 350 bucks. This ended up being the top cost for me.

Lastly, I needed some compute. I'm a big fan of the HP Z series workstations because they are powerful (Xeon) but quiet. For this job, I found a Z220 SFF box on Ricardo for 139 CHF which appeared to have two PCIe x16 slots. When I started to dig into the details, however, I found that one is "limited" to GFX workloads, and the second is x16 "mechanical" but only x4 "electrical". In other words, it's a lie. I must have then spent dozens of evenings digging into the spec sheets of god knows how many systems, hoping to find an SFF box that provided two x8+ slots. The best options I found were Supermicro boxes that were hard to source second-hand and, new, cost over 1000 CHF. I also tend to find Supermicros are noisy, since they expect to be in a computer room, not the home.

At this point I decided that a dedicated "router" system was not going to pass the wife test, so I decided to utilise PCI passthrough and place my routing system in a VM on my existing HP Z620 tower running ESXi free edition. This beast of a box now has the second CPU riser hosting a pair of E5-2670s, meaning it boasts 16 cores / 32 threads @ 2.6 GHz and 192 GB of ECC RAM in a very quiet case that generates relatively little heat in my hall closet. Compared to the DL360s I have of equal power that sound like jet fighters most of the time, it's fabulous. A similar system can be obtained from Ricardo for around 400 CHF. Mine was much less, and the second CPU unit was maybe 150 on eBay in the UK. I don't need all this grunt for this job; realistically all you need here is two PCIe x8 slots. I also have this system running TrueNAS for all my media, and tons of networking lab stuff.

I also did an Init7 referral recently, which meant I had a credit outstanding with them for 111 CHF, which is exactly the one-off upgrade price for the service. In the end, my total out-of-pocket expense for this upgrade was that BiDi optic. Everything else was covered*.
(* except for a switch change, but I will cover that later)

25G - an attempt was made

As any networking nerd can attest, MOAR SPEED is always important, even when it isn't. We have simple pleasures in life, and for me, getting 25 Gbit connectivity into my house was a badge of honour I wanted to wear. Of course there are nigh on ZERO internet properties that can and will serve you as a client at 25 Gbit. SO WHAT? I WANT IT!

Over the following 3 months I went over literally every possible permutation for delivering 25 Gbit into the house WITHOUT failing the wife test. I failed. The problem is not the routing part, or even the compute to handle the NAT/FW features. I was able to prove that all of that is relatively trivial if you pass the 25G NIC through to the VM, mainly because the NIC does most of the work, and CPUs and the Linux kernel are now so fast that you can do this with modest CPU/RAM. I was able to do all of this in VyOS with 4 cores and 8 GB of RAM.
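To give a sense of how little config the routing/NAT part actually needs, here is a minimal sketch in VyOS `set` syntax. The interface name and LAN subnet are placeholders for illustration, not my actual config:

```
# masquerade everything from the LAN out of the WAN port;
# conntrack and the NIC offloads do the heavy lifting
set nat source rule 100 outbound-interface 'eth0'
set nat source rule 100 source address '192.168.1.0/24'
set nat source rule 100 translation address 'masquerade'

# stateful inbound policy: drop everything except return traffic
set firewall name OUTSIDE-IN default-action 'drop'
set firewall name OUTSIDE-IN rule 10 action 'accept'
set firewall name OUTSIDE-IN rule 10 state established 'enable'
set firewall name OUTSIDE-IN rule 10 state related 'enable'
```

With a passed-through NIC, the kernel forwards between queues on the same card, which is why 4 cores and 8 GB were plenty.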

I should also give a shout-out to Netgate who, as part of this reddit thread, reached out to me directly and offered me a lab licence for their VPP-enhanced router called TNSR, and even spent a bunch of ticket time with me tuning the shit out of my install. Unfortunately their product still targets the DC/edge use cases, and right now it's not really able to do all the things a home user wants - the lack of native IGMP/multicast support was a deal-breaker for me. Their base is CentOS 8, so I could have fixed that myself, but I just felt it was too snowflakey to last. I will track the project and may even use it in a work setting one day too.

Ultimately, the problem with all these VM solutions was getting that 25 Gbit out of the VM and back to the hypervisor, where other 25 Gbit-connected VMs could consume it. Without a 25 Gbit switch, I was screwed.

As it stands today, there is only one consumer-friendly device that offers at least two 25 Gbit ports - the same device that Init7 and other customers have field-tested for 25 Gbit connections, and verified to NOT be able to handle 25G flow rates: the MikroTik CCR2004. At 500 bucks, deploying this in a simple switching role would actually be really expensive, since I would need a bunch of 1G copper SFPs as well.

I looked around, and an interesting option was this HPE 4x25G QSFP28 card, which seems to lurk online at £65(!). What put me off was my experience of cards like this (the CX5-100G specifically), where the breakout feature doesn't work "outbound", so I decided against buying one. I still hold a candle for the hope of setting this up with a 4-way passive breakout box; I might revisit it at some point when I have more fun money to spare. Outside of this, the only options were ex-DC switching, which is still quite expensive both to buy and to run 24x7, and obviously extremely noisy.

I will return to this when some more 25G gear appears on the market; my hope is I can combine something like this breakout card with ONL/OVS to switch 25G, again via a VM with a passthrough card. Until then....

10G line rate for shits and giggles

Having realised 25G was going nowhere, and that 111 CHF credit was burning a hole in my pocket, I switched off the 25G obsession and focussed on something more plausible. I had 10G working in the house already, and I could see the 10G optics were in stock at the supplier. I pulled the trigger and it arrived 3 days later. I was finally ready.

I decided the best route forward was to migrate my existing config off the CCR1009 to a VyOS VM, and then move the 1G BiDi over so that it could be tested and left to run in situ, such that on activation day all I had to do was swap the optic over. It gave the added benefit of proving that anything broken was broken by the optic swap, not by the move from MikroTik to VyOS. As it happens, this was wise, because I had to fix a bunch of things in the process.

The day before I wanted to swap, I realised that I needed a second 10G port. The CX4 card is passed fully into VyOS, and the X710 serves as the hypervisor uplinks. Originally I wanted to use a VMXNET3 NIC to connect the "inside" to the rest of the network (via the hypervisor uplinks). This way the VMs would get the full 25G, and the rest of the network (which is mostly 1G-connected) would share the 10G port between the hypervisor and the CCR1009. That's when the story gets weird.

During my 25G tests I was able to switch within the ESXi box, VM to VM, within a single VLAN at 22-24 Gbit/s. That is to say, VM1 could run an iperf3 client against VM2 as a server within the same vSwitch port group. Unfortunately, when you route via a VM - through a passed-through NIC or any VMXNET3 NIC - single-stream performance is limited to somewhere between 6 and 8 Gbit/s. That is to say, the exact same VMs running iperf3, separated by another VyOS VM using VMXNET3s/X710 or a mixture of those over a 10G switch (a borrowed MikroTik CRS305 in this situation), would always get stuck in the 6-8 Gbit/s range.
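For anyone wanting to reproduce this kind of test, the methodology was plain iperf3. The addresses below are placeholders, and the `-P` run is worth doing to check whether a cap is per-flow or aggregate:

```
# on VM2, run the server side
iperf3 -s

# on VM1, a 30-second single-stream test towards VM2
iperf3 -c 10.0.10.2 -t 30

# the same test with 4 parallel streams - if the aggregate rises
# while each stream stays low, the bottleneck is per-flow
iperf3 -c 10.0.10.2 -t 30 -P 4
```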

Now, that's not awful, but it felt like I was leaving too much on the table here. So I found a very, very interesting MikroTik switch, the CSS610, which has 8x 1G copper ports and 2x 10G SFP+ ports for 100 bucks. This replaced the CCR1009 port-for-port, and was cheeeeeeeeeap.

Therefore, my final hardware installation is a CSS610 in the middle of the network, with copper out to the rooms as before; one 10G port faces the hypervisor vSwitch, and the other faces the VyOS passthrough NIC as an "inside" trunk port. The other port of that NIC faces Init7. Having it arrive as a greenfield box also meant I could do a lot of config in parallel, without bothering the family, who were using the network all day, every day. This was also a good thing, because if I thought the RouterOS interface was crap, SwitchOS is raw sewage...

The Build

As a somewhat seasoned VyOS user coming from a Juniper shop, the configuration started off quickly, and I was able to deploy the core config within an hour. That gave us a stateful FW, NAT, DHCP, and two internal interfaces on the trunk port - "MGMT" and "Home". Having spent another hour shouting at the firewall policies, I took a hint from the VyOS subreddit and swapped all of that stateful FW stuff for zone-based firewalling. At first I hated it, mostly because it's very cumbersome to frame it all up, but having now added another internal interface for my lab space, and a few rules, all with little to no effort, I'm already appreciating it. It reminds me of the configs we use on our SRXs in the office, and generally it feels easier to use now that I've figured out the syntax.
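To give a flavour of the zone-based style (a sketch with made-up zone, interface, and VLAN names, not my live config): every interface belongs to exactly one zone, and traffic only flows between zones where you attach a policy, which is the "cumbersome to frame up, easy to extend" trade-off:

```
# assign interfaces to zones
set zone-policy zone WAN interface 'eth0'
set zone-policy zone HOME interface 'eth1.20'
set zone-policy zone MGMT interface 'eth1.10'

# ruleset applied to traffic entering HOME from WAN
set firewall name WAN-HOME default-action 'drop'
set firewall name WAN-HOME rule 10 action 'accept'
set firewall name WAN-HOME rule 10 state established 'enable'
set firewall name WAN-HOME rule 10 state related 'enable'
set zone-policy zone HOME from WAN firewall name 'WAN-HOME'
```

Adding a new segment later is then just a new zone plus the handful of zone-pair policies you actually want, rather than re-plumbing per-interface rules.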

When I took the plunge and moved the optic over, the internet worked instantly, but I had some weird issues with my firewall policies that needed a tickle, and IPv6 was just dead. I eventually figured that out and now we have a nice, clean, basic config. 
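For anyone who hits the same dead IPv6: on Fiber7 it is delivered via DHCPv6 prefix delegation, and in VyOS that boils down to something like the following. The interface names and SLA ID are assumptions for illustration, and the delegated prefix length should be checked against Init7's own docs:

```
# request a delegated prefix on the WAN port (assuming a /48 delegation)
set interfaces ethernet eth0 dhcpv6-options pd 0 length '48'
# carve a /64 out of it for the inside VLAN
set interfaces ethernet eth0 dhcpv6-options pd 0 interface eth1.20 sla-id '1'
# announce the prefix to LAN hosts via router advertisements
set service router-advert interface eth1.20 prefix '::/64'
```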

Over the next couple of days I ironed out some of the tv7/IGMP things, and then had a minimum viable product that far exceeds my old config. I was able to run iperf3 towards the Init7 speedtest box (which is now 100G-attached - thanks, Pascal!) and got line-rate 10G both ways.
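The tv7 piece boils down to VyOS's igmp-proxy: the WAN side is the upstream, the inside trunk is downstream, and the alt-subnet is needed because the multicast sources live outside your local subnet. Interface names here are assumptions:

```
# forward IGMP joins from the LAN towards Init7's multicast sources
set protocols igmp-proxy interface eth0 role 'upstream'
set protocols igmp-proxy interface eth0 alt-subnet '0.0.0.0/0'
set protocols igmp-proxy interface eth1.20 role 'downstream'
```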

I then tried a bunch of other things, and typically I see 6-8 Gbit/s from most internet properties. This was bugging me, so I went back to some of my earlier tests. The ones where I had seen 22 Gbit/s were now also capping out at 6-ish. Then it dawned on me. I started all this work around early summer. I then spent much of the late summer in the UK, since they had lifted all the travel restrictions, and when I returned, one thing I brought back was that second CPU riser for the Z620. To fit that riser in, I had to remove an NVMe card I had in there, and move all the VMs over to the built-in SSD. That SSD is connected over 6 Gbit SATA. Invariably, I now see that downloading and writing to disk caps out at 6-ish Gbit/s. I will probably have to buy a dedicated device for either the TrueNAS or the router so that I can put the NVMe back into the hypervisor and restore the fast writes. My concern is that moving the NAS out will require me to add more 10G ports to the mix, and that makes things even more costly.

So here we are at the end of the story. I now have a theoretical 10 Gbit capability in my home, which far exceeds quite a few of the DCs I have worked in over the years. You can argue with me about contention ratios and blah, but I trust the information I have from Init7 about their NewCore7 scheme, which for at least a very short period of time I studied quite closely, since I actually applied to work there during the early phases of the project. Yes, it's contended, but likely not like mainstream ISPs are. The loop design they have and the way they deploy their backhaul 100G links give me confidence that it will be a while before I see contention in my service caused by something on the Init7 side of things. Couple that with their peering being immaculate, and most of the congestion I experience is likely at the remote end. As a guy who builds these networks for a living, I know acutely how the compromises are made, and in many cases I don't care that things go "slow" now and again. Everything I experience here is orders of magnitude faster than I was used to in the UK, where the best I ever got was 300 Mbit from Virgin Media, which at best was 50 Mbit usable, and regularly in the low ADSL service bracket.

Somewhat unrelatedly, having observed the battles that Fredy and the team have taken on with Swisscom and the monopolistic behaviours here in Switzerland, I appreciate their efforts, just like ours at Proton, to make the internet a better place. It's great to have a job, but it's even better if your job can be doing something bigger for the greater good. Net neutrality is of critical importance, and the Init7 team are fighting the good fight. We should support them however we can. For me, I give them my money, and I advocate that anyone else here in CH do the same if they can.

For those who are interested, I will shortly follow this post with a breakdown of the VyOS config, and when I get time, I will clean up the Nornir project I wrote to do the config management and open-source it for the community. Maybe it's too niche for some people, but I'll try to keep it as generic as possible and easy to extend. I am also hoping to add VyOS support to Google's Capirca project, as I think generating ACLs is extremely boring, and Capirca abstracts the domain-specific nature (i.e. vendor) away to make a more readable source for firewall rules.

