Tuesday, 20 October 2020

The even-ended number problem in Go and Python

During the Go Essential Training course on LinkedIn, the instructor sets up a problem for you to solve. The solution is on the next slide of the course, and mine was ever so slightly different anyway, so I doubt that this needs to come with a spoiler alert, but still...

An even-ended number is one that starts and ends with the same digit. That is to say 1, 11, 121, and 10103921 are all even-ended numbers.

The problem posed is: how many even-ended numbers can be found among the products of all combinations of two four-digit numbers?

In other words, if you multiply all the numbers in the range 1000-9999 by all the numbers in the range 1000-9999, how many of the results are even-ended numbers?

The Go solution I came up with first generated about 6 million. This was when I started to wonder if I was double counting, since 1001x1011 and 1011x1001 are both even-ended but also the same result, meaning I counted 2 when I only needed to count 1. Rather than just cheat and divide by 2, I instead made the second loop start from the current value of the first loop, to ensure that we only count each pair once.
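The pair-counting logic described above can be sketched in Python roughly as follows. This is a minimal illustration, not the listing from the course or from this post: the function names is_even_ended and count_even_ended, and the lo/hi parameters, are made up for the example.

```python
def is_even_ended(n: int) -> bool:
    """Return True if the first and last decimal digits of n match."""
    s = str(n)
    return s[0] == s[-1]


def count_even_ended(lo: int, hi: int) -> int:
    """Count even-ended products a*b for lo <= a <= b <= hi.

    Starting the inner loop at `a` means each unordered pair is
    visited once, so 1001x1011 and 1011x1001 are not both counted.
    """
    count = 0
    for a in range(lo, hi + 1):
        for b in range(a, hi + 1):  # seed the inner loop from the outer one
            if is_even_ended(a * b):
                count += 1
    return count


if __name__ == "__main__":
    # Tiny range so the example finishes instantly; the full
    # 1000..9999 range takes far longer in pure Python.
    print(count_even_ended(10, 12))
```

Simply halving the naive count would also be slightly off, because squares such as 1001x1001 are only produced once by the naive double loop, while every other pair is produced twice.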

Here is the Go Solution I wrote:

Checking that against the solution in the course, it's pretty much identical, and the result was the same. I was then telling a friend how much quicker it was without all the debug fmt.Println() crap I had used to validate each stage of the loop. His response was that golang is mad fast and to compare it to Python3.

Challenge accepted. Given I already knew the logic I needed, I sort of shocked myself that I was able to port this to Python3 in essentially the time it took me to retype it. Save for one typo, it ran first time.

Here is the Python3 Solution I wrote: 

What was a real eye opener for me, was the runtime stats. 

% time go run even-numbers.go
We found 3184963 even-ended numbers
go run even-numbers.go  6.09s user 0.50s system 106% cpu 6.202 total

% time python3 even_numbers.py
3184963
python3 even_numbers.py  59.78s user 0.20s system 99% cpu 1:00.03 total

Literally a 10x speed difference between Python3 and Golang: about 6 seconds against a full minute. Bonkers.

I can see why people love Go as an alternative to Python. Both are completely portable between operating systems and architectures (caveat emptor notwithstanding), and both are very readable and approachable. Go is just MUCH faster.

I'm enjoying the learning experience so far and apart from endlessly using pythonic syntax within the go space almost every single day, I'm making a lot of progress. 

I still expect my first "real" golang project will be an API service replacing something old and crusty I already have in python. When I am ready to go there (geddit!), I will be sure to write it up here of course.

A Modern NetDevOps learning plan

Now that I have a load of free time I have set up a sort of a learning plan to plug some gaps in my knowledge that I always wanted to plug, but didn't have the time to do properly.

Mindful that I have an opportunity for learning without external influences on my time, but also that my daughter only has Kindergarten until lunchtime every day, I really only get the morning before the family time kicks in.

So, every morning I do my German spoken practice while walking the dog. This clearly amuses various folks, but maybe in a few weeks I'll be able to explain to them, in language beyond that of a 5 yo child, what it is I am actually doing.

I then put aside 2-3h to do some programming - Python and Go on alternating days. This is a major boost for me since I have had periods in my career when I have been able to get into programming and found it to be immeasurably useful in my day to day work, particularly in speeding up weird and wonderful tasks, but I never really got the time to round out that skill beyond point based problem solving. There was a little period maybe 4 years ago, when we were starting the major ACI rollouts, that I was writing raw Python interacting with the ACI Model Object tree, but it never really got to what I would call Advanced levels.

Typically my python work starts out as a simple script to do one thing, then I refactor it to functions and workflows and eventually I will either modularise it, or hand it off to someone more knowledgeable in the team to improve.

Flask is an area that really interests me, since it would let me turn my random CLI scripts into something presentable in a web UI or, more realistically, expose them as a REST API. Presenting my artisanal hand-crafted nonsense as an API endpoint would probably increase the takeup of my "little shortcuts", since most of the team understood how to talk to a REST endpoint but had little interest in hacking up my Python abortions.

Go was never really part of my approach until I got into Kubernetes, and found myself needing to at least understand what code did. The more I read, the more I came to realise Go is at the intersection of old school compiled C and modern interpreted Python. Having tried and failed a number of times to learn C to a respectable level, Go seems like a unique opportunity to revisit this.

As a PacktPub subscriber, I started with Learning Go Programming, which was a little dry but got me going, and to reinforce that beginner knowledge I am halfway through the Go Essential Training course on LinkedIn, which is working out quite well.

I use a pomodoro technique for the learning time, splitting the sessions into three 45 minute blocks with breaks in between to let the brain cool off. I had used the original 25/5 minute approach ever since a colleague back in the UK showed me, and whilst it worked, a well known security researcher I follow had a thread on this topic recently, which resonated with me. Having moved to three sessions of 45 mins on, 10 mins off, I find I take more in and retain it.

One very interesting thing I hit this week was a simple challenge in the Go training, which I will write up separately, but from problem statement to working Go code took me about 40 minutes. To then translate that to python3 took me about 4 minutes. That was a major surprise to me. Firstly I thought that was pretty decent for writing in a language I didn't know. Secondly to then take that logic and just apply it into Python3 in such a short amount of time, it really hit me that I have a lot of Python knowledge in there that I just don't credit myself with. I'm absolutely no expert, and the code I write is still a dumpster fire, but at least I can do it quickly.

Once the coding session is done I then have about 1-2h left for Network Lab time. I will also post this setup separately, but needless to say, like all the learning things I do, it's sorta over the top stupid.

On a Wednesday I substitute the programming specifically for time in my network telemetry lab. Here I have Prometheus set up in my Kubernetes cluster as a TSDB, and then I use Cisco's Pipeline project, and things like Telegraf, to funnel streaming metrics from whatever I can get that runs it into Prometheus, so that I can then build out dashboards and the like in Grafana. Once I reach a critical mass on that, I will probably pivot slightly to build multi-metric alerting rules in Alertmanager. My hope is to find some set of metrics that provide value here, and then write a Python/Go app that constantly monitors historical metrics to generate new moving markers for alerting. The idea being, you can use the past to track your baselines and do some basic anomaly detection, without having to go full ML or buy an expensive product.
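To make the "moving markers" idea a bit more concrete, here is a minimal sketch of a rolling baseline check. It is only an illustration under assumptions: the MovingBaseline class, the window size and the k threshold are all made up, and in a real setup the samples would come from queries against Prometheus rather than a hard-coded list.

```python
from collections import deque
from statistics import mean, pstdev


class MovingBaseline:
    """Flag samples that stray too far from a rolling mean (hypothetical helper)."""

    def __init__(self, window: int = 60, k: float = 3.0):
        self.samples = deque(maxlen=window)  # keeps only the most recent samples
        self.k = k                           # allowed deviation, in standard deviations

    def check(self, value: float) -> bool:
        """Return True if value is anomalous versus the history seen so far."""
        anomalous = False
        if len(self.samples) >= 2:
            mu = mean(self.samples)
            sigma = pstdev(self.samples)
            anomalous = abs(value - mu) > self.k * sigma
        # The new sample always joins the history, so the baseline keeps moving.
        self.samples.append(value)
        return anomalous


if __name__ == "__main__":
    baseline = MovingBaseline(window=5, k=3.0)
    for sample in [10, 11, 10, 11, 10, 50]:
        print(sample, baseline.check(sample))
```

One design choice worth noting: anomalous samples are still added to the history here, so a sustained shift eventually becomes the new normal. Whether that is desirable depends on the metric.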

At lunchtime it will then be time to pick up my daughter and spend the afternoon entertaining her. When I get my new job I expect that I will be very focussed on that for a little while. It therefore makes sense to use as much of this current free time to top up the good will piggy bank, knowing that I will have to borrow some of that back for a short while as I settle in again.

My first hope at the end of this is to finally get my German up to a level that is more communicative than basics in the shops. My second hope is that I can really embed some of this programming knowledge into the front of my brain rather than leaving it buried at the back, covered in cobwebs. The final hope is that I can put all of this together into a compelling package for a company to hire me. Time will tell.

Monday, 19 October 2020

End of an Era: The Solera Years

Two weeks ago I was made redundant. COVID and "shifting business priorities" meant a re-org of my department and for whatever reason, that is that. Over the last year or so I had assumed this day might come and the question I had internally was if I would jump or if I would be pushed. Since I was slap bang in the middle of a key project, I had assumed I might still have 9-12 months or so left to close that out. Turns out I was wrong. Thankfully my documentation is average enough for them to cope without me.

Given I was somewhat mentally cognisant of the risk I wasn't overly upset when I got the news, and the Company have treated me very well in the exit negotiations etc. I hold no ill will for them, and nor they me. It's as good of a break-up as you can expect I suppose. 

As part of looking for a new job I had to dig out and dust off my CV, which was about 10 years old. I had sort of kept it up to date, but not really, so I found myself savaging old and crusty "skills" and "achievements" and trying to boil down the last 10 years into a few paragraphs. It's a lot harder than it seems. 

What's worse is there are two or three directions in which I could head from here, and each would probably require different versions of my CV, that focus on areas that would be of keen interest to certain employers. It is only really now that I can appreciate just how lucky I was for the last few years to be able to spread my wings in so many different directions. 

I've done Network Architecture work as we designed, then brownfield built out, new ACI fabrics in eight of our major DCs in the world. I led that team in the design phase and then led the rollout team as well, to ensure we got it done right. During that rollout phase we made sure that all of our 11 Engineers were fully gitops trained as well. All our changes are now done in the gitlab repos, approved by other team members and then merged to be run by a gitlab pipeline or Ansible Tower.

This was then updated in the last year or so with a highly distributed, Infoblox backed BIND DNS infrastructure, saving us a ton of money on licensing overpriced hardware, but still allowing us to utilise the high quality DDI front end of Infoblox, and its very decent RPZ based DNS Firewall.

Finally, we migrated our legacy BGP confederation on Cisco hardware to a completely software based BGP routing tier for our internet border, which I designed and then shepherded into operation with support from two outstanding engineers. This allowed us to come away from vendor blended internet services, which are a complete nightmare to live with, and replace them with our own homebrew blended service. Here we used commodity Transit full tables and added additional prefixes from direct links into IXP peering LANs at a front end level, aggregating all these paths in a route reflector tier, where we used BGP Traffic Engineering principles to then assemble locally significant full tables, which we present to Debian VMs with FRR to operate as backend gateways to our border firewalls. Our ability to do traffic engineering on the fly was a significant improvement to our customer experience, and access to the IXPs meant we could drop latency to some key locations in Europe without having to play the blended internet ticket dance with whichever DC it may be.

This highly optimised DC infrastructure is probably my proudest achievement of the last 4 years.

More recently I've done Cloud Engineering in AWS and Azure, from the basics of On-Prem network integration, all the way through to personally designing and implementing a Terraform & Ansible driven, Gitlab CI controlled application stack on Windows IIS/ASP.NET/SQL Server. That customer facing deployment replaced a legacy DC setup in a rather remote location and saved the company 6 figure sums in just 5 weeks.

I've also done RedHat Openstack and Openshift with Ceph on our own bare metal. I've then trashed that all and replaced it with my own homegrown gitlab pipeline; terraforming VMware VMs, then handing them to Ansible to install Opensource Kubernetes, with integrated vsphere storage-classes, and then using helm to deploy a ton of things for a minimum viable product. I've then had to adapt all of that to use VMware PKS instead of open source. That was a fun 6 months...

Lastly I have then had a chance to really shake things up and build out a complete opensource DC design. This included Cumulus Linux on Mellanox Spectrum, Penguin Computing Servers for compute running Kubernetes directly, Ceph storage again, and an enhanced version of that software based internet BGP routing stack. The half rack pod costs $400,000 to buy, can operate environments that deliver 10x that in revenue, and sits under 10kW in peak consumption.

Throughout the last 5 years I have been able to push the limits of what I know, and what the business was comfortable with. I have made a ton of mistakes, most of which thankfully didn't affect our customer experience, and learned an absolute ton about not just the technology here, but myself as well.

Much of that is thanks to the support I had from my CIO at the time who was quite the disruptive influence. He knew that the right thing wasn't always the easy thing, and he always pushed us to be the best we could be. 

That in and of itself wasn't always the best thing, and I have to acknowledge that at times I was difficult to live with and I didn't always give my family the best of me.  Perhaps the hardest lesson to learn was how to draw the line between what I need to do and when I need to get it done. We always need "another 5 minutes" to finish, but we also know we never really finish either. I also think in retrospect that my desire to push the envelope has placed me in uncomfortable positions on the Dunning-Kruger curve at times as well. Having the support of people can inflate the ego, and sometimes that ego can drive you to arrogant cul-de-sacs of isolation. One hopes I am a little more humble now than I was just a few years ago.

As I look now to the future, I have to choose whether I want to remain in a hands on role, or move upwards to the executive suites. Up there in the ivory towers, the money and risk are higher, and the skills are used less often, more as a balance to BS than anything operational. That's OK, but I'm not 100% sure I'm ready for that yet. My people skills have improved immeasurably since I started running engineers all those years ago, but I don't get the same buzz from fixing the budget as I do from fixing a problem. I love training and inspiring the next iteration of engineers, but I tend to do that mostly by showing them how to do something, not by fighting the business to get them training time with someone else. End result, I think I need to be near the action still.

So then, given my exposure, and moderate successes within the cloud and devops world, I have the option to go full time in that direction. I seem to know more about kubernetes than a bunch of people who claim to be experts in it, although real experts like those at Heptio tell me that is very, very common... I think that would be a lot of fun for me, and as technology has already changed so much in that direction, it's a great option for career growth. The thing that puts me off that slightly though, is the fact that every man, woman and their dog is off in that same direction, and standing out in a sea of 10,000 CVs is always a challenge. Never one to shy away from a challenge, I think I will still try, but the pessimist in me thinks that the competition is high there.

I also have remained very close to the security space for the last 10 years, and whilst I am no Pentester, and I am not likely to enjoy a job in GRC any time soon, I think I have a lot to offer the Secops realm. I have worked as a sysadmin for many a year, and in network infra and design. If you speak to any Security "rockstar", this is the exact heritage that they want people to enter the space with. My greatest concern is that since moving to Switzerland in 2017, I have been the main breadwinner in the family, and for me to enter that security space, I will probably have to start in a more junior role, and then qualify back up to the salary I hold today. This is probably a bit of a strain for everyone, and so absent extra income, it is possibly a bit of a folly to expect that right now.

So realistically, my best bet is to play to my strengths and focus on my key competencies of Networking and Modernisation. There are many businesses out there that are keen to move beyond the "hello world" and into proper CI/CD style operations on their networks. I have lost track of how many people I have spoken to and observed in the community who are happy to write an Ansible play to update the NTP servers on their fleet of Cisco Switches and Routers, but wouldn't dare use it to add or remove a local user after someone leaves the business. They're lacking confidence and they're worried it will break things. This whole topic also hastily avoids the conversation that they haven't deployed tacplus or freeradius for the same reasons. 

Sometimes doing the basics is sort of boring, but by the same token, getting the boring stuff out of the way opens the door to doing something interesting, well. Before, I used to save the interesting stuff for my free time, and as already noted, this took time away from my family too. What I look to do next is to help my next business do the basics very well with automation, and then use that new found freedom to see what we need to do better.

If you know some place that needs someone like me, let me know using the contact details at the side.

Thursday, 2 July 2020

Doing YANG Wrong: Part 5 - Manufacturer/Model deviations

Part 5: Deviations

So we can talk to the device, and we can use candidate configs to stage and then apply configs in aggregate, but we still can't make a CSR1000v take a simple openconfig IP address. At the beginning I deliberately called out that I wanted to use generic models and avoid the deviations. This is because the Python model binding I then generate only works on that vendor's box. This isn't terrible, but it's not really what we want. Let's see if it works tho.

Let's look at the hello statement for the IP address model again, and then fetch the deviations it describes.
netconf-console --host 192.168.70.21 --port 830 -u yangconf -p my_good_password. --hello | grep openconfig | grep ip
<nc:capability>http://openconfig.net/yang/interfaces/ip?module=openconfig-if-ip&amp;revision=2018-01-05&amp;deviations=cisco-xe-openconfig-if-ip-deviation,cisco-xe-openconfig-interfaces-deviation</nc:capability>
So there are actually two:
cisco-xe-openconfig-if-ip-deviation
cisco-xe-openconfig-interfaces-deviation
Let's pull these off the box like the others.
netconf-console --host 192.168.70.21 --port 830 -u yangconf -p my_good_password. --get-schema cisco-xe-openconfig-if-ip-deviation | xml_grep 'data' --text_only > cisco-xe-openconfig-if-ip-deviation.yang

netconf-console --host 192.168.70.21 --port 830 -u yangconf -p my_good_password. --get-schema cisco-xe-openconfig-interfaces-deviation | xml_grep 'data' --text_only > cisco-xe-openconfig-interfaces-deviation.yang
Let's rebuild our Python module with these deviations in the bundle:
clear && pyang --plugindir $PYBINDPLUGIN -f pybind -o interface_setup.py *.yang --deviation cisco-xe-openconfig-interfaces-deviation.yang cisco-xe-openconfig-if-ip-deviation.yang

cisco-xe-openconfig-interfaces-deviation.yang:11: warning: imported module "openconfig-if-ethernet" not used
cisco-xe-openconfig-interfaces-deviation.yang:23: warning: imported module "ietf-yang-types" not used
INFO: encountered (<pyang.error.Position object at 0x7fb3adcc8370>, 'UNUSED_IMPORT', u'openconfig-if-ethernet')
INFO: encountered (<pyang.error.Position object at 0x7fb3adcc6cd0>, 'UNUSED_IMPORT', u'ietf-yang-types')
Hmm. So it's saying that in one deviation file (the interfaces one) there are two modules that are imported but not used. This is annoying, and goes to show how flaky some of these modules can be. I have to find the import statements in the interfaces yang file at lines 11 and 23, and then comment them out. We can then rerun to get a compiled Python module.

We then re-run our script:
python3 push_inside_if.py
Successfully configured IP on 192.168.70.21
Oooooh. Sensecheck please....
in1rt001#sh run int gi 4
Building configuration...

Current configuration : 122 bytes
!
interface GigabitEthernet4
 description Inside Interface
 ip address 109.109.109.2 255.255.255.0
 negotiation auto
end
Yay. Let's see if promise theory can be used with our models too. We will change the IP and the description.
python3 ./push_inside_if.py
Failed to configure interface: illegal reference /oc-if:interfaces/interface[name='GigabitEthernet4']/subinterfaces/subinterface[index='0']/oc-ip:ipv4/addresses/address[ip='50.60.70.1']/ip
Womp Womp :o(

OK, let's try everything except the IP itself (desc and mask):

python3 push_inside_if.py
Successfully configured IP on 192.168.70.21


in1rt001#sh run int gi 4
Building configuration...

Current configuration : 122 bytes
!
interface GigabitEthernet4
 description People are stupid
 ip address 109.109.109.2 255.255.254.0
 negotiation auto
end
What is interesting, therefore, is that I can change everything except the IP... Subnet mask - fine, Description - fine. So, looking at that earlier error, it could well be that the model we push, which states subinterface[0] has IP address 50.60.70.1, doesn't match the existing IP address 109.109.109.2 on the box. Thus, we can't reference the IP address we want in the interface model, since it doesn't exist in the device's copy of the model.

So to round this out, let's take the IP off the interface instead.

We edit line 66 and turn that add(inside_ip_addr) into delete():

inside_sub_if.ipv4.addresses.address.delete(inside_ip_addr)

fail:
python3 ./models/interface/push_inside_if.py
Traceback (most recent call last):
  File "/home/jhow/.local/lib/python3.8/site-packages/pyangbind/lib/yangtypes.py", line 848, in delete
    del self._members[k]
KeyError: '109.109.109.2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./models/interface/push_inside_if.py", line 66, in <module>
    inside_sub_if.ipv4.addresses.address.delete(inside_ip_addr)
  File "/home/jhow/.local/lib/python3.8/site-packages/pyangbind/lib/yangtypes.py", line 852, in delete
    raise KeyError("key %s was not in list (%s)" % (k, m))
KeyError: "key 109.109.109.2 was not in list ('109.109.109.2')"
This makes sense because the model in Python memory is independent of the config of the device. You can't delete something that doesn't exist in memory yet, and adding it then removing it leaves you with an empty model to push, which has no effect on the device. We would have to pull the config and parse it into a model before we could know what we have to delete, or know that what we have injected into our Python model matches the state on the device exactly. The latter is fine if you have a good source of truth.

It is at this point I spot that we have to add a NETCONF operation "delete" to the modeled interface to delete it.
nc:operation="delete"
...needs to be put into the XML wrapper around the ipv4 address of the subinterface.

I can't see how the pyangbind XML serialiser supports providing NETCONF operations. To validate this theory, I first export the serialised XML for a create operation, and manually add the operation="delete" to the address element under ipv4.

manual = '''
<interfaces xmlns="http://openconfig.net/yang/interfaces">
  <interface>
    <name>GigabitEthernet4</name>
    <config>
      <enabled>true</enabled>
    </config>
    <subinterfaces>
      <subinterface>
        <index>0</index>
        <config>
          <description>Deleted</description>
        </config>
        <ipv4 xmlns="http://openconfig.net/yang/interfaces/ip">
          <addresses>
            <address operation="delete">
              <ip>109.109.109.2</ip>
              <config>
                <ip>109.109.109.2</ip>
                <prefix-length>23</prefix-length>
              </config>
            </address>
          </addresses>
        </ipv4>
      </subinterface>
    </subinterfaces>
  </interface>
</interfaces>
'''
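The same hand edit could in principle be scripted rather than typed. The sketch below uses only the Python standard library to tag the matching address entry; the mark_address_for_delete function and the SAMPLE document are made up for illustration, and it has not been tried against a device. It writes the attribute in the NETCONF base namespace (as nc:operation), which is where RFC 6241 defines it, whereas the hand-written XML above uses a bare operation attribute. ElementTree will also rewrite the element prefixes (ns0, ns1), which is equivalent XML but looks different.

```python
import xml.etree.ElementTree as ET

NC_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"      # NETCONF base namespace
OC_IP_NS = "http://openconfig.net/yang/interfaces/ip"  # openconfig-if-ip namespace

# Make the serialiser emit the familiar nc: prefix for the attribute.
ET.register_namespace("nc", NC_NS)

# Cut-down stand-in for the serialised model (illustrative only).
SAMPLE = (
    '<interfaces xmlns="http://openconfig.net/yang/interfaces"><interface>'
    '<name>GigabitEthernet4</name><subinterfaces><subinterface><index>0</index>'
    '<ipv4 xmlns="http://openconfig.net/yang/interfaces/ip"><addresses><address>'
    '<ip>109.109.109.2</ip>'
    '</address></addresses></ipv4>'
    '</subinterface></subinterfaces></interface></interfaces>'
)


def mark_address_for_delete(xml_text: str, ip: str) -> str:
    """Add nc:operation="delete" to the <address> entry whose key is `ip`."""
    root = ET.fromstring(xml_text)
    for address in root.iter("{%s}address" % OC_IP_NS):
        if address.findtext("{%s}ip" % OC_IP_NS) == ip:
            address.set("{%s}operation" % NC_NS, "delete")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(mark_address_for_delete(SAMPLE, "109.109.109.2"))
```

The returned string would then be wrapped in the config element and sent in place of the serialised model, exactly as with the manual version.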

We then alter the send_to_device function exactly as we did before, and send this formatted XML in the <config> brackets, instead of our modeled object.

I save this as pull_inside_if.py to keep the files separate, then push the interface in, before pulling it back out.

python3 ./models/interface/push_inside_if.py
Successfully configured IP on 192.168.70.21

python3 ./models/interface/pull_inside_if.py
Successfully configured IP on 192.168.70.21
Checking the running config on the box shows it did the trick.

in1rt001#sh run int gi 4
Building configuration...

Current configuration : 122 bytes
!
interface GigabitEthernet4
 description Deleted
 no ip address
 negotiation auto
end
So now we have to ask ourselves. Do we care about idempotency? When it comes to pushing a change out, there are 3 states we will observe:
1) the IP on the interface doesn't exist and should exist. <- works
2) the IP on the interface exists and we need to change something that isn't the IP itself <- works
3) the IP on the interface exists and we want to remove it <- works.

What we cannot do in one step is an in situ replacement of the IP, since that is a key value in the model itself. Considering this sort of operation would be bound to the operational hooks of a CMDB, the events we would expect are add a new thing, change something on that existing thing that isn't the thing itself, or delete the thing. On that basis, it shouldn't be unreasonable to codify that you should decommission an interface, and then commission your new one as a new config. Thus you cannot "update" keyed values in situ, but must instead remove/commit/add/commit in order.
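The remove/commit/add/commit rule can be written down as a tiny planning function. This is only a sketch of the ordering, under the assumption that a CMDB hook hands us the current and desired address; plan_interface_change and the step tuples are hypothetical and nothing here talks to a device.

```python
def plan_interface_change(current_ip, desired_ip):
    """Return the ordered steps to move an interface from current_ip to desired_ip.

    None means "no address". The IP is the list key, so it cannot be
    edited in place: an existing address is removed and committed
    before the new one is added and committed.
    """
    if current_ip == desired_ip:
        return []  # key unchanged; non-key edits (mask, description) are a plain push
    steps = []
    if current_ip is not None:
        steps += [("delete", current_ip), ("commit", None)]
    if desired_ip is not None:
        steps += [("add", desired_ip), ("commit", None)]
    return steps


if __name__ == "__main__":
    for step in plan_interface_change("109.109.109.2", "50.60.70.1"):
        print(step)
```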

Given the likely use case for a CMDB hook response, I guess we can live with that.

Wednesday, 1 July 2020

Doing YANG Wrong: Part 4 - Config stores

Part 4: Config stores 

As we head down this rabbit hole, we get ever closer to something useful, but ever deeper into the weeds of NETCONF/YANG. For the uninitiated, config stores are a place where you can put config chunks, either singly or in aggregate over a series of NETCONF pushes, to generate a new config that you will apply in one hit. In the SP world you might want to set up some interfaces, some BGP and then some overlay like an MPLS service. You may not want to do all this in one script, since you might not need all of those things every time. Better to make a script per thing, then call the ones your CMDB thinks it needs, place them all in a candidate config, validate that as a whole and then push it in one go. If anything fails, you can throw it all away and not touch the operational running config. At this point Juniper folks are shrugging their shoulders - this is very old hat to them, but for enterprise people, particularly the type who lived on the CLI for years, this is unimaginable.

There be dragons here still, however. There is invariably only one candidate config store per device, and there is always the chance that someone did something and their script didn't clean up in its error handling, or that some other script or person hops on board whilst your session is in progress. Either way, you could end up with a poisoned candidate and a mess.

There are two things that help treat this risk. The first is config locking. When you start a session it's a good idea to lock both the target datastore (candidate) and the intended datastore (running). This prevents any out-of-band changes to running, and any in-flight changes from other agents to the candidate config. The second is error handling: all scripts should have it in from the start, to discard config and release locks.

So. Let's tickle our script a bit. We need to add support for the config candidates. All this is handled in our function send_to_device():

def send_to_device(**kwargs):
    rpc_body = '<config>' + pybindIETFXMLEncoder.serialise(kwargs['py_obj']) + '</config>'
    with manager.connect_ssh(host=kwargs['dev'], port=830, username=kwargs['user'], password=kwargs['password'], hostkey_verify=False) as m:
        # we must handle errors nicely so always work in a try loop
        try:
            assert(":candidate" in m.server_capabilities)
            # lock running manually
            m.lock(target='running')
            # use a magic context manager for locking candidate whilst we work. it unlocks it as we exit the with: frame
            with m.locked(target='candidate'):
                m.discard_changes()
                m.edit_config(target='candidate', config=rpc_body)
                m.commit()
            print('Successfully configured IP on {}'.format(kwargs['dev']))
        # spit the error out
        except Exception as e:
            print('Failed to configure interface: {}'.format(e))
            m.discard_changes()
        # regardless of error or success, unlock running store.
        finally:
            m.unlock(target='running')


You'll see we added stuff into our try block, added a discard_changes into our except block, and a finally section to always unlock the running config as we said we would. Let's give it a spin.
python3 ./models/interface/push_inside_if.py
Failed to configure interface:
Lame. The empty error message makes me think it was the assert that failed: a bare assert raises an AssertionError with no message, so the generic handler has nothing to print. Let's add a dedicated handler for this.

def send_to_device(**kwargs):
    rpc_body = '<config>' + pybindIETFXMLEncoder.serialise(kwargs['py_obj']) + '</config>'
    with manager.connect_ssh(host=kwargs['dev'], port=830, username=kwargs['user'], password=kwargs['password'], hostkey_verify=False) as m:
        # we must handle errors nicely so always work in a try loop
        try:
            assert(":candidate" in m.server_capabilities)
            # lock running manually
            m.lock(target='running')
            # use a magic context manager for locking candidate whilst we work. it unlocks it as we exit the with: frame
            with m.locked(target='candidate'):
                m.discard_changes()
                m.edit_config(target='candidate', config=rpc_body)
                m.commit()
            print('Successfully configured IP on {}'.format(kwargs['dev']))
        # spit the error out
        except AssertionError:
            print('Go and enable candidate configs pls')
            m.discard_changes()
        except Exception as e:
            print('Failed to configure interface: {}'.format(e))
            m.discard_changes()
        # regardless of error or success, unlock running store.
        finally:
            m.unlock(target='running')


python3 ./models/interface/push_inside_if.py
Go and enable candidate configs pls
Traceback (most recent call last):
  File "./models/interface/push_inside_if.py", line 20, in send_to_device
    assert(":candidate" in m.server_capabilities)
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./models/interface/push_inside_if.py", line 32, in send_to_device
    m.discard_changes()
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/manager.py", line 231, in execute
    return cls(self._session,
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/operations/rpc.py", line 292, in __init__
    self._assert(cap)
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/operations/rpc.py", line 367, in _assert
    raise MissingCapabilityError('Server does not support [%s]' % capability)
ncclient.operations.errors.MissingCapabilityError: Server does not support [:candidate]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./models/interface/push_inside_if.py", line 64, in <module>
    send_to_device(dev=device_ip, user=username, password=password, py_obj=ocintmodel.interfaces)
  File "./models/interface/push_inside_if.py", line 38, in send_to_device
    m.unlock(target='running')
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/manager.py", line 231, in execute
    return cls(self._session,
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/operations/lock.py", line 49, in request
    return self._request(node)
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/operations/rpc.py", line 348, in _request
    raise self._reply.error
ncclient.operations.rpc.RPCError: {'type': 'protocol', 'tag': 'access-denied', 'severity': 'error', 'info': None, 'path': None, 'message': None}
Puke.

As we dig over that, we can see we were right, and the AssertionError wasn't being picked up, and now it is, but then after that it barfs everywhere with additional exceptions.

Looks like the first one is the m.discard_changes() in the AssertionError handler and the second is the m.unlock('running') in the finally block.

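The first one is sorta obvious: the traceback shows ncclient refusing to even send discard-changes because the server doesn't advertise :candidate. And since that assert is the first thing in the try block, we never locked or changed anything that would need discarding. We can delete that.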

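The second is in the finally: block, which always runs. It tries to unlock running, but the assert fired before m.lock(target='running') was ever called, so as far as I can tell the router answers access-denied because we are releasing a lock we never took.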
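
To see how those three stacked tracebacks come about without a router, here is a minimal sketch. FakeSession is a hypothetical stand-in I made up for illustration, not the ncclient API; it only mimics the two failures seen above. Python records each earlier exception on the __context__ attribute of the later one, which is what prints as "During handling of the above exception, another exception occurred".

```python
class FakeSession:
    """Hypothetical stand-in for the NETCONF session (not the ncclient API)."""

    def __init__(self, capabilities):
        self.server_capabilities = capabilities
        self.running_locked = False

    def lock(self):
        self.running_locked = True

    def unlock(self):
        # mimic the router refusing to release a lock we never took
        if not self.running_locked:
            raise RuntimeError("access-denied")
        self.running_locked = False

    def discard_changes(self):
        # mimic the client refusing the RPC when :candidate is missing
        if ":candidate" not in self.server_capabilities:
            raise RuntimeError("Server does not support [:candidate]")


def push(m):
    try:
        assert ":candidate" in m.server_capabilities
        m.lock()
    except AssertionError:
        m.discard_changes()  # raises inside the handler
    finally:
        m.unlock()           # raises again while the previous error is in flight


def exception_chain(exc):
    """Walk __context__ from the last exception back to the first."""
    chain = []
    while exc is not None:
        chain.append((type(exc).__name__, str(exc)))
        exc = exc.__context__
    return chain


try:
    push(FakeSession(capabilities=[]))
except RuntimeError as err:
    chain = exception_chain(err)
    for name, message in chain:
        print(name, message)
```

The newest exception comes first in the list, so it reads in the reverse order of the traceback output.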

My first thought was to wrap the m.unlock in another try block with an except: pass, but that's a hack that could hurt you later. What I did instead was two try blocks, one for the candidate check and one for the config changes. If we don't support candidates, we bomb out and don't try to do any config.

def send_to_device(**kwargs):
    rpc_body = '<config>' + pybindIETFXMLEncoder.serialise(kwargs['py_obj']) + '</config>'
    with manager.connect_ssh(host=kwargs['dev'], port=830, username=kwargs['user'], password=kwargs['password'], hostkey_verify=False) as m:
        try:
            candidate_supported = True
            assert(":candidate" in m.server_capabilities)
        except AssertionError:
            print('Go and enable candidate configs pls')
            candidate_supported = False
        if candidate_supported:
            # we must handle errors nicely so always work in a try block
            try:
                # lock running manually
                m.lock(target='running')
                # use a magic context manager for locking candidate whilst we work. it unlocks it as we exit the with: frame
                with m.locked(target='candidate'):
                    m.discard_changes()
                    m.edit_config(target='candidate', config=rpc_body)
                    m.commit()
                    print('Successfully configured IP on {}'.format(kwargs['dev']))
            # spit the error out
            except Exception as e:
                print('Failed to configure interface: {}'.format(e))
                m.discard_changes()
            # regardless of error or success, unlock running store.
            finally:
                m.unlock(target='running')
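
A side note on the capability check: assert statements are stripped when Python runs with -O, so using one for control flow is a bit fragile. A plain if does the same job. The sketch below is not what my script does; send_if_supported and push are hypothetical names, and a plain list stands in for m.server_capabilities.

```python
def candidate_supported(capabilities):
    """True when the ':candidate' shorthand is in the capability collection."""
    return ":candidate" in capabilities


def send_if_supported(capabilities, push):
    """Run push() only when candidate configs are available (hypothetical helper)."""
    if not candidate_supported(capabilities):
        print('Go and enable candidate configs pls')
        return False
    push()
    return True


calls = []
skipped = send_if_supported([], lambda: calls.append('pushed'))
sent = send_if_supported([":candidate"], lambda: calls.append('pushed'))
print(skipped, sent, calls)
```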


Try again:
python3 ./models/interface/push_inside_if.py
Go and enable candidate configs pls
Nice and clean. Now where were we? Ah yes. We need to actually enable the candidate datastore on the router...
netconf-yang feature candidate-datastore
and try again...

python3 ./models/interface/push_inside_if.py
Failed to configure interface: the configuration database is locked by session 5 syncfromdaemon tcp (system from 127.0.0.1) on since 2020-07-01 10:19:28
   IOS-XE YANG Infrastructure
Traceback (most recent call last):
  File "./models/interface/push_inside_if.py", line 31, in send_to_device
    with m.locked(target='candidate'):
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/operations/lock.py", line 67, in __enter__
    Lock(self.session, self.device_handler, raise_mode=RaiseMode.ERRORS).request(self.target)
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/operations/lock.py", line 35, in request
    return self._request(node)
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/operations/rpc.py", line 348, in _request
    raise self._reply.error
ncclient.operations.rpc.RPCError: the configuration database is locked by session 5 syncfromdaemon tcp (system from 127.0.0.1) on since 2020-07-01 10:19:28
   IOS-XE YANG Infrastructure

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./models/interface/push_inside_if.py", line 68, in <module>
    send_to_device(dev=device_ip, user=username, password=password, py_obj=ocintmodel.interfaces)
  File "./models/interface/push_inside_if.py", line 39, in send_to_device
    m.discard_changes()
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/manager.py", line 231, in execute
    return cls(self._session,
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/operations/edit.py", line 192, in request
    return self._request(new_ele("discard-changes"))
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/operations/rpc.py", line 348, in _request
    raise self._reply.error
ncclient.operations.rpc.RPCError: the configuration database is locked by session 5 syncfromdaemon tcp (system from 127.0.0.1) on since 2020-07-01 10:19:28
   IOS-XE YANG Infrastructure

Lol? One thing that could be happening is that the candidate config is a copy of running, and if we lock running, it can't make a copy of running for the candidate store? Like a race condition, if you will... Let's move the running lock inside the with block that locks candidate:

def send_to_device(**kwargs):
    rpc_body = '<config>' + pybindIETFXMLEncoder.serialise(kwargs['py_obj']) + '</config>'
    with manager.connect_ssh(host=kwargs['dev'], port=830, username=kwargs['user'], password=kwargs['password'], hostkey_verify=False) as m:
        try:
            candidate_supported = True
            assert(":candidate" in m.server_capabilities)
        except AssertionError:
            print('Go and enable candidate configs pls')
            candidate_supported = False
        if candidate_supported:
            # we must handle errors nicely so always work in a try block
            try:
                # use a magic context manager for locking candidate whilst we work. it unlocks it as we exit the with: frame
                with m.locked(target='candidate'):
                    # lock running manually as well
                    m.lock(target='running')
                    # scrap any unknown pending changes in that buffer
                    m.discard_changes()
                    # apply our edits
                    m.edit_config(target='candidate', config=rpc_body)
                    # commit them
                    m.commit()
                    # tell us
                    print('Successfully configured IP on {}'.format(kwargs['dev']))
            # spit the error out
            except Exception as e:
                print('Failed to configure interface: {}'.format(e))
                m.discard_changes()
            # regardless of error or success, unlock running store.
            finally:
                m.unlock(target='running')
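
One wrinkle left in that version: the finally: block still unlocks running even when the candidate lock failed and running was never locked. A possible tidy-up, sketched here and not what the script above does, is to take both locks with the same m.locked() context manager, so each unlock only happens if its lock succeeded. RecordingManager is a hypothetical fake so the sketch runs without a router; push_config only uses the calls already seen in the script.

```python
from contextlib import contextmanager


def push_config(m, rpc_body):
    """Lock candidate, then running; each is released only if it was taken."""
    with m.locked(target='candidate'):
        with m.locked(target='running'):
            m.discard_changes()
            m.edit_config(target='candidate', config=rpc_body)
            m.commit()


class RecordingManager:
    """Hypothetical fake that records calls and can fail on one of them."""

    def __init__(self, fail_on=None):
        self.calls = []
        self.fail_on = fail_on

    def _do(self, name):
        self.calls.append(name)
        if name == self.fail_on:
            raise RuntimeError(name + ' failed')

    @contextmanager
    def locked(self, target):
        self._do('lock ' + target)   # if this raises, no unlock is attempted
        try:
            yield
        finally:
            self.calls.append('unlock ' + target)

    def discard_changes(self):
        self._do('discard_changes')

    def edit_config(self, target, config):
        self._do('edit_config')

    def commit(self):
        self._do('commit')


fake = RecordingManager(fail_on='lock running')
try:
    push_config(fake, '<config/>')
except RuntimeError as e:
    print('Failed to configure interface: {}'.format(e))
print(fake.calls)
```

In the fake run the running lock fails, so only candidate gets unlocked on the way out.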


and try again:
python3 ./models/interface/push_inside_if.py
Failed to configure interface: illegal reference /oc-if:interfaces/interface[name='GigabitEthernet4']/subinterfaces/subinterface[index='0']/oc-ip:ipv4/addresses/address[ip='109.109.109.2']/ip
So we are back where we were at the end of the last attempt, and config candidates aren't helping us.

Where to next then? Google doesn't have a lot to be honest, but there is this forum post about an illegal reference in the OC BGP Model against the CSR1000v. Parsing the response from the Cisco guy a bit, I'm left with two paths to follow.
  1. Version incompatibility with my IOS-XE (16.9.5) and the OC model here.
  2. The missing Cisco IOS-XE deviations from the OC Interfaces model need to be here.
I will start with option 2, since it's easy enough to prove, although I expect I might have to upgrade anyways. That said, it's better to fix this in situ than blindly upgrade with no visibility of other bugs I might hit outside of YANG...
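
A quick way to see which deviations the box claims to apply is to look at the capability URIs it sends in its hello. YANG module capabilities may carry module, revision and deviations query parameters. The sketch below just parses such strings. The sample URIs, the revision and the deviation module names are made up for illustration; on a real session you would feed it the strings from m.server_capabilities.

```python
from urllib.parse import urlsplit, parse_qs


def deviations_by_module(capabilities):
    """Map module name -> list of deviation modules named in its capability URI."""
    found = {}
    for cap in capabilities:
        query = parse_qs(urlsplit(cap).query)
        module = query.get('module', [None])[0]
        if module and 'deviations' in query:
            found[module] = query['deviations'][0].split(',')
    return found


# hypothetical capability strings, not captured from a real device
sample_caps = [
    'urn:ietf:params:netconf:capability:candidate:1.0',
    'http://openconfig.net/yang/interfaces?module=openconfig-interfaces'
    '&revision=2018-01-05&deviations=example-deviation-a,example-deviation-b',
]
deviations = deviations_by_module(sample_caps)
print(deviations)
```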



Friday, 26 June 2020

Doing YANG Wrong: Part 3 - Using the python bindings

Part 3: Using the python bindings to push a config

Given we generated that python file locally in a machine, we assume here that you are still in that subdirectory.

The below code was stolen fully from the YANG Book. Much love and all credit to them for their work.

from interface_setup import openconfig_interfaces
from pyangbind.lib.serialise import pybindIETFXMLEncoder
from ncclient import manager
# device settings
username = 'yangconf'
password = 'my_good_password.'
device_ip = '192.168.70.21'
# config settings
inside_interface = 'GigabitEthernet4'
inside_ip_addr = '109.109.109.2'
inside_ip_prefix = 24

def send_to_device(**kwargs):
    rpc_body = '<config>' + pybindIETFXMLEncoder.serialise(kwargs['py_obj']) + '</config>'
    with manager.connect_ssh(host=kwargs['dev'], port=830, username=kwargs['user'], password=kwargs['password'], hostkey_verify=False) as m:
        try:
            m.edit_config(target='running', config=rpc_body)
            print('Successfully configured IP on {}'.format(kwargs['dev']))
        except Exception as e:
            print('Failed to configure interface: {}'.format(e))


if __name__ == '__main__':
    # instantiate the openconfig model
    ocintmodel = openconfig_interfaces()
    # create an instance of the interfaces
    ocinterfaces = ocintmodel.interfaces
    # create a new interface instance in that parent object
    inside_if = ocinterfaces.interface.add(inside_interface)
    # even a routed interface requires a subinterface, it's just at index 0
    inside_if.subinterfaces.subinterface.add(0)
    # create an instance of that subinterface object to edit
    inside_sub_if = inside_if.subinterfaces.subinterface[0]
    # apply an IP to that object
    inside_sub_if.ipv4.addresses.address.add(inside_ip_addr)
    # read that ip object into an ip object
    ip = inside_sub_if.ipv4.addresses.address[inside_ip_addr]
    # set the IP and the subnet mask properly
    ip.config.ip = inside_ip_addr
    ip.config.prefix_length = inside_ip_prefix
    send_to_device(dev=device_ip, user=username, password=password, py_obj=ocinterfaces)


When I run this, it fails.

<pyangbind.lib.yangtypes.YANGBaseClass object at 0x7f3e5fac6170>
<pyangbind.lib.yangtypes.YANGBaseClass object at 0x7f3e5f9a69e0>
Traceback (most recent call last):
  File "<stdin>", line 19, in <module>
  File "<stdin>", line 2, in send_to_device
NameError: global name 'pybindIETFXMLEncoder' is not defined


Turns out the serialiser that the book code uses relies on a library function that isn't in the pip release (0.8.1) of pyangbind. A bit of googling says to build from the github repo here. Visiting that repo sets alarm bells ringing - the last commits are 2 years old, and somehow the pip version is still out of date? Why? Anyways.
pip install --upgrade git+https://github.com/robshakir/pyangbind.git
...

python ./push_inside_if.py 
Traceback (most recent call last): 
  File "./push_inside_if.py", line 43, in <module>
    send_to_device(dev=device_ip, user=username, password=password, py_obj=ocinterfaces)
  File "./push_inside_if.py", line 16, in send_to_device 
    rpc_body = '<config>' + pybindIETFXMLEncoder.serialise(kwargs['py_obj']) + '</config>' 
  File "/home/gns3/.local/lib/python2.7/site-packages/pyangbind/lib/serialise.py", line 380, in serialise
    doc = cls.encode(obj, filter=filter)
  File "/home/gns3/.local/lib/python2.7/site-packages/pyangbind/lib/serialise.py", line 375, in encode 
    return cls.generate_xml_tree(obj._yang_name, obj._yang_namespace, preprocessed) 
AttributeError: 'YANGBaseClass' object has no attribute '_yang_namespace'
What fresh hell is this?

So a github issue now tells us that we generated the binding against the old version of pyangbind, so we have to redo our export for the ENV var and then rebuild the python module....

pyang --plugindir $PYBINDPLUGIN -f pybind -o interface_setup.py *.yang                          
[email protected]:346: warning: node "openconfig-interfaces::state" is config false and is not part of the accessible tree                          
[email protected]:84: warning: the escape sequence "\." is unsafe in double quoted strings - pass the flag --lax-quote-checks to avoid this warning  
[email protected]:100: warning: the escape sequence "\." is unsafe in double quoted strings - pass the flag --lax-quote-checks to avoid this warning 
[email protected]:102: warning: the escape sequence "\*" is unsafe in double quoted strings - pass the flag --lax-quote-checks to avoid this warning 
[email protected]:121: warning: the escape sequence "\." is unsafe in double quoted strings - pass the flag --lax-quote-checks to avoid this warning 
[email protected]:123: warning: the escape sequence "\." is unsafe in double quoted strings - pass the flag --lax-quote-checks to avoid this warning 
[email protected]:125: warning: the escape sequence "\*" is unsafe in double quoted strings - pass the flag --lax-quote-checks to avoid this warning 
[email protected]:130: warning: the escape sequence "\*" is unsafe in double quoted strings - pass the flag --lax-quote-checks to avoid this warning 
[email protected]:131: warning: the escape sequence "\." is unsafe in double quoted strings - pass the flag --lax-quote-checks to avoid this warning 
[email protected]:133: warning: the escape sequence "\." is unsafe in double quoted strings - pass the flag --lax-quote-checks to avoid this warning 
INFO: encountered (<pyang.error.Position object at 0x7fdbf28fb820>, 'XPATH_REF_CONFIG_FALSE', (u'openconfig-interfaces', u'state'))                                      
FATAL: pyangbind cannot build module that pyang has found errors with.

Oh my word - so much rage.

I tried the --lax-quote-checks and that didn't work, so I edited each of those lines in the openconfig-vlan yang file to swap double quotes in the regexes to single quotes. These warnings went away.

pyang --plugindir $PYBINDPLUGIN -f pybind -o interface_setup.py *.yang
[email protected]:346: warning: node "openconfig-interfaces::state" is config false and is not part of the accessible tree
INFO: encountered (<pyang.error.Position object at 0x7fe9b551f2d0>, 'XPATH_REF_CONFIG_FALSE', (u'openconfig-interfaces', u'state'))
FATAL: pyangbind cannot build module that pyang has found errors with.
This one had me stumped. Google had nothing. I was going around in circles until I broke the cycle by working on my laptop instead of my workstation. During the first time setup of the tools I found myself looking at all the repos again in github, and so I thought I would take a look at the blame on the affected file here. The error stood out like a sore thumb.

In my downloaded model, it referred to oc-if:state and in the repo model it referred to oc-if:config. The error now stands to reason since the state model is more for telemetry - it's a read only view of the interface state, not the config. I edited the field and we now have a compiled module again.

Back to running the script...
python push_inside_if.py
Failed to configure interface: expected tag: name, got tag: subinterfaces
WAT? Let's dump out what we generated prior to send...

We add print(pybindIETFXMLEncoder.serialise(ocinterfaces)) just above the send_to_device call, and then run again.
python push_inside_if.py

<interfaces xmlns="http://openconfig.net/yang/interfaces">
  <interface>
    <subinterfaces>
      <subinterface>
        <ipv4 xmlns="http://openconfig.net/yang/interfaces/ip">
          <addresses>
            <address>
              <config>
                <prefix-length>24</prefix-length>
              </config>
              <ip>109.109.109.2</ip>
            </address>
          </addresses>
        </ipv4>
        <index>0</index>
        <config>
          <description>Inside IP Address</description>
        </config>
      </subinterface>
    </subinterfaces>
    <config>
      <enabled>true</enabled>
      <description>Inside Interface</description>
    </config>
    <name>GigabitEthernet4</name>
  </interface>
</interfaces>
Well, it looks correct, but maybe it doesn't like the fact that the name tag is at the bottom? Seems like a dumb complaint to have - it's a machine readable structure and the positioning in that structure is technically accurate (it's in the correct layer of the XML?)

Only way to prove this is to make a manual copy of this as a string var and then push it directly instead of rendering it with this tool.

First I comment the existing rendering of the rpc_body in the function to just use the kwargs['py_obj'] verbatim (I provide valid XML in my string) and then I make a multiline string in the main function with a human ordered XML envelope.

python push_inside_if.py 

original
<interfaces xmlns="http://openconfig.net/yang/interfaces">
  <interface>
    <subinterfaces>
      <subinterface>
        <ipv4 xmlns="http://openconfig.net/yang/interfaces/ip">
          <addresses>
            <address>
              <config>
                <prefix-length>24</prefix-length>
              </config>
              <ip>109.109.109.2</ip>
            </address>
          </addresses>
        </ipv4>
        <index>0</index>
        <config>
          <description>Inside IP Address</description>
        </config>
      </subinterface>
    </subinterfaces>
    <config>
      <enabled>true</enabled>
      <description>Inside Interface</description>
    </config>
    <name>GigabitEthernet4</name>
  </interface>
</interfaces>


ordered
<interfaces xmlns="http://openconfig.net/yang/interfaces">
  <interface>
    <name>GigabitEthernet4</name>
    <config>
      <enabled>true</enabled>
      <description>Inside Interface</description>
    </config>
    <subinterfaces>
      <subinterface>
        <index>0</index>
        <config>
          <description>Inside IP Address</description>
        </config>
        <ipv4 xmlns="http://openconfig.net/yang/interfaces/ip">
          <addresses>
            <address>
              <ip>109.109.109.2</ip>
              <config>
                <prefix-length>24</prefix-length>
              </config>
            </address>
          </addresses>
        </ipv4>
      </subinterface>
    </subinterfaces>
  </interface>
</interfaces>
  
Successfully configured IP on 192.168.70.21


Ugh. That's so lame. Clearly the problem here is the XML Serialiser is not rendering the objects in an order that the netconf agent on the CSR likes. Kill me now.
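
In fairness to the CSR, the YANG XML encoding rules (RFC 6020) do say a list's key leaves come first inside each list entry, so the agent is within its rights. Rather than hand-ordering a string, the serialised XML could be post-processed to pull the keys to the front. This is a rough sketch with the standard library only, not something pyangbind offers. KEY_TAGS is my assumption about which leaf is the key for each list in this payload, and I have not pushed the output of this function to the router.

```python
import xml.etree.ElementTree as ET

# Assumption: list name -> key leaf for the lists in this particular payload
KEY_TAGS = {'interface': 'name', 'subinterface': 'index', 'address': 'ip'}


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit('}', 1)[-1]


def keys_first(xml_text):
    """Return the XML with each list entry's key leaf moved to the front."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        key = KEY_TAGS.get(local(parent.tag))
        if key is None:
            continue
        children = list(parent)
        keyed = [c for c in children if local(c.tag) == key]
        rest = [c for c in children if local(c.tag) != key]
        parent[:] = keyed + rest
    return ET.tostring(root, encoding='unicode')


sample = (
    '<interfaces xmlns="http://openconfig.net/yang/interfaces"><interface>'
    '<subinterfaces><subinterface><config/><index>0</index></subinterface></subinterfaces>'
    '<config><enabled>true</enabled></config>'
    '<name>GigabitEthernet4</name>'
    '</interface></interfaces>'
)
reordered = ET.fromstring(keys_first(sample))
interface = reordered[0]
print([local(child.tag) for child in interface])
```

Only the keys move; the rest of the children keep the order the serialiser gave them.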

But wait. It gets better.

Having returned to my workstation, I decide to send commands straight from VSCode to the GNS3 simulated CSR via the GNS3 simulated Ubuntu box, using a simple NAT on the Ubuntu VM.
sudo sysctl -w net.ipv4.ip_forward=1
sudo iptables -t nat -A POSTROUTING -o ens3 -j MASQUERADE
I then fire up code, pull in the changes from the laptop via my git repo, and fire off the request as seen to see what's what.
python3 ./models/interface/push_inside_if.py
Successfully configured IP on 192.168.70.21
Eh? This is not the same ordered code. This is the standard generated XML blob. The only difference is that Python 3.8 is the default on my workstation.
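
A guess at the why, which I have not verified against the pyangbind source: the failing runs were on Python 2.7 (see the site-packages paths in the tracebacks above), where a plain dict has no guaranteed order, while from Python 3.7 a dict keeps insertion order as a language guarantee. If the serialiser walks a dict of child elements, that alone could change where the name tag lands. The sketch only demonstrates the dict behaviour; insertion_order_guaranteed is a hypothetical helper name.

```python
import sys


def insertion_order_guaranteed(version_info=sys.version_info):
    """dict insertion order is a language guarantee from Python 3.7 onwards."""
    return tuple(version_info[:2]) >= (3, 7)


# build a dict in the order a model might declare its children
children = {}
for tag in ('name', 'config', 'subinterfaces'):
    children[tag] = None

print(sys.version_info[:2], insertion_order_guaranteed(), list(children))
```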

If nothing else, what this has taught me is that when it comes to YANG modelling in Python - environment matters - a lot. I get the feeling this is also why the developer of pyangbind let it die on the vine a bit, since moving over to Golang in his day job probably translates better to this use case as well. Golang, for the uninitiated, generates C-like (in speed and performance) binary files that are all inclusive - no dependencies, no libraries. Build an app in Go, and it's ready to rock and roll anywhere.

At this point, I have been able to build a model saying what I want, and push it to the box, and it "made it so". What happens if I make a change out of band and then push something back to the box?

in1rt001#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
in1rt001(config)#int gi4
in1rt001(config-if)#ip address 109.109.109.3 255.255.254.0
in1rt001(config-if)#^Z

in1rt001#sh run int gi 4 
Building configuration...

Current configuration : 203 bytes
!
interface GigabitEthernet4
 description Inside IP Address
 ip address 109.109.111.3 255.255.254.0 secondary
 ip address 109.109.109.3 255.255.254.0
 negotiation auto
 no mop enabled
 no mop sysid
end

So I hacked up the subnet mask, oh and btw there is a secondary IP there too...
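
For reference, the mask I typed on the CLI is not the /24 that the model pushes. A small sketch with the standard ipaddress module shows how the two compare; prefix_length is a hypothetical helper name.

```python
import ipaddress


def prefix_length(mask):
    """Convert a dotted subnet mask into a prefix length."""
    return ipaddress.ip_network('0.0.0.0/' + mask).prefixlen


cli = ipaddress.ip_interface('109.109.109.3/255.255.254.0')  # typed on the router
model = ipaddress.ip_interface('109.109.109.2/24')           # what the script pushes

print(prefix_length('255.255.254.0'))  # 23
print(cli.network)                     # 109.109.108.0/23
print(model.network)                   # 109.109.109.0/24
print(model.ip in cli.network)         # True
```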

python3 ./models/interface/push_inside_if.py
Failed to configure interface: /native/interface/GigabitEthernet[name='4']/ip/address/secondary[address='109.109.109.2']/secondary is not configured
woooommmmpp whomp...

Maybe I need to make sure that secondary isn't confusing things?

in1rt001#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
in1rt001(config)#int gi 4
in1rt001(config-if)#no  ip address 109.109.110.3 255.255.254.0 secondary
in1rt001(config-if)#^Z
in1rt001#sh run int gi 4
Building configuration...

Current configuration : 153 bytes
!
interface GigabitEthernet4
 description Inside IP Address
 ip address 109.109.109.3 255.255.254.0
 negotiation auto
 no mop enabled
 no mop sysid
end

Try again then...
python3 ./models/interface/push_inside_if.py
Failed to configure interface: /native/interface/GigabitEthernet[name='4']/ip/address/secondary[address='109.109.109.2']/secondary is not configured
 big fat nope.

At this point, I think we need to consider the use of config candidates and the push many, apply once concept. Time for a new post...
