Thursday, 2 July 2020

Doing YANG Wrong: Part 5 - Manufacturer/Model deviations

Part 5: Deviations

So we can talk to the device, and we can use candidate configs to stage and then apply configs in aggregate, but we still can't make a CSR1000v take a simple OpenConfig IP address. At the beginning I deliberately said I wanted to use generic models and avoid the deviations. That's because a Python model binding generated with the deviations baked in only works on that vendor's box. This isn't terrible, but it's not really what we want. Let's see if it works, though.

Let's look at the hello message for the IP address model again, and then fetch the deviations it lists.
netconf-console --host 192.168.70.21 --port 830 -u yangconf -p my_good_password. --hello | grep openconfig | grep ip
<nc:capability>http://openconfig.net/yang/interfaces/ip?module=openconfig-if-ip&amp;revision=2018-01-05&amp;deviations=cisco-xe-openconfig-if-ip-deviation,cisco-xe-openconfig-interfaces-deviation</nc:capability>
So there are actually two:
cisco-xe-openconfig-if-ip-deviation
cisco-xe-openconfig-interfaces-deviation
Let's pull these off the box like the others.
netconf-console --host 192.168.70.21 --port 830 -u yangconf -p my_good_password. --get-schema cisco-xe-openconfig-if-ip-deviation | xml_grep 'data' --text_only > cisco-xe-openconfig-if-ip-deviation.yang

netconf-console --host 192.168.70.21 --port 830 -u yangconf -p my_good_password. --get-schema cisco-xe-openconfig-interfaces-deviation | xml_grep 'data' --text_only > cisco-xe-openconfig-interfaces-deviation.yang
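The same discovery and download can be done from Python. What follows is a minimal sketch, assuming ncclient is installed and that its `get_schema()` reply exposes the module text as `.data`; `parse_capability` and `save_deviations` are hypothetical helper names, not part of any library. It splits the `deviations=` parameter out of each capability URI and fetches each named module with the NETCONF `<get-schema>` RPC:

```python
from urllib.parse import parse_qs, urlparse


def parse_capability(uri):
    """Split a YANG capability URI into (module name, [deviation module names])."""
    params = parse_qs(urlparse(uri).query)
    module = params.get("module", [None])[0]
    # the deviations parameter is one comma-separated value
    deviations = [d for value in params.get("deviations", []) for d in value.split(",") if d]
    return module, deviations


def save_deviations(host, user, password, module):
    """Fetch every deviation advertised for `module` and save it as <name>.yang."""
    from ncclient import manager  # imported lazily so parse_capability needs only the stdlib

    with manager.connect_ssh(host=host, port=830, username=user, password=password,
                             hostkey_verify=False) as m:
        for cap in m.server_capabilities:  # iterating yields the capability URI strings
            name, deviations = parse_capability(cap)
            if name != module:
                continue
            for dev in deviations:
                reply = m.get_schema(dev)  # assumption: reply.data holds the YANG text
                with open(dev + ".yang", "w") as f:
                    f.write(reply.data)
```

Calling `save_deviations("192.168.70.21", "yangconf", password, "openconfig-if-ip")` should write the same two files as the netconf-console commands above.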
Let's rebuild our Python module with these deviations in the bundle:
clear && pyang --plugindir $PYBINDPLUGIN -f pybind -o interface_setup.py *.yang --deviation cisco-xe-openconfig-interfaces-deviation.yang cisco-xe-openconfig-if-ip-deviation.yang

cisco-xe-openconfig-interfaces-deviation.yang:11: warning: imported module "openconfig-if-ethernet" not used
cisco-xe-openconfig-interfaces-deviation.yang:23: warning: imported module "ietf-yang-types" not used
INFO: encountered (<pyang.error.Position object at 0x7fb3adcc8370>, 'UNUSED_IMPORT', u'openconfig-if-ethernet')
INFO: encountered (<pyang.error.Position object at 0x7fb3adcc6cd0>, 'UNUSED_IMPORT', u'ietf-yang-types')
Hmm. So it's saying that one deviation file (the interfaces one) imports two modules that it never uses. This is annoying, and goes to show how flaky some of these modules can be. I have to find the import statements at lines 11 and 23 of the interfaces deviation file and comment them out. We can then rerun pyang to get a compiled Python module.
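Commenting out the unused imports can be scripted too. A rough sketch, assuming each `import` block only contains simple substatements (prefix, revision-date) and no `}` inside a description string; `comment_out_imports` is a hypothetical helper:

```python
import re


def comment_out_imports(yang_text, modules):
    """Prefix every line of each `import <module> { ... }` block with //."""
    for module in modules:
        # non-greedy match up to the first closing brace of the import block
        pattern = re.compile(
            r"^[ \t]*import[ \t]+" + re.escape(module) + r"[ \t]*\{.*?\}",
            re.MULTILINE | re.DOTALL,
        )
        yang_text = pattern.sub(
            lambda m: "\n".join("// " + line for line in m.group(0).split("\n")),
            yang_text,
        )
    return yang_text
```

Reading cisco-xe-openconfig-interfaces-deviation.yang, passing it through with `["openconfig-if-ethernet", "ietf-yang-types"]` and writing the result back would reproduce the manual edit. YANG accepts `//` comments, so pyang should simply skip those lines.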

We then re-run our script:
python3 push_inside_if.py
Successfully configured IP on 192.168.70.21
Oooooh. Sense check please...
in1rt001#sh run int gi 4
Building configuration...

Current configuration : 122 bytes
!
interface GigabitEthernet4
 description Inside Interface
 ip address 109.109.109.2 255.255.255.0
 negotiation auto
end
Yay. Let's see if promise theory works with our models too: we declare a new desired state and expect the box to converge on it. We will change the IP and the description.
python3 ./push_inside_if.py
Failed to configure interface: illegal reference /oc-if:interfaces/interface[name='GigabitEthernet4']/subinterfaces/subinterface[index='0']/oc-ip:ipv4/addresses/address[ip='50.60.70.1']/ip
Womp Womp :o(

OK, let's try changing everything except the IP itself (description and mask):

python3 push_inside_if.py
Successfully configured IP on 192.168.70.21


in1rt001#sh run int gi 4
Building configuration...

Current configuration : 122 bytes
!
interface GigabitEthernet4
 description People are stupid
 ip address 109.109.109.2 255.255.254.0
 negotiation auto
end
What is interesting, therefore, is that I can change everything except the IP: subnet mask, fine; description, fine. Looking at that earlier error, my guess is that the subinterface[0] address entry we pushed, keyed on 50.60.70.1, doesn't match the existing address 109.109.109.2 on the box. The IP is the key of the address list, so we can't reference the address we want, because it doesn't exist in the device's copy of the model.
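One way to picture the keyed-list behaviour: in openconfig-if-ip the `address` list is keyed on `ip`, so an address with a different `ip` is a different list entry, not an edit of the existing one. The toy model below uses plain dicts (not pyangbind, and not the device's datastore) to mimic NETCONF's default merge on such a list; `merge_addresses` is a hypothetical name:

```python
def merge_addresses(running, edit):
    """Toy NETCONF-style merge of {ip: {leaf: value}} address entries keyed on ip."""
    merged = {ip: dict(leaves) for ip, leaves in running.items()}
    for ip, leaves in edit.items():
        # same key: leaves are merged into the existing entry
        # new key: a brand-new list entry appears next to the old one
        merged.setdefault(ip, {}).update(leaves)
    return merged


running = {"109.109.109.2": {"prefix-length": 24}}

# Changing only the mask keeps the key, so the existing entry is updated
print(merge_addresses(running, {"109.109.109.2": {"prefix-length": 23}}))
# {'109.109.109.2': {'prefix-length': 23}}

# Changing the IP creates a second entry instead of replacing the first
print(merge_addresses(running, {"50.60.70.1": {"prefix-length": 24}}))
# {'109.109.109.2': {'prefix-length': 24}, '50.60.70.1': {'prefix-length': 24}}
```

This is a simplification: it doesn't explain the exact "illegal reference" wording, only why a changed key is a different entry rather than an update.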

So to round this out, let's remove the IP from the interface instead.

We edit line 66 of the script and turn that add(inside_ip_addr) into delete():

inside_sub_if.ipv4.addresses.address.delete(inside_ip_addr)

Fail:
python3 ./models/interface/push_inside_if.py
Traceback (most recent call last):
  File "/home/jhow/.local/lib/python3.8/site-packages/pyangbind/lib/yangtypes.py", line 848, in delete
    del self._members[k]
KeyError: '109.109.109.2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./models/interface/push_inside_if.py", line 66, in <module>
    inside_sub_if.ipv4.addresses.address.delete(inside_ip_addr)
  File "/home/jhow/.local/lib/python3.8/site-packages/pyangbind/lib/yangtypes.py", line 852, in delete
    raise KeyError("key %s was not in list (%s)" % (k, m))
KeyError: "key 109.109.109.2 was not in list ('109.109.109.2')"
This makes sense, because the model in Python's memory is independent of the device's config. You can't delete something that doesn't exist in memory yet, and adding it then removing it leaves you with an empty model to push, which has no effect on the device. We would have to pull the config and parse it into a model before we could know what to delete, or else be sure that what we injected into our Python model exactly matches the state on the device. The latter is fine if you have a good source of truth.
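Pulling the current state first could look something like this. A minimal sketch, assuming `m` is an open ncclient session and that the device answers with the same OpenConfig namespaces used in the manual XML in this post; `FILTER`, `fetch_running` and `current_ipv4` are hypothetical names. The parser only needs the standard library:

```python
import xml.etree.ElementTree as ET

OC_IF = "http://openconfig.net/yang/interfaces"
OC_IP = "http://openconfig.net/yang/interfaces/ip"

# Subtree filter selecting a single interface by name
FILTER = ('<interfaces xmlns="http://openconfig.net/yang/interfaces">'
          '<interface><name>{name}</name></interface></interfaces>')


def fetch_running(m, name):
    """Return the running config for one interface as an XML string."""
    reply = m.get_config(source="running", filter=("subtree", FILTER.format(name=name)))
    return reply.data_xml


def current_ipv4(data_xml, name):
    """Return {ip: prefix_length} for every IPv4 address configured on `name`."""
    root = ET.fromstring(data_xml)
    found = {}
    for intf in root.iter("{%s}interface" % OC_IF):
        if intf.findtext("{%s}name" % OC_IF) != name:
            continue
        for addr in intf.iter("{%s}address" % OC_IP):
            ip = addr.findtext("{%s}ip" % OC_IP)
            prefix = addr.findtext("{%s}config/{%s}prefix-length" % (OC_IP, OC_IP))
            found[ip] = int(prefix) if prefix else None
    return found
```

The keys of `current_ipv4(fetch_running(m, "GigabitEthernet4"), "GigabitEthernet4")` are then exactly the addresses that exist on the box and could be deleted.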

It is at this point I spot that we have to add the NETCONF "delete" operation to the modelled address to remove it:
nc:operation="delete"
...needs to go on the XML element wrapping the subinterface's IPv4 address.

I can't see how this pyangbind XML serialiser supports providing NETCONF operations. To validate this theory, I first export the serialised XML for a create operation, and then manually add operation="delete" to the address element under ipv4.

manual = '''
<interfaces xmlns="http://openconfig.net/yang/interfaces">
<interface>
<name>GigabitEthernet4</name>
<config>
<enabled>true</enabled>
</config>
<subinterfaces>
<subinterface>
<index>0</index>
<config>
<description>Deleted</description>
</config>
<ipv4 xmlns="http://openconfig.net/yang/interfaces/ip">
<addresses>
<address operation="delete">
<ip>109.232.176.2</ip>
<config>
<ip>109.232.176.2</ip>
<prefix-length>23</prefix-length>
</config>
</address>
</addresses>
</ipv4>
</subinterface>
</subinterfaces>
</interface>
</interfaces>
'''

We then alter the send_to_device function exactly as we did before, and send this hand-written XML inside the <config> wrapper instead of our serialised model object.

I save this as pull_inside_if.py to keep the files separate, then push the interface in, before pulling it back out.

python3 ./models/interface/push_inside_if.py
Successfully configured IP on 192.168.70.21

python3 ./models/interface/pull_inside_if.py
Successfully configured IP on 192.168.70.21
Checking the running config on the box and it did the trick.

in1rt001#sh run int gi 4
Building configuration...

Current configuration : 122 bytes
!
interface GigabitEthernet4
 description Deleted
 no ip address
 negotiation auto
end
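Rather than hand-writing the XML above, the attribute could be injected into whatever pyangbind serialises. A sketch, assuming `pybindIETFXMLEncoder.serialise()` returns the interfaces tree as a string (it is concatenated with '<config>' in the script, so it should be); `mark_addresses` is a hypothetical helper. RFC 6241 defines `operation` in the NETCONF base namespace, so this sets the qualified `nc:operation` form rather than the bare attribute that IOS-XE accepted here:

```python
import xml.etree.ElementTree as ET

NC = "urn:ietf:params:xml:ns:netconf:base:1.0"
OC_IF = "http://openconfig.net/yang/interfaces"
OC_IP = "http://openconfig.net/yang/interfaces/ip"

# Friendly prefixes for the re-serialised output (otherwise ElementTree uses ns0, ns1...)
ET.register_namespace("nc", NC)
ET.register_namespace("oc-if", OC_IF)
ET.register_namespace("oc-ip", OC_IP)


def mark_addresses(xml_text, operation="delete"):
    """Add nc:operation="<operation>" to every OpenConfig IPv4 <address> element."""
    root = ET.fromstring(xml_text)
    for addr in root.iter("{%s}address" % OC_IP):
        addr.set("{%s}operation" % NC, operation)
    return ET.tostring(root, encoding="unicode")
```

Usage would be along the lines of `rpc_body = '<config>' + mark_addresses(pybindIETFXMLEncoder.serialise(ocintmodel.interfaces)) + '</config>'`. ElementTree rewrites the default namespaces as prefixes; the XML is equivalent, but if a device turns out to be picky about that, lxml keeps the original prefixes.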
So now we have to ask ourselves: do we care about idempotency? When it comes to pushing a change out, there are three states we will observe:
1) the IP on the interface doesn't exist and should exist. <- works
2) the IP on the interface exists and we need to change something that isn't the IP itself <- works
3) the IP on the interface exists and we want to remove it <- works.

What we cannot do in one step is replace the IP in situ, since it is a key value in the model itself. Considering this sort of operation would be bound to the operational hooks of a CMDB, the events we would expect are: add a new thing, change something on that existing thing that isn't the thing itself, or delete the thing. On that basis, it shouldn't be unreasonable to codify that you decommission an interface and then commission your new one as a new config. Thus you cannot "update" keyed values in situ, but must instead remove/commit/add/commit, in that order.

Given the likely use case for a CMDB hook response, I guess we can live with that.
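The remove/commit/add/commit sequence could be generated like this. A sketch using only the standard library; `address_payload` and `replace_ip_steps` are hypothetical helpers, and the element layout mirrors the manual XML earlier in this post. Each returned body is meant to go through its own edit-config and commit:

```python
import xml.etree.ElementTree as ET

NC = "urn:ietf:params:xml:ns:netconf:base:1.0"
OC_IF = "http://openconfig.net/yang/interfaces"
OC_IP = "http://openconfig.net/yang/interfaces/ip"


def _child(parent, ns, tag, text=None):
    """Append a namespaced child element, optionally with text."""
    el = ET.SubElement(parent, "{%s}%s" % (ns, tag))
    if text is not None:
        el.text = str(text)
    return el


def address_payload(interface, index, ip, prefix, operation=None):
    """Build a <config> body for a single subinterface IPv4 address."""
    config = ET.Element("config")
    intf = _child(_child(config, OC_IF, "interfaces"), OC_IF, "interface")
    _child(intf, OC_IF, "name", interface)
    sub = _child(_child(intf, OC_IF, "subinterfaces"), OC_IF, "subinterface")
    _child(sub, OC_IF, "index", index)
    addr = _child(_child(_child(sub, OC_IP, "ipv4"), OC_IP, "addresses"), OC_IP, "address")
    if operation:
        addr.set("{%s}operation" % NC, operation)
    _child(addr, OC_IP, "ip", ip)
    cfg = _child(addr, OC_IP, "config")
    _child(cfg, OC_IP, "ip", ip)
    _child(cfg, OC_IP, "prefix-length", prefix)
    return ET.tostring(config, encoding="unicode")


def replace_ip_steps(interface, index, old, new):
    """Ordered payloads for an IP change: delete the old entry, then add the new one."""
    old_ip, old_prefix = old
    new_ip, new_prefix = new
    return [
        address_payload(interface, index, old_ip, old_prefix, operation="delete"),
        address_payload(interface, index, new_ip, new_prefix),
    ]


# Usage with an open ncclient session `m` (one commit per step):
# for body in replace_ip_steps("GigabitEthernet4", 0, ("109.109.109.2", 24), ("50.60.70.1", 24)):
#     m.edit_config(target="candidate", config=body)
#     m.commit()
```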

Wednesday, 1 July 2020

Doing YANG Wrong: Part 4 - Config stores

Part 4: Config stores 

As we head down this rabbit hole, we get ever closer to something useful, but ever deeper into the weeds of NETCONF/YANG. For the uninitiated, config stores (datastores) are a place where you can put config chunks, either singly or in aggregate over a series of NETCONF pushes, to build a new config that you then apply in one hit. In the SP world you might want to set up some interfaces, some BGP and then some overlay like an MPLS service. You may not want to do all of this in one script, since you won't always need all of those things together. Better to make a script per thing, call the ones your CMDB thinks it needs, place them all in a candidate config, validate that as a whole and then push it in one go. If anything fails, you can throw it all away without touching the operational running config. At this point Juniper folks are shrugging their shoulders, since this is very old hat to them, but for enterprise people, particularly the type who lived on the CLI for years, this is unimaginable.
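The stage-everything-then-commit-once flow might look like this with ncclient. A sketch only: `apply_in_aggregate` is a hypothetical helper, `payloads` stands in for the per-feature config bodies your scripts would produce, locking is left out for brevity, and `validate()` needs the device to advertise the `:validate` capability:

```python
def apply_in_aggregate(m, payloads, validate=True):
    """Stage several <config> bodies in the candidate, validate once, commit once.

    `m` is an open ncclient Manager. Any failure throws the whole lot away,
    leaving the running config untouched.
    """
    m.discard_changes()  # start from a clean candidate
    try:
        for body in payloads:
            m.edit_config(target="candidate", config=body)
        if validate:
            m.validate(source="candidate")
        m.commit()
    except Exception:
        m.discard_changes()  # drop the partial candidate; running was never touched
        raise
```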

There be dragons here still, however. There is invariably only one candidate config store per device, and there is always the chance that someone's script didn't clear it out in its error handling, or that some other script or person hops onboard while your session is in progress. Either way, you could end up with a poisoned candidate and a mess.

Two things help treat this risk. The first is config locking: when you start a session it's a good idea to lock both the target datastore (candidate) and the intended datastore (running). This prevents any out-of-band changes to running, and any in-flight changes from other agents on the candidate config. The second is error handling: every script should have it from the start, so that it discards its changes and releases its locks on the way out.
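When another session already holds the lock, one option is to retry for a while before giving up. A sketch, assuming a simple linear back-off is acceptable; `acquire_with_retry` is a hypothetical helper, and in real use `retry_on` would be narrowed to `ncclient.operations.rpc.RPCError`:

```python
import time


def acquire_with_retry(acquire, attempts=5, delay=2.0, retry_on=(Exception,), sleep=time.sleep):
    """Call `acquire()` until it succeeds, backing off between attempts.

    `acquire` is any zero-argument callable, e.g. lambda: m.lock(target="candidate").
    """
    for attempt in range(1, attempts + 1):
        try:
            return acquire()
        except retry_on:
            if attempt == attempts:
                raise  # give up: someone else still holds the lock
            sleep(delay * attempt)  # linear back-off: delay, 2*delay, 3*delay...
```

The injectable `sleep` is only there so the helper can be exercised without actually waiting.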

So. Let's tickle our script a bit. We need to add support for candidate configs, all of which is handled in our send_to_device() function.

from ncclient import manager
from pyangbind.lib.serialise import pybindIETFXMLEncoder

def send_to_device(**kwargs):
    rpc_body = '<config>' + pybindIETFXMLEncoder.serialise(kwargs['py_obj']) + '</config>'
    with manager.connect_ssh(host=kwargs['dev'], port=830, username=kwargs['user'], password=kwargs['password'], hostkey_verify=False) as m:
        # we must handle errors nicely so always work in a try block
        try:
            assert(":candidate" in m.server_capabilities)
            # lock running manually
            m.lock(target='running')
            # use a magic context manager for locking candidate whilst we work. it unlocks it as we exit the with: frame
            with m.locked(target='candidate'):
                m.discard_changes()
                m.edit_config(target='candidate', config=rpc_body)
                m.commit()
                print('Successfully configured IP on {}'.format(kwargs['dev']))
        # spit the error out
        except Exception as e:
            print('Failed to configure interface: {}'.format(e))
            m.discard_changes()
        # regardless of error or success, unlock running store.
        finally:
            m.unlock(target='running')


You'll see we added the lock and candidate steps to our try block, a discard_changes() to our except handler, and a finally: section that always unlocks the running config, as we said we would. Let's give it a spin.
python3 ./models/interface/push_inside_if.py
Failed to configure interface:
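Lame. The empty error message makes me think it's the assert that failed: a bare assert raises an AssertionError with no message, so printing it gives nothing. (AssertionError is a subclass of Exception, so the generic handler did catch it; it just had nothing to say.) Let's add a dedicated handler for this.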

def send_to_device(**kwargs):
    rpc_body = '<config>' + pybindIETFXMLEncoder.serialise(kwargs['py_obj']) + '</config>'
    with manager.connect_ssh(host=kwargs['dev'], port=830, username=kwargs['user'], password=kwargs['password'], hostkey_verify=False) as m:
        # we must handle errors nicely so always work in a try block
        try:
            assert(":candidate" in m.server_capabilities)
            # lock running manually
            m.lock(target='running')
            # use a magic context manager for locking candidate whilst we work. it unlocks it as we exit the with: frame
            with m.locked(target='candidate'):
                m.discard_changes()
                m.edit_config(target='candidate', config=rpc_body)
                m.commit()
                print('Successfully configured IP on {}'.format(kwargs['dev']))
        # spit the error out
        except AssertionError:
            print('Go and enable candidate configs pls')
            m.discard_changes()
        except Exception as e:
            print('Failed to configure interface: {}'.format(e))
            m.discard_changes()
        # regardless of error or success, unlock running store.
        finally:
            m.unlock(target='running')


python3 ./models/interface/push_inside_if.py
Go and enable candidate configs pls
Traceback (most recent call last):
  File "./models/interface/push_inside_if.py", line 20, in send_to_device
    assert(":candidate" in m.server_capabilities)
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./models/interface/push_inside_if.py", line 32, in send_to_device
    m.discard_changes()
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/manager.py", line 231, in execute
    return cls(self._session,
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/operations/rpc.py", line 292, in __init__
    self._assert(cap)
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/operations/rpc.py", line 367, in _assert
    raise MissingCapabilityError('Server does not support [%s]' % capability)
ncclient.operations.errors.MissingCapabilityError: Server does not support [:candidate]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./models/interface/push_inside_if.py", line 64, in <module>
    send_to_device(dev=device_ip, user=username, password=password, py_obj=ocintmodel.interfaces)
  File "./models/interface/push_inside_if.py", line 38, in send_to_device
    m.unlock(target='running')
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/manager.py", line 231, in execute
    return cls(self._session,
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/operations/lock.py", line 49, in request
    return self._request(node)
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/operations/rpc.py", line 348, in _request
    raise self._reply.error
ncclient.operations.rpc.RPCError: {'type': 'protocol', 'tag': 'access-denied', 'severity': 'error', 'info': None, 'path': None, 'message': None}
Puke.

As we dig through that, we can see we were right: it was the assert failing, and now the dedicated handler says so. But after that it barfs everywhere with additional exceptions.

It looks like the first one is the m.discard_changes() in the AssertionError handler and the second is the m.unlock(target='running') in the finally block.

The first one is sort of obvious: if the box doesn't support candidates, discard-changes (a candidate operation) can't work, and we hadn't touched any config anyway, because the assert runs first. We can delete that.

The second is the unlock in the finally: block, which always runs. It most likely fails because the assert fired before m.lock(), so we never held the running lock in the first place, and the box refuses the unlock with access-denied.

My first thought was to wrap the m.unlock in another try block with an except: pass, but that's a hack that could hurt you later. What I did instead was use two try blocks, one for the candidate check and one for the config changes. If we don't support candidates, we bomb out and don't try to do any config.

def send_to_device(**kwargs):
    rpc_body = '<config>' + pybindIETFXMLEncoder.serialise(kwargs['py_obj']) + '</config>'
    with manager.connect_ssh(host=kwargs['dev'], port=830, username=kwargs['user'], password=kwargs['password'], hostkey_verify=False) as m:
        try:
            candidate_supported = True
            assert(":candidate" in m.server_capabilities)
        except AssertionError:
            print('Go and enable candidate configs pls')
            candidate_supported = False
        if candidate_supported:
            # we must handle errors nicely so always work in a try block
            try:
                # lock running manually
                m.lock(target='running')
                # use a magic context manager for locking candidate whilst we work. it unlocks it as we exit the with: frame
                with m.locked(target='candidate'):
                    m.discard_changes()
                    m.edit_config(target='candidate', config=rpc_body)
                    m.commit()
                    print('Successfully configured IP on {}'.format(kwargs['dev']))
            # spit the error out
            except Exception as e:
                print('Failed to configure interface: {}'.format(e))
                m.discard_changes()
            # regardless of error or success, unlock running store.
            finally:
                m.unlock(target='running')
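The capability check itself doesn't need assert at all: Python strips assert statements when run with -O, which would silently skip the check. A plain if does the same job. A sketch; `supports_candidate` is a hypothetical helper that takes the capability URI strings (as yielded by iterating `m.server_capabilities`) and looks for the RFC 6241 :candidate URN:

```python
CANDIDATE = "urn:ietf:params:netconf:capability:candidate:1.0"


def supports_candidate(capabilities):
    """True if the advertised capability URIs include the :candidate datastore."""
    # compare without any ?query parameters a capability might carry
    return any(cap.split("?")[0] == CANDIDATE for cap in capabilities)
```

Inside send_to_device() that becomes `if not supports_candidate(m.server_capabilities): print('Go and enable candidate configs pls'); return`, with no exception handling needed for the check.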


Try again:
python3 ./models/interface/push_inside_if.py
Go and enable candidate configs pls
Nice and clean. Now where were we? Ah yes. We need to actually enable the candidate datastore on the router...
netconf-yang feature candidate-datastore
and try again...

python3 ./models/interface/push_inside_if.py
Failed to configure interface: the configuration database is locked by session 5 syncfromdaemon tcp (system from 127.0.0.1) on since 2020-07-01 10:19:28
   IOS-XE YANG Infrastructure
Traceback (most recent call last):
  File "./models/interface/push_inside_if.py", line 31, in send_to_device
    with m.locked(target='candidate'):
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/operations/lock.py", line 67, in __enter__
    Lock(self.session, self.device_handler, raise_mode=RaiseMode.ERRORS).request(self.target)
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/operations/lock.py", line 35, in request
    return self._request(node)
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/operations/rpc.py", line 348, in _request
    raise self._reply.error
ncclient.operations.rpc.RPCError: the configuration database is locked by session 5 syncfromdaemon tcp (system from 127.0.0.1) on since 2020-07-01 10:19:28
   IOS-XE YANG Infrastructure

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./models/interface/push_inside_if.py", line 68, in <module>
    send_to_device(dev=device_ip, user=username, password=password, py_obj=ocintmodel.interfaces)
  File "./models/interface/push_inside_if.py", line 39, in send_to_device
    m.discard_changes()
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/manager.py", line 231, in execute
    return cls(self._session,
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/operations/edit.py", line 192, in request
    return self._request(new_ele("discard-changes"))
  File "/home/jhow/.local/lib/python3.8/site-packages/ncclient/operations/rpc.py", line 348, in _request
    raise self._reply.error
ncclient.operations.rpc.RPCError: the configuration database is locked by session 5 syncfromdaemon tcp (system from 127.0.0.1) on since 2020-07-01 10:19:28
   IOS-XE YANG Infrastructure

Lol? One thing that could be happening is that the candidate config is a copy of running, and if we lock running, the box can't copy running into the candidate store; a race condition, if you will. Let's move the running lock inside the with block that locks candidate:

def send_to_device(**kwargs):
    rpc_body = '<config>' + pybindIETFXMLEncoder.serialise(kwargs['py_obj']) + '</config>'
    with manager.connect_ssh(host=kwargs['dev'], port=830, username=kwargs['user'], password=kwargs['password'], hostkey_verify=False) as m:
        try:
            candidate_supported = True
            assert(":candidate" in m.server_capabilities)
        except AssertionError:
            print('Go and enable candidate configs pls')
            candidate_supported = False
        if candidate_supported:
            # we must handle errors nicely so always work in a try block
            try:
                # use a magic context manager for locking candidate whilst we work. it unlocks it as we exit the with: frame
                with m.locked(target='candidate'):
                    # lock running manually as well
                    m.lock(target='running')
                    # scrap any unknown pending changes in that buffer
                    m.discard_changes()
                    # apply our edits
                    m.edit_config(target='candidate', config=rpc_body)
                    # commit them
                    m.commit()
                    # tell us
                    print('Successfully configured IP on {}'.format(kwargs['dev']))
            # spit the error out
            except Exception as e:
                print('Failed to configure interface: {}'.format(e))
                m.discard_changes()
            # regardless of error or success, unlock running store.
            finally:
                m.unlock(target='running')


and try again:
python3 ./models/interface/push_inside_if.py
Failed to configure interface: illegal reference /oc-if:interfaces/interface[name='GigabitEthernet4']/subinterfaces/subinterface[index='0']/oc-ip:ipv4/addresses/address[ip='109.109.109.2']/ip
So we are back where we were at the end of the last attempt, and config candidates aren't helping us.
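Separately from the illegal reference, that last version of send_to_device() can still throw a second error: if the candidate lock fails, the finally: block tries to unlock a running lock it never took, just like the access-denied earlier. Nesting ncclient's `m.locked()` context managers avoids that, since each unlock only runs if the matching lock succeeded. A sketch, assuming `m` is an open ncclient Manager; `push_candidate` is a hypothetical name:

```python
def push_candidate(m, rpc_body):
    """Lock candidate then running, edit and commit; release only locks actually held.

    Each m.locked() block unlocks on exit, but only if its lock was acquired,
    so a failed lock can't trigger a second error from a stray unlock.
    """
    with m.locked(target="candidate"):
        with m.locked(target="running"):
            try:
                m.discard_changes()  # scrap any unknown pending changes
                m.edit_config(target="candidate", config=rpc_body)
                m.commit()
            except Exception:
                m.discard_changes()  # leave the shared candidate clean for the next session
                raise
```

The caller can still wrap this in its own try/except to print a friendly message.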

Where to next, then? Google doesn't have a lot, to be honest, but there is this forum post about an illegal reference in the OC BGP model against the CSR1000v. Parsing the response from the Cisco guy a bit, I'm left with two paths to follow:
  1. Version incompatibility with my IOS-XE (16.9.5) and the OC model here.
  2. The missing Cisco IOS-XE deviations from the OC Interfaces model need to be here.
I will start with option 2, since it's easy enough to prove, although I expect I might have to upgrade anyway. That said, it's better to fix this in situ than blindly upgrade with no visibility of other bugs I might hit outside of YANG...


