Semantic Systems Integration
October 23rd, 2008
I’ve done systems integration work for years. Generally these kinds of things are one-off applications that integrate a few systems with some very specific business rules that are “hard-coded” into the system. This makes them very brittle and difficult to change when new systems or business rules are desired. Now as a software developer, I know that the right way to do things is to abstract a lot of the functionality so you can plug in new functionality later without a lot of fuss. But the realities of time, money and attention span often require that you “just get it done”. Customers generally don’t care how the nut gets cracked, just that it gets cracked on time and on budget.
As part of my day job last week, I was tasked with building a proof-of-concept that linked three systems together. Happily, I have a nicely evolving RDF toolkit at my disposal which really made things easy.
Using a nice RDF toolkit in systems integration work can help satisfy both the customer’s need for quick turnaround and the developer’s desire for an elegant and flexible solution. Because RDF provides a common data model that is more-or-less infinitely flexible, it becomes trivial to mesh and mash the disparate data in interesting ways.
For example, my POC was to integrate a network element management system (EMS) and a network inventory system (NIS). The EMS basically keeps track of devices on a network and knows how to communicate directly with these devices. The NIS keeps track of physical stuff (switches, routers, COs, etc.), and also logical stuff like VPN, VoIP and POTS services. The idea of this integration was to reconcile the network “as-built” (according to the EMS) and “as-designed” (according to the NIS). In other words, does the configuration of DeviceABC as expressed through the EMS jibe with the data about DeviceABC in the NIS?
Additionally, we wanted to create the ability to store extra data that neither system could natively store. Examples might include notes or commands for field technicians, device user manuals, etc. So essentially, we have three systems to work with.
The first thing I needed to do was to identify the entities and attributes from each system that I was going to work with. For each system, I defined a number of Classes and Properties in an OWL ontology. This was fairly straightforward…most tools provide some facility to do this. But even writing it by hand isn’t that difficult. In addition to modelling the entities in the external systems, I also created other entities that will be stored locally. These are the notes, commands, documents, etc. that I spoke of earlier.
The EMS provides an API to fetch the list of device names, details of each device, etc. This is a SOAP-based API that returns XML. To get it in a format I could use (RDF), I simply applied a stylesheet to turn their native XML response into Turtle. Then I parsed the Turtle into an in-memory triplestore. My stylesheet of course converts the data so that it aligns with the Classes and Properties in the Ontology that I’ve defined for the EMS.
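To make the XML-to-triples step concrete, here is a minimal sketch. It is not the stylesheet I actually used: it swaps the XSLT-to-Turtle step for plain Python and builds the triples directly, and the XML layout, the `http://example.org/ems#` namespace and the property names are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical EMS response; the real SOAP payload looks different.
EMS_XML = """
<devices>
  <device name="DeviceABC">
    <ipAddress>10.0.0.5</ipAddress>
    <model>RouterX</model>
  </device>
</devices>
"""

EMS_NS = "http://example.org/ems#"  # assumed ontology namespace


def ems_xml_to_triples(xml_text):
    """Map each <device> element to (subject, predicate, object) triples."""
    triples = set()
    root = ET.fromstring(xml_text)
    for device in root.findall("device"):
        subject = EMS_NS + device.get("name")
        # Every device becomes an instance of the (assumed) Device class.
        triples.add((subject, "rdf:type", EMS_NS + "Device"))
        # Each child element becomes a property named after its tag.
        for child in device:
            triples.add((subject, EMS_NS + child.tag, (child.text or "").strip()))
    return triples


# A plain set of tuples stands in for the in-memory triplestore.
ems_store = ems_xml_to_triples(EMS_XML)
```

The point is only that the mapping lives in one small, replaceable place; whether that place is a stylesheet or a function matters much less.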
The NIS again provides a non-RDF-friendly API, this one based on Java/J2EE. But because the data is actually stored in a database (as opposed to being fetched on demand from a network element), we can simply ignore the API and query that database directly. To do this, I simply created an RDF View over the data that I was interested in. This view is structured in such a way that the data appears as the generic RDBMS-based triplestore that we’re all familiar with…subject column, predicate column, etc. This is a bit different from some other RDF-RDBMS mapping strategies like D2RQ, but it works well for me. The idea is similar to that of Triplify, but I actually query a view rather than inspect a query result. I point another kind of triplestore at this view, which I just call an RDF View Store. It’s a read-only store obviously, but provides the same querying capabilities as my “normal” store.
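Here is a rough sketch of what such a view can look like, using SQLite so it runs anywhere. The `equipment` table, its columns and the `nis:` prefixes are invented for the example; the real NIS schema is different, and `match` is only a hypothetical stand-in for the querying a view store would offer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical NIS table; the real schema is not shown here.
CREATE TABLE equipment (id INTEGER PRIMARY KEY, name TEXT, site TEXT);
INSERT INTO equipment VALUES (1, 'DeviceABC', 'CO-12');

-- The view exposes one row per triple: subject, predicate, object.
CREATE VIEW rdf_view AS
  SELECT 'nis:equipment/' || id AS subject, 'nis:name' AS predicate, name AS object
    FROM equipment
  UNION ALL
  SELECT 'nis:equipment/' || id, 'nis:site', site
    FROM equipment;
""")


def match(conn, subject=None, predicate=None, obj=None):
    """Return triples from the view matching the given pattern (None = wildcard)."""
    clauses, params = [], []
    for column, value in (("subject", subject), ("predicate", predicate), ("object", obj)):
        if value is not None:
            clauses.append(column + " = ?")
            params.append(value)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = "SELECT subject, predicate, object FROM rdf_view" + where
    return conn.execute(sql, params).fetchall()


# Which NIS records are named DeviceABC?
hits = match(conn, predicate="nis:name", obj="DeviceABC")
```

If the NIS schema changes, only the view definition has to change; anything that queries through `match` keeps working.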
So now I have three distinct sources of data, all exposed as RDF. Mashing that data together is now trivial:
- Fetch a URI from the EMS triplestore and examine its properties
- Based on a property, query the NIS triplestore for a match
- Display the combined results to the user
- Let the user annotate the relationship by creating a note instance in the local triplestore and relating it to either the instance in the EMS or NIS systems.
- Profit!
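The steps above can be sketched in a few lines. This is a toy version under loud assumptions: each store is just a set of tuples, the `ems:`, `nis:` and `local:` names are made up, and the device name is assumed to be the property that links the two systems.

```python
# Toy stand-ins for the three stores (all URIs and values are hypothetical).
ems = {
    ("ems:DeviceABC", "ems:name", "DeviceABC"),
    ("ems:DeviceABC", "ems:ipAddress", "10.0.0.5"),
}
nis = {
    ("nis:equipment/1", "nis:name", "DeviceABC"),
    ("nis:equipment/1", "nis:site", "CO-12"),
}
local = set()  # writable store for notes


def properties(store, subject):
    """All predicate/object pairs for one subject."""
    return {p: o for s, p, o in store if s == subject}


def subjects_with(store, predicate, obj):
    """Subjects that have the given predicate/object pair."""
    return sorted(s for s, p, o in store if p == predicate and o == obj)


def reconcile(ems_uri):
    """Fetch the EMS view of a device and any NIS records with the same name."""
    as_built = properties(ems, ems_uri)
    nis_uris = subjects_with(nis, "nis:name", as_built["ems:name"])
    as_designed = {uri: properties(nis, uri) for uri in nis_uris}
    return {"as_built": as_built, "as_designed": as_designed}


def annotate(note_uri, target_uri, text):
    """Create a note in the local store and relate it to an EMS or NIS instance."""
    local.add((note_uri, "rdf:type", "local:Note"))
    local.add((note_uri, "local:text", text))
    local.add((note_uri, "local:about", target_uri))


combined = reconcile("ems:DeviceABC")
annotate("local:note/1", "nis:equipment/1", "Check rack position on next site visit")
```

A real triplestore would do the matching with a query language rather than list comprehensions, but the shape of the flow is the same: look up on one side, match on the other, then annotate locally.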
Now I have a system that satisfies my functional requirements. But what about my non-functional requirements of abstraction and ease of change? I have that too. Since everything is exposed as RDF, my application is fairly insulated from changes in the EMS or NIS. New data in the EMS? No problem…I just change my XSLT. Database schema changes in the NIS? Again, no problem…I just change my RDF View. Even the UI (which I really didn’t mention) is defined in RDF. What if I want to add a new “field” to display, or remove one that’s no longer used? I can just change the UI RDF and be done with it.
As a nice side effect, I’ve just opened these systems to any other application that can speak RDF. If I put a web server in front of these new triplestores, I’ve now opened up a new source of data to the entire enterprise. Enterprise Linked Data, baby!
October 24th, 2008 at 12:26 pm
Very nice dog-fooding story!
This is what RDF Linked Data is all about
btw - I note your use of the term “RDF Views” and this is the first time I am seeing the term used outside OpenLink. We’ve always seen RDBMS and RDF data meshing as being all about RDF Views over SQL Data.
Links:
1. http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSSQL2RDF
2. http://docs.openlinksw.com/virtuoso/rdfviews.html
October 24th, 2008 at 12:44 pm
Thanks Kingsley!
I may have read something of yours and subconsciously picked up the term “RDF View” from you. Hope you don’t mind my using the term.