Viewing Relational Data as RDF
July 7th, 2008
For my enterprise syndication work, I need to grab data from a lot of different data sources and put them (or link to them) in a triple store. That’s fine if the source data is in RDF, but it gets more complicated if it’s not. Probably the biggest not is data that is locked away in a relational database. Since so much information in these things, I can’t ignore them. I’ve got to be able to get at it. But I want to do it in a generic way, without having to change any table structures.
The best way I’ve found so far (which I learned about from Ashok Malhotra and Jim Melton of Oracle at the Linked Data Planet conference a few weeks ago) is to create a triple-like view of the data that I’m interested in . For example, I have a table that has a 600k rows of information about network interface cards on a big network. The data looks something like this:
ID, Name, Type, Shelf
1, A01, MD5, S:MD5-1:LJU
2, A01, MD5, S:MD5-1:LJU
...
Now to convert this relational data into triples, I simply define a new view that creates the familiar subject-predicate-object pattern:
create or replace view rdf_card_inst_view as
select
'http://www.foo.com/bar#card' || trim(to_char(temp_id)) as subject,
'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' as predicate,
'http://www.foo.com/bar#Card' as object,
NULL as meta,
'1' as type,
NULL as datatype,
NULL as language
from temp_card_inst
union
select
'http://www.foo.com/bar#card' || trim(to_char(temp_id)) as subject,
'http://www.foo.com/bar#type' as predicate,
card_type as object,
NULL as meta,
'3' as type,
'http://www.w3.org/2001/XMLSchema#string' as datatype,
NULL as language
from temp_card_inst
union
select
'http://www.foo.com/bar#card' || trim(to_char(temp_id)) as subject,
'http://www.foo.com/bar#name' as predicate,
card_name as object,
NULL as meta,
'3' as type,
'http://www.w3.org/2001/XMLSchema#string' as datatype,
NULL as language
from temp_card_inst
union
select
'http://www.foo.com/bar#card' || trim(to_char(temp_id)) as subject,
'http://www.foo.com/bar#shelf' as predicate,
equip_name as object,
NULL as meta,
'3' as type,
'http://www.w3.org/2001/XMLSchema#string' as datatype,
NULL as language
from temp_card_inst;
This gives me a nice triplified view which I can then query with RDF selects or SPARQL queries. Some of the extra colums ( meta, type, language, datatype, etc. ) are stuff I use in my RDF API, but you maybe or may not need them.
SUBJECT PREDICATE OBJECT META TYPE DATATYPE LANGUAGE
--------------------------------- ----------------------------------------------- --------------------------- ------ ---- -------- --------
http://www.foo.com/bar#card1 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.foo.com/bar#Card (null) 1 (null) (null)
http://www.foo.com/bar#card10 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.foo.com/bar#Card (null) 1 (null) (null)
http://www.foo.com/bar#card100 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.foo.com/bar#Card (null) 1 (null) (null)
http://www.foo.com/bar#card1000 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.foo.com/bar#Card (null) 1 (null) (null)
http://www.foo.com/bar#card10000 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.foo.com/bar#Card (null) 1 (null) (null)
http://www.foo.com/bar#card100000 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.foo.com/bar#Card (null) 1 (null) (null)
http://www.foo.com/bar#card100001 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.foo.com/bar#Card (null) 1 (null) (null)
http://www.foo.com/bar#card100002 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.foo.com/bar#Card (null) 1 (null) (null)
http://www.foo.com/bar#card100003 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.foo.com/bar#Card (null) 1 (null) (null)
http://www.foo.com/bar#card100004 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.foo.com/bar#Card (null) 1 (null) (null)
If I want to pull in other data ( such as the shelves that the cards are plugged in to ), I can create a similar view for that data and then create a new view that is the union of the two.
create card_and_shelf_view as
select * from rdf_card_inst_view
union
select * from rdf_shelf_inst_view;
Pretty simple.
Now, you might be saying to yourself, “Shit, this going to take forever with those views!” And you’d be right in some cases. Doing a ’select *’ kind of query takes about 4 seconds on my box with 2.4 million rows in the view (AMD Phenom Triple Core @2Ghz with 3GB RAM, Oracle 10g, Ubuntu 7.10). But as soon as I start putting some constraints on the query, the database optimizer seems to kick in and things start moving right along. Also, limiting the results to just the first 1500 records or so is also helpful.
For example, if I use my RDF API to do simple selects (ie. give me all the cards on a given shelf), the API generates this query to execute against the view:
SELECT 1 as stype, subject, 1 as ptype, predicate, type, object, datatype, language, 1 as mtype, meta FROM rdf_card_inst_view WHERE (predicate='http://www.foo.com/bar#shelf' ) and (object='F:MEA10:DS01' ) AND ROWNUM < 1500
That query executes in 0.009 seconds. Not bad!
October 23rd, 2008 at 10:30 pm
[...] element), we can simply ignore the API and query that database directly. To do this, I simply created a RDF View over the data that I was interested in. This view is structured in such a way that the data appears as a generic [...]
November 13th, 2008 at 11:04 am
Very cool idea. This lets the analyst put together a more realistic navigation than some of the other methods I’ve looked at. Never thought to embed the URIs right in the data.
I’ll mull this around for a bit and get back to you, k?