Flex-able SPARQL

August 19th, 2008

One of the projects I’m working on is a Flex-based UI that connects to some web services. These web services query an RDF store and return some really lame XML. The services are all built with Apache Axis, which is fine kit…but I hate web services. Hate Them. I’d rather slide down a rusty razor blade into a pool of alcohol than use them.

Yesterday I was productive enough to code up a SPARQL endpoint for my RDF store. Now, my SPARQL implementation is not complete and kinda naive, but it works. I’m lacking some of the less-used stuff like CONSTRUCT and UNION, but I’ve added some SPARUL stuff like INSERT DATA and DELETE DATA. In short, it’s enough for my purposes.

So since I’ve got a more-or-less functional SPARQL endpoint, I decided I’d port my project to use SPARQL queries only and ditch the web services altogether. So long, WS-ohferfucksakes!

Did I mention I’m using Flex* for the front end? Right. So I needed to figure out if I could actually build a UI that would send a query and get a result. After some tweaking of the endpoint (can’t seem to figure out how to override Accept: */* in mx:HTTPService), I got something up and running! It’s only able to parse application/sparql-results+xml right now. Tomorrow it’ll get to know application/rdf+xml so it can handle DESCRIBE without crapping all over itself.
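For anyone curious what parsing application/sparql-results+xml involves, here is a minimal sketch. It is in Java rather than ActionScript, it is not the code from my Flex client, and the class name SparqlResultsXml is made up for illustration. It assumes the standard W3C results namespace and flattens every binding (URI, literal or blank node) to its text value, which throws away datatype and language information.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: turns a SPARQL XML result set into rows of variable -> value.
class SparqlResultsXml {

    // Namespace defined by the W3C SPARQL Query Results XML Format.
    static final String NS = "http://www.w3.org/2005/sparql-results#";

    static List<Map<String, String>> parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the *NS lookups below
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            List<Map<String, String>> rows = new ArrayList<>();
            NodeList results = doc.getElementsByTagNameNS(NS, "result");
            for (int i = 0; i < results.getLength(); i++) {
                Element result = (Element) results.item(i);
                Map<String, String> row = new LinkedHashMap<>();
                NodeList bindings = result.getElementsByTagNameNS(NS, "binding");
                for (int j = 0; j < bindings.getLength(); j++) {
                    Element binding = (Element) bindings.item(j);
                    // The child is <uri>, <literal> or <bnode>; keep only its text.
                    row.put(binding.getAttribute("name"), binding.getTextContent().trim());
                }
                rows.add(row);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable SPARQL results document", e);
        }
    }
}
```

Each row is a map keyed by variable name, so a grid component can bind straight to it. Unbound variables simply don't show up in a row's map.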

Anyway, here’s some SPARQL-Flex porn showing a query against DBpedia:

sparql-flex

* Yes, I know the Linked Data / Data Web folks HATE Flash/Flex (and JavaScript, apparently) since there are no URIs to grab. But hey, just ’cause you want it doesn’t mean I have to give it to you. :P

Speaking with a colleague yesterday, he mentioned that he needs to create a LOD-based proof of concept in which two COTS products and a generic triplestore are linked together to provide a unified view of various enterprise-wide entities. The wrinkle in the problem is that we can’t build RDF Views to expose the data in one of the systems. It’s a real-time system that queries network elements on demand, and therefore we need to use its API in order to get that data.

Hum. So how can we (easily) make a proprietary API expose itself in a LOD friendly way?

My initial thinking is that we simply wrap objects returned from the API with a facade class. The facade will be pretty dumb and will just delegate calls to the backing object. However, the methods of our facade will carry a number of annotations. For example, @Subject might indicate that this method returns the URI of the thing the object represents. @Predicate [some uri fragment] indicates that this method’s return value will be a property of the object, using [some uri fragment] as its name.

public class NodeFacade extends Node {
    // @Subject and @Predicate are custom annotations (definitions not shown)
    @Subject
    public String getUUID() {
        return super.getUUID();
    }

    @Predicate("#status")
    public String getStatus() {
        return super.getStatus();
    }

    @Predicate("#vendor")
    public String getVendor() {
        return super.getVendor();
    }

    // etc...
}

By introspecting the class and noting the annotations, we should be able to generate a set of triples for this object. We also note the server’s name and context path and include that as necessary.

<http://www.foo.com/node#7840-0932> <http://www.foo.com/node#vendor> "Cisco" .
<http://www.foo.com/node#7840-0932> <http://www.foo.com/node#status> "DOWN" .
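Here is a rough sketch of how the introspection step might look. Everything in it is an assumption for illustration: the @Subject and @Predicate annotation definitions, the stand-in Node class (the real one would come from the vendor API), and the toTriples helper, which takes a base URI in place of the server name and context path. It does no escaping of literal values, so treat it as a starting point rather than a finished serializer.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class FacadeTriples {

    // Hypothetical annotations. RUNTIME retention is what lets reflection see them.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Subject {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Predicate {
        String value(); // URI fragment, e.g. "#status"
    }

    // Stand-in for an object returned by the proprietary API.
    static class Node {
        public String getUUID()   { return "7840-0932"; }
        public String getStatus() { return "DOWN"; }
        public String getVendor() { return "Cisco"; }
    }

    static class NodeFacade extends Node {
        @Subject              @Override public String getUUID()   { return super.getUUID(); }
        @Predicate("#status") @Override public String getStatus() { return super.getStatus(); }
        @Predicate("#vendor") @Override public String getVendor() { return super.getVendor(); }
    }

    /** Builds one N-Triples line per @Predicate method of the facade. */
    static List<String> toTriples(Object facade, String baseUri) {
        String subject = null;
        List<String[]> properties = new ArrayList<>();
        try {
            for (Method m : facade.getClass().getDeclaredMethods()) {
                if (m.isAnnotationPresent(Subject.class)) {
                    subject = baseUri + "#" + m.invoke(facade);
                } else if (m.isAnnotationPresent(Predicate.class)) {
                    String predicate = baseUri + m.getAnnotation(Predicate.class).value();
                    properties.add(new String[] { predicate, String.valueOf(m.invoke(facade)) });
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not read facade", e);
        }
        if (subject == null) {
            throw new IllegalArgumentException("no @Subject method found");
        }
        List<String> triples = new ArrayList<>();
        for (String[] p : properties) {
            // Literal values are not escaped here; a real serializer must do that.
            triples.add("<" + subject + "> <" + p[0] + "> \"" + p[1] + "\" .");
        }
        Collections.sort(triples); // reflection order is unspecified, so sort for stable output
        return triples;
    }

    public static void main(String[] args) {
        for (String triple : toTriples(new NodeFacade(), "http://www.foo.com/node")) {
            System.out.println(triple);
        }
    }
}
```

The annotations only work because of the RUNTIME retention policy. With Java's default (CLASS) retention, isAnnotationPresent would quietly return false and no triples would come out.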

Anyway, those are my initial thoughts on how to do this. What have you or others done wrt this problem? I’d love to hear any feedback you have.

For my enterprise syndication work, I need to grab data from a lot of different data sources and put them (or link to them) in a triple store. That’s fine if the source data is in RDF, but it gets more complicated if it’s not. Probably the biggest “not” is data that is locked away in a relational database. Since there is so much information in these things, I can’t ignore them. I’ve got to be able to get at it. But I want to do it in a generic way, without having to change any table structures.

The best way I’ve found so far (which I learned about from Ashok Malhotra and Jim Melton of Oracle at the Linked Data Planet conference a few weeks ago) is to create a triple-like view of the data that I’m interested in. For example, I have a table that has 600k rows of information about network interface cards on a big network. The data looks something like this:


ID, Name, Type, Shelf
1, A01, MD5, S:MD5-1:LJU
2, A01, MD5, S:MD5-1:LJU
...

Now to convert this relational data into triples, I simply define a new view that creates the familiar subject-predicate-object pattern:


create or replace view rdf_card_inst_view as

select
'http://www.foo.com/bar#card' || trim(to_char(temp_id)) as subject,
'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' as predicate,
'http://www.foo.com/bar#Card' as object,
NULL as meta,
'1' as type,
NULL as datatype,
NULL as language
from temp_card_inst

union

select
'http://www.foo.com/bar#card' || trim(to_char(temp_id)) as subject,
'http://www.foo.com/bar#type' as predicate,
card_type as object,
NULL as meta,
'3' as type,
'http://www.w3.org/2001/XMLSchema#string' as datatype,
NULL as language
from temp_card_inst

union

select
'http://www.foo.com/bar#card' || trim(to_char(temp_id)) as subject,
'http://www.foo.com/bar#name' as predicate,
card_name as object,
NULL as meta,
'3' as type,
'http://www.w3.org/2001/XMLSchema#string' as datatype,
NULL as language
from temp_card_inst

union

select
'http://www.foo.com/bar#card' || trim(to_char(temp_id)) as subject,
'http://www.foo.com/bar#shelf' as predicate,
equip_name as object,
NULL as meta,
'3' as type,
'http://www.w3.org/2001/XMLSchema#string' as datatype,
NULL as language
from temp_card_inst;

This gives me a nice triplified view which I can then query with RDF selects or SPARQL queries. Some of the extra columns (meta, type, language, datatype, etc.) are stuff I use in my RDF API, but you may or may not need them.


SUBJECT PREDICATE OBJECT META TYPE DATATYPE LANGUAGE
--------------------------------- ----------------------------------------------- --------------------------- ------ ---- -------- --------
http://www.foo.com/bar#card1 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.foo.com/bar#Card (null) 1 (null) (null)
http://www.foo.com/bar#card10 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.foo.com/bar#Card (null) 1 (null) (null)
http://www.foo.com/bar#card100 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.foo.com/bar#Card (null) 1 (null) (null)
http://www.foo.com/bar#card1000 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.foo.com/bar#Card (null) 1 (null) (null)
http://www.foo.com/bar#card10000 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.foo.com/bar#Card (null) 1 (null) (null)
http://www.foo.com/bar#card100000 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.foo.com/bar#Card (null) 1 (null) (null)
http://www.foo.com/bar#card100001 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.foo.com/bar#Card (null) 1 (null) (null)
http://www.foo.com/bar#card100002 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.foo.com/bar#Card (null) 1 (null) (null)
http://www.foo.com/bar#card100003 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.foo.com/bar#Card (null) 1 (null) (null)
http://www.foo.com/bar#card100004 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.foo.com/bar#Card (null) 1 (null) (null)

If I want to pull in other data (such as the shelves that the cards are plugged in to), I can create a similar view for that data and then create a new view that is the union of the two.

create or replace view card_and_shelf_view as
select * from rdf_card_inst_view
union
select * from rdf_shelf_inst_view;

Pretty simple.

Now, you might be saying to yourself, “Shit, this is going to take forever with those views!” And you’d be right in some cases. Doing a ‘select *’ kind of query takes about 4 seconds on my box with 2.4 million rows in the view (AMD Phenom Triple Core @ 2GHz with 3GB RAM, Oracle 10g, Ubuntu 7.10). But as soon as I start putting some constraints on the query, the database optimizer seems to kick in and things start moving right along. Limiting the results to just the first 1500 records or so is also helpful. :)

For example, if I use my RDF API to do simple selects (i.e., give me all the cards on a given shelf), the API generates this query to execute against the view:


SELECT 1 as stype, subject, 1 as ptype, predicate, type, object, datatype, language, 1 as mtype, meta FROM rdf_card_inst_view WHERE (predicate='http://www.foo.com/bar#shelf' ) and (object='F:MEA10:DS01' ) AND ROWNUM < 1500

That query executes in 0.009 seconds. Not bad!
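To give a feel for how a triple pattern can turn into a query like that, here is a simplified sketch. It is not my actual RDF API: the TriplePatternQuery class and its forPattern method are invented for illustration, it leaves out the stype/ptype/mtype columns, and it emits bind placeholders rather than inline literals. A null argument means "unbound", i.e. match anything in that position. The view name is concatenated directly into the SQL, so it is assumed to come from trusted code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: maps a (subject, predicate, object) pattern onto the triple view.
class TriplePatternQuery {

    final String sql;          // SQL text with ? placeholders
    final List<String> params; // values to bind, in placeholder order

    private TriplePatternQuery(String sql, List<String> params) {
        this.sql = sql;
        this.params = params;
    }

    static TriplePatternQuery forPattern(String view, String subject, String predicate,
                                         String object, int maxRows) {
        List<String> conditions = new ArrayList<>();
        List<String> params = new ArrayList<>();

        // Only bound positions become WHERE conditions.
        addCondition("subject", subject, conditions, params);
        addCondition("predicate", predicate, conditions, params);
        addCondition("object", object, conditions, params);

        // Oracle-style row limit, as in the generated query above.
        conditions.add("ROWNUM < " + maxRows);

        String sql = "SELECT subject, predicate, object, type, datatype, language, meta"
                + " FROM " + view
                + " WHERE " + String.join(" AND ", conditions);
        return new TriplePatternQuery(sql, params);
    }

    private static void addCondition(String column, String value,
                                     List<String> conditions, List<String> params) {
        if (value != null) {
            conditions.add(column + " = ?");
            params.add(value);
        }
    }

    public static void main(String[] args) {
        TriplePatternQuery q = forPattern("rdf_card_inst_view", null,
                "http://www.foo.com/bar#shelf", "F:MEA10:DS01", 1500);
        System.out.println(q.sql);
        System.out.println(q.params);
    }
}
```

The sql and params would then be handed to a JDBC PreparedStatement. Using placeholders keeps quoting problems out of the picture and lets the database reuse the parsed statement across lookups.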

Linked Data Planet Summary

June 21st, 2008

Linked Data Planet ended on Wednesday, and I’m just now getting around to writing it up. So, here goes:

In all, I thought the conference was a big success. Lots of heavy hitters in the LOD and Semantic Web movement were there, including Tim Berners-Lee (W3C), Kingsley Idehen (OpenLink), Ian Davis (Talis), Andy Seaborne (HP Labs) and Richard Cyganiak (DERI). The talks were great and it really got me pumped up to see all these really smart people promoting the technologies.

I didn’t win Bob DuCharme’s contest for best submission, but hey, I got second and my question was put up for discussion. I got about the response that I expected: “it depends”. :)

I wanted to introduce myself a little more to some of the other attendees, but being a reclusive hacker-type, I failed miserably. I especially wanted to chat with Kingsley and at least meet TimBL, but never seemed to get around to it.

I did however have a nice conversation with Andy Seaborne. I wanted to know about how he was converting SPARQL to SQL in Jena’s SDB. He very nicely pulled me aside and gave me the rundown and showed me how I could view the SQL that SDB produces using sdbprint. Wow. His SQL is miles better than mine and he’s a really, really nice guy. Thanks Andy!

I think my favorite sessions were:

  • Applying Semantic Web Technologies to Enterprise Solutions by Dean Allemang and Irene Polikoff of TopQuadrant
  • Integrating Relational Data Into the Semantic Web by Dr. Ashok Malhotra and Jim Melton of Oracle
  • How to Publish Linked Data on the Web by Tom Heath of Talis

The panels were also great. The Challenges and Opportunities panel had a bit of controversy, I think, after Geoff Brown of m2mi declared the semantic web was dead. Gasp! Hushed silence. “Did he just say what I thought he said?” He claimed no one would do a $50m deal in the semantic space. I guess if that’s your measurement of success, then maybe he’s right. I guess “fuck-you” money is in the tens of millions for him. For most of us though, I think it’s probably in the low millions. Time will tell I suppose.

I brought my boss along to the show and he’s totally guzzling the LOD/SemWeb Kool-Aid now. Yea! So, we already have lots of product ideas that seem to be perfect fits for this technology. So I hope that means it’ll be triples for me through the rest of the year at least. :)

Anyway, great conference. I’m going to get some stuff together and see if I can’t present at the Fall conference in California. Perhaps I’ll LOD-ify TerminalSquad and finally get some community involvement with that. I’ve got lots of data…it’s just tucked away in my own little silo. High time to let it out!

Well, I’ve finally started working hard on the enterprise syndication product. Things are moving right along!

Since this application is based on lots of semantic web technologies, I first need some way to store my RDF and query it. There are a few open source and commercial applications out there to do this. But, having a terminal case of Not Invented Here Syndrome, I decided to roll my own. I had some old code from an earlier (abandoned) project that I used as a starting point. But I totally re-implemented the storage engine to support MySQL and created a first stab at a decent SPARQL query engine.

I also quickly coded up a little RSS crawler to go out and crawl about 40 RSS feeds. Right now it does this on demand, but I’ll soon change it to poll every X minutes. I’m using Danny Ayers’ and Henry Story’s AtomOWL schema to represent the RSS data in the triple store. Not sure I’ll stick with that, but it works for the time being.

With the triple store, a crawler and a query engine in place, I can finally start to do some of the higher level work like notification, user interfaces, workflow, etc. The next two weeks should see major progress made. Hopefully I’ll have something to show to people at LinkedDataPlanet and get some feedback.

Just for fun, here’s a screenshot of the SPARQL Shell that I’ve written to run test queries against it. Probably not part of the product, but a must-have for development. Here I’m just querying my triple store for articles from CrunchGear. Neato!

SPARQL Shell

Until next time, stay classy inter-tubes.

My Semantic Web Obsession

May 19th, 2008

For several years now, I’ve been kind of obsessed with the work being done to fulfill TimBL’s vision of a Semantic Web. Now, I don’t know that I’m ready to drink the whole Big Giant Graph Kool-Aid yet, but I totally buy into the technology. RDF(S) and OWL are really wonderful things.

A few years back, I was working on a Semantic Web In a Box called ontogon. I made some pretty decent progress and had a usable little tool to store and relate pretty much anything. But I got busy traveling and working so much that I kind of shelved it, let the domain expire, etc.

Some recent discussions with friends have re-invigorated my interest in this area. Specifically in relation to how applications based on semantic web technologies can bring value to the enterprise. In the “typical” enterprise setting, there is a shit-load (technical term) of data being produced by people and systems. Some research has led me to the belief that 1) finding that data is hard, 2) making sure it’s relevant is harder, and 3) linking it together is even harder still. This is just the kind of area that semantics is really good at tackling.

So my next project is going to be a solution to this problem. Quite a step up from copy2web in terms of complexity, features and price point, no? I’ll post more frequently on this as I make progress.

copy2web 1.3

May 19th, 2008

I bumped into Ian Landsman of HelpSpot fame on Twitter the other day. I offered a free copy2web license, and he was kind enough to provide some feedback. Based on his observations, I’ve got a new feature list for release 1.3.

  • A Delete button to remove clips.
  • An OS X style “quick-look” feature instead of the tool-tip preview.
  • Simplified registration.
  • Letting you give a name to any of your non-file clips.

Thanks Ian!

I’m still toying with the idea of selling the server-side component as well. But I’m really busy with work and other projects right now, so that might be put off until 1.4.

copy2web v1.2 RELEASED

April 25th, 2008

Well, it’s a bit late (at least I’m consistent!), but copy2web 1.2 is now available! New features include:

v1.2

  • Files. You can now store a file as a clip (up to 2MB).
  • Drag and Drop. Drag a file, image or text selection into the copy2web window. Note that what actually gets copied is browser dependent.
  • Drag text and image clips out of copy2web and into another application.
  • Drag files to the Finder, Explorer or desktop to save them to the local filesystem.
  • Compression. Clips are now GZIP compressed for faster upload/download.
  • Windows Task Tray. On Windows, minimize copy2web to the task tray.
  • Progress indicator. For big clips, an infinite progress indicator will appear to mesmerize you whilst you wait.
  • Support for Linux!

So, what’s next? I’m probably going to create a purely HTML-based version for those that love to live in their browser. It will probably be much simpler than the desktop version; no drag ‘n drop, for example.

Also I’ll probably add a facility to share your clips with your friends.

And, as always, I’m interested to hear what features you might like, so please drop me a line!

copy2web - 1.2 In The Oven

April 15th, 2008

After a few weeks doing “work” work, I’m back to making some updates to copy2web. The features I mentioned earlier have all been implemented; now I just have some testing to do.

If you’re a current user, you’ll get the update for free. Look for an email from me sometime this weekend!

I’m biting the bullet and providing a free 30-day trial of copy2web. It’s the full-meal-deal…no crippleware, no nag-screens. I’ll just cut you off after 30 days (or whenever I get around to writing a script to manage that).

Feel free to grab a copy and play with it. If you have any problems or suggestions, please let me know!