UCSD Libraries Public Access System Architecture
Posted by admin in Infrastructure, Standard on March 30th, 2009

Triplestore Management
Posted by admin in Infrastructure, Performance on March 21st, 2009
Here in the UCSD Libraries IT Department Development group, we have been working with RDF and various triplestores, or triplestore-like implementations, for several years now. More than a year ago we began devoting a good deal of attention to a particular triplestore product: AllegroGraph, from Franz.
We chose to give it special attention for reasons that include, but are not limited to, the following:
- Commercially supported, and Franz is very active in the Semantic Web community
Last year Franz was a fairly major sponsor of the Semantic Technology Conference
- Free to us at the level we currently have need for it
Franz has even generously provided a significant amount of free technical support
- Actively maintained and updated by Franz
- Supports SPARQL
- Seemed to have some of the best benchmark performance results
- Has a Java-based API and is compatible with Jena and Sesame
- Goes beyond simple subject/predicate/object triple support
Implements statement IDs and named graphs
- Can bulk load from RDF/XML and N-Triples
- Supports direct generation of JSON from SPARQL queries
- Offers free-text indexing
- Supports clustering and federation
- Franz is also heavily invested in artificial intelligence and reasoning, although those areas are beyond the scope of our current interest.
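As an illustration of the JSON item above, the sketch below shows how a client might flatten SPARQL results into simple rows. It assumes the results follow the W3C SPARQL Query Results JSON format (a `head.vars` list plus `results.bindings`); the output of any particular store may differ, and the sample URIs, variable names and the `bindings_to_rows` helper are hypothetical.

```python
import json

# A small result document in the W3C SPARQL Query Results JSON format.
# Variable names, URIs and titles are made up for illustration.
RAW = """
{
  "head": {"vars": ["obj", "title"]},
  "results": {"bindings": [
    {"obj": {"type": "uri", "value": "http://example.org/obj/1"},
     "title": {"type": "literal", "value": "Map of San Diego"}},
    {"obj": {"type": "uri", "value": "http://example.org/obj/2"}}
  ]}
}
"""


def bindings_to_rows(doc):
    """Hypothetical helper: flatten SPARQL JSON bindings into plain dicts."""
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable is simply absent from a binding when it is unbound
        # (e.g. under OPTIONAL), so fill in None for it.
        rows.append({v: binding[v]["value"] if v in binding else None
                     for v in variables})
    return rows


rows = bindings_to_rows(json.loads(RAW))
for row in rows:
    print(row)
```

Keeping the per-term `type` information out of the rows is a simplification; a real client might keep it to tell URIs from literals.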
However, despite all the attractive features listed above, we had trouble with AllegroGraph when it came to managing various combinations of concurrent, unregulated operations, including:
- Reads
- Writes
- Re-indexing
These problems were especially pronounced when multiple users were active simultaneously.
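To make "unregulated" concrete: one common way to regulate such activity is a readers-writer gate, where any number of reads may proceed together but a write or re-indexing job runs only when nothing else is active. The Python sketch below is a hypothetical, in-process illustration of that idea around a plain set standing in for a store. The `ReadWriteGate` class and the sample data are invented for this example; it is not how AllegroGraph works internally, nor the approach we adopted.

```python
import threading
from contextlib import contextmanager


class ReadWriteGate:
    """Hypothetical gate: many concurrent readers OR one exclusive writer.

    Writes and re-indexing jobs both take the exclusive side. This simple
    version prefers readers, so a steady stream of reads can starve writers.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0        # number of reads currently in progress
        self._writing = False    # True while a write/re-index holds the gate

    @contextmanager
    def reading(self):
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()   # wake any waiting writer

    @contextmanager
    def writing(self):
        with self._cond:
            while self._writing or self._readers > 0:
                self._cond.wait()
            self._writing = True
        try:
            yield
        finally:
            with self._cond:
                self._writing = False
                self._cond.notify_all()       # wake waiting readers and writers


store = set()            # stand-in for a triplestore: a set of (s, p, o) tuples
gate = ReadWriteGate()
seen_sizes = []          # store size observed by each reader


def write_triples(prefix, n):
    for i in range(n):
        with gate.writing():
            store.add((f"{prefix}{i}", "dc:title", f"title {i}"))


def read_size():
    with gate.reading():
        seen_sizes.append(len(store))   # list.append is atomic in CPython


threads = [threading.Thread(target=write_triples, args=(p, 100)) for p in ("a", "b")]
threads += [threading.Thread(target=read_size) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(store), len(seen_sizes))  # 200 10
```

A reader-preferring gate is the simplest variant; under heavy read traffic a writer-preferring or queued design avoids starving writes.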
Consequently, we embarked on a fairly serious attempt to analyze the performance capabilities of AllegroGraph, and this in turn led us to study the performance of other triplestore implementations as well, including Oracle, Sesame and Mulgara.
This analysis was challenging, though, because the products were updated several times while we worked, which gave us a moving target. For instance, at one point Franz moved from the AllegroGraph 2 series to the AllegroGraph 3 series, and in one of our important tests we observed query response time drop by more than two orders of magnitude (a factor of over 100).
I should also note that other tasks and priorities demanded our attention and distracted us from the investigation. As a result, we have not yet completed our testing, nor have we migrated to the latest AllegroGraph (version 3.2 as of this writing).
In short, we continue to have problems managing concurrent reading, writing and indexing, and have gone to some complicated lengths to separate these activities. In particular, we have dedicated an independent AllegroGraph server and instance to write operations, and we run daily tasks that synchronize this writable instance to a read-only instance. The intent of this separation is to protect the read-only instance from write and indexing operations, which can impair its performance.
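The write/read separation described above can be modeled in a few lines. In this minimal sketch, the `TripleStoreInstance` class and `synchronize` function are hypothetical stand-ins, not AllegroGraph calls: writes land only on the writable instance, readers query the read-only one, and a periodic job replaces the read-only contents with a snapshot. The trade-off it makes visible is staleness, since readers see new data only after the next sync.

```python
class TripleStoreInstance:
    """Hypothetical stand-in for one triplestore server/instance."""

    def __init__(self, read_only=False):
        self.read_only = read_only
        self.triples = set()          # (subject, predicate, object) tuples

    def add(self, triple):
        if self.read_only:
            raise PermissionError("this instance only serves reads")
        self.triples.add(triple)

    def query(self, predicate):
        """Return all triples with the given predicate, sorted for stable output."""
        return sorted(t for t in self.triples if t[1] == predicate)


def synchronize(writable, read_only):
    """Replace the read-only contents with a snapshot of the writable instance.

    In the setup described above this runs once a day (e.g. from a scheduler).
    The copy is a fresh set, so later writes do not leak into the reader.
    """
    read_only.triples = set(writable.triples)


writer = TripleStoreInstance()
reader = TripleStoreInstance(read_only=True)

writer.add(("obj:1", "dc:title", "Map of San Diego"))
before_sync = reader.query("dc:title")    # readers have not seen the write yet
synchronize(writer, reader)               # the daily job
after_sync = reader.query("dc:title")
writer.add(("obj:2", "dc:title", "Campus photo"))

print(before_sync, after_sync, len(reader.query("dc:title")))
```

The second write stays invisible to readers until the next scheduled sync, which is the price paid for keeping write and indexing load off the read-only instance.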
Further, because we still have concerns about triplestore (AllegroGraph) performance in our Production environment, we also synchronize selected triplestore data to a Solr instance, which serves as our primary live Production query source. This leaves the read-only triplestore free for more specialized SPARQL queries.
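The Solr synchronization step amounts to flattening graph data into one flat document per resource. The sketch below assumes a simple mapping from predicate URIs to Solr field names; the helper `triples_to_solr_docs`, the field names and the sample data are hypothetical, and actually sending the documents to Solr (and matching them to its schema) is left out.

```python
import json
from collections import defaultdict

DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from predicate URIs to Solr field names.
FIELD_MAP = {DC + "title": "title", DC + "subject": "subject"}


def triples_to_solr_docs(triples, field_map):
    """Group (subject, predicate, object) triples into one document per subject.

    Predicates missing from field_map are skipped. Every field holds a list,
    i.e. it is treated as multi-valued.
    """
    docs = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        field = field_map.get(p)
        if field is not None:
            docs[s][field].append(o)
    # Use the subject URI as the document id; sort for deterministic output.
    return [{"id": s, **fields} for s, fields in sorted(docs.items())]


triples = [
    ("http://example.org/obj/1", DC + "title", "Map of San Diego"),
    ("http://example.org/obj/1", DC + "subject", "Maps"),
    ("http://example.org/obj/1", DC + "subject", "San Diego"),
    ("http://example.org/obj/2", DC + "title", "Campus photo"),
    ("http://example.org/obj/2", "http://example.org/internal/note", "not indexed"),
]

docs = triples_to_solr_docs(triples, FIELD_MAP)
print(json.dumps(docs, indent=2))
```

Choosing which predicates to map is where "selected triplestore data" comes in: anything outside the map stays available only through SPARQL on the read-only triplestore.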
We have been in frequent contact with Franz about AllegroGraph’s ability to handle concurrent activity. Their technical support staff have been great in trying to help us, but have also acknowledged some of AllegroGraph’s limitations in this area. They have told us that they are working to address some of the problems we have observed, and, in fairness, we have not been able to keep up with properly testing their latest releases.
That is a quick summary of our situation. I’d be very interested to hear how others might be dealing with similar issues.
Thanks.
Chris Frymann
Digital Library Architect
UC San Diego Libraries
The above is a variant of a message originally sent to Ben Osteen.