Posts Tagged subject=linked_data

Large-scale RDF Graph Visualization Tools

Mike Bergman’s selection and review of 26 candidate graph visualization tools. Cytoscape comes in at Number 1.


Cytoscape Graph Visualization Tool

Cytoscape is described as an open source bioinformatics software platform for visualizing molecular interaction networks. However, because graph display is generic and Cytoscape’s import/export capabilities are flexible, it can easily be used in many other contexts that involve the visual presentation of a web of connected nodes.

For example, such a graph was easily created from nothing more than the following data, in which each line pairs a node with the parent node it links to:

Infrastructure    Future
Client    Future
Server    Future
Development    Future
Hardware    Infrastructure
Network    Infrastructure
Cloud    Infrastructure
Cluster    Infrastructure
Virtual    Infrastructure
Browser    Client
Data    Server
Code    Server
Methodology    Development
Tools    Development
Documentation    Development
Amazon    Cloud
Google    Cloud
EC2    Amazon
S3    Amazon
SimpleDB    Amazon
BigTable    Google
MapReduce    Cluster
Hadoop    MapReduce
RIA    Browser
REST    RIA
JavaScript    REST
JSON    JavaScript
AJAX    JavaScript
Jquery    JavaScript
XHTML    RIA
CSS    RIA
Files    Data
Metadata    Data
SRB    Files
SAMBA    Files
ZFS    Files
Google    Files
Flickr    Files
Other_Files    Files
LinkedData    Metadata
XML    LinkedData
MODS    XML
METS    XML
RDF_XML    XML
RDF_Model    LinkedData
Ontology    RDF_Model
RDF    Ontology
RDFS    Ontology
OWL    Ontology
DublinCore    Ontology
MODS    Ontology
FOAF    Ontology
DOAP    Ontology
SKOS    Ontology
OperatingSystem    Code
Java    Code
Database    Code
Linux    OperatingSystem
ApplicationServer    Java
Tomcat    ApplicationServer
JBOSS    ApplicationServer
QueryLanguage    Database
SQL    QueryLanguage
SPARQL    QueryLanguage
Solr    QueryLanguage
RelationalDatabase    Database
Triplestore    Database
Solr    Database
Agile    Methodology
Eclipse    Tools
Subversion    Tools
Wiki    Documentation
Blog    Documentation
KB    Documentation
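Cytoscape can read simple edge lists like this one. A minimal sketch, assuming the list above is saved as whitespace-separated `child parent` pairs, converts it into Cytoscape's Simple Interaction Format (SIF), where each line reads `source<TAB>interaction<TAB>target`. The interaction label `part_of` and the function names are hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical interaction label; SIF needs one between source and target.
INTERACTION = "part_of"


def parse_edges(text):
    """Parse whitespace-separated 'child parent' lines into (child, parent) tuples."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:  # skip blank or malformed lines
            edges.append((fields[0], fields[1]))
    return edges


def to_sif(edges, interaction=INTERACTION):
    """Render edges as SIF lines: source<TAB>interaction<TAB>target."""
    return "\n".join(f"{src}\t{interaction}\t{dst}" for src, dst in edges)


def multi_parent_nodes(edges):
    """Nodes that are listed under more than one parent."""
    counts = Counter(child for child, _ in edges)
    return sorted(node for node, n in counts.items() if n > 1)


sample = """Infrastructure    Future
Cloud    Infrastructure
Google    Cloud
Google    Files"""
sample_edges = parse_edges(sample)
print(to_sif(sample_edges))
print(multi_parent_nodes(sample_edges))  # ['Google']
```

Because SIF identifies nodes by name, entries such as Google, Solr and MODS that appear under two parents in the list become a single node with two outgoing edges; `multi_parent_nodes` just lists them so they can be checked.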


Clustered Triplestore Implementations Can Scale Well

From an interview with Chris Bizer, developer of DBpedia:

Many companies are starting to build their own “corporate semantic web”, and one of the first questions regarding the technical architecture is which triple store should be chosen. Can you recommend a method to pick the right one?

The performance of triple stores was a bottleneck a while ago, but things have improved a lot over the last two years. There are cluster editions of several triple stores now, and when deployed on a proper server farm or cloud infrastructure, the stores scale very well. An indicator that might be helpful for choosing a store could be the results of the Berlin SPARQL Benchmark, which compares the query performance of various triple stores and SPARQL-to-SQL rewriters.
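A minimal sketch of the kind of measurement such a benchmark automates: run each query several times against each store and report the median response time. `run_query` is any callable you supply; the `sparql_get` helper assumes an endpoint that implements the SPARQL Protocol's HTTP GET binding with a `query` parameter, and the endpoint URL is hypothetical. This is not the Berlin SPARQL Benchmark itself, which also generates its own data set and query mixes.

```python
import statistics
import time
import urllib.parse
import urllib.request

# Hypothetical endpoint URL; substitute the store under test.
ENDPOINT = "http://localhost:8080/sparql"


def sparql_get(query, endpoint=ENDPOINT):
    """Send a query using the SPARQL Protocol GET binding; return the raw body."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def median_latency(run_query, query, repeats=5, warmup=1):
    """Median wall-clock seconds for run_query(query), after warm-up runs."""
    for _ in range(warmup):
        run_query(query)  # warm caches so the first timed run is not skewed
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(query)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare_stores(stores, queries, repeats=5):
    """Map each store name to a list of median latencies, one per query."""
    return {
        name: [median_latency(run, q, repeats) for q in queries]
        for name, run in stores.items()
    }
```

To compare two stores, pass e.g. `functools.partial(sparql_get, endpoint=...)` for each one under its own name, together with the same list of queries.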


Talis Connected Commons


“The terms of the offer are as follows: if you own, or are creating, a public domain dataset then you can store that data in the Platform as RDF, for free. We’re setting an initial cap of 50 million triples on each dataset, but that should be plenty of space in which to collect some really interesting data. To qualify for the scheme, you need to be using either the Open Data Commons Public Domain Dedication and License or the recently launched Creative Commons CC0 license to publish your data. Anyone will then be able to freely access the stored data using the Platform services, without API keys and without usage limits. This means that your data will be wrapped in a ready-made API right from the start.

The Platform API covers basic data management facilities, through to a configurable search engine and a fully compliant SPARQL endpoint. And with data being delivered in a range of formats including RDF/XML and JSON, there should be something there for everyone to get their teeth into no matter what kind of application you’re building or environment you’re working in.”
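Because the data is served through a SPARQL endpoint without API keys, a client needs little more than an HTTP GET and a small parser. A minimal sketch, assuming the endpoint returns the W3C SPARQL Query Results JSON format (`head.vars` plus `results.bindings`); the store URL is a hypothetical placeholder, not a real Platform address, and the exact response format the Platform returns is an assumption here.

```python
import json
import urllib.parse

# Hypothetical store endpoint; not a real Talis Platform address.
ENDPOINT = "http://example.org/stores/mystore/services/sparql"


def build_query_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query into a GET URL via the 'query' parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def rows_from_json(body):
    """Flatten SPARQL JSON results into dicts of variable -> value.

    Unbound variables are absent from a binding, so they come back as None.
    """
    doc = json.loads(body)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        rows.append({v: binding[v]["value"] if v in binding else None for v in variables})
    return rows


sample = """{
  "head": {"vars": ["s", "label"]},
  "results": {"bindings": [
    {"s": {"type": "uri", "value": "http://example.org/a"},
     "label": {"type": "literal", "value": "A"}},
    {"s": {"type": "uri", "value": "http://example.org/b"}}
  ]}
}"""
print(rows_from_json(sample))
# [{'s': 'http://example.org/a', 'label': 'A'}, {'s': 'http://example.org/b', 'label': None}]
```

A live request would pass `build_query_url(query)` to `urllib.request.urlopen` and hand the response body to `rows_from_json`.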


“The structured Web is growing all around us like stalagmites in a cave!” – Michael K. Bergman

AI³

Adaptive Information
Adaptive Innovation
Adaptive Infrastructure

‘The Unreasonable Effectiveness of Data’


The next Web of open, linked data – Tim Berners-Lee (TEDTalks : 2009)


Tim Berners-Lee exhorts the audience to grow a garden of linked data:

[Image: Linked Data]

[Image: Linked Data]

Don’t hug your data, he says.

[Image: Linked Data]

[Image: Linked Data]

We need to open up the silos:

[Image: Linked Data]

… and then he made everyone chant:

[Image: Linked Data]


Triplestore Management

Here in the UCSD Libraries IT Department Development group, we have been working with RDF and various triplestore and triplestore-like implementations for several years now. More than a year ago we began devoting a good deal of attention to a particular triplestore product:

AllegroGraph

We chose to give it special attention for reasons that include, but are not limited to, the following:

  • Commercially supported by Franz, which is very active in the Semantic Web community
    Last year it was a fairly major sponsor of the Semantic Technology Conference
  • Free to us at the level we currently need
    Franz has even generously provided a significant amount of free technical support
  • Actively maintained and updated by Franz
  • Supports SPARQL
  • Seemed to have some of the best benchmark performance results
  • Has a Java-based API and is compatible with Jena and Sesame
  • Goes beyond simple subject/predicate/object triple-based support
    Implements statement IDs and named graph entries
  • Can bulk load from RDF/XML and N-Triples
  • Supports direct generation of JSON from SPARQL queries
  • Offers free-text indexing
  • Supports clustering and federation
  • Franz is also very much into artificial intelligence and reasoning,
    although those are beyond the scope of our current interest.
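To illustrate what going beyond plain subject/predicate/object support means, here is a toy in-memory quad store in which every statement carries a graph name and a store-assigned statement ID that other statements can refer to. This is a hypothetical sketch of the data model only; it is not AllegroGraph's API or storage design, and all names in it are illustrative.

```python
from collections import defaultdict
from itertools import count
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str


class QuadStore:
    """Toy store: each (s, p, o, g) statement gets a unique integer ID."""

    def __init__(self):
        self._ids = count(1)
        self._by_id = {}
        self._by_graph = defaultdict(set)

    def add(self, s, p, o, graph="default"):
        """Store a statement in a named graph and return its statement ID."""
        sid = next(self._ids)
        self._by_id[sid] = Quad(s, p, o, graph)
        self._by_graph[graph].add(sid)
        return sid

    def get(self, sid):
        """Look up a statement by its ID."""
        return self._by_id[sid]

    def graph(self, name):
        """All statements in one named graph, ordered by statement ID."""
        return [self._by_id[sid] for sid in sorted(self._by_graph[name])]


store = QuadStore()
sid = store.add("ex:item1", "dc:title", '"Example title"', graph="ex:catalog")
# A statement *about* the first statement, referring to it by its ID.
store.add(f"stmt:{sid}", "ex:assertedBy", "ex:cataloger", graph="ex:provenance")
print(store.graph("ex:provenance"))
```

Statement IDs make provenance-style assertions ("who added this triple?") cheap, while named graphs let a whole batch of statements be queried or dropped together.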

Despite all the attractive features listed above, we had trouble with AllegroGraph when it came to managing various combinations of concurrent usage, including unregulated attempts to perform:

  • Reads
  • Writes
  • Re-indexing

especially if those involved multiple simultaneous users.
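One general way to regulate this kind of concurrent access is a readers-writer lock: many reads may proceed together, while a write or a re-index waits for exclusive access. A minimal sketch in Python, assuming all access goes through a single application layer; the post does not say UCSD did this (their approach was to separate instances), so treat it as an illustration of the concept only.

```python
import threading
from contextlib import contextmanager


class ReadWriteLock:
    """Many concurrent readers, or one exclusive writer (writers get priority)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer_active = False
        self._writers_waiting = 0

    @contextmanager
    def read(self):
        with self._cond:
            # New readers wait while a writer is active or queued.
            while self._writer_active or self._writers_waiting:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

    @contextmanager
    def write(self):
        """Exclusive access, intended for both writes and re-indexing."""
        with self._cond:
            self._writers_waiting += 1
            while self._writer_active or self._readers:
                self._cond.wait()
            self._writers_waiting -= 1
            self._writer_active = True
        try:
            yield
        finally:
            with self._cond:
                self._writer_active = False
                self._cond.notify_all()
```

Queries would run inside `with lock.read():` and loads or re-indexing inside `with lock.write():`. Giving waiting writers priority keeps a steady stream of reads from blocking updates indefinitely, at the cost of delaying reads during long re-indexing runs.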

Consequently, we embarked on a fairly serious attempt to analyze the performance capabilities of AllegroGraph, and this in turn led us to study the performance of other triplestore implementations, including Oracle, Sesame and Mulgara.

This analysis was challenging, though, because a series of product updates meant we were working with a moving target. For instance, when Franz moved from the AllegroGraph 2 series to the AllegroGraph 3 series, one of our important tests showed query response time drop by more than two orders of magnitude.
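For readers who want the arithmetic: a reduction of more than two orders of magnitude means the new response time is under one hundredth of the old one, i.e. log10(old / new) > 2. A tiny helper, with purely illustrative timings (the post does not report the actual figures):

```python
import math


def orders_of_magnitude(old_seconds, new_seconds):
    """How many powers of ten the response time dropped by: log10(old / new)."""
    if old_seconds <= 0 or new_seconds <= 0:
        raise ValueError("timings must be positive")
    return math.log10(old_seconds / new_seconds)


# Illustrative numbers only, not measurements from the post.
print(round(orders_of_magnitude(30.0, 0.2), 2))  # 2.18, i.e. a 150x speedup
```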

I should note that other tasks and priorities demanded our attention and distracted us from the investigation. As a result, we have not yet managed either to complete our testing or to migrate to the latest AllegroGraph (version 3.2 as of this writing).

In short, we continue to have problems managing reading, writing and indexing, and have gone to some complicated lengths to separate these activities. In particular, we have dedicated an independent AllegroGraph server and instance to write operations, and scheduled tasks synchronize this writable instance to a read-only instance on a daily basis. This separation is intended to protect the read-only instance from write and indexing operations, which can impair its performance.
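The post does not say how the daily synchronization is done. One plausible approach, sketched here under that assumption, is to diff the writable store's statements against the read-only copy and apply only the changes rather than reloading everything. Triples are modeled as plain tuples, and the function names and sample data are hypothetical.

```python
def diff_triples(writable, read_only):
    """Return (to_add, to_remove) that make read_only match writable."""
    writable, read_only = set(writable), set(read_only)
    return writable - read_only, read_only - writable


def apply_diff(read_only, to_add, to_remove):
    """Apply a diff to the read-only copy, returning the new set of triples."""
    return (set(read_only) - to_remove) | to_add


# Hypothetical snapshots of the two instances.
writable = {
    ("ex:obj1", "dc:title", "Title A"),
    ("ex:obj2", "dc:title", "Title B (revised)"),
}
read_only = {
    ("ex:obj1", "dc:title", "Title A"),
    ("ex:obj2", "dc:title", "Title B"),
}
to_add, to_remove = diff_triples(writable, read_only)
print(sorted(to_add), sorted(to_remove))
```

Applying only the diff presumably keeps the nightly write and re-index work on the read-only instance proportional to the day's changes. Comparing sets this way assumes the statements contain no blank nodes, whose labels are not stable across stores.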

Further, because we still have concerns about triplestore (AllegroGraph) performance in our Production environment, we also synchronize particular triplestore data to a Solr instance, which we actually use as our dominant live Production query source. This leaves the read-only triplestore reserved for more specialized SPARQL queries.
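Synchronizing triplestore data to Solr typically means flattening triples into one document per subject. A minimal sketch, assuming a hypothetical field-naming scheme in which each predicate becomes a multi-valued field (`dc:title` becomes `dc_title`); the field names would have to exist in, or be matched by dynamic fields in, the Solr schema. Recent Solr versions accept a JSON array of such documents at their update handler, while older releases expected XML updates.

```python
import json
from collections import defaultdict


def field_name(predicate):
    """Hypothetical mapping from a prefixed predicate to a Solr field name."""
    return predicate.replace(":", "_")


def triples_to_solr_docs(triples):
    """Group (subject, predicate, object) triples into one document per subject."""
    docs = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        docs[s][field_name(p)].append(o)  # multi-valued: keep every object
    return [{"id": subject, **fields} for subject, fields in sorted(docs.items())]


triples = [
    ("ex:obj1", "dc:title", "Title A"),
    ("ex:obj1", "dc:subject", "Maps"),
    ("ex:obj1", "dc:subject", "California"),
    ("ex:obj2", "dc:title", "Title B"),
]
print(json.dumps(triples_to_solr_docs(triples), indent=2))
```

One document per subject suits Solr's flat, field-oriented index; relationships that need graph traversal are exactly the queries left to the read-only triplestore.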

We have been in frequent contact with Franz about AllegroGraph’s ability to handle concurrent activity. Their tech support staff have been great in trying to help us, but have also acknowledged some of AllegroGraph’s limitations in this area. They have told us that they would be working on some of the problems we have observed, and, in fairness, we have not been able to keep up with properly testing their latest releases.

That is a quick summary of our situation. I’d be very interested to hear how others might be dealing with similar issues.

Thanks.

Chris Frymann
Digital Library Architect
UC San Diego Libraries

The above is a variant of a message originally sent to Ben Osteen.


What is Web 3.0?

This article appears to have been written in mid-2006, and as such seems especially forward-thinking:

http://java.sys-con.com/node/236036

“The defining aspects of the Web 3.0 social experience may [include]:

* Two, that there are no pages. Information comes in packets of discrete units. You merge or cross them, as you need to.

* Three, that there are no Web sites. Existing Web sites are no longer meant for human eyes. They act as indexes to the information, which is accessible via XML request. Exceptions to this will not be Web sites, but independent little islands of commerce or games.”


Metadata Extraction Tools

Calais

Calais Viewer demo

Zemanta

Suggests tags, links, photos, and related articles

GATE

Natural Language Processing (NLP)

libSVM

Support Vector Machine


Radial Graph of FOAF Profiles from the code4lib Conference

From Declan Fleming:

http://ratherinsane.com/~chris/c4l09/index2.php

Really neat implementation and demo of how to mess with RDF data.

It looks like it uses the JavaScript InfoVis Toolkit: http://blog.thejit.org/javascript-information-visualization-toolkit-jit/

It was put together by Christopher Beer, http://twitter.com/_cb_
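The demo's code is not described here, but as a sketch of how RDF could feed a radial graph: the JavaScript InfoVis Toolkit's RGraph is typically loaded with a JSON tree of nodes carrying `id`, `name`, `data` and `children` fields. Assuming that shape, and assuming the `foaf:knows` links have already been extracted as (person, person) pairs, a breadth-first walk from one person produces such a tree. The pairs, names and depth limit are hypothetical.

```python
import json
from collections import defaultdict, deque


def radial_tree(knows_pairs, root, max_depth=2):
    """Breadth-first tree of foaf:knows links centred on root.

    Each person appears once, at their shortest distance from root,
    so the cyclic social graph becomes a tree a radial layout can draw.
    """
    neighbours = defaultdict(set)
    for a, b in knows_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)  # treat knows as symmetric for display

    def node(person):
        return {"id": person, "name": person, "data": {}, "children": []}

    tree = node(root)
    seen = {root}
    queue = deque([(tree, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # stop expanding at the depth limit
        for other in sorted(neighbours[current["id"]]):
            if other not in seen:
                seen.add(other)
                child = node(other)
                current["children"].append(child)
                queue.append((child, depth + 1))
    return tree


pairs = [("alice", "bob"), ("alice", "carol"), ("bob", "dave"), ("carol", "dave")]
print(json.dumps(radial_tree(pairs, "alice"), indent=1))
```

Breadth-first order places each person on the innermost ring where they can appear, which is what a radial layout centred on one person is meant to show.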
