Archive for March, 2009
Jonathan Schwartz – Understanding Sun in Three Easy Steps
Understanding Sun in Three Easy Steps (1 of 4)
“… I’m neither worried about the role information technology will play in the economy, nor am I worried about the relevance of Sun’s offerings. I’m not worried about the future, I’m focused on its arrival date.”
1. Technology Adoption
2. Commercial Innovation
3. Efficiently Connecting 1. and 2.
“…general purpose microprocessors and operating systems are now fast enough to eliminate the need for special purpose devices. That means you can build a router out of a server – notice you cannot build a server out of a router, try as hard as you like. The same applies to storage devices.”
“At Sun, open source isn’t for servers. Open source is for datacenters.”
“The second, and arguably more important headwind was a decision made back in the 1990’s to cancel Solaris on Intel, in the belief it would protect Sun’s SPARC hardware business. Conversely, that mistake destroyed a generation of Solaris developers, and accelerated the rise of alternatives to traditional SPARC hardware. And now you understand why we prioritize developers – they are the seeds from which great forests grow. If you don’t water the roots, the trees wither.
But how do you make money giving software away to developers? Well, let’s switch gears, and talk about Software and Services.”
“When Free is Too Expensive
Numerically, most developers and technology users have more time than money. Most readers of this blog are happy to run unsupported software, and we are very happy to supply it. For a far smaller population, the price of downtime radically exceeds the price of a license or support – for some, the cost of downtime is measured in millions per minute. If you’re tracking packages or fleets of aircraft, running an emergency response network or a trading floor, you almost always have more money than time. And that’s our business model, we offer utterly exceptional service, support and enterprise technologies to those that have more money than time. It’s a good business.”
“…open source platforms generate, alongside the services attached to them, over a billion dollars a year, making Sun by far and away the world’s largest open source software company.”
Metadata Extraction Tools
Radial Graph of foafs from code4lib conf
Posted by admin in User_Interface on March 11th, 2009
From Declan Fleming:
http://ratherinsane.com/~chris/c4l09/index2.php
Really neat implementation and demo of how to mess with rdf data.
Looks like it uses: http://blog.thejit.org/javascript-information-visualization-toolkit-jit/
It was put together by Christopher Beer, http://twitter.com/_cb_
Schema-less Databases, Transactions and Eventual Consistency
Posted by admin in Infrastructure, Performance on March 11th, 2009
The value of schema-less databases seems to be a topic of emerging interest; see, for instance:
Is the Relational Database Doomed?
Discusses some of the potential of key/value databases as compared to RDBs. The immediate answer to the inflammatory title is, of course, no. However, it seems increasingly clear that one can find a lot of company in suggesting that there is significant value in and adoption of schema-less database approaches.
See also:
How FriendFeed uses MySQL to store schema-less data
There are many responses to the above post, so some reading is required, but it may be worth it. I found it interesting, though, that there were no comments on triplestores. I’m not quite ready to jump in on that yet; I’ll try to come back to it later and see what additional comments have been made.
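As a rough illustration of the schema-less idea, here is a minimal sketch that stores entities as opaque JSON blobs in a relational table and keeps a separate index table for the one property we want to query on. It is only loosely in the spirit of the FriendFeed post; the table names, the `user_id` property and the use of SQLite and JSON are assumptions made for this example, not details taken from that article.

```python
import json
import sqlite3
import uuid


def open_store():
    """Create an in-memory store: one blob table plus one index table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE index_user_id (user_id TEXT NOT NULL, entity_id TEXT NOT NULL)"
    )
    return conn


def put_entity(conn, entity):
    """Store an arbitrary dict; only 'user_id' (hypothetical) is indexed."""
    entity_id = entity.setdefault("id", uuid.uuid4().hex)
    conn.execute(
        "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
        (entity_id, json.dumps(entity)),
    )
    # Rebuild the index row so it always reflects the current blob.
    conn.execute("DELETE FROM index_user_id WHERE entity_id = ?", (entity_id,))
    if "user_id" in entity:
        conn.execute(
            "INSERT INTO index_user_id (user_id, entity_id) VALUES (?, ?)",
            (entity["user_id"], entity_id),
        )
    conn.commit()
    return entity_id


def find_by_user(conn, user_id):
    """Look entities up through the index table, then decode the blobs."""
    rows = conn.execute(
        "SELECT e.body FROM index_user_id i "
        "JOIN entities e ON e.id = i.entity_id "
        "WHERE i.user_id = ? ORDER BY e.id",
        (user_id,),
    ).fetchall()
    return [json.loads(body) for (body,) in rows]
```

New properties can be added to an entity without any schema change; only a property that needs to be queried requires a new index table.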
In any case, performance is still always an issue. Many approaches are taken to dealing with response time, and some form of replication is frequently involved, whether it be copying data into parallel systems, or storing it in alternate forms or formats that have different access versus update characteristics. This inevitably leads to a problem of maintaining consistency between the various manifestations of the data. The challenge of maintaining consistency across various forms of parallel systems is therefore a recurrent theme and one addressed in the following sources:
Eventually Consistent – Revisited
Discusses some of the problems managing reads and writes and keeping everything consistent.
Sesame 3.0 Preview: An Open Source Framework for RDF Data
From a recent DevX.com article. Mentions the concept of “eventual consistency” in the “Transactions” section.
In our case, end-user results and the process of achieving consistency depend on the order in which one updates:
- Files
- Triplestores
- Solr Indexes
The following references and quote are from an email exchange with Benjamin O’Steen [bosteen@gmail.com].
Writing to serialized data (files) and later updating Solr indices via JMS/AMQP (RabbitMQ) enables “indexes ‘eventually converging’ to the truth within seconds after the event (truth being whatever the data held on disc says is true.)”
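The write-then-notify pattern described in that quote can be sketched in a few lines. This is a minimal, single-process illustration under stated assumptions: Python's standard `queue.Queue` stands in for a JMS/AMQP broker such as RabbitMQ, a plain dict stands in for the Solr index, and the function names are hypothetical.

```python
import json
import queue
import tempfile
import threading
from pathlib import Path


def write_record(data_dir, events, record_id, record):
    """Write the file first (the source of truth), then announce the change."""
    path = Path(data_dir) / f"{record_id}.json"
    path.write_text(json.dumps(record), encoding="utf-8")
    events.put(record_id)


def index_worker(data_dir, events, index):
    """Consume change events and re-read each file from disk into the index."""
    while True:
        record_id = events.get()
        try:
            if record_id is None:  # sentinel: stop the worker
                return
            path = Path(data_dir) / f"{record_id}.json"
            index[record_id] = json.loads(path.read_text(encoding="utf-8"))
        finally:
            events.task_done()


def demo():
    events = queue.Queue()
    index = {}  # stand-in for a search index
    with tempfile.TemporaryDirectory() as data_dir:
        worker = threading.Thread(target=index_worker, args=(data_dir, events, index))
        worker.start()
        write_record(data_dir, events, "obj1", {"title": "draft"})
        write_record(data_dir, events, "obj1", {"title": "final"})
        events.join()     # block until every queued event has been processed
        events.put(None)  # then shut the worker down
        worker.join()
    return index
```

Because the worker always re-reads the file rather than trusting the message payload, the index converges on whatever is on disk once the queue drains, even if events are processed late.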
Changes to an RDF document can be queued as a Talis changeset and later committed.
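The queue-then-commit idea can be sketched as follows. This is a simplified model, assuming a changeset is just a set of triples to remove and a set to add for one subject; it does not use the actual Talis vocabulary or any Talis API, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeSet:
    """A pending change to one subject: triples to remove and triples to add."""
    subject: str
    removals: set = field(default_factory=set)
    additions: set = field(default_factory=set)


def commit(graph, pending):
    """Apply queued changesets in order and return the new set of triples.

    A changeset whose removals are not all present is rejected, since it was
    computed against a state of the document that no longer holds.
    """
    result = set(graph)
    for change in pending:
        missing = change.removals - result
        if missing:
            raise ValueError(f"stale changeset for {change.subject}: {missing}")
        result -= change.removals
        result |= change.additions
    return result
```

The input graph is left untouched, so a failed commit leaves the stored document in its previous state.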
Note: This post originally addressed read/write contention in triplestores and Solr, a topic posed by Declan Fleming.
On Mapping Between RDF and METS
From: Chris Frymann
Date: Wed, 25 Feb 2009 07:13:10 -0800
To: <dot.porter@gmail.com>
Subject: Re: Mapping from RDF to METS
Hi Dot,
Your question has been passed along to me so I will attempt a short answer.
First, you might try the Google search:
mapping from RDF to METS
which will return a number of useful references.
Second, and from a more abstract point of view, any RDF can be expressed in a number of different formats, including RDF/XML. Since METS, at one level, can act as a simple container or wrapper, any XML can be wrapped in a METS file, and therefore RDF/XML can be embedded in a METS container. Simply putting RDF/XML into METS may or may not be something one wants to do, but it makes the point that there is no inherent limitation on getting RDF into a METS container.
Third, and a more practical example, in my environment we have developed a small MODS ontology which enables us to express MODS in RDF (or RDF/XML in particular). We then use a locally tailored XSL transform to automatically convert our RDF/XML into METS with a MODS section. By way of a little additional explanation, we have gone to the effort of expressing our metadata in RDF as well as METS for much the same reasons expressed in the following text copied from:
Semantic Web technologies for digital preservation : the SPAR project
Note particularly, the sentence:
“All the relevant metadata being available in the METS files and in the reference information, we had to map them and index them.”
Although related to your question, technically this is quite a different issue, as it addresses the topic of mapping METS to RDF rather than RDF to METS.
Here is the larger relevant excerpt from the SPAR article that makes a case for expressing metadata in RDF.
“When designing a system for the long term, it is not possible to imagine all the queries that will be relevant in the future.
Complex queries involve data formats, periods of time, events that have occured to a series of digital objects, software or human agents involved in the processes, etc. The flexibility of the data management is thus a key point in the development of SPAR and we had to take this into account when designing the indexation functions for the data management module.
All the relevant metadata being available in the METS files and in the reference information, we had to map them and index them. Four options were possible :
a XML database,
a relational database,
a RDF triple store or
a search engine.
A risk analysis taking into account implementation issues, functionnal opportunities and persistence in the long term, revealed the RDF triple store as the best candidate for managing metadata in this context :
* the mapping from XML to RDF was considered more relevant and evolutive than the mapping from XML to a relational database,
* the querying and access functionalities were richer than those provided by a search engine, thanks to expressiveness of a standardised query language SPARQL,
* the scalability and robustness was expected to be better than with a XML database, taking into account the amount of expected metadata to be handled by the system (2 billions triples after 2 or 3 years).
Regarding the latter, a benchmark was realized with Virtuoso3 and 2 billions triples were generated with the the LUBM4 ; the results of this prototype were satisfying and confirmed our choice.”
Please feel free to post my response to your group, if you like.
Chris Frymann (cfrymann@ucsd.edu)
Digital Library Architect
UC San Diego
>> Dear List,
>>
>> I’m writing to see if anyone here has thoughts and/or experience
>> mapping from RDF to METS. METS is of course much more rich than RDF,
>> but is it possible to create even a skeletal METS record (containing
>> only a file section and structural map) from RDF triples? Many thanks
>> for any thoughts or advice.
>>
>> Dot
>>
>> –
>> Dot Porter (MA, MSLS) Metadata Manager Digital Humanities
>> Observatory (RIA), Pembroke House, 28-32 Upper Pembroke Street,
>> Dublin 2, Ireland
>> — A Project of the Royal Irish Academy –
>> Phone: +353 1 234 2444 Fax: +353 1 234 2400 http://dho.ie
>> Email: dot.porter@gmail.com
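To make the original question concrete, here is a minimal sketch of building a skeletal METS record (a file section and a structural map only) from a handful of triples. It is not the XSL transform mentioned in the reply above. The triples are plain Python tuples, and the `hasFile` predicate and the `example.org` URIs are hypothetical; only the METS and XLink namespaces and element names come from the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
HAS_FILE = "http://example.org/terms/hasFile"  # hypothetical predicate


def mets_from_triples(triples, object_uri):
    """Return a skeletal METS document (fileSec + structMap) as a string."""
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("xlink", XLINK_NS)

    # Collect the file URIs attached to the object, in a stable order.
    file_uris = sorted(
        o for s, p, o in triples if s == object_uri and p == HAS_FILE
    )

    mets = ET.Element(f"{{{METS_NS}}}mets")
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct_map, f"{{{METS_NS}}}div")

    for n, uri in enumerate(file_uris, start=1):
        file_id = f"FILE{n}"
        file_el = ET.SubElement(file_grp, f"{{{METS_NS}}}file", {"ID": file_id})
        ET.SubElement(
            file_el,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": uri},
        )
        # The structural map points back at each file by its ID.
        ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": file_id})

    return ET.tostring(mets, encoding="unicode")
```

A real mapping would also need to decide how nested structure in the RDF (parts, pages, sequences) becomes nested `div` elements; this sketch places every file under a single `div`.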
Semantic Web Technologies for Digital Preservation
Semantic Web technologies for digital preservation : the SPAR project
explains “…why RDF is relevant for digital preservation and how it will be implemented in SPAR”.
Offers the following case for why METS is not enough.
- “All the relevant metadata being available in the METS files and in the reference information, we had to map them and index them. Four options were possible :
a XML database,
a relational database,
a RDF triple store or
a search engine.
A risk analysis taking into account implementation issues, functionnal opportunities and persistence in the long term, revealed the RDF triple store as the best candidate for managing metadata in this context :
* the mapping from XML to RDF was considered more relevant and evolutive than the mapping from XML to a relational database,
* the querying and access functionalities were richer than those provided by a search engine, thanks to expressiveness of a standardised query language SPARQL,
* the scalability and robustness was expected to be better than with a XML database, taking into account the amount of expected metadata to be handled by the system (2 billions triples after 2 or 3 years).
Regarding the latter, a benchmark was realized with Virtuoso3 and 2 billions triples were generated with the the LUBM4 ; the results of this prototype were satisfying and confirmed our choice.”