Archive for March, 2009

NSDL Releases EduPak: An Open Source Digital Library Services Platform for Education

NSDL EduPak 1.0 is a publicly available, lightweight version of NCore (http://NCore.nsdl.org), established in 2008 as an open-source digital library platform of technology and standards that create a dynamic information layer on top of library resources. Based on Fedora open source repository software (http://Fedora-Commons.org), NCore provides users, developers, information managers and decision-makers with systems for description, organization, interrelation and annotation of resources. Built using NCore components, EduPak is an all-in-one, open source, education digital repository solution bundle that provides a general platform for building digital libraries united by a common data model and interoperable applications.

Internet Archive and Sun Modular Datacenter

Part of an email message from Art.Pasquinelli@sun.com

The Internet Archive and the Sun Modular Datacenter Announcement – March 25
Sun and the Internet Archive rolled out a joint project at a March 25 event in Santa Clara. Sun has approximately 60 Sun Fire X4500s managing the Internet Archive’s content in a Sun Modular Datacenter. Dozens of articles were published worldwide (see below). We will have a presentation and consulting on the architecture at the June 24-26 Sun PASIG in Malta (Early Bird registration will be open in the next 1-2 days at www.sun-pasig.org).

The video tour and Sun case study can be seen at: http://www.sun.com/featured-articles/2009-0325/feature/index.jsp

KEY QUOTES

“‘Archive.org also houses the Wayback Machine, 1 million books, 100,000 movies and about 200,000 audio recordings,’ Kahle said. ‘It is a full-on library. This technology we see as another step toward a manageable system for dealing with enormous amounts of information safely.’” – eWEEK

“Each container packs in 60 of the company’s Sun Fire X4500 Open Storage Systems and is constantly monitored for potential threats. It’s actually a pretty elegant, modular solution to an archive that grows by nearly 100TBs every month.” –Gizmodo

“‘For years, the folks at the archive got by building their own open-source computer systems,’ said Brewster Kahle, its founder (at one point the archive spun off a company to build systems), but they couldn’t move fast enough to keep up with advances in technology or the growth of the Web. So today Sun Microsystems will announce that the archive has migrated its digital library to a single Sun data center housed in a shipping container on Sun’s campus in Santa Clara.” – San Francisco Chronicle

“‘It may be the single largest database in the world, and it’s all in a shipping container. I think of the shipping container as a single machine or expression made up of many smaller machines,’ said Brewster Kahle, digital librarian and co-founder of the Internet Archive, the nonprofit organization that runs the Wayback Machine site.” – Computerworld

“The Internet Archive is moving its data to one of Sun’s datacenter-in-a-box units, and not a minute too soon. At the rate that print is dying, we’ll need to step up the archival pace if we’re going to retain our cultural artifacts in the digital age.” – Ars Technica

UCSD Libraries Public Access System Architecture

Talis Connected Commons

“The terms of the offer are as follows: if you own, or are creating, a public domain dataset then you can store that data in the Platform as RDF, for free. We’re setting an initial cap of 50 million triples on each dataset, but that should be plenty of space in which to collect some really interesting data. To qualify for the scheme, you need to be using either the Open Data Commons Public Domain Dedication and License or the recently launched Creative Commons CC0 license to publish your data. Anyone will then be able to freely access the stored data using the Platform services, without API keys and without usage limits. This means that your data will be wrapped in a ready-made API right from the start.

The Platform API covers basic data management facilities, through to a configurable search engine and a fully compliant SPARQL endpoint. And with data being delivered in a range of formats including RDF/XML and JSON, there should be something there for everyone to get their teeth into no matter what kind of application you’re building or environment you’re working in.”
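The Talis announcement mentions a SPARQL endpoint that can deliver results as JSON. As a rough illustration only, here is a Python sketch that flattens a result document in the W3C JSON serialization of SPARQL query results (a `head.vars` list plus `results.bindings`, where each bound variable carries a `type` and a `value`). The sample response, its example.org URIs and the `bindings_to_rows` helper are hypothetical, and no actual request to the Talis Platform is shown.

```python
import json

# Hypothetical response body in the W3C SPARQL JSON results format.
# The URIs and titles are invented for illustration.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["item", "title"]},
  "results": {
    "bindings": [
      {"item":  {"type": "uri", "value": "http://example.org/item/1"},
       "title": {"type": "literal", "value": "First item"}},
      {"item":  {"type": "uri", "value": "http://example.org/item/2"}}
    ]
  }
}
"""


def bindings_to_rows(body):
    """Flatten a SPARQL JSON result document into a list of plain dicts.

    A variable left unbound in a solution (e.g. via OPTIONAL) is simply
    absent from that binding, so it is mapped to None here.
    """
    doc = json.loads(body)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        rows.append({var: binding[var]["value"] if var in binding else None
                     for var in variables})
    return rows


if __name__ == "__main__":
    for row in bindings_to_rows(SAMPLE_RESPONSE):
        print(row)
```

Keeping the `type` field would let a caller tell URIs from literals; this sketch drops it for brevity.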

“The structured Web is growing all around us like stalagmites in a cave!” – Michael K. Bergman

AI³

Adaptive Information
Adaptive Innovation
Adaptive Infrastructure

‘The Unreasonable Effectiveness of Data’

The next Web of open, linked data – Tim Berners-Lee (TEDTalks : 2009)

TED Talk by Tim Berners-Lee

Tim Berners-Lee exhorts the audience to grow a garden of linked data:

Linked Data

Don’t hug your data, he says:

Linked Data

We need to open up the silos:

Linked Data

… and then he made everyone chant:

Linked Data

Tim Bray on the Future of the Web

http://www.infoq.com/interviews/tim-bray-future-of-web#

Tim Bray comments on the importance of Ajax, JavaScript, agile development methodologies, REST, open source, and cloud computing.

In case you’re not familiar with his work: “Tim Bray launched one of the first public web search engines in ’95, co-invented XML 1.0, co-edited ‘Namespaces in XML’, served on the W3C Technical Architecture Group, and co-chaired the IETF AtomPub Working Group. Currently, he serves as a Distinguished Engineer and Director of Web Technologies at Sun Microsystems.”

Here are some of his quotes from the article.

“Ajax is getting awfully good, in particular with the advances that are being made in browser technology, with the increased compatibility between things like Firefox and Safari and so on, the new canvas element, and the fact that the new browsers have these fantastically high-performance JavaScript engines in them. I suspect that the gap in the ecosystem between what you can achieve with Ajax and what you need something like Flash or JavaFX or Silverlight to achieve is not big enough to be terribly interesting.”

“…from the business point of view, we are going to see that a lot of traditional application planning and deployment cycles are simply going to be broken. The notion that you can use the waterfall model to spec out a project and start by buying Oracle licenses and hardware servers for seven figures and plan for deployment fourteen months from now, the senior VP isn’t going to sign off on that anymore. They are worried about getting over the next six weeks and not about the next fourteen months.

I think that this is probably a very powerful force in favor of things like Agile methods and Open Source Software and the Cloud, all the things that are about monetization at the point of value. Technologies that are going to succeed in tough times are going to be the ones that are free to adopt and cheap to deploy, and then when they actually start to go to production, that’s when you are willing to pay some real money for them, because you saw. So I think that moves us from a services and support business model to a big up-front license and cost business model, from deployment to the cloud as opposed to deployment into privately held servers. I think that it is easy to see a bunch of existing technologies that are going to be encouraged and promoted, like Agile, like Cloud, like Open Source.”

“On the client, JavaScript is really going through a golden age. jQuery is very, very good; presumably the ease with which you can achieve JavaScript effects without having to sweat too much about different kinds of browsers will continue to increase and get better.”

“…it’s pretty clear that at the moment REST is the horse that most people are betting on.”

“I see very few instances of interesting new WS-* stuff being stood up. And I would think that as we move in a more service-oriented and web-oriented direction, increasingly the interesting services are going to be RESTful kinds of services, and the pressure to integrate with and use those will push things in the right direction.

Even at Microsoft, which was clearly the leader or co-leader with IBM of the WS-* movement, in the next generation of WCF everything is starting to look a whole lot more RESTful, and Microsoft Azure is built around AtomPub in large part. The vendors are pulling and pushing and the services are pulling and pushing. So I think the movement will happen fairly organically.”

MIT Adopts Open-Access Policy

On March 19, the MIT faculty unanimously adopted a resolution that makes its members’ scholarly articles freely and openly available to the entire world.

Hal Abelson (MIT professor of computer science and engineering, who chaired the committee) writes:

“I chaired the committee that drafted the resolution and led faculty discussions on it throughout the fall. So I’m particularly gratified that the vote was unanimously in favor. In the words of MIT Faculty Chair Bish Sanyal, the vote is ‘a signal to the world that we speak in a unified voice; that what we value is the free flow of ideas.’

“Our resolution was closely modeled on similar ones passed last February by Harvard’s Faculty of Arts and Sciences and by the Harvard Law School, also passed by unanimous vote. Stanford’s School of Education did the same, as did Harvard’s Kennedy School of Government just last Monday.”

MIT Adopts an Open-Access Policy

also reported in:

Open Access News

ScienceCommons

Triplestore Management

Here in the UCSD Libraries IT Department Development group, we have been working with RDF and various triplestores or triplestore-like implementations for several years now. More than a year ago, we began investing a good deal of attention in a particular triplestore product:

AllegroGraph

We chose to give it special attention for reasons that include, but are not limited to, the following:

  • Commercially supported by Franz, which is very active in the Semantic Web community
    (last year Franz was a fairly major sponsor of the Semantic Technology Conference)
  • Free to us at the level we currently have need for it
    Franz has even generously provided a significant amount of free technical support
  • Actively maintained and updated by Franz
  • Supports SPARQL
  • Seemed to have some of the best benchmark performance results
  • Has a Java-based API and is compatible with Jena and Sesame
  • Goes beyond simple subject/predicate/object triple-based support
    Implements statement IDs and named graphs
  • Can bulk load from RDF/XML and N-Triples
  • Supports direct generation of JSON from SPARQL queries
  • Offers free-text indexing
  • Supports clustering and federation
  • Franz is also very much into artificial intelligence and reasoning,
    although those are beyond the scope of our current interest.

However, in spite of all the nice features and attractions listed above, we had trouble with AllegroGraph when it came to managing various combinations of concurrent usage, including attempts to perform unregulated:

  • Reads
  • Writes
  • Re-indexing

especially if those involved multiple simultaneous users.

Consequently, we embarked on a fairly serious attempt to analyze the performance capabilities of AllegroGraph, and this in turn led us to study the performance of other triplestore implementations, including Oracle, Sesame and Mulgara.

Doing this analysis was a challenge, though, because a series of product updates meant we were working with a moving target. For instance, at some point Franz moved from the AllegroGraph 2 series to the AllegroGraph 3 series, and in one of our important tests we observed a reduction in query response time of more than two orders of magnitude.

I must note that other tasks and priorities also demanded our attention and distracted us from the investigation. Thus we have not yet managed either to complete our testing or to migrate to the latest AllegroGraph (version 3.2 as of this writing).

In short, we continue to have problems managing concurrent reading, writing and indexing, and have gone to some complicated lengths to separate these activities. In particular, we have dedicated an independent AllegroGraph server and instance to write operations, and we run tasks that synchronize this writable instance to a read-only instance on a daily basis. The intent of this separation is to protect the read-only instance from write and indexing operations, which can impair its performance.
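The writer/reader split can be modelled in a few lines to make the trade-off concrete. This is a toy Python sketch, not AllegroGraph’s API: the hypothetical `SplitTripleStore` class keeps an in-memory writable set of triples plus a frozen read-only snapshot, and its `synchronize()` method stands in for the daily synchronization task.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class SplitTripleStore:
    """Toy model of a dedicated writer instance plus a read-only replica.

    All writes go to `_writer`; queries only ever see `_reader`, a frozen
    snapshot that changes only when `synchronize()` is called (for example,
    from a daily scheduled job).
    """

    def __init__(self) -> None:
        self._writer: Set[Triple] = set()
        self._reader: FrozenSet[Triple] = frozenset()

    def add(self, s: str, p: str, o: str) -> None:
        self._writer.add((s, p, o))

    def remove(self, s: str, p: str, o: str) -> None:
        self._writer.discard((s, p, o))

    def synchronize(self) -> None:
        # Publish a complete copy in one assignment, so readers never
        # observe a partially applied batch of writes.
        self._reader = frozenset(self._writer)

    def query(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        # Match against the read-only snapshot; None acts as a wildcard.
        return sorted(t for t in self._reader
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))


if __name__ == "__main__":
    store = SplitTripleStore()
    store.add("ex:item1", "dc:title", "First item")
    print(store.query(p="dc:title"))  # [] until the next synchronization
    store.synchronize()
    print(store.query(p="dc:title"))
```

Until `synchronize()` runs, queries do not see new writes. That is the cost a daily schedule accepts: readers get stale but stable data in exchange for isolation from write and indexing load.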

Further, because we still have concerns about triplestore (AllegroGraph) performance in our Production environment, we also synchronize particular triplestore data to a Solr instance, which we actually use as our dominant live Production query source. This leaves the read-only triplestore protected for more specialized SPARQL query usage.
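Feeding triplestore data to a document-oriented index such as Solr generally means denormalizing triples into one flat record per resource. The Python sketch below shows one possible way to do that; the field-naming rule (the local name after the last `/` or `#` of the predicate URI), the `id` field and the sample Dublin Core triples are all assumptions for illustration, and actually posting the documents to Solr is not shown.

```python
import json
from collections import defaultdict

DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample data: two resources described with Dublin Core terms.
SAMPLE_TRIPLES = [
    ("http://example.org/item/1", DC + "title", "First item"),
    ("http://example.org/item/1", DC + "subject", "maps"),
    ("http://example.org/item/1", DC + "subject", "surveys"),
    ("http://example.org/item/2", DC + "title", "Second item"),
]


def field_name(predicate):
    """Assumed naming rule: keep the local name after the last '/' or '#'."""
    return predicate.rstrip("/#").rsplit("/", 1)[-1].rsplit("#", 1)[-1]


def triples_to_documents(triples):
    """Group (subject, predicate, object) triples into one document per subject.

    Every field is a list, so repeated predicates become multi-valued fields.
    """
    docs = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        docs[s][field_name(p)].append(o)
    return [{"id": s, **fields} for s, fields in sorted(docs.items())]


if __name__ == "__main__":
    print(json.dumps(triples_to_documents(SAMPLE_TRIPLES), indent=2))
```

A real mapping would also need to handle predicates whose local names collide (including one named `id`) and decide which fields the search index should index, store or facet on.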

We have been in frequent contact with Franz about our issues with AllegroGraph’s ability to handle concurrent activity. Their tech support staff have been great in trying to help us, but have also acknowledged some of AllegroGraph’s limitations in this area. They have told us that they are working on improving some of the problems we have observed, and in fairness, we have not been able to keep up with properly testing their latest releases.

That is a quick summary of our situation.  I’d be very interested to hear how others might be dealing with similar issues.

Thanks.

Chris Frymann
Digital Library Architect
UC San Diego Libraries

The above is a variant of a message originally sent to Ben Osteen.

What is Web 3.0?

This article appears to have been written in mid-2006, and as such seems especially forward-thinking:

http://java.sys-con.com/node/236036

“The defining aspects of the Web 3.0 social experience may [include]:

* Two, that there are no pages. Information comes in packets of discrete units. You merge or cross them, as you need to.

* Three, that there are no Web sites. Existing Web sites are no longer meant for human eyes. They act as indexes to the information, which is accessible via XML request. Exceptions to this will not be Web sites, but independent little islands of commerce or games.”
