<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Musings &#187; Infrastructure</title>
	<atom:link href="http://www.chrisfrymann.com/category/infrastructure/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.chrisfrymann.com</link>
	<description>Thoughts and resources worth sharing or remembering</description>
	<lastBuildDate>Wed, 07 Oct 2009 21:01:21 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.3</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Reference: &#8220;Triple Stores Aren&#8217;t&#8221;</title>
		<link>http://www.chrisfrymann.com/2009/08/07/reference-triple-stores-arent/</link>
		<comments>http://www.chrisfrymann.com/2009/08/07/reference-triple-stores-arent/#comments</comments>
		<pubDate>Fri, 07 Aug 2009 20:21:30 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[subject=RDF]]></category>
		<category><![CDATA[subject=triplestore]]></category>

		<guid isPermaLink="false">http://www.chrisfrymann.com/?p=539</guid>
		<description><![CDATA[
From the blog of Eric Hellman
Triple Stores Aren&#8217;t
&#8220;&#8230;all the triple stores in serious use today use more that 3 columns to store the triples. Instead of triples, RDF atoms are now stored as 4-tuples, 5-tuples, 6-tuples or 7-tuples.
&#8230;
Is there anything harmful with the misnomerization of &#8220;triple&#8221;, enough for the community to try their best to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://go-to-hellman.blogspot.com"><img src="http://chrisfrymann.com/image/hellman.jpg" alt="" /></a></p>
<p>From <a href="http://go-to-hellman.blogspot.com/">the blog of Eric Hellman</a></p>
<p><a href="http://go-to-hellman.blogspot.com/2009/06/triple-stores-arent.html">Triple Stores Aren&#8217;t</a></p>
<div style="background-color: #ebf8e2; border: 1px dotted #71c837; margin: 15px 60px; padding: 8px; vertical-align: middle"><p>&#8220;&#8230;all the triple stores in serious use today use more that 3 columns to store the triples. Instead of triples, RDF atoms are now stored as 4-tuples, 5-tuples, 6-tuples or 7-tuples.</p>
<p>&#8230;</p>
<p>Is there anything harmful with the misnomerization of &#8220;triple&#8221;, enough for the community to try their best to start talking about &#8220;tuples&#8221;? I think there is. Linked Data is the best example of how a focus on the three-ness of triples can fool people into sub-optimal implementations. I heard this fear expressed several times during the conference, although not in those words. More than once, people expressed concern that once data had been extracted via SPARQL and gone into the Linked Data cloud, there was no way to determine where the data had come from, what its provenance was, or whether is could be trusted. He was absolutely correct- if the implementation was such that the raw triple was allowed to separate from its source. If there was a greater understanding of the un-three-ness of real rdf tuplestores, then implementers of linked data would be more careful not to obliterate the id information that could enable trust and provenance. I come away from the conference both excited by Linked Data and worried that the Linked Data promoters seemed to brush-off this concern.&#8221;</p></div>
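<p>The point about provenance can be illustrated with a minimal sketch. It assumes a store that keeps a fourth column naming the graph each statement came from; the <code>Quad</code> type, the helper functions and the example.org graph names are all made up for illustration and do not come from any particular triplestore.</p>

```python
from collections import namedtuple

# A "triple" plus the named graph it came from (a 4-tuple, or quad).
# The type and the example data are hypothetical.
Quad = namedtuple("Quad", ["subject", "predicate", "obj", "graph"])

store = [
    Quad("ex:book1", "dc:title", "Moby Dick",
         "http://example.org/graphs/library-a"),
    Quad("ex:book1", "dc:creator", "Herman Melville",
         "http://example.org/graphs/library-b"),
]


def as_bare_triples(quads):
    """Drop the graph column: the lossy step the quote warns about."""
    return [(q.subject, q.predicate, q.obj) for q in quads]


def provenance(quads, subject, predicate):
    """Return the source graph of every statement matching subject and predicate."""
    return [q.graph for q in quads
            if q.subject == subject and q.predicate == predicate]


bare = as_bare_triples(store)
sources = provenance(store, "ex:book1", "dc:title")
```

<p>Once only <code>bare</code> is published, nothing in it says which graph a statement came from; the answer held in <code>sources</code> is available only while the fourth column is kept.</p>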
]]></content:encoded>
			<wfw:commentRss>http://www.chrisfrymann.com/2009/08/07/reference-triple-stores-arent/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reference: &#8220;A Reflection on the Structure and Process of the Web of Data&#8221;</title>
		<link>http://www.chrisfrymann.com/2009/08/07/reference-a-reflection-on-the-structure-and-process-of-the-web-of-data/</link>
		<comments>http://www.chrisfrymann.com/2009/08/07/reference-a-reflection-on-the-structure-and-process-of-the-web-of-data/#comments</comments>
		<pubDate>Fri, 07 Aug 2009 18:05:03 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[subject=infrastructure]]></category>
		<category><![CDATA[subject=RDB]]></category>
		<category><![CDATA[subject=RDF]]></category>
		<category><![CDATA[subject=triplestore]]></category>

		<guid isPermaLink="false">http://www.chrisfrymann.com/?p=529</guid>
		<description><![CDATA[
A Reflection on the Structure and Process of the Web of Data
&#8220;What has been the sole territory of relational database technologies may soon be displaced by the use of RDF and the triple store. Moreover, because RDF is the common data model utilized by triple stores, it is possible to integrate data sets across different [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.boxesandarrows.com/files/banda/when-life-intervenes/asistlogoHiRes2.gif" alt="" /></p>
<p><a href="http://www.asis.org/Bulletin/Aug-09/AugSep09_Rodriguez.html">A Reflection on the Structure and Process of the Web of Data</a></p>
<div style="background-color: #ebf8e2; border: 1px dotted #71c837; margin: 15px 60px; padding: 8px; vertical-align: middle">&#8220;What has been the sole territory of relational database technologies may soon be displaced by the use of RDF and the triple store. Moreover, because RDF is the common data model utilized by triple stores, it is possible to integrate data sets across different triple stores – across different RDF data providers. This integration is conveniently afforded by the URI and RDF as web standards and is a function foreign to the relational database domain. With the Web of Data, no longer is information isolated in individual inaccessible data silos, but instead is exposed in an open and interconnected environment – the web environment.&#8221;</div>
]]></content:encoded>
			<wfw:commentRss>http://www.chrisfrymann.com/2009/08/07/reference-a-reflection-on-the-structure-and-process-of-the-web-of-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Rolls out Semantic Search Capabilities</title>
		<link>http://www.chrisfrymann.com/2009/04/28/google-rolls-out-semantic-search-capabilities/</link>
		<comments>http://www.chrisfrymann.com/2009/04/28/google-rolls-out-semantic-search-capabilities/#comments</comments>
		<pubDate>Tue, 28 Apr 2009 23:58:07 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[subject=cloud_computing]]></category>

		<guid isPermaLink="false">http://www.chrisfrymann.com/?p=349</guid>
		<description><![CDATA[
Google Rolls out Semantic Search Capabilities
&#8220;Google has given its Web search engine an injection of semantic technology, as the search leader pushes into what many consider the future of search on the Internet.&#8221;
]]></description>
			<content:encoded><![CDATA[<p><img src="http://images.pcworld.com/opinion/graphics/141010-googlelogo_180.jpg" alt="" /></p>
<p><a href="http://www.pcworld.com/businesscenter/article/161869/google_rolls_out_semantic_search_capabilities.html">Google Rolls out Semantic Search Capabilities</a></p>
<p>&#8220;Google has given its Web search engine an injection of semantic technology, as the search leader pushes into what many consider the future of search on the Internet.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.chrisfrymann.com/2009/04/28/google-rolls-out-semantic-search-capabilities/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Internet Archive and Sun Modular Datacenter</title>
		<link>http://www.chrisfrymann.com/2009/03/31/internet-archive-and-sun-modular-datacenter/</link>
		<comments>http://www.chrisfrymann.com/2009/03/31/internet-archive-and-sun-modular-datacenter/#comments</comments>
		<pubDate>Tue, 31 Mar 2009 17:52:51 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[subject=open_source]]></category>

		<guid isPermaLink="false">http://www.chrisfrymann.com/?p=177</guid>
		<description><![CDATA[
Part of an email message from Art.Pasquinelli@sun.com
The Internet Archive and the Sun Modular Datacenter Announcement &#8211; March 25 
Sun and the Internet Archive rolled out a joint project at a March 25 event in Santa Clara. Sun has approximately 60 Sun Fire x4500s managing the Internet Archive&#8217;s content in a Sun Modular Datacenter. Dozens of [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.sun.com/images/l0/l0_featured_03-25-09.jpg" alt="" width="600" /></p>
<p>Part of an email message from Art.Pasquinelli@sun.com</p>
<p><strong>The Internet Archive and the Sun Modular Datacenter Announcement &#8211; March 25 </strong><br />
Sun and the Internet Archive rolled out a joint project at a March 25 event in Santa Clara. Sun has approximately 60 Sun Fire x4500s managing the Internet Archive&#8217;s content in a Sun Modular Datacenter. Dozens of articles were generated globally (see below). We will have a presentation and consulting on the architecture at the June 24-26 Sun PASIG in Malta (Early Bird registration will be open in the next 1-2 days at <a href="http://www.sun-pasig.org/">www.sun-pasig.org</a>).</p>
<p>The video tour and Sun case study can be seen at: <a href="http://www.sun.com/featured-articles/2009-0325/feature/index.jsp">http://www.sun.com/featured-articles/2009-0325/feature/index.jsp</a></p>
<p><strong><span style="font-size: 10pt;">KEY QUOTES</span></strong></p>
<p>“’Archive.org also houses the Wayback Machine, 1 million books, 100,000 movies and about 200,000 audio recordings,’ Kahle said. ‘It is a full-on library. This technology we see as another step toward a manageable system for dealing with enormous amounts of information safely.’&#8221; – eWEEK</p>
<p style="margin-bottom: 12pt;">“Each container packs in 60 of the company&#8217;s Sun Fire X4500 Open Storage Systems and is constantly monitored for potential threats. It&#8217;s actually a pretty elegant, modular solution to an archive that grows by nearly 100TBs every month.” –Gizmodo</p>
<p>“’For years, the folks at the archive got by building their own open-source computer systems,’ said Brewster Kahle, its founder &#8212; at one point the archive spun off a company to build systems &#8212; but they couldn&#8217;t move fast enough to keep up with advances in technology or the growth of the Web. So today Sun Microsystems will announce that the archive has migrated its digital library to a single Sun data center housed in a shipping container on Sun&#8217;s campus in Santa Clara.” – San Francisco Chronicle</p>
<p>“’It may be the single largest database in the world, and it&#8217;s all in a shipping container. I think of the shipping container as a single machine or expression made up of many smaller machines,’ said Brewster Kahle, digital librarian and co-founder of the Internet Archive, the nonprofit organization that runs the Wayback Machine site.” – Computerworld</p>
<p>“The Internet Archive is moving its data to one of Sun&#8217;s datacenter-in-a-box units, and not a minute too soon. At the rate that print is dying, we&#8217;ll need to step up the archival pace if we&#8217;re going to retain our cultural artifacts in the digital age.” – Ars Technica</p>
]]></content:encoded>
			<wfw:commentRss>http://www.chrisfrymann.com/2009/03/31/internet-archive-and-sun-modular-datacenter/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>UCSD Libraries Public Access System Architecture</title>
		<link>http://www.chrisfrymann.com/2009/03/30/ucsd-libraries-public-access-system-architecture/</link>
		<comments>http://www.chrisfrymann.com/2009/03/30/ucsd-libraries-public-access-system-architecture/#comments</comments>
		<pubDate>Mon, 30 Mar 2009 22:29:38 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[Standard]]></category>
		<category><![CDATA[group=UCSD]]></category>

		<guid isPermaLink="false">http://www.chrisfrymann.com/?p=171</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.chrisfrymann.com/image/pas.jpg" alt="" width="432" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.chrisfrymann.com/2009/03/30/ucsd-libraries-public-access-system-architecture/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Triplestore Management</title>
		<link>http://www.chrisfrymann.com/2009/03/21/triplestore-management/</link>
		<comments>http://www.chrisfrymann.com/2009/03/21/triplestore-management/#comments</comments>
		<pubDate>Sat, 21 Mar 2009 18:55:44 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[group=UCSD]]></category>
		<category><![CDATA[subject=linked_data]]></category>

		<guid isPermaLink="false">http://www.chrisfrymann.com/?p=112</guid>
		<description><![CDATA[Here in the UCSD Libraries IT Department Development group we have been working with RDF and various triplestore or triplestore-like implementations for several years now.  More than a year ago we began investing a good deal of attention in a particular triplestore product:
AllegroGraph
We chose to give it special attention for reasons which include, but are [...]]]></description>
			<content:encoded><![CDATA[<p>Here in the UCSD Libraries IT Department Development group we have been working with RDF and various triplestore or triplestore-like implementations for several years now.  More than a year ago we began investing a good deal of attention in a particular triplestore product:</p>
<p style="padding-left: 30px;"><a href="http://agraph.franz.com/">AllegroGraph</a></p>
<p>We chose to give it special attention for reasons which include, but are not limited to, the following:</p>
<ul>
<li>Commercially supported, and Franz is very active in the Semantic Web community<br />
Last year Franz was a fairly major sponsor of the <a href="http://www.semantic-conference.com/">Semantic Technology Conference</a></li>
</ul>
<ul>
<li>Free to us at the level we currently have need for it<br />
Franz has even generously provided a significant amount of free technical support</li>
</ul>
<ul>
<li>Actively maintained and updated by Franz</li>
</ul>
<ul>
<li>Supports SPARQL</li>
</ul>
<ul>
<li>Seemed to have some of the best benchmark performance results</li>
</ul>
<ul>
<li>Has a Java-based API and is compatible with Jena and Sesame</li>
</ul>
<ul>
<li>Goes beyond simple subject/predicate/object triple-based support<br />
Implements statement IDs and Named Graph entries</li>
</ul>
<ul>
<li>Can bulk load from RDF/XML and N-Triples</li>
</ul>
<ul>
<li>Supports direct generation of JSON from SPARQL queries</li>
</ul>
<ul>
<li>Offers Free-text indexing</li>
</ul>
<ul>
<li>Supports clustering and federation</li>
</ul>
<ul>
<li>Franz is also very much into artificial intelligence and reasoning,<br />
although those are beyond the scope of our current interest.</li>
</ul>
<p>However, in spite of all the nice features and attractions listed above, we had trouble with AllegroGraph when it came to managing various combinations of concurrent usage, including attempts to perform unregulated:</p>
<ul>
<li>Reads</li>
<li>Writes</li>
<li>Re-indexing</li>
</ul>
<p>especially if those involved multiple simultaneous users.</p>
<p>Consequently, we embarked on a fairly serious attempt to analyze the performance capabilities of AllegroGraph, and this in turn led us to begin studying similar performance of other triplestore implementations, including: <a href="http://www.oracle.com/technology/tech/semantic_technologies/index.html">Oracle</a>, <a href="http://www.openrdf.org/">Sesame</a>, <a href="http://docs.mulgara.org/">Mulgara</a>.</p>
<p>Doing this analysis was a challenge, though, because a series of product updates left us with a moving target.  For instance, at some point Franz moved from the AllegroGraph 2 series to the AllegroGraph 3 series, and in one of our important tests we observed query response time drop by more than two orders of magnitude.</p>
<p>I must note that we also had other tasks and priorities which demanded our attention and distracted us from the investigation.  Thus we have not yet managed to either complete our testing, or actually migrate to the latest AllegroGraph (version 3.2 as of this writing).</p>
<p>In short we continue to have problems managing the activities of reading, writing and indexing and have gone to some complicated lengths to separate these activities.  In particular, we have dedicated an independent AllegroGraph server and instance to write operations and then have tasks to synchronize the writeable instance of AllegroGraph to a read-only instance on a daily basis.  The intent of this separation is to protect the read-only version from write and indexing operations which can impair its performance.</p>
<p>Further, because we still have concerns about triplestore (AllegroGraph) performance in our Production environment, we also synchronize particular triplestore data to a Solr instance, which we actually use as our dominant live/active Production query source.  This leaves the read-only triplestore protected for more specialized SPARQL query usage.</p>
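<p>The separation described above can be sketched roughly as follows. This is an in-memory illustration only, not our actual synchronization code: the class names, <code>nightly_sync</code>, the predicates and the data are all hypothetical stand-ins, and no AllegroGraph or Solr API is used.</p>

```python
class WritableStore:
    """Stand-in for the instance that accepts writes and re-indexing."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))


class ReadOnlyStore:
    """Stand-in for the read-only instance; it only changes during a sync."""

    def __init__(self):
        self.triples = frozenset()

    def match(self, predicate):
        return sorted(t for t in self.triples if t[1] == predicate)


def build_search_index(triples, indexed_predicates):
    """Flatten selected predicates into one document per subject,
    roughly the shape a search index would hold."""
    docs = {}
    for subject, predicate, obj in triples:
        if predicate in indexed_predicates:
            docs.setdefault(subject, {}).setdefault(predicate, []).append(obj)
    return docs


def nightly_sync(writable, read_only, indexed_predicates):
    """Publish a snapshot of the writable store, then rebuild the index from it."""
    read_only.triples = frozenset(writable.triples)
    return build_search_index(read_only.triples, indexed_predicates)


writable, read_only = WritableStore(), ReadOnlyStore()
writable.add("ex:obj1", "dc:title", "Example title")
writable.add("ex:obj1", "ex:internalNote", "needs review")

before = read_only.match("dc:title")   # nothing is visible until the sync runs
index = nightly_sync(writable, read_only, {"dc:title"})
after = read_only.match("dc:title")
```

<p>Readers only ever see the last published snapshot, so writes and re-indexing never compete with queries; the cost is that new data stays invisible until the next sync.</p>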
<p>We have been in frequent contact with Franz about our issues with AllegroGraph’s abilities to handle concurrent activity.  Their tech support staff have been great in trying to help us, but have also acknowledged some of AllegroGraph’s limitations in this area.  They have told us that they would be working on improving some of the problems we have observed, and in fairness, we have not been able to keep up on properly testing their latest releases.</p>
<p>That is a quick summary of our situation.  I’d be very interested to hear how others might be dealing with similar issues.</p>
<p>Thanks.</p>
<p>Chris Frymann<br />
Digital Library Architect<br />
UC San Diego Libraries</p>
<p><em>The above is a variant of a message originally sent to Ben O&#8217;Steen</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.chrisfrymann.com/2009/03/21/triplestore-management/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Schema-less Databases, Transactions and Eventual Consistency</title>
		<link>http://www.chrisfrymann.com/2009/03/11/transactions-and-eventual-consistency-in-schema-less-databases/</link>
		<comments>http://www.chrisfrymann.com/2009/03/11/transactions-and-eventual-consistency-in-schema-less-databases/#comments</comments>
		<pubDate>Wed, 11 Mar 2009 19:26:20 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[subject=performance]]></category>
		<category><![CDATA[subject=RDB]]></category>
		<category><![CDATA[subject=triplestore]]></category>

		<guid isPermaLink="false">http://www.chrisfrymann.com/?p=43</guid>
		<description><![CDATA[The value of schema-less databases seems to be a topic of emerging interest; see for instance:
Is the Relational Database Doomed?
Discusses some of the potential of key/value databases as compared to RDBs.   The immediate answer to the inflammatory title is, of course, no.  However, it seems increasingly clear that one can find a lot of company [...]]]></description>
			<content:encoded><![CDATA[<p>The value of schema-less databases seems to be a topic of emerging interest; see for instance:</p>
<p><a href="http://www.readwriteweb.com/archives/is_the_relational_database_doomed.php">Is the Relational Database Doomed?</a></p>
<p style="padding-left: 30px;">Discusses some of the potential of key/value databases as compared to RDBs.   The immediate answer to the inflammatory title is, of course, no.  However, it seems increasingly clear that one can find a lot of company in suggesting that there is significant value in and adoption of schema-less database approaches.</p>
<p>See also:</p>
<p><a href="http://bret.appspot.com/entry/how-friendfeed-uses-mysql">How FriendFeed uses MySQL to store schema-less data</a></p>
<p style="padding-left: 30px;">There are many responses to the above post, so<span style="color: #ff0000;"> some reading is required</span>, but it may be worth it.  I found it interesting though that there were no comments on triplestores.  I&#8217;m not quite ready to jump in on that though.  I&#8217;ll try and come back to it later and see what additional comments may have been made.</p>
<p>In any case, performance is still always an issue.  Many approaches are taken to dealing with response time, and some form of replication is frequently involved, whether it be copying data into parallel systems,  or storing it in alternate forms or formats that have different access versus update characteristics.  This inevitably leads to a problem of maintaining consistency between the various manifestations of the data.  The challenge of maintaining consistency across various forms of parallel systems is therefore a recurrent theme and one addressed in the following sources:</p>
<p><a href="http://www.allthingsdistributed.com/2008/12/eventually_consistent.html">Eventually Consistent &#8211; Revisited</a></p>
<p style="padding-left: 30px;">Discusses some of the problems managing reads and writes and keeping everything consistent.</p>
<p><a href="http://www.devx.com/semantic/Article/40987">Sesame 3.0 Preview: An Open Source Framework for RDF Data</a></p>
<p style="padding-left: 30px;">From a recent DevX.com article.  Mentions the concept of &#8220;eventual consistency&#8221; in the &#8220;Transactions&#8221; section.</p>
<p>In our case, end-user results and the process of achieving consistency depend on the order in which one updates:</p>
<ul>
<li>Files</li>
<li>Triplestores</li>
<li>Solr Indexes</li>
</ul>
<p>The following references and quote are from an email exchange with Benjamin O&#8217;Steen [bosteen@gmail.com].</p>
<p>Writing to serialized data (files), and later updating Solr indices using JMS/<a href="http://en.wikipedia.org/wiki/Advanced_Message_Queuing_Protocol">AMQP</a> [<a href="http://www.rabbitmq.com/">RabbitMQ</a>] enables &#8220;indexes &#8216;eventually converging&#8217; to the truth within seconds after the event (truth being whatever the data held on disc says is true.)&#8221;</p>
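<p>A minimal sketch of that ordering, assuming the file is always written first and the index is updated later by a worker reading from a queue. Python&#8217;s standard <code>queue.Queue</code> stands in for the message broker here, and the dictionaries and function names are hypothetical stand-ins for the files and the Solr index.</p>

```python
import queue

files = {}               # stand-in for serialized data on disk (the "truth")
index = {}               # stand-in for a search index
events = queue.Queue()   # stand-in for a message broker queue


def write_file(doc_id, text):
    """Write the authoritative copy first, then announce the change."""
    files[doc_id] = text
    events.put(doc_id)


def drain_events():
    """Index worker: re-read each changed document from disk and re-index it."""
    while not events.empty():
        doc_id = events.get()
        index[doc_id] = files[doc_id]


write_file("doc1", "first draft")
write_file("doc1", "second draft")
stale = index.get("doc1")   # the index lags behind the files until the worker runs
drain_events()
converged = index == files
```

<p>Because the worker always re-reads the current file rather than trusting the message body, replayed or duplicated messages are harmless and the index ends up matching whatever is on disk.</p>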
<p>Changes to an RDF document can be queued as a <a href="http://vocab.org/changeset/schema">Talis changeset</a> and later committed.</p>
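<p>The queue-then-commit idea can be sketched with plain Python sets. This only loosely mirrors the changeset notion of removals and additions; it does not use the Talis vocabulary&#8217;s RDF form. The function names are hypothetical, and rejecting a change whose removals are not present is an assumption made for this sketch.</p>

```python
from collections import deque

graph = {("ex:doc1", "dc:title", "Old title")}
pending = deque()   # change sets waiting to be committed


def queue_change(removals, additions):
    """Record a change as the triples to remove and the triples to add."""
    pending.append({"removals": set(removals), "additions": set(additions)})


def commit_all(triples):
    """Apply queued change sets in order, in place.
    A change whose removals are not all present is skipped and returned."""
    rejected = []
    while pending:
        change = pending.popleft()
        if change["removals"].issubset(triples):
            triples.difference_update(change["removals"])
            triples.update(change["additions"])
        else:
            rejected.append(change)
    return rejected


queue_change(
    removals=[("ex:doc1", "dc:title", "Old title")],
    additions=[("ex:doc1", "dc:title", "New title")],
)
rejected = commit_all(graph)
```

<p>Queuing changes this way keeps writes off the store until a convenient moment, which fits the read/write separation discussed above.</p>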
<p>Note: This post originally addressed the topic of <strong>R/W Contention in triplestores and Solr as an approach</strong>, posed by Declan Fleming.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.chrisfrymann.com/2009/03/11/transactions-and-eventual-consistency-in-schema-less-databases/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

