<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Adrian Otto&#039;s Blog</title>
	<atom:link href="http://adrianotto.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://adrianotto.com</link>
	<description>For those who care about technical details</description>
	<lastBuildDate>Wed, 01 Sep 2010 18:49:53 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>OpenStack Object Storage is Great For&#8230;</title>
		<link>http://adrianotto.com/2010/09/openstack-os-is-great-for/</link>
		<comments>http://adrianotto.com/2010/09/openstack-os-is-great-for/#comments</comments>
		<pubDate>Wed, 01 Sep 2010 18:42:16 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[OpenStack]]></category>
		<category><![CDATA[swift]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=346</guid>
		<description><![CDATA[Soon, the OpenStack Object Storage software will be released. It&#8217;s available now as a Developer Preview if you would like to contribute, or perhaps if you&#8217;re just curious. The first release is expected later this month. This is a fantastic piece of software that really hits the mark for scalability, high availability, and performance.
About OpenStack [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.openstack.org/"><img class="alignright size-full wp-image-356" title="OpenStack" src="http://adrianotto.com/wp-content/uploads/2010/09/openstacklogo.jpg" alt="" width="139" height="143" /></a>Soon, the <a href="http://www.openstack.org/projects/storage/" target="_blank">OpenStack Object Storage</a> software will be released. It&#8217;s available now as a <a href="https://launchpad.net/swift" target="_blank">Developer Preview</a> if you would like to contribute, or perhaps if you&#8217;re just curious. The first release is expected later this month. <strong>This is a fantastic piece of software that really hits the mark for scalability, high availability, and performance.</strong></p>
<h3>About OpenStack Object Storage</h3>
<p><a href="http://www.openstack.org/projects/storage/" target="_blank">OpenStack Object Storage</a> was originally developed by <a href="http://www.rackspace.com/" target="_blank">Rackspace</a>, and was released as <a href="http://www.apache.org/licenses/LICENSE-2.0.html" target="_blank">Open Source Software</a> earlier this year as part of the <a href="http://www.openstack.org/" target="_blank">OpenStack Project</a>. It was written for hosting the <a href="http://www.rackspacecloud.com/cloud_hosting_products/files/">Rackspace Cloud Files</a> service. Its original project code name was <em>swift</em>, so you may see references to that in various documentation.</p>
<blockquote><p>OpenStack Object Storage aggregates commodity servers to work together  in clusters for reliable, redundant, and large-scale storage of static  objects. Objects are written to multiple hardware devices in the  datacenter, with the OpenStack software responsible for ensuring data  replication and integrity across the cluster. Storage clusters can scale  horizontally by adding new nodes, which are automatically configured.  Should a node fail, OpenStack works to replicate its content from other  active nodes. Because OpenStack uses software logic to ensure data  replication and distribution across different devices, inexpensive  commodity hard drives and servers can be used in lieu of more expensive  equipment. [<a href="http://www.openstack.org/projects/storage/" target="_blank">1</a>]</p></blockquote>
<p>The system uses a flat namespace built on three concepts: an <em>account</em> (how you access the system), a <em>container</em> (like a directory), and an <em>object</em> (like a file). You can have an arbitrary number of accounts, each with an arbitrary number of containers. Each container can hold an arbitrary number of objects.</p>
<p>OpenStack Object Storage is very good for storing unstructured data using an object name as a lookup key (like a filename). You access your data from a web client using the web service <a href="http://www.rackspacecloud.com/cloud_hosting_products/files/api" target="_blank">REST API</a>, not like a filesystem. Download an object (like a file) using an HTTP GET request, fetch object metadata with an HTTP HEAD request, delete an object with an HTTP DELETE request, etc. There are multiple <a href="http://www.rackspacecloud.com/cloud_hosting_products/files/api" target="_blank">language bindings</a> so you can access your files in OpenStack Object Storage from your favorite language natively (Java, Python, Perl, PHP, .NET, etc.).</p>
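<p>To make the verbs concrete, here is a minimal Python sketch that builds (but does not send) one request per operation. The storage URL, the token value, the <code>X-Auth-Token</code> header name, and the helper <code>object_request</code> are assumptions for illustration; check the API documentation of your own deployment for the real values.</p>
<pre>import urllib.request
from urllib.parse import quote

# Hypothetical values for illustration only; real ones come from your
# provider's authentication service.
STORAGE_URL = "https://storage.example.com/v1/AUTH_demo"
TOKEN = "example-token"

def object_request(method, container, name, token=TOKEN, storage_url=STORAGE_URL):
    """Build (but do not send) an HTTP request for one object."""
    # Percent-encode the names so spaces and other characters survive in a URL.
    url = "/".join([storage_url, quote(container, safe=""), quote(name)])
    # Assumed header name for the storage token.
    return urllib.request.Request(
        url, method=method, headers={"X-Auth-Token": token})

# One request per verb; nothing is sent over the network here.
download = object_request("GET", "photos", "2010/beach.jpg")
metadata = object_request("HEAD", "photos", "2010/beach.jpg")
remove = object_request("DELETE", "photos", "2010/beach.jpg")
print(download.get_method(), download.full_url)
</pre>
<p>Passing any of these objects to <code>urllib.request.urlopen()</code> would perform the actual call. The language bindings wrap this same pattern for you.</p>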
<p>The system has no central point of failure, so it&#8217;s extremely fault tolerant, and the data and related metadata are distributed throughout the system, so there are no central scalability constraints. You can store arbitrary amounts of data in the system in both large and small sizes. It performs very well, even under very high levels of concurrency. It keeps multiple replicas of each object, so it&#8217;s reliable, and the storage is very durable, without any expensive hardware. You don&#8217;t need any RAID on any of the servers unless you want it for additional performance.</p>
<h3>Use OpenStack Object Storage For&#8230;</h3>
<p>Here are some good use cases for OpenStack Object Storage:</p>
<ul>
<li>Storing media libraries (photos, music, videos, etc.)</li>
<li>Archiving video surveillance files</li>
<li>Archiving phone call audio recordings</li>
<li>Archiving compressed log  files</li>
<li>Archiving backups (&lt;5GB each object)</li>
<li>Storing and loading of OS Images, etc.</li>
<li>Storing file populations that grow continuously on a  practically infinite basis.</li>
<li>Storing small files (&lt;50 KB). OpenStack Object Storage is great at this.</li>
<li>Storing billions of files.</li>
<li>Storing Petabytes (millions of Gigabytes) of data.</li>
</ul>
<h3>Recognize the Limitations</h3>
<p><strong>Objects must be &lt;5GB</strong></p>
<p>This is an arbitrary size limit, but it cannot be set to an unlimited value because of the system design. If you want to store a backup or anything else larger than 5GB, you&#8217;ll need a way of breaking it up into chunks and storing a manifest of the parts, so you can join them back together when you want to download the data and use it again.</p>
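<p>One way to approach the chunk-and-manifest idea is sketched below in Python. This is only an illustration under stated assumptions: the function names, the <code>.partNNNNN</code> naming scheme, and the SHA-256 checksums are my own choices, and a plain dictionary stands in for the object store.</p>
<pre>import hashlib

def split_into_chunks(data, chunk_size):
    """Return a list of byte chunks, each at most chunk_size bytes long."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def build_manifest(base_name, chunks):
    """Describe each chunk so the parts can be verified and rejoined later."""
    return [
        {"name": "%s.part%05d" % (base_name, index),
         "size": len(chunk),
         "sha256": hashlib.sha256(chunk).hexdigest()}
        for index, chunk in enumerate(chunks)
    ]

def join_chunks(manifest, fetch):
    """Rebuild the original bytes; fetch(name) must return one chunk's bytes."""
    parts = []
    for entry in manifest:
        chunk = fetch(entry["name"])
        # Refuse to rebuild from a damaged or mismatched part.
        if hashlib.sha256(chunk).hexdigest() != entry["sha256"]:
            raise ValueError("checksum mismatch for " + entry["name"])
        parts.append(chunk)
    return b"".join(parts)

# Tiny in-memory demonstration: a dict stands in for the object store.
original = b"0123456789" * 10
chunks = split_into_chunks(original, 32)
manifest = build_manifest("backup.tar", chunks)
store = {entry["name"]: chunk for entry, chunk in zip(manifest, chunks)}
restored = join_chunks(manifest, store.__getitem__)
</pre>
<p>In practice you would upload each chunk as its own object and store the manifest as one more small object. For multi-gigabyte backups you would also read the source in a streaming fashion rather than holding it all in memory.</p>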
<p><strong>Not a Filesystem</strong></p>
<p>It uses a REST API, or a language binding that consumes the REST API. It does not use the typical POSIX filesystem semantics like open(), read(), write(), seek(), and close().</p>
<p><strong>No User Quotas</strong></p>
<p>There are no maximums that can be configured on a per-user basis to limit how much storage is used.</p>
<p><strong>No Directory Hierarchies</strong></p>
<p>You can create an arbitrary number of containers, but there is no nested container capability. You can simulate a directory structure using creative object names, but this is limited to a maximum string length. If you only need a shallow hierarchy, or don&#8217;t have long directory names, this might be fine. Just remember that I warned you this is generally a bad idea.</p>
<p><strong>No writing to a byte offset in a file</strong></p>
<p>The only way to update a file is to essentially overwrite it. The system creates a new version of an object each time you upload one with the same name.</p>
<p><strong>No ACLs</strong></p>
<p>Per-container ACLs will probably be added in a later release. Per-object ACLs will probably not be supported, but maybe.</p>
<p><strong>No Append Support</strong></p>
<p>It&#8217;s possible that this may be added at a later time using a versioning trick.</p>
<p><strong>No File Locking</strong></p>
<p>Most filesystems integrate with the kernel to offer advisory locking. This is not possible with OpenStack Object Storage.</p>
<p><strong>Eventual Consistency</strong></p>
<p>Don&#8217;t expect version consistency between multiple nodes when data is being updated.</p>
<p>If you upload a new version of an object, and immediately GET that object from another client, you may get a previous version of the file. There is no way to know which version of a given object the system is responding with, unless you set version metadata on each object yourself. If there is any problem with the network, you may get outdated versions of objects, or see objects that have been deleted because the node answering your request does not yet know about the deletion.</p>
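<p>If you do set your own version metadata, the client-side check can be very small. The Python sketch below assumes the version is stored under a custom metadata header named <code>X-Object-Meta-Version</code>. That name and the helper <code>is_current</code> are illustrative assumptions, not something the system defines for you.</p>
<pre># Assumed header name; any custom metadata key you choose works the same way.
VERSION_HEADER = "X-Object-Meta-Version"

def is_current(response_headers, expected_version):
    """Return True when the replica that answered holds the version we expect."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {key.lower(): value for key, value in response_headers.items()}
    return normalized.get(VERSION_HEADER.lower()) == str(expected_version)

# Headers as they might come back from a HEAD request after uploading version 7.
stale_reply = {"x-object-meta-version": "6", "Content-Length": "438"}
fresh_reply = {"X-Object-Meta-Version": "7", "Content-Length": "438"}

print(is_current(stale_reply, 7))
print(is_current(fresh_reply, 7))
</pre>
<p>When the check fails, the client can wait and retry, or fall back to whatever copy it already holds. Either way, the decision lives in your application and not in the storage system.</p>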
<p><strong>No Support for Data Encryption</strong></p>
<p>You must encrypt the data yourself. The current version does not have SSL support either. Use an SSL proxy to work around this by terminating the SSL sessions on the same network where the OpenStack Object Storage system runs.</p>
<p><strong>Not Compatible With Web Browsers</strong></p>
<p>You must supply a storage token header to authorize each request. Regular web browsers can&#8217;t do this. This can be solved using a proxy between the client and the system to handle token authentication. This is not a problem if you are using one of the language bindings. They will take care of this when you integrate your web app with the system.</p>
<p><strong>Not a Database</strong></p>
<p>It supports no querying or processing of data on the servers. All you can do is list the objects within a given container. There is no way to search based on object metadata. You need to keep your own external search indexes.</p>
<p><strong>Don&#8217;t try to frequently update large objects.</strong></p>
<p>All updates produce a new version of an object, because objects are <a href="http://en.wikipedia.org/wiki/Immutable_object" target="_blank">immutable</a>.</p>
<p><strong>Don&#8217;t store unlimited objects per container</strong></p>
<p>You can store as many objects in a container as you wish. However, your per-object upload latency will increase considerably once you reach a certain point. I found the optimal number of objects per container to be just under one million. This number will vary depending on your equipment, and how heavy a workload it&#8217;s subjected to.</p>
<p><strong>Changing Swift Into a Filesystem</strong></p>
<p>You might think of using FUSE to access objects and containers in OpenStack Object Storage as files and directories with a filesystem interface, but you&#8217;ll quickly discover that this is only really good for very simple use cases. Most of the things you need to implement what we think of as a filesystem are missing.</p>
<p>If you are a developer, and you are thinking of building a filesystem on top of OpenStack Object Storage using objects as blocks, that could possibly work, but would probably not perform very well compared to existing alternatives that are actually designed for distributed block storage. The blocks would need to be pretty large to keep the network/protocol overhead down. Frequent writing is not likely to work well. Most users of filesystems are not expecting eventual consistency behavior. They want strong data consistency. You would also want some strategy to handle read/write concurrency with some locking capability. Plus, you would need to have a way to keep track of the blocks like a filesystem does in some data structure or database. Frankly speaking, OpenStack Object Storage is probably not the right tool for the job.</p>
<p><strong>Conclusion</strong></p>
<p>You should probably only use OpenStack Object Storage for use cases it&#8217;s intended for. If what you really want is a clustered filesystem, you&#8217;re probably better off looking at other solutions like <a href="http://en.wikipedia.org/wiki/Lustre_%28file_system%29" target="_blank">Lustre</a>, <a href="http://en.wikipedia.org/wiki/GlusterFS" target="_blank">GlusterFS</a>, <a href="http://en.wikipedia.org/wiki/Global_File_System">GFS</a>, <a href="http://en.wikipedia.org/wiki/OCFS" target="_blank">OCFS</a>, etc. Keep in mind that each of these has its own strengths and weaknesses. Pay particular attention to what they are designed for, and use them accordingly. If you want to use OpenStack Object Storage for something that it was designed for, then you will probably be <strong>very happy with it</strong>. Keep in mind that it&#8217;s a blob storage system. It&#8217;s not a filesystem, not a file server, not a database, etc. To learn more about OpenStack Object Storage, please check out the <a href="http://swift.openstack.org/" target="_blank">Developer Documentation</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/09/openstack-os-is-great-for/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dev Null = Unlimited Scale</title>
		<link>http://adrianotto.com/2010/08/dev-null-unlimited-scale/</link>
		<comments>http://adrianotto.com/2010/08/dev-null-unlimited-scale/#comments</comments>
		<pubDate>Thu, 26 Aug 2010 22:40:46 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=331</guid>
		<description><![CDATA[It occurred to me today while watching a discussion about MySQL vs. MongoDB that there needs to be more documentation about the performance of the Dev Null database, and its open source derivatives. MongoDB fanboys should be aware that it offers the following features:

100% non-blocking
Unlimited horizontal scalability
Unlimited vertical scalability
Supports Sharding
Supports Clustering
Exceeds write performance of all [...]]]></description>
			<content:encoded><![CDATA[<p>It occurred to me today while watching a discussion about <a href="http://www.xtranormal.com/watch/6995033/" target="_blank">MySQL vs. MongoDB</a> that there needs to be more documentation about the performance of the Dev Null database, and its open source derivatives. MongoDB fanboys should be aware that it offers the following features:</p>
<ul>
<li>100% non-blocking<a href="http://adrianotto.com/2010/08/dev-null-unlimited-scale/"><img class="alignright size-full wp-image-338" title="dev-null-logo" src="http://adrianotto.com/wp-content/uploads/2010/08/dev-null-logo.png" alt="" width="202" height="102" /></a></li>
<li>Unlimited horizontal scalability</li>
<li>Unlimited vertical scalability</li>
<li>Supports Sharding</li>
<li>Supports Clustering</li>
<li>Exceeds write performance of all other databases</li>
<li>Unparalleled concurrency support</li>
<li>Write-and-forget</li>
</ul>
<p>Here is a chart that illustrates write latency and throughput at various levels of thread concurrency:</p>
<p><img class="size-full wp-image-332 alignnone" title="dev-null-wtite-perf" src="http://adrianotto.com/wp-content/uploads/2010/08/dev-null-wtite-perf.png" alt="" width="616" height="386" /></p>
<p>As you can see, as the number of concurrent writers increases, throughput increases proportionally. No matter how many threads run concurrently, latency remains at zero.</p>
<h3>Support in MySQL<a href="http://www.mysql.com/"><img class="alignright size-full wp-image-335" title="logo-mysql-110x57" src="http://adrianotto.com/wp-content/uploads/2010/08/logo-mysql-110x57.png" alt="MySQL Logo" width="110" height="57" /></a></h3>
<p>You may be thrilled to know that this data storage system is fully supported in MySQL using the <a href="http://dev.mysql.com/doc/refman/5.0/en/blackhole-storage-engine.html" target="_blank">Blackhole Storage Engine</a> written by <a href="http://krow.net" target="_blank">Brian Aker</a>. Anyone considering MongoDB should give this alternative some consideration, as it exhibits the same level of data loss for new data pending writes before a node failure. Plus, MySQL has been around for a long time, and this storage engine is the single most reliable storage engine that MySQL ever produced.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/08/dev-null-unlimited-scale/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Rackspace and NASA Contribute Huge to Open Source</title>
		<link>http://adrianotto.com/2010/07/rackspace-and-nasa-contribute-huge-to-open-source/</link>
		<comments>http://adrianotto.com/2010/07/rackspace-and-nasa-contribute-huge-to-open-source/#comments</comments>
		<pubDate>Sun, 18 Jul 2010 17:00:16 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[OpenStack]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=319</guid>
		<description><![CDATA[Today Rackspace and NASA announced OpenStack as a coordinated open development project with 28 participating partner companies and growing. NASA contributed source code from its NOVA project for running a large scale computing platform called Nebula. Rackspace contributed source code for its Object Store, used to host the Cloud Files web storage service. The API [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.nasa.gov"><img class="size-full wp-image-324 alignright" title="NASA" src="http://adrianotto.com/wp-content/uploads/2010/07/nasa-logo.png" alt="" width="155" height="100" /></a><a href="http://www.rackspace.com"><img class="size-full wp-image-327 alignright" title="Rackspace" src="http://adrianotto.com/wp-content/uploads/2010/07/rackspace-logo.png" alt="" width="150" height="100" /></a>Today <a href="http://www.rackspace.com" target="_blank">Rackspace</a> and <a href="http://www.nasa.gov">NASA</a> <a href="http://www.rackspace.com/information/mediacenter/release.php?id=8489" target="_blank">announced</a> <a href="http://www.openstack.org" target="_blank">OpenStack</a> as a coordinated open development project with <a href="http://openstack.org/community/">28 participating partner companies</a> and growing. NASA contributed source code from its NOVA project for running a large scale computing platform called <a href="http://nebula.nasa.gov/services/" target="_blank">Nebula</a>. Rackspace contributed source code for its Object Store, used to host the Cloud Files web storage service. The API for Cloud Servers, which was previously released with a Creative Commons open license will be used by OpenStack. I was a key contributor to the design of that API, and I&#8217;m honored to have been a part of it. Rackspace has vowed a <a href="http://openstack.org/blog/">commitment to open development</a> of this platform.</p>
<p>This is very exciting for consumers of Cloud Computing because:</p>
<ol>
<li>It allows individual companies to run their own clouds inside their own data centers and on their own equipment using the same scalable technology that powers some of the largest cloud infrastructures in the world.</li>
<li>An individual company can develop applications on their own cloud, and know with confidence that they can run their application on a number of different public clouds without any special adapter software. They need only find a cloud computing or cloud storage provider that uses OpenStack software to ensure compatibility.</li>
<li>If an application is hosted on one OpenStack public cloud, it can be easily moved to another without changes to the application source code, and without using any cloud middleware. This can completely eliminate all fears relating to single-vendor lock-in.</li>
<li>Applications can be run simultaneously in multiple clouds using the exact same software that needs to only implement a single API for universal access to computing and object storage resources.</li>
</ol>
<p><a href="http://www.openstack.org"><img class="alignnone" title="OpenStack" src="http://www.rackspace.com/images/information/mediacenter/openstack/button-openstackorg.png" alt="" width="280" height="61" /></a><a href="http://www.rackspace.com/information/mediacenter/release.php?id=8489"> <img class="alignnone" title="Press Release" src="http://www.rackspace.com/images/information/mediacenter/openstack/button-pressrelease.png" alt="" width="280" height="61" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/07/rackspace-and-nasa-contribute-huge-to-open-source/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Apache benchmark (ab) is not exact</title>
		<link>http://adrianotto.com/2010/05/apache-benchmark-ab-is-not-exact/</link>
		<comments>http://adrianotto.com/2010/05/apache-benchmark-ab-is-not-exact/#comments</comments>
		<pubDate>Thu, 20 May 2010 22:47:44 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=311</guid>
		<description><![CDATA[I wrote an experimental web server today that keeps some internal statistics. It&#8217;s based on libev for the purpose of comparing performance to an equivalent libevent server implementation. During my benchmarking, I sent 10,000 test requests to the server using the &#8216;ab&#8217; utility from the Apache httpd software distribution using various concurrency levels. What I [...]]]></description>
			<content:encoded><![CDATA[<p>I wrote an experimental web server today that keeps some internal statistics. It&#8217;s based on <a href="http://software.schmorp.de/pkg/libev.html">libev</a> for the purpose of comparing performance to an equivalent <a href="http://www.monkey.org/~provos/libevent/">libevent</a> server implementation. During my benchmarking, I sent 10,000 test requests to the server using the &#8216;ab&#8217; utility from the <a href="http://httpd.apache.org">Apache httpd</a> software distribution using various concurrency levels. What I found was that I would get <em>extra</em> hits to the server logged. This confused me at first, because I thought my server must somehow be corrupting its internal statistics, and showing me extra results.</p>
<p>I assumed I must have a bug in my server, and reviewed my code over and over, until I decided to try <a href="http://www.acme.com/software/http_load/">http_load</a> and compare it to the results I got with &#8216;ab&#8217;. To my delight, the http_load client actually sent exactly the right number of requests, and my internal hit counter figure matched. I ran several comparisons to confirm it. The &#8216;ab&#8217; tool does in fact measure the completion of its requests properly, but it may actually send more requests than you ask it to. That&#8217;s because it counts <em>replies</em> not requests.</p>
<p>So, if you use a concurrency setting of 1, then requests will equal responses. If you use a concurrency setting of 100, you might end up with 30 or 40 more requests that arrive at the server that the &#8216;ab&#8217; client does not count in its results. Mystery solved. </p>
<p>I will publish results from my performance study. Initial results are showing that my libev server can produce roughly 10,000 responses per second on a single CPU. This is in the same ballpark as other high performance web servers. Stay tuned.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/05/apache-benchmark-ab-is-not-exact/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cassandra Gets Promoted!</title>
		<link>http://adrianotto.com/2010/03/cassandra-gets-promoted/</link>
		<comments>http://adrianotto.com/2010/03/cassandra-gets-promoted/#comments</comments>
		<pubDate>Wed, 17 Mar 2010 07:00:20 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=287</guid>
		<description><![CDATA[Today it&#8217;s the one month anniversary of Cassandra graduating to a top level Apache project. It now has a new and improved project URL: http://cassandra.apache.org
Recently you may have noticed my writing about Drizzle, but that&#8217;s not the only database system I love. I&#8217;m also a fan of Cassandra, and I&#8217;m proud to work with the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://cassandra.apache.org"><img class="alignright size-full wp-image-225" title="cassandra" src="http://adrianotto.com/wp-content/uploads/2009/11/cassandra1.png" alt="" width="186" height="101" /></a>Today it&#8217;s the one month anniversary of Cassandra <a href="http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01518.html" target="_blank">graduating</a> to a top level Apache project. It now has a new and improved project URL:<a href="http://cassandra.apache.org" target="_blank"> http://cassandra.apache.org</a></p>
<p>Recently you may have noticed <a href="http://www.rackspacecloud.com/blog/2010/03/13/rackspace-and-drizzle-its-time-to-rethink-everything/" target="_blank">my writing about Drizzle</a>, but that&#8217;s not the only database system I love. I&#8217;m also a fan of Cassandra, and I&#8217;m proud to work with the same <a href="http://www.rackspacecloud.com" target="_blank">company</a> sponsoring both projects.</p>
<p><a href="http://drizzle.org">Drizzle</a> is the way to go if you want an SQL system, and <a href="http://cassandra.apache.org" target="_blank">Cassandra</a> is the way to go if you have a huge data set or if you have a data insert/update rate that&#8217;s too high for an RDBMS to keep up with.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/03/cassandra-gets-promoted/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bandwidth != Network Performance</title>
		<link>http://adrianotto.com/2010/03/bandwidth-network-performance/</link>
		<comments>http://adrianotto.com/2010/03/bandwidth-network-performance/#comments</comments>
		<pubDate>Sun, 14 Mar 2010 17:34:33 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[best practices]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=237</guid>
		<description><![CDATA[You might think that if you want faster internet performance, you can simply get a connection to the internet that has higher bandwidth. When you get a &#8220;faster&#8221; internet connection you may observe faster downloads. But it&#8217;s less frequently the additional bandwidth, and more frequently reduced latency that actually produces increased interactive web performance. This [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://adrianotto.com/wp-content/uploads/2010/03/rj45.jpg"><img class="alignright size-full wp-image-302" title="rj45" src="http://adrianotto.com/wp-content/uploads/2010/03/rj45.jpg" alt="" width="240" height="240" /></a>You might think that if you want faster internet performance, you can simply get a connection to the internet that has higher bandwidth. When you get a &#8220;faster&#8221; internet connection you may observe faster downloads. But it&#8217;s less frequently the additional bandwidth, and more frequently reduced latency that actually produces increased interactive web performance. This post explains why.</p>
<p>First of all, let&#8217;s review some definitions:</p>
<ul>
<li><strong>Bandwidth</strong>: The amount of data that can be passed along a communications channel in a given period of time.</li>
<li><strong>Latency</strong>: The time it takes for a packet to cross a network connection, from sender to receiver.</li>
<li><strong>Speed</strong>: Fast and rapid moving, going, traveling, proceeding, or performing; swiftness.</li>
<li><strong>Throughput</strong>: The quantity of data transmitted by a computer network over a given period of time.</li>
</ul>
<p>Now, all of these terms are related, and I want to highlight some of the minutiae here:</p>
<p><strong>Bandwidth</strong></p>
<p>The higher the bandwidth is on a network connection, the more data it&#8217;s capable of transmitting in a given period of time. Higher bandwidth is better.</p>
<p><strong>Latency</strong></p>
<p>This is very very important, because latency effectively limits the amount of bandwidth you can consume if you are using a synchronous data transmission, like a TCP/IP download. Lower latency is better, and will yield faster speed.</p>
<p><strong>Throughput</strong></p>
<p>Throughput is another way of expressing speed. The higher the throughput, the faster your network communications will be. Note that your maximum possible throughput is your bandwidth. Actual throughput is equal to or less than your bandwidth.</p>
<p><strong>Speed</strong></p>
<p>If your network is high speed, you should observe high bandwidth, low latency, and high throughput.</p>
<h3>Latency and Throughput are Inversely Proportional</h3>
<p>For TCP/IP transmissions, the higher your latency is, the lower your throughput will be. Let&#8217;s explore why. The most common use of TCP/IP is for the web, which uses the HTTP protocol. HTTP works by making a TCP/IP connection to a remote server, issuing a request for a document, and then receiving the response. The protocol is text based. A simple HTTP transmission is illustrated below.</p>
<p>Client Request:</p>
<pre>GET / HTTP/1.1
User-Agent: Wget
Host: www.example.com
</pre>
<p>Server Response:</p>
<pre>HTTP/1.1 200 OK
Server: Apache/2.2.3 (Red Hat)
Last-Modified: Tue, 15 Nov 2005 13:24:10 GMT
ETag: "b300b4-1b6-4059a80bfd280"
Accept-Ranges: bytes
Content-Type: text/html; charset=UTF-8
Connection: Keep-Alive
Date: Wed, 18 Nov 2009 22:36:34 GMT
Age: 1010
Content-Length: 438

  Example Web Page

You have reached this web page by typing "example.com",
"example.net",
  or "example.org" into your web browser.

These domain names are reserved for use in documentation and are not available
  for registration. See &lt;a href="http://www.rfc-editor.org/rfc/rfc2606.txt"&gt;RFC
  2606&lt;/a&gt;, Section 3.
</pre>
<p>Here is a trace of the TCP/IP packets that make up that request:</p>
<pre>14:57:47.146665 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: S 3717672264:3717672264(0) win 5840
14:57:47.220092 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: . ack 1 win 183
14:57:47.220309 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: P 1:123(122) ack 1 win 183  (GET Request)
14:57:47.300962 IP 192.0.32.10.80 &gt; 192.168.144.2.39556: P 1:728(727) ack 123 win 4502  (200 OK Response)
14:57:47.300993 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: . ack 728 win 228
14:57:47.302035 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: F 123:123(0) ack 728 win 228
14:57:47.375475 IP 192.0.32.10.80 &gt; 192.168.144.2.39556: . ack 124 win 4502
14:57:47.375499 IP 192.0.32.10.80 &gt; 192.168.144.2.39556: F 728:728(0) ack 124 win 4502
14:57:47.375510 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: . ack 729 win 228
</pre>
<p>Notice that the exchange takes 10 packets; the trace above lists nine of them, because the server&#8217;s SYN-ACK reply to the first packet is not shown. It&#8217;s a three way handshake to set up the TCP session, then a round trip to send the data, then two more round trips to close down the connection. Each time the server receives a packet from the client, the connection may wait in the server&#8217;s connection queue to be processed, which can further increase the interactive protocol latency. Consider the impact of high latency on a connection like this. Suppose that it takes 0.2 seconds for each round trip. That connection would have a total throughput of 727 bytes downloaded in 0.8 seconds (four round trips). That&#8217;s a rate of 909 Bytes/sec. Even if your internet connection offers 15 Mb/sec of bandwidth, that bandwidth did not matter. Latency caused the throughput to be low.</p>
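<p>The arithmetic is easy to reproduce. The Python sketch below uses the numbers from this example. The function name <code>effective_throughput</code> is my own, and the model deliberately ignores everything except round trips, so treat it as a back-of-the-envelope estimate.</p>
<pre>def effective_throughput(payload_bytes, round_trips, rtt_seconds):
    """Bytes per second when the transfer time is dominated by round trips."""
    return payload_bytes / (round_trips * rtt_seconds)

payload = 727            # bytes in the 200 OK response from the trace
round_trips = 4          # handshake, request/response, two to close
rtt = 0.2                # assumed seconds per round trip

rate = effective_throughput(payload, round_trips, rtt)
link_capacity = 15 * 1000 * 1000 / 8   # a 15 Mb/sec link, in bytes per second

print("throughput: %.0f bytes/sec" % rate)
print("share of link used: %.3f%%" % (100 * rate / link_capacity))
</pre>
<p>Doubling the bandwidth leaves <code>rate</code> unchanged in this model, while halving the round trip time doubles it.</p>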
<p>Now, you might be wondering why we can&#8217;t just improve networking technology to make latency lower. We can, but that&#8217;s not going to help much, because we are still bounded by the speed of light, among other factors. <strong>The speed of light is slow when you consider the distance it has to travel to cross continents on the earth.</strong> Let&#8217;s look at some math to explain that:</p>
<ul>
<li>The speed of light in vacuum is 299,792,458 m/s.</li>
<li>The speed of light in fiber optic cable is ~200,000,000 m/s.</li>
<li>The distance from Anaheim, CA to New York is 4,494,898 meters.</li>
<li>The one-way latency to New York is 4,494,898 / 200,000,000 = 0.02247 s = 22.47 ms.</li>
<li>The round-trip time between Anaheim, CA and New York is 44.95ms</li>
<li>The current ping time from Anaheim, CA to New York is 72 ms</li>
<pre>Tracing the route to sl-gw33-nyc.sprintlink.net (144.228.243.82)
  1 sl-crs1-ana-0-14-2-0.sprintlink.net (144.232.11.9) 0 msec
    sl-crs2-ana-0-14-2-0.sprintlink.net (144.232.11.11) 0 msec
    sl-crs1-ana-0-14-2-0.sprintlink.net (144.232.11.9) 4 msec
  2 sl-crs2-fw-0-13-3-0.sprintlink.net (144.232.19.197) 28 msec
    sl-crs2-fw-0-9-5-0.sprintlink.net (144.232.20.130) 28 msec
    sl-crs1-fw-0-3-3-0.sprintlink.net (144.232.9.65) 28 msec
  3 sl-crs2-kc-0-0-0-2.sprintlink.net (144.232.19.141) 40 msec
    144.232.20.57 40 msec
    sl-crs1-kc-0-5-5-0.sprintlink.net (144.232.24.9) 40 msec
  4 sl-crs2-chi-0-13-5-0.sprintlink.net (144.232.20.109) 52 msec
    sl-crs1-chi-0-1-0-3.sprintlink.net (144.232.18.214) 56 msec
    sl-crs2-chi-0-15-2-0.sprintlink.net (144.232.24.206) 52 msec
  5 sl-crs1-nyc-0-8-0-3.sprintlink.net (144.232.18.123) 72 msec
    sl-crs2-nyc-0-8-0-1.sprintlink.net (144.232.20.119) 72 msec
    sl-crs1-chi-0-10-3-0.sprintlink.net (144.232.9.148) 72 msec
  6 sl-gw33-nyc-14-0-0.sprintlink.net (144.232.6.56) 72 msec *
    sl-gw33-nyc-15-0-0.sprintlink.net (144.232.6.58) 72 msec
</pre>
</ul>
<p>The measured 72 ms round-trip time includes all of the switching and routing to get the packet through its full round trip. That means that even if all switching and routing were instantaneous, and we had a perfectly straight fiber path between all points on the earth, we could only reduce latency by about 40%. We cannot accelerate the speed of light, so without a significant advance in data transmission technology (perhaps a quantum physics approach) we must accept the speed of light as a performance boundary.</p>
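<p>Here is the same calculation as a small Python sketch, using only the figures from the list above. The variable names are mine, chosen for illustration.</p>
<pre>
```python
FIBER_SPEED_M_PER_S = 200_000_000   # approximate speed of light in fiber optic cable
DISTANCE_M = 4_494_898              # Anaheim, CA to New York
MEASURED_RTT_MS = 72.0              # ping time from the traceroute above

one_way_ms = DISTANCE_M / FIBER_SPEED_M_PER_S * 1000
ideal_rtt_ms = 2 * one_way_ms
possible_reduction = 1 - ideal_rtt_ms / MEASURED_RTT_MS

print(f"one-way: {one_way_ms:.2f} ms")
print(f"ideal round trip: {ideal_rtt_ms:.2f} ms")
print(f"best-case reduction: {possible_reduction:.0%}")
```
</pre>
<p>The best-case reduction works out to a little under 40% of the measured round-trip time, which is where the &#8220;about 40%&#8221; figure comes from.</p>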
<h3>Making Web Sites Faster</h3>
<p>If you&#8217;re a web content publisher, you can set up your systems to work around these natural limitations. One way to make interactive web performance faster is to place copies of your data in various geographic locations that are physically closer to your end users. Using a <a href="http://en.wikipedia.org/wiki/Content_delivery_network" target="_blank">CDN</a> for your media content is one way to do this. You can also make your web server as fast as possible so that your dynamically generated content can be processed as quickly as possible. Using <a href="http://memcached.org/" target="_blank">memcached</a> to speed up your web application can help. Also, take a look at some <a href="http://developer.yahoo.com/performance/rules.html" target="_blank">best practices</a> for web developers for good performance.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/03/bandwidth-network-performance/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Put WiFi on your cell phone&#8217;s SIM Card!</title>
		<link>http://adrianotto.com/2010/02/put-wifi-on-your-sim-card/</link>
		<comments>http://adrianotto.com/2010/02/put-wifi-on-your-sim-card/#comments</comments>
		<pubDate>Mon, 15 Feb 2010 17:31:30 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[VoIP]]></category>
		<category><![CDATA[iPhone]]></category>
		<category><![CDATA[Mobile]]></category>
		<category><![CDATA[Wireless]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=268</guid>
		<description><![CDATA[Have you ever wanted to surf the web from your laptop using the internet connection on your cell phone without connecting any wires, and with no hassle goofing around with software? Well guess what, for you happiness is close at hand!
Today Sagem Orga made a press release that raised my eyebrows. They have a new [...]]]></description>
			<content:encoded><![CDATA[<p>Have you ever wanted to surf the web from your laptop using the internet connection on your cell phone without connecting any wires, and with no hassle goofing around with software? Well guess what, for you happiness is close at hand!</p>
<p><img class="alignright size-full wp-image-269" title="wifi_sim" src="http://adrianotto.com/wp-content/uploads/2010/02/wifi_sim.jpg" alt="" width="126" height="119" />Today <a href="http://www.sagem-orga.com/" target="_blank">Sagem Orga</a> made a <a href="http://www.sagem-orga.com/index.php?mySID=f57afcdfff43f2fcba150c2e7d8d046a&amp;myELEMENT=World%20premier:%20Sagem%20Orga%20and%20Telefonica%20turn%20the%20SIM%20card%20into%20a%20Wi-Fi%20hotspot&amp;searchstring=SIMFi&amp;suchart=volltext" target="_blank">press release</a> that raised my eyebrows. They have a new SIM card (<a href="http://en.wikipedia.org/wiki/Sim" target="_blank">the identification chip in your GSM cell phone</a>) that has WiFi capability right on the chip. This is exciting, because it would enable otherwise ordinary cell phones to be used as WiFi internet gateways, running both WiFi and 3G data connections at the same time.</p>
<p><img class="alignnone size-full wp-image-274" title="laptop-to-phone-to-internet" src="http://adrianotto.com/wp-content/uploads/2010/02/laptop-to-phone-to-internet.png" alt="" width="538" height="198" /></p>
<p>This is something that most phones simply cannot do. The ones that can require a software program running on the phone to turn it into a router that relays traffic from WiFi to the internet through a 3G data connection over the cell phone network. Getting this on a Blackberry, for example, is a huge nuisance, if your service provider supports it at all.</p>
<p>Well, that nuisance may be a thing of the past once the new “SIMFi” technology hits the market. Imagine just plugging the snazzy new card into your phone, joining its WiFi network from your laptop, and accessing the internet from practically anywhere. How cool is that!?!</p>
<p>There has been a discussion on <a href="http://mobile.slashdot.org/story/10/02/12/1824229/Wi-Fi-In-a-SIM-Card" target="_blank">Slashdot about this</a>. One of the interesting comments was about the need for a 2.4 GHz antenna, which can actually fit fine on the SIM card itself, as long as it&#8217;s bent around a bit. An obvious question with any WiFi product is &#8220;what&#8217;s the implication for battery life?&#8221;. Battery life will definitely be shorter. Hopefully this device will have some sort of tunable transmit power adjustment for the WiFi signal so power consumption can be kept to a minimum. After all, your laptop and your cell phone will only be an arm&#8217;s length apart when you are using this setup anyway, so range is not a major concern.</p>
<p>Yes, I do love technical gadgets. The thought of where this could go is very exciting. I&#8217;ll be the first on the waiting list for this!</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/02/put-wifi-on-your-sim-card/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>CPU Time stolen from a virtual machine?</title>
		<link>http://adrianotto.com/2010/02/time-stolen-from-a-virtual-machine/</link>
		<comments>http://adrianotto.com/2010/02/time-stolen-from-a-virtual-machine/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 16:42:59 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[Xen]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=258</guid>
		<description><![CDATA[Those of you studying the vmstat(8) man page may be wondering what the &#8216;st&#8217; figure is in the CPU column. The manual refers to it as &#8220;Time stolen from a virtual machine&#8221;. More specifically:
It&#8217;s the time the hypervisor scheduled something else to run instead of something within your VM. This might be time for another [...]]]></description>
			<content:encoded><![CDATA[<p>Those of you studying the vmstat(8) man page may be wondering what the &#8216;st&#8217; figure is in the CPU column. The manual refers to it as &#8220;<em>Time stolen from a virtual machine</em>&#8221;. More specifically:</p>
<p>It&#8217;s the time the hypervisor scheduled something else to run instead of something within your VM. This might be time for another VM, or for the Hypervisor host itself. If no time were stolen, this time would be used to run your CPU workload or your idle thread.</p>
<p>There is some disagreement circulating about whether the Hypervisor will steal idle time, or only preempted time. In other words, it has been suggested that stolen time is where your local kernel scheduler within the VM wanted to run something but the Hypervisor made that impossible. I have found that stolen time does in fact count borrowed idle time, where the local scheduler actually had nothing to run. For example, here are some vmstat values from a VM that has a very low CPU workload on it:</p>
<pre>
vmstat -S M 1 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0    121     42     53    460    0    0     0     1    0    1  0  0 89  0 10
 0  0    121     42     53    460    0    0     0    28 1014   39  0  0 90  0 10
 0  0    121     42     53    460    0    0     0     0 1016   36  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1024   32  0  0 93  0  7
 0  0    121     42     53    460    0    0     0     0 1019   40  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1015   32  0  0 90  0 10
 0  0    121     42     53    460    0    0     0     0 1022   34  0  0 92  0  8
 0  0    121     42     53    460    0    0     0     0 1016   36  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1013   34  0  0 92  0  8
 0  0    121     42     53    460    0    0     0     0 1028   43  0  0 93  0  7
</pre>
<p>As you can see, user time (us), system time (sy), and iowait time (wa) are zero, but idle time is not 100%. This normally indicates that your system is doing something, but in this case idle time is actually the sum of the <em>id</em> and <em>st</em> columns.</p>
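<p>If you want to check this on your own VM, a short script can add the two columns together for you. The following is a minimal sketch in Python, assuming the column layout shown above. The sample text is copied from the first three rows of that output, and the function name is made up for this example.</p>
<pre>
```python
SAMPLE = """\
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0    121     42     53    460    0    0     0     1    0    1  0  0 89  0 10
 0  0    121     42     53    460    0    0     0    28 1014   39  0  0 90  0 10
 0  0    121     42     53    460    0    0     0     0 1016   36  0  0 91  0  9
"""


def idle_plus_stolen(vmstat_text):
    """Return id + st for each sample row of vmstat output."""
    lines = vmstat_text.strip().splitlines()
    header = lines[0].split()
    # Locate the columns by name rather than by position.
    id_col = header.index("id")
    st_col = header.index("st")
    totals = []
    for line in lines[1:]:
        fields = line.split()
        totals.append(int(fields[id_col]) + int(fields[st_col]))
    return totals


print(idle_plus_stolen(SAMPLE))
```
</pre>
<p>For the sample rows this prints sums of 99, 100 and 100. The first row falls one short of 100, presumably because vmstat reports whole percentages.</p>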
<p>In this example, I really don&#8217;t care that I have a nonzero <em>st</em> column because my workload is basically idle all the time anyway.</p>
<p>If you are on a cloud host where you purchase a small sliver of a server, you should expect to see nonzero values in this column when you run vmstat. If you have a heavy CPU load and need more processing power, you can solve this problem by upgrading to a larger VM server size so that you command a larger portion of the physical host.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/02/time-stolen-from-a-virtual-machine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ED Strikes Again?</title>
		<link>http://adrianotto.com/2010/02/ed-strikes-again/</link>
		<comments>http://adrianotto.com/2010/02/ed-strikes-again/#comments</comments>
		<pubDate>Tue, 02 Feb 2010 22:28:19 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=251</guid>
		<description><![CDATA[It&#8217;s not the ED you are thinking of. Nope, it&#8217;s actually the External Dependency.
One piece of advice that I continually dispense is to try to reduce dependencies on remote web sites when coding your own. The problem strikes most dramatically when you run a very busy site, and you have some feed or resource that [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s not the ED you are thinking of. Nope, it&#8217;s actually the <span style="color: #ff0000;"><strong>E</strong></span>xternal <span style="color: #ff0000;"><strong>D</strong></span>ependency.</p>
<p>One piece of advice that I continually dispense is to try to reduce dependencies on remote web sites when coding your own. The problem strikes most dramatically when you run a very busy site, and you have some feed or resource that you download from a remote site. That remote site crashes, and oops, so does yours. It also happens when your busy site generates more requests to the remote site than that site can handle.</p>
<p>I ran into this again today. One site that I host was consuming a remote feed from a site that has a much smaller capacity than my customer does. The site on my end gets over 10 million page views a day (peak ~2000 page views per second). The capacity mismatch became very apparent when something went wrong on the remote end.</p>
<p>The code logic was:</p>
<ol>
<li>If you have a cached version of the feed, and it&#8217;s fresh, then use it.</li>
<li>If the cached entry is expired, then fetch a new one, and replace the one in cache.</li>
</ol>
<p>This logic is fundamentally flawed for busy sites. It seems sensible, but think about what happens when the cached entry expires, and the remote site is responding very slowly. All of a sudden a stampede of requests starts stacking up, all trying to get the feed in parallel. It crashes the remote site even worse. The remote site tries to reboot, and you quickly crash it again. The sequence repeats indefinitely.</p>
<p>Why? Because the window of time during which the cache is invalid gets wider and wider as the remote site gets slower and slower. The longer that window is open, the more traffic the remote site will get from cache misses.</p>
<p>A clean solution is to update the cache asynchronously using a scheduled batch job that keeps a local cache of the data. Only replace the cached copy when the data has actually changed. The logic in the web application changes to:</p>
<ol>
<li>Always use the data in the cached file.</li>
</ol>
<p>The feed site is consulted on regular intervals using a scheduled batch job (cron), and the cached data is updated if it&#8217;s able to get a response. If the remote site is down or too slow, then the application simply continues to use the version it had before. Problem solved!</p>
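<p>As a rough illustration, here is what such a batch job might look like in Python. This is a sketch under assumptions, not the code from the site I described: the feed URL, the cache path, and the function names are all hypothetical.</p>
<pre>
```python
import os
import tempfile
import urllib.request

FEED_URL = "http://feeds.example.com/remote.xml"   # hypothetical remote feed
CACHE_PATH = "/var/cache/myapp/feed.xml"           # hypothetical cache location


def refresh_cache(fetch, cache_path):
    """Replace the cache file only when fetch() succeeds and the data changed."""
    try:
        data = fetch()
    except Exception:
        return False  # remote site is down or too slow: keep the old copy
    if not data:
        return False
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as existing:
            if existing.read() == data:
                return False  # nothing changed, so leave the cache alone
    # Write to a temporary file in the same directory, then rename it into
    # place so that readers never see a partially written cache file.
    directory = os.path.dirname(cache_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file readable by owner only
    os.replace(tmp_path, cache_path)
    return True


def fetch_feed():
    """Download the feed, giving up after 10 seconds."""
    with urllib.request.urlopen(FEED_URL, timeout=10) as response:
        return response.read()


# A script run from cron would call: refresh_cache(fetch_feed, CACHE_PATH)
```
</pre>
<p>The web application itself never calls the remote site. It only reads the cache file, so a slow or dead feed cannot stack up requests on your end.</p>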
<p>Why is this not a best practice for all web developers? Because most web sites don&#8217;t get enough traffic for it to matter much. But, if you&#8217;ve got a busy site, and you don&#8217;t want it to crash when your remote feeds do, then you might want to consider getting that data asynchronously, or at least using a cache update procedure that&#8217;s serialized.</p>
<p>Here is <a href="http://cloudsites.rackspacecloud.com/index.php/How_to_download_data_from_remote_web_servers_efficiently" target="_blank">an example</a> of a non-blocking serialization approach that works for PHP applications.</p>
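<p>The same idea can be sketched in Python for readers who don&#8217;t use PHP. This is not the linked example, just one way to do it under my own assumptions: it uses an advisory file lock from the standard <code>fcntl</code> module (Unix only), and the function name and lock path are hypothetical.</p>
<pre>
```python
import fcntl


def update_if_unlocked(lock_path, do_update):
    """Run do_update() only if no other process holds the lock; never wait."""
    with open(lock_path, "w") as lock_file:
        try:
            # LOCK_NB makes flock fail immediately instead of blocking.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # someone else is refreshing: keep using the stale copy
        try:
            do_update()
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```
</pre>
<p>When the cached entry expires, only the first request gets the lock and fetches the feed. Every other request returns right away and serves the stale copy, so at most one request at a time ever reaches the remote site.</p>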
<p>So all you web developers out there who like to consume RSS feeds on the server-side of your web application&#8230; don&#8217;t say I didn&#8217;t warn you. Go look at all your code and make sure you don&#8217;t have a dependency on a remote site. If you do, you now know at least two ways to solve that problem.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/02/ed-strikes-again/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Putting Entropy in the Cloud</title>
		<link>http://adrianotto.com/2009/11/putting-entropy-in-the-cloud/</link>
		<comments>http://adrianotto.com/2009/11/putting-entropy-in-the-cloud/#comments</comments>
		<pubDate>Tue, 24 Nov 2009 04:48:56 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Entropy]]></category>
		<category><![CDATA[Random]]></category>
		<category><![CDATA[RNG]]></category>
		<category><![CDATA[Xen]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=247</guid>
		<description><![CDATA[I was browsing through twitter mentions of @adrian_otto and found one posted by Ian Thompson mentioning an article about weak randomness in the cloud. It suggests that because there may be insufficient entropy sources on a Cloud Server or instance that it may make it easier to guess random number sequences because different cloud servers [...]]]></description>
			<content:encoded><![CDATA[<p>I was browsing through Twitter mentions of <a href="http://twitter.com/adrian_otto" target="_blank">@adrian_otto</a> and found one posted by <a href="http://twitter.com/MystirrE" target="_blank">Ian Thompson</a> mentioning <a href="http://bit.ly/34Wom8" target="_blank">an article</a> about weak randomness in the cloud. It suggests that because there may be insufficient entropy sources on a Cloud Server or instance, different cloud servers may have similar or even identical entropy pools (or worse yet, identical host keys) when created. That would make it easier to guess random number sequences, and therefore easier to break encryption algorithms that depend on them.</p>
<p>Yes, if you have similar entropy pools it is easier to break encryption that depends on them. It&#8217;s reasonably easy to work around this and make sure your entropy pool is uniquely initialized. You can consult the <a href="http://linux.die.net/man/4/random" target="_blank">random manual for the Linux Kernel</a> for information about how to seed your entropy pool with a particular set of data. If you are running an application in the cloud that utilizes encryption, and you are concerned about the initial state of your entropy pool, you can solve that. Use this procedure:</p>
<p>1) Seed your own pool from a long running system that has sufficient entropy in it, rather than relying on what you read from the kernel at startup.</p>
<p>2) Produce a network service that you use to seed your initial entropy pools. This service could be as simple as a set of entropy files that you create at pseudo-random time intervals and discard as you serve them to cloud server instances (as they boot up), so you never serve the same one twice. At boot time from your VM, simply connect to wherever you run this service and download an input file to seed your entropy pool with. Restrict access to this so that it&#8217;s only available to your own server instances.</p>
<p>3) Make sure that your custom entropy pool initialization takes place prior to starting your encryption software.</p>
<p>4) If you are creating an AMI, or other server image that you plan to clone, be sure that it does not have a host key generated yet. Delete it and allow your initialization scripts to create it when the server is created (after the entropy pool has been seeded), rather than making copies of the same one.</p>
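<p>The seeding part of the procedure above could look something like this in Python. Treat it as a sketch under assumptions: the seed file path and function names are hypothetical, and it assumes the seed has already been downloaded from your own seed service. It relies on the behavior described in the random manual, where data written to /dev/urandom is mixed into the entropy pool but does not raise the kernel&#8217;s entropy count.</p>
<pre>
```python
import os

SEED_FILE = "/var/run/boot-seed.bin"   # hypothetical: fetched from your seed service


def mix_into_pool(seed_bytes, device_path="/dev/urandom"):
    """Write seed data to the random device so the kernel mixes it into its pool."""
    with open(device_path, "wb") as device:
        device.write(seed_bytes)
    return len(seed_bytes)


def seed_from_file(seed_path, device_path="/dev/urandom"):
    """Mix the contents of a downloaded seed file into the pool, then delete it."""
    with open(seed_path, "rb") as seed_file:
        seed_bytes = seed_file.read()
    written = mix_into_pool(seed_bytes, device_path)
    os.remove(seed_path)  # a seed should never be used twice
    return written


# A boot script would call seed_from_file(SEED_FILE) before any host keys
# are generated and before the encryption software starts.
```
</pre>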
<p>If you don&#8217;t trust what /dev/random or /dev/urandom emit, you can optionally use OpenSSL with <a href="http://prngd.sourceforge.net/" target="_blank">prngd</a> or <a href="http://egd.sourceforge.net/" target="_blank">egd</a> as alternate entropy sources, and potentially feed in your own sensory input data. If you want to go hardcore, you could add environmental noise such as resistor noise on the microphone input of a sound card, or some other sensory data. There is <a href="http://vanheusden.com/aed/" target="_blank">existing software for doing just that</a>. There are all sorts of possibilities. Among them are a number of hardware solutions for RNG, most of which are pretty expensive and are not options for a cloud environment. There are sources of random numbers provided <a href="http://random.irb.hr/">as a service</a> from <a href="http://www.random.org/" target="_blank">various sources</a>.</p>
<p>There are things that we can do as Cloud Computing service providers to pre-initialize your entropy pools for you when the given server instance is created so the procedure above would be redundant. This still leaves the question as to the quality of the <a href="http://en.wikipedia.org/wiki/Random_number_generator" target="_blank">RNG</a> available to you on a cloud server.</p>
<p>There are two standard randomness sources that you should know about:</p>
<p>/dev/random   = produces actual entropy, if you have some, and blocks otherwise.<br />
/dev/urandom = produces available entropy regardless of quality, but does not block.</p>
<p>The Linux kernel has a paravirtual entropy driver which provides kernel-side support for the virtual <a href="http://en.wikipedia.org/wiki/Random_number_generator" target="_blank">RNG</a> hardware. The kernel compile option CONFIG_HW_RANDOM_VIRTIO enables it, and it can be built as a kernel module. There are drivers that run within the hypervisor host kernel that connect this with the RNG hardware available on the server (if any).</p>
<p>drivers/char/hw_random/amd-rng.ko = H/W RNG driver for AMD chipsets<br />
drivers/char/hw_random/intel-rng.ko = H/W RNG driver for Intel chipsets<br />
drivers/char/hw_random/virtio-rng.ko = VirtIO Random Number Generator support</p>
<p>How it works is the hypervisor host (dom0) runs <a href="http://linux.die.net/man/8/rngd/" target="_blank">rngd</a> to read data from /dev/hwrandom (using the Intel or AMD modules mentioned above) and feed it into /dev/random. The guest VM (domU) then does the same thing: you run rngd in the guest VM to feed that data into the guest kernel. rngd can mix data from both /dev/random and /dev/urandom so you get as much random data as you need in a non-blocking fashion. You can consult the kernel <a href="http://lwn.net/Articles/282721/" target="_blank">source code</a> to learn more.</p>
<p>What happens if multiple guest VMs are reading this data at the same time using this arrangement? I&#8217;m not sure if it&#8217;s possible to deplete the entropy pool of the hypervisor host and produce <a href="http://en.wikipedia.org/wiki/Pseudorandom_number_generator" target="_blank">PRNG</a> patterns that are therefore less random. So if one guest VM emptied the entropy pool by aggressively reading from the /dev/hwrandom device, you might cause someone else&#8217;s guest VM to get less data. This could be solved if there were simply a rate limit enforced on the consumption of RNG data allowed per guest VM. There is <a href="http://lwn.net/Articles/283103/" target="_blank">further discussion</a> of that as well.</p>
<p>The truth is that for most needs you can have reasonably secure encryption by simply having an ordinary PRNG source like /dev/urandom that&#8217;s properly initialized with random data. I suggest that you use that approach in your cloud deployments.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2009/11/putting-entropy-in-the-cloud/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
