<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Adrian Otto&#039;s Blog &#187; Cloud</title>
	<atom:link href="http://adrianotto.com/category/cloud/feed/" rel="self" type="application/rss+xml" />
	<link>http://adrianotto.com</link>
	<description>For those who care about technical details</description>
	<lastBuildDate>Sun, 08 Apr 2012 00:02:36 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Maximizing Elasticity in the Cloud</title>
		<link>http://adrianotto.com/2011/10/maximizing-elasticity-in-the-cloud/</link>
		<comments>http://adrianotto.com/2011/10/maximizing-elasticity-in-the-cloud/#comments</comments>
		<pubDate>Thu, 20 Oct 2011 14:35:33 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[best practices]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=571</guid>
		<description><![CDATA[Running a production application in the cloud can be great because it&#8217;s possible to add and remove servers from a cluster dynamically using a provisioning API. These automatic additions and removals can be triggered by system utilization levels that you measure, such as concurrent network connections, memory utilization, or CPU utilization. When you need more [...]]]></description>
			<content:encoded><![CDATA[<p>Running a production application in the cloud can be great because it&#8217;s possible to add and remove servers from a cluster dynamically using a provisioning API. These automatic additions and removals can be triggered by system utilization levels that you measure, such as concurrent network connections, memory utilization, or CPU utilization. When you need more capacity, you can add more servers, and when they are not needed anymore, you simply turn them back off. You only pay for the time those servers were running, so it&#8217;s more economic than having a large number of servers deployed all the time.</p>
<p>Most simple web clusters rely on a single database server that all the application servers connect to. This way, all of the application servers have concurrent access to the same data. This can be problematic in the elastic use case: as the workload increases, more servers are added to the cluster. If the work is bottlenecked on storing or accessing data in the database server, adding more application servers will not help. It will actually make the problem worse, because every new application server adds more connections and queries to the same overloaded database.</p>
<p>I spoke on a panel at ZendCon yesterday, which was covered in an <a title="Infoworld Article" href="http://www.infoworld.com/d/cloud-computing/security-remains-top-concern-cloud-app-builders-176707" target="_blank">InfoWorld article</a> that published my remarks. The article says:</p>
<blockquote><p>Panelists also debated use of SQL and database connectivity in clouds. SQL as a design pattern for storage &#8220;is not ideal for cloud applications,&#8221; said Adrian Otto, senior technical strategist for Rackspace Cloud. Afterward, he described SQL issues as &#8220;typically the No. 1 bottleneck&#8221; to elasticity in the cloud. With elasticity, applications use more or fewer application servers based on demand. Otto recommended that developers who want elasticity should have a decentralized data model that scales horizontally. &#8220;SQL itself isn&#8217;t the problem. The problem is row-oriented data in an application,&#8221; which causes performance bottlenecks, said Otto.</p></blockquote>
<p>The author, Paul Krill, did a good job of accurately reporting my position on this subject. Data stored in databases is arranged in tables of rows and columns. Each new record adds a new row, and each row has multiple columns that hold the separate fields of that record. The truth is that most web applications work very well with this data design pattern, and those applications should continue to use SQL databases with row-oriented data. However, there are some applications where data may be arranged differently to make reading it more efficient.</p>
<p>If you have a big table of data and you want to pull out just a little bit of it using a query, the database server must consult the table&#8217;s index to locate that data, and then return the portion that matches the constraints given in the query. This makes reading data relatively expensive from a computational perspective. If the data were instead arranged in lots of columns, it could be retrieved more efficiently, and it could be more easily distributed over a larger number of servers, yielding the horizontal scalability that cloud applications want. This works very well in cases where the number of reads is very high, but the data is updated infrequently relative to the reads.</p>
<p>Let&#8217;s use a blog application as an example. Blog posts are written once and maybe updated a few times, possibly once each time a comment is submitted. However, on a busy web site, a blog post may be read millions of times. If the posts were stored in a column-oriented storage system like <a title="Cassandra" href="http://cassandra.apache.org/" target="_blank">Cassandra</a>, they could be quickly and easily retrieved using the ID number of the blog post. The listing of recent blog posts can also be arranged in a column, so that the front page of the site, with its listing of articles, can be generated. Using this approach requires that the data be properly arranged as it&#8217;s stored, putting the computational burden on the (infrequent) write rather than on the (frequent) read.</p>
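<p>To make the write-time versus read-time trade-off concrete, the sketch below models a column-family layout with plain Python dictionaries. <code>ToyColumnStore</code>, <code>publish_post</code>, and <code>front_page</code> are hypothetical names, and this is not the Cassandra client API. It only shows how arranging the data on write turns the front-page read into a single ordered slice of one row.</p>

```python
from collections import defaultdict


class ToyColumnStore:
    """In-memory stand-in for a column-family store (no distribution or persistence).

    Each row key maps to a dict of column name -> value.
    """

    def __init__(self):
        self.rows = defaultdict(dict)

    def insert(self, row_key, column, value):
        self.rows[row_key][column] = value

    def get_row(self, row_key):
        return dict(self.rows.get(row_key, {}))

    def get_slice(self, row_key, count, reverse=True):
        # Column names are sortable, so a slice is a cheap ordered read.
        columns = self.rows.get(row_key, {})
        names = sorted(columns, reverse=reverse)[:count]
        return [(name, columns[name]) for name in names]


def publish_post(store, post_id, published_at, title, body):
    """Do the extra work on the (rare) write so that reads stay simple."""
    row = f"post:{post_id}"
    store.insert(row, "title", title)
    store.insert(row, "body", body)
    store.insert(row, "published_at", published_at)
    # Denormalize: also record the post in a front-page index row, keyed by
    # publication time so that newest-first is a single slice.
    store.insert("recent_posts", f"{published_at}:{post_id}", title)


def front_page(store, count=10):
    """The (frequent) read: one slice from one row, no index lookup."""
    return [title for _, title in store.get_slice("recent_posts", count)]


store = ToyColumnStore()
publish_post(store, 13, "2011-10-07T00:04:05", "I'm Paranoid, just like you!", "...")
publish_post(store, 571, "2011-10-20T14:35:33", "Maximizing Elasticity in the Cloud", "...")
print(front_page(store))                    # newest first
print(store.get_row("post:571")["title"])   # direct lookup by ID
```

<p>In a real deployment the <code>recent_posts</code> row would grow without bound, so it would likely need to be capped or split, for example into one row per month.</p>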
<p>Using a distributed system to store data in columns allows the data to be evenly distributed over an arbitrary number of servers, eliminating the central data bottleneck. Adding application servers and storage servers in the correct proportion can result in true horizontal scalability, meaning that capacity increases in direct proportion to the number of servers in the cluster.</p>
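<p>One common way to spread data evenly over an arbitrary number of servers is consistent hashing. Cassandra, for example, places data by position on a token ring. The Python sketch below is a generic, simplified illustration of that idea, not Cassandra&#8217;s implementation. It shows that adding a fourth node moves only roughly a quarter of the keys, and every moved key goes to the new node.</p>

```python
import bisect
import hashlib


def _hash(key):
    # MD5 is used here only to spread keys evenly, not for security.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hashing with virtual nodes (simplified illustration)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._tokens = []  # sorted positions on the ring
        self._owner = {}   # token -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node claims many small arcs of the ring so load evens out.
        for i in range(self.vnodes):
            token = _hash(f"{node}#{i}")
            bisect.insort(self._tokens, token)
            self._owner[token] = node

    def node_for(self, key):
        # Walk clockwise to the first token after the key's hash,
        # wrapping around to the start of the ring.
        idx = bisect.bisect(self._tokens, _hash(key)) % len(self._tokens)
        return self._owner[self._tokens[idx]]


keys = [f"post:{i}" for i in range(10_000)]
ring = HashRing(["db1", "db2", "db3"])
before = {k: ring.node_for(k) for k in keys}
ring.add_node("db4")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")  # roughly a quarter
```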
<p>Why doesn&#8217;t everyone do this already? For some good reasons:</p>
<ol>
<li>The concept of running applications in clouds is still relatively new. The related technology is still maturing.</li>
<li>Existing software tends to use SQL already. If you want to use an existing CMS platform, chances are it will require a central SQL database.</li>
<li>Most heavy-read workloads can be scaled well using data caching techniques. If applications don&#8217;t write data very often, it may not be necessary to scale beyond a single database server.</li>
<li>You must anticipate exactly how the application will use the data, and arrange it just right.</li>
<li>It may be harder to analyze the data. Once your data is arranged in a column store, you may not be able to query it in arbitrary ways. You may only be able to pull it out using its ID numbers, or by systematically scanning all of it to find the parts you want.</li>
<li>Distributed data storage (a.k.a. NoSQL) systems like Cassandra, HBase, Redis, etc. are complicated, and there is a considerable learning curve associated with setting them up and maintaining them. In some cases these systems are not as strong as the prevailing SQL database systems in terms of data durability or data consistency. These tradeoffs can be difficult to navigate.</li>
<li>Today&#8217;s software developers are very familiar with SQL as a data storage and access paradigm. They can very quickly develop software that relies on the ACID qualities of a SQL database.</li>
</ol>
<p>If you have an application that you want to deploy into a cloud, and you want it to be very elastic, you should think about how you arrange your data. If you use a centralized data design, you will probably hit scalability bottlenecks when you add lots of servers. You should aim to decentralize the data so that you can easily add more servers to scale the environment horizontally without running into the limits of the database server. This is particularly important when you need the application to write a lot of data and a cache is not a suitable solution for you.</p>
<p>Over time, the reasons not to use column-oriented data will begin to shrink, and better tools and services will make it easier to do. Until then, I suggest that you carefully consider whether you need maximum elasticity. If not, then it&#8217;s perfectly appropriate to keep using the same centralized row-oriented data paradigm. Use a cache like memcached in cases where you have heavy reads and it&#8217;s acceptable to show slightly outdated information to readers. The truth is that traditional solutions work really well for most web applications. However, if you have one of the less common situations where you need true horizontal scalability, take a good look at a different arrangement for your data, and at the systems and tools that make that possible for you in the cloud.</p>
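<p>The memcached advice is usually applied as the cache-aside pattern. The sketch below uses a tiny in-process <code>TTLCache</code> class as a stand-in for a memcached client (a real client library is not shown), and the hypothetical <code>cached_read</code> helper shows where the &#8220;slightly outdated information&#8221; comes from: readers see the cached copy until its time-to-live expires.</p>

```python
import time


class TTLCache:
    """Tiny in-process stand-in for memcached: get/set with expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)


def cached_read(cache, key, load_from_db, ttl=60):
    """Cache-aside: try the cache, fall back to the database, then populate."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        cache.set(key, value, ttl)
    return value


# A fake clock makes the expiry behavior easy to see.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
db_reads = []


def load_post(key):
    db_reads.append(key)
    return f"contents of {key}"


cached_read(cache, "post:571", load_post)  # miss: reads the database
cached_read(cache, "post:571", load_post)  # hit: served from the cache
now[0] = 61.0                              # the 60-second TTL has expired
cached_read(cache, "post:571", load_post)  # miss again: reads the database
print(len(db_reads))  # 2
```

<p>Because <code>None</code> signals a cache miss in this sketch, a value of <code>None</code> cannot be cached; a production version would need a separate sentinel.</p>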
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2011/10/maximizing-elasticity-in-the-cloud/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>I&#8217;m Paranoid, just like you!</title>
		<link>http://adrianotto.com/2011/10/im-paranoid-just-like-you/</link>
		<comments>http://adrianotto.com/2011/10/im-paranoid-just-like-you/#comments</comments>
		<pubDate>Fri, 07 Oct 2011 00:04:05 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>

		<guid isPermaLink="false">http://www.adrianotto.com/?p=13</guid>
		<description><![CDATA[By: Adrian Otto Over the years I’ve administered email systems that provided service to thousands of end users’ mailboxes. In the early years in the 1990s most woes of a mail system administrator were about how to instrument the setting up of email accounts and related client settings, and changing passwords when they were forgotten [...]]]></description>
			<content:encoded><![CDATA[<p>By: Adrian Otto</p>
<p>Over the years I’ve administered email systems that provided service to thousands of end users’ mailboxes. In the early years, in the 1990s, most of a mail system administrator’s woes were about setting up email accounts and the related client settings, and changing passwords when end users forgot them.</p>
<p>As the internet became more and more commercialized, spam exploded in our faces. Everyone hates spam. Mail administrators hate it with a passion. They do everything they can to fight it&#8230; they filter, they black-hole, they tattle to abuse@whatever.com about it. Sometimes their own users send spam, and then they get black-holed themselves and need to jump through hoops to undo the damage.</p>
<p>By the time I reached my breaking point, I managed email for about a dozen domain names, probably about two hundred mailboxes in total. I hated it. I hated every waking moment of it. The RBLs that worked one day did not work the next. I’m convinced that email system administration is the nastiest, dirtiest job there is for a sysadmin.</p>
<p>People kept suggesting to me that I outsource email, which I shrugged off. I had problems with outsourcing:</p>
<p style="padding-left: 30px;"><strong>1) I’m Paranoid about Uptime.<br />
</strong></p>
<p style="padding-left: 30px;">It’s hard for me to trust other people, let alone trust a company. And trusting a company with something as important as my email??? No way. I’m a control freak, and I was going to keep control at all costs. Yes, I hated email system administration. I wasn’t even a sysadmin any more, but I still did it just so that I could control it. It needed to be highly available. I simply could not trust anyone to do it better than me.</p>
<p style="padding-left: 30px;"><strong>2) I’m Paranoid about Security.<br />
</strong></p>
<p style="padding-left: 30px;">Although email is inherently an insecure communication mechanism, all sorts of highly sensitive information is in there anyway. What would happen if a competitor somehow got control of our email and read it? They could learn all of our secrets. No way. I’m keeping control of the security so that I know it’s locked down as much as humanly possible.</p>
<p style="padding-left: 30px;"><strong>3) I’m Paranoid about Reliability and Control.<br />
</strong></p>
<p style="padding-left: 30px;">If something goes wrong, I want to be able to fix it quickly. If I host it, I have full control of everything in the system. I can find what’s wrong and fix it fast. I’m really good at that.</p>
<p>I became a source code contributor to an open source email filtering system called bogofilter, which uses Bayesian filtering to learn what’s spam and what’s not, and filters mail based on that. I thought my spam filtering setup was the bomb! It worked great!</p>
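<p>For readers curious about what Bayesian filtering does, here is a minimal multinomial naive Bayes classifier in Python. It is a simplified teaching sketch, not bogofilter’s actual scoring method, and the four training messages are made up for illustration. The filter counts how often each word appears in spam and in legitimate mail, then labels a new message by comparing the two smoothed log-probabilities.</p>

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesSpamFilter:
    """Minimal multinomial naive Bayes classifier for spam vs. ham."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, words, label, vocab_size):
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        total_messages = sum(self.message_counts.values())
        # Prior: how common this label is among training messages.
        score = math.log(self.message_counts[label] / total_messages)
        for w in words:
            # Laplace (add-one) smoothing so unseen words do not zero the score.
            score += math.log((counts[w] + 1) / (total_words + vocab_size))
        return score

    def is_spam(self, text):
        words = tokenize(text)
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        v = len(vocab)
        return self._log_score(words, "spam", v) > self._log_score(words, "ham", v)


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("cheap pills buy now limited offer", "spam")
spam_filter.train("win money now click here", "spam")
spam_filter.train("meeting notes for the project review", "ham")
spam_filter.train("lunch tomorrow with the team", "ham")

print(spam_filter.is_spam("buy cheap pills now"))            # True
print(spam_filter.is_spam("project meeting with the team"))  # False
```

<p>Both classes must have at least one training message before classifying, since the sketch takes the logarithm of each class prior.</p>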
<p>I got busier and busier with my work. I administered my email systems less and less. The better they worked, the less I would work on them because I had other fish to fry. The spammers got smarter and smarter, and soon enough my super cool spam filtering setup was becoming less and less effective.</p>
<p>So in 2006 something happened. I got super frustrated with spam administration. I was tired of having to keep finding or inventing better mouse traps to trap that nasty spam. So I thought to myself&#8230; There is an unlimited desire to send spam. Why? Because it works. If it did not work, the spammers would not be so determined to keep doing it. They are doing everything they can to outsmart you to get mail in your inbox. They keep getting smarter and smarter.</p>
<p>I thought some more&#8230; It’s like viruses. The hackers keep making better viruses, and the virus scanner software companies keep making their virus scanners better to clean them up and block them out. I needed something like virus scan, but for my email. I thought about all the technical ways to do it. I started hunting the web to find answers. I just wanted SOMEONE&#8230; anyone to handle this spam nonsense for me.</p>
<p>In the process, I stumbled across a company called “Webmail.us” (later acquired by <a href="http://www.rackspace.com" target="_blank">Rackspace</a> and now called “<a href="//www.rackspace.com/email_hosting" target="_blank">Rackspace Email</a>”). They had a great web site and said (at the time) that they had 700,000 mailboxes in service. They had a complete spam filtering solution built in. The mailbox hosting was cheap, so cheap I could not ignore it. They were charging less for complete hosting of mailboxes than I was willing to pay for outsourced spam filtering.</p>
<p>In 2006 I did an experiment. I moved the domain name I use for my home email over to webmail.us to see how it worked. I told myself that if it worked really well, I might switch all my email over to it and wash my hands of email sysadmin work and all the spam nonsense that goes along with it. I ran it for a month. It worked great. It was fast, and it never went down. I got no spam. I was thrilled!</p>
<p><strong>I did the unthinkable. I outsourced my email!</strong></p>
<p>One by one I migrated all of my domains, and all my mail users over to the hosted system. I have never looked back. The system has been rock solid. The few problems I’ve seen over the past three years have been really minor, and solved more quickly than I would have been able to solve using my own systems. I had been converted.</p>
<p>I was so happy to finally be free of all the nuisance of administering email and spam filtering systems. It was great. Years later I ended up working with Rackspace, and told them the story of how I used and loved the email platform. I later met the people behind the system, and it was no wonder that it works as well as it does.</p>
<p><strong>If you are still administering your own email&#8230;</strong> especially if you are running an Exchange system in your own office building, you need to take a serious look in the mirror and ask yourself why you are not outsourcing it to <a href="//www.rackspace.com/email_hosting" target="_blank">Rackspace Email</a>. The truth is:</p>
<p style="padding-left: 30px;">1) It’s more expensive to host it internally. Run the numbers.<br />
2) Your uptime is a lot worse. Measure it.<br />
3) Your security is no stronger. Audit it.<br />
4) You are paranoid, just like me. Yes, you are.</p>
<p>You trust your bank with your money. You trust your phone company not to spy on all your phone calls. You do this stuff without worrying about it. These things are much bigger leaps of trust than outsourcing your email.</p>
<p>From me to you&#8230; do yourself a favor. Run the same experiment I did. You’ll be delighted. I work for Rackspace now, so my view is biased, right? Don&#8217;t take my word for it, because you&#8217;re paranoid. Just try it and see.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2011/10/im-paranoid-just-like-you/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What is a Cloud Platform?</title>
		<link>http://adrianotto.com/2011/02/cloud-platform/</link>
		<comments>http://adrianotto.com/2011/02/cloud-platform/#comments</comments>
		<pubDate>Fri, 18 Feb 2011 22:02:51 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=462</guid>
		<description><![CDATA[Definition of Cloud Platform: A system in which software applications run on utility cloud services within a logically abstracted environment. Definition of PaaS: Platform as a Service. A Cloud Platform offered by a service provider as a hosted service which facilitates the deployment of software applications without the cost and [...]]]></description>
			<content:encoded><![CDATA[<p>Definition of <strong>Cloud Platform</strong>:</p>
<p>A system in which software applications run on utility cloud services within a logically abstracted environment.</p>
<p>Definition of <strong>PaaS</strong>:</p>
<p>Platform as a Service. A Cloud Platform offered by a service provider as a hosted service which facilitates the deployment of software applications without the cost and complexity of acquiring and managing the underlying hardware and software layers.</p>
<p>Examples of PaaS:</p>
<ol>
<li><a href="http://code.google.com/appengine/">Google AppEngine</a></li>
<li><a href="http://force.com">Force.com</a></li>
</ol>
<p>The key drawback of most PaaS services that exist today is vendor lock-in. Implementing an application on a proprietary platform makes it difficult or impossible to move a deployed application from one service provider to another, unless the providers have compatible Cloud Platforms.</p>
<p><a href="http://www.openstack.org"><img class="alignright size-full wp-image-356" title="OpenStack" src="http://cdn.adrianotto.com/wp-content/uploads/2010/09/openstacklogo.jpg" alt="" width="174" height="179" /></a>I plan to address this problem by embracing open source solutions that allow numerous service providers to host applications in a compatible PaaS model where applications can be easily moved with little or no modification required.</p>
<p>Keep an eye on the <a href="http://www.openstack.org">OpenStack</a> project in the upcoming weeks and months for blueprints for a PaaS system for the whole world to use. I invite you to participate in the design and future implementation of the cloud platform that will end vendor lock-in concerns, and simplify application hosting for software developers.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2011/02/cloud-platform/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>One Step Closer to Ideal</title>
		<link>http://adrianotto.com/2011/02/one-step-closer-to-ideal/</link>
		<comments>http://adrianotto.com/2011/02/one-step-closer-to-ideal/#comments</comments>
		<pubDate>Fri, 04 Feb 2011 20:04:00 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=459</guid>
		<description><![CDATA[Rackspace announced today that they are no longer charging per-request fees for access to their data on the Cloud Files service. This is good news for those of you who want to closely integrate an application with a cloud storage service. The leading service in this space is Amazon&#8217;s S3, which has a rather convoluted [...]]]></description>
			<content:encoded><![CDATA[<p>Rackspace <a href="http://www.rackspacecloud.com/blog/2011/02/04/cloud-files-puts-an-end-to-request-charges/" target="_blank">announced today</a> that they are no longer charging per-request fees for access to data stored on the <a href="http://www.rackspacecloud.com/cloud_hosting_products/files/">Cloud Files</a> service. This is good news for those of you who want to closely integrate an application with a cloud storage service. The leading service in this space is Amazon&#8217;s S3, which has a rather convoluted pricing scheme that&#8217;s pretty hard to understand. I&#8217;m happy to see that Rackspace is helping to keep things simple. Now all you pay for when using Cloud Files is the storage capacity you use and the bandwidth for data transfer.</p>
<p>This announcement is on the heels of another <a href="http://www.rackspacecloud.com/blog/2011/01/12/big-news-for-cloud-files-users-akamai-is-coming/">recent announcement</a> that the public web delivery of files from the Cloud Files service now uses the world&#8217;s leading content distribution network from <a href="http://www.akamai.com/">Akamai</a>. This means that if you are looking for somewhere to host an extensive media or video archive, Cloud Files is definitely worth your consideration.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2011/02/one-step-closer-to-ideal/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Drizzle is now BETA</title>
		<link>http://adrianotto.com/2010/09/drizzle-is-now-beta/</link>
		<comments>http://adrianotto.com/2010/09/drizzle-is-now-beta/#comments</comments>
		<pubDate>Wed, 29 Sep 2010 19:21:00 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=376</guid>
		<description><![CDATA[Today Drizzle enters BETA. Drizzle is an evolution of MySQL that&#8217;s been simplified, streamlined, and modernized. This long-awaited database started from an idea in 2005 to fork MySQL, keep the good parts, and rip out or replace all the stuff that&#8217;s not good for web applications. This idea has been endorsed by large corporate [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://drizzle.org"><img class="alignright size-full wp-image-222" title="Drizzle Logo" src="http://cdn.adrianotto.com/wp-content/uploads/2009/11/drizzle64.png" alt="" width="64" height="64" /></a>Today <a href="http://drizzle.org" target="_blank">Drizzle</a> enters BETA. Drizzle is an evolution of MySQL that&#8217;s been simplified, streamlined, and modernized. This long-awaited database started from an idea in 2005 to fork MySQL, keep the good parts, and rip out or replace all the stuff that&#8217;s not good for web applications. This idea has been endorsed by large corporate sponsors, including <a href="http://www.sun.com" target="_blank">Sun Microsystems</a> in the early days, and now <a href="http://www.rackspace.com/">Rackspace</a>. Most of the code is contributed by the <a href="https://launchpad.net/drizzle">developer community</a>, which is made up of a very talented group of open source developers with core committers from four different companies. More about the Drizzle project:</p>
<h3>Charter</h3>
<ul>
<li>A database optimized for Cloud infrastructure and Web applications</li>
<li>Design for massive concurrency on modern multi-CPU architectures</li>
<li>Optimize memory for increased performance and parallelism</li>
<li>Open source, open community, open design</li>
</ul>
<h3>Scope</h3>
<ul>
<li>Re-designed modular architecture providing plugins with defined APIs</li>
<li>Simple design for ease of use and administration</li>
<li>Reliable, ACID transactional</li>
</ul>
<p>There are many exciting changes, such as optimizing everything for 64-bit CPUs and multi-core systems. You can hardly even buy 32-bit or single-core servers nowadays if you want them, so it makes no sense to have software that&#8217;s optimized for these antiquated hardware designs. No effort is spent optimizing the software for rotational hard drives, because SSDs are the way of the future. All the language collations have simply been replaced with UTF-8 only, because the web uses UTF-8, and this is tested with 41 different language translations. Drizzle also has a new scheduler. The legacy MySQL scheduler was designed for a thread-per-session setup. In Drizzle, sessions are handled independently from the threads, and the new scheduler makes this possible.</p>
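<p>The difference between thread-per-session and a scheduler that decouples sessions from threads can be sketched with a thread pool. This Python example is only an analogy for the idea, not Drizzle code: 100 hypothetical sessions are served by a fixed pool of 4 worker threads, so the thread count no longer grows with the number of connected sessions.</p>

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_session(session_id):
    """Stand-in for one client session's work (for example, running a query)."""
    return session_id, threading.current_thread().name


# 100 client sessions share a fixed pool of 4 worker threads,
# instead of each session getting a dedicated thread.
with ThreadPoolExecutor(max_workers=4, thread_name_prefix="worker") as pool:
    results = list(pool.map(handle_session, range(100)))

threads_used = {name for _, name in results}
print(len(results), "sessions handled by", len(threads_used), "threads")
```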
<p>Drizzle uses InnoDB as its default storage engine, which is great for OLTP. It also supports the <a href="http://www.primebase.org/" target="_blank">PBXT</a> storage engine. Plugins are available for the InnoDB Embedded Engine and for <a href="http://www.haildb.com/" target="_blank">HailDB</a>, which will soon be the new default. DDL operations (like ALTER TABLE) can actually roll back if something goes wrong in the process, rather than leaving you with incomplete or corrupt data.</p>
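<p>Transactional DDL means a schema change either completes entirely or leaves no trace. The sketch below demonstrates the behavior using SQLite through Python&#8217;s built-in <code>sqlite3</code> module as a stand-in, since SQLite also runs schema changes inside transactions. It is not Drizzle syntax or tooling, and the <code>migrate</code> function is hypothetical. A two-step migration that fails halfway is rolled back, and the table keeps its original columns.</p>

```python
import sqlite3

# isolation_level=None turns off the sqlite3 module's implicit transaction
# handling, so the explicit BEGIN/COMMIT/ROLLBACK statements are in charge.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")


def columns(table):
    # PRAGMA table_info returns one row per column; index 1 is the name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def migrate(fail_midway):
    conn.execute("BEGIN")
    try:
        conn.execute("ALTER TABLE posts ADD COLUMN body TEXT")
        if fail_midway:
            raise RuntimeError("simulated failure during schema change")
        conn.execute("ALTER TABLE posts ADD COLUMN author TEXT")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # also undoes the first ALTER TABLE
        raise


try:
    migrate(fail_midway=True)
except RuntimeError:
    pass
print(columns("posts"))  # ['id', 'title']

migrate(fail_midway=False)
print(columns("posts"))  # ['id', 'title', 'body', 'author']
```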
<p>The code base in Drizzle has been fully modernized and brought up to today&#8217;s C++ standards, with extensive use of the <a href="http://en.wikipedia.org/wiki/Standard_Template_Library">C++ STL</a> replacing MySQL&#8217;s obscure custom data type implementations, which offered no real benefit compared to what the STL provides today. Another example of improvements in this area is the replacement of the legacy REGEX implementation with a more standard library. All of these changes reduce the amount of Drizzle source code dramatically compared to MySQL. Less code and simpler code means fewer bugs, plain and simple. Drizzle is well on its way to being an ideal fit for web applications that need a reliable, high-performance transactional database.</p>
<h3>Features in Drizzle7 Beta</h3>
<ul>
<li>New micro kernel</li>
<li>Migration Tool</li>
<li>Instance Catalog Support</li>
<li>Universal Replication</li>
<li>User query analysis</li>
<li>Multi-core Support</li>
</ul>
<h3>What &#8220;Beta&#8221; means</h3>
<ul>
<li>Your data is safe. Transactional engine by default and stable for over 2 years.</li>
<li>Upgrade the system in-place without exporting/importing data.</li>
<li>Replication is still being tested.</li>
</ul>
<p>In Microsoft terms, it means that this project would have launched about a year ago. In Google terms, it probably would have launched six months ago. Simply put, if you trust your data to a MySQL system running InnoDB today, you should feel comfortable trying Drizzle. There have been some changes to the InnoDB setup, such as the elimination of the FRM files from disk, which eliminates possible inconsistency between the state on disk and the state in InnoDB. I am in the process of moving a few of my production applications to the Drizzle Beta. If you&#8217;re an accomplished system administrator and DBA, you should seriously consider putting at least one of your production applications on Drizzle now, and see how it works for you.</p>
<h3>What&#8217;s Next?</h3>
<ul>
<li>Beta <a href="https://launchpad.net/drizzle/+announcement/6840" target="_blank">announced today 2010-09-29</a>.</li>
<li>GA February 2011</li>
<li>GA May 2011 for Multi-Tenancy features that allow an arbitrary number of logical databases (Schemas, Tables, etc.) to exist concurrently with full data isolation between them. This allows for individual security and resource controls (Threads, Memory, IO), and individual database backups, rather than system level backups. This feature will be called &#8220;Catalogs&#8221;.</li>
</ul>
<h3>Download Drizzle</h3>
<p>Time to get started with the beta. Download <a href="https://launchpad.net/drizzle/elliott/2010-09-27">the beta</a> today!</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/09/drizzle-is-now-beta/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>OpenStack Object Storage is Great For&#8230;</title>
		<link>http://adrianotto.com/2010/09/openstack-os-is-great-for/</link>
		<comments>http://adrianotto.com/2010/09/openstack-os-is-great-for/#comments</comments>
		<pubDate>Wed, 01 Sep 2010 18:42:16 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[OpenStack]]></category>
		<category><![CDATA[swift]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=346</guid>
		<description><![CDATA[Soon, the OpenStack Object Storage software will be released. It&#8217;s available now as a Developer Preview if you would like to contribute, or perhaps if you&#8217;re just curious. The first release is expected later this month. This is a fantastic piece of software that really hits the mark for scalability, high availability, and performance. About [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.openstack.org/"><img class="alignright size-full wp-image-356" title="OpenStack" src="http://cdn.adrianotto.com/wp-content/uploads/2010/09/openstacklogo.jpg" alt="" width="139" height="143" /></a>Soon, the <a href="http://www.openstack.org/projects/storage/" target="_blank">OpenStack Object Storage</a> software will be released. It&#8217;s available now as a <a href="https://launchpad.net/swift" target="_blank">Developer Preview</a> if you would like to contribute, or perhaps if you&#8217;re just curious. The first release is expected later this month. <strong>This is a fantastic piece of software that really hits the mark for scalability, high availability, and performance.</strong></p>
<h3>About OpenStack Object Storage</h3>
<p><a href="http://www.openstack.org/projects/storage/" target="_blank">OpenStack Object Storage</a> was originally developed by <a href="http://www.rackspace.com/" target="_blank">Rackspace</a>, and was released as <a href="http://www.apache.org/licenses/LICENSE-2.0.html" target="_blank">Open Source Software</a> earlier this year as part of the <a href="http://www.openstack.org/" target="_blank">OpenStack Project</a>. It was written for hosting the <a href="http://www.rackspacecloud.com/cloud_hosting_products/files/">Rackspace Cloud Files</a> service. Its original project code name was <em>swift</em>, so you may see references to that in various documentation.</p>
<blockquote><p>OpenStack Object Storage aggregates commodity servers to work together  in clusters for reliable, redundant, and large-scale storage of static  objects. Objects are written to multiple hardware devices in the  datacenter, with the OpenStack software responsible for ensuring data  replication and integrity across the cluster. Storage clusters can scale  horizontally by adding new nodes, which are automatically configured.  Should a node fail, OpenStack works to replicate its content from other  active nodes. Because OpenStack uses software logic to ensure data  replication and distribution across different devices, inexpensive  commodity hard drives and servers can be used in lieu of more expensive  equipment. [<a href="http://www.openstack.org/projects/storage/" target="_blank">1</a>]</p></blockquote>
<p>The system uses a flat namespace with three concepts: an <em>account</em> (how you access the system), a <em>container</em> (like a directory), and an <em>object</em> (like a file). You can have an arbitrary number of accounts, each with an arbitrary number of containers, and each container can hold an arbitrary number of objects.</p>
<p>OpenStack Object Storage is very good for storing unstructured data using an object name as a lookup key (like a filename). You access your data from a web client using the web service <a href="http://www.rackspacecloud.com/cloud_hosting_products/files/api" target="_blank">REST API</a>, not through a filesystem. Download an object (like a file) with an HTTP GET request, fetch its metadata with an HTTP HEAD request, delete it with an HTTP DELETE request, and so on. There are multiple <a href="http://www.rackspacecloud.com/cloud_hosting_products/files/api" target="_blank">language bindings</a>, so you can access your files in OpenStack Object Storage natively from your favorite language (Java, Python, Perl, PHP, .NET, etc.).</p>
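<p>To make the account/container/object model and the REST verbs concrete, here is a minimal Python sketch that only builds (does not send) the requests, using the standard library&#8217;s <code>urllib.request</code>. The storage URL, account name, and token are hypothetical placeholders; a real client gets them from the authentication step. The <code>/v1/account/container/object</code> path layout is assumed here to match the Cloud Files style API.</p>
<pre>
import urllib.request
from urllib.parse import quote

# Hypothetical values: a real client receives these from the auth service.
STORAGE_URL = "https://storage.example.com/v1/AUTH_demo"
TOKEN = "example-token"


def object_url(container, obj):
    # Each object lives at account/container/object; quote() keeps
    # characters such as spaces safe inside the URL path.
    return "%s/%s/%s" % (STORAGE_URL, quote(container), quote(obj))


def build_request(method, container, obj):
    # Every request carries the storage token in the X-Auth-Token header.
    return urllib.request.Request(
        object_url(container, obj),
        method=method,
        headers={"X-Auth-Token": TOKEN},
    )


download = build_request("GET", "photos", "beach 2010.jpg")
metadata = build_request("HEAD", "photos", "beach 2010.jpg")
delete = build_request("DELETE", "photos", "beach 2010.jpg")

for req in (download, metadata, delete):
    print(req.get_method(), req.full_url)
# Actually sending one would be: urllib.request.urlopen(download)
</pre>
<p>The language bindings do essentially this for you, plus the authentication round trip and error handling.</p>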
<p>The system has no single point of failure, so it&#8217;s extremely fault tolerant, and the data and related metadata are distributed throughout the system, so there are no central scalability constraints. You can store arbitrary amounts of data, in objects both large and small. It performs very well, even under very high levels of concurrency. It keeps multiple replicas of each object, so it&#8217;s reliable and durable without any expensive hardware. You don&#8217;t need RAID on any of the servers unless you want it for additional performance.</p>
<h3>Use OpenStack Object Storage For&#8230;</h3>
<p>Here are some good use cases for OpenStack Object Storage:</p>
<ul>
<li>Storing media libraries (photos, music, videos, etc.)</li>
<li>Archiving video surveillance files</li>
<li>Archiving phone call audio recordings</li>
<li>Archiving compressed log files</li>
<li>Archiving backups (&lt;5GB each object)</li>
<li>Storing and loading OS images, etc.</li>
<li>Storing file populations that grow continuously on a practically infinite basis.</li>
<li>Storing small files (&lt;50 KB). OpenStack Object Storage is great at this.</li>
<li>Storing billions of files.</li>
<li>Storing Petabytes (millions of Gigabytes) of data.</li>
</ul>
<h3>Recognize the Limitations</h3>
<p><strong>Objects must be &lt;5GB</strong></p>
<p>This size limit is arbitrary, but because of the system design it cannot be set to an unlimited value. If you want to store a backup or anything else larger than 5GB, you&#8217;ll need a way to break it into chunks and to store a manifest of the parts, so you can join them back together when you later download the data and use it again.</p>
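<p>A minimal Python sketch of that approach follows. It assumes you split the data into fixed-size segments and keep your own JSON manifest that lists the segment names in order, with an MD5 checksum for each. The segment naming scheme, the checksum choice, and the tiny segment size are all hypothetical choices for illustration. A real backup would use segments just under the 5GB limit and upload each segment, and the manifest, as separate objects.</p>
<pre>
import hashlib
import json

SEGMENT_SIZE = 4  # bytes, tiny for the demo; real segments would be a few GB


def split_into_segments(name, data, size=SEGMENT_SIZE):
    """Return (segments, manifest) where segments maps object name to bytes."""
    segments = {}
    parts = []
    for index, start in enumerate(range(0, len(data), size)):
        chunk = data[start:start + size]
        # Zero-padded numbering keeps segment names sorted in upload order.
        seg_name = "%s/segment-%08d" % (name, index)
        segments[seg_name] = chunk
        parts.append({"name": seg_name, "md5": hashlib.md5(chunk).hexdigest()})
    manifest = json.dumps({"object": name, "parts": parts})
    return segments, manifest


def join_segments(segments, manifest):
    """Reassemble the original bytes by following the manifest order."""
    chunks = []
    for part in json.loads(manifest)["parts"]:
        chunk = segments[part["name"]]
        # Verify each segment so a corrupt or stale download is caught.
        if hashlib.md5(chunk).hexdigest() != part["md5"]:
            raise ValueError("checksum mismatch for " + part["name"])
        chunks.append(chunk)
    return b"".join(chunks)


segs, man = split_into_segments("backup.tar", b"0123456789")
restored = join_segments(segs, man)
</pre>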
<p><strong>Not a Filesystem</strong></p>
<p>It uses a REST API, or a language binding that consumes the REST API. It does not offer typical POSIX filesystem semantics like open(), read(), write(), seek(), and close().</p>
<p><strong>No User Quotas</strong></p>
<p>There is no way to configure a per-user maximum that limits how much storage is used.</p>
<p><strong>No Directory Hierarchies</strong></p>
<p>You can create an arbitrary number of containers, but containers cannot be nested. You can simulate a directory structure using creative object names, but object names have a maximum length. If you only need a shallow hierarchy, or don&#8217;t have long directory names, this might be fine. Just remember that I warned you this is generally a bad idea.</p>
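<p>If you do simulate directories, one common trick is to treat a character such as <code>/</code> in object names as a delimiter and derive the &#8220;subdirectories&#8221; on the client side. This Python sketch works on a plain list of object names, like one you might get from a container listing. The names are made up for the example.</p>
<pre>
def list_level(names, prefix="", delimiter="/"):
    """Split names under prefix into (files, subdirectories), one level deep."""
    files = set()
    subdirs = set()
    for name in names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter remains, so this name lives in a deeper "directory".
            subdirs.add(prefix + head + delimiter)
        else:
            files.add(name)
    return sorted(files), sorted(subdirs)


names = [
    "logs/2010/09/app.log.gz",
    "logs/2010/10/app.log.gz",
    "logs/README",
    "index.html",
]
files, subdirs = list_level(names, prefix="logs/")
</pre>
<p>Every deeper level adds characters to each object name, which is why deep or long-named hierarchies run into the name length limit.</p>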
<p><strong>No writing to a byte offset in a file</strong></p>
<p>The only way to update a file is to essentially overwrite it. The system creates a new version of an object each time you upload one with the same name.</p>
<p><strong>No ACLs</strong></p>
<p>Per-container ACLs will probably be added in a later release. Per-object ACLs will probably not be supported, though that could change.</p>
<p><strong>No Append Support</strong></p>
<p>It&#8217;s possible that this may be added at a later time using a versioning trick.</p>
<p><strong>No File Locking</strong></p>
<p>Most filesystems integrate with the kernel to offer advisory locking. This is not possible with OpenStack Object Storage.</p>
<p><strong>Eventual Consistency</strong></p>
<p>Don&#8217;t expect version consistency between multiple nodes when data is being updated.</p>
<p>If you upload a new version of an object and immediately GET that object from another client, you may get a previous version of the file. There is no way to know which version of a given object the system is responding with unless you set version metadata on each object yourself. If there is a network problem, you may get outdated versions of objects, or see objects that were deleted but that the responding node does not yet know are deleted.</p>
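<p>One way to follow that advice is to write an increasing version number into each object&#8217;s metadata and have readers reject anything older than the newest version they have already seen. This Python sketch models response headers as a plain dictionary. The <code>X-Object-Meta-Version</code> header name is a hypothetical choice (custom object metadata in this API is sent as <code>X-Object-Meta-*</code> headers). The sketch also assumes a single writer per object, and rejecting stale reads is only one possible policy; you might retry instead.</p>
<pre>
VERSION_HEADER = "X-Object-Meta-Version"  # hypothetical metadata key


def next_upload_headers(current_version):
    """Headers to send with the next upload of an object (single writer)."""
    return {VERSION_HEADER: str(current_version + 1)}


class VersionTracker:
    """Remembers the newest version seen for each object name."""

    def __init__(self):
        self.newest = {}

    def check(self, name, response_headers):
        """Return True if the response is at least as new as anything seen."""
        # Real HTTP header lookups should be case-insensitive.
        version = int(response_headers.get(VERSION_HEADER, "0"))
        if version >= self.newest.get(name, 0):
            self.newest[name] = version
            return True
        # Older than something already read: a lagging replica answered.
        return False


tracker = VersionTracker()
fresh = tracker.check("report.csv", {VERSION_HEADER: "3"})
stale = tracker.check("report.csv", {VERSION_HEADER: "2"})
</pre>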
<p><strong>No Support for Data Encryption</strong></p>
<p>You must encrypt the data yourself. The current version does not support SSL either. To work around this, use an SSL proxy that terminates SSL sessions on the same network where the OpenStack Object Storage system runs.</p>
<p><strong>Not Compatible With Web Browsers</strong></p>
<p>You must supply a storage token header to authorize each request, and regular web browsers can&#8217;t do this. One solution is a proxy between the client and the system that handles token authentication. This is not a problem if you are using one of the language bindings; they take care of it when you integrate your web app with the system.</p>
<p><strong>Not a Database</strong></p>
<p>It supports no querying or processing of data on the servers. All you can do is list the objects within a given container. There is no way to search based on object metadata. You need to keep your own external search indexes.</p>
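<p>As a minimal illustration of an external index, this Python sketch keeps a metadata map beside the object store and answers queries that the store itself cannot. The in-memory dictionaries and the field names are hypothetical stand-ins. A real deployment would keep the index in a database and update it every time it uploads or deletes an object.</p>
<pre>
from collections import defaultdict


class MetadataIndex:
    """Client-side index mapping metadata values to object names."""

    def __init__(self):
        self.by_field = defaultdict(lambda: defaultdict(set))
        self.objects = {}

    def add(self, name, metadata):
        # Call this right after a successful upload; re-adding replaces.
        self.remove(name)
        self.objects[name] = dict(metadata)
        for field, value in metadata.items():
            self.by_field[field][value].add(name)

    def remove(self, name):
        # Call this after deleting the object from the store.
        for field, value in self.objects.pop(name, {}).items():
            self.by_field[field][value].discard(name)

    def find(self, field, value):
        return sorted(self.by_field[field][value])


index = MetadataIndex()
index.add("calls/0001.wav", {"agent": "alice", "year": "2010"})
index.add("calls/0002.wav", {"agent": "bob", "year": "2010"})
index.add("calls/0003.wav", {"agent": "alice", "year": "2009"})
</pre>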
<p><strong>Don&#8217;t try to frequently update large objects.</strong></p>
<p>All updates produce a new version of an object, because objects are <a href="http://en.wikipedia.org/wiki/Immutable_object" target="_blank">immutable</a>.</p>
<p><strong>Don&#8217;t store unlimited objects per container</strong></p>
<p>You can store as many objects in a container as you wish. However, your per-object upload latency will increase considerably once you reach a certain point. I found the optimal number of objects per container to be just under one million. This number will vary depending on your equipment and how heavy a workload it&#8217;s subjected to.</p>
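<p>One way to stay below that point is to spread objects across a fixed set of containers by hashing the object name, so any client can compute where an object lives without a lookup. The Python sketch below assumes a hypothetical <code>uploads-NN</code> container naming scheme and 16 containers. Choose the count from your own measurements, and keep in mind that changing it later moves most objects to different containers.</p>
<pre>
import hashlib

CONTAINER_COUNT = 16  # assumption: size this from your own benchmarks


def container_for(object_name, count=CONTAINER_COUNT):
    """Map an object name to one of count containers, deterministically."""
    digest = hashlib.md5(object_name.encode("utf-8")).hexdigest()
    # Treat the hash as an integer so names spread evenly across containers.
    return "uploads-%02d" % (int(digest, 16) % count)


names = ["img-%d.jpg" % n for n in range(10000)]
counts = {}
for name in names:
    c = container_for(name)
    counts[c] = counts.get(c, 0) + 1
</pre>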
<p><strong>Changing Swift Into a Filesystem</strong></p>
<p>You might think of using FUSE to access objects and containers in OpenStack Object Storage as files and directories with a filesystem interface, but you&#8217;ll quickly discover that this is only really good for very simple use cases. Most of the things you need to implement what we think of as a filesystem are missing.</p>
<p>If you are a developer, and you are thinking of building a filesystem on top of OpenStack Object Storage using objects as blocks, that could possibly work, but would probably not perform very well compared to existing alternatives that are actually designed for distributed block storage. The blocks would need to be pretty large to keep the network/protocol overhead down. Frequent writing is not likely to work well. Most users of filesystems are not expecting eventual consistency behavior. They want strong data consistency. You would also want some strategy to handle read/write concurrency with some locking capability. Plus, you would need to have a way to keep track of the blocks like a filesystem does in some data structure or database. Frankly speaking, OpenStack Object Storage is probably not the right tool for the job.</p>
<p><strong>Conclusion</strong></p>
<p>You should probably only use OpenStack Object Storage for the use cases it&#8217;s intended for. If what you really want is a clustered filesystem, you&#8217;re probably better off looking at other solutions like <a href="http://en.wikipedia.org/wiki/Lustre_%28file_system%29" target="_blank">Lustre</a>, <a href="http://en.wikipedia.org/wiki/GlusterFS" target="_blank">GlusterFS</a>, <a href="http://en.wikipedia.org/wiki/Global_File_System">GFS</a>, <a href="http://en.wikipedia.org/wiki/OCFS" target="_blank">OCFS</a>, etc. Keep in mind that each of these has its own strengths and weaknesses. Pay particular attention to what they are designed for, and use them accordingly. If you want to use OpenStack Object Storage for something that it was designed for, then you will probably be <strong>very happy with it</strong>. Keep in mind that it&#8217;s a blob storage system. It&#8217;s not a filesystem, not a file server, not a database, etc. To learn more about OpenStack Object Storage, please check out the <a href="http://swift.openstack.org/" target="_blank">Developer Documentation</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/09/openstack-os-is-great-for/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Rackspace and NASA Contribute Huge to Open Source</title>
		<link>http://adrianotto.com/2010/07/rackspace-and-nasa-contribute-huge-to-open-source/</link>
		<comments>http://adrianotto.com/2010/07/rackspace-and-nasa-contribute-huge-to-open-source/#comments</comments>
		<pubDate>Sun, 18 Jul 2010 17:00:16 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[OpenStack]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=319</guid>
		<description><![CDATA[Today Rackspace and NASA announced OpenStack as a coordinated open development project with 28 participating partner companies and growing. NASA contributed source code from its NOVA project for running a large scale computing platform called Nebula. Rackspace contributed source code for its Object Store, used to host the Cloud Files web storage service. The API [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.nasa.gov"><img class="size-full wp-image-324 alignright" title="NASA" src="http://cdn.adrianotto.com/wp-content/uploads/2010/07/nasa-logo.png" alt="" width="155" height="100" /></a><a href="http://www.rackspace.com"><img class="size-full wp-image-327 alignright" title="Rackspace" src="http://cdn.adrianotto.com/wp-content/uploads/2010/07/rackspace-logo.png" alt="" width="150" height="100" /></a>Today <a href="http://www.rackspace.com" target="_blank">Rackspace</a> and <a href="http://www.nasa.gov">NASA</a> <a href="http://www.rackspace.com/information/mediacenter/release.php?id=8489" target="_blank">announced</a> <a href="http://www.openstack.org" target="_blank">OpenStack</a> as a coordinated open development project with <a href="http://openstack.org/community/">28 participating partner companies</a> and growing. NASA contributed source code from its NOVA project for running a large scale computing platform called <a href="http://nebula.nasa.gov/services/" target="_blank">Nebula</a>. Rackspace contributed source code for its Object Store, used to host the Cloud Files web storage service. The API for Cloud Servers, which was previously released with a Creative Commons open license will be used by OpenStack. I was a key contributor to the design of that API, and I&#8217;m honored to have been a part of it. Rackspace has vowed a <a href="http://openstack.org/blog/">commitment to open development</a> of this platform.</p>
<p>This is very exciting for consumers of Cloud Computing because:</p>
<ol>
<li>It allows individual companies to run their own clouds inside their own data centers and on their own equipment using the same scalable technology that powers some of the largest cloud infrastructures in the world.</li>
<li>An individual company can develop applications on their own cloud, and know with confidence that they can run their application on a number of different public clouds without any special adapter software. They need only find a cloud computing or cloud storage provider that uses OpenStack software to ensure compatibility.</li>
<li>If an application is hosted on one OpenStack public cloud, it can be easily moved to another without changes to the application source code, and without using any cloud middleware. This can completely eliminate all fears relating to single-vendor lock-in.</li>
<li>Applications can be run simultaneously in multiple clouds using the exact same software, which needs to implement only a single API for universal access to computing and object storage resources.</li>
</ol>
<p><a href="http://www.openstack.org"><img class="alignnone" title="OpenStack" src="http://www.rackspace.com/images/information/mediacenter/openstack/button-openstackorg.png" alt="" width="280" height="61" /></a><a href="http://www.rackspace.com/information/mediacenter/release.php?id=8489"> <img class="alignnone" title="Press Release" src="http://www.rackspace.com/images/information/mediacenter/openstack/button-pressrelease.png" alt="" width="280" height="61" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/07/rackspace-and-nasa-contribute-huge-to-open-source/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cassandra Gets Promoted!</title>
		<link>http://adrianotto.com/2010/03/cassandra-gets-promoted/</link>
		<comments>http://adrianotto.com/2010/03/cassandra-gets-promoted/#comments</comments>
		<pubDate>Wed, 17 Mar 2010 07:00:20 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=287</guid>
		<description><![CDATA[Today it&#8217;s the one month anniversary of Cassandra graduating to a top level Apache project. It now has a new and improved project URL: http://cassandra.apache.org Recently you may have noticed my writing about Drizzle, but that&#8217;s not the only database system I love. I&#8217;m also a fan of Cassandra, and I&#8217;m proud to work with [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://cassandra.apache.org"><img class="alignright size-full wp-image-225" title="cassandra" src="http://cdn.adrianotto.com/wp-content/uploads/2009/11/cassandra1.png" alt="" width="186" height="101" /></a>Today it&#8217;s the one month anniversary of Cassandra <a href="http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01518.html" target="_blank">graduating</a> to a top level Apache project. It now has a new and improved project URL:<a href="http://cassandra.apache.org" target="_blank"> http://cassandra.apache.org</a></p>
<p>Recently you may have noticed <a href="http://www.rackspacecloud.com/blog/2010/03/13/rackspace-and-drizzle-its-time-to-rethink-everything/" target="_blank">my writing about Drizzle</a>, but that&#8217;s not the only database system I love. I&#8217;m also a fan of Cassandra, and I&#8217;m proud to work with the same <a href="http://www.rackspacecloud.com" target="_blank">company</a> sponsoring both projects.</p>
<p><a href="http://drizzle.org">Drizzle</a> is the way to go if you want an SQL system, and <a href="http://cassandra.apache.org" target="_blank">Cassandra</a> is the way to go if you have a huge data set or if you have a data insert/update rate that&#8217;s too high for an RDBMS to keep up with.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/03/cassandra-gets-promoted/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bandwidth != Network Performance</title>
		<link>http://adrianotto.com/2010/03/bandwidth-network-performance/</link>
		<comments>http://adrianotto.com/2010/03/bandwidth-network-performance/#comments</comments>
		<pubDate>Sun, 14 Mar 2010 17:34:33 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[best practices]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=237</guid>
		<description><![CDATA[You might think that if you want faster internet performance, you can simply get a connection to the internet that has higher bandwidth. When you get a &#8220;faster&#8221; internet connection you may observe faster downloads. But it&#8217;s less frequently the additional bandwidth, and more frequently reduced latency that actually produces increased interactive web performance. This [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://cdn.adrianotto.com/wp-content/uploads/2010/03/rj45.jpg"><img class="alignright size-full wp-image-302" title="rj45" src="http://cdn.adrianotto.com/wp-content/uploads/2010/03/rj45.jpg" alt="" width="240" height="240" /></a>You might think that if you want faster internet performance, you can simply get a connection to the internet that has higher bandwidth. When you get a &#8220;faster&#8221; internet connection you may observe faster downloads. But it&#8217;s less frequently the additional bandwidth, and more frequently reduced latency that actually produces increased interactive web performance. This post explains why.</p>
<p>First of all, let&#8217;s review some definitions:</p>
<ul>
<li><strong>Bandwidth</strong>: The amount of data that can be passed along a communications channel in a given period of time.</li>
<li><strong>Latency</strong>: The time it takes for a packet to cross a network connection, from sender to receiver.</li>
<li><strong>Speed</strong>: Fast and rapid moving, going, traveling, proceeding, or performing; swiftness.</li>
<li><strong>Throughput</strong>: The quantity of data transmitted by a computer network over a given period of time.</li>
</ul>
<p>Now, all of these terms are related, and I want to highlight some of the minutiae here:</p>
<p><strong>Bandwidth</strong></p>
<p>The higher the bandwidth is on a network connection, the more data it&#8217;s capable of transmitting in a given period of time. Higher bandwidth is better.</p>
<p><strong>Latency</strong></p>
<p>This is very important, because latency effectively limits how much of your bandwidth you can use with a synchronous data transmission, such as a TCP/IP download. Lower latency is better, and will yield faster speed.</p>
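<p>The reason is that a TCP sender can only have a limited amount of unacknowledged data (its window) in flight, and must wait roughly one round trip for acknowledgements before sending more. That caps a single stream at about the window size divided by the round-trip time, no matter how much bandwidth the link has. The Python sketch below applies that simplified bound. The 65,535 byte window (the largest possible without TCP window scaling) and the 15 Mb/sec link are illustrative assumptions, not measurements.</p>
<pre>
def max_tcp_throughput(window_bytes, rtt_seconds, bandwidth_bits_per_s):
    """Simplified upper bound on one TCP stream's throughput, in bits/s.

    The stream can never beat the link bandwidth, and it also cannot send
    more than one window of data per round trip.
    """
    window_limited = window_bytes * 8 / rtt_seconds
    return min(window_limited, bandwidth_bits_per_s)


WINDOW = 65535     # bytes; largest window without TCP window scaling
LINK = 15_000_000  # a 15 Mb/sec connection

for rtt_ms in (10, 50, 100, 200):
    bps = max_tcp_throughput(WINDOW, rtt_ms / 1000, LINK)
    print("RTT %3d ms: at most %.1f Mb/s" % (rtt_ms, bps / 1e6))
</pre>
<p>Once latency dominates, doubling the round-trip time halves the achievable throughput, and extra bandwidth on the link does not help.</p>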
<p><strong>Throughput</strong></p>
<p>Throughput is another way of expressing speed. The higher the throughput, the faster your network communications will be. Note that your maximum possible throughput is your bandwidth. Actual throughput is equal to or less than your bandwidth.</p>
<p><strong>Speed</strong></p>
<p>If your network is high speed, you should observe high bandwidth, low latency, and high throughput.</p>
<h3>Latency and Throughput are Inversely Proportional</h3>
<p>For TCP/IP transmissions, the higher your latency is, the lower your throughput will be. Let&#8217;s explore why. The most common use of TCP/IP is for the web, which uses the HTTP protocol. HTTP works by making a TCP/IP connection to a remote server, issuing a request for a document, and then receiving the response. The protocol is text based. A simple HTTP transmission is illustrated below.</p>
<p>Client Request:</p>
<pre>GET / HTTP/1.1
User-Agent: Wget
Host: www.example.com
</pre>
<p>Server Response:</p>
<pre>HTTP/1.1 200 OK
Server: Apache/2.2.3 (Red Hat)
Last-Modified: Tue, 15 Nov 2005 13:24:10 GMT
ETag: "b300b4-1b6-4059a80bfd280"
Accept-Ranges: bytes
Content-Type: text/html; charset=UTF-8
Connection: Keep-Alive
Date: Wed, 18 Nov 2009 22:36:34 GMT
Age: 1010
Content-Length: 438

  Example Web Page

You have reached this web page by typing "example.com",
"example.net",
  or "example.org" into your web browser.

These domain names are reserved for use in documentation and are not available
  for registration. See &lt;a href="http://www.rfc-editor.org/rfc/rfc2606.txt"&gt;RFC
  2606&lt;/a&gt;, Section 3.
</pre>
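<p>Because the protocol is plain text, both sides of that exchange can be produced and read with ordinary string handling. This Python sketch builds the request shown above (HTTP requires CRLF line endings and a blank line to end the headers) and parses the status code and headers from an abbreviated, made-up response. It only illustrates the format. Real clients should use a proper HTTP library.</p>
<pre>
def build_request(path, host, user_agent="Wget"):
    lines = [
        "GET %s HTTP/1.1" % path,
        "User-Agent: %s" % user_agent,
        "Host: %s" % host,
    ]
    # Each line ends in CRLF, and an empty line terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response(raw):
    """Split a raw response into (status code, headers dict, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Split on the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


request = build_request("/", "www.example.com")
raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html; charset=UTF-8\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"hello")
code, headers, body = parse_response(raw)
</pre>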
<p>Here is a trace of the TCP/IP packets that make up that request:</p>
<pre>14:57:47.146665 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: S 3717672264:3717672264(0) win 5840
14:57:47.220092 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: . ack 1 win 183
14:57:47.220309 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: P 1:123(122) ack 1 win 183  (GET Request)
14:57:47.300962 IP 192.0.32.10.80 &gt; 192.168.144.2.39556: P 1:728(727) ack 123 win 4502  (200 OK Response)
14:57:47.300993 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: . ack 728 win 228
14:57:47.302035 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: F 123:123(0) ack 728 win 228
14:57:47.375475 IP 192.0.32.10.80 &gt; 192.168.144.2.39556: . ack 124 win 4502
14:57:47.375499 IP 192.0.32.10.80 &gt; 192.168.144.2.39556: F 728:728(0) ack 124 win 4502
14:57:47.375510 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: . ack 729 win 228
</pre>
<p>The full exchange takes 10 packets, although the trace above leaves out the server&#8217;s SYN-ACK reply. It&#8217;s a three way handshake to set up the TCP session, then a round trip to send the request and receive the response, then two more round trips to close down the connection. Each time the server receives a packet from the client, the connection may wait in the server&#8217;s connection queue to be processed, which can further increase the interactive protocol latency. Consider the impact of high latency on a connection like this. Suppose that each round trip takes 0.2 seconds. That connection would download 727 bytes in 0.8 seconds, a throughput of about 909 Bytes/sec. Your internet connection may have 15 Mb/sec of bandwidth, but that bandwidth did not matter here. Latency caused the throughput to be low.</p>
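<p>The arithmetic in that example can be written as a small Python function. It uses the same simplifying assumptions as above: the exchange costs four round trips (one for the handshake, one for the request and response, and two for the teardown), plus the time to push the payload onto the wire. The 15 Mb/sec default bandwidth is the illustrative figure from the text.</p>
<pre>
def effective_throughput(payload_bytes, rtt_seconds, round_trips=4,
                         bandwidth_bits_per_s=15_000_000):
    """Bytes per second actually achieved for one short HTTP exchange."""
    waiting = round_trips * rtt_seconds
    # Time to send the payload at full link speed (tiny for 727 bytes).
    transfer = payload_bytes * 8 / bandwidth_bits_per_s
    return payload_bytes / (waiting + transfer)


slow = effective_throughput(727, 0.2)   # about 908 bytes/s
fast = effective_throughput(727, 0.02)  # about 9,044 bytes/s
</pre>
<p>In this model, raising the bandwidth tenfold to 150 Mb/sec improves the 0.2 second case by less than half a byte per second, while cutting the round-trip time tenfold raises throughput nearly tenfold.</p>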
<p>Now, you might be wondering why we can&#8217;t just improve networking technology to make latency lower. We can, but that&#8217;s not going to help much, because we are still bounded by the speed of light, among other factors. <strong>The speed of light is slow when you consider the distance it has to travel to cross continents on the earth.</strong> Let&#8217;s look at some math to explain that:</p>
<ul>
<li>The speed of light in vacuum is 299,792,458 m/s.</li>
<li>The speed of light in fiber optic cable is ~200,000,000 m/s.</li>
<li>The distance from Anaheim, CA to New York is 4,494,898 meters</li>
<li>The one-way latency to New York is  4,494,898 / 200,000,000 = 22.47ms</li>
<li>The round-trip time between Anaheim, CA and New York is 44.95ms</li>
<li>The current ping time from Anaheim, CA to New York is 72 ms, as this traceroute shows:
<pre>Tracing the route to sl-gw33-nyc.sprintlink.net (144.228.243.82)
  1 sl-crs1-ana-0-14-2-0.sprintlink.net (144.232.11.9) 0 msec
    sl-crs2-ana-0-14-2-0.sprintlink.net (144.232.11.11) 0 msec
    sl-crs1-ana-0-14-2-0.sprintlink.net (144.232.11.9) 4 msec
  2 sl-crs2-fw-0-13-3-0.sprintlink.net (144.232.19.197) 28 msec
    sl-crs2-fw-0-9-5-0.sprintlink.net (144.232.20.130) 28 msec
    sl-crs1-fw-0-3-3-0.sprintlink.net (144.232.9.65) 28 msec
  3 sl-crs2-kc-0-0-0-2.sprintlink.net (144.232.19.141) 40 msec
    144.232.20.57 40 msec
    sl-crs1-kc-0-5-5-0.sprintlink.net (144.232.24.9) 40 msec
  4 sl-crs2-chi-0-13-5-0.sprintlink.net (144.232.20.109) 52 msec
    sl-crs1-chi-0-1-0-3.sprintlink.net (144.232.18.214) 56 msec
    sl-crs2-chi-0-15-2-0.sprintlink.net (144.232.24.206) 52 msec
  5 sl-crs1-nyc-0-8-0-3.sprintlink.net (144.232.18.123) 72 msec
    sl-crs2-nyc-0-8-0-1.sprintlink.net (144.232.20.119) 72 msec
    sl-crs1-chi-0-10-3-0.sprintlink.net (144.232.9.148) 72 msec
  6 sl-gw33-nyc-14-0-0.sprintlink.net (144.232.6.56) 72 msec *
    sl-gw33-nyc-15-0-0.sprintlink.net (144.232.6.58) 72 msec
</pre></li>
</ul>
<p>The measured round trip time includes all of the switching and routing needed to get the packet through its full round trip. That means that even if all switching and routing were instantaneous, and we had a perfectly straight fiber path between all points on the earth, we could only reduce latency by about 40%. We cannot accelerate the speed of light, so we must accept it as a hard performance boundary.</p>
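<p>Here is the same back-of-the-envelope calculation in Python, using the figures from the list above: light in fiber at about 200,000,000 m/s, a 4,494,898 meter path, and a measured 72 ms round trip. It shows how little room is left for improvement.</p>
<pre>
FIBER_SPEED_M_PER_S = 200_000_000  # approximate speed of light in fiber
DISTANCE_M = 4_494_898             # Anaheim, CA to New York, from above
MEASURED_RTT_MS = 72               # observed round trip time


def min_rtt_ms(distance_m, speed_m_per_s=FIBER_SPEED_M_PER_S):
    """Best possible round trip: there and back at the speed of light."""
    return 2 * distance_m / speed_m_per_s * 1000


floor = min_rtt_ms(DISTANCE_M)
# Share of the measured latency that perfect switching and routing
# could remove, even in the ideal case.
possible_savings = 1 - floor / MEASURED_RTT_MS
print("floor %.2f ms, can cut at most %.0f%% of the latency"
      % (floor, possible_savings * 100))
</pre>
<p>This prints a floor of 44.95 ms and a maximum saving of about 38 percent, which is where the &#8220;about 40%&#8221; figure comes from.</p>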
<h3>Making Web Sites Faster</h3>
<p>If you&#8217;re a web content publisher, you can set up your systems to work around these natural limitations. One way to make interactive web performance faster is to place copies of your data in various geographic locations that are physically closer to your end users. Using a <a href="http://en.wikipedia.org/wiki/Content_delivery_network" target="_blank">CDN</a> for your media content is one way to do this. You can also make your web server as fast as possible so that your dynamically generated content can be processed as quickly as possible. Using <a href="http://memcached.org/" target="_blank">memcached</a> to speed up your web application can help. Also, take a look at some <a href="http://developer.yahoo.com/performance/rules.html" target="_blank">best practices</a> for web developers for good performance.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/03/bandwidth-network-performance/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>CPU Time stolen from a virtual machine?</title>
		<link>http://adrianotto.com/2010/02/time-stolen-from-a-virtual-machine/</link>
		<comments>http://adrianotto.com/2010/02/time-stolen-from-a-virtual-machine/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 16:42:59 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[Xen]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=258</guid>
		<description><![CDATA[Those of you studying the vmstat(8) man page may be wondering what the &#8216;st&#8217; figure is in the CPU column. The manual refers to it as &#8220;Time stolen from a virtual machine&#8220;. More specifically: It&#8217;s the time the hypervisor scheduled something else to run instead of something within your VM. This might be time for [...]]]></description>
			<content:encoded><![CDATA[<p>Those of you studying the vmstat(8) man page may be wondering what the &#8216;st&#8217; figure is in the CPU column. The manual refers to it as &#8220;<em>Time stolen from a virtual machine</em>&#8220;. More specifically:</p>
<p>It&#8217;s the time the hypervisor scheduled something else to run instead of something within your VM. This might be time for another VM, or for the Hypervisor host itself. If no time were stolen, this time would be used to run your CPU workload or your idle thread.</p>
<p>There is some disagreement circulating about whether the Hypervisor will steal idle time, or only preempted time. In other words, it has been suggested that stolen time only counts moments when your local kernel scheduler within the VM wanted to run something but the Hypervisor made that impossible. I have found that stolen time does in fact include borrowed idle time, when the local scheduler actually had nothing to run. For example, here are some vmstat values from a VM with a very low CPU workload:</p>
<pre>
vmstat -S M 1 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0    121     42     53    460    0    0     0     1    0    1  0  0 89  0 10
 0  0    121     42     53    460    0    0     0    28 1014   39  0  0 90  0 10
 0  0    121     42     53    460    0    0     0     0 1016   36  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1024   32  0  0 93  0  7
 0  0    121     42     53    460    0    0     0     0 1019   40  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1015   32  0  0 90  0 10
 0  0    121     42     53    460    0    0     0     0 1022   34  0  0 92  0  8
 0  0    121     42     53    460    0    0     0     0 1016   36  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1013   34  0  0 92  0  8
 0  0    121     42     53    460    0    0     0     0 1028   43  0  0 93  0  7
</pre>
<p>As you can see, user time (us), system time (sy), and iowait time (wa) are zero, but idle time (id) is not 100%. That would normally indicate that your system is doing something, but in this case the true idle time is the sum of the <em>id</em> and <em>st</em> columns.</p>
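<p>You can check that relationship yourself. This Python sketch parses vmstat rows like the ones above, taking the column names from the header line, and adds the <em>id</em> and <em>st</em> columns for each sample. The sample text is copied from three of the rows above.</p>
<pre>
SAMPLE = """\
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0    121     42     53    460    0    0     0     1    0    1  0  0 89  0 10
 0  0    121     42     53    460    0    0     0    28 1014   39  0  0 90  0 10
 0  0    121     42     53    460    0    0     0     0 1024   32  0  0 93  0  7
"""


def parse_vmstat(text):
    """Return one dict per sample row, keyed by the vmstat column names."""
    lines = [line.split() for line in text.strip().splitlines()]
    header = lines[0]
    return [dict(zip(header, map(int, row))) for row in lines[1:]]


for sample in parse_vmstat(SAMPLE):
    busy = sample["us"] + sample["sy"] + sample["wa"]
    print("busy %d%%, idle+stolen %d%%" % (busy, sample["id"] + sample["st"]))
</pre>
<p>For these rows, idle plus stolen comes to 99 or 100 percent (vmstat rounds each column), while us, sy, and wa stay at zero. To feed it live output, first drop the &#8220;procs ...&#8221; banner line that vmstat prints above the column names.</p>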
<p>In this example, I really don&#8217;t care that I have a nonzero <em>st</em> column because my workload is basically idle all the time anyway.</p>
<p>If you are on a cloud host where you purchase a small sliver of a server, you should expect to see nonzero values in this column when you run vmstat. If you have a heavy CPU load and need more processing power, you can solve this problem by upgrading to a larger VM server size so that you command a larger portion of the physical host.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/02/time-stolen-from-a-virtual-machine/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

<!-- Performance optimized by W3 Total Cache. Learn more: http://www.w3-edge.com/wordpress-plugins/

Page Caching using disk: enhanced
Database Caching 4/37 queries in 0.065 seconds using disk: basic
Object Caching 618/694 objects using disk: basic
Content Delivery Network via cdn.adrianotto.com

Served from: adrianotto.com @ 2012-05-18 14:13:12 -->
