<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Adrian Otto&#039;s Blog &#187; Cloud</title>
	<atom:link href="http://adrianotto.com/tag/cloud/feed/" rel="self" type="application/rss+xml" />
	<link>http://adrianotto.com</link>
	<description>For those who care about technical details</description>
	<lastBuildDate>Sun, 08 Apr 2012 00:02:36 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Maximizing Elasticity in the Cloud</title>
		<link>http://adrianotto.com/2011/10/maximizing-elasticity-in-the-cloud/</link>
		<comments>http://adrianotto.com/2011/10/maximizing-elasticity-in-the-cloud/#comments</comments>
		<pubDate>Thu, 20 Oct 2011 14:35:33 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[best practices]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=571</guid>
		<description><![CDATA[Running a production application in the cloud can be great because it&#8217;s possible to add and remove servers from a cluster dynamically using a provisioning API. These automatic additions and removals can be triggered by system utilization levels that you measure, such as concurrent network connections, memory utilization, or CPU utilization. When you need more [...]]]></description>
			<content:encoded><![CDATA[<p>Running a production application in the cloud can be great because it&#8217;s possible to add and remove servers from a cluster dynamically using a provisioning API. These automatic additions and removals can be triggered by system utilization levels that you measure, such as concurrent network connections, memory utilization, or CPU utilization. When you need more capacity, you can add more servers, and when they are not needed anymore, you simply turn them back off. You only pay for the time those servers were running, so it&#8217;s more economical than having a large number of servers deployed all the time.</p>
<p>Most simple web clusters rely on a single database server that all the application servers connect to. This way, all of the application servers have concurrent access to the same data. This can be problematic in the elastic use case when workloads increase, and more servers are added to the cluster. If the work is bottlenecked on storing or accessing data in the database server, adding application servers will not help. It will actually make the problem worse, because every new application server sends more concurrent requests to the same database.</p>
<p>I spoke on a panel at Zendcon yesterday, which was covered in an <a title="Infoworld Article" href="http://www.infoworld.com/d/cloud-computing/security-remains-top-concern-cloud-app-builders-176707" target="_blank">Infoworld article</a> where my remarks were published. The article says:</p>
<blockquote><p>Panelists also debated use of SQL and database connectivity in clouds. SQL as a design pattern for storage &#8220;is not ideal for cloud applications,&#8221; said Adrian Otto, senior technical strategist for Rackspace Cloud. Afterward, he described SQL issues as &#8220;typically the No. 1 bottleneck&#8221; to elasticity in the cloud. With elasticity, applications use more or fewer application servers based on demand. Otto recommended that developers who want elasticity should have a decentralized data model that scales horizontally. &#8220;SQL itself isn&#8217;t the problem. The problem is row-oriented data in an application,&#8221; which causes performance bottlenecks, said Otto.</p></blockquote>
<p>The author, Paul Krill, did a good job here of accurately reporting my position on this subject. Data stored in SQL databases is arranged in tables of rows and columns. A new piece of data adds a new row. Each row has multiple columns that separate the fields of a single record in the table. The truth is that most web applications work very well with this data design pattern. Those should continue to use SQL databases with row-oriented data. However, there are some applications where data may be arranged differently to make reading the data more efficient.</p>
<p>If you have a big table of data, and you want to pull out just a little bit of it using a query, the database server must determine the location of that data in the table by consulting the table&#8217;s index, and return the desired portion that matches the constraints given in the query. This makes the reading of data relatively expensive from a computational perspective. If data were instead arranged in lots of columns, it could be retrieved more efficiently, and the data could be more easily distributed over a larger number of servers, yielding the horizontal scalability that cloud applications want. This works very well in cases where the number of reads is very high, but the data is not updated very frequently in proportion to the reads.</p>
<p>Let&#8217;s use a blog application as an example. Blog posts are written once, and maybe updated a few times, possibly once each time a comment is submitted. However, on a busy web site, a blog post may be read millions of times. If the posts were stored in a column-oriented storage system like <a title="Cassandra" href="http://cassandra.apache.org/" target="_blank">Cassandra</a>, they could be quickly and easily retrieved using the id number of the blog post. The listing of recent blog posts can also be arranged in a column so that the front page of the blog site with the listing of the articles can be generated. Using this approach requires that the data be properly arranged as it&#8217;s stored, putting the computational burden on the (infrequent) write rather than on the (frequent) read.</p>
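<p>Here is a rough Python sketch of that idea, using plain dictionaries as a stand-in for a column store. It is not the Cassandra API, and the names <em>posts</em>, <em>recent_posts</em>, <em>save_post</em>, <em>read_post</em>, and <em>front_page</em> are made up for this illustration. The point is only that the arranging happens when a post is written, so both kinds of read become simple lookups.</p>

```python
from collections import OrderedDict

# In-memory stand-ins for two column families (hypothetical names,
# not Cassandra's API): one keyed by post id, one holding the
# precomputed front-page listing, newest first.
posts = {}
recent_posts = OrderedDict()
MAX_RECENT = 3  # assumed length of the front-page listing


def save_post(post_id, title, body):
    """Do the arranging at write time (the infrequent operation)."""
    is_new = post_id not in posts
    posts[post_id] = {"title": title, "body": body}
    if is_new:
        # Put the new post at the front of the listing and trim the tail.
        recent_posts[post_id] = title
        recent_posts.move_to_end(post_id, last=False)
        while len(recent_posts) > MAX_RECENT:
            recent_posts.popitem(last=True)
    elif post_id in recent_posts:
        # An edit keeps the post's position and only refreshes the title.
        recent_posts[post_id] = title


def read_post(post_id):
    """A read is a single key lookup on the post id."""
    return posts[post_id]


def front_page():
    """The listing was arranged on write, so reading it is a plain copy."""
    return list(recent_posts.items())
```

<p>Every call to <em>save_post</em> does a little extra work so that <em>read_post</em> and <em>front_page</em>, which run far more often, never have to search or sort anything.</p>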
<p>Using a distributed system to store data in columns allows the data to be evenly distributed over an arbitrary number of servers, eliminating the central data bottleneck. Adding more servers in the correct proportion of application servers and storage servers can result in true horizontal scalability, meaning that the capacity increases in direct proportion to the number of servers in the cluster.</p>
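<p>To make the &#8220;evenly distributed&#8221; part concrete, here is a simplified Python sketch that picks a storage server for each key by hashing it. The function name <em>server_for</em> and the server names are hypothetical. Note that this uses plain hash-modulo placement to keep the example short; real systems such as Cassandra use consistent hashing instead, so that adding a server moves only a fraction of the keys.</p>

```python
import hashlib


def server_for(key, servers):
    """Pick a server for a key by hashing the key (hash-modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Hypothetical storage servers, for illustration only.
SERVERS = ["storage-a", "storage-b", "storage-c"]

# Count how many of 3000 post keys land on each server.
counts = dict.fromkeys(SERVERS, 0)
for post_id in range(3000):
    counts[server_for("post:%d" % post_id, SERVERS)] += 1
print(counts)
```

<p>Because the hash spreads keys almost uniformly, each server ends up with roughly a third of the posts, and no single machine has to hold or serve all of the data.</p>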
<p>Why doesn&#8217;t everyone do this already? For some good reasons:</p>
<ol>
<li>The concept of running applications in clouds is still relatively new. The related technology is still maturing.</li>
<li>Existing software tends to use SQL already. If you want to use an existing CMS platform, chances are it will require a central SQL database.</li>
<li>Most heavy-read workloads can be scaled well using data caching techniques. If applications don&#8217;t write data very often, it may not be necessary to scale beyond a single database server.</li>
<li>You must anticipate exactly how the application will use the data, and arrange it just right.</li>
<li>It may be harder to analyze the data. Once your data is arranged in a column store, you may not be able to query it in arbitrary ways. You may only be able to pull it out using its id numbers, or by systematically scanning all of it to find the parts you want.</li>
<li>Distributed data storage (aka: NoSQL) systems like Cassandra, HBase, Redis, etc. are complicated, and there is a considerable learning curve associated with setting them up and maintaining them. In some cases these systems are not as good in terms of data durability or data consistency as the prevailing SQL database systems. These tradeoffs can be difficult to navigate.</li>
<li>Today&#8217;s software developers are very familiar with SQL as a data storage and access paradigm. They can very quickly develop software that relies on the ACID qualities of a SQL database.</li>
</ol>
<p>If you have an application that you want to deploy into a cloud, and you want it to be very elastic, you should think about how you arrange your data. If you use a centralized data design, you will probably have scalability bottlenecks when you add lots of servers. You should aim to decentralize the data in a way that you can easily add more servers to horizontally scale the environment, and not stumble on the limits of the database server. This is particularly important in situations where you need the application to write a lot of data, and a cache is not a suitable solution for you.</p>
<p>Over time, the reasons not to use column-oriented data will begin to shrink, and better tools and services will make it easier to do. Until then, I suggest that you carefully consider whether you need maximum elasticity. If not, then it&#8217;s perfectly appropriate to keep using the same centralized row-oriented data paradigm. Use a cache like memcached in cases where you have heavy reads, and when it&#8217;s acceptable to show slightly outdated information to readers. The truth is that traditional solutions work really well for most web applications. However, if you have one of the rarer situations where you need true horizontal scalability, take a good look at a different arrangement for your data, and the systems and tools to make that possible for you in the cloud.</p>
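<p>For the caching case, the usual pattern looks something like the Python sketch below. To keep it runnable on its own, a dictionary stands in for memcached, and <em>slow_database_read</em>, <em>get_post</em>, and the 60 second lifetime are all assumptions made for the example. The lifetime is exactly the &#8220;slightly outdated&#8221; window that readers may see.</p>

```python
import time

CACHE_TTL_SECONDS = 60  # how stale a reader may see data (assumed value)
_cache = {}  # key -> (expires_at, value); a stand-in for memcached


def slow_database_read(post_id):
    """Placeholder for a query against the central SQL database."""
    return {"id": post_id, "title": "Post %d" % post_id}


def get_post(post_id, now=None, loader=slow_database_read):
    """Cache-aside read: serve from the cache while fresh, else reload."""
    now = time.time() if now is None else now
    key = "post:%d" % post_id
    entry = _cache.get(key)
    if entry is not None and entry[0] > now:
        return entry[1]  # cache hit, the database is not touched
    value = loader(post_id)  # cache miss or expired entry
    _cache[key] = (now + CACHE_TTL_SECONDS, value)
    return value
```

<p>With heavy reads, almost every request is answered from the cache, so the single database server only sees one read per post per minute no matter how many application servers you add.</p>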
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2011/10/maximizing-elasticity-in-the-cloud/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Drizzle is now BETA</title>
		<link>http://adrianotto.com/2010/09/drizzle-is-now-beta/</link>
		<comments>http://adrianotto.com/2010/09/drizzle-is-now-beta/#comments</comments>
		<pubDate>Wed, 29 Sep 2010 19:21:00 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=376</guid>
		<description><![CDATA[Today Drizzle enters BETA. Drizzle is an evolution of MySQL that&#8217;s been simplified, streamlined, and modernized. This long awaited database started from an idea in 2005 to fork MySQL, keep the good parts, and rip out or replace all the stuff that&#8217;s not good for web applications. This idea has been endorsed by large corporate [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://drizzle.org"><img class="alignright size-full wp-image-222" title="Drizzle Logo" src="http://cdn.adrianotto.com/wp-content/uploads/2009/11/drizzle64.png" alt="" width="64" height="64" /></a>Today <a href="http://drizzle.org" target="_blank">Drizzle</a> enters BETA. Drizzle is an evolution of MySQL that&#8217;s been simplified, streamlined, and modernized. This long-awaited database started from an idea in 2005 to fork MySQL, keep the good parts, and rip out or replace all the stuff that&#8217;s not good for web applications. This idea has been endorsed by large corporate sponsors, including <a href="http://www.sun.com" target="_blank">Sun Microsystems</a> in the early days, and now <a href="http://www.rackspace.com/">Rackspace</a>. Most of the code is contributed by the <a href="https://launchpad.net/drizzle">developer community</a>, which is made up of a very talented group of open source developers with core committers from four different companies. More about the Drizzle project:</p>
<h3>Charter</h3>
<ul>
<li>A database optimized for Cloud infrastructure and Web applications</li>
<li>Design for massive concurrency on modern multi-CPU architecture</li>
<li>Optimize memory for increased performance and parallelism</li>
<li>Open source, open community, open design</li>
</ul>
<h3>Scope</h3>
<ul>
<li>Re-designed modular architecture providing plugins with defined APIs</li>
<li>Simple design for ease of use and administration</li>
<li>Reliable, ACID transactional</li>
</ul>
<p>There are many exciting changes, such as optimizing everything for 64-bit CPUs and multi-core. You can hardly even buy 32-bit and single-core servers nowadays, even if you want them. It makes no sense to have software that&#8217;s optimized for these antiquated hardware designs. No effort is spent optimizing software to work with rotational hard drives because SSDs are the way of the future. All the language collations have simply been replaced with UTF-8 only, because the web uses UTF-8. Plus, this is tested with 41 different language translations. Drizzle has a new scheduler. The legacy MySQL scheduler was designed to work for a thread-per-session setup. In Drizzle, sessions are handled independently from the threads. The new scheduler allows this to work.</p>
<p>Drizzle uses InnoDB as its default storage engine, which is great for OLTP. It also supports the <a href="http://www.primebase.org/" target="_blank">PBXT</a> storage engine. There are available plugins for the InnoDB Embedded Engine and <a href="http://www.haildb.com/" target="_blank">HailDB</a> which will soon be the new default. DDL Operations (like ALTER TABLE) can actually roll back in the event that something goes wrong in the process, rather than leaving you with incomplete or corrupt data.</p>
<p>The code base in Drizzle has been fully modernized, and brought up to today&#8217;s standards of C++ with extensive use of the <a href="http://en.wikipedia.org/wiki/Standard_Template_Library">C++ STL</a> to replace MySQL&#8217;s usage of obscure custom data type implementations that offered no real benefit compared to what the STL has today. Another example of improvements in this area is the replacement of the legacy REGEX implementation with a more standard library. All of these changes reduce the amount of Drizzle source code dramatically compared to MySQL. Less code and simpler code means fewer bugs, plain and simple. Drizzle is well on its way to being an ideal fit for web applications that need a reliable, high-performance transactional database.</p>
<h3>Features in Drizzle7 Beta</h3>
<ul>
<li>New micro kernel</li>
<li>Migration Tool</li>
<li>Instance Catalog Support</li>
<li>Universal Replication</li>
<li>User query analysis</li>
<li>Multi-core Support</li>
</ul>
<h3>What &#8220;Beta&#8221; means</h3>
<ul>
<li>Your data is safe. Transactional engine by default and stable for over 2 years.</li>
<li>Upgrade the system in-place without exporting/importing data.</li>
<li>Replication is still being tested.</li>
</ul>
<p>In Microsoft terms, it means that this project would have launched about a year ago. In Google terms, it probably would have launched six months ago. Simply put, if you trust your data to a MySQL system today running InnoDB, you should feel comfortable trying Drizzle. There have been some changes to the InnoDB setup, such as the elimination of the FRM files from disk, which eliminates possible inconsistency between the state on disk and the state in InnoDB. I am in the process of moving a few of my production applications to use the Drizzle Beta. If you&#8217;re an accomplished system administrator and DBA, you should seriously consider putting at least one of your production applications on Drizzle now, and see how it works for you.</p>
<h3>What&#8217;s Next?</h3>
<ul>
<li>Beta <a href="https://launchpad.net/drizzle/+announcement/6840" target="_blank">announced today 2010-09-29</a>.</li>
<li>GA February 2011</li>
<li>GA May 2011 for Multi-Tenancy features that allow an arbitrary number of logical databases (Schemas, Tables, etc.) to exist concurrently with full data isolation between them. This allows for individual security and resource controls (Threads, Memory, IO), and individual database backups, rather than system level backups. This feature will be called &#8220;Catalogs&#8221;.</li>
</ul>
<h3>Download Drizzle</h3>
<p>Time to get started with the beta. Download <a href="https://launchpad.net/drizzle/elliott/2010-09-27">the beta</a> today!</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/09/drizzle-is-now-beta/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Cassandra Gets Promoted!</title>
		<link>http://adrianotto.com/2010/03/cassandra-gets-promoted/</link>
		<comments>http://adrianotto.com/2010/03/cassandra-gets-promoted/#comments</comments>
		<pubDate>Wed, 17 Mar 2010 07:00:20 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=287</guid>
		<description><![CDATA[Today it&#8217;s the one month anniversary of Cassandra graduating to a top level Apache project. It now has a new and improved project URL: http://cassandra.apache.org Recently you may have noticed my writing about Drizzle, but that&#8217;s not the only database system I love. I&#8217;m also a fan of Cassandra, and I&#8217;m proud to work with [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://cassandra.apache.org"><img class="alignright size-full wp-image-225" title="cassandra" src="http://cdn.adrianotto.com/wp-content/uploads/2009/11/cassandra1.png" alt="" width="186" height="101" /></a>Today is the one-month anniversary of Cassandra <a href="http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01518.html" target="_blank">graduating</a> to a top-level Apache project. It now has a new and improved project URL: <a href="http://cassandra.apache.org" target="_blank">http://cassandra.apache.org</a></p>
<p>Recently you may have noticed <a href="http://www.rackspacecloud.com/blog/2010/03/13/rackspace-and-drizzle-its-time-to-rethink-everything/" target="_blank">my writing about Drizzle</a>, but that&#8217;s not the only database system I love. I&#8217;m also a fan of Cassandra, and I&#8217;m proud to work with the same <a href="http://www.rackspacecloud.com" target="_blank">company</a> sponsoring both projects.</p>
<p><a href="http://drizzle.org">Drizzle</a> is the way to go if you want an SQL system, and <a href="http://cassandra.apache.org" target="_blank">Cassandra</a> is the way to go if you have a huge data set or if you have a data insert/update rate that&#8217;s too high for an RDBMS to keep up with.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/03/cassandra-gets-promoted/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CPU Time stolen from a virtual machine?</title>
		<link>http://adrianotto.com/2010/02/time-stolen-from-a-virtual-machine/</link>
		<comments>http://adrianotto.com/2010/02/time-stolen-from-a-virtual-machine/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 16:42:59 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[Xen]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=258</guid>
		<description><![CDATA[Those of you studying the vmstat(8) man page may be wondering what the &#8216;st&#8217; figure is in the CPU column. The manual refers to it as &#8220;Time stolen from a virtual machine&#8221;. More specifically: It&#8217;s the time the hypervisor scheduled something else to run instead of something within your VM. This might be time for [...]]]></description>
			<content:encoded><![CDATA[<p>Those of you studying the vmstat(8) man page may be wondering what the &#8216;st&#8217; figure is in the CPU column. The manual refers to it as &#8220;<em>Time stolen from a virtual machine</em>&#8221;. More specifically:</p>
<p>It&#8217;s the time the hypervisor scheduled something else to run instead of something within your VM. This might be time for another VM, or for the Hypervisor host itself. If no time were stolen, this time would be used to run your CPU workload or your idle thread.</p>
<p>There is some disagreement circulating about whether the Hypervisor will steal idle time, or only preempted time. In other words, it has been suggested that stolen time only counts moments when your local kernel scheduler within the VM wanted to run something but the Hypervisor made that impossible. I have found that stolen time does in fact count borrowed idle time, where the local scheduler actually had nothing to run. For example, here are some vmstat values from a VM that&#8217;s got a very low CPU workload on it:</p>
<pre>
vmstat -S M 1 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0    121     42     53    460    0    0     0     1    0    1  0  0 89  0 10
 0  0    121     42     53    460    0    0     0    28 1014   39  0  0 90  0 10
 0  0    121     42     53    460    0    0     0     0 1016   36  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1024   32  0  0 93  0  7
 0  0    121     42     53    460    0    0     0     0 1019   40  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1015   32  0  0 90  0 10
 0  0    121     42     53    460    0    0     0     0 1022   34  0  0 92  0  8
 0  0    121     42     53    460    0    0     0     0 1016   36  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1013   34  0  0 92  0  8
 0  0    121     42     53    460    0    0     0     0 1028   43  0  0 93  0  7
</pre>
<p>As you can see, user time (us), system time (sy), and iowait time (wa) are zero, but idle time (id) is not 100%. That would normally indicate that your system is doing something, but in this case the missing time shows up in the <em>st</em> column: the true idle time is actually the sum of the <em>id</em> and <em>st</em> columns.</p>
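<p>If you want to keep an eye on this over time, the check is easy to script. The Python sketch below averages the CPU columns from captured vmstat text. The function name <em>average_cpu_columns</em> and the <em>SAMPLE</em> variable are made up for this example, and it assumes the column layout shown above, where the header row names the <em>us</em>, <em>sy</em>, <em>id</em>, <em>wa</em>, and <em>st</em> columns.</p>

```python
def average_cpu_columns(vmstat_output):
    """Average the us, sy, id, wa and st columns of vmstat text output."""
    wanted = ("us", "sy", "id", "wa", "st")
    header = None
    totals = dict.fromkeys(wanted, 0)
    samples = 0
    for line in vmstat_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "us" in fields and "st" in fields:
            header = fields  # the row that names each column
            continue
        if header is None or len(fields) != len(header):
            continue  # banner line, or a row that does not match the header
        if not all(field.isdigit() for field in fields):
            continue
        row = dict(zip(header, fields))
        for name in wanted:
            totals[name] += int(row[name])
        samples += 1
    if samples == 0:
        raise ValueError("no vmstat sample rows found")
    return {name: totals[name] / samples for name in wanted}


# The first three samples from the output above.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0    121     42     53    460    0    0     0     1    0    1  0  0 89  0 10
 0  0    121     42     53    460    0    0     0    28 1014   39  0  0 90  0 10
 0  0    121     42     53    460    0    0     0     0 1016   36  0  0 91  0  9
"""

averages = average_cpu_columns(SAMPLE)
print("idle plus stolen:", averages["id"] + averages["st"])
```

<p>On a mostly idle VM, the sum of <em>id</em> and <em>st</em> should stay close to 100. A high <em>st</em> alongside high <em>us</em> or <em>sy</em> is the case worth worrying about.</p>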
<p>In this example, I really don&#8217;t care that I have a nonzero <em>st</em> column because my workload is basically idle all the time anyway.</p>
<p>If you are on a cloud host where you purchase a small sliver of a server, you should expect to see nonzero values in this column when you run vmstat. If you have a heavy CPU load and need more processing power, you can solve this problem by upgrading to a larger VM server size so that you command a larger portion of the physical host.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/02/time-stolen-from-a-virtual-machine/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Writing Code That Scales</title>
		<link>http://adrianotto.com/2009/11/writing-code-that-scales/</link>
		<comments>http://adrianotto.com/2009/11/writing-code-that-scales/#comments</comments>
		<pubDate>Wed, 18 Nov 2009 16:38:59 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=234</guid>
		<description><![CDATA[Check my post from today on the Rackspace Cloud blog. It covers several tips on planning ahead when writing a web-scale application.]]></description>
			<content:encoded><![CDATA[<p>Check my <a href="http://www.rackspacecloud.com/blog/2009/11/18/writing-code-that-scales/" target="_blank">post from today</a> on the Rackspace Cloud blog. It covers several tips on planning ahead when writing a web-scale application.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2009/11/writing-code-that-scales/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Remus Project: Full Memory Mirroring!</title>
		<link>http://adrianotto.com/2009/11/remus-project-full-memory-mirroring/</link>
		<comments>http://adrianotto.com/2009/11/remus-project-full-memory-mirroring/#comments</comments>
		<pubDate>Thu, 12 Nov 2009 22:30:10 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[VoIP]]></category>
		<category><![CDATA[Remus]]></category>
		<category><![CDATA[Xen]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=163</guid>
		<description><![CDATA[Imagine that you have a cluster with two machines side by side in an active/standby configuration. Let&#8217;s say you have your data replicated, and the systems are basically identical except for the IP address and hostname. You can use heartbeat to share an IP address such that if the primary fails, the secondary takes over. [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-166" title="Mirrored Servers" src="http://cdn.adrianotto.com/wp-content/uploads/2009/11/server-mirror.jpg" alt="Mirrored Servers" width="130" height="90" />Imagine that you have a cluster with two machines side by side in an active/standby configuration. Let&#8217;s say you have your data replicated, and the systems are basically identical except for the IP address and hostname. You can use heartbeat to share an IP address such that if the primary fails, the secondary takes over. You can also perform the equivalent using &#8220;live migration&#8221; features in a Xen or VMware hypervisor. The problem with these sorts of failovers is that any active TCP/IP sessions end up getting broken, and new connections must be established between clients and the application.</p>
<p>Okay, here&#8217;s something that fixes that problem: the <a href="http://dsg.cs.ubc.ca/remus/" target="_blank">Remus Project</a>. The approach is brilliant. At regular intervals it ships the changed memory pages from one host to the other. Memory reads do not need to be replicated, only writes, and writes to the same location don&#8217;t all need to be replicated, only the most recent write. The primary node simply delays its response to TCP/IP packets (output buffering) until after it has confirmed that the standby node has received the replicated memory data. Very very clever.</p>
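<p>The mechanics are easier to see in a toy model. The Python sketch below is not how Remus is implemented (the real thing works inside the hypervisor on the VM&#8217;s memory), and the class and method names are invented for this illustration. It only shows the two ideas from the paragraph above: ship just the latest value of each changed location, and hold outgoing packets until the standby has acknowledged the checkpoint.</p>

```python
class Backup:
    """Standby node: applies replicated changes and acknowledges them."""

    def __init__(self):
        self.memory = {}

    def apply(self, changes):
        self.memory.update(changes)
        return True  # acknowledgement back to the primary


class Primary:
    """Active node: tracks dirty locations and buffers its output."""

    def __init__(self, backup):
        self.backup = backup
        self.memory = {}
        self.dirty = set()  # locations written since the last checkpoint
        self.held_output = []  # packets not yet visible to clients
        self.released_output = []  # packets actually sent to clients

    def write(self, address, value):
        self.memory[address] = value
        self.dirty.add(address)  # repeated writes collapse into one entry

    def send(self, packet):
        self.held_output.append(packet)  # output buffering

    def checkpoint(self):
        # Only the most recent value of each dirty location is shipped.
        changes = {address: self.memory[address] for address in self.dirty}
        if self.backup.apply(changes):
            self.dirty.clear()
            self.released_output.extend(self.held_output)
            self.held_output.clear()
        return len(changes)
```

<p>Because clients never see a reply that the standby doesn&#8217;t already have the state for, the standby can take over mid-session without the client noticing a gap.</p>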
<p>Here are the key features listed on the Remus web site:</p>
<ul>
<li>The backup VM is an <em>exact copy</em> of the primary VM. When     failure happens, it continues running on the backup host as if     failure had never occurred.</li>
<li>The backup is <em>completely up-to-date</em>. Even active TCP     sessions are maintained without interruption.</li>
<li>Protection is <em>transparent</em>. Existing guests can be     protected without modifying them in any way.</li>
</ul>
<p><a href="http://www.xen.org/"><img class="alignright size-full wp-image-170" title="Xen Logo" src="http://cdn.adrianotto.com/wp-content/uploads/2009/11/xen_logo.gif" alt="Xen Logo" width="149" height="67" /></a>Okay, I&#8217;ve been running HA systems in multiple geographies now for about a decade. I&#8217;ve experimented with lots and lots of clustering and replication technology. Most of the time when I hear about something new, I cringe and wonder if it&#8217;s just another thing that&#8217;s using the same old tricks I&#8217;ve been using for years, or if it&#8217;s something truly innovative and truly <a href="http://en.wikipedia.org/wiki/Open_source" target="_blank">open source</a>. Before you go making comments that VMware has this feature or that feature, relax. This post is not about VMware. It&#8217;s about open source Xen.</p>
<p>Now, you might already be wondering if this would work if you separated the two nodes to run in separate locations. The short answer is maybe. You would still need a very clever network configuration to re-route your traffic dynamically to the new location. For those of us that do operate our own Autonomous Systems, that may seem possible with a BGP route update. But here&#8217;s the bummer&#8230; The additional latency it would introduce would bring your performance to a screeching halt. You could probably afford to have about 25ms of average latency between two locations and get away with it. The cut-over would still be better than nothing, but you&#8217;d better have a rock solid network in there, and you&#8217;d better be ready to pump lots of bandwidth over it. Plan for 100Mb/sec if you checkpoint every 100ms.</p>
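<p>To put that last figure in perspective, here is a quick back-of-the-envelope calculation in Python. The helper name <em>checkpoint_budget</em> is made up for this example, and it simply reads the numbers above the other way round: how much changed memory fits into a single checkpoint interval on a link of a given speed.</p>

```python
def checkpoint_budget(link_megabits_per_sec, interval_ms):
    """Return (megabits, megabytes) that fit in one checkpoint interval."""
    megabits = link_megabits_per_sec * interval_ms / 1000.0
    return megabits, megabits / 8.0


megabits, megabytes = checkpoint_budget(100, 100)
print(megabits, "Mb per checkpoint, or", megabytes, "MB")
```

<p>At 100Mb/sec and a 100ms interval, each checkpoint can carry about 10 megabits, or 1.25 megabytes, of changed memory. A workload that dirties memory faster than that will need either a fatter pipe or a longer interval.</p>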
<p><a href="http://www.memcached.org/"><img class="size-full wp-image-164 alignright" style="margin-left: 10px; margin-right: 10px;" title="memcached logo" src="http://cdn.adrianotto.com/wp-content/uploads/2009/11/memcache_logo.png" alt="memcache_logo" hspace="10" width="76" height="75" /></a>This would be great for a read-heavy application like a web cache, or some <a href="http://www.memcached.org" target="_blank">memcached</a> applications. People ask on the memcached mailing list all the time how they can set up replication and HA. The answer is always &#8220;it&#8217;s a cache&#8230; not a database.&#8221; Well, for those of you that want to do HA for a memcached system, give Remus a try.</p>
<p><img class="alignright size-full wp-image-174" title="trixbox logo" src="http://cdn.adrianotto.com/wp-content/uploads/2009/11/trixbox_logo.png" alt="trixbox logo" />Let&#8217;s not stop there. Imagine you have a SIP call control platform or <a href="http://www.trixbox.org/" target="_blank">Trixbox</a> system, and you don&#8217;t want to lose all your active calls in the event of a system crash. Pretty much any mission-critical application that supports long-running connections over TCP/IP could benefit.</p>
<p>Remus has been around for some time, so why am I so excited now? It&#8217;s now part of <a href="http://www.xen.org" target="_blank">Xen</a>! You don&#8217;t need to do anything special on the master or slave node to use it! Whoot! Now I&#8217;m impressed. Anyone out there have experience running it? I&#8217;d love to hear your thoughts.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2009/11/remus-project-full-memory-mirroring/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Advice for backing up your Macs</title>
		<link>http://adrianotto.com/2009/11/advice-for-backing-up-your-macs/</link>
		<comments>http://adrianotto.com/2009/11/advice-for-backing-up-your-macs/#comments</comments>
		<pubDate>Thu, 12 Nov 2009 06:23:21 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Backup]]></category>
		<category><![CDATA[OSX]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=149</guid>
		<description><![CDATA[My wife asked me today if I could give a colleague some advice for how to back up a bunch of Macs. I&#8217;ll share my advice with you here. Over the past two decades I&#8217;ve used so many different backup systems and software and hardware combinations, I can&#8217;t even count them all. So this raises the [...]]]></description>
			<content:encoded><![CDATA[<p>My wife asked me today if I could give a colleague some advice for how to back up a bunch of Macs. I&#8217;ll share my advice with you here. Over the past two decades I&#8217;ve used so many different backup systems and software and hardware combinations, I can&#8217;t even count them all. So this raises the question, what do I do at home?</p>
<p><a href="http://www.apple.com/findouthow/mac/#timemachinebasics"><img class="size-full wp-image-474 alignleft" style="margin-left: 10px; margin-right: 10px;" title="Time Machine" src="http://cdn.adrianotto.com/wp-content/uploads/2009/11/hero_timemachine_lg.jpg" alt="" width="77" height="77" /></a></p>
<p>I use the <a href="http://www.apple.com/macosx/what-is-macosx/time-machine.html">Time Machine</a> software built into Leopard (and newer) <a href="http://www.apple.com/macosx/" target="_blank">OSX</a>. I use a locally connected USB 2.0 drive. A FireWire drive would also be good. Here is a drive that I like because it has lots of capacity, is reasonably affordable and compact, and runs quietly.</p>
<p><a href="http://www.buy.com/prod/fantom-greendrive-1tb-usb-2-0-and-esata-external-hard-drive-2-year/q/loc/101/208503758.html" target="_blank"><img class="alignright size-full wp-image-154" title="Fantom Drive" src="http://cdn.adrianotto.com/wp-content/uploads/2009/11/FantomDrive.png" alt="Fantom Drive" width="163" height="190" />Fantom GreenDrive Pro 2TB eSATA and USB 2.0 7200RPM 32MB External Hard Drive</a></p>
<p>Another that&#8217;s half the capacity, but cheaper:</p>
<p><a href="http://www.buy.com/prod/fantom-greendrive-1tb-usb-2-0-and-esata-external-hard-drive-2-year/q/loc/101/208503758.html" target="_blank">Fantom GreenDrive 1TB USB 2.0 and eSATA External Hard Drive</a></p>
<p>Now, for home use a 2TB drive is probably enough for all your computers. At first I networked them all together to use just one drive on one of my computers, shared to all the others, so that all the backups were on the one big drive. I later decided that every computer should have its own drive for backups. Why? A few reasons:</p>
<ol>
<li>To conserve electricity. Backup snapshots should be taken and archived while you are using the computer. With a shared drive, the host computer may not respond over the network when it is asleep, depending on how it&#8217;s set up, meaning you need to keep that host machine powered up all the time, wasting electricity.</li>
<li>Each computer does its backups when it gets used, and in the idle time before it falls asleep again. It works much better for me this way.</li>
<li>Immediate restores. Having a local drive on each computer makes restoration super fast. It&#8217;s not like a network or tape backup where you need to wait for your data to transfer back onto your hard drive before you can begin using it.</li>
</ol>
<p>It&#8217;s easy to set up Time Machine. Connect the drive, open &#8220;Time Machine Preferences&#8221; and select the drive.</p>
<p>I re-initialized mine using the disk utility first so that it had a journaled MacOS filesystem on it instead of the default FAT partitioning that comes from the factory.</p>
<p>One really nice thing about Time Machine is that you can easily revert to a prior point in time in the event you accidentally mess something up, get a virus, or whatever. It&#8217;s about the easiest tool I&#8217;ve ever used. It automatically rotates backups hourly, daily, weekly, etc., and deletes old backups to make room for new ones. It&#8217;s totally automatic, whereas with other tools you need to set that all up yourself.</p>
<p>This sort of local backup does not help if your house or office gets burglarized or burns down because you lose both the primary and backup copy of the data.</p>
<p><img class="alignright size-full wp-image-156" title="Jungle Disk" src="http://cdn.adrianotto.com/wp-content/uploads/2009/11/jd-logo.png" alt="Jungle Disk" width="222" height="50" />Another option is to use <a href="http://www.jungledisk.com/" target="_blank">JungleDisk</a> to back your data up to the cloud. That has the advantage that you only pay for the storage you actually use, and the backups are off site, so if you have a theft or fire, you can still restore, potentially somewhere else. A disadvantage is that it requires adequate internet connectivity. Your upload speed needs to be fast enough to accommodate all of the data you produce within each backup interval. If your network is already constrained on available bandwidth, running backups over it could aggravate matters. In short, if you have a big fat internet connection, then use <a href="http://www.jungledisk.com/" target="_blank">JungleDisk</a>.</p>
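<p>As a rough way to judge whether an upload link is fast enough for cloud backups, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: the function names are made up for illustration, it assumes decimal units (1 GB = 8000 megabits), and it ignores protocol overhead, compression, and anything else sharing the link.</p>

```python
def required_upload_mbps(changed_gb, interval_hours):
    """Sustained upload rate (megabits per second) needed to ship one
    backup interval's worth of changed data before the next interval."""
    if not interval_hours > 0:
        raise ValueError("interval_hours must be positive")
    megabits = changed_gb * 8 * 1000      # decimal GB converted to megabits
    seconds = interval_hours * 3600
    return megabits / seconds


def link_keeps_up(link_mbps, changed_gb, interval_hours):
    """True if the link can carry the changed data within the interval."""
    return link_mbps >= required_upload_mbps(changed_gb, interval_hours)


if __name__ == "__main__":
    # Hypothetical numbers: 4.5 GB of changes per hourly backup.
    print(required_upload_mbps(4.5, 1))
    print(link_keeps_up(5, 4.5, 1))
```

<p>If the required rate comes out close to (or above) your real upload speed, a local drive is probably the better primary backup, with the cloud as a slower off-site copy.</p>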
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2009/11/advice-for-backing-up-your-macs/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Coding in the Cloud</title>
		<link>http://adrianotto.com/2009/09/coding-in-the-cloud/</link>
		<comments>http://adrianotto.com/2009/09/coding-in-the-cloud/#comments</comments>
		<pubDate>Tue, 22 Sep 2009 07:00:35 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[best practices]]></category>

		<guid isPermaLink="false">http://www.adrianotto.com/?p=22</guid>
		<description><![CDATA[I have been writing a 10-part series on the Rackspace Cloud Blog. I&#8217;ll be keeping a running list of the posts here as they are published. Rule 1 &#8211; Cache is Your Friend Rule 2 &#8211; Don’t write to the database in real time Rule 3 &#8211; Use a “Stateless” design whenever possible Rule 4 [...]]]></description>
			<content:encoded><![CDATA[<p>I have been writing a 10-part series on the <a href="http://www.rackspacecloud.com/blog/">Rackspace Cloud Blog</a>. I&#8217;ll be keeping a running list of the posts here as they are published.</p>
<p><a href="http://www.rackspacecloud.com/blog/2009/06/coding-in-the-cloud-%e2%80%93-rule-1-cache-is-your-friend/">Rule 1 &#8211; Cache is Your Friend</a></p>
<p><a href="http://www.rackspacecloud.com/blog/2009/07/coding-in-the-cloud-rule-2-dont-write-to-the-database-in-real-time/">Rule 2 &#8211; Don’t write to the database in real time</a></p>
<p><a href="http://www.rackspacecloud.com/blog/2009/07/coding-in-the-cloud-rule-3-use-a-stateless-design-whenever-possible/">Rule 3 &#8211; Use a “Stateless” design whenever possible</a></p>
<p><a href="http://www.rackspacecloud.com/blog/2009/08/coding-in-the-cloud-rule-4-avoid-external-dependencies/" target="_blank">Rule 4 &#8211; Avoid Unnecessary External Dependencies</a></p>
<p><a href="http://www.rackspacecloud.com/blog/2009/08/19/coding-in-the-cloud-rule-5-cms-plugins/" target="_blank">Rule 5 &#8211; CMS Plugins</a></p>
<p><a href="http://www.rackspacecloud.com/blog/2009/09/22/coding-in-the-cloud-rule-6-http-includes/" target="_blank">Rule 6 &#8211; HTTP Includes</a></p>
<p>Rule 7 &#8211; Coming Soon</p>
<p>Rule 8 &#8211; Coming Later</p>
<p>Rule 9 &#8211; Coming Later</p>
<p>Rule 10 &#8211; Coming Later</p>
<p>Yep, if you follow all 10 of the rules, you&#8217;ll probably have a really good cloud based web app.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2009/09/coding-in-the-cloud/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Patch for memcached on public network</title>
		<link>http://adrianotto.com/2009/08/patch-for-memcached-on-public-network/</link>
		<comments>http://adrianotto.com/2009/08/patch-for-memcached-on-public-network/#comments</comments>
		<pubDate>Mon, 03 Aug 2009 18:08:51 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=104</guid>
		<description><![CDATA[If you want to know what memcached is all about, check out my recent post about memcached on The Rackspace Cloud Blog. In order to use memcached in the cloud, you may need to run it on a public network. This introduces a rash of security concerns. Originally memcached was only intended for use on [...]]]></description>
			<content:encoded><![CDATA[<p>If you want to know what memcached is all about, check out my recent <a href="http://www.rackspacecloud.com/blog/2009/07/memcached-more-cache-less-cash/" target="_blank">post about memcached</a> on The Rackspace Cloud Blog.</p>
<p>In order to use memcached in the cloud, you may need to run it on a public network. This introduces a rash of security concerns. Originally memcached was only intended for use on private networks that were not available to the public, so there was no attempt made to provide access controls in the memcached server. There are no concepts of users, passwords, or any access control at all.</p>
<p>If you do run your memcached on a public interface you could use iptables or other host-based firewall rules to limit what IP addresses can access your memcached. However, if you are using a platform hosting service that other subscribers share with you, then others may be able to make connection from the same IP address(es) as you. This means that even if you did limit access to your memcached by IP address it&#8217;s possible that some other subscriber of the same hosting service could access your memcached, and cause you all sorts of security problems.</p>
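<p>To make the shared-address problem concrete, here is a minimal Python sketch of an allowlist check. It is an illustration only: in practice the filtering would be done by iptables or another firewall, <code>is_allowed</code> is a hypothetical helper, and the addresses are placeholders from the documentation ranges.</p>

```python
import ipaddress


def is_allowed(client_ip, allowed_networks):
    """Return True if client_ip falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(net) for net in allowed_networks)


# Hypothetical allowlist: the single address your hosting platform
# uses for outbound connections.
ALLOWED = ["203.0.113.10/32"]

if __name__ == "__main__":
    your_app = "203.0.113.10"
    other_subscriber = "203.0.113.10"   # same platform, same source address
    outsider = "198.51.100.7"

    print(is_allowed(your_app, ALLOWED))          # passes the filter
    print(is_allowed(other_subscriber, ALLOWED))  # also passes the filter
    print(is_allowed(outsider, ALLOWED))          # rejected
```

<p>The filter cannot tell your application apart from another subscriber behind the same address, which is exactly the weakness described above.</p>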
<p>Here is a custom patched memcached 1.4.0 <a href="http://c0177911.cdn.cloudfiles.rackspacecloud.com/memcached-1.4.0-2.x86_64.rpm">x86_64 RPM</a> I wrote that adds a command line option &#8216;-S&#8217; to disable &#8216;flush_all&#8217; and &#8216;stats detail on&#8217;. The original 1.4 source, a SPEC file for RHEL5 and CentOS5, and the patch are all included in the <a href="http://c0177911.cdn.cloudfiles.rackspacecloud.com/memcached-1.4.0-2.src.rpm">SRPM</a>. By disabling these commands with the -S option in /etc/sysconfig/memcached (OPTIONS=&#8221;-S&#8221;) you can prevent would-be hackers from dropping all your cached items, or finding out the names of the keys you are using. The memcached maintainers want to do this a different way, so this patch won&#8217;t be included in the base memcached source tree.</p>
<p>The right long-term solution is to build multi-tenant features directly into memcached. I&#8217;m aware that Dustin Sallings at <a href="http://www.northscale.com/" target="_blank">NorthScale</a> has started some work of this sort, and has a working proof of concept. It&#8217;s not yet mature, and is generally incompatible with the current release of memcached, so it&#8217;s not yet suitable for production use. The main idea is that a TCP/IP connection to memcached could be authenticated with SASL, and limited to its own view of what&#8217;s inside memcached.</p>
<p>My patch does not change how memcached works, except for what it does when you enter the commands that I&#8217;m disabling. It will be just as stable as memcached 1.4.0 without the patch. The only difference is that you won&#8217;t have the &#8216;flush_all&#8217; command, and you won&#8217;t have access to detailed stats either.</p>
<p>If you want to flush your entire cache, simply reconfigure your application to begin using a new &#8220;secret&#8221; key prefix, and you&#8217;ll have the functional equivalent of a flush_all because none of the prior cached data will be accessed by your application any more. The old data will simply expire or <a href="http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used" target="_blank">LRU</a> out of the cache and be replaced by new data naturally.</p>
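<p>The prefix-rotation trick can be sketched in Python as follows. This is a minimal illustration, not part of the patch: <code>PrefixedCache</code> is a hypothetical wrapper, and a plain dict stands in for a real memcached client so the example runs on its own. Unlike memcached, the dict never expires old entries.</p>

```python
class PrefixedCache:
    """Wraps a dict-like store and prepends a secret prefix to every key."""

    def __init__(self, store, prefix):
        self._store = store
        self._prefix = prefix

    def _full_key(self, key):
        return self._prefix + ":" + key

    def set(self, key, value):
        self._store[self._full_key(key)] = value

    def get(self, key, default=None):
        return self._store.get(self._full_key(key), default)

    def rotate_prefix(self, new_prefix):
        """Functional equivalent of flush_all: old keys become unreachable."""
        self._prefix = new_prefix


if __name__ == "__main__":
    backing = {}                                  # stand-in for memcached
    cache = PrefixedCache(backing, "secret-v1")   # placeholder prefix
    cache.set("homepage", "cached html")
    print(cache.get("homepage"))

    cache.rotate_prefix("secret-v2")
    print(cache.get("homepage"))                  # miss after rotation
```

<p>After the rotation the old entry is still physically present in the store, but nothing asks for it any more, so in a real memcached it would age out on its own.</p>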
<p>By using a simple &#8220;secret&#8221; text prefix on all your keys, you will ensure that hackers won&#8217;t know how to access your data in the cache. Consider prepending a reasonably long text string to the beginning of every key you store and access. Don&#8217;t make it too long, or that will multiply the number of packets required to get the data in and out of the cache, but something long enough that it won&#8217;t be easily guessed.</p>
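<p>One way to produce such a prefix is with Python&#8217;s standard <code>secrets</code> module. The sketch below is an assumption about how you might do it, not something the patch provides: the helper names are hypothetical, and the 16-byte default is just one reasonable trade-off between guessability and key size. The 250-byte limit and the ban on whitespace are memcached&#8217;s own rules for keys.</p>

```python
import secrets

MAX_KEY_BYTES = 250  # memcached rejects keys longer than this


def new_secret_prefix(num_bytes=16):
    """Random hex prefix; 16 bytes gives 32 hex characters."""
    return secrets.token_hex(num_bytes)


def build_key(prefix, name):
    """Join prefix and name, refusing keys memcached would not accept."""
    key = prefix + ":" + name
    if any(ch.isspace() for ch in key):
        raise ValueError("memcached keys may not contain whitespace")
    if len(key.encode("utf-8")) > MAX_KEY_BYTES:
        raise ValueError("key too long for memcached")
    return key


if __name__ == "__main__":
    prefix = new_secret_prefix()
    print(len(prefix))
    print(build_key(prefix, "stats:pageviews"))
```

<p>Store the prefix in your application configuration, not in the cache itself, and keep it out of logs and error pages.</p>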
<p><strong>This patch does not make memcached bulletproof</strong>. An attacker can still do a bunch of SET commands to fill your cache with junk, and force your hot content out. They can still irritate it with a bunch of &#8216;stats sizes&#8217; commands in a loop, or try to guess your secret prefix by randomly generating keys as a brute force attack until they find your content. For these reasons, you should only use this for storing data that&#8217;s not mission critical. There&#8217;s lots of data in this category that could really speed up your system under high load if you stored it in memcached, but is not particularly sensitive to tampering.</p>
<p>Some have argued that this sort of a patch offers a false sense of security. I completely agree. Only use this if you know that your memcached installation will still not be secure, and that the security weakness could be exploited to ultimately hack your application. It will just be a little bit less insecure than it is without the patch.</p>
<p>I have seen memcached used in situations where only statistics are stored and accessed in memcached (instead of generating log files, statistical counters are stored in the cache). The application can do strict checking of the data it gets back from the cache, and not use it in any way that could lead to a security compromise. For example, make sure that all values returned are only numeric, and within acceptable value boundaries. An application of this sort would be appropriate with this patch.</p>
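<p>A strict check of that kind might look like the following Python sketch. The function name, the bounds, and the fallback value are all assumptions chosen for illustration; the point is only that anything which is not a plain, in-range number is thrown away rather than trusted.</p>

```python
def read_counter(raw, min_value=0, max_value=10**9, default=0):
    """Return raw as an int if it is purely numeric and within bounds.

    Anything else (wrong type, non-digits, absurd length, out of range)
    is treated as tampering or corruption and replaced by default.
    """
    if isinstance(raw, bytes):
        try:
            raw = raw.decode("ascii")
        except UnicodeDecodeError:
            return default
    if not isinstance(raw, str):
        return default
    # Reject empty strings, signs, whitespace, non-ASCII digits, and
    # suspiciously long values before converting.
    if len(raw) > 20 or not raw.isascii() or not raw.isdigit():
        return default
    value = int(raw)
    if value > max_value or min_value > value:
        return default
    return value


if __name__ == "__main__":
    print(read_counter(b"42"))
    print(read_counter("42; rm -rf /"))
    print(read_counter("99999999999"))
```

<p>Because the application only ever acts on a bounded integer, junk written into the cache by someone else can skew a statistic but cannot be used to inject anything more harmful.</p>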
<p>I was thinking of making a better version of this patch that would allow you to specify an IP address (potentially 127.0.0.1 for example) that would have access to all commands that you define in a restricted access class. This way you could configure which IP address(es) could access which commands. Implementing this will require slowing memcached down a bit for all commands. I plan to join forces with the others who are also interested in memcached multi-tenant features and produce a suitable solution that allows for secure deployments in insecure networks.</p>
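<p>To show what a restricted access class could mean in practice, here is a purely hypothetical Python sketch of the lookup. Nothing like this exists in the patch or in memcached 1.4.0; the names, the command list, and the trusted address are all assumptions. It also shows where the slowdown would come from, since every command has to pass through the check.</p>

```python
# Hypothetical configuration: commands in the restricted class and the
# addresses allowed to run them.
RESTRICTED_COMMANDS = ("flush_all", "stats detail")
TRUSTED_IPS = frozenset({"127.0.0.1"})


def command_allowed(client_ip, command_line):
    """Decide whether client_ip may run the given text-protocol command."""
    parts = command_line.split()
    if not parts:
        return False
    normalized = " ".join(parts).lower()
    restricted = any(
        normalized == cmd or normalized.startswith(cmd + " ")
        for cmd in RESTRICTED_COMMANDS
    )
    if not restricted:
        return True                     # ordinary commands stay open
    return client_ip in TRUSTED_IPS     # restricted ones need a trusted IP


if __name__ == "__main__":
    print(command_allowed("127.0.0.1", "flush_all"))
    print(command_allowed("203.0.113.10", "flush_all"))
    print(command_allowed("203.0.113.10", "get homepage"))
```

<p>A real implementation would live inside the memcached server in C and would need a much cheaper check than string matching, but the decision it makes would have the same shape.</p>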
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2009/08/patch-for-memcached-on-public-network/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Cloud Mobile Brings Cloud Storage to your iPhone</title>
		<link>http://adrianotto.com/2009/07/cloud-mobile-brings-cloud-storage-to-your-iphone/</link>
		<comments>http://adrianotto.com/2009/07/cloud-mobile-brings-cloud-storage-to-your-iphone/#comments</comments>
		<pubDate>Fri, 31 Jul 2009 22:30:46 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[iPhone]]></category>

		<guid isPermaLink="false">http://www.adrianotto.com/?p=28</guid>
		<description><![CDATA[Cloud Mobile lets you manage your files stored on Cloud Files from the palm of your hand using your iPhone. It&#8217;s exciting to see software companies like Proactive Apps, LLC using Rackspace&#8217;s public REST API&#8217;s to build useful tools like this one. Founder of Proactive Apps, Marc Jones, quotes: &#8220;We made the decision to support [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=323040372&amp;mt=8"><img class="alignright size-full wp-image-1782" src="http://cdn.adrianotto.com/wp-content/uploads/2009/07/cloudmobilelogo.png" alt="cloudmobilelogo" width="119" height="97" /></a></p>
<p><a title="Cloud Mobile" href="http://itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=323040372&amp;mt=8" target="_blank">Cloud Mobile</a> lets you manage your files stored on <a title="Cloud Files" href="http://www.rackspacecloud.com/cloud_hosting_products/files" target="_blank">Cloud Files</a> from the palm of your hand using your iPhone. It&#8217;s exciting to see software companies like Proactive Apps, LLC using Rackspace&#8217;s public REST APIs to build useful tools like this one. Marc Jones, founder of Proactive Apps, says:</p>
<blockquote><p>&#8220;We made the decision to support The Rackspace Cloud, and specifically Cloud Files, in the first version of Cloud Mobile based on their open, easy-to-use API and their support of open cloud standards.  The combination of Cloud Files strong feature set and the open API allowed us to create an iPhone app that leveraged both platforms.&#8221;</p></blockquote>
<p>Check out the Features:</p>
<p><strong>Features</strong></p>
<ul>
<li>Supports multiple Rackspace Cloud accounts</li>
<li>Container count and disk space for account</li>
<li>List containers</li>
<li>Icons designate container status: empty, not empty, CDN status</li>
<li>Create new containers</li>
<li>Delete containers</li>
<li>View container details: object count, disk space, and CDN info</li>
<li>CDN enable or disable containers; set TTLs; view CDN URL</li>
<li>Upload photos and movies (direct capture / library) from your iPhone to a container</li>
<li>List objects in a container</li>
<li>Delete objects</li>
<li>View object details: size, etag, content type, last modified</li>
<li>View / add / edit / delete object metadata</li>
<li>View or play objects supported on the iPhone: images, audio, video</li>
<li>Built-in links to Cloud Files and Rackspace Cloud status pages</li>
</ul>
<p><strong>Screenshots</strong></p>
<p><img class="alignnone size-full wp-image-1767" src="http://cdn.adrianotto.com/wp-content/uploads/2009/07/cloud_mobile_041.png" alt="cloud_mobile_041" width="238" height="471" /> <img class="alignnone size-full wp-image-1768" src="http://cdn.adrianotto.com/wp-content/uploads/2009/07/cloud_mobile_051.png" alt="cloud_mobile_051" width="238" height="471" /> <img class="alignnone size-full wp-image-1770" src="http://cdn.adrianotto.com/wp-content/uploads/2009/07/cloud_mobile_061.png" alt="cloud_mobile_061" width="238" height="471" /> <img class="alignnone size-full wp-image-1772" src="http://cdn.adrianotto.com/wp-content/uploads/2009/07/cloud_mobile_081.png" alt="cloud_mobile_081" width="238" height="471" /><img class="alignnone size-full wp-image-1773" src="http://cdn.adrianotto.com/wp-content/uploads/2009/07/cloud_mobile_091.png" alt="cloud_mobile_091" width="238" height="471" /> <img class="alignnone size-full wp-image-1775" src="http://cdn.adrianotto.com/wp-content/uploads/2009/07/cloud_mobile_101.png" alt="cloud_mobile_101" width="238" height="471" /></p>
<p><strong>&#8220;</strong><strong>What&#8217;s so cool?&#8221;</strong> &#8211; The coolest things you can do with this app are:</p>
<p>1. Take a photo or video on your iPhone, upload it to Cloud Files, enable the CDN feature, and share the link.</p>
<p>2. Once you have photos, audio, and video on Cloud Files, you can stream them directly to your iPhone and view them in the native player at the click of  a button.</p>
<p>3. This is the easiest possible way to change the TTL of a CDN enabled item we&#8217;ve seen so far.</p>
<p>Marc Jones:</p>
<blockquote><p>&#8220;&#8230;Snap a pic or grab some video, upload it to Cloud Files, CDN enable your container with one touch, and share the link &#8211; it&#8217;s that easy.&#8221;</p></blockquote>
<p>A future revision of this app will include support for management of other cloud services from <a href="http://www.rackspacecloud.com/">The Rackspace Cloud</a> including <a href="http://www.rackspacecloud.com/cloud_hosting_products/servers">Cloud Servers</a>.  This app might just be the perfect companion for your iPhone.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2009/07/cloud-mobile-brings-cloud-storage-to-your-iphone/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

