<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Adrian Otto&#039;s Blog &#187; General</title>
	<atom:link href="http://adrianotto.com/category/general/feed/" rel="self" type="application/rss+xml" />
	<link>http://adrianotto.com</link>
	<description>For those who care about technical details</description>
	<lastBuildDate>Sun, 08 Apr 2012 00:02:36 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>How to shrink a Windows VM in XenServer</title>
		<link>http://adrianotto.com/2012/04/how-to-shrink-a-windows-vm-in-xenserver/</link>
		<comments>http://adrianotto.com/2012/04/how-to-shrink-a-windows-vm-in-xenserver/#comments</comments>
		<pubDate>Sun, 08 Apr 2012 00:02:36 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=584</guid>
		<description><![CDATA[When I was told that you could only grow (not shrink) the storage volume for a running Windows VM in XenCenter, I took that as a challenge. Guess what, there is a way to shrink it! Here is how: Use XenCenter to add a new disk to the existing VM, making it the size you [...]]]></description>
			<content:encoded><![CDATA[<p>When I was told that you could only grow (not shrink) the storage volume for a running Windows VM in XenCenter, I took that as a challenge. Guess what, there is a way to shrink it! Here is how:</p>
<ol>
<li>Use XenCenter to add a new disk to the existing VM, making it the size you want to shrink the current server to.</li>
<ul>
<li>Select the VM from the list on the right, click Storage.</li>
<li>Click &#8220;Add&#8230;&#8221; and create the new volume the size you want it.</li>
</ul>
<li>Log into the VM (you can use the Console tab in XenCenter), start &#8216;Disk Management&#8217;, and initialize the new disk.</li>
<ul>
<li>You can find it in Start-&gt;Administrative Tools-&gt;Computer Management.</li>
<li>Click on the new disk. Mine showed up as &#8220;Disk 1, Unknown&#8230; Not Initialized&#8221;.</li>
<li>If you click on the words &#8220;Not Initialized&#8221; it will be selected.</li>
<li>Next select Action-&gt;All Tasks-&gt;Initialize Disk. Select MBR and OK.</li>
</ul>
<li>Format a new partition on your new volume (you might be able to skip this step; I did not try).</li>
<ul>
<li>Right click on the black &#8220;Unallocated&#8221; partition.</li>
<li>Select &#8220;New Simple Volume&#8221;.</li>
<li>Click Next twice (the size defaults to the full disk), then specify a new drive letter.</li>
<li>In my example, I use E:\.</li>
<li>Format as NTFS using Quick Format, and click Finish.</li>
<li>Once formatting finishes and the new partition turns blue and is marked &#8220;Healthy&#8221; (Primary Partition), proceed to the next step.</li>
</ul>
<li>In the VM, download and run XenConvert.</li>
<ul>
<li>I used version 2.4.1.</li>
<li>It requires Microsoft .NET v4.0, so you may need to download that from Microsoft and install it before running XenConvert.</li>
</ul>
<li>Start XenConvert and select From: Volume and To: Volume.</li>
<ul>
<li>Set the Source Volume to your boot drive (C:).</li>
<li>Set the Destination Volume to your new drive (E:).</li>
<li>Say &#8220;Yes&#8221; to the warning about losing free space.</li>
<li>Click Convert, and accept the warning about erasing data on your Destination volume (E:).</li>
<li>Go have a coffee, or something, and come back later.</li>
</ul>
<li>When XenConvert is finished, use &#8220;Disk Management&#8221; again in the VM to activate the new partition.</li>
<ul>
<li>Right Click the new partition and select &#8220;Mark Partition As Active&#8221;.</li>
</ul>
<li>Shut down the VM.</li>
<ul>
<li>I did this from inside the VM using Start-&gt;Shut down.</li>
</ul>
<li>Now re-order the drives on the VM so the new drive is in position 0.</li>
<ul>
<li>In XenCenter, select the VM, and select the Storage tab.</li>
<li>Detach the original drive by selecting it and clicking the &#8220;Detach&#8221; button.</li>
<li>Select the new drive, and click Properties, and set the Position to 0.</li>
</ul>
<li>Now you can boot the VM and voila! It&#8217;s now shrunk to the size you wanted.</li>
<ul>
<li>If it all works the way you want, you can delete the original drive in XenCenter to reclaim the space.</li>
</ul>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2012/04/how-to-shrink-a-windows-vm-in-xenserver/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Congestion, Convoy Effect, and Thundering Herd</title>
		<link>http://adrianotto.com/2011/01/convoy-effect/</link>
		<comments>http://adrianotto.com/2011/01/convoy-effect/#comments</comments>
		<pubDate>Fri, 28 Jan 2011 00:52:13 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=441</guid>
		<description><![CDATA[When you have a limited supply of some resource, and a demand for that resource that exceeds the supply, you have something economists call a shortage. In this article I explain how we deal with shortages of resources (called congestion) in software systems. In economics we learn about the concept of Supply and Demand. Simply [...]]]></description>
			<content:encoded><![CDATA[<p>When you have a limited supply of some resource, and a demand for that resource that exceeds the supply, you have something economists call a shortage. In this article I explain how we deal with shortages of resources (called congestion) in software systems.</p>
<p>In economics we learn about the concept of <a href="http://en.wikipedia.org/wiki/Supply_and_demand" target="_blank">Supply and Demand</a>. Simply put, as something becomes more scarce, the price will adjust upward until an equilibrium is met between the supply and the demand. This holds true for situations where the pricing can be adjusted.</p>
<p>So, what happens when the pricing is fixed? You need some strategy to deal with the scarcity. A discipline known as <a href="http://en.wikipedia.org/wiki/Queueing_theory" target="_blank">queueing theory</a> can describe this. Let&#8217;s think about a common example of a system where the demand (work) exceeds the supply (capacity). I will explain it in terms of systems engineering, as it would pertain to a software system.</p>
<p>In the grocery store there are a finite number of checkout clerks. We will call the lines of waiting shoppers <em>queues</em>. If queue length increases, the system is experiencing <em>congestion</em>. When this happens the store manager calls in more <em>workers</em> (store employees) to help with checkouts. This capacity expansion process may continue until all checkstands are open, or until there are no more available workers to add. This condition is the <em>maximum capacity</em> of the system.  If we are at maximum capacity, we hope that the congestion is eliminated. If expanding the capacity did not alleviate the congestion, then the shoppers must wait. New shoppers entering the store may see the long queues, and decide to leave. Some people standing in line may <em>abandon</em> the line, and leave the store without completing their intended purchase.</p>
<p>Having multiple shorter queues gives shoppers the illusion that the congestion is lower than it actually is. Some checkout queues move more quickly than others, so having one queue per checkstand is not a fair system. A shopper may wait longer than they deserve because of which queue they select. Sometimes shoppers will jump from one queue to another if they feel they won&#8217;t have to wait as long.</p>
<p><strong>Lesson One: Use a combined queue instead of multiple separate queues</strong></p>
<p>When all shoppers have equal priority, it&#8217;s better to use a common queue that&#8217;s serviced by multiple workers. This way every shopper in the queue waits a fair share of the congestion backlog. Nobody is forced to wait longer because of queue selection, since there is only one. This also eliminates the inefficiency of changing between queues when one is slow.</p>
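<p>The combined queue can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name <em>run_shared_queue</em> and the in-memory job list are hypothetical, and each &#8220;job&#8221; is just a value that a worker records as served.</p>
<pre>
```python
import queue
import threading


def run_shared_queue(jobs, worker_count):
    """Serve every job from one FIFO queue shared by all workers."""
    shared = queue.Queue()
    for job in jobs:
        shared.put(job)

    served = []                     # (worker_id, job) pairs, in completion order
    served_lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                # Whichever worker is free takes the next waiter in line.
                job = shared.get_nowait()
            except queue.Empty:
                return              # backlog drained, this worker goes idle
            with served_lock:
                served.append((worker_id, job))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return served
```
</pre>
<p>Because there is only one queue, no job can get stuck behind a slow worker while another worker sits idle.</p>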
<p><strong>Lesson Two: When using a priority queue, limit the number of priority waiters</strong></p>
<p>This leads to the temptation to have multiple classifications of shared queues. If you want to have a sense of a premium service for select <em>priority</em> shoppers, you can create two queues. The <em>priority queue</em> is for your select priority shoppers, and the main queue is for everyone else. This works when the population of priority shoppers is smaller than the general population and they arrive in the line infrequently. In this situation, the next available worker can select a shopper to service from the priority queue before they service someone from the main queue. If there are too many priority shoppers, the main queue does not get serviced, which leads to further congestion, and eventually the waiters will abandon the main queue.</p>
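<p>One way to cap the number of priority waiters is sketched below. The class name <em>TwoTierQueue</em> and the choice to send overflow priority jobs to the main queue are assumptions made for illustration; refusing the overflow outright would be an equally valid policy.</p>
<pre>
```python
from collections import deque


class TwoTierQueue:
    """A main queue plus a priority queue with a capped number of waiters."""

    def __init__(self, max_priority_waiters):
        self.max_priority_waiters = max_priority_waiters
        self.priority = deque()
        self.main = deque()

    def submit(self, job, is_priority=False):
        """Queue a job and report which queue it joined."""
        has_room = self.max_priority_waiters > len(self.priority)
        if is_priority and has_room:
            self.priority.append(job)
            return "priority"
        # Overflow priority jobs wait in the main queue like everyone else.
        self.main.append(job)
        return "main"

    def next_job(self):
        """The next free worker serves the priority queue first."""
        if self.priority:
            return self.priority.popleft()
        if self.main:
            return self.main.popleft()
        return None
```
</pre>
<p>The cap bounds the priority backlog, but it does not by itself guarantee service to the main queue if priority jobs keep arriving faster than workers can finish them.</p>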
<p><strong>Lesson Three: Use Admission Control</strong></p>
<p>By now you are familiar with the store example, with simple queues, and with the concept of consolidating multiple queues into a single queue, or into a combination of a main queue and a priority queue. These are generally easy concepts to implement in software, and for most cases where congestion is rare, they work fine. However, there are some conditions where they don&#8217;t work at all. Remember that in software systems, queued jobs may or may not be able to abandon the queue. Here are examples of when a simple queue with no admission control will fail:</p>
<ol>
<li>When you have the ability to create an unlimited number of workers, but they all rely on some shared pool or pools of limited resources. This is called a <em>concurrency bottleneck</em>. For example, when you have a finite amount of CPU resources, but the ability to create an effectively unlimited number of threads. There is an upper limit to the amount of additional <em>throughput</em> gained by adding more threads when the CPU is congested.</li>
<li>When there is an unlimited number of <em>jobs</em> (waiters) to service, that arrive suddenly in bursts or steadily at a rate that&#8217;s faster than the combined capacity of all your workers. This is called a <em>work overflow</em>.</li>
</ol>
<p>The concept of <em>admission control</em> allows you to implement a policy controlling how much work you will accept into your queue(s) and at what rate. One simple policy is a maximum queue length. This length (<em>limit</em>) can be chosen by considering your desired maximum wait time (<em>max</em>) and your average service time (<em>t</em>) using the formula:</p>
<p><em>limit = ( max / t )</em></p>
<p>When <em>limit</em> jobs are queued, you must not queue new work, but instead refuse the work. Some protocols have a response code that can be used for this. For example, in HTTP applications, you can return a 503 Service Unavailable response with an optional Retry-After header that tells the client when it may retry the request, by which time the congestion may be gone.</p>
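<p>A minimal sketch of this policy follows. It assumes a single worker draining the queue, so that <em>limit = max / t</em> applies directly, and it models the refusal as HTTP status 503 (Service Unavailable), the status that HTTP pairs with the Retry-After header. The names <em>queue_limit</em> and <em>AdmissionControlledQueue</em>, the 202 status for accepted work, and the choice of Retry-After value are all hypothetical.</p>
<pre>
```python
import math
from collections import deque


def queue_limit(max_wait_seconds, avg_service_seconds):
    """limit = max / t, rounded down to a whole number of jobs."""
    return math.floor(max_wait_seconds / avg_service_seconds)


class AdmissionControlledQueue:
    """Refuse new work once the queue holds `limit` jobs."""

    def __init__(self, max_wait_seconds, avg_service_seconds):
        self.limit = queue_limit(max_wait_seconds, avg_service_seconds)
        # Assumption: a full queue should have drained after max_wait_seconds.
        self.retry_after = math.ceil(max_wait_seconds)
        self.jobs = deque()

    def offer(self, job):
        """Return (status, headers): 202 if queued, 503 if refused."""
        if len(self.jobs) >= self.limit:
            return 503, {"Retry-After": str(self.retry_after)}
        self.jobs.append(job)
        return 202, {}
```
</pre>
<p>With a maximum wait of 2 seconds and an average service time of 0.5 seconds, the queue accepts four jobs and refuses the fifth.</p>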
<p>A more advanced policy may consider the source of the work and the type of request, and apply a different policy to each. For example, you may limit the rate at which you accept requests from a given IP Address or geography. You may prioritize write requests over read requests. If an individual resource is congested, you may reject requests for that resource, but accept other requests.</p>
<p>There are a number of advantages to using an admission control policy:</p>
<ol>
<li>You don&#8217;t end up overloading your system to the point where service quality degrades for everyone due to extreme congestion conditions.</li>
<li>You can temporarily smooth out your work pattern so that the work can be completed at some later time when a temporary surge in demand has subsided.</li>
<li>It is very easy to monitor for congestion. A simple external check of your service will return an error during congestion conditions.</li>
<li>If the system has elastic capability to add more worker resources, this can be triggered when queue lengths are consistently non-zero over a span of time, and reduced when queue lengths are consistently zero over a span of time.</li>
</ol>
<p><strong>Lesson Four: Beware of the Thundering Herd and the Convoy Effect</strong></p>
<p>If you have a workload that experiences sudden bursts where new work is rapidly queued, and you have not properly limited the number of workers to a level of concurrency that you can manage with your available system resources, then you may experience a condition known as <a href="http://en.wikipedia.org/wiki/Resource_starvation" target="_blank"><em>resource starvation</em></a>. This can lead to a complete system failure, typically in a cascading series of related failures. When your system is in a state of resource starvation, it is so busy handling multitudes of concurrent work requests that it&#8217;s unable to make meaningful progress servicing the work. Nothing completes in a reasonable time because you are too busy switching between all the requests, making tiny increments of progress on each.</p>
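<p>Limiting the number of workers can be as simple as running every job on a fixed-size pool. The sketch below is one possible way to do that with Python&#8217;s standard thread pool; the function name <em>run_with_cap</em> and the peak-concurrency bookkeeping are hypothetical and exist only to show that the cap holds during a burst.</p>
<pre>
```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_with_cap(jobs, max_workers):
    """Run callables on a fixed-size pool; return (results, peak concurrency)."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def tracked(job):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            return job()
        finally:
            with lock:
                state["active"] -= 1

    # A burst of jobs queues up inside the pool; only max_workers run at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(tracked, jobs))
    return results, state["peak"]
```
</pre>
<p>However large the burst, the number of jobs in progress never exceeds the size of the pool, so the work that has been started can make real progress.</p>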
<p>A <em>Convoy Effect</em> is where you have numerous threads or processes that are <em>blocked</em> (suspended, making no progress) on some resource (like an I/O operation, or a lock) and then cause additional congestion when the blocking condition clears. At this point all the runnable threads are synchronized together in a convoy, requesting other scarce resources (like disk I/O bandwidth to write a lock file), and the convoy itself makes the delays longer than they would be if there were simply a long queue of work. For example:</p>
<ol>
<li>A long queue of requests is present.</li>
<li>All workers dequeue the requests rapidly, but all of them need to read and write to the same file.</li>
<li>Concurrent reads are allowed with a shared read lock, and concurrent writes are protected by an exclusive write lock.</li>
<li>The underlying lock implementation uses a <a href="http://en.wikipedia.org/wiki/Spinlock">spinlock</a>.</li>
<li>Multitudes of concurrent readers cause write access to the file to be substantially slowed down because the disk drive is chattering doing all the reads.</li>
<li>All workers become synchronized on the exclusive write lock, slowing down the rate of progress, and essentially causing the system to be in a state of <a href="http://www.webopedia.com/TERM/L/livelock.html" target="_blank">live lock</a>.</li>
</ol>
<p>A <a href="http://en.wikipedia.org/wiki/Thundering_herd_problem"><em>Thundering Herd</em></a> is a situation where a multitude of serialized processes are blocked waiting on an event. When the event happens, all processes become runnable, but only one of them can be serviced at a time, so all the others must become blocked again waiting on a new event. This condition causes throughput of work to be suboptimal because of the wasted effort of the blocked processes getting woken up and going back to sleep. This is generally solved by only waking one of the blocked processes at a time.</p>
<p>So, a Thundering Herd will typically lead to a Convoy Effect, leaving your system in a critical state of congestion. Solve this by using sensible admission control, and more sophisticated queuing strategies.</p>
<p><strong>Lesson Five: Multiplex de-queue of identical requests</strong></p>
<p>If you have a queue of requests that are likely to have multiple identical queued entries, you can optimize your service of that queue by using a <a href="http://en.wikipedia.org/wiki/Multiplexing">Multiplexing</a> technique. Instead of each worker simply taking one request off the queue and servicing it, consider this approach instead:</p>
<ol>
<li>Read the first request on the queue.</li>
<li>Scan remaining entries in the queue for other identical requests, de-queuing them together as a batch. You might de-queue up to some maximum number per batch, or de-queue all matches in a single batch. This depends on how large your responses will be, and how many items you have in queue.</li>
<li>Process the request, sending the response to all clients in the batch at once.</li>
</ol>
<p>Note: this is only appropriate in use cases where out-of-order responses are gracefully handled by the client.</p>
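<p>The de-queue steps above can be sketched as follows. This is a single-threaded illustration: the function name <em>dequeue_batch</em> and the representation of the queue as a deque of (request key, client) pairs are assumptions, and a real implementation would need to hold a lock while scanning.</p>
<pre>
```python
from collections import deque


def dequeue_batch(pending, max_batch=None):
    """Pop the head request plus identical requests queued behind it.

    `pending` is a deque of (request_key, client) pairs.
    Returns (request_key, clients) for the batch that was removed.
    """
    key, first_client = pending.popleft()
    clients = [first_client]
    remaining = deque()
    while pending:
        other_key, client = pending.popleft()
        batch_has_room = max_batch is None or max_batch > len(clients)
        if other_key == key and batch_has_room:
            clients.append(client)                  # same request: join the batch
        else:
            remaining.append((other_key, client))   # keep original order
    pending.extend(remaining)
    return key, clients
```
</pre>
<p>The worker then processes the request once and sends the response to every client in the returned list.</p>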
<p>You will need to be careful that your queue implementation will allow you to safely de-queue any entry in the queue without leading to a <a href="http://en.wikipedia.org/wiki/Race_condition" target="_blank">race condition</a> where multiple workers are trying to de-queue the same entries simultaneously. You may decide that batching requests when they are queued up is more efficient than searching for matching requests when it&#8217;s time to de-queue. It would work in a similar way:</p>
<ol>
<li>Treat all queued entries as batches, with at least one request in each, up to some maximum.</li>
<li>Hash the request as you receive it, and check to see if you have an existing entry or entries in the queue for the given request.</li>
<li>If you have a matching entry in the queue, then temporarily lock that entry, and add the new request&#8217;s client details (or connection handle) to the existing entry.</li>
<li>If the batch is full, iterate to the next, creating a new batch when you reach the end of the list.</li>
<li>Workers simply read the next batch from the queue, and send the response to all the associated clients.</li>
</ol>
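<p>A rough sketch of batching at enqueue time is shown below. The class name <em>BatchingQueue</em> is hypothetical, a Python dict stands in for the hash lookup, and the per-entry locking described above is omitted, so this version is only safe with a single thread.</p>
<pre>
```python
from collections import deque


class BatchingQueue:
    """Group identical requests into batches as they are queued."""

    def __init__(self, max_batch):
        self.max_batch = max_batch
        self.batches = deque()   # FIFO of (request_key, clients) batches
        self.open_batch = {}     # request_key: newest queued batch for that key

    def enqueue(self, request_key, client):
        batch = self.open_batch.get(request_key)   # dict lookup hashes the key
        if batch is None or len(batch[1]) == self.max_batch:
            # No queued batch for this request, or it is full: start a new one.
            batch = (request_key, [])
            self.batches.append(batch)
            self.open_batch[request_key] = batch
        batch[1].append(client)

    def dequeue(self):
        """Workers take the next batch and answer all of its clients."""
        batch = self.batches.popleft()
        if self.open_batch.get(batch[0]) is batch:
            del self.open_batch[batch[0]]          # no later batch for this key
        return batch
```
</pre>
<p>Each worker calls <em>dequeue</em>, processes the request once, and sends the response to every client in the batch.</p>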
<p>In general, you want your workers de-queueing work faster than you can queue it up. If you do your batching upon receipt of the request, you avoid the risk that the cost of batching at de-queue time slows your workers enough to cause a Convoy Effect.</p>
<p>Feedback welcome.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2011/01/convoy-effect/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Drizzle Lines of Code vs. MySQL</title>
		<link>http://adrianotto.com/2010/10/drizzle-lines-of-code-vs-mysql/</link>
		<comments>http://adrianotto.com/2010/10/drizzle-lines-of-code-vs-mysql/#comments</comments>
		<pubDate>Wed, 13 Oct 2010 23:07:38 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=408</guid>
		<description><![CDATA[In my recent post about Drizzle I suggested that because Drizzle has fewer lines of code (less than half compared to MySQL), it has a lower intrinsic risk of software defects. Of course it has bugs of its own, but because Drizzle is focused squarely on OLTP use cases, it can be substantially smaller [...]]]></description>
			<content:encoded><![CDATA[<p>In my <a href="../2010/09/drizzle-is-now-beta/" target="_blank">recent post about Drizzle</a> I suggested that because <a href="http://drizzle.org" target="_blank">Drizzle</a> has fewer lines of code (less than half compared to <a href="http://www.mysql.com">MySQL</a>), it has a lower intrinsic risk of software defects. Of course it has bugs of its own, but because Drizzle is focused squarely on OLTP use cases, it can be substantially smaller in terms of lines of code. In fact, when I used <a href="http://www.ohloh.net/p/compare" target="_blank">Ohloh</a> to compare the source code line count of MySQL and Drizzle the picture was very clear:</p>
<p><img class="size-full wp-image-409 alignnone" title="mysql-drizzle-loc" src="http://cdn.adrianotto.com/wp-content/uploads/2010/10/mysql-drizzle-loc.png" alt="" width="593" height="298" /></p>
<p>Part of the reason that the code count is smaller is that the non-relevant parts have simply been removed, but there was also a lot of effort put into modernizing the code base, and using C++. By using Boost and other foundational software projects as building blocks, Drizzle greatly reduces the need for internal implementations of basic things. For example, the MySQL code base has a REGEX implementation in it. This was removed in Drizzle, and replaced by PCRE, which is expected to change again soon to use Boost. Using libraries that are shared by numerous projects has the additional benefit of a likely lower bug count. Simply put, the more smart people look at and use the software, and improve its weaknesses, the more likely it is to be high quality.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/10/drizzle-lines-of-code-vs-mysql/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Drizzle is now BETA</title>
		<link>http://adrianotto.com/2010/09/drizzle-is-now-beta/</link>
		<comments>http://adrianotto.com/2010/09/drizzle-is-now-beta/#comments</comments>
		<pubDate>Wed, 29 Sep 2010 19:21:00 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=376</guid>
		<description><![CDATA[Today Drizzle enters BETA. Drizzle is an evolution of MySQL that&#8217;s been simplified, streamlined, and modernized. This long awaited database started from an idea in 2005 to fork MySQL, keep the good parts, and rip out or replace all the stuff that&#8217;s not good for web applications. This idea has been endorsed by large corporate [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://drizzle.org"><img class="alignright size-full wp-image-222" title="Drizzle Logo" src="http://cdn.adrianotto.com/wp-content/uploads/2009/11/drizzle64.png" alt="" width="64" height="64" /></a>Today <a href="http://drizzle.org" target="_blank">Drizzle</a> enters BETA. Drizzle is an evolution of MySQL that&#8217;s been simplified, streamlined, and modernized. This long awaited database started from an idea in 2005 to fork MySQL, keep the good parts, and rip out or replace all the stuff that&#8217;s not good for web applications. This idea has been endorsed by large corporate sponsors, including <a href="http://www.sun.com" target="_blank">Sun Microsystems</a> in the early days, and now <a href="http://www.rackspace.com/">Rackspace</a>. Most of the code is contributed by the <a href="https://launchpad.net/drizzle">developer community</a>, which is made up of a very talented group of open source developers with core committers from four different companies. More about the Drizzle project:</p>
<h3>Charter</h3>
<ul>
<li>A database optimized for Cloud infrastructure and Web applications</li>
<li>Design for massive concurrency on modern multi-cpu architecture</li>
<li>Optimize memory for increased performance and parallelism</li>
<li>Open source, open community, open design</li>
</ul>
<h3>Scope</h3>
<ul>
<li>Re-designed modular architecture providing plugins with defined APIs</li>
<li>Simple design for ease of use and administration</li>
<li>Reliable, ACID transactional</li>
</ul>
<p>There are many exciting changes, such as optimizing everything for 64-bit CPUs and multi-core. You can hardly even buy 32-bit and single-core servers nowadays, even if you want them. It makes no sense to have software that&#8217;s optimized for these antiquated hardware designs. No effort is spent optimizing software to work with rotational hard drives because SSD drives are the way of the future. All the language collations have simply been replaced with UTF-8 only, because the web uses UTF-8. Plus, this is tested with 41 different language translations. Drizzle has a new scheduler. The legacy MySQL scheduler was designed to work for a thread-per-session setup. In Drizzle, sessions are handled independently from the threads. The new scheduler allows this to work.</p>
<p>Drizzle uses InnoDB as its default storage engine, which is great for OLTP. It also supports the <a href="http://www.primebase.org/" target="_blank">PBXT</a> storage engine. There are available plugins for the InnoDB Embedded Engine and <a href="http://www.haildb.com/" target="_blank">HailDB</a> which will soon be the new default. DDL Operations (like ALTER TABLE) can actually roll back in the event that something goes wrong in the process, rather than leaving you with incomplete or corrupt data.</p>
<p>The code base in Drizzle has been fully modernized, and brought up to today&#8217;s standards of C++ with extensive use of the <a href="http://en.wikipedia.org/wiki/Standard_Template_Library">C++ STL</a> to replace MySQL&#8217;s usage of obscure custom data type implementations that offered no real benefit compared to what the STL has today. Another example of improvements in this area is the replacement of the legacy REGEX implementation with a more standard library. All of these changes reduce the amount of Drizzle source code dramatically compared to MySQL. Less code and simpler code means fewer bugs, plain and simple. Drizzle is well on its way to being an ideal fit for web applications that need a reliable, high-performance transactional database.</p>
<h3>Features in Drizzle7 Beta</h3>
<ul>
<li>New micro kernel</li>
<li>Migration Tool</li>
<li>Instance Catalog Support</li>
<li>Universal Replication</li>
<li>User query analysis</li>
<li>Multi-core Support</li>
</ul>
<h3>What &#8220;Beta&#8221; means</h3>
<ul>
<li>Your data is safe. Transactional engine by default and stable for over 2 years.</li>
<li>Upgrade the system in-place without exporting/importing data.</li>
<li>Replication is still being tested.</li>
</ul>
<p>In Microsoft terms, it means that this project would have launched about a year ago. In Google terms, it probably would have launched six months ago. Simply put, if you trust your data to a MySQL system today running InnoDB, you should feel comfortable trying Drizzle. There have been some changes to the InnoDB setup, such as the elimination of the FRM files from disk, which eliminates possible inconsistency between the state on disk and the state in InnoDB. I am in the process of moving a few of my production applications to use the Drizzle Beta. If you&#8217;re an accomplished system administrator and DBA, you should seriously consider putting at least one of your production applications on Drizzle now, and see how it works for you.</p>
<h3>What&#8217;s Next?</h3>
<ul>
<li>Beta <a href="https://launchpad.net/drizzle/+announcement/6840" target="_blank">announced today 2010-09-29</a>.</li>
<li>GA February 2011</li>
<li>GA May 2011 for Multi-Tenancy features that allow an arbitrary number of logical databases (Schemas, Tables, etc.) to exist concurrently with full data isolation between them. This allows for individual security and resource controls (Threads, Memory, IO), and individual database backups, rather than system level backups. This feature will be called &#8220;Catalogs&#8221;.</li>
</ul>
<h3>Download Drizzle</h3>
<p>Time to get started with the beta. Download <a href="https://launchpad.net/drizzle/elliott/2010-09-27">the beta</a> today!</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/09/drizzle-is-now-beta/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Rackspace and NASA Contribute Huge to Open Source</title>
		<link>http://adrianotto.com/2010/07/rackspace-and-nasa-contribute-huge-to-open-source/</link>
		<comments>http://adrianotto.com/2010/07/rackspace-and-nasa-contribute-huge-to-open-source/#comments</comments>
		<pubDate>Sun, 18 Jul 2010 17:00:16 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[OpenStack]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=319</guid>
		<description><![CDATA[Today Rackspace and NASA announced OpenStack as a coordinated open development project with 28 participating partner companies and growing. NASA contributed source code from its NOVA project for running a large scale computing platform called Nebula. Rackspace contributed source code for its Object Store, used to host the Cloud Files web storage service. The API [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.nasa.gov"><img class="size-full wp-image-324 alignright" title="NASA" src="http://cdn.adrianotto.com/wp-content/uploads/2010/07/nasa-logo.png" alt="" width="155" height="100" /></a><a href="http://www.rackspace.com"><img class="size-full wp-image-327 alignright" title="Rackspace" src="http://cdn.adrianotto.com/wp-content/uploads/2010/07/rackspace-logo.png" alt="" width="150" height="100" /></a>Today <a href="http://www.rackspace.com" target="_blank">Rackspace</a> and <a href="http://www.nasa.gov">NASA</a> <a href="http://www.rackspace.com/information/mediacenter/release.php?id=8489" target="_blank">announced</a> <a href="http://www.openstack.org" target="_blank">OpenStack</a> as a coordinated open development project with <a href="http://openstack.org/community/">28 participating partner companies</a> and growing. NASA contributed source code from its NOVA project for running a large scale computing platform called <a href="http://nebula.nasa.gov/services/" target="_blank">Nebula</a>. Rackspace contributed source code for its Object Store, used to host the Cloud Files web storage service. The API for Cloud Servers, which was previously released with a Creative Commons open license will be used by OpenStack. I was a key contributor to the design of that API, and I&#8217;m honored to have been a part of it. Rackspace has vowed a <a href="http://openstack.org/blog/">commitment to open development</a> of this platform.</p>
<p>This is very exciting for consumers of Cloud Computing because:</p>
<ol>
<li>It allows individual companies to run their own clouds inside their own data centers and on their own equipment using the same scalable technology that powers some of the largest cloud infrastructures in the world.</li>
<li>An individual company can develop applications on their own cloud, and know with confidence that they can run their application on a number of different public clouds without any special adapter software. They need only find a cloud computing or cloud storage provider that uses OpenStack software to ensure compatibility.</li>
<li>If an application is hosted on one OpenStack public cloud, it can be easily moved to another without changes to the application source code, and without using any cloud middleware. This can completely eliminate all fears relating to single-vendor lock-in.</li>
<li>Applications can be run simultaneously in multiple clouds using the exact same software, which needs to implement only a single API for universal access to computing and object storage resources.</li>
</ol>
<p><a href="http://www.openstack.org"><img class="alignnone" title="OpenStack" src="http://www.rackspace.com/images/information/mediacenter/openstack/button-openstackorg.png" alt="" width="280" height="61" /></a><a href="http://www.rackspace.com/information/mediacenter/release.php?id=8489"> <img class="alignnone" title="Press Release" src="http://www.rackspace.com/images/information/mediacenter/openstack/button-pressrelease.png" alt="" width="280" height="61" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/07/rackspace-and-nasa-contribute-huge-to-open-source/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bandwidth != Network Performance</title>
		<link>http://adrianotto.com/2010/03/bandwidth-network-performance/</link>
		<comments>http://adrianotto.com/2010/03/bandwidth-network-performance/#comments</comments>
		<pubDate>Sun, 14 Mar 2010 17:34:33 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[best practices]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=237</guid>
		<description><![CDATA[You might think that if you want faster internet performance, you can simply get a connection to the internet that has higher bandwidth. When you get a &#8220;faster&#8221; internet connection you may observe faster downloads. But it&#8217;s less frequently the additional bandwidth, and more frequently reduced latency that actually produces increased interactive web performance. This [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://cdn.adrianotto.com/wp-content/uploads/2010/03/rj45.jpg"><img class="alignright size-full wp-image-302" title="rj45" src="http://cdn.adrianotto.com/wp-content/uploads/2010/03/rj45.jpg" alt="" width="240" height="240" /></a>You might think that if you want faster internet performance, you can simply get a connection to the internet that has higher bandwidth. When you get a &#8220;faster&#8221; internet connection you may observe faster downloads. But it&#8217;s less frequently the additional bandwidth, and more frequently reduced latency that actually produces increased interactive web performance. This post explains why.</p>
<p>First of all, let&#8217;s review some definitions:</p>
<ul>
<li><strong>Bandwidth</strong>: The amount of data that can be passed along a communications channel in a given period of time.</li>
<li><strong>Latency</strong>: The time it takes for a packet to cross a network connection, from sender to receiver.</li>
<li><strong>Speed</strong>: Fast and rapid moving, going, traveling, proceeding, or performing; swiftness.</li>
<li><strong>Throughput</strong>: The quantity of data transmitted by a computer network over a given period of time.</li>
</ul>
<p>Now, all of these terms are related, and I want to highlight some of the minutiae here:</p>
<p><strong>Bandwidth</strong></p>
<p>The higher the bandwidth is on a network connection, the more data it&#8217;s capable of transmitting in a given period of time. Higher bandwidth is better.</p>
<p><strong>Latency</strong></p>
<p>This is very important, because latency effectively limits the amount of bandwidth you can consume if you are using a synchronous data transmission, like a TCP/IP download. Lower latency is better, and will yield faster speed.</p>
<p><strong>Throughput</strong></p>
<p>Throughput is another way of expressing speed. The higher the throughput, the faster your network communications will be. Note that your maximum possible throughput is your bandwidth. Actual throughput is equal to or less than your bandwidth.</p>
<p><strong>Speed</strong></p>
<p>If your network is high speed, you should observe high bandwidth, low latency, and high throughput.</p>
<h3>Latency and Throughput are Inversely Proportional</h3>
<p>For TCP/IP transmissions, the higher your latency is, the lower your throughput will be. Let&#8217;s explore why. The most common use of TCP/IP is for the web, which uses the HTTP protocol. HTTP works by making a TCP/IP connection to a remote server, issuing a request for a document, and then receiving the response. The protocol is text based. A simple HTTP transmission is illustrated below.</p>
<p>Client Request:</p>
<pre>GET / HTTP/1.1
User-Agent: Wget
Host: www.example.com
</pre>
<p>Server Response:</p>
<pre>HTTP/1.1 200 OK
Server: Apache/2.2.3 (Red Hat)
Last-Modified: Tue, 15 Nov 2005 13:24:10 GMT
ETag: "b300b4-1b6-4059a80bfd280"
Accept-Ranges: bytes
Content-Type: text/html; charset=UTF-8
Connection: Keep-Alive
Date: Wed, 18 Nov 2009 22:36:34 GMT
Age: 1010
Content-Length: 438

  Example Web Page

You have reached this web page by typing "example.com",
"example.net",
  or "example.org" into your web browser.

These domain names are reserved for use in documentation and are not available
  for registration. See &lt;a href="http://www.rfc-editor.org/rfc/rfc2606.txt"&gt;RFC
  2606&lt;/a&gt;, Section 3.
</pre>
<p>Here is a trace of the TCP/IP packets that make up that request:</p>
<pre>14:57:47.146665 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: S 3717672264:3717672264(0) win 5840
14:57:47.220092 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: . ack 1 win 183
14:57:47.220309 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: P 1:123(122) ack 1 win 183  (GET Request)
14:57:47.300962 IP 192.0.32.10.80 &gt; 192.168.144.2.39556: P 1:728(727) ack 123 win 4502  (200 OK Response)
14:57:47.300993 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: . ack 728 win 228
14:57:47.302035 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: F 123:123(0) ack 728 win 228
14:57:47.375475 IP 192.0.32.10.80 &gt; 192.168.144.2.39556: . ack 124 win 4502
14:57:47.375499 IP 192.0.32.10.80 &gt; 192.168.144.2.39556: F 728:728(0) ack 124 win 4502
14:57:47.375510 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: . ack 729 win 228
</pre>
<p>Notice that there are 10 packets in this exchange (the trace above lists nine; the server&#8217;s SYN-ACK reply to the opening SYN is not shown). It takes a three-way handshake to set up the TCP session, then a round trip to send the data, then two more round trips to close down the connection. Each time the server receives a packet from the client, the connection may wait in the server&#8217;s connection queue to be processed, which can further increase the interactive protocol latency. Consider the impact of high latency on a connection like this. Suppose that it takes 0.2 seconds for each round trip. That connection would have a total throughput of 727 bytes downloaded in 0.8 seconds (four round trips). That&#8217;s a rate of 909 bytes/sec. Even if your internet connection is 15 Mb/sec, the bandwidth did not matter. Latency caused the throughput to be low.</p>
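<p>To make that arithmetic concrete, here is a minimal sketch in Python. The byte count, the four round trips, and the 0.2 second round-trip time all come from the example above; the function name is only an illustration.</p>

```python
def effective_throughput(payload_bytes, round_trips, rtt_seconds):
    """Bytes per second delivered when every round trip costs one RTT."""
    total_time = round_trips * rtt_seconds
    return payload_bytes / total_time

# Figures from the trace: 727 bytes of response and four round trips
# (handshake, request/response, and two to close), 0.2 s each.
rate = effective_throughput(727, 4, 0.2)
print(round(rate))  # 909 bytes/sec

# A 15 Mb/sec link can carry 1,875,000 bytes/sec, so this exchange
# uses only a tiny fraction of the available bandwidth.
link_bytes_per_sec = 15_000_000 / 8
print(f"{rate / link_bytes_per_sec:.4%} of the link")
```

<p>Halving the round-trip time doubles the result, while raising the link speed changes nothing, which is the whole point.</p>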
<p>Now, you might be wondering why we can&#8217;t just improve networking technology to make latency lower. We can, but that&#8217;s not going to help much, because we are still bounded by the speed of light, among other factors. <strong>The speed of light is slow when you consider the distance it has to travel to cross continents on the earth.</strong> Let&#8217;s look at some math to explain that:</p>
<ul>
<li>The speed of light in vacuum is 299,792,458 m/s.</li>
<li>The speed of light in fiber optic cable is ~200,000,000 m/s.</li>
<li>The distance from Anaheim, CA to New York is 4,494,898 meters.</li>
<li>The one-way latency to New York is 4,494,898 / 200,000,000 = 22.47 ms.</li>
<li>The theoretical minimum round-trip time between Anaheim, CA and New York is 44.95 ms.</li>
<li>The current ping time from Anaheim, CA to New York is 72 ms</li>
<pre>Tracing the route to sl-gw33-nyc.sprintlink.net (144.228.243.82)
  1 sl-crs1-ana-0-14-2-0.sprintlink.net (144.232.11.9) 0 msec
    sl-crs2-ana-0-14-2-0.sprintlink.net (144.232.11.11) 0 msec
    sl-crs1-ana-0-14-2-0.sprintlink.net (144.232.11.9) 4 msec
  2 sl-crs2-fw-0-13-3-0.sprintlink.net (144.232.19.197) 28 msec
    sl-crs2-fw-0-9-5-0.sprintlink.net (144.232.20.130) 28 msec
    sl-crs1-fw-0-3-3-0.sprintlink.net (144.232.9.65) 28 msec
  3 sl-crs2-kc-0-0-0-2.sprintlink.net (144.232.19.141) 40 msec
    144.232.20.57 40 msec
    sl-crs1-kc-0-5-5-0.sprintlink.net (144.232.24.9) 40 msec
  4 sl-crs2-chi-0-13-5-0.sprintlink.net (144.232.20.109) 52 msec
    sl-crs1-chi-0-1-0-3.sprintlink.net (144.232.18.214) 56 msec
    sl-crs2-chi-0-15-2-0.sprintlink.net (144.232.24.206) 52 msec
  5 sl-crs1-nyc-0-8-0-3.sprintlink.net (144.232.18.123) 72 msec
    sl-crs2-nyc-0-8-0-1.sprintlink.net (144.232.20.119) 72 msec
    sl-crs1-chi-0-10-3-0.sprintlink.net (144.232.9.148) 72 msec
  6 sl-gw33-nyc-14-0-0.sprintlink.net (144.232.6.56) 72 msec *
    sl-gw33-nyc-15-0-0.sprintlink.net (144.232.6.58) 72 msec
</pre>
</ul>
<p>This round trip time includes all of the switching and routing to get the packet through its full round trip. That means that even if all switching and routing were instantaneous, and we had a perfectly straight fiber path between all points on the earth, we could only reduce latency by about 40% (from 72 ms to roughly 45 ms). We cannot accelerate the speed of light, so without a significant advance in data transmission technology (perhaps a quantum physics approach) we must accept the speed of light as a performance boundary.</p>
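<p>The same numbers can be checked with a few lines of Python. This is only a sketch of the arithmetic above; the constant names are mine, and the fiber speed is the approximate figure quoted in the list.</p>

```python
SPEED_IN_FIBER_M_PER_S = 200_000_000   # approximate, from the list above
DISTANCE_M = 4_494_898                 # Anaheim, CA to New York
MEASURED_RTT_MS = 72                   # ping time from the trace above

one_way_ms = DISTANCE_M / SPEED_IN_FIBER_M_PER_S * 1000
round_trip_ms = 2 * one_way_ms
best_case_savings = (MEASURED_RTT_MS - round_trip_ms) / MEASURED_RTT_MS

print(f"one-way:    {one_way_ms:.2f} ms")               # 22.47 ms
print(f"round trip: {round_trip_ms:.2f} ms")            # 44.95 ms
print(f"best-case reduction: {best_case_savings:.0%}")  # 38%
```

<p>The best-case reduction works out to roughly 38%, which is where the &#8220;about 40%&#8221; figure comes from.</p>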
<h3>Making Web Sites Faster</h3>
<p>If you&#8217;re a web content publisher, you can set up your systems to work around these natural limitations. One way to make interactive web performance faster is to place copies of your data in various geographic locations that are physically closer to your end users. Using a <a href="http://en.wikipedia.org/wiki/Content_delivery_network" target="_blank">CDN</a> for your media content is one way to do this. You can also make your web server as fast as possible so that your dynamically generated content can be processed as quickly as possible. Using <a href="http://memcached.org/" target="_blank">memcached</a> to speed up your web application can help. Also, take a look at some <a href="http://developer.yahoo.com/performance/rules.html" target="_blank">best practices</a> for web developers for good performance.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/03/bandwidth-network-performance/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>CPU Time stolen from a virtual machine?</title>
		<link>http://adrianotto.com/2010/02/time-stolen-from-a-virtual-machine/</link>
		<comments>http://adrianotto.com/2010/02/time-stolen-from-a-virtual-machine/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 16:42:59 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[Xen]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=258</guid>
		<description><![CDATA[Those of you studying the vmstat(8) man page may be wondering what the &#8216;st&#8217; figure is in the CPU column. The manual refers to it as &#8220;Time stolen from a virtual machine&#8221;. More specifically: It&#8217;s the time the hypervisor scheduled something else to run instead of something within your VM. This might be time for [...]]]></description>
			<content:encoded><![CDATA[<p>Those of you studying the vmstat(8) man page may be wondering what the &#8216;st&#8217; figure is in the CPU column. The manual refers to it as &#8220;<em>Time stolen from a virtual machine</em>&#8221;. More specifically:</p>
<p>It&#8217;s the time the hypervisor scheduled something else to run instead of something within your VM. This might be time for another VM, or for the Hypervisor host itself. If no time were stolen, this time would be used to run your CPU workload or your idle thread.</p>
<p>There is some disagreement circulating about whether the Hypervisor will steal idle time, or only preempted time. In other words, it has been suggested that stolen time is only counted when your local kernel scheduler within the VM wanted to run something but the Hypervisor made that impossible. I have found that stolen time does in fact count borrowed idle time, where the local scheduler actually had nothing to run. For example, here are some vmstat values from a VM that&#8217;s got a very low CPU workload on it:</p>
<pre>
vmstat -S M 1 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0    121     42     53    460    0    0     0     1    0    1  0  0 89  0 10
 0  0    121     42     53    460    0    0     0    28 1014   39  0  0 90  0 10
 0  0    121     42     53    460    0    0     0     0 1016   36  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1024   32  0  0 93  0  7
 0  0    121     42     53    460    0    0     0     0 1019   40  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1015   32  0  0 90  0 10
 0  0    121     42     53    460    0    0     0     0 1022   34  0  0 92  0  8
 0  0    121     42     53    460    0    0     0     0 1016   36  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1013   34  0  0 92  0  8
 0  0    121     42     53    460    0    0     0     0 1028   43  0  0 93  0  7
</pre>
<p>As you can see, user time (us), system time (sy), and iowait time (wa) are zero, but idle time is not 100%. This normally indicates that your system is doing something, but in this case idle time is actually the sum of the <em>id</em> and <em>st</em> columns.</p>
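<p>If you want to compute that sum yourself, here is a minimal sketch in Python. It assumes the 17-column vmstat layout shown above, where the last five columns are us, sy, id, wa, and st; the function name is hypothetical.</p>

```python
def effective_idle(vmstat_line):
    """Return (id, st, id + st) from one vmstat data row.

    Assumes the last five whitespace-separated columns are
    us, sy, id, wa, st, as in the output above.
    """
    fields = vmstat_line.split()
    us, sy, idle, wa, st = (int(x) for x in fields[-5:])
    return idle, st, idle + st

# One of the rows from the sample output above.
sample = " 0  0    121     42     53    460    0    0     0    28 1014   39  0  0 90  0 10"
print(effective_idle(sample))  # (90, 10, 100)
```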
<p>In this example, I really don&#8217;t care that I have a nonzero <em>st</em> column because my workload is basically idle all the time anyway.</p>
<p>If you are on a cloud host where you purchase a small sliver of a server, you should expect to see nonzero values in this column when you run vmstat. If you have a heavy CPU load and need more processing power, you can solve this problem by upgrading to a larger VM server size so that you command a larger portion of the physical host.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/02/time-stolen-from-a-virtual-machine/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>ED Strikes Again?</title>
		<link>http://adrianotto.com/2010/02/ed-strikes-again/</link>
		<comments>http://adrianotto.com/2010/02/ed-strikes-again/#comments</comments>
		<pubDate>Tue, 02 Feb 2010 22:28:19 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=251</guid>
		<description><![CDATA[It&#8217;s not the ED you are thinking of. Nope, it&#8217;s actually the External Dependency. One piece of advice that I continually dispense is to try to reduce dependencies on remote web sites when coding your own. The problem strikes most dramatically when you run a very busy site, and you have some feed or resource [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s not the ED you are thinking of. Nope, it&#8217;s actually the <span style="color: #ff0000;"><strong>E</strong></span>xternal <span style="color: #ff0000;"><strong>D</strong></span>ependency.</p>
<p>One piece of advice that I continually dispense is to try to reduce dependencies on remote web sites when coding your own. The problem strikes most dramatically when you run a very busy site, and you have some feed or resource that you download from a remote site. That remote site crashes, and oops, so does yours. It also happens when your busy site generates more requests than the remote site can handle.</p>
<p>I ran into this again today. One site that I host was consuming a remote feed from a site that has a much smaller capacity than my customer does. The site on my end gets over 10 million page views a day (peak ~2000 page views per second). The capacity mismatch became very apparent when something went wrong on the remote end.</p>
<p>The code logic was:</p>
<ol>
<li>If you have a cached version of the feed, and it&#8217;s fresh, then use it.</li>
<li>If the cached entry is expired, then fetch a new one, and replace the one in cache.</li>
</ol>
<p>This logic is fundamentally flawed for busy sites. It seems sensible, but think about what happens when the cached entry expires, and the remote site is responding very slowly. All of a sudden a stampede of requests starts stacking up, all trying to get the feed in parallel. That makes the remote site&#8217;s problem even worse and can crash it outright. The remote site tries to reboot, and you quickly crash it again. The sequence repeats indefinitely.</p>
<p>Why? Because the window of time during which the cache is invalid gets wider and wider as the remote site gets slower and slower. The longer that window is open, the more traffic the remote site will get from cache misses.</p>
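<p>A rough way to see how fast this gets out of hand: the number of requests that miss the cache is the request rate multiplied by how long the refresh takes. The sketch below uses the peak rate mentioned earlier; the remote response times are hypothetical values chosen only for illustration.</p>

```python
PEAK_REQUESTS_PER_SEC = 2000  # peak page views per second, from above

def requests_piled_up(fetch_seconds, rate=PEAK_REQUESTS_PER_SEC):
    """Requests that arrive and miss the cache while one refresh
    of the feed is still in flight."""
    return round(rate * fetch_seconds)

# Hypothetical remote response times, for illustration only.
for fetch_seconds in (0.05, 1, 10):
    print(fetch_seconds, "s fetch:", requests_piled_up(fetch_seconds), "requests")
```

<p>At 50 ms per fetch that is about 100 extra requests to the remote site. At 10 seconds per fetch it is 20,000, all sent to a server that is already struggling.</p>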
<p>A clean solution is to update the cache asynchronously using a scheduled batch job that keeps a local cache of the data. Only replace the cached copy when the remote data has actually changed. The logic in the web application changes to:</p>
<ol>
<li>Always use the data in the cached file.</li>
</ol>
<p>The feed site is consulted at regular intervals using a scheduled batch job (cron), and the cached data is updated if it&#8217;s able to get a response. If the remote site is down or too slow, then the application simply continues to use the version it had before. Problem solved!</p>
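<p>Here is a minimal sketch of that split in Python. It is not the code from the site described above; the function names are hypothetical, and the <em>fetch</em> argument stands in for whatever you use to download the feed (for example an HTTP GET with a short timeout).</p>

```python
import os
import tempfile

def refresh_cache(fetch, cache_path):
    """Run this from the scheduled job. Returns True if the cache was replaced."""
    try:
        data = fetch()          # assumed to return bytes, or raise on failure
    except Exception:
        return False            # remote is down or slow: keep the old copy
    directory = os.path.dirname(os.path.abspath(cache_path))
    # Write to a temporary file in the same directory, then swap it in.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    os.replace(tmp_path, cache_path)  # atomic: readers never see a partial file
    return True

def read_feed(cache_path):
    """Called by the web application. It never touches the network."""
    with open(cache_path, "rb") as f:
        return f.read()
```

<p>The web request path only ever calls read_feed, so a slow or dead remote site costs the batch job a failed run and nothing more.</p>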
<p>Why is this not a best practice for all web developers? Because most web sites don&#8217;t get enough traffic for it to matter much. But, if you&#8217;ve got a busy site, and you don&#8217;t want it to crash when your remote feeds do, then you might want to consider getting that data asynchronously, or at least use a cache update procedure that&#8217;s serialized.</p>
<p>Here is <a href="http://cloudsites.rackspacecloud.com/index.php/How_to_download_data_from_remote_web_servers_efficiently" target="_blank">an example</a> of a non-blocking serialization approach that works for PHP applications.</p>
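<p>For readers who don&#8217;t use PHP, the same general idea can be sketched in Python with a non-blocking file lock. This is my own illustration, not a port of the linked article; it assumes a Unix-like system (the fcntl module is not available on Windows), and the function name and lock file are hypothetical.</p>

```python
import fcntl

def try_refresh(lock_path, refresh):
    """Run refresh() only if no other process is already doing so.

    Returns True if this caller performed the refresh, or False if it
    skipped because another caller holds the lock.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False   # someone else is refreshing: serve the stale copy
        try:
            refresh()
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

<p>Only one request at a time pays the cost of the remote fetch. Everyone else gets an immediate False and keeps serving the cached copy, so a slow remote site never causes a pile-up.</p>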
<p>So all you web developers out there who like to consume RSS feeds on the server-side of your web application&#8230; don&#8217;t say I didn&#8217;t warn you. Go look at all your code and make sure you don&#8217;t have a dependency on a remote site. If you do, you now know at least two ways to solve that problem.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/02/ed-strikes-again/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Advice for backing up your Macs</title>
		<link>http://adrianotto.com/2009/11/advice-for-backing-up-your-macs/</link>
		<comments>http://adrianotto.com/2009/11/advice-for-backing-up-your-macs/#comments</comments>
		<pubDate>Thu, 12 Nov 2009 06:23:21 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Backup]]></category>
		<category><![CDATA[OSX]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=149</guid>
		<description><![CDATA[My wife asked me today if I could give a colleague some advice for how to backup a bunch of Macs. I&#8217;ll share my advice for you here. Over the past two decades I&#8217;ve used so many different backup systems and software and hardware combinations, I can&#8217;t even count them all. So this begs the [...]]]></description>
			<content:encoded><![CDATA[<p>My wife asked me today if I could give a colleague some advice for how to back up a bunch of Macs. I&#8217;ll share my advice with you here. Over the past two decades I&#8217;ve used so many different backup systems and software and hardware combinations, I can&#8217;t even count them all. So this raises the question, what do I do at home?</p>
<p><a href="http://www.apple.com/findouthow/mac/#timemachinebasics"><img class="size-full wp-image-474 alignleft" style="margin-left: 10px; margin-right: 10px;" title="Time Machine" src="http://cdn.adrianotto.com/wp-content/uploads/2009/11/hero_timemachine_lg.jpg" alt="" width="77" height="77" /></a></p>
<p>I use the <a href="http://www.apple.com/macosx/what-is-macosx/time-machine.html">Time Machine</a> software built into Leopard (and newer) <a href="http://www.apple.com/macosx/" target="_blank">OSX</a>. I use a locally connected USB 2.0 drive. A Firewire drive would also be good. Here is a drive that I like because it has lots of capacity, is reasonably affordable and compact, and runs quietly.</p>
<p><a href="http://www.buy.com/prod/fantom-greendrive-1tb-usb-2-0-and-esata-external-hard-drive-2-year/q/loc/101/208503758.html" target="_blank"><img class="alignright size-full wp-image-154" title="Fantom Drive" src="http://cdn.adrianotto.com/wp-content/uploads/2009/11/FantomDrive.png" alt="Fantom Drive" width="163" height="190" />Fantom GreenDrive Pro 2TB eSATA and USB 2.0 7200RPM 32MB External Hard Drive</a></p>
<p>Another that&#8217;s half the capacity, but cheaper:</p>
<p><a href="http://www.buy.com/prod/fantom-greendrive-1tb-usb-2-0-and-esata-external-hard-drive-2-year/q/loc/101/208503758.html" target="_blank">Fantom GreenDrive 1TB USB 2.0 and eSATA External Hard Drive</a></p>
<p>Now, for home use a 2TB drive is probably enough for all your computers. At first I networked them all together to use just one drive on one of my computers shared to all the others so that all the backups were on the one big drive. I later decided that every computer should have its own drive for backups. Why? A few reasons:</p>
<ol>
<li>To conserve electricity. Backup snapshots should be taken and archived while you are using the computer. With a shared drive, the host computer may not respond over the network when it is asleep, depending on how it&#8217;s set up, meaning you need to keep that host machine powered up all the time, wasting electricity.</li>
<li>Each computer does its backups when it gets used, and in the idle time before it falls asleep again. It works much better for me this way.</li>
<li>Immediate restores. Having a local drive on each computer makes restoration super fast. It&#8217;s not like a network or tape backup where you need to wait for your data to transfer back on to your hard drive to begin using it.</li>
</ol>
<p>It&#8217;s easy to set up Time Machine. Connect the drive, open &#8220;Time Machine Preferences&#8221; and select the drive.</p>
<p>I re-initialized mine using the disk utility first so that it had a journaled MacOS filesystem on it instead of the default FAT partitioning that comes from the factory.</p>
<p>One really nice thing about Time Machine is that you can easily revert to a prior point in time in the event you accidentally mess something up, get a virus, or whatever. It&#8217;s about the easiest tool I&#8217;ve ever used. It automatically rotates backups hourly, daily, weekly, etc. and deletes old backups to make room for new ones. It&#8217;s totally automatic, whereas with other tools you need to set that all up yourself.</p>
<p>This sort of local backup does not help if your house or office gets burglarized or burns down because you lose both the primary and backup copy of the data.</p>
<p><img class="alignright size-full wp-image-156" title="Jungle Disk" src="http://cdn.adrianotto.com/wp-content/uploads/2009/11/jd-logo.png" alt="Jungle Disk" width="222" height="50" />Another option is to use <a href="http://www.jungledisk.com/" target="_blank">JungleDisk</a> to back your data up to the cloud. That has two advantages: you only pay for the storage you actually use, and the backups are off site, so if you have a theft or fire, you can still restore, potentially somewhere else. A disadvantage is that it requires adequate internet connectivity. Your upload speed needs to be fast enough to accommodate all of the data you produce within each backup interval. If your network is already constrained on available bandwidth, running backups over it could potentially aggravate matters. In short, if you have a big fat internet connection, then use <a href="http://www.jungledisk.com/" target="_blank">JungleDisk</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2009/11/advice-for-backing-up-your-macs/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Coding in the Cloud</title>
		<link>http://adrianotto.com/2009/09/coding-in-the-cloud/</link>
		<comments>http://adrianotto.com/2009/09/coding-in-the-cloud/#comments</comments>
		<pubDate>Tue, 22 Sep 2009 07:00:35 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[best practices]]></category>

		<guid isPermaLink="false">http://www.adrianotto.com/?p=22</guid>
		<description><![CDATA[I have been writing a 10-part series on the Rackspace Cloud Blog. I&#8217;ll be keeping a running list of the posts here as they are published. Rule 1 &#8211; Cache is Your Friend Rule 2 &#8211; Don’t write to the database in real time Rule 3 &#8211; Use a “Stateless” design whenever possible Rule 4 [...]]]></description>
			<content:encoded><![CDATA[<p>I have been writing a 10-part series on the <a href="http://www.rackspacecloud.com/blog/">Rackspace Cloud Blog</a>. I&#8217;ll be keeping a running list of the posts here as they are published.</p>
<p><a href="http://www.rackspacecloud.com/blog/2009/06/coding-in-the-cloud-%e2%80%93-rule-1-cache-is-your-friend/">Rule 1 &#8211; Cache is Your Friend</a></p>
<p><a href="http://www.rackspacecloud.com/blog/2009/07/coding-in-the-cloud-rule-2-dont-write-to-the-database-in-real-time/">Rule 2 &#8211; Don’t write to the database in real time</a></p>
<p><a href="http://www.rackspacecloud.com/blog/2009/07/coding-in-the-cloud-rule-3-use-a-stateless-design-whenever-possible/">Rule 3 &#8211; Use a “Stateless” design whenever possible</a></p>
<p><a href="http://www.rackspacecloud.com/blog/2009/08/coding-in-the-cloud-rule-4-avoid-external-dependencies/" target="_blank">Rule 4 &#8211; Avoid Unnecessary External Dependencies</a></p>
<p><a href="http://www.rackspacecloud.com/blog/2009/08/19/coding-in-the-cloud-rule-5-cms-plugins/" target="_blank">Rule 5 &#8211; CMS Plugins</a></p>
<p><a href="http://www.rackspacecloud.com/blog/2009/09/22/coding-in-the-cloud-rule-6-http-includes/" target="_blank">Rule 6 &#8211; HTTP Includes</a></p>
<p>Rule 7 &#8211; Coming Soon</p>
<p>Rule 8 &#8211; Coming Later</p>
<p>Rule 9 &#8211; Coming Later</p>
<p>Rule 10 &#8211; Coming Later</p>
<p>Yep, if you follow all 10 of the rules, you&#8217;ll probably have a really good cloud based web app.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2009/09/coding-in-the-cloud/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

