<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Adrian Otto&#039;s Blog &#187; best practices</title>
	<atom:link href="http://adrianotto.com/tag/best-practices/feed/" rel="self" type="application/rss+xml" />
	<link>http://adrianotto.com</link>
	<description>For those who care about technical details</description>
	<lastBuildDate>Sun, 08 Apr 2012 00:02:36 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Maximizing Elasticity in the Cloud</title>
		<link>http://adrianotto.com/2011/10/maximizing-elasticity-in-the-cloud/</link>
		<comments>http://adrianotto.com/2011/10/maximizing-elasticity-in-the-cloud/#comments</comments>
		<pubDate>Thu, 20 Oct 2011 14:35:33 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[best practices]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=571</guid>
		<description><![CDATA[Running a production application in the cloud can be great because it&#8217;s possible to add and remove servers from a cluster dynamically using a provisioning API. These automatic additions and removals can be triggered by system utilization levels that you measure, such as concurrent network connections, memory utilization, or CPU utilization. When you need more [...]]]></description>
			<content:encoded><![CDATA[<p>Running a production application in the cloud can be great because it&#8217;s possible to add and remove servers from a cluster dynamically using a provisioning API. These automatic additions and removals can be triggered by system utilization levels that you measure, such as concurrent network connections, memory utilization, or CPU utilization. When you need more capacity, you can add more servers, and when they are not needed anymore, you simply turn them back off. You only pay for the time those servers were running, so it&#8217;s more economic than having a large number of servers deployed all the time.</p>
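<p>As a rough illustration of that kind of trigger, here is a minimal Python sketch. The function name and the 75% and 25% thresholds are made up for this example, and a real setup would pass the result to your provider&#8217;s provisioning API rather than just returning a number.</p>

```python
def desired_server_count(current, cpu_utilization,
                         scale_up_at=0.75, scale_down_at=0.25, minimum=1):
    """Return how many servers the cluster should run.

    cpu_utilization is the measured average across the cluster, 0.0 to 1.0.
    The thresholds are hypothetical values chosen for illustration.
    """
    if cpu_utilization >= scale_up_at:
        return current + 1          # busy: add a server
    if scale_down_at >= cpu_utilization and current > minimum:
        return current - 1          # idle: turn one back off
    return current                  # in between: leave the cluster alone


print(desired_server_count(4, 0.90))  # 5
print(desired_server_count(4, 0.10))  # 3
```

<p>The same shape works for concurrent connections or memory utilization; only the measured value and the thresholds change.</p>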
<p>Most simple web clusters rely on a single database server that all the application servers connect to. This way, all of the application servers have concurrent access to the same data. This can be problematic in the elastic use case when workloads increase and more servers are added to the cluster. If the work is bottlenecked on storing or accessing data in the database server, adding more application servers will not help. It will actually make the problem worse, because each new application server sends more work to the same database server.</p>
<p>I spoke on a panel at Zendcon yesterday, which was covered in an <a title="Infoworld Article" href="http://www.infoworld.com/d/cloud-computing/security-remains-top-concern-cloud-app-builders-176707" target="_blank">Infoworld article</a> where my remarks were published. The article says:</p>
<blockquote><p>Panelists also debated use of SQL and database connectivity in clouds. SQL as a design pattern for storage &#8220;is not ideal for cloud applications,&#8221; said Adrian Otto, senior technical strategist for Rackspace Cloud. Afterward, he described SQL issues as &#8220;typically the No. 1 bottleneck&#8221; to elasticity in the cloud. With elasticity, applications use more or fewer application servers based on demand. Otto recommended that developers who want elasticity should have a decentralized data model that scales horizontally. &#8220;SQL itself isn&#8217;t the problem. The problem is row-oriented data in an application,&#8221; which causes performance bottlenecks, said Otto.</p></blockquote>
<p>The author, Paul Krill, did a good job here of accurately reporting my position on this subject. Data stored in databases are arranged in tables of rows and columns. A new piece of data adds a new row. Each row has multiple columns that separate fields of a single record of data in the table. The truth is that most web applications work very well with this data design pattern. Those applications should continue to use SQL databases with row-oriented data. However, there are some applications where data may be arranged differently to make reading the data more efficient.</p>
<p>If you have a big table of data, and you want to pull out just a little bit of it using a query, the database server must determine the location of that data in the table by consulting the table&#8217;s index, and return the desired portion that matches the constraints given in the query. This makes reading data relatively expensive from a computational perspective. If the data were instead arranged in lots of columns, it could be retrieved more efficiently, and it could be more easily distributed over a larger number of servers, yielding the horizontal scalability that cloud applications want. This works very well in cases where the number of reads is very high, but the data is not updated very frequently in proportion to the reads.</p>
<p>Let&#8217;s use a blog application as an example. Blog posts are written once, and maybe updated a few times, possibly once each time a comment is submitted. However, on a busy web site, a blog post may be read millions of times. If the posts were stored in a column-oriented storage system like <a title="Cassandra" href="http://cassandra.apache.org/" target="_blank">Cassandra</a>, they could be quickly and easily retrieved using the id number of the blog post. The listing of recent blog posts can also be arranged in a column so that the front page of the blog site, with its listing of the articles, can be generated quickly. Using this approach requires that the data be properly arranged as it&#8217;s stored, putting the computational burden on the (infrequent) write rather than on the (frequent) read.</p>
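<p>To make the write-versus-read trade concrete, here is a small Python sketch of the blog example. It uses plain in-memory dictionaries as a stand-in for the storage system; the class and method names are hypothetical and this is not the Cassandra API.</p>

```python
class BlogStore:
    """In-memory stand-in for a key-oriented store (an assumption for illustration)."""

    def __init__(self, recent_limit=10):
        self.posts_by_id = {}            # post id -> post body, fetched by key
        self.recent_ids = []             # precomputed front-page listing, newest first
        self.recent_limit = recent_limit

    def save_post(self, post_id, body):
        # The (infrequent) write does the arranging work.
        is_new = post_id not in self.posts_by_id
        self.posts_by_id[post_id] = body
        if is_new:
            self.recent_ids.insert(0, post_id)
            del self.recent_ids[self.recent_limit:]

    def get_post(self, post_id):
        # The (frequent) read is a single lookup by id.
        return self.posts_by_id[post_id]

    def front_page(self):
        # Nothing to sort or scan at read time; the listing is already arranged.
        return list(self.recent_ids)


store = BlogStore(recent_limit=2)
store.save_post(1, "first post")
store.save_post(2, "second post")
store.save_post(3, "third post")
print(store.front_page())  # [3, 2]
```

<p>Note what is missing: there is no way to ask an arbitrary question of this data. You get back only what you arranged for at write time.</p>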
<p>Using a distributed system to store data in columns allows the data to be evenly distributed over an arbitrary number of servers, eliminating the central data bottleneck. Adding more servers in the correct proportion of application servers and storage servers can result in true horizontal scalability, meaning that the capacity increases as a direct proportion of how many servers are in the cluster.</p>
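<p>One common way to spread keys over an arbitrary number of servers is consistent hashing. The Python sketch below is my own illustration of that technique, with hypothetical names; it is not taken from any particular product&#8217;s implementation. The useful property for elasticity is that adding a server moves only the keys that the new server takes over.</p>

```python
import bisect
import hashlib


def _position(key):
    # Map any string to a stable integer position on the ring.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, points_per_server=100):
        self.points_per_server = points_per_server
        self._ring = []  # sorted list of (position, server) points
        for server in servers:
            self.add_server(server)

    def add_server(self, server):
        # Each server owns many points so the load spreads out evenly.
        for i in range(self.points_per_server):
            bisect.insort(self._ring, (_position(f"{server}#{i}"), server))

    def server_for(self, key):
        # Walk clockwise to the first point at or after the key's position,
        # wrapping around to the start of the ring if necessary.
        index = bisect.bisect_left(self._ring, (_position(key), ""))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["storage-1", "storage-2", "storage-3"])
print(ring.server_for("post:571"))
ring.add_server("storage-4")   # existing keys stay put unless storage-4 takes them
print(ring.server_for("post:571"))
```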
<p>Why doesn&#8217;t everyone do this already? For some good reasons:</p>
<ol>
<li>The concept of running applications in clouds is still relatively new. The related technology is still maturing.</li>
<li>Existing software tends to use SQL already. If you want to use an existing CMS platform, chances are it will require a central SQL database.</li>
<li>Most heavy-read workloads can be scaled well using data caching techniques. If applications don&#8217;t write data very often, it may not be necessary to scale beyond a single database server.</li>
<li>You must anticipate exactly how the application will use the data, and arrange it just right.</li>
<li>It may be harder to analyze the data. Once your data is arranged in a column store, you may not be able to query it in arbitrary ways. You may only be able to pull it out using its id numbers, or by systematically scanning all of it to find the parts you want.</li>
<li>Distributed data storage (aka NoSQL) systems like Cassandra, HBase, Redis, etc. are complicated, and there is a considerable learning curve associated with setting them up and maintaining them. In some cases these systems are not as good in terms of data durability or data consistency as the prevailing SQL database systems. These tradeoffs can be difficult to navigate.</li>
<li>Today&#8217;s software developers are very familiar with SQL as a data storage and access paradigm. They can very quickly develop software that relies on the ACID qualities of a SQL database.</li>
</ol>
<p>If you have an application that you want to deploy into a cloud, and you want it to be very elastic, you should think about how you arrange your data. If you use a centralized data design, you will probably have scalability bottlenecks when you add lots of servers. You should aim to decentralize the data in a way that lets you easily add more servers to horizontally scale the environment without stumbling on the limits of the database server. This is particularly important in situations where you need the application to write a lot of data, and a cache is not a suitable solution for you.</p>
<p>Over time, the reasons not to use column-oriented data will shrink, and better tools and services will make it easier to do. Until then, I suggest that you carefully consider whether you need maximum elasticity. If not, then it&#8217;s perfectly appropriate to keep using the same centralized row-oriented data paradigm. Use a cache like memcached in cases where you have heavy reads, and when it&#8217;s acceptable to show slightly outdated information to readers. The truth is that traditional solutions work really well for most web applications. However, if you have one of the rarer situations where you need true horizontal scalability, take a good look at a different arrangement for your data, and the systems and tools to make that possible for you in the cloud.</p>
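<p>For the heavy-read case, the caching pattern looks roughly like the Python sketch below. It is an in-process stand-in with hypothetical names, not the memcached client API, and the 60-second lifetime is just an example of how stale you might be willing to let a page get.</p>

```python
import time


class TTLCache:
    """Tiny cache with expiry; a stand-in for memcached (assumption for illustration)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        entry = self._entries.get(key)
        now = self.clock()
        if entry is not None and entry[0] > now:
            return entry[1]              # cache hit, possibly slightly outdated
        value = loader(key)              # cache miss: ask the database
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


def load_post_from_database(post_id):
    # Hypothetical placeholder for the real SQL query.
    return f"body of post {post_id}"


cache = TTLCache(ttl_seconds=60)
print(cache.get_or_load(571, load_post_from_database))  # goes to the database
print(cache.get_or_load(571, load_post_from_database))  # served from the cache
```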
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2011/10/maximizing-elasticity-in-the-cloud/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Bandwidth != Network Performance</title>
		<link>http://adrianotto.com/2010/03/bandwidth-network-performance/</link>
		<comments>http://adrianotto.com/2010/03/bandwidth-network-performance/#comments</comments>
		<pubDate>Sun, 14 Mar 2010 17:34:33 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[best practices]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=237</guid>
		<description><![CDATA[You might think that if you want faster internet performance, you can simply get a connection to the internet that has higher bandwidth. When you get a &#8220;faster&#8221; internet connection you may observe faster downloads. But it&#8217;s less frequently the additional bandwidth, and more frequently reduced latency that actually produces increased interactive web performance. This [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://cdn.adrianotto.com/wp-content/uploads/2010/03/rj45.jpg"><img class="alignright size-full wp-image-302" title="rj45" src="http://cdn.adrianotto.com/wp-content/uploads/2010/03/rj45.jpg" alt="" width="240" height="240" /></a>You might think that if you want faster internet performance, you can simply get a connection to the internet that has higher bandwidth. When you get a &#8220;faster&#8221; internet connection you may observe faster downloads. But it&#8217;s less frequently the additional bandwidth, and more frequently reduced latency that actually produces increased interactive web performance. This post explains why.</p>
<p>First of all, let&#8217;s review some definitions:</p>
<ul>
<li><strong>Bandwidth</strong>: The amount of data that can be passed along a communications channel in a given period of time.</li>
<li><strong>Latency</strong>: The time it takes for a packet to cross a network connection, from sender to receiver.</li>
<li><strong>Speed</strong>: Fast and rapid moving, going, traveling, proceeding, or performing; swiftness.</li>
<li><strong>Throughput</strong>: The quantity of data transmitted by a computer network over a given period of time.</li>
</ul>
<p>Now, all of these terms are related, and I want to highlight some of the minutiae here:</p>
<p><strong>Bandwidth</strong></p>
<p>The higher the bandwidth is on a network connection, the more data it&#8217;s capable of transmitting in a given period of time. Higher bandwidth is better.</p>
<p><strong>Latency</strong></p>
<p>This is very important, because latency effectively limits the amount of bandwidth you can consume if you are using a synchronous data transmission, like a TCP/IP download. Lower latency is better, and will yield faster speed.</p>
<p><strong>Throughput</strong></p>
<p>Throughput is another way of expressing speed. The higher the throughput, the faster your network communications will be. Note that your maximum possible throughput is your bandwidth. Actual throughput is equal to or less than your bandwidth.</p>
<p><strong>Speed</strong></p>
<p>If your network is high speed, you should observe high bandwidth, low latency, and high throughput.</p>
<h3>Latency and Throughput are Inversely Proportional</h3>
<p>For TCP/IP transmissions, the higher your latency is, the lower your throughput will be. Let&#8217;s explore why. The most common use of TCP/IP is for the web, which uses the HTTP protocol. HTTP works by making a TCP/IP connection to a remote server, issuing a request for a document, and then receiving the response. The protocol is text based. A simple HTTP transmission is illustrated below.</p>
<p>Client Request:</p>
<pre>GET / HTTP/1.1
User-Agent: Wget
Host: www.example.com
</pre>
<p>Server Response:</p>
<pre>HTTP/1.1 200 OK
Server: Apache/2.2.3 (Red Hat)
Last-Modified: Tue, 15 Nov 2005 13:24:10 GMT
ETag: "b300b4-1b6-4059a80bfd280"
Accept-Ranges: bytes
Content-Type: text/html; charset=UTF-8
Connection: Keep-Alive
Date: Wed, 18 Nov 2009 22:36:34 GMT
Age: 1010
Content-Length: 438

  Example Web Page

You have reached this web page by typing "example.com",
"example.net",
  or "example.org" into your web browser.

These domain names are reserved for use in documentation and are not available
  for registration. See &lt;a href="http://www.rfc-editor.org/rfc/rfc2606.txt"&gt;RFC
  2606&lt;/a&gt;, Section 3.
</pre>
<p>Here is a trace of the TCP/IP packets that make up that request:</p>
<pre>14:57:47.146665 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: S 3717672264:3717672264(0) win 5840
14:57:47.220092 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: . ack 1 win 183
14:57:47.220309 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: P 1:123(122) ack 1 win 183  (GET Request)
14:57:47.300962 IP 192.0.32.10.80 &gt; 192.168.144.2.39556: P 1:728(727) ack 123 win 4502  (200 OK Response)
14:57:47.300993 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: . ack 728 win 228
14:57:47.302035 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: F 123:123(0) ack 728 win 228
14:57:47.375475 IP 192.0.32.10.80 &gt; 192.168.144.2.39556: . ack 124 win 4502
14:57:47.375499 IP 192.0.32.10.80 &gt; 192.168.144.2.39556: F 728:728(0) ack 124 win 4502
14:57:47.375510 IP 192.168.144.2.39556 &gt; 192.0.32.10.80: . ack 729 win 228
</pre>
<p>Notice that there are 10 packets in this exchange (the trace above shows 9 of them; the server&#8217;s SYN-ACK reply in the handshake is not listed). There is a three-way handshake to set up the TCP session, then a round trip to send the data, then two more round trips to close down the connection. Each time the server receives a packet from the client, the connection may wait in the server&#8217;s connection queue to be processed, which can further increase the interactive protocol latency. Consider the impact of high latency on a connection like this. Suppose that it takes 0.2 seconds for each round trip. That connection would have a total throughput of 727 bytes downloaded in 0.8 seconds. That&#8217;s a rate of 909 Bytes/sec. Even if your internet connection is 15 Mb/sec, the bandwidth did not matter. Latency caused the throughput to be low.</p>
<p>Now, you might be wondering why we can&#8217;t just improve networking technology to make latency lower. We can, but that&#8217;s not going to help much, because we are still bounded by the speed of light, among other factors. <strong>The speed of light is slow when you consider the distance it has to travel to cross continents on the earth.</strong> Let&#8217;s look at some math to explain that:</p>
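<p>That arithmetic can be checked with a few lines of Python. The function name is hypothetical, and the sketch assumes the four round trips of 0.2 seconds each described above, with transfer time dominated entirely by those round trips.</p>

```python
def effective_throughput(payload_bytes, round_trips, round_trip_seconds):
    """Bytes per second when the transfer time is dominated by round trips."""
    return payload_bytes / (round_trips * round_trip_seconds)


rate = effective_throughput(payload_bytes=727, round_trips=4, round_trip_seconds=0.2)
print(round(rate))  # 909

# Compare with what a 15 Mb/sec link could carry, expressed in bytes per second.
link_bytes_per_second = 15_000_000 / 8
share_of_link = rate / link_bytes_per_second
print(f"{share_of_link:.4%} of the link is used")
```

<p>Doubling the link to 30 Mb/sec changes nothing in the first number. Halving the round trip time doubles it.</p>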
<ul>
<li>The speed of light in vacuum is 299,792,458 m/s.</li>
<li>The speed of light in fiber optic cable is ~200,000,000 m/s.</li>
<li>The distance from Anaheim, CA to New York is 4,494,898 meters</li>
<li>The one-way latency to New York is  4,494,898 / 200,000,000 = 22.47ms</li>
<li>The round-trip time between Anaheim, CA and New York is 44.95ms</li>
<li>The current ping time from Anaheim, CA to New York is 72 ms</li>
<pre>Tracing the route to sl-gw33-nyc.sprintlink.net (144.228.243.82)
  1 sl-crs1-ana-0-14-2-0.sprintlink.net (144.232.11.9) 0 msec
    sl-crs2-ana-0-14-2-0.sprintlink.net (144.232.11.11) 0 msec
    sl-crs1-ana-0-14-2-0.sprintlink.net (144.232.11.9) 4 msec
  2 sl-crs2-fw-0-13-3-0.sprintlink.net (144.232.19.197) 28 msec
    sl-crs2-fw-0-9-5-0.sprintlink.net (144.232.20.130) 28 msec
    sl-crs1-fw-0-3-3-0.sprintlink.net (144.232.9.65) 28 msec
  3 sl-crs2-kc-0-0-0-2.sprintlink.net (144.232.19.141) 40 msec
    144.232.20.57 40 msec
    sl-crs1-kc-0-5-5-0.sprintlink.net (144.232.24.9) 40 msec
  4 sl-crs2-chi-0-13-5-0.sprintlink.net (144.232.20.109) 52 msec
    sl-crs1-chi-0-1-0-3.sprintlink.net (144.232.18.214) 56 msec
    sl-crs2-chi-0-15-2-0.sprintlink.net (144.232.24.206) 52 msec
  5 sl-crs1-nyc-0-8-0-3.sprintlink.net (144.232.18.123) 72 msec
    sl-crs2-nyc-0-8-0-1.sprintlink.net (144.232.20.119) 72 msec
    sl-crs1-chi-0-10-3-0.sprintlink.net (144.232.9.148) 72 msec
  6 sl-gw33-nyc-14-0-0.sprintlink.net (144.232.6.56) 72 msec *
    sl-gw33-nyc-15-0-0.sprintlink.net (144.232.6.58) 72 msec
</pre>
</ul>
<p>This round trip time includes all of the switching and routing to get the packet through its full round trip. That means that even if all switching and routing were instantaneous, and we had a perfectly straight fiber path between all points on the earth, we could only reduce latency by about 40% (from 72 ms to roughly 45 ms). We cannot accelerate the speed of light, so without a significant advance in data transmission technology (perhaps a quantum physics approach) we must accept the speed of light as a performance boundary.</p>
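<p>Here is the same calculation as a short Python sketch, using only the figures from the list above. The variable names are my own.</p>

```python
FIBER_SPEED_M_PER_S = 200_000_000   # approximate speed of light in fiber
DISTANCE_M = 4_494_898              # Anaheim, CA to New York
MEASURED_RTT_MS = 72                # ping time from the trace

one_way_ms = DISTANCE_M / FIBER_SPEED_M_PER_S * 1000
ideal_rtt_ms = 2 * one_way_ms
possible_reduction = 1 - ideal_rtt_ms / MEASURED_RTT_MS

print(f"{one_way_ms:.2f} ms one way")          # 22.47 ms one way
print(f"{ideal_rtt_ms:.2f} ms round trip")     # 44.95 ms round trip
print(f"{possible_reduction:.0%} reduction")   # 38% reduction
```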
<h3>Making Web Sites Faster</h3>
<p>If you&#8217;re a web content publisher, you can set up your systems to work around these natural limitations. One way to make interactive web performance faster is to place copies of your data in various geographic locations that are physically closer to your end users. Using a <a href="http://en.wikipedia.org/wiki/Content_delivery_network" target="_blank">CDN</a> for your media content is one way to do this. You can also make your web server as fast as possible so that your dynamically generated content can be processed as quickly as possible. Using <a href="http://memcached.org/" target="_blank">memcached</a> to speed up your web application can help. Also, take a look at some <a href="http://developer.yahoo.com/performance/rules.html" target="_blank">best practices</a> for web developers for good performance.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/03/bandwidth-network-performance/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Coding in the Cloud</title>
		<link>http://adrianotto.com/2009/09/coding-in-the-cloud/</link>
		<comments>http://adrianotto.com/2009/09/coding-in-the-cloud/#comments</comments>
		<pubDate>Tue, 22 Sep 2009 07:00:35 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[best practices]]></category>

		<guid isPermaLink="false">http://www.adrianotto.com/?p=22</guid>
		<description><![CDATA[I have been writing a 10-part series on the Rackspace Cloud Blog. I&#8217;ll be keeping a running list of the posts here as they are published. Rule 1 &#8211; Cache is Your Friend Rule 2 &#8211; Don’t write to the database in real time Rule 3 &#8211; Use a “Stateless” design whenever possible Rule 4 [...]]]></description>
			<content:encoded><![CDATA[<p>I have been writing a 10-part series on the <a href="http://www.rackspacecloud.com/blog/">Rackspace Cloud Blog</a>. I&#8217;ll be keeping a running list of the posts here as they are published.</p>
<p><a href="http://www.rackspacecloud.com/blog/2009/06/coding-in-the-cloud-%e2%80%93-rule-1-cache-is-your-friend/">Rule 1 &#8211; Cache is Your Friend</a></p>
<p><a href="http://www.rackspacecloud.com/blog/2009/07/coding-in-the-cloud-rule-2-dont-write-to-the-database-in-real-time/">Rule 2 &#8211; Don’t write to the database in real time</a></p>
<p><a href="http://www.rackspacecloud.com/blog/2009/07/coding-in-the-cloud-rule-3-use-a-stateless-design-whenever-possible/">Rule 3 &#8211; Use a “Stateless” design whenever possible</a></p>
<p><a href="http://www.rackspacecloud.com/blog/2009/08/coding-in-the-cloud-rule-4-avoid-external-dependencies/" target="_blank">Rule 4 &#8211; Avoid Unnecessary External Dependencies</a></p>
<p><a href="http://www.rackspacecloud.com/blog/2009/08/19/coding-in-the-cloud-rule-5-cms-plugins/" target="_blank">Rule 5 &#8211; CMS Plugins</a></p>
<p><a href="http://www.rackspacecloud.com/blog/2009/09/22/coding-in-the-cloud-rule-6-http-includes/" target="_blank">Rule 6 &#8211; HTTP Includes</a></p>
<p>Rule 7 &#8211; Coming Soon</p>
<p>Rule 8 &#8211; Coming Later</p>
<p>Rule 9 &#8211; Coming Later</p>
<p>Rule 10 &#8211; Coming Later</p>
<p>Yep, if you follow all 10 of the rules, you&#8217;ll probably have a really good cloud based web app.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2009/09/coding-in-the-cloud/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

