<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Adrian Otto&#039;s Blog</title>
	<atom:link href="http://adrianotto.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://adrianotto.com</link>
	<description>For those who care about technical details</description>
	<lastBuildDate>Thu, 20 Oct 2011 14:35:33 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Maximizing Elasticity in the Cloud</title>
		<link>http://adrianotto.com/2011/10/maximizing-elasticity-in-the-cloud/</link>
		<comments>http://adrianotto.com/2011/10/maximizing-elasticity-in-the-cloud/#comments</comments>
		<pubDate>Thu, 20 Oct 2011 14:35:33 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[best practices]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=571</guid>
		<description><![CDATA[Running a production application in the cloud can be great because it&#8217;s possible to add and remove servers from a cluster dynamically using a provisioning API. These automatic additions and removals can be triggered by system utilization levels that you measure, such as concurrent network connections, memory utilization, or CPU utilization. When you need more [...]]]></description>
			<content:encoded><![CDATA[<p>Running a production application in the cloud can be great because it&#8217;s possible to add and remove servers from a cluster dynamically using a provisioning API. These automatic additions and removals can be triggered by system utilization levels that you measure, such as concurrent network connections, memory utilization, or CPU utilization. When you need more capacity, you can add more servers, and when they are not needed anymore, you simply turn them back off. You only pay for the time those servers were running, so it&#8217;s more economical than having a large number of servers deployed all the time.</p>
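<p>As a rough illustration of a utilization-triggered scaling decision, here is a minimal Python sketch. The thresholds, the server limits, and the names <code>Metrics</code> and <code>desired_server_count</code> are all assumptions made up for this example. They do not come from any particular provisioning API, and a real setup would pass the result to whatever API your provider offers.</p>

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    cpu_utilization: float      # fraction of CPU in use, 0.0 to 1.0
    memory_utilization: float   # fraction of memory in use, 0.0 to 1.0


def desired_server_count(current: int, metrics: Metrics,
                         scale_up_at: float = 0.75,
                         scale_down_at: float = 0.25,
                         minimum: int = 2, maximum: int = 20) -> int:
    """Return how many servers the cluster should have after this check."""
    # Scale on whichever measured resource is under the most pressure.
    busiest = max(metrics.cpu_utilization, metrics.memory_utilization)
    if busiest >= scale_up_at:
        return min(current + 1, maximum)   # add capacity, capped at the maximum
    if scale_down_at >= busiest:
        return max(current - 1, minimum)   # release capacity, never below the floor
    return current                         # inside the comfortable band: no change
```

<p>Calling this on a timer with fresh measurements gives a simple control loop. Changing the count by one server per check is a deliberately conservative choice that avoids overshooting on a short spike.</p>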
<p>Most simple web clusters rely on a single database server that all the application servers connect to. This way, all of the application servers have concurrent access to the same data. This can be problematic in the elastic use case when workloads increase, and more servers are added to the cluster. If the work is bottlenecked on storing or accessing data in the database server, adding more application servers will not help. It will actually make the problem worse.</p>
<p>I spoke on a panel at Zendcon yesterday, which was covered in an <a title="Infoworld Article" href="http://www.infoworld.com/d/cloud-computing/security-remains-top-concern-cloud-app-builders-176707" target="_blank">Infoworld article</a> where my remarks were published. The article says:</p>
<blockquote><p>Panelists also debated use of SQL and database connectivity in clouds. SQL as a design pattern for storage &#8220;is not ideal for cloud applications,&#8221; said Adrian Otto, senior technical strategist for Rackspace Cloud. Afterward, he described SQL issues as &#8220;typically the No. 1 bottleneck&#8221; to elasticity in the cloud. With elasticity, applications use more or fewer application servers based on demand. Otto recommended that developers who want elasticity should have a decentralized data model that scales horizontally. &#8220;SQL itself isn&#8217;t the problem. The problem is row-oriented data in an application,&#8221; which causes performance bottlenecks, said Otto.</p></blockquote>
<p>The author Paul Krill did a good job here of accurately reporting my position on this subject. Data stored in databases are arranged in tables of rows and columns. A new piece of data adds a new row. Each row has multiple columns that separate fields of a single record of data in the table. The truth is that most web applications work very well with this data design pattern. Those should continue to use SQL databases with row oriented data. However, there are some applications where data may be arranged differently to make reading the data more efficient.</p>
<p>If you have a big table of data, and you want to pull out just a little bit of it using a query, the database server must determine the location of that data in the table by consulting the table&#8217;s index, and return the desired portion that matches the constraints given in the query. This makes the reading of data relatively expensive from a computational perspective. If the data were instead arranged in lots of columns, it could be retrieved more efficiently, and the data could be more easily distributed over a larger number of servers, yielding the horizontal scalability that cloud applications want. This works very well in cases where the number of reads is very high, but the data is not updated very frequently in proportion to the reads.</p>
<p>Let&#8217;s use a blog application as an example. Blog posts are written once, and maybe updated a few times, possibly once each time a comment is submitted. However, on a busy web site, a blog post may be read millions of times. If the posts were stored in a column oriented storage system like <a title="Cassandra" href="http://cassandra.apache.org/" target="_blank">Cassandra</a>, they could be quickly and easily retrieved using the id number of the blog post. The listing of recent blog posts can also be arranged in a column so that the front page of the blog site with the listing of the articles can be generated. Using this approach requires that the data be properly arranged as it&#8217;s stored, putting the computational burden on the (infrequent) write rather than on the (frequent) read.</p>
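<p>The idea of doing the arranging work on the write can be sketched in a few lines of Python. This is not the Cassandra API. Plain in-memory structures stand in for the column store, and the names <code>posts_by_id</code>, <code>recent_posts</code>, <code>publish</code>, <code>read_post</code>, and <code>front_page</code> are hypothetical.</p>

```python
from collections import deque

posts_by_id = {}                 # stands in for a column family keyed by post id
recent_posts = deque(maxlen=10)  # precomputed front-page listing, newest first


def publish(post_id, title, body):
    """Do the arranging work once, at write time."""
    posts_by_id[post_id] = {"title": title, "body": body}
    recent_posts.appendleft((post_id, title))


def read_post(post_id):
    """A read is a single key lookup, with no query planning or index scan."""
    return posts_by_id[post_id]


def front_page():
    """The listing was already assembled on write, so reading it is trivial."""
    return list(recent_posts)
```

<p>The write does two things (store the post and update the listing) so that each of the far more frequent reads does only one.</p>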
<p>Using a distributed system to store data in columns allows the data to be evenly distributed over an arbitrary number of servers, eliminating the central data bottleneck. Adding more servers in the correct proportion of application servers and storage servers can result in true horizontal scalability, meaning that the capacity increases as a direct proportion of how many servers are in the cluster.</p>
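<p>One common way to spread keys over an arbitrary number of storage servers is consistent hashing. The Python sketch below is an illustration under that assumption. It is not how any particular product is implemented, and the <code>HashRing</code> class and server names are invented for the example.</p>

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, replicas=100):
        # Each server owns many points on the ring so that keys spread evenly.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, key: str) -> str:
        """Return the server owning the first ring point after the key's hash."""
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]
```

<p>With this arrangement, adding a server only moves the keys that now belong to the new server. Every other key stays where it was, which is what makes growing the cluster cheap.</p>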
<p>Why doesn&#8217;t everyone do this already? For some good reasons:</p>
<ol>
<li>The concept of running applications in clouds is still relatively new. The related technology is still maturing.</li>
<li>Existing software tends to use SQL already. If you want to use an existing CMS platform, chances are it will require a central SQL database.</li>
<li>Most heavy-read workloads can be scaled well using data caching techniques. If applications don&#8217;t write data very often, it may not be necessary to scale beyond a single database server.</li>
<li>You must anticipate exactly how the application will use the data, and arrange it just right.</li>
<li>It may be harder to analyze the data. Once your data is arranged in a column store, you may not be able to query it in arbitrary ways. You may only be able to pull it out using its id numbers, or by systematically scanning all of it to find the parts you want.</li>
<li>Distributed data storage (aka: NoSQL) systems like Cassandra, HBase, Redis, etc. are complicated, and there is a considerable learning curve associated with setting them up and maintaining them. In some cases these systems are not as good in terms of data durability or data consistency as the prevailing SQL database systems. These tradeoffs can be difficult to navigate.</li>
<li>Today&#8217;s software developers are very familiar with SQL as a data storage and access paradigm. They can very quickly develop software that relies on the ACID qualities of a SQL database.</li>
</ol>
<p>If you have an application that you want to deploy into a cloud, and you want it to be very elastic, you should think about the subject of how you arrange your data. If you use a centralized data design, you will probably have scalability bottlenecks when you add lots of servers. You should aim to decentralize the data in a way that you can easily add more servers to horizontally scale the environment, and not stumble on the limits of the database server. This is particularly important in situations where you need the application to write a lot of data, and a cache is not a suitable solution for you.</p>
<p>Over time, the reasons not to use column-oriented data will shrink, and better tools and services will make it easier to do. Until then, I suggest that you carefully consider whether you need maximum elasticity. If not, then it&#8217;s perfectly appropriate to keep using the same centralized row-oriented data paradigm. Use a cache like memcached in cases where you have heavy reads, and when it&#8217;s acceptable to show slightly outdated information to readers. The truth is that traditional solutions work really well for most web applications. However, if you have one of the less common situations where you need true horizontal scalability, take a good look at a different arrangement for your data, and the systems and tools to make that possible for you in the cloud.</p>
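<p>The caching advice can be illustrated with a small read-through cache that tolerates slightly stale data. This Python sketch keeps entries in a local dictionary as a stand-in for memcached. It does not use the memcached client API, and the <code>TTLCache</code> class, its <code>get_or_load</code> method, and the <code>load</code> callback are assumptions made for the example.</p>

```python
import time


class TTLCache:
    """In-process stand-in for a cache such as memcached (not its real API)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}   # key -> (expiry time, value)

    def get_or_load(self, key, load):
        """Return the cached value, calling load(key) only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                  # fresh enough: skip the database
        value = load(key)                    # miss or expired: one database read
        self._entries[key] = (now + self._ttl, value)
        return value
```

<p>The time-to-live is the knob that trades freshness for database load: readers may see data up to that many seconds old, and the database sees at most one read per key per interval from each cache.</p>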
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2011/10/maximizing-elasticity-in-the-cloud/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Better Luhn Formula CC Validator for PHP</title>
		<link>http://adrianotto.com/2011/10/better-luhn-formula-validator-for-php/</link>
		<comments>http://adrianotto.com/2011/10/better-luhn-formula-validator-for-php/#comments</comments>
		<pubDate>Fri, 07 Oct 2011 04:33:31 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[PHP]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=527</guid>
		<description><![CDATA[I was doing some work integrating with a payment gateway in a PHP application, and decided it would be a good idea to validate credit card numbers using a Luhn Algorithm formula prior to forwarding them to the payment gateway for processing. I looked for existing PHP ones, and found a few. The more I [...]]]></description>
			<content:encoded><![CDATA[<p>I was doing some work integrating with a payment gateway in a PHP application, and decided it would be a good idea to validate credit card numbers using a <a href="http://en.wikipedia.org/wiki/Luhn_algorithm" title="Luhn Algorithm" target="_blank">Luhn Algorithm</a> formula prior to forwarding them to the payment gateway for processing. I looked for existing PHP ones, and found a few.</p>
<p>The more I <a href="http://javier.rodriguez.org.mx/index.php/2005/12/26/luhn-algorithm-in-php" target="_blank" title="Bad Example">found</a> the less I liked any of them. Some of them actually had bugs or typos and did not work at all, and most of them would incorrectly validate a credit card number that was all zeros.</p>
<p>I wrote my own that I&#8217;m pretty happy with. It&#8217;s a good deal more efficient than most that I found. It does not repeat the same math on the same figures like some of them out there do.</p>
<div style="font-size: 0.8em"><code><span style="color: #000000"><br />
<span style="color: #0000BB">&lt;?php</p>
<p></span><span style="color: #FF0000">/*<br />
&nbsp;*&nbsp;&nbsp;&nbsp;Copyright&nbsp;2011&nbsp;Adrian&nbsp;Otto<br />
&nbsp;*<br />
&nbsp;*&nbsp;&nbsp;&nbsp;Licensed&nbsp;under&nbsp;the&nbsp;Apache&nbsp;License,&nbsp;Version&nbsp;2.0&nbsp;(the&nbsp;"License");<br />
&nbsp;*&nbsp;&nbsp;&nbsp;you&nbsp;may&nbsp;not&nbsp;use&nbsp;this&nbsp;file&nbsp;except&nbsp;in&nbsp;compliance&nbsp;with&nbsp;the&nbsp;License.<br />
&nbsp;*&nbsp;&nbsp;&nbsp;You&nbsp;may&nbsp;obtain&nbsp;a&nbsp;copy&nbsp;of&nbsp;the&nbsp;License&nbsp;at<br />
&nbsp;*<br />
&nbsp;*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;http://www.apache.org/licenses/LICENSE-2.0<br />
&nbsp;*<br />
&nbsp;*&nbsp;&nbsp;&nbsp;Unless&nbsp;required&nbsp;by&nbsp;applicable&nbsp;law&nbsp;or&nbsp;agreed&nbsp;to&nbsp;in&nbsp;writing,&nbsp;software<br />
&nbsp;*&nbsp;&nbsp;&nbsp;distributed&nbsp;under&nbsp;the&nbsp;License&nbsp;is&nbsp;distributed&nbsp;on&nbsp;an&nbsp;"AS&nbsp;IS"&nbsp;BASIS,<br />
&nbsp;*&nbsp;&nbsp;&nbsp;WITHOUT&nbsp;WARRANTIES&nbsp;OR&nbsp;CONDITIONS&nbsp;OF&nbsp;ANY&nbsp;KIND,&nbsp;either&nbsp;express&nbsp;or&nbsp;implied.<br />
&nbsp;*&nbsp;&nbsp;&nbsp;See&nbsp;the&nbsp;License&nbsp;for&nbsp;the&nbsp;specific&nbsp;language&nbsp;governing&nbsp;permissions&nbsp;and<br />
&nbsp;*&nbsp;&nbsp;&nbsp;limitations&nbsp;under&nbsp;the&nbsp;License.<br />
&nbsp;*/</p>
<p></span><span style="color: #007700">function&nbsp;</span><span style="color: #0000BB">luhn_validate</span><span style="color: #007700">(</span><span style="color: #0000BB">$s</span><span style="color: #007700">)&nbsp;{<br />
&nbsp;&nbsp;if(</span><span style="color: #0000BB">0</span><span style="color: #007700">==</span><span style="color: #0000BB">$s</span><span style="color: #007700">)&nbsp;{&nbsp;return(</span><span style="color: #0000BB">false</span><span style="color: #007700">);&nbsp;}&nbsp;</span><span style="color: #FF0000">//&nbsp;Don't&nbsp;allow&nbsp;all&nbsp;zeros<br />
&nbsp;&nbsp;</span><span style="color: #0000BB">$sum</span><span style="color: #007700">=</span><span style="color: #0000BB">0</span><span style="color: #007700">;<br />
&nbsp;&nbsp;</span><span style="color: #0000BB">$i</span><span style="color: #007700">=</span><span style="color: #0000BB">strlen</span><span style="color: #007700">(</span><span style="color: #0000BB">$s</span><span style="color: #007700">);&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #FF0000">//&nbsp;Find&nbsp;the&nbsp;last&nbsp;character<br />
&nbsp;&nbsp;</span><span style="color: #007700">while&nbsp;(</span><span style="color: #0000BB">$i</span><span style="color: #007700">--&nbsp;&gt;&nbsp;</span><span style="color: #0000BB">0</span><span style="color: #007700">)&nbsp;{&nbsp;</span><span style="color: #FF0000">//&nbsp;Iterate&nbsp;all&nbsp;digits&nbsp;backwards<br />
&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000BB">$sum</span><span style="color: #007700">+=</span><span style="color: #0000BB">$s</span><span style="color: #007700">[</span><span style="color: #0000BB">$i</span><span style="color: #007700">];&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #FF0000">//&nbsp;Add&nbsp;the&nbsp;current&nbsp;digit<br />
&nbsp;&nbsp;&nbsp;&nbsp;//&nbsp;If&nbsp;the&nbsp;digit's&nbsp;index&nbsp;is&nbsp;even,&nbsp;add&nbsp;it&nbsp;again&nbsp;(doubling&nbsp;it).&nbsp;Adjust&nbsp;for&nbsp;doubled&nbsp;values&nbsp;of&nbsp;10+&nbsp;by&nbsp;subtracting&nbsp;9.<br />
&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #007700">(</span><span style="color: #0000BB">0</span><span style="color: #007700">==(</span><span style="color: #0000BB">$i</span><span style="color: #007700">%</span><span style="color: #0000BB">2</span><span style="color: #007700">))&nbsp;?&nbsp;(</span><span style="color: #0000BB">$s</span><span style="color: #007700">[</span><span style="color: #0000BB">$i</span><span style="color: #007700">]&nbsp;&gt;&nbsp;</span><span style="color: #0000BB">4</span><span style="color: #007700">)&nbsp;?&nbsp;(</span><span style="color: #0000BB">$sum</span><span style="color: #007700">+=(</span><span style="color: #0000BB">$s</span><span style="color: #007700">[</span><span style="color: #0000BB">$i</span><span style="color: #007700">]-</span><span style="color: #0000BB">9</span><span style="color: #007700">))&nbsp;:&nbsp;(</span><span style="color: #0000BB">$sum</span><span style="color: #007700">+=</span><span style="color: #0000BB">$s</span><span style="color: #007700">[</span><span style="color: #0000BB">$i</span><span style="color: #007700">])&nbsp;:&nbsp;</span><span style="color: #0000BB">false</span><span style="color: #007700">;<br />
&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;return&nbsp;(</span><span style="color: #0000BB">0</span><span style="color: #007700">==(</span><span style="color: #0000BB">$sum</span><span style="color: #007700">%</span><span style="color: #0000BB">10</span><span style="color: #007700">))&nbsp;;<br />
}&nbsp;</p>
<p></span><span style="color: #0000BB">?&gt;</span><br />
</span><br />
</code></div>
<p>The function contains 7 lines of code. Can you make this function better without making it harder to read and understand? Please let me know.</p>
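<p>One possible variation, sketched in Python rather than PHP so that it is easy to run on its own. The PHP version above decides which digits to double from the index counted from the left, which appears to assume a number with an even count of digits (such as 16). The sketch below counts from the right instead, so card numbers with an odd count of digits are handled too. Treat it as an illustration, not a drop-in replacement.</p>

```python
def luhn_validate(number: str) -> bool:
    """Return True if `number` passes the Luhn check."""
    if not (number.isascii() and number.isdigit()) or int(number) == 0:
        return False                      # reject non-digits, empty input, and all zeros
    total = 0
    # Walk from the rightmost digit; double every second digit counted from the right.
    for offset, char in enumerate(reversed(number)):
        digit = int(char)
        if offset % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9                # same as adding the two digits of the product
        total += digit
    return total % 10 == 0
```

<p>Counting from the right is what lets one function cover both 16-digit and 15-digit numbers. A passing check only means the number is well formed, so the payment gateway still has the final say.</p>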
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2011/10/better-luhn-formula-validator-for-php/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>I&#8217;m Paranoid, just like you!</title>
		<link>http://adrianotto.com/2011/10/im-paranoid-just-like-you/</link>
		<comments>http://adrianotto.com/2011/10/im-paranoid-just-like-you/#comments</comments>
		<pubDate>Fri, 07 Oct 2011 00:04:05 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>

		<guid isPermaLink="false">http://www.adrianotto.com/?p=13</guid>
		<description><![CDATA[By: Adrian Otto Over the years I’ve administered email systems that provided service to thousands of end users’ mailboxes. In the early years in the 1990s most woes of a mail system administrator were about how to instrument the setting up of email accounts and related client settings, and changing passwords when they were forgotten [...]]]></description>
			<content:encoded><![CDATA[<p>By: Adrian Otto</p>
<p>Over the years I’ve administered email systems that provided service to thousands of end users’ mailboxes. In the early years in the 1990s most woes of a mail system administrator were about how to instrument the setting up of email accounts and related client settings, and changing passwords when they were forgotten by end users.</p>
<p>As the internet became more and more commercialized, spam exploded in our face. Everyone hates spam. Mail administrators hate it with a passion. They are doing everything they can to try and fight it&#8230; they filter, they black-hole, they tattle to abuse@whatever.com about it. Sometimes their own users send spam, and they get black-holed and need to jump through hoops to undo the damage.</p>
<p>At the time I reached my breaking point I managed email for about a dozen domain names, probably about two hundred mailboxes in total. I hated it. I hated every waking moment of it. The RBLs that worked one day did not work the next. I’m convinced that e-mail system administration is the nastiest, dirtiest job there is for a sysadmin.</p>
<p>People kept suggesting to me that I outsource email, which I shrugged off. I had problems with outsourcing:</p>
<p style="padding-left: 30px;"><strong>1) I’m Paranoid about Uptime.<br />
</strong></p>
<p style="padding-left: 30px;">It’s hard for me to trust other people, let alone trust a company. And trusting a company with something as important as my email??? No way. I’m a control freak, and I was going to keep control at all costs. Yes, I hated email system administration. I wasn’t even a sysadmin any more, but I still did it just so that I could control it. It needed to be highly available. I simply could not trust anyone to do it better than me.</p>
<p style="padding-left: 30px;"><strong>2) I’m Paranoid about Security.<br />
</strong></p>
<p style="padding-left: 30px;">Although email is inherently an insecure communication mechanism, all sorts of highly sensitive information is in there anyway. What would happen if a competitor somehow got control of our email and read it? They could learn all of our secrets. No way, I’m keeping control of the security so that I know it’s locked down as much as humanly possible.</p>
<p style="padding-left: 30px;"><strong>3) I’m Paranoid about Reliability and Control.<br />
</strong></p>
<p style="padding-left: 30px;">If something goes wrong, I want to be able to fix it quick. If I host it, I have full control of everything in the system. I can find what’s wrong and fix it fast. I’m really good at that.</p>
<p>I became a source code contributor for an open source email filtering system called bogofilter that uses Bayes filters to learn what’s spam and what’s not and filter based on that. I thought my spam filtering setup was the bomb! It worked great!</p>
<p>I got busier and busier with my work. I administered my email systems less and less. The better they worked, the less I would work on them because I had other fish to fry. The spammers got smarter and smarter, and soon enough my super cool spam filtering setup was becoming less and less effective.</p>
<p>So in 2006 something happened. I got super frustrated with spam administration. I was tired of having to keep finding or inventing better mouse traps to trap that nasty spam. So I thought to myself&#8230; There is an unlimited desire to send spam. Why? Because it works. If it did not work, the spammers would not be so determined to keep doing it. They are doing everything they can to outsmart you to get mail in your inbox. They keep getting smarter and smarter.</p>
<p>I thought some more&#8230; It’s like viruses. The hackers keep making better viruses, and the virus scanner software companies keep making their virus scanners better to clean them up and block them out. I needed something like virus scan, but for my email. I thought about all the technical ways to do it. I started hunting the web to find answers. I just wanted SOMEONE&#8230; anyone to handle this spam nonsense for me.</p>
<p>In the process, I stumbled across a company called “Webmail.us” (later acquired by <a href="http://www.rackspace.com" target="_blank">Rackspace</a> and now called “<a href="//www.rackspace.com/email_hosting" target="_blank">Rackspace Email</a>”). They had a great web site, and said (at the time) they had 700,000 mailboxes in service. They had a complete spam filtering solution built in. The mailbox hosting was cheap. So cheap I could not ignore it. They were charging less for complete hosting of mailboxes than I was willing to pay for outsourced spam filtering.</p>
<p>In 2006 I did an experiment. I put my own domain name where I get my home email on webmail.us to see how it worked. I told myself that if it worked really well that I might switch all my email over to it, and wash my hands of email sysadmin work and all the spam nonsense that goes along with it. I did it for a month. It worked great. It was fast, it never went down. I got no spam. I was thrilled!</p>
<p><strong>I did the unthinkable. I outsourced my email!</strong></p>
<p>One by one I migrated all of my domains, and all my mail users over to the hosted system. I have never looked back. The system has been rock solid. The few problems I’ve seen over the past three years have been really minor, and solved more quickly than I would have been able to solve using my own systems. I had been converted.</p>
<p>I was so happy to finally be free of all the nuisance of administering email and spam filtering systems. It was great. Years later I ended up working with Rackspace, and told them the story of how I used and loved the email platform. I later met the people behind the system, and it was no wonder that it works as well as it does.</p>
<p><strong>If you are still administering your own email&#8230;</strong> especially if you are running an Exchange system in your own office building. You need to take a serious look in the mirror and ask yourself why you are not outsourcing it to <a href="//www.rackspace.com/email_hosting" target="_blank">Rackspace Email</a>. The truth is:</p>
<p style="padding-left: 30px;">1) It’s more expensive to host it internally. Run the numbers.<br />
2) Your uptime is a lot worse. Measure it.<br />
3) Your security is no stronger. Audit it.<br />
4) You are paranoid, just like me. Yes, you are.</p>
<p>You trust your bank with your money. You trust your phone company not to spy on all your phone calls. You do this stuff without worrying about it. These things are much bigger leaps of trust than outsourcing your email.</p>
<p>From me to you&#8230; do yourself a favor. Run the same experiment I did. You’ll be delighted. I work for Rackspace now, so my view is corrupt, right? Don&#8217;t take my word for it, because you&#8217;re paranoid. Just try it and see.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2011/10/im-paranoid-just-like-you/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What is a Cloud Platform?</title>
		<link>http://adrianotto.com/2011/02/cloud-platform/</link>
		<comments>http://adrianotto.com/2011/02/cloud-platform/#comments</comments>
		<pubDate>Fri, 18 Feb 2011 22:02:51 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=462</guid>
		<description><![CDATA[Definition of Cloud Platform: A system where software applications may be run in an environment composed of utility cloud services in a logically abstract environment. Definition of PaaS: Platform as a Service. A Cloud Platform offered by a service provider as a hosted service which facilitates the deployment of software applications without the cost and [...]]]></description>
			<content:encoded><![CDATA[<p>Definition of <strong>Cloud Platform</strong>:</p>
<p>A system where software applications may be run in an environment composed of utility cloud services in a logically abstract environment.</p>
<p>Definition of <strong>PaaS</strong>:</p>
<p>Platform as a Service. A Cloud Platform offered by a service provider as a hosted service which facilitates the deployment of software applications without the cost and complexity of acquiring and managing the underlying hardware and software layers.</p>
<p>Examples of PaaS:</p>
<ol>
<li><a href="http://code.google.com/appengine/">Google AppEngine</a></li>
<li><a href="http://force.com">Force.com</a></li>
</ol>
<p>The key drawback of most PaaS services that exist today is vendor lock-in. Implementing an application using a proprietary platform means that it becomes difficult or impossible to move a deployed application from one service provider to another, unless they have compatible Cloud Platforms.</p>
<p><a href="http://www.openstack.org"><img class="alignright size-full wp-image-356" title="OpenStack" src="http://cdn.adrianotto.com/wp-content/uploads/2010/09/openstacklogo.jpg" alt="" width="174" height="179" /></a>I plan to address this problem by embracing open source solutions that allow numerous service providers to host applications in a compatible PaaS model where applications can be easily moved with little or no modification required.</p>
<p>Keep an eye on the <a href="http://www.openstack.org">OpenStack</a> project in the upcoming weeks and months for blueprints for a PaaS system for the whole world to use. I invite you to participate in the design and future implementation of the cloud platform that will end vendor lock-in concerns, and simplify application hosting for software developers.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2011/02/cloud-platform/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>One Step Closer to Ideal</title>
		<link>http://adrianotto.com/2011/02/one-step-closer-to-ideal/</link>
		<comments>http://adrianotto.com/2011/02/one-step-closer-to-ideal/#comments</comments>
		<pubDate>Fri, 04 Feb 2011 20:04:00 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=459</guid>
		<description><![CDATA[Rackspace announced today that they are no longer charging per-request fees for access to their data on the Cloud Files service. This is good news for those of you who want to closely integrate an application with a cloud storage service. The leading service in this space is Amazon&#8217;s S3 which has a rather convoluted [...]]]></description>
			<content:encoded><![CDATA[<p>Rackspace <a href="http://www.rackspacecloud.com/blog/2011/02/04/cloud-files-puts-an-end-to-request-charges/" target="_blank">announced today</a> that they are no longer charging per-request fees for access to their data on the <a href="http://www.rackspacecloud.com/cloud_hosting_products/files/">Cloud Files</a> service. This is good news for those of you who want to closely integrate an application with a cloud storage service. The leading service in this space is Amazon&#8217;s S3, which has a rather convoluted pricing scheme that&#8217;s pretty hard to understand. I&#8217;m happy to see that Rackspace is helping to keep things simple. Now all you pay for when using Cloud Files is your used storage capacity and the bandwidth for the data transfer.</p>
<p>This announcement is on the heels of another <a href="http://www.rackspacecloud.com/blog/2011/01/12/big-news-for-cloud-files-users-akamai-is-coming/">recent announcement</a> that the public web delivery of files from the Cloud Files service now uses the world&#8217;s leading content distribution network from <a href="http://www.akamai.com/">Akamai</a>. This means that if you are looking for somewhere to host an extensive media or video archive, Cloud Files is definitely worth your consideration.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2011/02/one-step-closer-to-ideal/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Congestion, Convoy Effect, and Thundering Herd</title>
		<link>http://adrianotto.com/2011/01/convoy-effect/</link>
		<comments>http://adrianotto.com/2011/01/convoy-effect/#comments</comments>
		<pubDate>Fri, 28 Jan 2011 00:52:13 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=441</guid>
		<description><![CDATA[When you have a limited supply of some resource, and a demand for that resource that exceeds the supply, you have something economists call a shortage. In this article I explain how we deal with shortages of resources (called congestion) in software systems. In economics we learn about the concept of Supply and Demand. Simply [...]]]></description>
			<content:encoded><![CDATA[<p>When you have a limited supply of some resource, and a demand for that resource that exceeds the supply, you have something economists call a shortage. In this article I explain how we deal with shortages of resources (called congestion) in software systems.</p>
<p>In economics we learn about the concept of <a href="http://en.wikipedia.org/wiki/Supply_and_demand" target="_blank">Supply and Demand</a>. Simply put, as something becomes more scarce, the price will adjust upward until an equilibrium is met between the supply and the demand. This holds true for situations where the pricing can be adjusted.</p>
<p>So, what happens when the pricing is fixed? You need some strategy to deal with the scarcity. A discipline known as <a href="http://en.wikipedia.org/wiki/Queueing_theory" target="_blank">queueing theory</a> can describe this. Let&#8217;s think about a common example of a system where the demand (work) exceeds the supply (capacity). I will explain these in terms of system engineering, as they would pertain to a software system.</p>
<p>In the grocery store there are a finite number of checkout clerks. We will call the lines of waiting shoppers <em>queues</em>. If queue length increases, the system is experiencing <em>congestion</em>. When this happens the store manager calls in more <em>workers</em> (store employees) to help with checkouts. This capacity expansion process may continue until all checkstands are open, or until there are no more available workers to add. This condition is the <em>maximum capacity</em> of the system.  If we are at maximum capacity, we hope that the congestion is eliminated. If expanding the capacity did not alleviate the congestion, then the shoppers must wait. New shoppers entering the store may see the long queues, and decide to leave. Some people standing in line may <em>abandon</em> the line, and leave the store without completing their intended purchase.</p>
<p>Having multiple shorter queues gives shoppers the illusion that the congestion is lower than it actually is. Some checkout queues move more quickly than others, so having one queue per checkstand is not a fair system. A shopper may wait longer than they deserve because of what queue they select. Sometimes shoppers will jump from one queue to another if they feel they won&#8217;t have to wait as long.</p>
<p><strong>Lesson One: Use a combined queue instead of multiple separate queues<br />
</strong></p>
<p>When all shoppers have equal priority, it&#8217;s better to use a common queue that&#8217;s serviced by multiple workers. This way every shopper in the queue waits a fair share of the congestion backlog. Nobody is forced to wait longer because of queue selection, since there is only one. This also eliminates the inefficiency of changing between queues when one is slow.</p>
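<p>As a rough illustration, here is a minimal Python sketch of one shared queue serviced by several workers. The function name, the shopper labels, and the worker count are all made up for the example; it is only meant to show the shape of the idea, not a production design.</p>

```python
import queue
import threading

def run_shared_queue(jobs, num_workers):
    """Service every job from one shared FIFO queue using several workers."""
    work = queue.Queue()
    for job in jobs:
        work.put(job)

    results = []
    results_lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                job = work.get_nowait()  # always the job that has waited longest
            except queue.Empty:
                return  # nothing left to service
            with results_lock:
                results.append((worker_id, job))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Ten shoppers, three checkout clerks, one line.
served = run_shared_queue(["shopper-%d" % n for n in range(10)], num_workers=3)
```

<p>Because every worker pulls from the same queue, no shopper can get stuck behind one slow checkstand.</p>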
<p><strong>Lesson Two: When using a priority queue, limit the number of priority waiters</strong></p>
<p>This leads to the temptation to have multiple classifications of shared queues. If you want to have a sense of a premium service for select <em>priority</em> shoppers, you can create two queues. The <em>priority queue</em> is for your select priority shoppers, and the main queue is for everyone else. This works when the population of priority shoppers is smaller than the general population and they arrive in the line infrequently. In this situation, the next available worker can select a shopper to service from the priority queue before they service someone from the main queue. If there are too many priority shoppers, the main queue does not get serviced, which leads to further congestion, and eventually the waiters will abandon the main queue.</p>
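<p>One possible way to keep the priority queue from starving the main queue is to cap how many priority waiters are allowed at once. The sketch below assumes a hypothetical <em>TwoClassQueue</em> class and a cap called <em>max_priority_waiters</em>; sending the overflow to the main queue is just one policy you could choose.</p>

```python
from collections import deque

class TwoClassQueue:
    """A main queue plus a priority queue with a capped number of waiters."""

    def __init__(self, max_priority_waiters):
        self.max_priority_waiters = max_priority_waiters
        self.priority = deque()
        self.main = deque()

    def enqueue(self, shopper, is_priority=False):
        # Priority shoppers beyond the cap wait in the main queue instead.
        if is_priority and len(self.priority) < self.max_priority_waiters:
            self.priority.append(shopper)
        else:
            self.main.append(shopper)

    def next_shopper(self):
        # The next available worker looks at the priority queue first.
        if self.priority:
            return self.priority.popleft()
        if self.main:
            return self.main.popleft()
        return None
```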
<p><strong>Lesson Three: Use Admission Control</strong></p>
<p>Now, you are familiar with the example of using the store, and simple queues, and the concept of consolidating multiple queues into a single queue, or the combination of a single queue and a priority queue. These are generally easy concepts to implement in software, and for most cases where congestion is rare, they work fine. However there are some conditions where they don&#8217;t work at all. Remember that in software systems, queued jobs may or may not be able to abandon the queue. Here are examples of when a simple queue with no admission control will fail:</p>
<ol>
<li>When you have the ability to create an unlimited number of workers, but they all rely on some shared pool or pools of limited resources. This is called a <em>concurrency bottleneck</em>. For example, when you have a finite amount of CPU resources, but the ability to create an effectively unlimited number of threads. There is an upper limit to the amount of additional <em>throughput</em> gained by adding more threads when the CPU is congested.</li>
<li>When there is an unlimited number of <em>jobs</em> (waiters) to service, that arrive suddenly in bursts or steadily at a rate that&#8217;s faster than the combined capacity of all your workers. This is called a <em>work overflow</em>.</li>
</ol>
<p>The concept of <em>admission control</em> allows you to implement a policy controlling how much work you will accept into your queue(s) and at what rate. One simple policy is a maximum queue length. This length <em>(limit)</em> can be chosen by considering your desired maximum wait time <em>(max)</em> and your average service time <em>(t)</em> using the formula:</p>
<p><em>limit = ( max / t )</em></p>
<p>When <em>limit</em> jobs are queued, you must not queue new work, but instead refuse the work. Some protocols have a response code that can be used for this. For example, in HTTP applications, you can return a 503 Service Unavailable response with an optional Retry-After header that indicates to the client that they may retry the request at the specified later time when the congestion may be gone.</p>
<p>A more advanced policy may consider the source of the work and the type of request, and apply a different policy to each. For example, you may limit the rate at which you accept requests from a given IP Address or geography. You may prioritize write requests over read requests. If an individual resource is congested, you may reject requests for that resource, but accept other requests.</p>
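<p>To make the maximum queue length policy concrete, here is a small Python sketch. The class name <em>AdmissionQueue</em>, the choice of 202 for accepted work, and using the maximum wait as the Retry-After value are my assumptions for illustration; the refusal uses the HTTP 503 Service Unavailable status, which is the standard code for a temporarily overloaded server.</p>

```python
import math
from collections import deque

def queue_limit(max_wait_seconds, avg_service_seconds):
    """limit = max / t, rounded down to a whole number of jobs."""
    return math.floor(max_wait_seconds / avg_service_seconds)

class AdmissionQueue:
    """A queue that refuses new work once the length limit is reached."""

    def __init__(self, max_wait_seconds, avg_service_seconds):
        self.limit = queue_limit(max_wait_seconds, avg_service_seconds)
        # Assumption: suggest retrying after one full maximum wait period.
        self.retry_after = math.ceil(max_wait_seconds)
        self.jobs = deque()

    def offer(self, job):
        """Return an (HTTP status, headers) pair for the submitting client."""
        if len(self.jobs) >= self.limit:
            return 503, {"Retry-After": str(self.retry_after)}
        self.jobs.append(job)
        return 202, {}
```

<p>With a 2 second maximum wait and a 0.25 second average service time, the limit works out to 8 queued jobs. The ninth job offered gets the 503 response instead of a place in line.</p>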
<p>There are a number of advantages to using an admission control policy:</p>
<ol>
<li>You don&#8217;t end up overloading your system to the point where service quality degrades for everyone due to extreme congestion conditions.</li>
<li>You can temporarily smooth out your work pattern so that the work can be completed at some later time when a temporary surge in demand has subsided.</li>
<li>It is very easy to monitor for congestion. A simple external check of your service will return an error during congestion conditions.</li>
<li>If the system has elastic capability to add more worker resources, this can be triggered when queue lengths are consistently non-zero over a span of time, and reduced when queue lengths are consistently zero over a span of time.</li>
</ol>
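<p>The elastic trigger in the last point could be sketched like this. The function name and the three return values are hypothetical, and the caller decides how long the span of samples should be.</p>

```python
def scaling_decision(queue_length_samples):
    """Decide what to do from queue lengths sampled over a span of time."""
    if not queue_length_samples:
        return "hold"  # no data, so change nothing
    if all(n > 0 for n in queue_length_samples):
        return "add_worker"  # consistently non-zero: congestion
    if all(n == 0 for n in queue_length_samples):
        return "remove_worker"  # consistently zero: spare capacity
    return "hold"  # mixed readings: normal operation
```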
<p><strong>Lesson Four: Beware of the Thundering Herd and the Convoy Effect</strong></p>
<p>If you have a workload that experiences sudden bursts where new work is rapidly queued, and you have not properly limited the number of workers to a level of concurrency that you can manage with your available system resources, then you may experience a condition known as <a href="http://en.wikipedia.org/wiki/Resource_starvation" target="_blank"><em>resource starvation</em></a>. This can lead to a complete system failure, typically in a cascading series of related failures. When your system is in a state of resource starvation, it is so busy handling multitudes of concurrent work requests that it&#8217;s unable to make meaningful progress servicing the work. Nothing completes in a reasonable time because you are too busy switching between all the requests making tiny increments of progress on each.</p>
<p>A <em>Convoy Effect</em> is where you have numerous threads or processes that are <em>blocked</em> (suspended making no progress) on some resource (like an I/O operation, or a lock) and then cause additional congestion when the blocking condition clears. At this point all the runnable threads are synchronized together in a convoy, requesting other scarce resources (like Disk I/O bandwidth to write a lock file), and the existence of the convoy makes the delays longer than they would be if there were simply a long queue of work. For example:</p>
<ol>
<li>A long queue of requests is present.</li>
<li>All workers dequeue the requests rapidly, but all of them need to read and write to the same file.</li>
<li>Concurrent reads are allowed with a shared read lock, and concurrent writes are protected by an exclusive write lock.</li>
<li>The underlying lock implementation uses a <a href="http://en.wikipedia.org/wiki/Spinlock">spinlock</a>.</li>
<li>Multitudes of concurrent readers cause write access to the file to be substantially slowed down because the disk drive is chattering doing all the reads.</li>
<li>All workers become synchronized on the exclusive write lock, slowing down the rate of progress, and essentially causing the system to be in a state of <a href="http://www.webopedia.com/TERM/L/livelock.html" target="_blank">live lock</a>.</li>
</ol>
<p>A <a href="http://en.wikipedia.org/wiki/Thundering_herd_problem"><em>Thundering Herd</em></a> is a situation where a multitude of serialized processes are blocked waiting on an event. When the event happens, all processes become runnable, but only one of them can be serviced at a time, so all the others must become blocked again waiting on a new event. This condition causes throughput of work to be suboptimal because of the wasted effort of the blocked processes getting woken up and going back to sleep. This is generally solved by only waking one of the blocked processes at a time.</p>
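<p>Here is a small sketch of the wake-one approach using threads and a condition variable from the Python standard library. The function name and the wakeup counter are only there for the demonstration. The key line is the call to notify(1), which wakes at most one waiter per event, where notify_all() would wake the whole herd.</p>

```python
import threading

def serve_events(num_waiters):
    """Deliver one event to each waiter, waking at most one waiter per event."""
    cond = threading.Condition()
    events = []      # pending events, protected by cond
    served = []      # (waiter id, event) pairs
    wakeups = [0]    # how many times any waiter returned from wait()

    def waiter(waiter_id):
        with cond:
            while not events:
                cond.wait()
                wakeups[0] += 1  # counted while holding the lock again
            served.append((waiter_id, events.pop(0)))

    threads = [threading.Thread(target=waiter, args=(i,)) for i in range(num_waiters)]
    for t in threads:
        t.start()
    for event_id in range(num_waiters):
        with cond:
            events.append(event_id)
            cond.notify(1)  # wake one waiter, not all of them
    for t in threads:
        t.join()
    return served, wakeups[0]

served, wakeups = serve_events(5)
```

<p>Since each event wakes no more than one waiter, the total number of wakeups can never exceed the number of events.</p>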
<p>So, a Thundering Herd will typically lead to a Convoy Effect, leaving your system in a critical state of congestion. Solve this by using sensible admission control, and more sophisticated queuing strategies.</p>
<p><strong>Lesson Five: Multiplex de-queue of identical requests</strong></p>
<p>If you have a queue of requests that are likely to have multiple identical queued entries, you can optimize your service of that queue by using a <a href="http://en.wikipedia.org/wiki/Multiplexing">Multiplexing</a> technique. Instead of each worker simply taking one request off the queue and servicing it, consider this approach instead:</p>
<ol>
<li>Read the first request on the queue.</li>
<li>Scan remaining entries in the queue for other identical requests, de-queuing them together as a batch. You might de-queue up to some maximum number per batch, or de-queue all matches in a single batch. This depends on how large your responses will be, and how many items you have in queue.</li>
<li>Process the request, sending the response to all clients in the batch at once.</li>
</ol>
<p>Note: this is only appropriate in use cases where out-of-order responses are gracefully handled by the client.</p>
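<p>The three steps above might look something like the following sketch. It assumes the queue is a deque of (client, request) pairs and that the caller already holds whatever lock protects the queue; both of those are my assumptions, as is the function name.</p>

```python
from collections import deque

def dequeue_batch(pending, max_batch=None):
    """Remove the first request and any identical ones, up to max_batch clients."""
    if not pending:
        return None, []
    first_client, request = pending.popleft()
    clients = [first_client]
    remaining = deque()
    while pending:
        client, other = pending.popleft()
        batch_full = max_batch is not None and len(clients) >= max_batch
        if other == request and not batch_full:
            clients.append(client)  # identical request joins the batch
        else:
            remaining.append((client, other))  # keeps its original order
    pending.extend(remaining)
    return request, clients
```

<p>The worker then processes the returned request once and sends the same response to every client in the list.</p>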
<p>You will need to be careful that your queue implementation will allow you to safely de-queue any entry in the queue without leading to a <a href="http://en.wikipedia.org/wiki/Race_condition" target="_blank">race condition</a> where multiple workers are trying to de-queue the same entries simultaneously. You may decide that batching requests when they are queued up is more efficient than searching for matching requests when it&#8217;s time to de-queue. It would work in a similar way:</p>
<ol>
<li>Treat all queued entries as batches, with at least one request in each, up to some maximum.</li>
<li>Hash the request as you receive it, and check to see if you have an existing entry or entries in the queue for the given request.</li>
<li>If you have a matching entry in the queue, then temporarily lock that entry, and add the new request&#8217;s client details (or connection handle) to the existing entry.</li>
<li>If the batch is full, iterate to the next, creating a new batch when you reach the end of the list.</li>
<li>Workers simply read the next batch from the queue, and send the response to all the associated clients.</li>
</ol>
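<p>Batching at enqueue time could be sketched as follows. To keep it short, this version uses a single lock for the whole queue rather than locking individual entries, and it tracks only the newest batch for each request hash. The class name and those simplifications are mine.</p>

```python
import hashlib
import threading
from collections import OrderedDict

class BatchingQueue:
    """Queue whose entries are batches of clients waiting on identical requests."""

    def __init__(self, max_batch):
        self.max_batch = max_batch
        self.lock = threading.Lock()
        self.sequence = 0
        self.batches = OrderedDict()  # (request hash, sequence) -> batch, in arrival order
        self.open_batch = {}          # request hash -> key of the newest batch for it

    def enqueue(self, request, client):
        digest = hashlib.sha256(request.encode("utf-8")).hexdigest()
        with self.lock:
            key = self.open_batch.get(digest)
            if key is not None and len(self.batches[key]["clients"]) < self.max_batch:
                self.batches[key]["clients"].append(client)  # join the existing batch
                return
            # No batch yet, or the newest one is full: start a new batch at the end.
            self.sequence += 1
            key = (digest, self.sequence)
            self.batches[key] = {"request": request, "clients": [client]}
            self.open_batch[digest] = key

    def dequeue(self):
        with self.lock:
            if not self.batches:
                return None
            key, batch = self.batches.popitem(last=False)  # oldest batch first
            if self.open_batch.get(key[0]) == key:
                del self.open_batch[key[0]]
            return batch["request"], batch["clients"]
```

<p>Workers never search the queue here. They just take the oldest batch and answer everyone in it.</p>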
<p>In general, you want your workers de-queueing work faster than you can queue it up. If you do your batching upon receipt of the request, you avoid the risk that the cost of batching at de-queue time leads to a Convoy Effect.</p>
<p>Feedback welcome.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2011/01/convoy-effect/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Safe Investing with Banks?</title>
		<link>http://adrianotto.com/2011/01/safe-banking/</link>
		<comments>http://adrianotto.com/2011/01/safe-banking/#comments</comments>
		<pubDate>Mon, 10 Jan 2011 20:01:31 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=426</guid>
		<description><![CDATA[This morning, I found a web site highlighted by Refdesk, my favorite reference web site. The link was: FDIC: Failed Bank List The FDIC is often appointed as receiver for failed banks. This page contains useful information for the customers and vendors of these banks. This includes information on the acquiring bank (if applicable), how [...]]]></description>
			<content:encoded><![CDATA[<p>This morning, I found a web site highlighted by <a href="http://www.refdesk.com" target="_blank">Refdesk</a>, my favorite reference web site. The link was:</p>
<blockquote><p><a href="http://www.fdic.gov/bank/individual/failed/banklist.html" target="_blank">FDIC: Failed Bank List</a></p>
<p>The FDIC is often appointed as receiver for failed banks. This page contains useful information for the customers and vendors of these banks. This includes information on the acquiring bank (if applicable), how your accounts and loans are affected, and how vendors can file claims against the receivership. This list includes banks which have failed since October 1, 2000.</p></blockquote>
<p>The data had a link to download the data in CSV format, so I decided to quickly plot it and take a look.</p>
<div id="attachment_436" class="wp-caption alignnone" style="width: 591px"><a href="http://cdn.adrianotto.com/wp-content/uploads/2011/01/bank_failures_2000_to_2010.png"><img class="size-full wp-image-436 " title="Bank Failures between 2000 and 2011" src="http://cdn.adrianotto.com/wp-content/uploads/2011/01/bank_failures_2000_to_2010.png" alt="Bank Failures between 2000 and 2011" width="581" height="213" /></a><p class="wp-caption-text">Bank Failures between 2000 and 2011</p></div>
<p>The interesting part of the data happens between 2006 and 2011. Because we are only part way through January 2011 at the time of this post, I excluded January&#8217;s data from the chart. Already two banks have failed in 2011.</p>
<div id="attachment_425" class="wp-caption alignnone" style="width: 591px"><a href="http://cdn.adrianotto.com/wp-content/uploads/2011/01/bank_failures_2006_to_2010.png"><img class="size-full wp-image-425 " title="Bank Failures between 2006 and 2011" src="http://cdn.adrianotto.com/wp-content/uploads/2011/01/bank_failures_2006_to_2010.png" alt="Bank Failures between 2006 and 2011" width="581" height="363" /></a><p class="wp-caption-text">Bank Failures between 2006 and 2011</p></div>
<p>As you can clearly see, we had a lot of volatility between mid-2009 and today. The trend is clearly visible back through 2008 as well. Almost all of these bank closures resulted in an acquisition of the accounts by another bank. Of the 351 failures shown above, only 24 resulted in &#8220;No Acquirer&#8221;. Those were:</p>
<pre>+--------------------------------+------------------+-------+------+
| bank_name                      | city             | state | year |
+--------------------------------+------------------+-------+------+
| NextBank, NA                   | Phoenix          | AZ    | 2002 |
| New Century Bank               | Shelby Township  | MI    | 2002 |
| AmTrade International Bank     | Atlanta          | GA    | 2002 |
| Bank of Alamo                  | Alamo            | TN    | 2002 |
| Dollar Savings Bank            | Newark           | NJ    | 2004 |
| MagnetBank                     | Salt Lake City   | UT    | 2009 |
| Omni National Bank             | Atlanta          | GA    | 2009 |
| FirstCity Bank                 | Stockbridge      | GA    | 2009 |
| First Bank of Beverly Hills    | Calabasas        | CA    | 2009 |
| New Frontier Bank              | Greeley          | CO    | 2009 |
| Silverton Bank, NA             | Atlanta          | GA    | 2009 |
| Community Bank of West Georgia | Villa Rica       | GA    | 2009 |
| Community Bank of Nevada       | Las Vegas        | NV    | 2009 |
| Platinum Community Bank        | Rolling Meadows  | IL    | 2009 |
| Citizens State Bank            | New Baltimore    | MI    | 2009 |
| RockBridge Commercial Bank     | Atlanta          | GA    | 2009 |
| Barnes Banking Company         | Kaysville        | UT    | 2010 |
| Advanta Bank Corp.             | Draper           | UT    | 2010 |
| Centennial Bank                | Ogden            | UT    | 2010 |
| Waterfield Bank                | Germantown       | MD    | 2010 |
| Lakeside Community Bank        | Sterling Heights | MI    | 2010 |
| Arcola Homestead Savings Bank  | Arcola           | IL    | 2010 |
| Ideal Federal Savings Bank     | Baltimore        | MD    | 2010 |
| First Arizona Savings, A FSB   | Scottsdale       | AZ    | 2010 |
+--------------------------------+------------------+-------+------+
</pre>
<p>According to a <a href="http://en.wikipedia.org/wiki/Banking_in_the_United_States" target="_blank">Wikipedia page about US Banking</a>, this is from a total population of 8430 FDIC insured banks. The total failure rate is 4.13%, and the total that had no acquiring institution was 0.3%. So, it&#8217;s not quite as shocking as it seems at first glance.</p>
<p>Now, I wonder to myself in today&#8217;s economic climate who&#8217;s banking with the likes of First Arizona Savings in Scottsdale, AZ. I suppose it does not matter too much, since your deposits are insured by the FDIC up to $250,000 and the FDIC will repo the bank if it&#8217;s at serious risk. So if you happen to have a huge bank balance (&gt;$250K), consider finding some sensible investment strategy for that money. The risk of leaving it in a little community bank is pretty frightening.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2011/01/safe-banking/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Drizzle Lines of Code vs. MySQL</title>
		<link>http://adrianotto.com/2010/10/drizzle-lines-of-code-vs-mysql/</link>
		<comments>http://adrianotto.com/2010/10/drizzle-lines-of-code-vs-mysql/#comments</comments>
		<pubDate>Wed, 13 Oct 2010 23:07:38 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=408</guid>
		<description><![CDATA[In my recent post about Drizzle I suggested that because Drizzle has fewer lines of code (less than half compared to MySQL) that is has a lower intrinsic risk of software defects. Of course it has bugs of its own, but because Drizzle is focused squarely on OLTP use cases, it can be substantially smaller [...]]]></description>
			<content:encoded><![CDATA[<p>In my <a href="../2010/09/drizzle-is-now-beta/" target="_blank">recent post about Drizzle</a> I suggested that because <a href="http://drizzle.org" target="_blank">Drizzle</a> has fewer lines of code (less than half compared to <a href="http://www.mysql.com">MySQL</a>) it has a lower intrinsic risk of software defects. Of course it has bugs of its own, but because Drizzle is focused squarely on OLTP use cases, it can be substantially smaller in terms of lines of code. In fact, when I used <a href="http://www.ohloh.net/p/compare" target="_blank">Ohloh</a> to compare the source code line count of MySQL and Drizzle the picture was very clear:</p>
<p><img class="size-full wp-image-409 alignnone" title="mysql-drizzle-loc" src="http://cdn.adrianotto.com/wp-content/uploads/2010/10/mysql-drizzle-loc.png" alt="" width="593" height="298" /></p>
<p>Part of the reason that the code count is smaller is because the non-relevant parts have simply been removed, but there was also a lot of effort put into modernizing the code base, and using C++. Using Boost and other foundational software projects as building blocks, the need for internal implementations for basic things is greatly reduced. For example, the MySQL code base has a REGEX implementation in it. This was removed in Drizzle, and replaced by PCRE, which is expected to change again soon to use Boost. Using libraries that are shared by numerous projects also makes a lower bug count more likely. Simply put, the more smart people who look at and use the software, and improve its weaknesses, the more likely it is to be high quality.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/10/drizzle-lines-of-code-vs-mysql/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Drizzle is now BETA</title>
		<link>http://adrianotto.com/2010/09/drizzle-is-now-beta/</link>
		<comments>http://adrianotto.com/2010/09/drizzle-is-now-beta/#comments</comments>
		<pubDate>Wed, 29 Sep 2010 19:21:00 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=376</guid>
		<description><![CDATA[Today Drizzle enters BETA. Drizzle is an evolution of MySQL that&#8217;s been simplified, streamlined, and modernized. This long awaited database started from an idea in 2005 to fork MySQL, keep the good parts, and rip out or replace all the stuff that&#8217;s not good for web applications. This idea has been endorsed by large corporate [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://drizzle.org"><img class="alignright size-full wp-image-222" title="Drizzle Logo" src="http://cdn.adrianotto.com/wp-content/uploads/2009/11/drizzle64.png" alt="" width="64" height="64" /></a>Today <a href="http://drizzle.org" target="_blank">Drizzle</a> enters BETA. Drizzle is an evolution of MySQL that&#8217;s been simplified, streamlined, and modernized. This long awaited database started from an idea in 2005 to fork MySQL, keep the good parts, and rip out or replace all the stuff that&#8217;s not good for web applications. This idea has been endorsed by large corporate sponsors, including <a href="http://www.sun.com" target="_blank">Sun Microsystems</a> in the early days, and now <a href="http://www.rackspace.com/">Rackspace</a>. Most of the code is contributed by the <a href="https://launchpad.net/drizzle">developer community</a>, which is made up of a very talented group of open source developers with core committers from four different companies. More about the Drizzle project:</p>
<h3>Charter</h3>
<ul>
<li>A database optimized for Cloud infrastructure and Web applications</li>
<li>Design for massive concurrency on modern multi-cpu architecture</li>
<li>Optimize memory for increased performance and parallelism</li>
<li>Open source, open community, open design</li>
</ul>
<h3>Scope</h3>
<ul>
<li>Re-designed modular architecture providing plugins with defined APIs</li>
<li>Simple design for ease of use and administration</li>
<li>Reliable, ACID transactional</li>
</ul>
<p>There are many exciting changes, such as optimizing everything for 64-bit CPUs and Multi-Core. You can hardly even buy 32-bit and Single Core servers nowadays if you want them. It makes no sense to have software that&#8217;s optimized for these antiquated hardware designs. No effort is spent optimizing software to work with rotational hard drives because SSD drives are the way of the future. All the language collations have simply been replaced with UTF-8 only, because the web uses UTF-8. Plus, this is tested with 41 different language translations. Drizzle has a new scheduler. The legacy MySQL scheduler was designed to work for a thread-per-session setup. In Drizzle, sessions are handled independently from the threads. The new scheduler allows this to work.</p>
<p>Drizzle uses InnoDB as its default storage engine, which is great for OLTP. It also supports the <a href="http://www.primebase.org/" target="_blank">PBXT</a> storage engine. There are available plugins for the InnoDB Embedded Engine and <a href="http://www.haildb.com/" target="_blank">HailDB</a> which will soon be the new default. DDL Operations (like ALTER TABLE) can actually roll back in the event that something goes wrong in the process, rather than leaving you with incomplete or corrupt data.</p>
<p>The code base in Drizzle has been fully modernized, and brought up to today&#8217;s standards of C++ with extensive use of the <a href="http://en.wikipedia.org/wiki/Standard_Template_Library">C++ STL</a> to replace MySQL&#8217;s usage of obscure custom data type implementations that offered no real benefit compared to what the STL has today. Another example of improvements in this area is the replacement of the legacy REGEX implementation with a more standard library. All of these changes reduce the amount of Drizzle source code dramatically compared to MySQL. Less code and simpler code means fewer bugs, plain and simple. Drizzle is well on its way to being an ideal fit for web applications that need a reliable, high-performance transactional database.</p>
<h3>Features in Drizzle7 Beta</h3>
<ul>
<li>New micro kernel</li>
<li>Migration Tool</li>
<li>Instance Catalog Support</li>
<li>Universal Replication</li>
<li>User query analysis</li>
<li>Multi-core Support</li>
</ul>
<h3>What &#8220;Beta&#8221; means</h3>
<ul>
<li>Your data is safe. Transactional engine by default and stable for over 2 years.</li>
<li>Upgrade the system in-place without exporting/importing data.</li>
<li>Replication is still being tested.</li>
</ul>
<p>In Microsoft terms, it means that this project would have launched about a year ago. In Google terms, it probably would have launched six months ago. Simply put, if you trust your data to a MySQL system today running InnoDB, you should feel comfortable trying Drizzle. There have been some changes to the InnoDB setup, such as the elimination of the FRM files from disk, which eliminates possible inconsistency between the state on disk and the state in InnoDB. I am in the process of moving a few of my production applications to use the Drizzle Beta. If you&#8217;re an accomplished system administrator and DBA, you should seriously consider putting at least one of your production applications on Drizzle now, and see how it works for you.</p>
<h3>What&#8217;s Next?</h3>
<ul>
<li>Beta <a href="https://launchpad.net/drizzle/+announcement/6840" target="_blank">announced today 2010-09-29</a>.</li>
<li>GA February 2011</li>
<li>GA May 2011 for Multi-Tenancy features that allow an arbitrary number of logical databases (Schemas, Tables, etc.) to exist concurrently with full data isolation between them. This allows for individual security and resource controls (Threads, Memory, IO), and individual database backups, rather than system level backups. This feature will be called &#8220;Catalogs&#8221;.</li>
</ul>
<h3>Download Drizzle</h3>
<p>Time to get started with the beta. Download <a href="https://launchpad.net/drizzle/elliott/2010-09-27">the beta</a> today!</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/09/drizzle-is-now-beta/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>OpenStack Object Storage is Great For&#8230;</title>
		<link>http://adrianotto.com/2010/09/openstack-os-is-great-for/</link>
		<comments>http://adrianotto.com/2010/09/openstack-os-is-great-for/#comments</comments>
		<pubDate>Wed, 01 Sep 2010 18:42:16 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[OpenStack]]></category>
		<category><![CDATA[swift]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=346</guid>
		<description><![CDATA[Soon, the OpenStack Object Storage software will be released. It&#8217;s available now as a Developer Preview if you would like to contribute, or perhaps if you&#8217;re just curious. The first release is expected later this month. This is a fantastic piece of software that really hits the mark for scalability, high availability, and performance. About [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.openstack.org/"><img class="alignright size-full wp-image-356" title="OpenStack" src="http://cdn.adrianotto.com/wp-content/uploads/2010/09/openstacklogo.jpg" alt="" width="139" height="143" /></a>Soon, the <a href="http://www.openstack.org/projects/storage/" target="_blank">OpenStack Object Storage</a> software will be released. It&#8217;s available now as a <a href="https://launchpad.net/swift" target="_blank">Developer Preview</a> if you would like to contribute, or perhaps if you&#8217;re just curious. The first release is expected later this month. <strong>This is a fantastic piece of software that really hits the mark for scalability, high availability, and performance.</strong></p>
<h3>About OpenStack Object Storage</h3>
<p><a href="http://www.openstack.org/projects/storage/" target="_blank">OpenStack Object Storage</a> was originally developed by <a href="http://www.rackspace.com/" target="_blank">Rackspace</a>, and was released as <a href="http://www.apache.org/licenses/LICENSE-2.0.html" target="_blank">Open Source Software</a> earlier this year as part of the <a href="http://www.openstack.org/" target="_blank">OpenStack Project</a>. It was written for hosting the <a href="http://www.rackspacecloud.com/cloud_hosting_products/files/">Rackspace Cloud Files</a> service. Its original project code name was <em>swift</em>, so you may see references to that in various documentation.</p>
<blockquote><p>OpenStack Object Storage aggregates commodity servers to work together  in clusters for reliable, redundant, and large-scale storage of static  objects. Objects are written to multiple hardware devices in the  datacenter, with the OpenStack software responsible for ensuring data  replication and integrity across the cluster. Storage clusters can scale  horizontally by adding new nodes, which are automatically configured.  Should a node fail, OpenStack works to replicate its content from other  active nodes. Because OpenStack uses software logic to ensure data  replication and distribution across different devices, inexpensive  commodity hard drives and servers can be used in lieu of more expensive  equipment. [<a href="http://www.openstack.org/projects/storage/" target="_blank">1</a>]</p></blockquote>
<p>The system uses a flat namespace, and has the concept of an <em>account</em> (how you access the system), a <em>container</em> (like a directory) and an <em>object</em> (like a file). You can have an arbitrary number of accounts, each with an arbitrary number of containers. Each container can hold an arbitrary number of objects.</p>
<p>OpenStack Object Storage is very good for storing unstructured data using an object name as a lookup key (like a filename). You access your data from a web client using the web service <a href="http://www.rackspacecloud.com/cloud_hosting_products/files/api" target="_blank">REST API</a>, not like a filesystem. Download an object (like a file) using an HTTP GET request, fetch object metadata with an HTTP HEAD request, delete an object with an HTTP DELETE request, etc. There are multiple <a href="http://www.rackspacecloud.com/cloud_hosting_products/files/api" target="_blank">language bindings</a> so you can access your files in OpenStack Object Storage from your favorite language natively (Java, Python, Perl, PHP, .NET, etc.).</p>
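<p>To give a feel for the REST style of access, this Python sketch builds (but does not send) GET, HEAD, and DELETE requests for an object. The storage URL, token, container, and object name are placeholders I invented. I am assuming the usual account/container/object URL layout and the X-Auth-Token header, so check the API documentation for your deployment before relying on either.</p>

```python
import urllib.request
from urllib.parse import quote

def object_request(method, storage_url, token, container, name):
    """Build an HTTP request for one object; nothing is sent over the network."""
    url = "%s/%s/%s" % (storage_url.rstrip("/"), quote(container), quote(name))
    return urllib.request.Request(url, method=method, headers={"X-Auth-Token": token})

# Placeholder values, for illustration only.
STORAGE_URL = "https://storage.example.com/v1/AUTH_demo"
TOKEN = "example-token"

download = object_request("GET", STORAGE_URL, TOKEN, "photos", "2010/cat 1.jpg")
metadata = object_request("HEAD", STORAGE_URL, TOKEN, "photos", "2010/cat 1.jpg")
remove = object_request("DELETE", STORAGE_URL, TOKEN, "photos", "2010/cat 1.jpg")
```

<p>Passing any of these to urllib.request.urlopen() would perform the actual call against a real endpoint.</p>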
<p>The system has no central point of failure, so it&#8217;s extremely fault tolerant, and the data and related metadata are distributed throughout the system, so there are no central scalability constraints. You can store arbitrary amounts of data in the system in both large and small sizes. It performs very well, even under very high levels of concurrency. It keeps multiple replicas of each object, so it&#8217;s reliable, and the storage is very durable, without any expensive hardware. You don&#8217;t need any RAID on any of the servers unless you want it for additional performance.</p>
<h3>Use OpenStack Object Storage For&#8230;</h3>
<p>Here are some good use cases for OpenStack Object Storage:</p>
<ul>
<li>Storing media libraries (photos, music, videos, etc.)</li>
<li>Archiving video surveillance files</li>
<li>Archiving phone call audio recordings</li>
<li>Archiving compressed log  files</li>
<li>Archiving backups (&lt;5GB each object)</li>
<li>Storing and loading of OS Images, etc.</li>
<li>Storing file populations that grow continuously on a  practically infinite basis.</li>
<li>Storing small files (&lt;50 KB). OpenStack Object Storage is great at this.</li>
<li>Storing billions of files.</li>
<li>Storing Petabytes (millions of Gigabytes) of data.</li>
</ul>
<h3>Recognize the Limitations</h3>
<p><strong>Objects must be &lt;5GB</strong></p>
<p>This is an arbitrary size limit, but it cannot be set to an unlimited value because of the system design. If you want to store a backup of something larger than 5GB, you&#8217;ll need to have a way of breaking it up into chunks, and storing some manifest of the parts so you can later join them back together again when you want to download the data and use it again.</p>
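<p>A chunk-and-manifest scheme could be as simple as the sketch below. The part naming pattern, the manifest fields, and the use of SHA-256 checksums are my own choices for the example, not anything the storage system defines. The <em>fetch</em> argument stands in for whatever you use to download a stored part.</p>

```python
import hashlib
import io

def iter_chunks(stream, chunk_size, base_name):
    """Yield (manifest_entry, data) for each chunk of a binary stream."""
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            return
        entry = {
            "name": "%s.part%05d" % (base_name, index),
            "size": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
        yield entry, data
        index += 1

def join_chunks(manifest, fetch):
    """Rebuild the original bytes; fetch(name) returns the bytes of one part."""
    out = io.BytesIO()
    for entry in manifest:
        data = fetch(entry["name"])
        if hashlib.sha256(data).hexdigest() != entry["sha256"]:
            raise ValueError("chunk %s does not match its checksum" % entry["name"])
        out.write(data)
    return out.getvalue()
```

<p>You would upload each part as its own object, then upload the manifest itself (as JSON, for example) so you can find the parts again later.</p>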
<p><strong>Not a Filesystem</strong></p>
<p>It uses a REST API, or a language binding that consumes the REST API. It does not use the typical POSIX filesystem semantics like open(), read(), write(), seek(), and close().</p>
<p><strong>No User Quotas</strong></p>
<p>There are no maximums that can be configured on a per-user basis to limit how much storage is used.</p>
<p><strong>No Directory Hierarchies</strong></p>
<p>You can create an arbitrary number of containers, but there is no nested container capability. You can simulate a directory structure using creative object names, but this is limited to a maximum string length. If you only need a shallow hierarchy, or don&#8217;t have long directory names, this might be fine. Just remember that I warned you this is generally a bad idea.</p>
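<p>As a rough illustration of simulating directories with object names, the Python sketch below groups a flat list of names by prefix and delimiter on the client side. The function <code>list_directory</code> and the sample names are hypothetical, and the code assumes you already have the flat container listing in memory.</p>

```python
def list_directory(object_names, prefix="", delimiter="/"):
    """Return the immediate 'files' and 'subdirectories' under a prefix."""
    files, subdirs = [], set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _tail = rest.partition(delimiter)
        if sep:
            # Something follows the delimiter, so this is a nested entry.
            subdirs.add(prefix + head + delimiter)
        elif rest:
            files.append(name)
    return sorted(files), sorted(subdirs)


names = [
    "2011/10/photo1.jpg",
    "2011/10/photo2.jpg",
    "2011/11/photo3.jpg",
    "2011/readme.txt",
    "index.html",
]
print(list_directory(names, "2011/"))
print(list_directory(names))
```

<p>Every level of nesting makes the object name longer, which is how a deep hierarchy runs into the maximum name length.</p>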
<p><strong>No writing to a byte offset in a file</strong></p>
<p>The only way to update a file is to essentially overwrite it. The system creates a new version of an object each time you upload one with the same name.</p>
<p><strong>No ACLs</strong></p>
<p>Per-container ACLs will probably be added in a later release. Per-object ACLs will probably not be supported, but they might be.</p>
<p><strong>No Append Support</strong></p>
<p>It&#8217;s possible that this may be added at a later time using a versioning trick.</p>
<p><strong>No File Locking</strong></p>
<p>Most filesystems integrate with the kernel to offer advisory locking. This is not possible with OpenStack Object Storage.</p>
<p><strong>Eventual Consistency</strong></p>
<p>Don&#8217;t expect version consistency between multiple nodes when data is being updated.</p>
<p>If you upload a new version of an object, and immediately GET that object from another client, you may get a previous version of the file. There is no way to know which version of a given object the system is responding with, unless you set version metadata on each object yourself. If there is any problem with the network, you may get outdated versions of objects, or be able to see objects that have been deleted elsewhere because the local node does not yet know about the deletion.</p>
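<p>If you do set version metadata yourself, a reader can use it to reject stale responses. The Python sketch below assumes the writer stores an increasing integer in a custom metadata header (the name <code>X-Object-Meta-Version</code> is my choice for illustration) and that the reader knows the minimum version it expects. The <code>read_at_least</code> function is hypothetical, and an iterator of canned responses stands in for real GET requests.</p>

```python
def read_at_least(fetch, min_version, attempts=5):
    """Call fetch() until it returns an object whose version is new enough.

    fetch() must return a (metadata_dict, body_bytes) pair.
    """
    for _ in range(attempts):
        metadata, body = fetch()
        if int(metadata.get("X-Object-Meta-Version", 0)) >= min_version:
            return body
    raise RuntimeError("only stale versions were returned")


# Canned responses standing in for GETs answered by lagging replicas.
responses = iter([
    ({"X-Object-Meta-Version": "1"}, b"old contents"),
    ({"X-Object-Meta-Version": "1"}, b"old contents"),
    ({"X-Object-Meta-Version": "2"}, b"new contents"),
])

body = read_at_least(lambda: next(responses), min_version=2)
print(body)
```

<p>A real client would also pause between attempts. The sketch leaves that out to stay short.</p>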
<p><strong>No Support for Data Encryption</strong></p>
<p>You must encrypt the data yourself. The current version does not have SSL support either. Use an SSL proxy to work around this by terminating the SSL sessions on the same network where the OpenStack Object Storage system runs.</p>
<p><strong>Not Compatible With Web Browsers</strong></p>
<p>You must supply a storage token header to authorize each request. Regular web browsers can&#8217;t do this. This can be solved using a proxy between the client and the system to handle token authentication. This is not a problem if you are using one of the language bindings. They will take care of this when you integrate your web app with the system.</p>
<p><strong>Not a Database</strong></p>
<p>It supports no querying or processing of data on the servers. All you can do is list the objects within a given container. There is no way to search based on object metadata. You need to keep your own external search indexes.</p>
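<p>An external search index can be as simple as a mapping from metadata values to object names that you update whenever you upload or delete an object. The Python sketch below keeps that mapping in memory. The <code>MetadataIndex</code> class and the sample metadata are hypothetical, and a production index would live in a database or search engine rather than a dictionary.</p>

```python
from collections import defaultdict


class MetadataIndex:
    """Client-side index from (key, value) metadata pairs to object names."""

    def __init__(self):
        self._index = defaultdict(set)   # (key, value) -> set of object names
        self._meta = {}                  # object name -> metadata dict

    def put(self, name, metadata):
        """Record (or replace) the metadata for an uploaded object."""
        self.remove(name)
        self._meta[name] = dict(metadata)
        for item in metadata.items():
            self._index[item].add(name)

    def remove(self, name):
        """Forget an object, e.g. after deleting it from the container."""
        for item in self._meta.pop(name, {}).items():
            self._index[item].discard(name)

    def find(self, **criteria):
        """Return sorted names of objects matching every key=value pair."""
        sets = [self._index.get(item, set()) for item in criteria.items()]
        if not sets:
            return sorted(self._meta)
        return sorted(set.intersection(*sets))


index = MetadataIndex()
index.put("call-001.wav", {"agent": "alice", "year": "2011"})
index.put("call-002.wav", {"agent": "bob", "year": "2011"})
index.put("call-003.wav", {"agent": "alice", "year": "2010"})
print(index.find(agent="alice", year="2011"))
```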
<p><strong>Don&#8217;t try to frequently update large objects</strong></p>
<p>All updates produce a new version of an object, because objects are <a href="http://en.wikipedia.org/wiki/Immutable_object" target="_blank">immutable</a>.</p>
<p><strong>Don&#8217;t store unlimited objects per container</strong></p>
<p>You can store as many objects in a container as you wish. However, your per-object upload latency will increase considerably once you reach a certain point. I found the optimal number of objects per container to be just under one million. This number will vary depending on your equipment and how heavy a workload it&#8217;s subjected to.</p>
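<p>One way to stay under a per-container object count is to spread objects across several containers by hashing the object name. The Python sketch below shows the idea. The <code>container_for</code> function, the container naming scheme, and the shard count are assumptions you would tune to your own data volume.</p>

```python
import hashlib


def container_for(object_name, base="media", shards=16):
    """Pick a container for an object by hashing its name."""
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shards
    return "{}-{:02d}".format(base, shard)


# The same name always maps to the same container, so reads can find it again.
name = "photos/2011/10/cat.jpg"
print(name, container_for(name))

# A batch of names should spread across the available containers.
used = {container_for("object-{}".format(i)) for i in range(1000)}
print(len(used))
```

<p>The shard count is effectively fixed once you start writing, because changing it later sends existing names to different containers.</p>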
<p><strong>Changing Swift Into a Filesystem</strong></p>
<p>You might think of using FUSE to access objects and containers in OpenStack Object Storage as files and directories with a filesystem interface, but you&#8217;ll quickly discover that this is only really good for very simple use cases. Most of the things you need to implement what we think of as a filesystem are missing.</p>
<p>If you are a developer, and you are thinking of building a filesystem on top of OpenStack Object Storage using objects as blocks, that could possibly work, but would probably not perform very well compared to existing alternatives that are actually designed for distributed block storage. The blocks would need to be pretty large to keep the network/protocol overhead down. Frequent writing is not likely to work well. Most users of filesystems are not expecting eventual consistency behavior. They want strong data consistency. You would also want some strategy to handle read/write concurrency with some locking capability. Plus, you would need to have a way to keep track of the blocks like a filesystem does in some data structure or database. Frankly speaking, OpenStack Object Storage is probably not the right tool for the job.</p>
<p><strong>Conclusion</strong></p>
<p>You should probably only use OpenStack Object Storage for the use cases it&#8217;s intended for. If what you really want is a clustered filesystem, you&#8217;re probably better off looking at other solutions like <a href="http://en.wikipedia.org/wiki/Lustre_%28file_system%29" target="_blank">Lustre</a>, <a href="http://en.wikipedia.org/wiki/GlusterFS" target="_blank">GlusterFS</a>, <a href="http://en.wikipedia.org/wiki/Global_File_System">GFS</a>, <a href="http://en.wikipedia.org/wiki/OCFS" target="_blank">OCFS</a>, etc. Keep in mind that each of these has its own strengths and weaknesses. Pay particular attention to what they are designed for, and use them accordingly. If you want to use OpenStack Object Storage for something that it was designed for, then you will probably be <strong>very happy with it</strong>. Keep in mind that it&#8217;s a blob storage system. It&#8217;s not a filesystem, not a file server, not a database, etc. To learn more about OpenStack Object Storage, please check out the <a href="http://swift.openstack.org/" target="_blank">Developer Documentation</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/09/openstack-os-is-great-for/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
	</channel>
</rss>

