<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Adrian Otto&#039;s Blog &#187; Xen</title>
	<atom:link href="http://adrianotto.com/tag/xen/feed/" rel="self" type="application/rss+xml" />
	<link>http://adrianotto.com</link>
	<description>For those who care about technical details</description>
	<lastBuildDate>Thu, 20 Oct 2011 14:35:33 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>CPU Time stolen from a virtual machine?</title>
		<link>http://adrianotto.com/2010/02/time-stolen-from-a-virtual-machine/</link>
		<comments>http://adrianotto.com/2010/02/time-stolen-from-a-virtual-machine/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 16:42:59 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[Xen]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=258</guid>
		<description><![CDATA[Those of you studying the vmstat(8) man page may be wondering what the &#8216;st&#8217; figure is in the CPU column. The manual refers to it as &#8220;Time stolen from a virtual machine&#8221;. More specifically: It&#8217;s the time the hypervisor scheduled something else to run instead of something within your VM. This might be time for [...]]]></description>
			<content:encoded><![CDATA[<p>Those of you studying the vmstat(8) man page may be wondering what the &#8216;st&#8217; figure in the CPU section of the output means. The manual refers to it as &#8220;<em>Time stolen from a virtual machine</em>&#8221;. More specifically:</p>
<p>It&#8217;s the time the hypervisor scheduled something else to run instead of your VM. That might be another VM, or the hypervisor host itself. If no time were stolen, this time would have been used to run your CPU workload or your idle thread.</p>
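<p>On Linux, vmstat gets this number from the first line of /proc/stat, which holds cumulative CPU tick counters; on kernels that report it, the eighth counter on that line is steal time. Here is a minimal sketch (the function names are mine, for illustration only) that turns two samples of that line into vmstat-style percentages. Folding nice into us and irq/softirq into sy is my assumption about how vmstat groups the counters, so treat the result as approximate.</p>
<pre>
```python
# Sketch: vmstat-style CPU percentages from two samples of /proc/stat.
# Counter order on the aggregate "cpu" line (see proc(5)):
# user nice system idle iowait irq softirq steal [guest guest_nice]
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat, the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


def parse_cpu_line(line):
    """Turn the 'cpu' line into a dict of cumulative tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels print fewer counters; treat the missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Share of ticks per vmstat column between two samples.
    Assumption: us = user + nice, sy = system + irq + softirq."""
    d = {k: after[k] - before[k] for k in FIELDS}
    cols = {
        "us": d["user"] + d["nice"],
        "sy": d["system"] + d["irq"] + d["softirq"],
        "id": d["idle"],
        "wa": d["iowait"],
        "st": d["steal"],
    }
    total = sum(cols.values())
    if total == 0:
        return {k: 0.0 for k in cols}
    return {k: 100.0 * v / total for k, v in cols.items()}
```
</pre>
<p>To use it, read the line twice about a second apart with read_cpu_line(), pass both through parse_cpu_line(), and hand the results to cpu_percentages(). The st entry is the share of that interval the hypervisor gave to someone else.</p>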
<p>There is some disagreement circulating about whether the hypervisor will steal idle time, or only preempted time. In other words, it has been suggested that time counts as stolen only when the kernel scheduler inside your VM wanted to run something but the hypervisor made that impossible. I have found that stolen time does in fact include borrowed idle time, where the local scheduler actually had nothing to run. For example, here are some vmstat values from a VM with a very low CPU workload:</p>
<pre>
vmstat -S M 1 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0    121     42     53    460    0    0     0     1    0    1  0  0 89  0 10
 0  0    121     42     53    460    0    0     0    28 1014   39  0  0 90  0 10
 0  0    121     42     53    460    0    0     0     0 1016   36  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1024   32  0  0 93  0  7
 0  0    121     42     53    460    0    0     0     0 1019   40  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1015   32  0  0 90  0 10
 0  0    121     42     53    460    0    0     0     0 1022   34  0  0 92  0  8
 0  0    121     42     53    460    0    0     0     0 1016   36  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1013   34  0  0 92  0  8
 0  0    121     42     53    460    0    0     0     0 1028   43  0  0 93  0  7
</pre>
<p>As you can see, user time (us), system time (sy), and iowait time (wa) are zero, but idle time is not 100%. This normally indicates that your system is doing something, but in this case idle time is actually the sum of the <em>id</em> and <em>st</em> columns.</p>
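<p>Adding the id and st columns of the rows above makes this concrete. The small script below uses the ten rows exactly as printed; every row except the first sums to 100. The first row comes to 99, presumably from rounding, and it is also the one vmstat computes as an average since boot rather than over the last second.</p>
<pre>
```python
# Add up the id and st columns of the vmstat rows shown above.
HEADER = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()

SAMPLE = """
 1  0    121     42     53    460    0    0     0     1    0    1  0  0 89  0 10
 0  0    121     42     53    460    0    0     0    28 1014   39  0  0 90  0 10
 0  0    121     42     53    460    0    0     0     0 1016   36  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1024   32  0  0 93  0  7
 0  0    121     42     53    460    0    0     0     0 1019   40  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1015   32  0  0 90  0 10
 0  0    121     42     53    460    0    0     0     0 1022   34  0  0 92  0  8
 0  0    121     42     53    460    0    0     0     0 1016   36  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1013   34  0  0 92  0  8
 0  0    121     42     53    460    0    0     0     0 1028   43  0  0 93  0  7
"""


def idle_plus_steal(output):
    """Return id + st for every data row of vmstat output."""
    sums = []
    for line in output.strip().splitlines():
        row = dict(zip(HEADER, (int(v) for v in line.split())))
        sums.append(row["id"] + row["st"])
    return sums


print(idle_plus_steal(SAMPLE))  # prints [99, 100, 100, 100, 100, 100, 100, 100, 100, 100]
```
</pre>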
<p>In this example, I really don&#8217;t care that I have a nonzero <em>st</em> column because my workload is basically idle all the time anyway.</p>
<p>If you are on a cloud host where you purchase a small sliver of a server, you should expect to see nonzero values in this column when you run vmstat. If you have a heavy CPU load and need more processing power, you can solve this problem by upgrading to a larger VM server size so that you command a larger portion of the physical host.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2010/02/time-stolen-from-a-virtual-machine/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Putting Entropy in the Cloud</title>
		<link>http://adrianotto.com/2009/11/putting-entropy-in-the-cloud/</link>
		<comments>http://adrianotto.com/2009/11/putting-entropy-in-the-cloud/#comments</comments>
		<pubDate>Tue, 24 Nov 2009 04:48:56 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Entropy]]></category>
		<category><![CDATA[Random]]></category>
		<category><![CDATA[RNG]]></category>
		<category><![CDATA[Xen]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=247</guid>
		<description><![CDATA[I was browsing through Twitter mentions of @adrian_otto and found one posted by Ian Thompson mentioning an article about weak randomness in the cloud. It suggests that because there may be insufficient entropy sources on a Cloud Server or instance, it may be easier to guess random number sequences because different cloud servers [...]]]></description>
			<content:encoded><![CDATA[<p>I was browsing through Twitter mentions of <a href="http://twitter.com/adrian_otto" target="_blank">@adrian_otto</a> and found one posted by <a href="http://twitter.com/MystirrE" target="_blank">Ian Thompson</a> mentioning <a href="http://bit.ly/34Wom8" target="_blank">an article</a> about weak randomness in the cloud. The article suggests that because a Cloud Server or instance may have insufficient entropy sources, different cloud servers may start out with similar or even identical entropy pools (or, worse yet, identical host keys). That would make their random number sequences easier to guess, and therefore make the encryption algorithms that depend on them easier to break.</p>
<p>Yes, if you have similar entropy pools, it is easier to break encryption that depends on them. It&#8217;s reasonably easy to work around this and make sure your entropy pool is uniquely initialized. You can consult the <a href="http://linux.die.net/man/4/random" target="_blank">random manual for the Linux Kernel</a> for information about how to seed your entropy pool with a particular set of data. If you are running an application in the cloud that uses encryption and you are concerned about the initial state of your entropy pool, use this procedure:</p>
<p>1) Seed your own pool from a long-running system that has sufficient entropy in it, rather than relying on what you read from the kernel at startup.</p>
<p>2) Produce a network service that you use to seed your initial entropy pools. This service could be as simple as a set of entropy files that you create at pseudo-random time intervals and discard as you serve them to cloud server instances (as they boot up), so you never serve the same one twice. At boot time, have your VM connect to wherever you run this service and download a file to seed its entropy pool with. Restrict access to this service so that it&#8217;s only available to your own server instances.</p>
<p>3) Make sure that your custom entropy pool initialization takes place prior to starting your encryption software.</p>
<p>4) If you are creating an AMI or another server image that you plan to clone, make sure it does not already contain a generated host key. Delete the key and let your initialization scripts create a new one when each server is created (after its entropy pool has been seeded), rather than making copies of the same one.</p>
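<p>Steps 2 and 3 can be sketched in a few lines, assuming the &#8220;service&#8221; is nothing more than a directory of pre-generated seed files on the long-running host. The directory layout and function names are hypothetical, and the network transport and access control are left out. Each seed is claimed with an atomic rename and then deleted, so no seed is ever handed out twice. The guest mixes its seed into the kernel pool by writing it to /dev/urandom before any encryption software starts; per random(4), such a write updates the pool but does not raise the kernel&#8217;s entropy count.</p>
<pre>
```python
import os
import secrets

# Hypothetical layout: the seed service keeps one file per seed in a directory.
SEED_SUFFIX = ".seed"


def make_seed_files(directory, count, size=512):
    """Service side: pre-generate one-time seeds on a long-running host."""
    os.makedirs(directory, exist_ok=True)
    for _ in range(count):
        name = secrets.token_hex(8) + SEED_SUFFIX
        with open(os.path.join(directory, name), "wb") as f:
            f.write(os.urandom(size))


def take_one_time_seed(directory):
    """Service side: hand out one seed and delete it so it is never served twice."""
    for name in sorted(os.listdir(directory)):
        if not name.endswith(SEED_SUFFIX):
            continue
        path = os.path.join(directory, name)
        claimed = path + ".taken"
        try:
            os.rename(path, claimed)  # atomic claim: concurrent requests get different seeds
        except FileNotFoundError:
            continue  # another request claimed this one first
        with open(claimed, "rb") as f:
            data = f.read()
        os.remove(claimed)  # discard it as we serve it
        return data
    raise RuntimeError("no seeds left; generate more before booting new instances")


def mix_into_pool(seed, pool_path="/dev/urandom"):
    """Guest side, at boot and before any crypto software starts: mix the seed
    into the kernel pool. Per random(4) this does not raise the entropy count."""
    with open(pool_path, "wb") as f:
        f.write(seed)
```
</pre>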
<p>If you don&#8217;t trust what /dev/random or /dev/urandom emit, you can optionally use OpenSSL with <a href="http://prngd.sourceforge.net/" target="_blank">prngd</a> or <a href="http://egd.sourceforge.net/" target="_blank">egd</a> as alternate entropy sources, and potentially feed in your own sensory input data. If you want to go hardcore, you could add environmental noise such as resistor noise on the microphone input of a sound card, or some other sensory data. There is <a href="http://vanheusden.com/aed/" target="_blank">existing software for doing just that</a>. There are all sorts of possibilities, including a number of hardware RNG solutions, most of which are pretty expensive and are not options in a cloud environment. Random numbers are also available <a href="http://random.irb.hr/">as a service</a> from <a href="http://www.random.org/" target="_blank">various sources</a>.</p>
<p>There are things that we, as Cloud Computing service providers, can do to pre-initialize your entropy pools when a server instance is created, which would make the procedure above redundant. That still leaves the question of the quality of the <a href="http://en.wikipedia.org/wiki/Random_number_generator" target="_blank">RNG</a> available to you on a cloud server.</p>
<p>There are two standard randomness sources that you should know about:</p>
<p>/dev/random   = produces actual entropy if you have some, and blocks otherwise.<br />
/dev/urandom = never blocks; when the pool runs low on entropy it keeps returning pseudorandom output, which is theoretically weaker.</p>
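<p>On Linux the kernel publishes its current entropy estimate, in bits, in /proc/sys/kernel/random/entropy_avail, which gives a rough idea of whether a read from /dev/random is about to block on the kernels discussed here. The sketch below reads that value and draws bytes from the kernel&#8217;s urandom source via Python&#8217;s os.urandom. The 128-bit threshold is an arbitrary illustrative number, not a kernel constant, and the helper names are mine.</p>
<pre>
```python
import os

# Kernel's current entropy estimate, in bits (Linux).
ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def entropy_estimate(path=ENTROPY_AVAIL):
    """Return the kernel's entropy estimate in bits."""
    with open(path) as f:
        return int(f.read().strip())


def pool_looks_low(bits, threshold=128):
    """Illustrative heuristic only: 128 bits is an arbitrary example threshold."""
    return threshold > bits


def random_bytes(n):
    """Bytes from the kernel's urandom source (os.urandom); this does not
    wait for the entropy estimate to refill."""
    return os.urandom(n)
```
</pre>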
<p>The Linux kernel has a paravirtual entropy driver which provides kernel-side support for the virtual <a href="http://en.wikipedia.org/wiki/Random_number_generator" target="_blank">RNG</a> hardware. The kernel compile option CONFIG_HW_RANDOM_VIRTIO enables it, and it can be built as a kernel module. There are drivers that run within the hypervisor host kernel that connect this with the RNG hardware available on the server (if any).</p>
<p>drivers/char/hw_random/amd-rng.ko = H/W RNG driver for AMD chipsets<br />
drivers/char/hw_random/intel-rng.ko = H/W RNG driver for Intel chipsets<br />
drivers/char/hw_random/virtio-rng.ko = VirtIO Random Number Generator support</p>
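<p>To see which of these drivers is actually registered, the kernel&#8217;s hw_random core exposes rng_available and rng_current attributes under /sys/class/misc/hw_random. The sketch below only reads them; the helper names are illustrative, and on a system with no hardware RNG driver loaded the directory may not exist at all, in which case it reports an empty list and no current driver.</p>
<pre>
```python
import os

# Sysfs directory exported by the kernel's hw_random core.
HW_RANDOM_SYSFS = "/sys/class/misc/hw_random"


def read_attr(base, name):
    """Read one sysfs attribute, or return None if it is not there."""
    try:
        with open(os.path.join(base, name)) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None


def hw_rng_status(base=HW_RANDOM_SYSFS):
    """List the registered hardware RNG drivers and the one currently in use."""
    available = read_attr(base, "rng_available")
    return {
        "available": available.split() if available else [],
        "current": read_attr(base, "rng_current"),
    }
```
</pre>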
<p>It works like this: the hypervisor host (dom0) runs <a href="http://linux.die.net/man/8/rngd/" target="_blank">rngd</a> to read data from /dev/hwrandom (provided by the Intel or AMD modules mentioned above) and feed it into /dev/random. The guest VM (domU) then does the same thing, running its own rngd to feed data from its virtual RNG device into the guest kernel. rngd can mix data from both /dev/random and /dev/urandom so you get as much random data as you need in a non-blocking fashion. You can consult the kernel <a href="http://lwn.net/Articles/282721/" target="_blank">source code</a> to learn more.</p>
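<p>The core of that loop can be sketched as: read blocks from the hardware RNG device and hand them to the kernel pool. This is not rngd&#8217;s code. Real rngd also runs FIPS 140-2 tests on the data and credits entropy with the RNDADDENTROPY ioctl, which needs root. The sketch only builds the rand_pool_info buffer that ioctl expects and, to stay runnable without privileges, copies bytes with a plain write, which mixes them in without crediting entropy. Device paths are parameters; on current kernels the hardware RNG character device is usually named /dev/hwrng.</p>
<pre>
```python
import struct

# struct rand_pool_info { int entropy_count; int buf_size; __u32 buf[]; }
# is what the RNDADDENTROPY ioctl on /dev/random expects (entropy in bits,
# buffer size in bytes). Issuing that ioctl needs root, so it is not done here.
POOL_INFO_HEADER = struct.Struct("ii")


def pack_rand_pool_info(data, entropy_bits):
    """Build the buffer an rngd-like daemon would pass to RNDADDENTROPY."""
    return POOL_INFO_HEADER.pack(entropy_bits, len(data)) + data


def feed_pool(source_path, sink_path, block_size=64, blocks=4):
    """Copy up to `blocks` blocks from a hardware RNG device into the pool device.
    A plain write mixes the data in without crediting any entropy."""
    fed = 0
    with open(source_path, "rb") as src, open(sink_path, "wb") as sink:
        for _ in range(blocks):
            chunk = src.read(block_size)
            if not chunk:
                break  # source ran dry
            sink.write(chunk)
            fed += len(chunk)
    return fed
```
</pre>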
<p>What happens if multiple guest VMs are reading this data at the same time using this arrangement? I&#8217;m not sure whether it&#8217;s possible to deplete the hypervisor host&#8217;s entropy pool and thereby produce <a href="http://en.wikipedia.org/wiki/Pseudorandom_number_generator" target="_blank">PRNG</a> patterns that are less random. If it is, one guest VM that emptied the entropy pool by aggressively reading from the /dev/hwrandom device might leave someone else&#8217;s guest VM with less data. This could be solved simply by enforcing a rate limit on how much RNG data each guest VM is allowed to consume. There is <a href="http://lwn.net/Articles/283103/" target="_blank">further discussion</a> of that as well.</p>
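<p>To make the rate-limit idea concrete, here is a token-bucket sketch of how a host-side component might cap the RNG bytes each guest can draw. Nothing like this is claimed to exist in Xen or virtio-rng; the class, guest IDs, and rates are hypothetical. Each guest earns bytes at a fixed rate up to a burst limit, so one guest reading aggressively drains only its own allowance.</p>
<pre>
```python
import time


class PerGuestRngLimiter:
    """Hypothetical per-guest token bucket for RNG bytes (not a Xen or virtio API).

    Each guest earns `rate` bytes per second, up to a cap of `burst` bytes.
    """

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.buckets = {}  # guest id -> (available bytes, time of last refill)

    def allow(self, guest, nbytes):
        """Return how many of the requested bytes this guest may read right now."""
        now = self.clock()
        tokens, last = self.buckets.get(guest, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)  # refill
        granted = int(min(tokens, nbytes))
        self.buckets[guest] = (tokens - granted, now)
        return granted
```
</pre>
<p>A host would call allow() before forwarding RNG data to a guest and pass along only the granted number of bytes.</p>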
<p>The truth is that for most needs you can have reasonably secure encryption by simply having an ordinary PRNG source like /dev/urandom that&#8217;s properly initialized with random data. I suggest that you use that approach in your cloud deployments.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2009/11/putting-entropy-in-the-cloud/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Remus Project: Full Memory Mirroring!</title>
		<link>http://adrianotto.com/2009/11/remus-project-full-memory-mirroring/</link>
		<comments>http://adrianotto.com/2009/11/remus-project-full-memory-mirroring/#comments</comments>
		<pubDate>Thu, 12 Nov 2009 22:30:10 +0000</pubDate>
		<dc:creator>Adrian Otto</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[VoIP]]></category>
		<category><![CDATA[Remus]]></category>
		<category><![CDATA[Xen]]></category>

		<guid isPermaLink="false">http://adrianotto.com/?p=163</guid>
		<description><![CDATA[Imagine that you have a cluster with two machines side by side in an active/standby configuration. Let&#8217;s say you have your data replicated, and the systems are basically identical except for the IP address and hostname. You can use heartbeat to share an IP address such that if the primary fails, the secondary takes over. [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-166" title="Mirrored Servers" src="http://cdn.adrianotto.com/wp-content/uploads/2009/11/server-mirror.jpg" alt="Mirrored Servers" width="130" height="90" />Imagine that you have a cluster with two machines side by side in an active/standby configuration. Let&#8217;s say you have your data replicated, and the systems are basically identical except for the IP address and hostname. You can use heartbeat to share an IP address such that if the primary fails, the secondary takes over. You can also perform the equivalent using &#8220;live migration&#8221; features in a Xen or VMware hypervisor. The problem with these sorts of fail-overs is that any active TCP/IP sessions end up getting broken, and new connections must be established between clients and the application.</p>
<p>Okay, here&#8217;s something that fixes that problem: the <a href="http://dsg.cs.ubc.ca/remus/" target="_blank">Remus Project</a>. The approach is brilliant. At regular intervals it ships the changed memory pages from one host to the other. Reads do not need to be replicated, only writes, and repeated writes to the same location don&#8217;t all need to be replicated, only the most recent one. The primary node simply holds back its outgoing TCP/IP packets (output buffering) until it has confirmed that the standby node has received the replicated memory data. Very, very clever.</p>
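<p>The bookkeeping described above can be modeled in a few lines. This is a toy model of the idea, not Remus code, and the class and method names are invented for illustration: writes go into a dirty set keyed by page (so repeated writes keep only the latest value and reads are never tracked), each checkpoint ships only the dirty set to the standby, and outbound packets are held until the standby acknowledges the checkpoint that covers them.</p>
<pre>
```python
class ToyCheckpointer:
    """Toy model of Remus-style replication; not Remus code."""

    def __init__(self):
        self.memory = {}     # primary's pages: page number -> contents
        self.standby = {}    # standby's copy, updated only at checkpoints
        self.dirty = {}      # pages written since the last checkpoint
        self.pending = []    # outbound packets produced since the last checkpoint
        self.in_flight = []  # packets waiting for the standby's acknowledgment
        self.released = []   # packets actually delivered to clients

    def write(self, page, value):
        self.memory[page] = value
        self.dirty[page] = value  # a later write to the same page replaces this one

    def read(self, page):
        return self.memory.get(page)  # reads change nothing, so nothing to replicate

    def send(self, packet):
        self.pending.append(packet)  # output buffering: hold the packet

    def checkpoint(self):
        """Ship only the dirty pages to the standby; return how many were sent."""
        shipped = dict(self.dirty)
        self.standby.update(shipped)  # in reality this crosses the network
        self.dirty.clear()
        self.in_flight.extend(self.pending)
        self.pending = []
        return len(shipped)

    def ack(self):
        """Standby confirmed the checkpoint: buffered output may now go out."""
        self.released.extend(self.in_flight)
        self.in_flight = []
```
</pre>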
<p>Here are the key features listed on the Remus web site:</p>
<ul>
<li>The backup VM is an <em>exact copy</em> of the primary VM. When failure happens, it continues running on the backup host as if failure had never occurred.</li>
<li>The backup is <em>completely up-to-date</em>. Even active TCP sessions are maintained without interruption.</li>
<li>Protection is <em>transparent</em>. Existing guests can be protected without modifying them in any way.</li>
</ul>
<p><a href="http://www.xen.org/"><img class="alignright size-full wp-image-170" title="Xen Logo" src="http://cdn.adrianotto.com/wp-content/uploads/2009/11/xen_logo.gif" alt="Xen Logo" width="149" height="67" /></a>Okay, I&#8217;ve been running HA systems in multiple geographies for about a decade now. I&#8217;ve experimented with lots and lots of clustering and replication technology. Most of the time when I hear about something new, I cringe and wonder whether it&#8217;s just another thing using the same old tricks I&#8217;ve been using for years, or whether it&#8217;s something truly innovative and truly <a href="http://en.wikipedia.org/wiki/Open_source" target="_blank">open source</a>. Before you go making comments that VMware has this feature or that feature, relax. This post is not about VMware. It&#8217;s about open source Xen.</p>
<p>Now, you might already be wondering if this would work if you separated the two nodes to run in separate locations. The short answer is maybe. You would still need a very clever network configuration to re-route your traffic dynamically to the new location. For those of us that do operate our own Autonomous Systems, that may seem possible with a BGP route update. But here&#8217;s the bummer&#8230; The additional latency it would introduce would bring your performance to a screeching halt. You could probably afford to have about 25ms of average latency between two locations and get away with it. The cut-over would still be better than nothing, but you&#8217;d better have a rock solid network in there, and you&#8217;d better be ready to pump lots of bandwidth over it. Plan for 100Mb/sec if you checkpoint every 100ms.</p>
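<p>The bandwidth figure follows from simple arithmetic: a checkpoint can carry at most the link rate times the checkpoint interval, so 100 Mb/s with a 100 ms interval allows 10 megabits, about 1.25 MB per checkpoint. The helper below (the name is mine) does that calculation and converts it to pages, assuming 4 KiB guest pages. If the guest dirties more memory than that between checkpoints, replication falls behind. Separately, because output is held until the standby acknowledges each checkpoint, the round-trip time between the two sites is added to the delay before clients see any response.</p>
<pre>
```python
# Assumed guest page size in bytes (4 KiB is typical on x86).
PAGE_SIZE = 4096


def checkpoint_budget(link_bits_per_sec, interval_sec, page_size=PAGE_SIZE):
    """Most dirty memory one checkpoint can carry: link rate x interval.
    Returns (bytes, whole pages)."""
    budget_bytes = link_bits_per_sec * interval_sec / 8
    return budget_bytes, int(budget_bytes // page_size)
```
</pre>
<p>For example, checkpoint_budget(100e6, 0.1) works out to about 1.25 MB, or 305 whole 4 KiB pages, per checkpoint.</p>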
<p><a href="http://www.memcached.org/"><img class="size-full wp-image-164 alignright" style="margin-left: 10px; margin-right: 10px;" title="memcached logo" src="http://cdn.adrianotto.com/wp-content/uploads/2009/11/memcache_logo.png" alt="memcache_logo" hspace="10" width="76" height="75" /></a>This would be great for a read-heavy application like a web cache, or some <a href="http://www.memcached.org" target="_blank">memcached</a> applications. People ask on the memcached mailing list all the time how they can set up replication and HA. The answer is always &#8220;it&#8217;s a cache&#8230; not a database.&#8221; Well, for those of you who want HA for a memcached system, give Remus a try.</p>
<p><img class="alignright size-full wp-image-174" title="trixbox logo" src="http://cdn.adrianotto.com/wp-content/uploads/2009/11/trixbox_logo.png" alt="trixbox logo" />Let&#8217;s not stop there. Imagine you have a SIP call control platform or <a href="http://www.trixbox.org/" target="_blank">Trixbox</a> system, and you don&#8217;t want to lose all your active calls in the event of a system crash. The same goes for pretty much any mission-critical application that supports long-running connections over TCP/IP.</p>
<p>Remus has been around for some time, so why am I so excited now? It&#8217;s now part of <a href="http://www.xen.org" target="_blank">Xen</a>! You don&#8217;t need to do anything special on the master or slave node to use it! Whoot! Now I&#8217;m impressed. Anyone out there have experience running it? I&#8217;d love to hear your thoughts.</p>
]]></content:encoded>
			<wfw:commentRss>http://adrianotto.com/2009/11/remus-project-full-memory-mirroring/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

