Latest Publications

Put WiFi on your cell phone’s SIM Card!

Have you ever wanted to surf the web from your laptop using your cell phone’s internet connection, without connecting any wires and without the hassle of fiddling with software? Well, guess what: happiness is close at hand!

Today Sagem Orga issued a press release that raised my eyebrows. They have a new SIM card (the identification chip in your GSM cell phone) that has WiFi capability right on the chip. This is exciting, because it would enable otherwise ordinary cell phones to be used as WiFi internet gateways, running both WiFi and 3G data connections at the same time.

This is something that most phones simply cannot do. The ones that can require a software program running on the phone to turn it into a router that relays WiFi traffic to the internet over a 3G data connection on the cell phone network. Getting this on a Blackberry, for example, is a huge nuisance, if your service provider supports it at all.

Well, that nuisance may be a thing of the past once the new “SIMFi” technology hits the market. Imagine just plugging the snazzy new card into your phone, joining its WiFi network from your laptop, and accessing the internet from practically anywhere. How cool is that!?!

There has been a discussion on Slashdot about this. One of the interesting comments was about the need for a 2.4 GHz antenna, which can actually fit fine on the SIM card itself, as long as it’s bent around a bit. An obvious question with any WiFi product is “what’s the impact on battery life?” It will definitely be shorter. Hopefully this device will have some sort of adjustable transmit power for the WiFi signal so power consumption can be kept to a minimum. After all, your laptop and your cell phone will only be an arm’s length apart when you are using this setup anyway, so range is not a major concern.

Yes, I do love technical gadgets. The thought of where this could go is very exciting. I’ll be the first on the waiting list for this!

CPU Time stolen from a virtual machine?

Those of you studying the vmstat(8) man page may be wondering what the ’st’ figure is in the CPU column. The manual refers to it as “Time stolen from a virtual machine”. More specifically:

It’s the time the hypervisor scheduled something else to run instead of something within your VM. This might be time for another VM, or for the hypervisor host itself. If no time were stolen, this time would be used to run your CPU workload or your idle thread.

There is some disagreement circulating about whether the hypervisor will steal idle time, or only preempted time. In other words, it has been suggested that stolen time only counts the time when your local kernel scheduler within the VM wanted to run something but the hypervisor made that impossible. I have found that stolen time does in fact count borrowed idle time, where the local scheduler actually had nothing to run. For example, here are some vmstat values from a VM that’s got a very low CPU workload on it:

vmstat -S M 1 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0    121     42     53    460    0    0     0     1    0    1  0  0 89  0 10
 0  0    121     42     53    460    0    0     0    28 1014   39  0  0 90  0 10
 0  0    121     42     53    460    0    0     0     0 1016   36  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1024   32  0  0 93  0  7
 0  0    121     42     53    460    0    0     0     0 1019   40  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1015   32  0  0 90  0 10
 0  0    121     42     53    460    0    0     0     0 1022   34  0  0 92  0  8
 0  0    121     42     53    460    0    0     0     0 1016   36  0  0 91  0  9
 0  0    121     42     53    460    0    0     0     0 1013   34  0  0 92  0  8
 0  0    121     42     53    460    0    0     0     0 1028   43  0  0 93  0  7

As you can see, user time (us), system time (sy), and iowait time (wa) are all zero, yet the id column never reaches 100%. Normally that would indicate that your system is doing something, but in this case the VM’s real idle time is the sum of the id and st columns.
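If you want to watch the same figure without vmstat, here is a minimal Python sketch. It assumes a Linux guest where the first line of /proc/stat is the aggregate "cpu" line and the eighth counter on it is steal time (older kernels may not report it). The function names are my own, not part of any tool.

```python
import time

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu' line of /proc/stat into a dict of tick counters."""
    parts = line.split()
    if parts[0] != "cpu" or len(parts) < 9:
        raise ValueError("expected an aggregate 'cpu' line with a steal counter")
    # Guest time is already counted inside user time, so it is ignored here.
    return dict(zip(FIELDS, (int(value) for value in parts[1:9])))


def cpu_percentages(before, after):
    """Return each counter's share (in percent) of the ticks between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample_steal(interval=1.0, path="/proc/stat"):
    """Sample /proc/stat twice and return (idle %, steal %) for that interval."""
    with open(path) as stat_file:
        before = parse_cpu_line(stat_file.readline())
    time.sleep(interval)
    with open(path) as stat_file:
        after = parse_cpu_line(stat_file.readline())
    shares = cpu_percentages(before, after)
    return shares["idle"], shares["steal"]
```

Calling sample_steal() on a mostly idle VM should give two numbers that add up to roughly 100, which is the same relationship between the id and st columns described above.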

In this example, I really don’t care that I have a nonzero st column because my workload is basically idle all the time anyway.

If you are on a cloud host where you purchase a small sliver of a server, you should expect to see nonzero values in this column when you run vmstat. If you have a heavy CPU load and need more processing power, you can solve this problem by upgrading to a larger VM server size so that you command a larger portion of the physical host.

ED Strikes Again?

It’s not the ED you are thinking of. Nope, it’s actually the External Dependency.

One piece of advice that I continually dispense is to try to reduce dependencies on remote web sites when coding your own. The problem strikes most dramatically when you run a very busy site, and you have some feed or resource that you download from a remote site. That remote site crashes, and oops, so does yours. It also happens when your busy site generates more requests than the remote site can handle.

I ran into this again today. One site that I host was consuming a remote feed from a site that has a much smaller capacity than my customer does. The site on my end gets over 10 million page views a day (peak ~2000 page views per second). The capacity mismatch became very apparent when something went wrong on the remote end.

The code logic was:

  1. If you have a cached version of the feed, and it’s fresh, then use it.
  2. If the cached entry is expired, then fetch a new one, and replace the one in cache.

This logic is fundamentally flawed for busy sites. It seems sensible, but think about what happens when the cached entry expires and the remote site is responding very slowly. All of a sudden a stampede of requests starts stacking up, all trying to get the feed in parallel. That makes the remote site’s problem even worse, and it crashes. The remote site tries to reboot, and you quickly crash it again. The sequence repeats indefinitely.

Why? Because the window of time during which the cache is invalid gets wider and wider as the remote site gets slower and slower. The longer that window is open, the more traffic the remote site will get from cache misses.
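To put rough numbers on that, here is a back-of-the-envelope Python sketch. It assumes the worst case, where every page view that arrives while the cache is expired triggers its own fetch. The function name is made up for illustration, and the peak rate is the ~2000 page views per second mentioned above.

```python
def requests_hitting_remote(page_views_per_second, remote_response_seconds):
    """Estimate how many fetches pile up while one slow fetch is still in flight."""
    return round(page_views_per_second * remote_response_seconds)


PEAK_PAGE_VIEWS_PER_SECOND = 2000  # peak figure quoted earlier in this post

for latency in (0.1, 1, 5, 30):
    stacked = requests_hitting_remote(PEAK_PAGE_VIEWS_PER_SECOND, latency)
    print(f"remote takes {latency}s to answer -> about {stacked} parallel fetches")
```

At a healthy 0.1 second response time that is about 200 fetches per expiry, which is already wasteful. Once the remote site slows to 5 seconds, it is about 10,000.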

A clean solution is to update the cache asynchronously using a scheduled batch job that keeps a local cache of the data. Only update the cache when the feed has actually changed. The logic in the web application changes to:

  1. Always use the data in the cached file.

The feed site is consulted at regular intervals using a scheduled batch job (cron), and the cached data is updated only if the job is able to get a response. If the remote site is down or too slow, then the application simply continues to use the version it had before. Problem solved!
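A batch job along those lines might look like the following Python sketch. The function names, the timeout, and the file permissions are my own assumptions for illustration. The important parts are that a failed fetch leaves the old file alone, and that the new file is moved into place in a single step so the web application never reads a half-written cache.

```python
import os
import tempfile
import urllib.request


def fetch_feed(url, timeout=10):
    """Download the feed body, giving up after `timeout` seconds (assumed value)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def refresh_cache(cache_path, fetch):
    """Replace the cached file with fresh data. Returns True only if it was replaced."""
    try:
        data = fetch()
    except Exception:
        # Remote site down or too slow: keep serving the copy we already have.
        return False
    if not data:
        return False
    try:
        with open(cache_path, "rb") as current:
            if current.read() == data:
                return False  # Feed has not changed, nothing to do.
    except FileNotFoundError:
        pass  # First run, no cache yet.
    # Write next to the cache file so the final rename stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.chmod(tmp_path, 0o644)  # Assumption: the web server reads as another user.
        os.replace(tmp_path, cache_path)  # Swap the new file in as a single step.
    except OSError:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        return False
    return True
```

From cron you would call something like refresh_cache(path, lambda: fetch_feed(url)) every few minutes, with the path and URL being whatever your application uses. The web application itself only ever opens the cached file.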

Why is this not a best practice for all web developers? Because most web sites don’t get enough traffic for it to matter much. But, if you’ve got a busy site, and you don’t want it to crash when your remote feeds do, then you might want to consider getting that data asynchronously, or at least use a cache update procedure that’s serialized.

Here is an example of a non-blocking serialization approach that works for PHP applications.
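To show the general shape of the idea, here is a separate minimal sketch in Python (not the PHP example referred to above). It assumes a Unix host where flock() is available, and the function names and the ".lock" and ".tmp" file suffixes are made up for illustration. Only the one request that wins the lock goes to the remote site. Everyone else gets the stale copy immediately instead of waiting.

```python
import fcntl
import os
import time


def read_cache(cache_path):
    """Return the cached bytes, or None if no cache file exists yet."""
    try:
        with open(cache_path, "rb") as cached:
            return cached.read()
    except FileNotFoundError:
        return None


def is_fresh(cache_path, max_age):
    """True if the cache file exists and is at most max_age seconds old."""
    try:
        return time.time() - os.path.getmtime(cache_path) <= max_age
    except OSError:
        return False


def get_feed(cache_path, max_age, fetch):
    """Serve the cache; let at most one caller at a time refresh a stale copy."""
    if is_fresh(cache_path, max_age):
        return read_cache(cache_path)
    with open(cache_path + ".lock", "w") as lock_file:
        try:
            # LOCK_NB makes this fail right away instead of queueing up.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Somebody else is already refreshing: serve the stale copy.
            return read_cache(cache_path)
        try:
            data = fetch()
            tmp_path = cache_path + ".tmp"
            with open(tmp_path, "wb") as tmp:
                tmp.write(data)
            os.replace(tmp_path, cache_path)  # Swap in the new copy in one step.
        except Exception:
            pass  # Remote down or too slow: fall through to the stale copy.
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    return read_cache(cache_path)
```

Note that on the very first request, before any cache exists, callers that lose the race get None back, so the application needs a sensible fallback for that case.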

So all you web developers out there who like to consume RSS feeds on the server side of your web application… don’t say I didn’t warn you. Go look at all your code and make sure you don’t have a dependency on a remote site. If you do, you now know at least two ways to solve that problem.