Drizzle is now BETA

Today Drizzle enters BETA. Drizzle is an evolution of MySQL that’s been simplified, streamlined, and modernized. This long-awaited database started from an idea in 2005 to fork MySQL, keep the good parts, and rip out or replace all the stuff that’s not good for web applications. This idea has been endorsed by large corporate sponsors, including Sun Microsystems in the early days, and now Rackspace. Most of the code is contributed by the developer community, a very talented group of open source developers with core committers from four different companies. More about the Drizzle project:

Charter

  • A database optimized for Cloud infrastructure and Web applications
  • Design for massive concurrency on modern multi-CPU architecture
  • Optimize memory for increased performance and parallelism
  • Open source, open community, open design

Scope

  • Re-designed modular architecture providing plugins with defined APIs
  • Simple design for ease of use and administration
  • Reliable, ACID transactional

There are many exciting changes, such as optimizing everything for 64-bit CPUs and multi-core. You can hardly even buy 32-bit or single-core servers nowadays, even if you want them. It makes no sense to have software that’s optimized for these antiquated hardware designs. No effort is spent optimizing software to work with rotational hard drives because SSDs are the way of the future. All the language collations have simply been replaced with UTF-8 only, because the web uses UTF-8. Plus, this is tested with 41 different language translations.

Drizzle has a new scheduler. The legacy MySQL scheduler was designed to work for a thread-per-session setup. In Drizzle, sessions are handled independently from the threads. The new scheduler allows this to work.
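To make the difference from thread-per-session concrete, here is a minimal sketch in Python of many sessions being served by a small, fixed pool of worker threads. It is not Drizzle's scheduler code. The `Session` class, the `serve` function, and the pool size are all hypothetical names and values chosen for illustration; the only point is that a session carries its own state while borrowing whichever thread happens to be free.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Session:
    """Hypothetical client session: holds its own state but owns no thread."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.queries_run = 0
        self.lock = threading.Lock()  # guards this session's counter

    def run(self, query):
        # Executes on whichever pool thread picked up the work item.
        with self.lock:
            self.queries_run += 1
        return (self.session_id, query, threading.current_thread().name)


def serve(num_sessions, pool_size, queries_per_session):
    """Run every session's queries on a shared pool of pool_size threads."""
    sessions = [Session(i) for i in range(num_sessions)]
    with ThreadPoolExecutor(max_workers=pool_size,
                            thread_name_prefix="worker") as pool:
        futures = [pool.submit(s.run, f"SELECT {n}")
                   for s in sessions
                   for n in range(queries_per_session)]
        results = [f.result() for f in futures]
    return sessions, results


sessions, results = serve(num_sessions=100, pool_size=4, queries_per_session=3)
threads_used = {thread_name for _, _, thread_name in results}
print(len(sessions), "sessions served by", len(threads_used), "thread(s)")
```

In a thread-per-session design the same workload would need 100 threads; here the thread count is capped by the pool size no matter how many sessions connect.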

Drizzle uses InnoDB as its default storage engine, which is great for OLTP. It also supports the PBXT storage engine. Plugins are available for the InnoDB Embedded Engine and HailDB, which will soon be the new default. DDL operations (like ALTER TABLE) can actually roll back in the event that something goes wrong in the process, rather than leaving you with incomplete or corrupt data.

The code base in Drizzle has been fully modernized and brought up to today’s standards of C++, with extensive use of the C++ STL to replace MySQL’s usage of obscure custom data type implementations that offered no real benefit compared to what the STL has today. Another example of improvements in this area is the replacement of the legacy REGEX implementation with a more standard library. All of these changes reduce the amount of Drizzle source code dramatically compared to MySQL. Less code and simpler code means fewer bugs, plain and simple. Drizzle is well on its way to being an ideal fit for web applications that need a reliable, high-performance transactional database.

Features in Drizzle7 Beta

  • New micro kernel
  • Migration Tool
  • Instance Catalog Support
  • Universal Replication
  • User query analysis
  • Multi-core Support

What “Beta” means

  • Your data is safe. Transactional engine by default and stable for over 2 years.
  • Upgrade the system in-place without exporting/importing data.
  • Replication is still being tested.

In Microsoft terms, it means that this project would have launched about a year ago. In Google terms, it probably would have launched six months ago. Simply put, if you trust your data to a MySQL system today running InnoDB, you should feel comfortable trying Drizzle. There have been some changes to the InnoDB setup, such as the elimination of the FRM files from disk, which eliminates possible inconsistency between the state on disk and the state in InnoDB. I am in the process of moving a few of my production applications to use the Drizzle Beta. If you’re an accomplished system administrator and DBA, you should seriously consider putting at least one of your production applications on Drizzle now, and see how it works for you.

What’s Next?

  • Beta announced today 2010-09-29.
  • GA February 2011
  • GA May 2011 for Multi-Tenancy features that allow an arbitrary number of logical databases (Schemas, Tables, etc.) to exist concurrently with full data isolation between them. This allows for individual security and resource controls (Threads, Memory, IO), and individual database backups, rather than system level backups. This feature will be called “Catalogs”.

Download Drizzle

Time to get started with the beta. Download the beta today!

11 Comments »

 
  • I disagree with this claim. While some failure cases in MySQL are not possible in Drizzle (FRM files are gone) that just means it is more reliable on paper. Reliability in production requires many servers running in production for many months and recovering from many failures. MySQL 5.1 with the InnoDB plugin does great by that standard. We won’t know whether Drizzle does until it has been widely deployed.

    >>>
    Simply put, if you trust your data to a MySQL system today running InnoDB, Drizzle is actually a step up in terms of data reliability
    >>>

    Does Drizzle keep the replication log and InnoDB in sync during crash recovery? MySQL with sync_binlog=1 will do that.

  • Adrian Otto says:

    Thanks for the feedback. I agree with your comment, and have updated my post to soften the claim accordingly.

    You raise an interesting question about the replication log. The replication code in Drizzle is a complete rewrite from what MySQL has, so I’ll need to check to find out.

  • Adrian Otto says:

    To answer your question:

    >>>
    Does Drizzle keep the replication log and InnoDB in sync during crash recovery? MySQL with sync_binlog=1 will do that.
    >>>

    I asked @krow and @Shrews about this, and they said…

    In the Drizzle Beta, the replication code guarantees that the transaction log contains whatever is committed to InnoDB. The sequence is PREPARE, COMMIT to replication transaction log, COMMIT to InnoDB. There is a rare case where the transaction log could be AHEAD of InnoDB if InnoDB decides that it actually cannot commit after the transaction log is written, or the server crashes in between the two COMMIT events. The developers are aware of this and have a solution planned that allows the replication log to be rolled back so that it remains consistent with InnoDB.

  • I don’t think Drizzle is a step up in reliability for deployments using InnoDB with the binlog enabled. I bet that most people running MySQL in production agree with me.

    Recovery from a crash between the write to the replication log and the commit to InnoDB is not rare. Drizzle doesn’t handle this today. Official MySQL does.

    Dare I ask whether slave replication state and InnoDB are also not recovered together? While official MySQL doesn’t do this, it is available in the Facebook patch, the Google patch, Percona server and probably MariaDB via XtraDB.

  • Brian Aker says:

    Hi!

    In a case where you are using the file replicator, a crash can occur today where InnoDB is told to prepare, a sync occurs for the log, but before the final commit occurs InnoDB is killed. In this case the log is ahead of InnoDB. What we haven’t added to our startup process is a step where we check the last stored InnoDB transaction and cancel the replication event. We are aware that we need to do this before we leave beta.

    As far as your concern about testing, I would agree. We need to do more testing before we announce a GA. What we are comfortable with saying now is that the file formats are set, and an upgrade can occur without any form of dump/restore.

    In reference to your comment about reliability: by your measure, that testing needs to be widespread, you do create a bit of a catch-22 for anyone releasing software. Percona server has very few installations in comparison to MySQL, and certainly the same can be said of MariaDB. The Facebook patch is running at how many locations? Just because it fits the pattern of queries that Facebook has, you can’t assume that it is really production quality either.

    For us the beta is about establishing for users that we intend to keep our current formats to make upgrades simpler.

    Cheers,
    -Brian

  • For any of the Drizzle appliers, I surely hope/plan/will lobby/insist that slave state is stored in InnoDB along with the transactions being applied. It just makes so little sense to do otherwise.

    I also think we need to change how the replication log is stored on the master so that it’s impossible to have the two out of sync.

  • Harrison says:

    In the Drizzle Beta, the replication code guarantees that the transaction log contains whatever is committed to InnoDB. The sequence is PREPARE, COMMIT to replication transaction log, COMMIT to InnoDB. There is a rare case where the transaction log could be AHEAD of InnoDB if InnoDB decides that it actually cannot commit after the transaction log is written, or the server crashes in between the two COMMIT events. The developers are aware of this and have a solution planned that allows the replication log to be rolled back so that it remains consistent with InnoDB.

    This sounds like a 2PC solution (same as MySQL uses with the binary log). In MySQL, if a transaction is committed to the transaction log (binlog) and a crash occurs before the commit in InnoDB, on startup, it will finish the commit to InnoDB.

    Why are you choosing to roll back the COMMIT, rather than finish the COMMIT? By definition, a successful PREPARE means that InnoDB says that it can commit, so on restart it should be fine (InnoDB already has the transaction data written to it from the PREPARE).

  • Harrison says:

    In a case where you are using the file replicator, a crash can occur today where InnoDB is told to prepare, a sync occurs for the log, but before the final commit occurs InnoDB is killed. In this case the log is ahead of InnoDB. What we haven’t added to our startup process is a step where we check the last stored InnoDB transaction and cancel the replication event. We are aware that we need to do this before we leave beta.

    The problem with removing it from the replication log is that a connected slave may have already downloaded it. So rolling back on restart doesn’t work very well. The only way to prevent this is to have some sort of lock that is held on the file to prevent the slaves from downloading until it is all the way committed to InnoDB which seems less than ideal.

  • Adrian, your blog STILL claims that Drizzle is a step up in terms of data reliability. I’m sorry if you disagree, but you are wrong. Mark Callaghan is right. You should remove that claim, pure and simple.

  • Adrian Otto says:

    Baron,

    Thanks for taking time to share your concern. I value your feedback and I value Mark’s comments very much. My intent with this post was not to stir up controversy or misrepresent Drizzle. Although I doubt Drizzle data reliability differs dramatically from MySQL, it’s probably best to bring this disagreement to a conclusion and let actual testing answer this for us. I like making decisions and forming opinions based on facts. In this spirit I am planning to strike that section from the post and leave this as an invitation to try the beta. You can draw your own conclusions about the merits and pitfalls of Drizzle based on your own experience. Keep the great feedback coming.

    Thanks,

    Adrian

  • If Drizzle uses the 2PC code, what is done for transactions in state PREPARED during recovery? If unilateral rollback then InnoDB can lose a transaction that is in the replication log. If unilateral commit then InnoDB can gain a transaction that is not in the replication log. If nothing then InnoDB can eventually get stuck from resources held by the PREPARED transaction.
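
The ordering discussed in the comments above (PREPARE, COMMIT to the replication log, COMMIT to InnoDB) and the crash window between the two commits can be illustrated with a small Python simulation. This is only a sketch under stated assumptions: `Engine`, `run_transaction`, and `recover` are hypothetical names invented here, not Drizzle or InnoDB APIs, and the `crash_after` argument simply stops a transaction early to mimic a crash at that point. The recovery rule shown is the log-driven one Harrison describes for MySQL (finish the commit when the log already has the transaction), plus the assumption that a prepared transaction missing from the log is rolled back. It is not the log-rollback approach the Drizzle developers said they were planning.

```python
class Engine:
    """Hypothetical stand-in for a transactional engine (not InnoDB's API)."""

    def __init__(self):
        self.prepared = {}   # txn_id -> data, durable after PREPARE
        self.committed = {}  # txn_id -> data, visible after COMMIT

    def prepare(self, txn_id, data):
        self.prepared[txn_id] = data

    def commit(self, txn_id):
        self.committed[txn_id] = self.prepared.pop(txn_id)

    def rollback(self, txn_id):
        self.prepared.pop(txn_id, None)


def run_transaction(engine, log, txn_id, data, crash_after=None):
    """PREPARE, then COMMIT to the log, then COMMIT to the engine.

    crash_after ("prepare" or "log") stops early to mimic a crash.
    """
    engine.prepare(txn_id, data)
    if crash_after == "prepare":
        return
    log.append(txn_id)
    if crash_after == "log":
        return
    engine.commit(txn_id)


def recover(engine, log):
    """Resolve PREPARED transactions, treating the log as the source of truth."""
    logged = set(log)
    for txn_id in list(engine.prepared):
        if txn_id in logged:
            engine.commit(txn_id)    # log has it: finish the commit
        else:
            engine.rollback(txn_id)  # log never saw it: discard (assumption)


engine, log = Engine(), []
run_transaction(engine, log, 1, "a")                         # completes normally
run_transaction(engine, log, 2, "b", crash_after="log")      # log ahead of engine
run_transaction(engine, log, 3, "c", crash_after="prepare")  # never reached the log

log_ahead = set(log) - set(engine.committed)  # transactions only the log knows
recover(engine, log)
print("log ahead before recovery:", log_ahead)
print("committed after recovery:", sorted(engine.committed))
```

With this rule the engine and the log agree after recovery and no transaction is left in the PREPARED state, which avoids all three failure modes listed in the last comment. It also leaves the log untouched, so a slave that already downloaded an entry never sees it disappear.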