Facebook, LinkedIn, Twitter and Google have cooperated on the creation of a custom version of MySQL designed specifically for use in apps that need to scale massively.
The project, called WebScaleSQL, targets large-scale web applications. The companies are releasing their changes to the database as open source and sharing them with the main MySQL project.
Writing on the project’s website, the WebScaleSQL contributors said:
“we know we’re not the only ones who are trying to solve these particular challenges. So we will keep WebScaleSQL open as we go, to encourage others who have the scale and resources to customize MySQL to join in our efforts.”
In a blog post, Steaphan Greene of Facebook said that the goal in launching WebScaleSQL is
“to enable the scale-oriented members of the MySQL community to work more closely together in order to prioritize the aspects that are most important to us.”
Changes in WebScaleSQL compared to the main branch of MySQL include better buffer pool flushing performance; optimizations to certain types of query, including prefix index queries; and support for a NUMA interleave policy.
The optimization lets prefix index queries skip the clustered index lookup when possible. Currently, InnoDB always fetches the clustered index record for all prefix columns in an index, even when the value in a particular record is shorter than the prefix length. The change optimizes that case by using the record from the secondary index and avoiding the extra lookup.
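The idea behind the prefix index change can be illustrated with a minimal Python sketch. This is not InnoDB code: the function names `prefix_entry` and `needs_clustered_lookup` are hypothetical, and strings stand in for column values. The point is only that an index entry shorter than the prefix length cannot have been truncated, so it already holds the full value.

```python
def prefix_entry(value: str, prefix_len: int) -> str:
    """Return what a prefix index of length prefix_len would store for value."""
    return value[:prefix_len]


def needs_clustered_lookup(index_entry: str, prefix_len: int) -> bool:
    """Decide whether the full row must be fetched to recover the column value.

    An entry shorter than the prefix length was stored whole, so the
    secondary index alone is enough. An entry that fills the prefix may
    have been truncated, so the full row is still needed.
    """
    return len(index_entry) >= prefix_len


# Hypothetical prefix index on the first 5 characters of a column.
short = prefix_entry("abc", 5)        # stored whole
longer = prefix_entry("abcdefgh", 5)  # truncated to "abcde"
exact = prefix_entry("abcde", 5)      # fills the prefix, so ambiguous

print(needs_clustered_lookup(short, 5))
print(needs_clustered_lookup(longer, 5))
print(needs_clustered_lookup(exact, 5))
```

A value whose length exactly equals the prefix length still triggers the lookup in this sketch, because the index entry alone cannot distinguish it from a longer, truncated value.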
NUMA (Non-Uniform Memory Access) describes hardware with multiple system buses, each serving a small group of processors that has its own memory. Each group forms a NUMA node, and each CPU can also access memory associated with the other groups. MySQL has traditionally had issues dealing with NUMA nodes; good summaries are available in a blog post by MySQL architect Mikael Ronstrom and on Jeremy Cole’s blog.
WebScaleSQL lets you set startup options either to flush and purge buffers and caches, or to run MySQL with its memory interleaved across all CPUs.
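To see why interleaving memory can matter, here is a toy Python model of page placement. It is a simplified sketch, not how the operating system or WebScaleSQL allocates memory: the function names, node capacities, and page counts are all assumptions chosen for illustration. It compares a "local first" policy, which fills the allocating thread's node before spilling over, with an interleaved policy that spreads pages round-robin across nodes.

```python
def place_local_first(num_pages: int, node_capacity: int, num_nodes: int) -> list[int]:
    """Fill node 0 first, then spill to the next node when it is full."""
    pages = [0] * num_nodes
    node = 0
    for _ in range(num_pages):
        # Move on to the next node once the current one is full.
        while node < num_nodes and pages[node] >= node_capacity:
            node += 1
        if node == num_nodes:
            raise MemoryError("not enough memory on any node")
        pages[node] += 1
    return pages


def place_interleaved(num_pages: int, num_nodes: int) -> list[int]:
    """Spread pages round-robin across all nodes (capacity ignored in this toy)."""
    pages = [0] * num_nodes
    for page in range(num_pages):
        pages[page % num_nodes] += 1
    return pages


# Hypothetical machine: 2 nodes with room for 8 pages each, and a 6-page buffer.
print(place_local_first(6, 8, 2))  # pages per node under local-first placement
print(place_interleaved(6, 2))     # pages per node under interleaving
```

In this model a large allocation made by a single thread lands almost entirely on one node, while interleaving keeps usage balanced across nodes.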
WebScaleSQL also includes new features for working at web scale, such as super_read_only, and the ability to specify sub-second client timeouts.
If you are interested in contributing to the project, WebScaleSQL is on GitHub.