VMware's in-memory distributed SQL database, vFabric SQLFire, can load 8 million rows in 88 seconds.
SQLFire is designed to work as an in-memory, distributed, SQL-based cache that can sit alongside a traditional database, which handles the disk-based side of the process. vFabric SQLFire provides JDBC and ADO.NET interfaces for querying the data store with SQL.
The keys and indexes are stored in memory to provide high scalability, availability and performance. SQLFire can also use an RDBMS where data persistence is required, and there are plans to support major databases including Oracle, MySQL, Sybase, DB2, SQL Server and PostgreSQL.
One concern with using a memory-based data system for large data sets is how long it takes to load the data in the first place. Pas Apicella of the vFabric Cloud Application Platform team has written an interesting blog post showing how well SQLFire handles this task: it loaded 8 million rows in 88 seconds. The team estimates that, at this rate, SQLFire could load around 40GB of data in an hour. The data was in CSV format, and the team used a multi-threaded load approach.
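As a quick sanity check on those figures, the following sketch works through the arithmetic. The only inputs taken from the article are the 8 million rows and 88 seconds; the implied row size rests on an illustrative assumption that the 40GB-per-hour estimate was extrapolated directly from this rate, using decimal gigabytes.

```python
# Back-of-envelope throughput figures derived from the reported result:
# 8 million rows loaded in 88 seconds.
ROWS_LOADED = 8_000_000
SECONDS = 88

rows_per_second = ROWS_LOADED / SECONDS   # sustained insert rate
rows_per_hour = rows_per_second * 3600    # same rate held for a full hour

# Assumption: the "40GB per hour" estimate follows from this rate, so
# dividing it by rows_per_hour gives the implied average row size.
# Decimal gigabytes (10**9 bytes) are assumed here.
ESTIMATED_BYTES_PER_HOUR = 40 * 10**9
implied_bytes_per_row = ESTIMATED_BYTES_PER_HOUR / rows_per_hour

print(f"{rows_per_second:,.0f} rows/s")          # 90,909 rows/s
print(f"{rows_per_hour:,.0f} rows/hour")         # 327,272,727 rows/hour
print(f"~{implied_bytes_per_row:.0f} bytes/row")  # ~122 bytes/row
```

At roughly 90,900 rows per second, an hour of loading comes to about 327 million rows, which would put the average row near 122 bytes if the assumption holds.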
The team used SQLFire 1.0.3 on two virtual machines, each with 18GB of memory. The virtual machines shared the same home directory, and each had separate disks for its own disk stores and log files. A separate disk store was used for persistence, and any overflow data would be removed from memory and not written to disk. The in-memory table writes its changes to the disk store asynchronously, so each insert only has to complete in memory before the next change can proceed. Only six threads were used; Apicella notes that setting the thread count too high can have an adverse effect.
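The asynchronous write described above can be illustrated with a toy write-behind table. This is a minimal Python sketch, not SQLFire's implementation: the class name `WriteBehindTable` is hypothetical, and a plain list stands in for the disk store. The insert path touches only memory and a queue, while a background thread does the persisting.

```python
import queue
import threading


class WriteBehindTable:
    """Toy in-memory table whose changes are persisted asynchronously.

    insert() updates the in-memory dict and enqueues the change; a
    background thread drains the queue into a stand-in "disk store"
    (a plain list, hypothetically replacing a real on-disk log).
    """

    def __init__(self):
        self.rows = {}                  # in-memory data keyed by primary key
        self.disk_store = []            # stand-in for the persisted log
        self._pending = queue.Queue()   # changes not yet persisted
        self._lock = threading.Lock()
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def insert(self, key, row):
        # The caller pays only for the memory update and a queue put.
        with self._lock:
            self.rows[key] = row
        self._pending.put((key, row))

    def _drain(self):
        # Background writer: persist changes in the order they were queued.
        while True:
            change = self._pending.get()
            if change is None:          # sentinel sent by close()
                self._pending.task_done()
                break
            self.disk_store.append(change)
            self._pending.task_done()

    def flush(self):
        # Block until every queued change has been "written to disk".
        self._pending.join()

    def close(self):
        self.flush()
        self._pending.put(None)
        self._writer.join()


table = WriteBehindTable()
for i in range(1000):
    table.insert(i, ("row", i))
table.close()
print(len(table.rows), len(table.disk_store))  # 1000 1000
```

Because the caller never waits on the disk write, insert latency is bounded by the memory update; the trade-off is that changes still sitting in the queue are lost if the process dies before they are persisted.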
Other than ensuring that only the client runs the load process, and preventing the second node from also attempting to read the file, the rest of the setup looks remarkably straightforward. You can see the scripts and output here.
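The overall shape of such a load, with a single client reading the CSV file and a small fixed pool of threads doing the inserts, might look like the following. This is a Python sketch under stated assumptions rather than Apicella's scripts: `load_csv` and `insert_batch` are hypothetical names, and `insert_batch` stands in for whatever batched insert the real client issues over JDBC. The default of six threads mirrors the article; the bounded semaphore keeps the reader from racing too far ahead of the workers.

```python
import csv
import io
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(reader, size):
    """Yield lists of up to `size` parsed CSV rows."""
    while True:
        batch = list(islice(reader, size))
        if not batch:
            return
        yield batch


def load_csv(csv_file, insert_batch, threads=6, batch_size=1000):
    """Read a CSV once (on the client) and insert batches on a thread pool.

    insert_batch is a hypothetical callable standing in for a database
    batch insert; it must be safe to call from several threads at once.
    Returns the number of rows submitted.
    """
    reader = csv.reader(csv_file)
    in_flight = threading.BoundedSemaphore(threads * 2)  # cap queued batches
    total = 0

    def run(batch):
        try:
            insert_batch(batch)
        finally:
            in_flight.release()

    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = []
        for batch in batches(reader, batch_size):
            in_flight.acquire()  # wait if too many batches are pending
            futures.append(pool.submit(run, batch))
            total += len(batch)
        for f in futures:
            f.result()  # re-raise any error from an insert
    return total


# Usage with an in-memory CSV and a fake, thread-safe insert.
sample = io.StringIO("1,alpha\n2,beta\n3,gamma\n4,delta\n5,epsilon\n")
loaded = []
lock = threading.Lock()


def fake_insert(batch):
    # Stand-in for a database batch insert: collect rows thread-safely.
    with lock:
        loaded.extend(batch)


count = load_csv(sample, fake_insert, threads=6, batch_size=2)
print(count, sorted(int(r[0]) for r in loaded))  # 5 [1, 2, 3, 4, 5]
```

Keeping the file read on one client matches the setup described above, where the second node must not read the file; the thread count is a parameter so it can be tuned, since more threads are not automatically faster.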