TokuDB has a big advantage over B-trees when trickle loading data into existing tables. However, it is possible to preprocess the data when bulk loading into empty tables or when new indexes are created. TokuDB release 4 now uses a parallel algorithm to speed up these types of bulk insertions. How does the parallel loader performance compare with the serial loader? We use the Air Traffic Control (ATC) data and queries described in a Percona blog and also used in an experiment with TokuDB 2.1.0 to gain some insight.
Our ATC data is about 122M rows in size, is stored in a 40GiB CSV file, and can be found in our Amazon S3 public bucket. See the end of this blog for details. We …
[Read more]