Showing entries 331 to 335
Displaying posts with tag: TokuDB
Using Gearman for Nightly Build and Test

At Tokutek, Rich Prohaska used Gearman to automate our nightly build and test process for TokuDB for MySQL. Rich is busy working on TokuDB, so I’m writing up an overview of the build and test architecture on his behalf.

Build and Test Process

Rich created a script, nightly.bash, that gets kicked off every night as a cron job. The script creates a separate Gearman job for each build target. We have a separate build target (unique binary) for each combination of operating system (e.g. Linux, Windows) and hardware architecture (e.g. i686, x86_64) supported by TokuDB. As we support more operating systems over time, the number of build targets grows quickly, so we needed a build and test architecture that scales, and Gearman makes that easy.
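To make the "one job per build target" idea concrete, here is a minimal sketch in Python. It is not the actual nightly.bash: the target naming scheme, the job dictionary layout, and the function names are all assumptions made for illustration, and the step that hands jobs to Gearman is deliberately left out. It only shows why the number of targets grows as the product of operating systems and architectures.

```python
from itertools import product

# Hypothetical lists: the post names these only as examples of supported platforms.
OPERATING_SYSTEMS = ["linux", "windows"]
ARCHITECTURES = ["i686", "x86_64"]


def build_targets(operating_systems, architectures):
    """Return one target name per (operating system, architecture) combination."""
    return [f"{os_name}-{arch}" for os_name, arch in product(operating_systems, architectures)]


def make_jobs(targets):
    """Describe one build job per target.

    The dictionary layout is an assumption; a real script would submit
    each of these to a Gearman job server instead of returning them.
    """
    return [{"function": "build", "target": target} for target in targets]


jobs = make_jobs(build_targets(OPERATING_SYSTEMS, ARCHITECTURES))
for job in jobs:
    print(job)
```

Adding a third operating system to the list would add one new job per architecture, which is the growth the paragraph above describes.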

Gearman then automatically distributes the build jobs to a set of systems set up as …

[Read more]
Attempting to Quantify Fragmentation Effects

We often hear from customers and MySQL experts that fragmentation causes problems such as wasting disk space, increasing backup times, and degrading performance. Typical remedies include periodic "optimize table" or dump and re-load (for example, see Project Golden Gate). Unfortunately, these techniques impact database availability and/or require additional administrative cost and complexity. Tokutek's Fractal Tree algorithms do not cause fragmentation, and we're looking for ways to measure the effects of fragmentation to quantify TokuDB's benefits.

I ran some tests using the iiBench benchmark as an experiment to try to quantify the impact of fragmentation, and observed some interesting …

[Read more]
Cache Miss Rate as a function of Cache Size

I saw Mark Callaghan’s post, and his graph showing miss rate as a function of cache size for InnoDB running MySQL.  He plots miss rate against cache size and compares it to two simple models:


  • A linear model where the miss rate is (1-C/D)/50, and
  • An inverse-proportional model where the miss rate is D/(1000C).

He seemed happy (and maybe surprised) that the linear model is a bad match and that the inverse-proportional model is a good match. The linear model is the one that would make sense if every page were equally likely to have a hit.
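For a feel of how differently the two models behave, the short Python sketch below evaluates both formulas as quoted above, with C as the cache size and D as the data size. The data size of 100 and the sample cache sizes are arbitrary values chosen for illustration; they are not taken from Mark's graph.

```python
def linear_miss_rate(cache_size, data_size):
    """Linear model from the post: (1 - C/D) / 50."""
    return (1 - cache_size / data_size) / 50


def inverse_miss_rate(cache_size, data_size):
    """Inverse-proportional model from the post: D / (1000 * C)."""
    return data_size / (1000 * cache_size)


# Arbitrary illustrative sizes (same units for cache and data); not from the original graph.
DATA_SIZE = 100.0
CACHE_SIZES = (5.0, 10.0, 25.0, 50.0, 100.0)

print(f"{'C/D':>6} {'linear':>10} {'inverse':>10}")
for cache_size in CACHE_SIZES:
    ratio = cache_size / DATA_SIZE
    linear = linear_miss_rate(cache_size, DATA_SIZE)
    inverse = inverse_miss_rate(cache_size, DATA_SIZE)
    print(f"{ratio:>6.2f} {linear:>10.4f} {inverse:>10.4f}")
```

Under the linear model the miss rate falls by the same amount for every unit of cache added and reaches zero when C equals D. Under the inverse-proportional model, doubling the cache halves the miss rate, so the first units of cache remove most of the misses and later ones remove progressively fewer.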

I’ll argue here that it’s not so surprising.  Suppose that miss rate has a heavy-tailed distribution, such as Zipf’s law. An …

[Read more]
Extended covering indexes

As you can probably guess, I’m catching up on reading my blogs. I’ve just read with interest about TokuDB’s multiple clustering indexes. It’s kind of an obvious thought, once someone has pointed it out to you. I’ve only been around products that insist there can be Only One clustered index (and then there’s ScaleDB, who say “think differently already”).

Anyway, we already know that there are quite a few database products that use clustered indexes and, to avoid update overhead, require every non-clustered index to store the clustered key as the “pointer” for row lookups. Thus there are “hidden columns” which are present at the leaf nodes, but not the non-leaf nodes, of secondary indexes. Why not take that idea and run with it a little? Here’s what I mean:

[Read more]
The cache-oblivious algorithms inside Tokutek’s TokuDB

Tokutek have said they are working towards explaining their indexing algorithms. I spoke to some of the Tokutek people over the last 14 months or so about this, although I didn’t really start to pay attention until the beginning of the year. While Vadim, Peter and I were writing our blog post on TokuDB, I asked them to provide scholarly references, and they did, but warned me it would be dense reading, in part because it’s so academic. Mark Callaghan also told me he had gotten them to walk him through the math behind their indexing algorithm and found it hard.

Here’s a blog post with links to the research behind their work. I’m happy to say that after …

[Read more]