In my post on June 18th, I explained why the semantics of normal ad-hoc insertions with a primary key are expensive because they require disk seeks on large data sets. I previously explained why it would be better to use “replace into” or to use “insert ignore” over normal inserts. In this post, I explain why another alternative to normal inserts, “insert … on duplicate key update” is no better in MySQL, because the command incurs disk seeks.
The reason “insert ignore” and “replace into” can be made fast with TokuDB’s fractal trees is that the semantics of what to do in case a duplicate key is found is simple. …
[Read more]