Back in December, I did a detailed analysis of loading data from MySQL into Vertica using Tungsten Replicator, all within the Kodiak MemCloud.
By the end I was getting some good numbers: 1.9 million rows/minute into Vertica. I did this using a standard replicator deployment, plus some tweaks to the Vertica environment. In particular:
- Partitioning on an integer hash for both the staging and base tables
- Some tweaks to the queries to ensure that they use the partitions as efficiently as possible
- Optimized batching within the applier to hit the right transaction counts
That last one is a bit of a cheat, because in a real-world situation it’s much harder to identify those transaction sizes and row counts, but for testing, we’re trying to get the best performance!
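To make the partitioning and batching ideas a little more concrete, here is a minimal Python sketch. It is an illustration only, not how Tungsten Replicator or Vertica implement either feature: the function names, the CRC32-based hash, the default of 16 partitions, and the batch sizes are all assumptions made up for the example.

```python
import zlib
from itertools import islice
from typing import Iterable, Iterator, List


def partition_key(primary_key: str, partitions: int = 16) -> int:
    """Map a key to a stable integer partition number.

    Hypothetical scheme: CRC32 of the key modulo the partition count.
    """
    return zlib.crc32(primary_key.encode("utf-8")) % partitions


def batches(rows: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group rows into lists of at most batch_size rows each."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def rows_per_second(rows_per_minute: float) -> float:
    """Convert a rows/minute throughput figure to rows/second."""
    return rows_per_minute / 60.0
```

With these helpers, `rows_per_second(1_900_000)` works out to roughly 31,667 rows/second. In a test you control the source data, so a batch size can be picked to line up with the known transaction sizes; in production those sizes vary, which is why the tuning above is described as a bit of a cheat.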
Next, I wanted to set up some bare-metal and AWS servers that were of an …