Today is my last day at Tokutek. On Monday I'm starting a new opportunity
as VP/Technology at CrunchTime!. If you are a web developer, database
developer, or quality assurance engineer in the Boston area and
looking for a new opportunity, please contact me or visit the
CrunchTime! career page.
I've really enjoyed my time at VoltDB and Tokutek. Working for
Mike Stonebraker (at VoltDB) was on my career
"bucket list" and in these past 3.5 years at Tokutek I've
experienced the awesomeness of the MySQL ecosystem and the
surging NoSQL database …
Welcome to blog #2 in a series about the benefits of the Fractal Tree. In this post, I’ll be explaining Big Data, why it poses such a problem and how Tokutek can help. Given the fact that I am a lifelong fan of both Hip-hop and Big Data, the title was a no-brainer and, given the artist, a bit of a pun.
I am as tired as you are of hearing the term “Big Data.” It’s so overused that it no longer has any specific meaning. You see, data hardly ever starts as “big” or a “problem.” Rather, it starts small and easily manageable, but gradually grows to some unimaginable size and becomes a beast in need of slaying, like the irradiated ant from a sci-fi film, growing to the size of a cruise ship. The nature of tackling such a tough problem means that the initial understanding of the factors involved is, oftentimes, incomplete at best; Catch-22 exemplified. During the course of problem it is …
In my recent travels, I’ve been speaking with database users at various meetups and trade shows worldwide. Very often, I got questions centering on the best use cases for our products, be it TokuDB, our MySQL storage engine, or TokuMX, our distribution of MongoDB. Over 90% of the time, I answered Cloud, Big Data, or both. You see, in the software industry we’re like kindergartners: we like things to fit into neat categories. If you know any software sales people, you’ll recognize this as a fitting analogy (at least in terms of energy and attention span), but I digress. This strategy helps allocate resources where they are most likely to make an impact, and, thus, optimize our return on investment. In this blog series, I’m going to go slightly against the grain and explain why the Fractal Tree makes databases work in these environments.
Before I get into more detail, let me …
"Should vegetarians open steakhouse restaurants?"
Though someone will probably give me several examples of why they
should, I'll argue that they absolutely should not. How can
someone who doesn't eat steak convince others to eat at their
"steak-only" restaurant?
But this is something a "professional technology benchmarker"
(PTB) struggles with on a regular basis. Hello, I'm Tim
Callaghan, and I'm a PTB.
professional technology benchmarker, or PTB (noun): One
who compares two technologies as part of their job. One of these
technologies is usually the product of the PTB's employer, the
other is almost always not. In a past experience I was tasked
with comparing the performance of a fully in-memory database with
Oracle and MySQL on a "TPC-C like" workload. At the time I was an
Oracle expert and working for the in-memory database company, but
had never started a single MySQL server in my life. At …
I'm starting off 2015 with the following New Year's resolution:
to improve the state of benchmarking. About a month ago I
noticed the following tweet:
Hey @tokutek, please look at this: http://stssoft.com/products/stsdb-4-0/benchmark ….
Are the benchmarks rigged or correctly done? I'm curious to know!
While I've never met Ian Campbell (@iamic) he
certainly knew how to call me to action. I immediately checked
out the STSsoft
website, the benchmark results page, and the benchmark code itself. My first reaction …
The MySQL 5.6 release introduced some changes to how two-phase commit works and is managed. In particular, the commit phase of transactions to the binary log is now serialized, and we identified this behavior almost immediately. We implement a group commit algorithm, and it needed to be altered so that TokuDB’s group commit to its recovery log would function effectively.
As part of our effort to verify the new Binary Log Group Commit functionality introduced in TokuDB 7.5.4 for Percona Server, we wanted to demonstrate the substantial increase in throughput scaling but also show the bottleneck caused by the skewed interaction between the binary log group commit algorithm in MySQL 5.6 and the transaction commit mechanism used in TokuDB 7.5.3 for Percona Server. During our testing, we noticed that the throughput scaling was diminished when we turned on the binlog.
Here are the relevant system …
TokuDB offers high throughput for write-intensive applications, and the throughput scales with the number of concurrent clients. However, when the binary log is turned on, TokuDB 7.5.2 throughput suffers. The throughput scaling problem is caused by a poor interaction between the binary log group commit algorithm in MySQL 5.6 and the way TokuDB commits transactions. TokuDB 7.5.4 for Percona Server 5.6 fixes this problem, and the result is roughly an order of magnitude increase in SysBench throughput for in-memory workloads.
MySQL uses a two-phase commit protocol to synchronize the MySQL binary log with the recovery logs of the storage engines when a transaction commits. Since fsyncs are used to ensure the durability of the data in the various logs, and fsyncs can be very slow, the fsync can easily become a bottleneck. A …
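To make the fsync bottleneck concrete, here is a toy Python model. It is not TokuDB or MySQL code, just a sketch under stated assumptions: fsync latency is fixed, nothing else limits throughput, and with group commit every waiting client shares a single fsync. The function name and the 10 ms latency are made up for illustration.

```python
def commits_per_second(clients: int, fsync_ms: float, group_commit: bool) -> float:
    """Toy throughput model for durable commits (illustrative assumption only).

    Without group commit, commits are serialized and each one pays for its
    own fsync. With group commit, all clients waiting at the same time are
    made durable by one shared fsync.
    """
    fsyncs_per_second = 1000.0 / fsync_ms
    commits_per_fsync = clients if group_commit else 1
    return fsyncs_per_second * commits_per_fsync


if __name__ == "__main__":
    # Assumed 10 ms fsync latency; real devices vary widely.
    for clients in (1, 8, 64):
        serialized = commits_per_second(clients, 10.0, group_commit=False)
        grouped = commits_per_second(clients, 10.0, group_commit=True)
        print(f"{clients:>3} clients: serialized={serialized:.0f}/s grouped={grouped:.0f}/s")
```

In this model the serialized case stays flat no matter how many clients are added, while the grouped case scales with client count. Real systems fall somewhere in between, since batches are not always full and other costs apply.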
In a few weeks I’m presenting “Performance Benchmarking: Tips, Tricks, and Lessons Learned” at Percona Live London 2014 (November 3-4). I continue to learn lessons and improve my benchmarking capabilities, so the content is a full upgrade from my presentation at Percona Live Santa Clara in April 2013. Anyone interested in achieving and sustaining the best performance out of their software/hardware/application should attend.
Also, Tokutek is sponsoring so we’ll be available in the expo hall throughout the show.
If you are attending or in the area and want to learn more about …
MySQL has information_schema.tables, which contains information such as “data_length” or “avg_row_length.” The documentation on this table, however, is quite poor: it assumes those fields are self-explanatory, and they are not when it comes to tables that employ compression. And this is where inconsistency is born. Let’s take a look at the same table containing some highly compressible data using different storage engines that support MySQL compression:
TokuDB:
mysql> select * from information_schema.tables where table_schema='test' \G
*************************** 1. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: test
TABLE_NAME: comp
TABLE_TYPE: BASE TABLE
ENGINE: TokuDB
VERSION: 10
ROW_FORMAT: tokudb_zlib
TABLE_ROWS: 40960
AVG_ROW_LENGTH: 10003
DATA_LENGTH: 409722880
MAX_DATA_LENGTH: …
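A quick arithmetic check on the TokuDB values shown above, sketched in Python. The three numbers are copied from the output; the helper name is hypothetical. That the product matches suggests DATA_LENGTH here is simply rows times average row length, which would make it a logical size rather than an on-disk one. That reading is an inference from these numbers, not something the output itself states.

```python
# Values copied from the TokuDB row of information_schema.tables above.
TABLE_ROWS = 40960
AVG_ROW_LENGTH = 10003
DATA_LENGTH = 409722880


def implied_data_length(table_rows: int, avg_row_length: int) -> int:
    """Size implied by multiplying row count by average row length."""
    return table_rows * avg_row_length


if __name__ == "__main__":
    implied = implied_data_length(TABLE_ROWS, AVG_ROW_LENGTH)
    print("implied:", implied)
    print("reported:", DATA_LENGTH)
    print("match:", implied == DATA_LENGTH)
```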
The biggest innovation in TokuDB v7.5 is Read Free Replication (RFR). A few days ago I posted a benchmark showing how much additional throughput can be achieved on a replication slave, while at the same time lowering the read IO operations to almost zero. The official documentation on the feature is available here.
In this second blog I want to cover the requirements for RFR, as well as some interesting use-cases for the technology.
RFR Requirements
The only requirement on the master is that …