Showing entries 281 to 290 of 335
« 10 Newer Entries | 10 Older Entries »
Displaying posts with tag: TokuDB
Don’t Thrash: How to Cache your Hash on Flash

Last week I gave a talk entitled “Don’t Thrash: How to Cache your Hash.” The talk took place at the Workshop on Algorithms and Data Structures (ADS) in a medieval castle turned conference center in Bertinoro, Italy. An earlier version of this work (with the same title) appeared at the HotStorage conference in Portland, OR. Tokutek co-founders Bradley, Martin, and I are coauthors on the work, along with students and other faculty at Stony Brook University.

The talk title is colorful and doggerel-y. Here’s what the title means. “Cache your hash” refers to Bloom-filter-type data structures. A Bloom filter acts like a negative cache, …
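
To make the "negative cache" idea concrete, here is a minimal sketch of a generic textbook Bloom filter in Python. It is not the data structure from the talk. The class name, the parameters (num_bits, num_hashes), the salted SHA-256 hashing scheme, and the lookup helper are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: k hash probes into an m-slot bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive k positions by salting SHA-256 with the probe number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def lookup(key, bloom, slow_store):
    """Skip the slow store whenever the filter rules the key out."""
    if not bloom.might_contain(key):
        return None  # negative answer served from memory
    return slow_store.get(key)  # may still miss on a false positive


# A plain dict stands in for the slow store (hypothetical data).
store = {"alice": 1, "carol": 3}
bloom = BloomFilter()
for name in store:
    bloom.add(name)

print(lookup("alice", bloom, store))
print(lookup("bob", bloom, store))
```

The filter never reports an inserted key as absent, so a "no" can be trusted without touching the slow store. A "yes" only means the store is worth checking.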

[Read more]
Query Planner Gotchas

Indexes can reduce the amount of data your query touches by orders of magnitude. This results in a proportional query speedup. So what happens when you define a nice set of indexes and you don’t get the performance pop you were expecting? Consider the following example:

mysql> show create table t;
| t     | CREATE TABLE `t` (
  `a` varchar(255) DEFAULT NULL,
  `b` bigint(20) NOT NULL DEFAULT '0',
  `c` bigint(20) NOT NULL DEFAULT '0',
  `d` bigint(20) DEFAULT NULL,
  `e` char(255) DEFAULT NULL,
  PRIMARY KEY (`b`,`c`),
  KEY `a` (`a`,`b`,`d`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Now we’d like to perform the following query:

select sql_no_cache count(d) from t where a = 'this is a test' and b between 8000000 and 8100000;

Great! We have index a, which covers this query. Using a should be really fast. You’d expect to use the index to jump to the beginning of the ‘this is a test’ values for …
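
One way to check what a planner actually does with an index like this is to ask it for its plan. The sketch below does that with Python's built-in sqlite3 module rather than MySQL, purely so that it runs without a server. That swap is an assumption of this example: SQLite's planner and its EXPLAIN QUERY PLAN output differ from MySQL's EXPLAIN, so it shows the technique and says nothing about InnoDB's behavior. The table mirrors the one above, minus the engine and charset clauses.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption, see lead-in).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE t (
        a varchar(255) DEFAULT NULL,
        b bigint NOT NULL DEFAULT 0,
        c bigint NOT NULL DEFAULT 0,
        d bigint DEFAULT NULL,
        e char(255) DEFAULT NULL,
        PRIMARY KEY (b, c)
    )
    """
)
# Same column order as KEY `a` (`a`,`b`,`d`) in the original table.
conn.execute("CREATE INDEX a ON t (a, b, d)")

query = (
    "SELECT count(d) FROM t "
    "WHERE a = 'this is a test' AND b BETWEEN 8000000 AND 8100000"
)

# The last column of each EXPLAIN QUERY PLAN row is the readable detail.
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_text = "\n".join(row[-1] for row in plan_rows)
print(plan_text)
```

If the plan names the index and marks it as covering, the query can be answered from the index alone. In MySQL the analogous check is to run EXPLAIN on the query and read the key and Extra columns.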

[Read more]
Understanding Indexing – SF MySQL Meetup

At this week’s SF MySQL Meetup, I will give a talk: “Understanding Indexing: Three rules on making indexes around queries to provide good performance.” The meetup is 7 pm tomorrow (Wednesday, 6/22), and will be held at CBS Interactive (235 2nd St., San Francisco). Thanks to hosts Erin O’Neill and Mike Tougeron for the invitation and location.

Application performance often depends on how fast a query can respond, and query performance almost always depends on good indexing. So one of the quickest and least expensive ways to increase application performance is to optimize the indexes. This talk presents three simple and effective rules on how to construct indexes around queries that result in good performance.

This is a general discussion applicable to all databases using indexes and is not specific to any particular MySQL storage engine (e.g., InnoDB, TokuDB, …

[Read more]
Speeding Up TPCC Table Loads by 8x with TokuDB v5.0

Percona’s TPCC for MySQL toolset allows one to measure the query performance for an OLTP workload on various MySQL storage engines.  The toolset includes a program to load the database tables, and a program to run queries and measure performance.  We have found Percona’s TPCC toolset to be extremely useful for tuning our software.  However, we want to take advantage of TokuDB’s bulk load capability when loading the database.

We created a new tool, a simple variant of the existing code, that generates CSV files for the TPCC database.  These CSV files can be bulk loaded into TokuDB with a “LOAD DATA INFILE” statement. TokuDB’s bulk loader uses a parallel merge sort algorithm that is implemented in CILK, an extension to the C language that …

[Read more]
Percona Live, NYC

Yesterday, Percona held Percona Live NYC, which they describe as an “intensive one-day MySQL summit.” They meant it. It was like drinking from a firehose. There was too much for me to give a complete report, so I’d like to highlight two sessions that stuck out for me.

Why SQL Wins

Sergei Tsarev (Clustrix) gave a great overview of the last 50 years of database development. He talked about the early days, in which what we now think of as database functionality had to be implemented in each application. Programmer productivity was therefore low.

As modern SQL databases emerged, productivity shot up since databases bundled up common functionality with an easy-to-code interface. This now seems like a golden age of databases, in which transactional semantics were hashed out.

Fast forward to today. Database performance has failed to keep up with …

[Read more]
Covering Indexes: How many indexes do you need?

I’ve recently been blogging about how partitioning is a poor man’s answer to covering indexes. I got the following comment from Jaimie Sirovich:

“There are many environments where you could end up creating N! indices to cover queries for queries against lots of dimensions.”

[Just a note: this is only one of several points he made. I just wanted to dig into this one in some detail. Here goes...]

Although it is, in theory, possible to generate a workload that would take N! indexes, this is not a realistic (or useful) bound (leaving aside that this workload would kill partitioning!). For one thing, it would take N! queries to exercise all those indexes. And the queries would have to include every field in the where clause — as we’ll get into below.
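
To get a feel for how quickly N! grows, here is a small Python sketch. It reads the N! figure as the number of distinct column orderings of an index over N fields, which is an interpretation assumed for this example. The column names and the helper function are hypothetical.

```python
from itertools import permutations
from math import factorial


def column_orderings(columns):
    """Return every possible column order for an index over these columns."""
    return list(permutations(columns))


# With three fields there are 3! = 6 candidate orderings.
orderings = column_orderings(["a", "b", "c"])
for cols in orderings:
    print("KEY (" + ", ".join(cols) + ")")

# The count explodes as fields are added.
counts = {n: factorial(n) for n in (3, 5, 10)}
for n, count in counts.items():
    print(f"N = {n}: {count} orderings")
```

Even at ten fields the count is in the millions, which is why a workload would need an equally enormous number of distinct queries to exercise every ordering.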

So what is a reasonable bound on the number of covering indexes that …

[Read more]
Elephants on a Trapeze: Keeping Big Data Agile

On April 1st, the Department of Computer Science at Rutgers University, where I am a professor, held an open house. I gave a talk called “Elephants on a Trapeze: Keeping Big Data Agile”.

The talk is an introduction to performance issues related to big data without getting too technical. You’ll have to decide if I succeeded with the “not too technical” part. My take is on how to keep big data indexed — not surprising since the work in this talk is the basis for TokuDB®, Tokutek’s MySQL storage engine for keeping large data indexed. A video of my talk can be found here.

[Read more]
Effective MySQL, a New York City Meetup

Kudos to Ronald Bradford for creating a new MySQL meetup group in New York City and giving MySQL-related talks. The next one is tonight, titled “MySQL Idiosyncrasies That Bite”. Information on it can be found at http://ny.effectivemysql.com/events/16884850/.

We’ll have a contingent from our New York office there this evening. We went to the last one on indexing (a favorite topic of ours) in March and it was excellent.

We look forward to seeing folks there as well as at upcoming NY events, including Percona Live (May 26th) and future Effective MySQL meetups.

OldSQL Tricks or NewSQL Treats

Why do B-trees need “Tricks” to work?

Marko Mäkelä recently posted a couple of “tips and tricks” you can use to improve InnoDB performance. Tips and tricks. A general purpose relational database like MySQL shouldn’t need “tips and tricks” to perform well, and I lay the blame on design choices that were made in the early ’70s: the B-tree data structure underlying all OldSQL databases. B-trees were designed for machines that had very different performance characteristics than the machines of today. Hardware has changed, but B-trees are the same. Tips and Tricks are an attempt to make up the difference.

So B-tree implementers — InnoDB, Oracle, MS SQL Server — are fighting an uphill battle; they’re fighting the future. B-trees just aren’t meant to cope with high-bandwidth, slow-seek-time storage systems, because …

[Read more]