Does having small data-sets really help? Of course it does! Are memory lookups faster that disk lookups. Of course ! So many times I have seen people complain about queries taking too long now, while they were not taking that long earlier. There is one big reason for this, earlier the size of data-set was small so it could fit into memory. Now that the data-set has grown large enough that it cannot fit entirely into memory, the disk seeks really have slowed down the queries significantly. What to do now? Vertical partitioning. Divide the data-set into separate data-sets vertically....
We had a one of our slave servers frequently stop replicating with the “Innodb Lock Wait Timeout” error. The slave IO thread would continue to fetch the binlogs while the slave SQL thread kept stopping with the above mentioned error. The teams initial inclination was to change the innodb lock wait timeout variable from 50 secs to a higher value. It was a read-only slave. Our expectation was there would be no competing writes. Then we started listing what are the next steps possible and what could be wrong.
- There could be a user with “super” privilege in the system that was running updates directly on the slave
- A backup script that could be locking the tables out for backup
- Increase the “innodb lock wait timeout variable“
- Enable the innodb lock monitor
While we were working on the system, we noticed that there were few select queries that …
[Read more]
You know already that InnoDB in MySQL 5.5 has great
improvements in performance and scalability. You will have to
wait a few months for that, though, because MySQL 5.5 is not
GA yet. But if you need some extra performance in MySQL 5.1, you may want to use the Innodb Plugin instead of the built-in one. As of version 5.1.47, the Innodb plugin is of GA quality, and it comes with a good out-of-the-box improvement compared to the built-in engine. |
To test my assumptions, I used one of my test Linux servers to
perform a sysbench on 5.0.91, 5.1.47 built-in and plugin, and
5.5.4. The MySQL servers were all configured with
(No three-part series is complete without a part 4, right?)
Here is an analogy that describes well what group commit does. We
have a bus driving back and forth transporting people from A to B
(corresponding to fsync()
"transporting" commits to
durable storage on disk). The group commit optimisation is to
have the bus pick up everyone that is waiting at A before driving
to B, not drive people one by one. Makes sense, huh? :-)
It is pretty obvious that this optimisation of having more than one person in the bus can dramatically improve throughput, and it is the same for the group commit optimisation. Here is a graph from a benchmark comparing stock MariaDB 5.1 vs. MariaDB patched …
[Read more]
*edit* I want to point out that this test was done on a single
database server which used MySQL partitioning. This is a
demonstration of how Shard-Query can improve performance in
non-sharded databases too.*edit*.
Over the weekend I spent a lot of time improving my new
Shard-Query tool (code.google.com/p/shard-query) and the
improvements can equate to big performance gains on partitioned
data sets versus executing the query directly on MySQL.
I'll explain this graph below, but lower is better (response
time) and Shard-Query is the red line.
MySQL understands that queries which access data in only certain
partitions don't have to read the rest of the table. This
partition elimination works well, but MySQL left a big
optimization out of partitioning: …
I created a new tool this week:
http://code.google.com/p/shard-query
As the name Shard-Query suggests, the goal of the tool is to run
a query over multiple shards, and to return the combined results
together as a unified query. It uses Gearman to ask each server
for a set of rows and then runs the query over the combined set.
This isn't a new idea, however, Shard-Query is different than
other Gearman examples I've seen, because it supports
aggregation.
It does this by doing some basic query rewriting based on the
input query.
Take this query for example:
select c2, sum(s0.c1), max(c1) from t1 as s0 join t1 using (c1,c2) where c2 = 98818 group by c2;
The tool will split this up into two queries.
This first query will be sent to each shard. Notice that …
MySQL supports two different algorithms for views: the MERGE algorithm and the TEMPTABLE algorithm. These two algorithms differ greatly. A view which uses the MERGE algorithm can merge filter conditions into the view query itself. This has significant performance advantages over TEMPTABLE views. A view which uses the TEMPTABLE algorithm will have to compute the entire contents of the view before any filter is applied. Computing the entire contents requires a temporary table and many more rows may be accessed than otherwise would had the filter been merged into the view query.
As long as a view avoids aggregation, DISTINCT, GROUP BY, ORDER BY, etc, then it can use the MERGE algorithm. Unfortunately, this means that views of any significant complexity will almost always use the TEMPTABLE algorithm.
This blog post demonstrates a workaround that allows the convenience of complex views, including those with aggregation (that is views …
[Read more]The problem I am going to describe is likely to be around since the very beginning of MySQL, however unless you carefully analyse and profile your queries, it might easily go unnoticed. I used it as one of the examples in our talk given at phpDay.it conference last week to demonstrate some pitfalls one may hit when designing schemas and queries, but then I thought it could be a good idea to publish this on the blog as well.
To demonstrate the issue let’s use a typical example – a sales query. Our data is a tiny store directory consisting of three very simple tables:
PLAIN TEXT SQL:
- CREATE TABLE `products` (
- `prd_id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
- `prd_name` varchar(32) NOT NULL,
- PRIMARY KEY (`prd_id`),
- KEY …
InnoDB Plugin 1.1 doesn’t add any recovery specific improvements on top of what we already have in Plugin 1.0.7. The details on the latter are available in this blog. Yet, when I tried to recover another big recovery dataset I created, I got the following results for total recovery time:
- Plugin 1.0.7: 46min 21s
- Plugin 1.1: 32min 41s
Plugin 1.1 recovery is 1.5 times faster. Why would that happen? The numerous concurrency improvements in Plugin 1.1 and MySQL 5.5 can’t really affect the recovery. The honor goes to Native Asynchronous IO on Linux. Let’s try without it:
- Plugin 1.1 with –innodb-use-native-aio=0: 49min 07s
which is about the same as 1.0.7 time. My numerous other recovery runs showed that the random fluctuations account for 2-3min of a …
[Read more]
We were discussing the recommendations we issue each quarter
around MySQL and the question of using InnoDB plugin came up. We
usually follow Planet MySQL closely, so we read what the blogs
had to say and it was all good, but we decided to provide our
users some data of our own. We used our own sysbench
tests on to get the information we needed.
A Word About BenchmarksI don't trust most of the benchmarks that
are published online because they really apply to the use case of
whomever is writing the article. They are usually many factors
that can influence them and I find it difficult to apply them
as-is to our environment.
I do trust the benchmarks published online as a reference on how
to create and run our own benchmarks. So this article is based on
this premise. I recommend you to do your own homework to …