I started this as a response to Keith Murphy’s post at http://www.paragon-cs.com/wordpress/?p=54, but it
got long, so it deserves its own post. The basic context is
figuring out how not to cause duplicate information if a large
INSERT statement fails before finishing.
Firstly, the surefire way to make sure there are no duplicates if
you have a unique (or primary) key is to use INSERT IGNORE
INTO.
Secondly, I just experimented with adding an index to an InnoDB
table that had 1 million rows, and here’s what I got (please
note, this is one experience only, the plural of “anecdote” is
*not* “data”; also I did this in this particular order, so there
may have been caching taking place): More »
So, O’Reilly’s ONLamp.com has published the “Top 10 MySQL Best
Practices” at http://www.onlamp.com/pub/a/onlamp/2002/07/11/MySQLtips.html.
Sadly, I find most “best practice” list do not thoroughly explain
the “why” enough so that people can make their own
decisions.
For instance, #3 is “Protect the MySQL installation directory
from access by other users.” I was intrigued at what they would
consider the “installation” directory. By reading the tip, they
actually mean the data directory. They say nothing of the log
directory, nor that innodb data files may be in different places
than the standard myisam data directories. More »
It's been a while since I've written about progress on the book. I actually stopped working on it as much at the beginning of the month, because on October 31(st) I managed to finish a first draft of the last big chapter! Now I'm back to full-time work at my employer, and I'm working on the book in the evenings and weekends only. Read on for details of what I've been working on and what's next in the pipeline.
aka…..”when good queries go bad!”
So, today the developers were debugging why a script was running
much longer than expected. They were doing text database inserts,
and got to the point where they realized that double the amount
of text meant the queries took double the amount of time.
You see, they were doing similar text inserts over and over,
instead of using connection pooling and/or batching them.
Apparently the other DBA explained that it was a limitation of
MySQL, but either the developers didn’t convey what they were
doing well, or the DBA didn’t think to mention batching. More »
Let’s look into selectivity, as this is an important topics when looking at index performance. (Oooh, I said “performance”, watch everyone’s ears perk up!).
This will probably answer the questions “Why isn’t MySQL using my index?” or “Why is my query so slow when I have an index on it?”
Selectivity describes how different values of a field are. It is a number from 0-1, although you can also think of it as a percentage. A value of 1, or 100%, means that each value in the field is unique. This happens with UNIQUE and PRIMARY keys, although non-unique fields may have a selectivity of 1 — for example, a timestamp value in a not-often-used table.
To calculate this, you take the total number of DISTINCT records and divide by the total number of records.
My company has a large Users table, so I grabbed some statistics off of that:
+----------+
| …
Our site went from weekly crashes during our two busiest nights to not even peeping this week (during the two busiest nights), and the only thing we changed was that we did some table maintenance. We hadn’t done table maintenance at least as long as I’ve been around, which is 6 months. We are a site with high volumes of both reads and writes. This article will talk about the care and feeding of tables; feel free to use this for justification to have a maintenance window, or even permission to run table maintenance statements.
MySQL uses a cost-based optimizer to best translate the written query into what actually happens. This means when you write:
SELECT foo FROM t1 INNER JOIN t2 USING
(commonField);
The optimizer looks at the statistics for tables t1 and t2 and
decides which is better:
1) To go through each item in t1, looking for a matching
“commonField” in t2
or
2) To go …
Many people have some kind of reporting or auditing on their database. The problem is that the data grows very large, and lots of times there is data that can be purged. Sure, theoretically one never needs to purge data, but sometimes a “delete” flag just won’t work — when you search on the delete flag, a full table scan may be the most efficient way to go.
Of course, that’s not acceptable. And in many cases, say when you have users who no longer use the site but did in the past (and perhaps have billing data associated with them), you never want to get rid of them.
So what to do? Make a special reporting database, that gathers information from the production database(s). Use MyISAM tables, because a reporting server can afford to be behind the master, and MyISAM is better for reporting — better metadata. For something like a “Users” table, make 2 more tables:
1) DeletedUsers
2) AllUsers
…
[Read more]
So, in a previous post I talked about smart code.
Today I put myself square into a discussion about ENUM and SET. ENUM is an enumerated list of
values; similar to a pull-down menu, where the only values
allowed in that field are the ones defined, with the option of
also having a null column.
The ENUM field takes the idea of normalizing the data and eliminates the need
for a join on that data. It also makes data integrity easy — if
the value you’re trying to enter is not in the ENUM column
definition, MySQL throws an error.
ENUM is not a standard SQL data type. It is MySQL specific.
As an example, in the real world I run a database for an international not-for-profit. …
[Read more]So, the other day I was asked by a developer to come up with a table to tally votes. Basically, there are 6 choices, and customers can vote once per day. I asked if there were any other constraints, and there were none. I specifically asked if they wanted once per calendar date, or ‘it has to be 24 hours since the last vote’; they wanted ‘once per calendar date’. And updating the current day’s vote is not allowed. Once the vote is in, it cannot be changed.
So I came up with a simple table:
CREATE TABLE `ManOfMonth` (
`uid` int(10) unsigned NOT NULL default ‘0′,
`voteDate` date NOT NULL default ‘0000-00-00′,
`uidVoteFor` int(10) unsigned NOT NULL default ‘0′,
PRIMARY KEY (`uid`,`voteDate`),
KEY `countVotes` (`uidVoteFor`)
)
There’s no need for a timestamp, and you can select now() into the voteDate part, and MySQL truncates the data for you (we’re using …
[Read more]