MySQL always tries to make toast of my good ideas. This is the
only time it succeeds in making toast (see bug #2).
This time I'm working on star schema optimization (which will be
my next blog post). MySQL takes about 0.24 seconds to
project out a list of parts that match a filter, but it takes
43 (not .43, FORTY-THREE) seconds to turn it into a CSV
list using GROUP_CONCAT.
select P_PartKey from part where P_MFGR =
'MFGR#1' or P_MFGR='MFGR#2';
399597 rows in set (0.24 sec)
mysql> select group_concat(P_PartKey) into @pkey
from part where P_MFGR = 'MFGR#1' or
P_MFGR='MFGR#2';
Query OK, 1 row affected (43.25 sec)
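A side note for anyone reproducing this: MySQL silently truncates GROUP_CONCAT() results at group_concat_max_len, which defaults to only 1024 bytes, so a CSV list of roughly 400,000 part keys needs the limit raised first. A minimal sketch:

SET SESSION group_concat_max_len = 1024 * 1024 * 32;  -- allow up to 32 MB of concatenated output
SELECT GROUP_CONCAT(P_PartKey) INTO @pkey
FROM part
WHERE P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2';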
So I decided to just stick the part table into FastBit instead
(using my FastBit_UDF tools):
fb_create('/var/lib/fastbit/part', …
In the very early days of Percona, Vadim wrote a very nice post about GROUP_CONCAT.
But I want to show you a bit more about it.
When is GROUP_CONCAT useful? While working with Support customers, I usually recommend it when you need to aggregate many-to-many information. It makes the result simpler and more readable, and it doesn't take much effort to make it work.
Some simple examples:
This is a test table:
CREATE TABLE `group_c` (
  `parent_id` int(11) DEFAULT NULL,
  `child_id` int(11) DEFAULT NULL
) ENGINE=InnoDB;
INSERT INTO …[Read more]
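The excerpt cuts off before the queries; as a flavor of the technique, here is a minimal sketch of the kind of query the post is describing, assuming the group_c table above has been populated with parent/child pairs:

-- One row per parent, with all of its children collapsed into a CSV list
SELECT parent_id,
       GROUP_CONCAT(child_id ORDER BY child_id) AS children
FROM group_c
GROUP BY parent_id;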
Many out there will have different ideas about this: some use
procs, some a function, others a shell script. Well, I
didn't want to spend much time on it, so I decided a
group_concat(concat(...)) would be enough.
There is no genius here, rather laziness :) But what if you have a
hundred databases and you want to drop them all?
mysql Thu Mar 3 13:50:06 2011 > pager sed 's/,/ /g'
PAGER set to 'sed 's/,/ /g''
mysql Thu Mar 3 13:50:32 2011 > select group_concat(concat('drop database ',SCHEMA_NAME,';'))
    from information_schema.schemata
    where SCHEMA_NAME !='mysql' and SCHEMA_NAME !='information_schema';
+---------------------------------------------------------+
| group_concat(concat('drop database ' SCHEMA_NAME ';')) …[Read more]
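The pager trick above replaces the commas in the GROUP_CONCAT output with spaces, so the result can be pasted straight back into the client. An alternative sketch that skips GROUP_CONCAT entirely is to emit one DROP statement per row, mirroring the same exclusions:

-- One DROP DATABASE statement per row; the output of this query,
-- run with mysql -NB, can be piped into another mysql invocation.
SELECT CONCAT('DROP DATABASE `', SCHEMA_NAME, '`;') AS stmt
FROM information_schema.schemata
WHERE SCHEMA_NAME NOT IN ('mysql', 'information_schema');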
In my opinion, one of the best things that has happened to Planet MySQL
lately is Explain Extended, a blog by Alex Bolenok (also
known as Quassnoi on Stack Overflow).
I never had the pleasure of meeting Alex in person, but his
articles are always interesting and of high quality, and the SQL
wizardry he pulls off is downright inspiring. I really feel
humbled by the creativity of some of his solutions and his
apparent experience with multiple RDBMS products.
Alex's most recent post is about aggregation, and
finding a top 3 based on the aggregate:
In …
I just read SQL: Ranking without self join, in which
Shlomi
Noach shares a nice MySQL-specific trick based on user-defined variables to compute rankings.
Shlomi's trick reminds me somewhat of the trick I came across a
little over a year ago to calculate percentiles. At that time, several
people pointed out to me too that using user-defined variables in
this way can be unreliable.

The problem with user-defined variables

So what is the problem exactly? Well, whenever a query
assigns to a variable, and that same variable is read in another
part of the query, you're on thin ice. That's because the …
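For context, the user-variable ranking construct under discussion looks roughly like this (a minimal sketch, not Shlomi's exact query, using the sakila film table that features in the posts below):

-- Rank films by length, longest first. The ORDER BY is pushed into a
-- derived table because MySQL does not guarantee when the assignment to
-- @rank is evaluated relative to the sort: exactly the thin ice above.
SELECT film_id, length,
       @rank := @rank + 1 AS rnk
FROM (SELECT film_id, length FROM film ORDER BY length DESC) sorted,
     (SELECT @rank := 0) init;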
Abstract - As described by Walter Heck, MySQL database size
can be visualized using Google
Charts. With a minor code improvement the URL for the chart
can be obtained twice as fast. With some more modification, the
number of lines can be cut down resulting in a function that is
half as long.
Hi! It's been a while since I posted - I admit I'm struggling
a bit to balance time and attention between the day job, writing a
book, preparing my talks for the MySQL users conference and of course family
life.
A month ago or so I …
After stepping away from the GROUP_CONCAT() solution
for calculating quantiles, I figured it would be
nice to find a better way to calculate the median too.
Solution
I previously wrote on how to calculate the median using
GROUP_CONCAT(), but I think that this is a better
way:
SELECT AVG(length) AS median -- take the average of left and right median
, MIN(length) AS left_median --
, MAX(length) AS right_median --
, @l AS left_median_position --
, @r AS right_median_position --
FROM (
SELECT @n, length -- @n is just here to …
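The excerpt is cut off above; for something runnable, here is a commonly seen self-contained variant of the same user-variable idea (a sketch, not the post's exact query), again against sakila's film table:

-- Number the rows in length order; after the derived table is
-- materialized, @n holds the total row count, so the middle one or two
-- rows can be picked out and averaged. This relies on derived-table
-- materialization and shares the user-variable caveats noted earlier.
SELECT AVG(length) AS median
FROM (
  SELECT length, @n := @n + 1 AS row_num
  FROM (SELECT length FROM film ORDER BY length) sorted,
       (SELECT @n := 0) init
) numbered
WHERE row_num IN (FLOOR((@n + 1) / 2), CEIL((@n + 1) / 2));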
My previous post on calculating percentiles with
MySQL generated some comments and good discussion. In particular,
I got some very interesting comments from Vladimir.
Basically, Vladimir was doubtful whether the GROUP_CONCAT()
solution would be optimal in comparison to a JOIN.
His proposal is to solve it like this:
SELECT SUM(g1.r) sr                                            -- rows with a smaller length
, g2.length l
, SUM(g1.r)/(SELECT COUNT(*) FROM film) p                      -- fraction of all rows below l
FROM (SELECT COUNT(*) r, length FROM film GROUP BY length) g1  -- frequency of each length
JOIN (SELECT COUNT(*) r, length FROM film GROUP BY length) g2  -- joined against itself,
ON g1.length < g2.length                                       -- pairing each length with all smaller ones
GROUP BY g2.length
HAVING p > 0.9                                                 -- keep lengths at or above the 90th percentile
ORDER BY p
LIMIT 1                                                        -- and return the smallest of those
First, this query sets up two identical subqueries in …
Yesterday, I was on the freenode ##pentaho IRC channel when
Andres
Chaves asked me how to calculate the Nth percentile in MySQL. He saw a
solution somewhere using subqueries, but wasn't too happy about
it.
A while ago I wrote about calculating the median in MySQL, and it turns
out the Nth percentile can be calculated using a
similar, single-pass approach, not relying on subqueries, UDFs,
or user-defined variables.
The percentile....
So, what is a percentile exactly? Here's what Wikipedia
says:
A percentile is the value of a variable below which a certain
percent of observations fall. So the 20th percentile is the value
(or score) below which 20 …
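The article is truncated here; for reference, the single-pass GROUP_CONCAT() approach it builds up to looks roughly like this (a sketch for the 90th percentile of sakila's film.length, not necessarily the article's final query):

-- Concatenate all values in sorted order, keep the first 90% of the
-- entries, then take the last entry kept. Note that GROUP_CONCAT()
-- output is capped at group_concat_max_len, so the limit may need
-- raising for large tables.
SELECT SUBSTRING_INDEX(
         SUBSTRING_INDEX(
           GROUP_CONCAT(length ORDER BY length),
           ',',
           CEIL(0.9 * COUNT(*))
         ),
         ',', -1
       ) AS percentile_90
FROM film;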