The idea of storing data in a format that is specific to an
application should be summarily rejected. It is a throw back to
the earliest days of the computer industry when every application
stored its data uniquely.
Once general purpose data processing applications came along in
the form of database engines, and, in particular, SQL and many
applications that could work on any data through SQL statements
storing data in application unique formats was a no-no because
doing so eliminated the value of all those applications.
However the MySQL structure that separates the database engine
from the storage engine creates an opportunity to get the best of
both worlds. The storage engine is responsible only for the
physical storage and retrieval of the data, while the database
engine provides most of the functions. Data that is stored using
a custom storage engine works transparently as part of all the
function that MySQL …
The idea of storing data in a format that is specific to an
application should be summarily rejected. It is a throw back to
the earliest days of the computer industry when every application
stored its data uniquely.
Once general purpose data processing applications came along in
the form of database engines, and, in particular, SQL and many
applications that could work on any data through SQL statements
storing data in application unique formats was a no-no because
doing so eliminated the value of all those applications.
However the MySQL structure that separates the database engine
from the storage engine creates an opportunity to get the best of
both worlds. The storage engine is responsible only for the
physical storage and retrieval of the data, while the database
engine provides most of the functions. Data that is stored using
a custom storage engine works transparently as part of all the
function that MySQL …
ColdStore's performance advantage isn't magic, nor is it a
demonstration of a poor MyISAM implementation. It performs better
than MyISAM because it maintains the data on disk in a layout
that is optimal for the kinds of SELECTs that the ColdLogic
analysis applications typically require, bearing in mind how real
disks actually perform. It wraps together the effect of MyISAM's
INSERT and ALTER TABLE ... ORDER BY, but it is doing it at a
granularity that avoids any lock out of SELECTs.
(ALTER TABLE... ORDER BY locks out SELECTs for
weeks for a table with just a few hundred
million rows).
It performs better than ALTER TABLE...ORDER BY because it reads
the disk more "intelligently" for it's specification application
and is manages memory for rows itself rather than leaving it to
the operating system. (See "Second SELECT" …
ColdStore's performance advantage isn't magic, nor is it a
demonstration of a poor MyISAM implementation. It performs better
than MyISAM because it maintains the data on disk in a layout
that is optimal for the kinds of SELECTs that the ColdLogic
analysis applications typically require, bearing in mind how real
disks actually perform. It wraps together the effect of MyISAM's
INSERT and ALTER TABLE ... ORDER BY, but it is doing it at a
granularity that avoids any lock out of SELECTs.
(ALTER TABLE... ORDER BY locks out SELECTs for
weeks for a table with just a few hundred
million rows).
It performs better than ALTER TABLE...ORDER BY because it reads
the disk more "intelligently" for it's specification application
and is manages memory for rows itself rather than leaving it to
the operating system. (See "Second SELECT" …
ColdStore was produced to solve a particular problem - a
performance problem for a particular application. The purpose of
making the information available about ColdStore is to find out
if there is any interest in any broader use for the solution. If
there is interest, we will make it available in an appropriate
form.
If you think ColdStore might be of use to you, feel free to
contact us. Compare your application requirements with this. Let us know something about what you want to
d with it and let's figure out how to go forward.
ColdStore was produced to solve a particular problem - a
performance problem for a particular application. The purpose of
making the information available about ColdStore is to find out
if there is any interest in any broader use for the solution. If
there is interest, we will make it available in an appropriate
form.
If you think ColdStore might be of use to you, feel free to
contact us. Compare your application requirements with this. Let us know something about what you want to
d with it and let's figure out how to go forward.
We are continually measuring and tuning the performance of
ColdStore. The latest results show a 1100 times
improvement over the straightforward use of MyISAM and about
27 times faster than optimized MyISAM.
This is for the select
SELECT date,sum(sales) FROM product_list JOIN bigtable
USING(product) GROUP BY date
That is to say: determine the sales over time for a selection of
products. The time series was two thousand dates, and the product
list was one thousand items. Using ColdStore for 'bigtable' MySQL
inspected
486,898 rows per second. Simple MyISAM inspected 424 rows per
second. Optimized MyISAM 15,591.
We are continually measuring and tuning the performance of
ColdStore. The latest results show a 1100 times
improvement over the straightforward use of MyISAM and about
27 times faster than optimized MyISAM.
This is for the select
SELECT date,sum(sales) FROM product_list JOIN bigtable
USING(product) GROUP BY date
That is to say: determine the sales over time for a selection of
products. The time series was two thousand dates, and the product
list was one thousand items. Using ColdStore for 'bigtable' MySQL
inspected
486,898 rows per second. Simple MyISAM inspected 424 rows per
second. Optimized MyISAM 15,591.
For a test database of 182M rows(about 3BG of data file), the
optimized MyISAM index was about 1.5GB. The unoptimized index
file was about 2.5GB. The ColdStore index was a little over
half a megabyte, which is about 3000 times smaller than the
optimized MyISAM index.
You don't get this kind of size reduction by packing the data
more carefully. It is due to a completely different index design
conception.
The application for which the ColdStore Storage Engine was
developed has some special requirements, but requirements that
are by no means unique.
The application is essentially trivial. It stores data (such as
Sales or Stock levels) for a number of Products on a number of
Days, and it retrieves these data for analysis. The data arrives
in real time (eg daily). The analysis is typically by product.
The appication could be as simple as:
INSERT INTO BIGTABLE (product,date,sales) .....
SELECT product,date,sales FROM BIGTABLE where product in (X)
ORDER BY DATE
Obviously BIGTABLE is indexed on PRODUCT in order to make the
SELECT run fast.
What makes the application complicated is this:
1. There are millions of products.
2. The data arrives daily, including sales for hundreds of
thousands of products, for that date.
3.The analysis (and …