As Matt Asay recently mentioned in his post about Kickfire, the company just closed a $20 million Series B. In today’s credit-scarce market, where VC funding is flat or declining, $20 million is a lot of money, especially for a company whose product is still in beta. What’s more, there seems to be an investment bubble in the broader data warehousing space in which Kickfire participates (at last count there were over two dozen vendors, the majority of them relatively new entrants), and that bubble looks like it is starting to burst, as witnessed by Microsoft’s recent acquisition of DATAllegro. So, are the Kickfire investors misguided, or is there something more here …
My name is Ravi Krishnamurthy, and I am the Chief Software Architect here at Kickfire. I’ll be blogging about our thoughts on database technologies for data warehousing. More specifically, I’ll be talking about current challenges, directions going forward, the simplifications needed for wider market deployment, and other ideas.
Data warehouse (DW) queries are known to be more complex, more demanding, and longer-running than OLTP queries. Some of the distinctive features of DW queries that produce these characteristics are:
1) Table scan: Most OLTP queries are point queries that update or insert a few rows of transactional data. Most DW queries, on the other hand, are reporting or business intelligence (BI) queries, which typically touch large numbers of rows, often answered by sequential table scans over large data sets (a sketch follows this list).
2) Many/complex joins: Multiple tables with many joins in the …
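To make the contrast in (1) concrete, here is a minimal sketch; the schema (an accounts table, plus a sales fact table with date and product dimensions) is hypothetical, not drawn from any particular benchmark:

    -- A typical OLTP point query: touches one row via a primary-key index
    -- (all table and column names here are hypothetical).
    SELECT balance FROM accounts WHERE account_id = 42;

    -- A typical DW/BI query: sequentially scans a large fact table,
    -- joins it to dimension tables, and aggregates millions of rows.
    SELECT d.year, p.category, SUM(f.amount) AS revenue
    FROM sales_fact f
    JOIN date_dim d    ON f.date_id = d.date_id
    JOIN product_dim p ON f.product_id = p.product_id
    GROUP BY d.year, p.category;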
Following on from the MySQL Conference, where Sun and Kickfire jointly announced data warehousing benchmark records, we have just announced new TPC-H benchmark records. Specifically, the Kickfire Database Appliance 2400 is the best price/performance offering at 300GB, breaking the $1-per-QphH barrier for the first time and coming in at 89 cents per QphH (queries per hour on the TPC-H benchmark). The 2400 is also the highest-performance non-clustered offering at 300GB.
I’m not going to dwell further on the numbers in this post, other than to quickly point out another aspect of this achievement that Justin noted in his blog, related to the energy savings the Kickfire …
I need to generate large (1TB-3TB) synthetic MySQL datasets for testing, with a number of requirements:
a) custom output formatting (SQL, CSV, fixed-length rows, etc.)
b) referential integrity support (i.e., child tables should reference existing PK values, no orphans, etc.)
c) able to generate multiple tables in parallel
d) preferably able to operate without a GUI and/or manual intervention
e) uses a well-defined templating construct for data generation
f) preferably open source
Does anyone out there know of a product that meets at least most of these requirements?
*edit*
I found a PHP-based data generation script (www.generatedata.com) that is extensible in its output formatting, so it should do everything I need it to do.
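For requirement (b) in particular, referential integrity is easy to guarantee if child rows are generated only from parent keys that already exist. A minimal sketch in plain MySQL SQL (the customers/orders schema is hypothetical, and this says nothing about the output-format or parallelism requirements):

    -- Hypothetical parent/child schema.
    CREATE TABLE customers (
      id   INT AUTO_INCREMENT PRIMARY KEY,
      name VARCHAR(32)
    );
    CREATE TABLE orders (
      id          INT AUTO_INCREMENT PRIMARY KEY,
      customer_id INT NOT NULL,
      amount      DECIMAL(10,2),
      FOREIGN KEY (customer_id) REFERENCES customers (id)
    );

    -- Seed 100 parent rows by cross-joining two 10-row digit tables.
    INSERT INTO customers (name)
    SELECT CONCAT('cust_', a.n * 10 + b.n)
    FROM (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2
          UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
          UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
          UNION ALL SELECT 9) a
    CROSS JOIN (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2
          UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
          UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
          UNION ALL SELECT 9) b;

    -- Child rows draw only from existing parent PKs, so no orphans
    -- are possible; here, three random-amount orders per customer.
    INSERT INTO orders (customer_id, amount)
    SELECT c.id, ROUND(RAND() * 1000, 2)
    FROM customers c
    CROSS JOIN (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2) r;

Scripted per table and scaled up, the same select-from-the-parent pattern keeps foreign-key relationships intact at any volume.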
I finally finished my first data warehouse! And it only took me 3 days!
Well, to be fair, the data warehouse design was already planned and it wasn't really that big anyway, but I am still happy about it.
I was asked on Monday to build a data warehouse for my company's headquarters in Germany. I work in Beijing, so it's like... very slow to connect there. They gave me the database design, some SQL statements to generate a few dimensions, and "rough" business rules for the data.
Now, I haven't done anything like this before, but I really wanted to try. So I did it my way.
My way is to use a lot of views with long SQL statements instead of cursors or stored procedures (see the sketch below). I like it this way because I feel like I can see the data and catch problems instead of programming blindly to …
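To illustrate that approach, here is a minimal sketch; the table and column names (src_customers, dim_customer, and so on) are hypothetical stand-ins, not the actual schema from the post. The view wraps the long transformation SQL so the intermediate rows can be inspected with a plain SELECT before anything is loaded:

    -- The view holds the 'long SQL' transformation logic.
    CREATE VIEW v_customer_dim AS
    SELECT c.customer_id,
           UPPER(TRIM(c.country))         AS country_code,
           COALESCE(c.segment, 'UNKNOWN') AS segment
    FROM src_customers c;

    -- Eyeball the transformed rows and catch problems early:
    SELECT * FROM v_customer_dim LIMIT 20;

    -- Then load the (assumed, pre-existing) dimension table in one
    -- set-based statement, with no cursors or stored procedures:
    INSERT INTO dim_customer (customer_id, country_code, segment)
    SELECT customer_id, country_code, segment
    FROM v_customer_dim;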
Today, we officially launched Kickfire. As part of our announcement, we published record-breaking TPC-H benchmark numbers (the data warehousing industry's standard benchmark) together with Sun Microsystems, and announced a series of significant partnerships in the open source world.
There has been a lot of work here over the last two years to get us to this point, and I am very proud of the team. Two years ago we had only a vision; today that vision became reality, one substantiated by independent industry benchmarks.
For those of you unfamiliar with these benchmarks let me give you a brief overview to explain why we …
I spent the day Thursday with some of Kickfire’s engineers at their headquarters. In this article, I’d like to go over a little of the system’s architecture and some other details.
Everything in quotation marks in this article is a quote. (I don’t use quotes when I’m glossing over a technical point — at least, not in this article.)
Even though I saw one of Kickfire’s engineers running queries on the system, they didn’t let me actually take the keyboard and type into it myself. So everything I’m writing here is still second-hand knowledge. It’s an unreleased product that’s in very rapid development, so this is understandable.
Kickfire’s TPC-H benchmarks are now published, so you can see the results of what I’ve been seeing them work on. They …
Some of you have noticed Kickfire, a new sponsor at this year’s MySQL Conference and Expo. Like Keith Murphy, I have been involved with them for a while now. This article explains the basics of how their technology is different from the current state of the art in complex queries on large amounts of data.
Kickfire is developing a MySQL appliance that combines a pluggable storage engine (for MySQL 5.1) with a new kind of chip. On the surface, the storage engine is not that revolutionary: it is a column-store engine with data compression and some other techniques to reduce disk I/O, which is kind of par for the course in data warehousing today. The chip is the really exciting part of the technology.
The simplest description of their chip is that it …
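For context on the "pluggable" part: in MySQL 5.1 a storage engine can be loaded at runtime and selected per table, so an appliance engine slips in underneath ordinary SQL. A sketch of the mechanism only (the plugin and engine names below are hypothetical, not Kickfire's actual ones):

    -- Load a storage engine plugin at runtime (names are hypothetical).
    INSTALL PLUGIN kickfire SONAME 'ha_kickfire.so';

    -- Bind a table to that engine; queries against it remain ordinary
    -- SQL, while storage and execution happen in the engine underneath.
    CREATE TABLE sales_fact (
      date_id    INT,
      product_id INT,
      amount     DECIMAL(12,2)
    ) ENGINE = KICKFIRE;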
One of the enhancements I added to MySQL Archiver in the recent release was listed innocently in the changelog as "Destination plugins can now rewrite the INSERT statement." Not very exciting or informative, huh? Keep reading.
Progress on High Performance MySQL, Second Edition is coming along nicely. You have probably noticed the lack of epic multi-part articles on this blog lately -- that's because I'm spending most of my spare time on the book. At this point, we have significant work done on some of the hardest chapters, like Schema Optimization and Query Optimization. I've been deep in the guts of those hard optimization chapters for a while now, so I decided to venture into lighter territory: Backup and Recovery, which is one of the few chapters we planned to "revise and expand" from the first edition rather than write completely from scratch. I'd love to hear your thoughts and wishes -- click through to the full article for more details on the chapter and how it's shaping up.