I'm working on alternative strategies to make the use and maintenance of a multi-terabyte data warehouse implementation tolerably fast. For example, it's clear that a reporting query against a 275-million row table is not going to be fun by anyone's definition, but for most purposes that table can be pre-processed into various aggregate tables of significantly smaller size.
However, what is not obvious is what would be the best strategy for creating those tables. I'm working with MySQL 5.0 and Business Objects' Data Integrator XI, so I have a couple of options.
I can just CREATE TABLE ... SELECT ... to see how things work out. This approach is simple to try, but essentially unmaintainable; no good.
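For the curious, here's a minimal sketch of what such a statement might look like; the table and column names (fact_sales, agg_sales_daily and so on) are hypothetical, just to illustrate the shape of the aggregation:

    -- Roll the (hypothetical) 275-million row fact table up to one row
    -- per day and product; the result is a much smaller table that
    -- reporting queries can hit directly.
    CREATE TABLE agg_sales_daily
      ENGINE = MyISAM
    AS
    SELECT
        sale_date,
        product_id,
        SUM(quantity)   AS total_quantity,
        SUM(net_amount) AS total_net_amount,
        COUNT(*)        AS fact_row_count
    FROM fact_sales
    GROUP BY sale_date, product_id;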
I can define the process as a BODI data flow. This is good in many respects, as it creates a documented flow of how the aggregates are updated, is fairly easy to hook up to the workflows which pull in new data from source systems, and …