Jos, my
co-author for the "Building Pentaho Solutions" book, just
pointed me to a recent article by Jeff Prenevost entitled
"The Problem with History".
Abstract
Jeff's topic, loading a hybrid Type 1 / Type 2 slowly changing dimension table, is related to
data warehousing but may be of interest outside of
that context as well.
As it turns out, the particular problem described by Jeff is
non-trivial, but it can be solved quite elegantly in a single SQL
statement. This may be a compelling alternative to the multi-step, …
I just read this post on Matt Casters' blog. Here, Matt describes why
Element61's Jan Claes is dead wrong in the way he assesses the maturity of open source ETL tools.
Well, I've just read Jan Claes' article in the "research and insights" area of the Element61
website, and frankly, it is pretty easy to see how
unsubstantiated it is. Some may be tempted to classify the
article as …
With open source software I can install reasonably complete software and try it with my data. This way I get to see how it works in a realistic setting without having to rely on benchmarks and hoping they are a good match for my environment. And I get to do this without having to deal with commercial software sales people.
So I was glad to hear that Infobright had gone open source, as I have been wanting to test a column-based database for a while. I was even happier that it was a MySQL-based engine, as I would already know many of the commands. I decided to run some of the same tests I had run when comparing InnoDB and MyISAM for reporting (http://dbscience.blogspot.com/2008/08/innodb-suitability-for-reporting.html). InnoDB performed better than MyISAM in my reporting tests, so I'm going to compare Infobright to InnoDB.
The …
I started using Oracle, an MVCC database, to develop reporting (data warehousing, BI, take your pick) systems years ago. I've come to appreciate the scalability improvements that MVCC provides, particularly for pseudo real-time reporting applications, the ones where loads are occurring at the same time as report generation. So when people say InnoDB, partly due to MVCC, isn't as good as MyISAM for reporting, I had to look into this in more detail.
What I found is that InnoDB is a good engine for reporting. In some ways, such as performance, it is at times better than MyISAM, and one of the downsides, a larger disk requirement, can be mitigated. The trick is for the primary key to be the one predominant access path. In this example, the InnoDB clustered index is purchaseDate, with another column, such as orderId, added to make it unique. This has a number of advantages. In my experience, …
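The primary-key layout described above can be sketched roughly as follows. This is only an illustration, not the schema from the original tests: it uses Python's built-in sqlite3 module, where a WITHOUT ROWID table stores rows in primary-key order and so stands in for an InnoDB clustered index. The table name `sale`, the `amount` column, and the sample rows are assumptions made up for the example; only `purchaseDate` and `orderId` come from the text.

```python
import sqlite3

# Stand-in for InnoDB: a SQLite WITHOUT ROWID table keeps rows physically
# ordered by the primary key, much like an InnoDB clustered index.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sale (            -- hypothetical table name
        purchaseDate TEXT NOT NULL,     -- predominant access path
        orderId      INTEGER NOT NULL,  -- added only to make the key unique
        amount       REAL NOT NULL,     -- hypothetical measure column
        PRIMARY KEY (purchaseDate, orderId)
    ) WITHOUT ROWID
""")

# Made-up sample data, purely for illustration.
rows = [
    ("2008-08-01", 1, 10.0),
    ("2008-08-01", 2, 20.0),
    ("2008-08-02", 3, 5.0),
    ("2008-09-01", 4, 7.5),
]
conn.executemany("INSERT INTO sale VALUES (?, ?, ?)", rows)

# A typical reporting query: a date-range scan along the primary key.
query = "SELECT SUM(amount) FROM sale WHERE purchaseDate BETWEEN ? AND ?"
params = ("2008-08-01", "2008-08-31")
august_total = conn.execute(query, params).fetchone()[0]

# Ask the planner how it would run the query; the last column of each
# EXPLAIN QUERY PLAN row is a human-readable description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
plan_detail = " ".join(row[3] for row in plan)

print(august_total)
print(plan_detail)
```

In MySQL the same idea would presumably be expressed as `PRIMARY KEY (purchaseDate, orderId)` on an InnoDB table; the point of the sketch is just that a date-range report can be answered by walking the primary key rather than a separate secondary index.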