MySQL excels as a solution for web-based applications. MySQL’s extremely fast read rates and ability to scale horizontally with replication make it a popular platform with a low cost of ownership for web-based applications. The next area where I expect MySQL to encounter significant growth is the data warehousing market. MySQL’s fast reads and horizontal scalability make it a strong
When your company decides that "it is time to build a data warehouse", what thoughts come to mind?
1) A magical fairy ice cream land where data is presented in chocolate shells for everyone to digest perfectly;
2) A big literal warehouse in the industrial section of town with rusty old containers;
3) Another place to put data, which means another place for you to track and monitor additional
Pentaho Data Integration (aka Kettle) can be used for ETL, but it can also be used in EII scenarios. For instance, say you have a report, run from a customer service application, that lets the customer service agent see the current issues/calls up to the minute (CRM database) but also gives a strategic snapshot of the customer from the customer profitability and value data mart (data warehouse). You’d like to look at this on the same report, with data coming from two different systems with different operating systems and databases.
Kettle can make short work of this using the integration Pentaho provides and the ability to SLURP data from an ETL transform into a report without the need to persist to some temporary or staging table. The thing that Pentaho has NOT made short work of is being able to use the visual report authoring tools (Report Designer and Report Design Wizard) to use a Kettle transform as a …
Kettle’s secret in-memory database is
- Not actually secret
- Not actually Kettle’s
There. I said it, and I feel much better.
In most circumstances, Kettle is used in conjunction with a
database. You are typically doing something with a database:
INSERTs, UPDATEs, DELETEs, UPSERTs, DIMENSION UPDATEs, etc. While
I do know of some people that are using Kettle without a database
(think log munching and summarization), a database is something
that a Kettle developer almost always has at their disposal.
Sometimes there isn’t a database. Sometimes you don’t want the slowdown of persistence in a database. Sometimes you just want Kettle to have an in-memory blackboard across transformations. Sometimes you want to ship an example to a customer using database operations but don’t want to fuss with database installs, dump files, etc.
Kettle ships with a Hypersonic driver, and therefore, …
Wow! It has really been 4 months since my last post?? Moving over
to development has cut into the time I had for blogging,
documenting, communicating, you name it! We are coding like
crazy.
Well, I'm back because I am heading out to ODTUG
Kaleidoscope next week, and in preparation for the show, I
decided to set up Pentaho on Oracle's Java Edition App Server,
which is OC4J, which is based on the Orion app server. I was
pleased that I managed the migration in less than a day, and I
wanted to share the steps with all those folks who are too
impatient to wait for this to get into our J2EE deployment
distribution :)
Mind you, it takes a bit of tweaking, but it is certainly very
do-able, and all server features are stable (minus the portal
stuff, I didn't get a chance to address moving the portal over).
Here is the repro of where I started, what I …
I’m visiting a Pentaho customer right now whose current “transaction” volume is 200 million rows per day. Relatively speaking, this puts their planned warehouse in the top quintile of size. They will face significant issues with load times, data storage, processing reliability, etc. Kettle is the tool they selected and it is working really well. Distributed record processing using Kettle and a FOSS database is a classic case study for Marten’s scale-out manifesto.
This organization doesn’t have an unlimited budget. Specifically, they don’t have a telecom-type budget for their telecom-like volume of data. One of the issues that has come up with their implementation has been the tradeoff between space, and keeping the …
The standard Pentaho demo download is super quick and easy: there’s no installation and it just works. You double-click start-pentaho.bat and then it’s running at http://localhost:8080.
However, sometimes you may want to share this demo with others. Roland Bouman has a nice blog entry on the specifics of how to change the demo install into a server.
I add the following line to my start-pentaho.sh to make the
hostname change transparent:
sed -i -e "s|http://[^:/]*:8080|http://$(hostname -f):8080|" jboss/server/default/deploy/pentaho.war/WEB-INF/web.xml
This allows one to move this “pentaho” to any system and it will start up properly with the …
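To see what a substitution like this does before pointing it at a real install, here is a minimal sketch that runs the same idea against a scratch file. The sample line, the NEW_HOST value, and the temp file are assumptions made up for illustration; they are not taken from the actual web.xml. The pattern uses `[^:/]*` for the host part so that it stops at the first colon, which is a slight variation on a plain `.*`.

```shell
# Hypothetical demo: work on a throwaway file, not the real web.xml.
tmpfile=$(mktemp)

# Assumed sample line; the real web.xml content may look different.
printf '%s\n' '<param-value>http://localhost:8080/pentaho/</param-value>' > "$tmpfile"

# NEW_HOST is a stand-in for the output of `hostname -f` on the target box.
NEW_HOST="demo.example.com"

# Replace whatever host precedes :8080 with NEW_HOST.
# No -i here: the result is captured instead of editing the file in place.
result=$(sed -e "s|http://[^:/]*:8080|http://${NEW_HOST}:8080|" "$tmpfile")

echo "$result"
rm -f "$tmpfile"
```

Using `|` as the sed delimiter avoids having to escape every slash in the URL, and double quotes let the shell expand the host name before sed sees the expression.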