The other day I was looking for an open source, feature-rich, high-performance ETL tool to use in an enterprise environment, and I was disappointed that nothing really seemed to match my requirements. After looking in the usual places like sourceforge.net and doing a bunch of Google searches, I could not find any products that fit the bill. Have I overlooked something, or is this really a niche where there aren’t any viable projects? Here are (some of) my criteria:
- Fast. The candidate tool has to be able to move huge amounts of information between the source and target databases quickly.
- Flexible error handling. Data errors occur all the time, and when one is encountered, we should be able to stop processing, log the error to a file, or push the record into a violations table for subsequent processing. There are probably other popular strategies for handling errors, such as changing the offending data and trying to insert it …
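
To make the error-handling criterion concrete, here is a rough sketch of the three strategies mentioned above (stop, log to a file, push to a violations table). It is only an illustration, not how any particular ETL tool does it: it uses Python's built-in `sqlite3` as a stand-in for the target database, and the names `OnError`, `load_rows`, `make_demo_db`, the `target` and `violations` tables, and the `etl_errors.log` path are all made up for the example.

```python
import sqlite3
from enum import Enum


class OnError(Enum):
    """Hypothetical policy switch for rejected records."""
    STOP = "stop"              # abort the load at the first bad record
    LOG = "log"                # append the bad record to a log file and continue
    VIOLATIONS = "violations"  # push the bad record into a violations table


def load_rows(conn, rows, on_error, log_path="etl_errors.log"):
    """Insert rows into `target`, handling rejects per `on_error`.

    Returns the number of rows that were loaded successfully.
    """
    loaded = 0
    for row in rows:
        try:
            conn.execute("INSERT INTO target (id, amount) VALUES (?, ?)", row)
            loaded += 1
        except sqlite3.IntegrityError as exc:
            if on_error is OnError.STOP:
                raise  # let the caller decide whether to roll back
            if on_error is OnError.LOG:
                with open(log_path, "a", encoding="utf-8") as log:
                    log.write(f"{row!r}\t{exc}\n")
            else:
                # Keep the raw record and the reason for later reprocessing.
                conn.execute(
                    "INSERT INTO violations (raw_record, reason) VALUES (?, ?)",
                    (repr(row), str(exc)),
                )
    conn.commit()
    return loaded


def make_demo_db():
    """Create an in-memory database with the two assumed tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE target (id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
    )
    conn.execute("CREATE TABLE violations (raw_record TEXT, reason TEXT)")
    return conn
```

For example, calling `load_rows(make_demo_db(), rows, OnError.VIOLATIONS)` with a batch that contains a missing amount and a duplicate id would load the good rows and leave the two bad ones in `violations`. A real tool would presumably also want batching and per-row error context, which this sketch skips.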