Showing entries 1 to 2
Displaying posts with tag: datamining
SQL to Hadoop and back again, Part 1: Basic data interchange techniques

I’ve got a new article, the first in a three-part series on moving data between SQL and Hadoop, covering both the export to Hadoop and the import of processed content back into an SQL store.

This first part looks at the basic mechanics and the considerations to weigh before you start migrating data, such as the data format, content, and export techniques.

Read: SQL to Hadoop and back again, Part 1: Basic data interchange techniques


Terabytes is not big data, petabytes is

I often wonder what's behind the growing trend toward Hadoop and other NoSQL technologies. I realize that if you're Yahoo, such technology makes sense. I don't get why everyone else wants to use it.

Reading Stephen O'Grady's self-review of his predictions for 2010 gave me, for the first time, some insight into how such people think:

Democratization of Big Data

Consider that RedMonk, a four person analyst shop, has the technical wherewithal to attack datasets ranging from gigabytes to terabytes in size. Unless you’re making institutional money, budgets historically have not permitted this. The tools of Big Data have never been more accessible than they are today.

