As this topic came up a few times this week for discussion at various places, I thought of composing a post on “Data Scientist vs. Data Analytics Engineer”; even though[...]
“Some people even think that “Hadoop” and “Big Data” are synonymous (though this is an over-characterization). Unfortunately, Hadoop was designed based on a paper by Google in 2004 which was focused on use cases involving unstructured data (e.g. extracting words and phrases from Webpages in order to create Google’s Web index). Since it was not [...]
Apart from my consulting as part of ScaleIn, I also invest to bootstrap companies with really disruptive ideas; and in the process met few database specific companies who are already[...]
Here is the typical “Big” data architecture, that covers most components involved in the data pipeline. More or less, we have the same architecture in production in number of places[...]
“Big Data” offers the potential for organizations to revolutionize their operations. With the volume of business data doubling every 1.2 years, analysts and business users are discovering very real benefits when integrating and analyzing data from multiple sources, enabling deeper insight into their customers, partners, and business processes.
As the world’s most popular open source database, and the most deployed database in the web and cloud, MySQL is a key component of many big data platforms, with Hadoop vendors estimating 80% of deployments are integrated with MySQL.
The new Guide to MySQL and Hadoop presents the tools enabling integration between the two data platforms, supporting the data lifecycle from acquisition and organisation to …
[Read more]Two cons against NoSQL data stores read like this: 1. It’s very hard to move data out from one NoSQL to some other system, even other NoSQL. There is a very hard lock in when it comes to NoSQL. If you ever have to move to another database, you have basically to re-implement a lot [...]
In a typical organization, all work together to bring out a common good for the outside world. It’s interesting to see how all of these entities blog about technology, and there is more and more interest shown by managerial technologists about the database. This Log Buffer Edition appeases their appetites along with the others in [...]
Introduction
"Improving MySQL performance using Hadoop" was the talk which I and Manish Kumar gave at Java One & Oracle Develop 2012, India. Based on the response and interest of the audience, we decided to summarize the talk in a blog post. The slides of this talk can be found here. They also include a screen-cast of a live Hadoop system pulling data from MySQL and working on the popular 'word count' problem.
MySQL and Hadoop have been popularly considered as 'Friends with benefits' and our talk was aimed
at showing how!
The benefits of MySQL to developers are the speed, reliability,
data integrity and …
Earlier this week we all read GigaOM's article with this title:
"Why the days are numbered for Hadoop as we know it"I know GigaOM
like to provoke scandals sometimes, we all remember some other
unforgettable piece, but there is something behind
it...
Hadoop today (after SOA not so long ago) is one of the worst case
of an abused buzzword ever known to men. It's everything,
everywhere, can cure illnesses and do "big-data" at the same
time! Wow! Actually Hadoop is a software framework that
supports data-intensive distributed applications, derived from
Google's MapReduce and Google File System (GFS) papers.
My take from the article is this: Hadoop is a foundation,
low-level platform. I used the word …
“Legacy MySQL does not scale well on a single node, which forces granular sharding and explicit application code changes to make them sharding-aware and results in low utilization of severs”– Dr. John Busch, Schooner Information Technology A super-set of MySQL suitable for Big Data? On this subject, I have interviewed Dr. John Busch, Founder, Chairman, [...]