Displaying posts with tag: big data
Realtime Data Pipelines

There are really two major types of data analytics. In the first, we don’t know what we want to know, so we need analytics to tell us what is interesting; this is broadly called discovery. In the second, we already know what we want to know, and we just need analytics to deliver that information, often repeatedly and as quickly as possible; this goes by names ranging from reporting and dashboarding to more general data transformation.

Typically we use the same techniques to achieve both. We shove lots of data into a repository of some form (SQL, MPP SQL, NoSQL, HDFS, etc.), then run queries, jobs, or processes across that data to retrieve the information we care about.

Now, this makes sense for data discovery: if we don’t know what we want to know, having lots of data in one big pile that we can slice and dice in interesting ways is useful. But when we already know what …
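To make the contrast concrete, here is a minimal, hypothetical Python sketch (the event shape and metric names are mine, not from any particular product) of the pipeline idea: when the question is known up front, maintain the answer incrementally as events arrive, instead of piling data up and querying it afterwards.

```python
from collections import defaultdict

class RunningMetrics:
    """Keep an already-known answer up to date one event at a time,
    instead of re-querying a large repository after the fact."""

    def __init__(self):
        self.count = defaultdict(int)
        self.revenue = defaultdict(float)

    def ingest(self, event):
        # Update only the pre-agreed aggregates; no scan of history needed.
        key = event["product"]
        self.count[key] += 1
        self.revenue[key] += event["amount"]

    def dashboard(self):
        # The "query" is now a constant-time lookup per key.
        return {k: (self.count[k], self.revenue[k]) for k in self.count}

metrics = RunningMetrics()
for e in [{"product": "a", "amount": 9.5}, {"product": "b", "amount": 3.0}]:
    metrics.ingest(e)
print(metrics.dashboard())
```

The repository-and-query machinery is still the right tool for discovery; the sketch only covers the case where the question never changes.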

[Read more]
What Scales Best?

It is a constant yet interesting debate in the world of big data. What scales best? OldSQL, NoSQL, or NewSQL?

I have a longer post coming on this soon. But for now, let me make the following observations. Generally, most data technologies can be made to scale, somehow. Scaling up tends not to be too much of an issue; scaling out is where the difficulties begin. Yet most data technologies can be scaled in one form or another to meet a data challenge, even if the result isn’t pretty.
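As a toy illustration of why scaling out bites, here is a sketch of naive hash sharding in Python (node names and keys are invented): the mapping itself is trivial to write, but changing the node count remaps nearly every key, and any query touching more than one key now spans machines.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical shard servers

def shard_for(key: str, nodes=NODES) -> str:
    """Naive modulo hashing: every client must agree on this mapping,
    and growing the cluster from 3 to 4 nodes moves most of the keys."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[digest % len(nodes)]

print(shard_for("user:42"))                      # lands on one node
print(shard_for("user:42", NODES + ["node-d"]))  # often a different node
```

Consistent hashing softens the remapping problem, but cross-node joins and transactions remain the expensive part, which is where the trade-offs below come in.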

What is best? That comes down to the resulting complexity, cost, performance, and other trade-offs. Trade-offs are key, as there are almost always significant concessions to be made as you scale.

As a recent example, I was looking at the scalability aspects of MySQL, in particular MySQL Cluster. It is …

[Read more]
This Weekend in Japan

We were happy to see a lot of folks from Japan on Twitter this weekend having a discussion about MySQL and Tokutek. While we always endeavor to explain ourselves as simply as possible, hearing what users and peers have to say and ask in their native language is very helpful. Here is a sampling of several of the 30+ tweets and re-tweets (translations courtesy of a colleague I know from frequent past visits to Tokyo and Yokohama):


First, @frsyuki provided a general overview:

“TokuDB”: a new breed of MySQL storage engine. INSERTs are roughly 20 to 80 times faster, you can load several TB of data into it without partitioning, it supports MVCC, and more. It apparently implements an algorithm called Fractal Tree. http://www.tokutek.com/

[Read more]
Don’t Thrash: How to Cache your Hash on Flash

Last week I gave a talk entitled “Don’t Thrash: How to Cache your Hash.” The talk took place at the Workshop on Algorithms and Data Structures (ADS) in a medieval castle turned conference center in Bertinoro, Italy. An earlier version of this work (with the same title) appeared at the HotStorage conference in Portland, OR. Tokutek co-founders Bradley, Martin, and I are coauthors on the work, along with students and other faculty at Stony Brook University.

The talk title is colorful and doggerel-y. Here’s what the title means. “Cache your hash” refers to the Bloom-filter type of data structure. A Bloom filter acts like a negative cache, …
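As a rough illustration (the sizes and hash choices below are arbitrary, and this is not the paper’s implementation), a Bloom filter packs set membership into a small bit array: a “no” answer is always correct, while a “yes” may be a false positive that must be double-checked against the real store.

```python
import hashlib

class BloomFilter:
    """Compact negative cache: 'no' is definitive, 'yes' means
    'maybe, go check the real data structure'."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive k bit positions from independent-ish hashes of the item.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

bf = BloomFilter()
bf.add("key1")
assert bf.might_contain("key1")          # stored keys always hit
print(bf.might_contain("missing-key"))   # almost always False: the negative cache
```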

[Read more]
SQL access to CouchDB views: Easy Reporting

Following up on my previous blog about enabling SQL Access to CouchDB Views, I thought I’d share what I think the single biggest advantage is: the ability to connect run-of-the-mill, commodity BI tools to your big data system.
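To show what that looks like from the tool side, here is a hypothetical Python sketch (the DSN, schema, and table names are invented, and it assumes an ODBC bridge to LucidDB has been configured): once the connector surfaces a CouchDB view as a table, any commodity client can run plain SQL against it.

```python
import pyodbc  # assumes an ODBC bridge to LucidDB is set up

# Hypothetical DSN and schema: the point is only that a CouchDB view,
# surfaced through LucidDB, looks like any other SQL table to a BI tool.
conn = pyodbc.connect("DSN=luciddb")
cursor = conn.cursor()
cursor.execute("""
    SELECT product, SUM(amount) AS revenue
    FROM couch_views.sales_by_product     -- the CouchDB view, as a table
    GROUP BY product
    ORDER BY revenue DESC
""")
for product, revenue in cursor.fetchall():
    print(product, revenue)
```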

While the video below doesn’t show a PRPT, it does show Pentaho doing ad hoc, drag-and-drop reporting on top of CouchDB, with LucidDB in the middle providing the connectivity and FULL SQL access to CouchDB. Once again, the overview:

BI tools are commoditized; consider all the great alternatives available inexpensively (whether open source for free, open core, or even simply proprietary). Regardless of which solution you choose, these tools have fantastic, easy to …

[Read more]
0.9.4 did not hit the 1 year mark!

Our last LucidDB release was just over 12 months ago, on June 16, 2010. We were really, really trying to beat the one-year mark with our 0.9.4 release, but we just couldn’t. A tenet of good open source development is to release early and often, and we need to do better. Since the 0.9.3 release we’ve:

  • Built out an entire Web Services infrastructure
  • Developed a wicked cool Admin user interface
  • Developed cool connectors to Hive, CouchDB
  • Built a whole ton of extensions (auto indexing, DDL generation, improved load routines)
  • Scriptable …
[Read more]
HPCC vs Hadoop at a glance

Update

Since this article was written, HPCC has undergone a number of significant changes and updates that address some of the criticisms voiced in this post, such as the license (updated from AGPL to Apache 2.0) and integration with other tools. For more information, refer to the comments by Flavio Villanustre and Azana Baksh.

The original article can be read unaltered below:

Yesterday I noticed a tweet by Andrei Savu. This prompted me to read the related GigaOM article and then check out the HPCC Systems …

[Read more]
SQL access to CouchDB views

Following up on my first post on an alternative, more SQL-eee, metadata-driven approach to doing BI on Big Data, I’d like to share an example of how we can enable easy reporting on top of Big Data immediately for CouchDB users. We’re very keen to talk with CouchDB/Hive/other Big Data users about their ad hoc and BI needs; please visit the forum thread about the connector.
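For context on what the connector is wrapping, here is a small Python sketch of querying a CouchDB view natively over its HTTP API (the database, design document, and view names are made up): each view answers one pre-defined question, and the connector’s job is to make that same result reachable with ordinary SQL.

```python
import json
import urllib.request

# Hypothetical database/design-doc/view names; CouchDB exposes its
# map/reduce views over plain HTTP, one view per pre-defined question.
url = ("http://localhost:5984/sales/_design/reports"
       "/_view/by_product?group=true")

with urllib.request.urlopen(url) as resp:
    body = json.load(resp)

# Each row is {"key": <group key>, "value": <reduced value>}.
for row in body["rows"]:
    print(row["key"], row["value"])
```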

We’ve been working with some new potential customers on how to leverage their investment in Big Data (specifically BigCouch, provided by the fine folks at Cloudant). In particular, this prospect’s users are …

[Read more]
A different vision for the value of Big Data

UPDATE: Think we’re right? Think we’re wrong? Desperate to figure out a more elegant solution to self-service BI on top of CouchDB, Hive, etc.? Get in touch, and please let us know!

There’s a ton of swirl around Hadoop, Big Data, and NoSQL. In short, these systems have relaxed the relational model into schema(less/minimal) designs to do a few things:

  • Achieve massive scalability, resiliency and redundancy on commodity hardware for data processing
  • Allow for flexible evolution and disparity in content of data, at scale, over time
  • Process semi-structured data and run algorithms on it (token frequencies, social graphs, etc); see the sketch after this list
  • Provide analytics and insights into customer behaviors using an exploding amount of data now …
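As a toy rendition of the kind of semi-structured processing meant above, here is a single-process word count in Python; real systems spread this same map/reduce shape across many machines (the documents are invented).

```python
import re
from collections import Counter

# A toy, single-process rendition of the canonical map/reduce word count:
# the computation shape these systems scale out across a cluster.
docs = [
    "big data is not just big",
    "data pipelines move big data",
]

def tokenize(text):
    # The "map" step: emit tokens from semi-structured input.
    return re.findall(r"[a-z]+", text.lower())

counts = Counter()
for doc in docs:
    counts.update(tokenize(doc))  # the "reduce" step: sum counts per token

print(counts.most_common(3))
```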
[Read more]
MySQL for Big Data

An excerpt from an article on MySQL for big data, published in Dow Jones Venture Wire by Scott Denne.

There is one possible solution to the problem that doesn't include companies having to buy new software tools or even an all-new database: With the right expertise, MySQL can be engineered to handle almost any data-intensive application. The only problem is that there's a shortage of people who have the expertise to make it work.

"There's a big time gap until we, as an industry, think we have data under control," said Frank Mashraqi, chief technology officer at MyLawsuit.com and former database chief at Fotolog Inc., a photo blogging site. "The roadmap to getting that expertise is very difficult and time doesn't allow for it."
