Planet MySQL

Displaying posts with tag: mongodb

Jun

2014

Big Data Integration & ETL - Moving Live Clickstream Data from MongoDB to Hadoop for Analytics

Posted by Severalnines on Mon 16 Jun 2014 08:15 UTC
Tags:

Other, Data Integration, ETL, Migration, analytics, hadoop, talend, data migration, big data, mongodb, MySQL, hdfs, tokumx, clickstream

MongoDB is great at storing clickstream data, but using it to analyze millions of documents can be challenging. Hadoop provides a way of processing and analyzing data at large scale. Since it is a parallel system, workloads can be split across multiple nodes, and computations on large datasets can be completed in relatively short timeframes. MongoDB data can be moved into Hadoop using ETL tools like Talend or Pentaho Data Integration (Kettle).

In this blog post, we’ll show you how to integrate your MongoDB and Hadoop datastores using Talend. We have a MongoDB database collecting clickstream data from several websites. We’ll create a job in Talend to extract the documents from MongoDB, transform them, and then load them into HDFS. We will also show you how to schedule this job to run every 5 minutes.
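The original post builds this job in Talend. Purely as an illustration of the same extract, transform, load shape, here is a minimal Python sketch that uses in-memory stand-ins instead of real MongoDB and HDFS connections. The document fields (`ts`, `site`, `path`), the tab-delimited output, and the checkpoint-based incremental extraction are all assumptions made for this example, not details taken from the post.

```python
import csv
import io
from datetime import datetime, timezone

UTC = timezone.utc

# Hypothetical clickstream documents; a real job would read these from MongoDB.
SAMPLE_DOCS = [
    {"_id": 1, "ts": datetime(2014, 6, 16, 8, 0, tzinfo=UTC), "site": "a.example", "path": "/home"},
    {"_id": 2, "ts": datetime(2014, 6, 16, 8, 4, tzinfo=UTC), "site": "b.example", "path": "/pricing"},
    {"_id": 3, "ts": datetime(2014, 6, 16, 8, 7, tzinfo=UTC), "site": "a.example", "path": "/signup"},
]


def extract(docs, checkpoint):
    """Return the documents newer than the checkpoint, oldest first."""
    return sorted((d for d in docs if d["ts"] > checkpoint), key=lambda d: d["ts"])


def transform(doc):
    """Flatten one document into a list of strings (one output row)."""
    return [str(doc["_id"]), doc["ts"].isoformat(), doc["site"], doc["path"]]


def load(rows, sink):
    """Write rows as tab-delimited text; the sink stands in for a file on HDFS."""
    writer = csv.writer(sink, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


def run_job(docs, checkpoint, sink):
    """Run one extract-transform-load pass and return the new checkpoint."""
    batch = extract(docs, checkpoint)
    load([transform(d) for d in batch], sink)
    # Advance the checkpoint only if something was processed.
    return batch[-1]["ts"] if batch else checkpoint
```

A scheduler would call `run_job` every 5 minutes and persist the returned checkpoint, so that each run only picks up documents that arrived since the previous one.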

Test Case

We have an application …

Jun

2014

Best Practices for Partitioned Collections and Tables in TokuDB and TokuMX

Posted by Tokuview Blog on Fri 13 Jun 2014 15:47 UTC
Tags:

partitioning, TokuDB, mongodb, TokuView, MySQL, tokumx, partitioned collection

In my last post, I gave a technical explanation of the performance characteristics of partitioned collections in TokuMX 1.5 (which is right around the corner) and partitioned tables in relational databases. Given those performance characteristics, in this post, I will present some best practices when using this feature in TokuMX or TokuDB. Note that these best practices are designed for TokuMX and TokuDB only, which use …

Jun

2014

Understanding the Performance Characteristics of Partitioned Collections

Posted by Tokuview Blog on Tue 10 Jun 2014 14:21 UTC
Tags:

partitioning, TokuDB, mongodb, TokuView, Fractal Trees, MySQL, tokumx

In TokuMX 1.5, which is right around the corner, the big feature will be partitioned collections. This feature is similar to partitioned tables in Oracle, MySQL, SQL Server, and Postgres. A question many have is “why should I use partitioned tables?” In short, it’s complicated. The answer depends on your workload, your schema, and your database of choice. For example, this Oracle-related post states “Anyone with un-partitioned databases over 500 gigabytes is courting disaster.” That’s not true for TokuDB or TokuMX. Nevertheless, partitioned tables are valuable; it’s why we …

May

2014

Webinar Replay, Slides & Q&A: Introducing ClusterControl 1.2.6 - Managing your MySQL, MariaDB & MongoDB Clusters

Posted by Severalnines on Mon 19 May 2014 13:10 UTC
Tags:

MySQL Cluster, aws, mariadb, mongodb, OpenStack, MySQL, clustercontrol, Galera cluster, MariaDB Galera Cluster, tokumx

Thanks to everyone who attended and participated in last week’s joint webinar on ClusterControl 1.2.6! We had great questions from participants (thank you), most of which are transcribed below with our answers.

If you missed the sessions or would like to watch the webinar again & browse through the slides, they are now available online.

Webinar topics discussed:

Database Infrastructure Lifecycle
Deploy, Monitor, Manage, Scale
MySQL, MariaDB & MongoDB Clusters
ClusterControl Overview & Demo
ClusterControl New Features in 1.2.6 & Demo
Centralized Authentication using LDAP or Active Directory

May

2014

Thoughts on Small Datum – Part 3

Posted by Tokuview Blog on Fri 09 May 2014 13:00 UTC
Tags:

Tokutek, big data, TokuDB, NoSQL, mongodb, TokuView, newsql, MySQL, tokumx, small data

Background: If you did not read my first blog post about why I am sharing my thoughts on the benchmarks published by Mark Callaghan on Small Datum you may want to skim through it now for a little context: “Thoughts on Small Datum – Part 1”

~~~~~~~~~~~~~~~~~~~~~~~~

Last time, in “Thoughts on Small Datum – Part 2” I shared my cliff notes and a graph on Mark Callaghan’s (@markcallaghan) March 11th insertion rate benchmarks using flash storage media. In those tests he compares MySQL outfitted with the …

May

2014

Interview with John Partridge, President & CEO of Tokutek, Inc.

Posted by Roberto V. Zicari on Fri 09 May 2014 08:01 UTC
Tags:

Uncategorized, benchmark, Tokutek, shard, mongodb, B-Tree, John Partridge, MongoDB/10gen, TokuMX v1.4

“As the database gets used, shards can grow at an uneven rate and one shard might carry a majority of the load. MongoDB corrects this by balancing shards, but because of MongoDB’s lack of concurrency this operation can stall the database unacceptably.”–John Partridge.

I have interviewed John Partridge, President & CEO of Tokutek, Inc.

RVZ

Q1. Tokutek recently announced that it has eliminated the performance issues of MongoDB sharding. What was the problem?

John Partridge: The problem occurs after a shard is created. As the database gets used, shards can grow at an uneven rate and one shard might carry a majority of the load. MongoDB corrects this by balancing shards, but because of MongoDB’s lack of concurrency this operation can stall the database unacceptably (see the …

May

2014

New Release Webinar on May 13th: Introducing ClusterControl 1.2.6 - Live Demo

Posted by Severalnines on Wed 07 May 2014 12:07 UTC
Tags:

Other, MySQL Cluster, aws, mariadb, mongodb, OpenStack, MySQL, clustercontrol, Galera cluster, MariaDB Galera Cluster, tokumx

Following the release of ClusterControl 1.2.6 a couple of weeks ago, we are now looking forward to demonstrating this latest version of the product on Tuesday next week, May 13th.

This release contains key new features (along with performance improvements and bug fixes), which we will be demonstrating live during the webinar.

Highlights include:

Centralized Authentication using LDAP or Active Directory
Role-Based Access Control
OpenStack: Galera Deployment Automation
Hybrid setups with Galera and Asynchronous MySQL Replication
Manage single instance …

May

2014

Eventual consistency of NoSQL marketing

Posted by Doron Levari on Sun 04 May 2014 01:18 UTC
Tags:

postgresql, Oracle, NoSQL, mongodb, MySQL

Yesterday I learnt an important lesson about an important difference between NoSQL and MySQL, at least when it comes to the marketing and hype.

I saw a tweet from the marketing team of one of the NoSQL leaders:

Most people would apparently draw their conclusions from the tweet's text alone. I, however, actually clicked the link and couldn't believe my eyes:

I guess that in NoSQL, when it comes to the integrity of data as well as hype - it is eventually consistent...

May

2014

Maybe You Should Try Taking a Walk in My Shoes

Posted by Tokuview Blog on Thu 01 May 2014 16:24 UTC
Tags:

Tokutek, big data, TokuDB, NoSQL, mongodb, TokuView, newsql, MySQL, tokumx, Financial Times, M. I. T. Media Labs, MIT Media Labs, small data

The title of this post should really be, “Maybe He Should Try Taking a Walk in Your Shoes.”

The “he” I’m referring to is economist and author Tim Harford. The “you” is the people who use NewSQL and NoSQL approaches to mine big data with database platforms like MySQL and MongoDB (or, preferably, our high-performance distributions of them, TokuDB and TokuMX).

Why should Mr. Harford take that walk? Well, he recently penned an article on big data in …

Apr

2014

Thoughts on Small Datum – Part 2

Posted by Tokuview Blog on Tue 29 Apr 2014 12:13 UTC
Tags:

innodb, Benchmarking, Tokutek, big data, TokuDB, NoSQL, mongodb, iibench, TokuView, Fractal Trees, fractal tree indexes, newsql, MySQL, tokumx

If you did not read my first blog post about Mark Callaghan’s (@markcallaghan) benchmarks as documented in his blog, Small Datum, you may want to skim through it now for a little context.

——————-

On March 11th, Mark, a database guru formerly at Google and now at Facebook, published an insertion rate benchmark comparing MySQL outfitted with the InnoDB storage engine with two NoSQL alternatives: basic MongoDB and …
