Computer science is like an enormous toolbox you can rummage
through whenever you have a problem to solve. Most of the tools
are sturdy and practical, like algorithms for B-trees. Some are
also elegant, like consistent hashing in Dynamo. Finally there
are some tools that you never quite figure out even after years
of reflection. That piece of steel you are looking at could be
Excalibur. Or it could be a rusty knife.
The CAP theorem falls into the last category, at least
for me. It was a major topic in the blogosphere a few years
ago and Google Trends shows steadily increasing interest in the term since
2010. It's not my goal to explain CAP fully; a good
informal description is …
In my previous post I pointed out that the existing ARCHIVE
storage engine in MySQL may not satisfy your needs when it comes
to storing large and/or old data efficiently. But are there any
good alternatives? As the primary purpose of this engine is to
store rarely accessed data in a disk-space-efficient way, I will
focus here on its data compression abilities rather than on
performance.
The InnoDB engine provides a compressed row format, but is its efficiency anywhere close to that of the ARCHIVE engine? You can also compress MyISAM tables with the myisampack tool, but that means the table becomes read-only after the operation.
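For a concrete sense of what those two options look like, here is a minimal sketch (table and column names are made up for illustration; on MySQL versions of that era, InnoDB compression also requires innodb_file_per_table=ON and innodb_file_format=Barracuda):

    -- InnoDB compressed row format: pick a KEY_BLOCK_SIZE; smaller values
    -- (4, 2, 1) trade more CPU for potentially better compression.
    CREATE TABLE archive_candidate (
      id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      logged_at DATETIME NOT NULL,
      payload TEXT
    ) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

    -- MyISAM compression is an offline step done with the myisampack tool
    -- (shell commands, not SQL), after which the table is read-only:
    --   myisampack /path/to/datadir/db/archive_candidate.MYI
    --   myisamchk -rq /path/to/datadir/db/archive_candidate.MYI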
Moreover, I trust neither MyISAM nor ARCHIVE when it comes to data durability. Fortunately, along came a quite new (open source since April …
SAP HANA, having entered the data 2.0/3.0 space at the right time, has been getting traction lately, and there will be a lot of users like me who want to[...]
As this topic came up for discussion a few times this week in various places, I thought of composing a post on “Data Scientist vs. Data Analytics Engineer”, even though[...]
Apart from my consulting as part of ScaleIn, I also invest in bootstrapping companies with really disruptive ideas, and in the process I have met a few database-specific companies who are already[...]
Here is the typical “Big” data architecture, which covers most of the components involved in the data pipeline. More or less, we have the same architecture in production in a number of places[...]
As described in the first article of this series, Tungsten
Replicator can replicate data from MySQL to Vertica in real time.
We use a new batch loading feature that applies
transactions to data warehouses in very large blocks using COPY
or LOAD DATA INFILE commands. This second and concluding
article walks through the details of setting up and testing MySQL
to Vertica replication.
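As a rough illustration of what those bulk statements look like (the file paths, table names, and CSV options below are hypothetical; the real ones are generated by the replicator's batch applier):

    -- On a MySQL-based warehouse, a large block of transactions could be
    -- applied from a staging CSV file in a single statement:
    LOAD DATA INFILE '/tmp/staging/batch-000123.csv'
    INTO TABLE stage_orders
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n';

    -- On Vertica, the equivalent bulk load uses COPY; DIRECT writes straight
    -- to disk storage (ROS) rather than the in-memory WOS:
    COPY stage_orders FROM '/tmp/staging/batch-000123.csv'
    DELIMITER ',' NULL '' DIRECT;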
To keep the article reasonably short, I assume that readers are
conversant with MySQL, Tungsten, and Vertica. Basic
replication setup is not hard if you follow all the steps
described here, but of course there are variations in every
setup. For more information on Tungsten, check out the
Tungsten Replicator project at the code.google.com
site as well as …
Real-time analytics allow companies to react rapidly to changing
business conditions. Online ad services process
click-through data to maximize ad impressions. Retailers
analyze sales patterns to identify micro-trends and move
inventory to meet them. The common theme is speed: moving
lots of information without delay from operational systems to
fast data warehouses that can feed reports back to users as
quickly as possible.
Real-time data publishing is a classic example of a big
data replication problem. In this two-part article
I will describe recent work on Tungsten Replicator to move data out of MySQL
into Vertica at high speed with minimal load on
DBMS servers. This feature …
If you're in the Los Angeles area on Feb 15, come hear my talk at
LAMySQL, inspired by lessons learned from real-life experiences.
In addition to hearing a unique and interesting talk, you can
win an Apple TV thanks to the awesome folks at @NoodleYard.
Real-Life Use Cases From Data Administration
Hell
Data is the most valuable asset of an organization because it's
irreplaceable.
Yet we hear every day about f**k ups related to data
administration from startups and organizations of all sizes.
Sometimes it's no one's fault. Sometimes it's the fault of a
drunk friend who shouldn't have been [wherever he was] in the
first place.
Yet, at other times, the disaster could have been prevented.
Sometimes, these f**k ups are caused by bad design. Sometimes,
it's a bad …
Googling around, I came across Bradford Cross' article, Big Data Is Less About Size, And More About
Freedom. Bradford writes, "The scale of data and
computations is an important issue, but the data age is less
about the raw size of your data, and more about the cool stuff
you can do with it."
Even though the article makes some good points, I'm not sure I
can agree with Bradford's point of view here. As an architect,
when I think in terms of Big Data, the ability to do "cool stuff"
is probably the last thing that crosses my mind. Big Data, to me,
is about ensuring constant response time as the data grows in
size without sacrificing functionality.
What do you think Big Data is about? Is it merely about being
able to do 'cool stuff' with your data? Is it about ensuring
constant access/response times? Or is it about something else?
I'm eager …