One of the challenges of big data is that it is, well, big.
Computers are optimized for math on 64 bits or less. Anything bigger
requires extra steps to work with the data, which is
very expensive. This is why a BIGINT is 64 bits. In MySQL,
DECIMAL can store more than 64 bits of data using fixed
precision. Large numbers can use FLOAT or DECIMAL, but FLOAT is
lossy, and DECIMAL rounds or rejects anything beyond its declared
precision.
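To make the lossiness concrete, here is a minimal Python sketch. It is not the encoding described in this post; it only assumes that a MySQL FLOAT behaves like a 4-byte IEEE 754 single and a DOUBLE like an 8-byte double, and the helper names are made up for illustration.

```python
import struct


def through_single(n: int) -> int:
    """Round-trip an integer through a 4-byte IEEE 754 float (hypothetical helper)."""
    packed = struct.pack("f", float(n))
    return int(struct.unpack("f", packed)[0])


def through_double(n: int) -> int:
    """Round-trip an integer through an 8-byte IEEE 754 double (hypothetical helper)."""
    return int(float(n))


# A single has a 24-bit significand, a double a 53-bit one.
# Above those limits, consecutive integers can no longer be told apart.
SINGLE_LIMIT = 2**24
DOUBLE_LIMIT = 2**53
BIGINT_UNSIGNED_MAX = 2**64 - 1

if __name__ == "__main__":
    # Incrementing a counter stored as a float silently does nothing here.
    print(through_single(SINGLE_LIMIT + 1) == SINGLE_LIMIT + 1)  # False
    print(through_double(DOUBLE_LIMIT + 1) == DOUBLE_LIMIT + 1)  # False

    # Python integers are arbitrary precision, so the exact value survives,
    # at the cost of math that no longer fits in one 64-bit register.
    print(BIGINT_UNSIGNED_MAX + 1)  # 18446744073709551616
```

The point of the sketch is only that a float-backed counter stops counting exactly long before it runs out of range, which is why FLOAT is a poor fit for an exact count.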
DECIMAL is an expensive encoding. Fixed-precision math is
expensive, and you eventually run out of precision, at which point
you can't store any more data, right?
What happens when you want to store a counter that is bigger than
the maximum DECIMAL? FLOAT is lossy. What if you need
an /exact/ count of a very big number without using very much
space?
I've developed an encoding method that allows you to store very
large counters in a very small amount of space. It takes
advantage of the fact that counters …
“Application designers need to start by thinking about what level of data integrity they need, rather than what they want, and then design their technology stack around that reality. Everyone would like a database that guarantees perfect availability, perfect consistency, instantaneous response times, and infinite throughput, but it's not possible to create a product with [...]
Read the original article at The Needle in Big Data Noise
Join 5500 others and follow Sean Hull on Twitter @hullsean. Also take a look at: I hacked Disqus Digests to discover new blogs
Who the heck is Bayes
Thomas Bayes was a scientist & thinker, Fellow of the Royal Society, and back in 1763 author of “An Essay toward Solving a Problem in the Doctrine [...]
For more articles like these go to Sean Hull's Scalable Startups
With this version, the source code is now freely available under the GPL License v2. For more details, see our blog here. Open source pioneer Mozilla has been using TokuDB to manage its MySQL-driven Datazilla Data cluster, an open-source system for managing and visualizing performance data.
Date: May 2nd
Time: 2 PM EST / 11 AM PST
In the past TokuDB has been free for evaluation; the new TokuDB Community Edition extends free use to deployed environments. With this release Tokutek is also planning on making available a TokuDB Enterprise Edition, which includes technical support, initial customer onboarding services, and advanced tools for backup and recovery.
We …
Those who are familiar with me know I have a dream.
Five years ago I decided to leave a systems
integrator where I was doing great. Why? I wanted to be in a
company with the same growth prospects that Oracle had in the
80s. I dreamed of being in the Oracle of 30 years
ago and, as time travel wasn't affordable, I decided to join
MySQL AB to help expand the business in Europe, the Middle East
and Africa.
A few years later my dream came true, but in a slightly different
sense. Sun acquired MySQL and was later swallowed by Oracle, giving me the
opportunity to join the company I wished I could have helped
build.
Oracle is an amazing …
We wanted to thank everyone for naming Tokutek the Corporate Contributor of the Year 2013 for ongoing contribution to the MySQL community.
The MySQL Community Awards are given annually to the people and companies that support the MySQL ecosystem. The MySQL Community Award for Corporate Contributor of the Year recognizes a company or other organization or entity that has made valuable contributions to the MySQL ecosystem either in terms of open source code, knowledge, funding or other resources or sponsorship. The winners are selected by an independent community panel.
“Open Source is about collaborating and contributing to build …
Continuent CEO Robert Hodges says that NoSQL solutions are oversold, but this is no reason for MySQL fans to become complacent. He kicked off Day 2 of the Percona Live MySQL Conference and Expo with his keynote, "How MySQL can thrive in the world of massive data hype." He said there are new challenges in data management, and relational databases must solve them or risk becoming irrelevant. This
Since we announced that TokuDB is now open source, there has been a lot of positive feedback (thanks!) and also some questions about the details. I want to take this opportunity to give a quick, high-level guide to our repositories on GitHub.
Here are the repositories:
- ft-index. This repository is the “magic”. It contains
  the Fractal Tree data structures we have been talking about for
  years. This is also the main piece that was previously closed
  source. Here are some interesting directories:
  - src: This directory is a layer that implements an API that is similar to the BDB API.
  - locktree: an in-memory data structure that maintains transactions’ row-level locks. …
MySQL replication enables data to be
replicated from one MySQL database server (the master) to one or
more MySQL database servers (the slaves). However, imagine the
number of use cases that could be served if the slave (to which data is
replicated) weren't restricted to being a MySQL server, but could be
any other database server or platform, with replication events
applied in real time!
This is what the new Hadoop Applier empowers you to
do.
An example of such a slave could be a data warehouse system such
as Apache Hive, which uses HDFS as a data store. If you have a Hive
metastore associated with HDFS (Hadoop Distributed File System), the Hadoop
Applier can populate Hive tables in real time. Data is …