Planet MySQL


Jun 25, 2013

A new big data structure for streaming counters - bit length encoding

Posted by Justin Swanhart on Tue 25 Jun 2013 17:35 UTC
Tags: streaming, bit, big data, bit length encoding, encoding., damn cool algorithms, length, sql streaming

One of the challenges of big data is that it is, well, big. Computers are optimized for math on 64 bits or less. Anything bigger takes extra steps to work with, which is very expensive. This is why a BIGINT is 64 bits. In MySQL, DECIMAL can store more than 64 bits of data using fixed precision, up to a maximum of 65 digits. Even larger numbers can use FLOAT or DOUBLE, but those data types are lossy.

DECIMAL is an expensive encoding. Fixed-precision math is costly, and you eventually run out of precision, at which point you can't store any more data, right?

What happens when you want to store a counter that is bigger than the maximum DECIMAL? FLOAT is lossy. What if you need an /exact/ count of a very big number without using very much space?
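To make the problem concrete, here is a minimal Python sketch of both failure modes. It is only an illustration, not the encoding described in this post: the 65-digit `decimal` context is an assumed stand-in for MySQL's DECIMAL limit, and the helper name `decimal_increment_overflows` is hypothetical.

```python
from decimal import Decimal, Context, Rounded

# 2**53 is the point past which a 64-bit float can no longer
# represent every integer exactly.
big = 2 ** 53
as_float = float(big)

# Incrementing the float counter by 1 is silently lost.
float_lost_increment = (as_float + 1.0 == as_float)

# Python integers are arbitrary precision, so the exact count survives.
exact_counter = big + 1

# Assumption: a 65-digit context used as an analogy for MySQL DECIMAL.
# Trapping Rounded turns any loss of digits into an exception.
ctx = Context(prec=65, traps=[Rounded])
max_decimal = Decimal("9" * 65)


def decimal_increment_overflows(value):
    """Return True if value + 1 no longer fits in 65 digits."""
    try:
        ctx.add(value, Decimal(1))
        return False
    except Rounded:
        return True


if __name__ == "__main__":
    print("float dropped the increment:", float_lost_increment)
    print("exact integer counter:", exact_counter)
    print("65-digit decimal is full:", decimal_increment_overflows(max_decimal))
```

The float loses the increment without any warning, while the fixed-precision decimal simply runs out of digits. An exact counter needs a representation that can keep growing.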

I've developed an encoding method that allows you to store very large counters in a very small amount of space. It takes advantage of the fact that counters …

