Two cons against NoSQL data stores read like this: 1. It’s very hard to move data out from one NoSQL store to some other system, even another NoSQL store. There is a very hard lock-in when it comes to NoSQL. If you ever have to move to another database, you basically have to re-implement a lot [...]
“For analytical things, eventual consistency is ok (as long as you can know after you have run them if they were consistent or not). For real world involving money or resources it’s not necessarily the case.” — Michael “Monty” Widenius. In a recent interview, I asked Justin Sheehy, Chief Technology Officer at Basho Technologies, maker [...]
“While I believe that one size fits most, claims that RDBMS can no longer keep up with modern workloads come in from all directions. When people talk about performance of databases on large systems, the root cause of their concerns is often the performance of the underlying B-tree index”– Martín Farach-Colton. Scaling MySQL and MariaDB [...]
Earlier this week we all read GigaOM's article with this title:
"Why the days are numbered for Hadoop as we know it"
I know GigaOM likes to provoke scandals sometimes (we all remember some other unforgettable pieces), but there is something behind it...
Hadoop today (after SOA not so long ago) is one of the worst cases of an abused buzzword ever known to man. It's everything, everywhere, and can cure illnesses and do "big data" at the same time! Wow! Actually, Hadoop is a software framework that supports data-intensive distributed applications, derived from Google's MapReduce and Google File System (GFS) papers.
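To make the MapReduce model behind Hadoop concrete, here is a toy, single-process Python sketch of the classic word count: a map step emits (word, 1) pairs, a shuffle step groups the pairs by key, and a reduce step sums each group. This is not Hadoop's API (Hadoop jobs are typically written against its Java API and run distributed over HDFS); the function names are hypothetical and everything runs in memory.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Map phase: emit a (key, value) pair for every word in an input record.
def map_words(line: str) -> Iterator[tuple[str, int]]:
    for word in line.lower().split():
        yield word, 1

# Shuffle phase: group all values emitted for the same key.
def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: collapse each key's list of values into one result.
def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    return key, sum(values)

def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())

if __name__ == "__main__":
    print(word_count(["Hadoop is a framework", "Hadoop is a buzzword"]))
```

In a real cluster the map and reduce functions run on many machines and the shuffle moves data over the network; the three-phase shape is what carries over.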
My take from the article is this: Hadoop is a foundational, low-level platform. I used the word …
In my previous post, http://database-scalability.blogspot.com/2012/05/oltp-vs-analytics.html, I reviewed the differences between OLTP and Analytics databases.
Scale challenges differ between these two worlds of databases.
In the Analytics world, the scale challenges come from growing amounts of data. Most solutions have leveraged three main aspects: columnar storage, RAM and parallelism.
Columnar storage makes scans and data filtering more precise and focused, because a query reads only the columns it actually needs. After that, it all comes down to I/O: the faster the I/O, the faster the query finishes and returns results. Faster disks and SSDs can play a good role, but above all: RAM! …
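A rough illustration of the columnar and parallelism points, as a minimal Python sketch under simplifying assumptions: in-memory lists stand in for storage, and the table, column names and helper functions are hypothetical. The row-store scan walks every full record, while the columnar version touches only the filter and aggregate columns, splits them into chunks and sums the chunks in a thread pool. Python threads do not give true CPU parallelism because of the GIL, so treat the chunk-then-merge structure as the point; real analytic engines run such chunks on separate cores or nodes and usually compress each column as well.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any

Row = dict[str, Any]

def to_columnar(rows: list[Row]) -> dict[str, list[Any]]:
    """Pivot row-oriented records (same keys in every row) into one list per column."""
    columns: dict[str, list[Any]] = {name: [] for name in rows[0]} if rows else {}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns

def row_scan_sum(rows: list[Row], filter_col: str, filter_val: Any, sum_col: str) -> int:
    # Row store: every full record is visited, including columns the query never uses.
    return sum(r[sum_col] for r in rows if r[filter_col] == filter_val)

def _chunk_sum(filter_vals: list[Any], sum_vals: list[int], filter_val: Any) -> int:
    # Partial aggregate over one slice of the two needed columns.
    return sum(s for f, s in zip(filter_vals, sum_vals) if f == filter_val)

def columnar_parallel_sum(columns: dict[str, list[Any]], filter_col: str,
                          filter_val: Any, sum_col: str, workers: int = 4) -> int:
    # Column store: only the filter column and the aggregate column are read.
    f, s = columns[filter_col], columns[sum_col]
    step = max(1, -(-len(f) // workers))  # ceiling division -> roughly equal chunks
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(_chunk_sum, f[i:i + step], s[i:i + step], filter_val)
                   for i in range(0, len(f), step)]
        # Merge step: combine the partial sums produced for each chunk.
        return sum(fut.result() for fut in futures)

if __name__ == "__main__":
    rows = [{"id": i, "region": "EU" if i % 2 else "US", "amount": i, "note": "..."}
            for i in range(1, 101)]
    cols = to_columnar(rows)
    print(row_scan_sum(rows, "region", "EU", "amount"),
          columnar_parallel_sum(cols, "region", "EU", "amount"))
```

Both functions return the same answer; the difference is how much data each has to touch, which on disk translates directly into I/O.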
Inspired by a post from Juice Analytics.
We are a conflicted people. We love our TV and movie violence but worry that it ruins our children’s minds. We want to reduce healthcare costs, but don’t want to restrict the free market.
Conflicts like these leave little room for a satisfactory answer. Basic principles are in conflict and deeply rooted desires run up against painful consequences. We
So I’ve been doing a fair number of automated load tests these past six months, primarily with Sysbench, which is a fine, fine tool. At first I used some simple bash-based loop controls to automate my overnight testing, but, as usually happens with shell scripts, they grew unwieldy and I rewrote them in Python. Now I have some flexible and easily configurable code for sysbench-based MySQL benchmarking to offer the community. I’ve always been a fan of giving back to such a helpful group of people; you’ll never hear me complain that “my time isn’t free”. So, let me know what you want in an ideal testing environment (from a load testing framework automation standpoint), and I’ll integrate it into my existing framework and then release it under the BSD license. The main goal here is to have a standardized, modular framework, based on sysbench, that allows anyone to compare their server performance via repeatable tests. It’s fun to see …
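The author's framework itself isn't shown here. As a hedged sketch of the general idea, this Python snippet loops over a matrix of thread counts, builds sysbench command lines and extracts transactions per second from the output. It assumes sysbench 1.0-style syntax ("sysbench oltp_read_write --threads=N --time=S ... run") and that the test tables were already created with the "prepare" command; the option values, helper names and the output-parsing regex are assumptions to verify against your sysbench version. By default it only prints the commands (dry run).

```python
import re
import subprocess
from typing import Optional

# Hypothetical connection/workload settings; adjust for your environment.
DEFAULT_OPTS = {
    "mysql-host": "127.0.0.1",
    "mysql-user": "sbtest",
    "mysql-db": "sbtest",
    "tables": "4",
    "table-size": "100000",
    "time": "60",
}

# Assumed sysbench 1.0 summary line: "transactions:   12345  (205.72 per sec.)"
TPS_RE = re.compile(r"transactions:\s+\d+\s+\(([\d.]+) per sec\.\)")

def build_command(test: str, command: str, threads: int, opts: dict[str, str]) -> list[str]:
    """Assemble one sysbench invocation as an argv list (no shell quoting issues)."""
    args = ["sysbench", test, f"--threads={threads}"]
    args += [f"--{key}={value}" for key, value in opts.items()]
    args.append(command)
    return args

def parse_tps(output: str) -> Optional[float]:
    """Return transactions/sec from sysbench output, or None if the line is missing."""
    match = TPS_RE.search(output)
    return float(match.group(1)) if match else None

def run_matrix(test: str, thread_counts: list[int], opts: dict[str, str],
               dry_run: bool = True) -> dict[int, Optional[float]]:
    """Run the same test once per thread count and collect TPS per run."""
    results: dict[int, Optional[float]] = {}
    for threads in thread_counts:
        cmd = build_command(test, "run", threads, opts)
        if dry_run:
            print(" ".join(cmd))
            results[threads] = None
            continue
        proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
        results[threads] = parse_tps(proc.stdout)
    return results

if __name__ == "__main__":
    run_matrix("oltp_read_write", [1, 4, 8, 16], DEFAULT_OPTS)
```

Keeping every run's parameters in one dictionary is what makes the tests repeatable: the same settings can be stored alongside the results and replayed on another server.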
Letting data speak for itself through analysis of entire data sets is eclipsing modeling from subsets. In the past, all too often, what were disregarded as "outliers" on the far edges of a data model turned out to be the telltale signs of a micro-trend that became a major event. To enable this advanced analytics and integrate it in real time with operational processes, companies and public sector organizations are evolving their enterprise architectures to incorporate new tools and approaches.
Whether you prefer "big," "very large," "extremely large," "extreme," "total," or another adjective for the "X" in the "X Data" umbrella term, what's important is accelerated growth in three dimensions: volume, complexity and speed.
Big data is not without its limitations. Many organizations need to revisit business processes, solve data silo challenges, and invest in visualization and collaboration tools to make big data understandable and …
This is a follow-up to my previous post titled “MySQL analytics: information_schema polling for table engine percentages”. Here’s an updated query with more output and a quicker execution time. What you get: InnoDB tablespace utilization percentage; total data+index usage, overall and per engine (InnoDB/MyISAM); InnoDB data/index sizes and percentages; MyISAM data/index sizes and percentages; and overall percentage values. Rather useful for profiling your table engine usage.
Sample output:
innodb_tablespace_utilization_perc: 100
total_size_gb: 26.275011910126
index_size_gb: 2.994891166687
data_size_gb: 23.280120743439
innodb_total_size_gb: 6.751220703125
innodb_data_size_gb: 5.2576751708984
innodb_index_size_gb: 1.4935455322266
myisam_total_size_gb: 19.523791207001
myisam_data_size_gb: 18.02244557254
myisam_index_size_gb: 1.5013456344604
perc_index: 11.3982
perc_data: 88.6018
…
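The updated query itself isn't included in this excerpt. As a rough sketch of the arithmetic behind the sample output, this Python function aggregates (ENGINE, DATA_LENGTH, INDEX_LENGTH) rows, such as those returned by SELECT ENGINE, DATA_LENGTH, INDEX_LENGTH FROM information_schema.TABLES, into overall and per-engine sizes and percentages. It assumes sizes are reported in GiB (bytes / 1024^3), does not filter out any schemas, and does not reproduce innodb_tablespace_utilization_perc, whose formula isn't shown; the function name is hypothetical.

```python
from typing import Iterable, Optional

GB = 1024 ** 3  # assumption: "_gb" values in the sample are bytes / 1024^3

def engine_usage(rows: Iterable[tuple[Optional[str], Optional[int], Optional[int]]]) -> dict[str, float]:
    """Aggregate (engine, data_length, index_length) rows into size/percentage stats."""
    per_engine = {"innodb": [0, 0], "myisam": [0, 0]}  # [data bytes, index bytes]
    all_data = all_index = 0
    for engine, data_len, index_len in rows:
        data_len, index_len = data_len or 0, index_len or 0  # views report NULLs
        all_data += data_len
        all_index += index_len
        key = (engine or "").lower()
        if key in per_engine:
            per_engine[key][0] += data_len
            per_engine[key][1] += index_len
    total = all_data + all_index
    stats = {
        "total_size_gb": total / GB,
        "index_size_gb": all_index / GB,
        "data_size_gb": all_data / GB,
    }
    for name, (data, index) in per_engine.items():
        stats[f"{name}_total_size_gb"] = (data + index) / GB
        stats[f"{name}_data_size_gb"] = data / GB
        stats[f"{name}_index_size_gb"] = index / GB
    # Overall split between index and data, rounded like the sample output.
    stats["perc_index"] = round(100 * all_index / total, 4) if total else 0.0
    stats["perc_data"] = round(100 * all_data / total, 4) if total else 0.0
    return stats
```

As a sanity check on the sample output above, index_size_gb + data_size_gb equals total_size_gb, and perc_index is index_size_gb / total_size_gb × 100; the sketch preserves both relationships.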