I just came across this: "Scaling Pinterest and adventures in
database sharding" (http://gigaom.com/data/scaling-pinterest-and-adventures-in-database-sharding/)
"Pinterest has learned about scaling the way most popular sites
do: the architecture works until one day it doesn’t." Pinterest
found out that "the architecture" was not scalable, so they turned
to developing a scale-out mechanism, also known as
sharding.
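To make the term concrete, here is a minimal sketch of hash-based shard routing in Python. It is not Pinterest's actual scheme (the article excerpt does not describe one); the `shard_for` function, the `ShardedStore` class and the in-memory dictionaries standing in for database servers are all hypothetical names used for illustration only.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (same key, same shard)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy shared-nothing store: each shard is an independent dict.

    In a real deployment each shard would be a separate database server.
    """

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
store.put("user:42", {"name": "alice"})
print(store.get("user:42"))
```

Each read or write touches exactly one shard, which is what lets the shards work in parallel without coordinating. Note that plain modulo routing like this forces data to move when the number of shards changes, which is one of the complexities a real sharding layer has to handle.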
I find it amazing that sharding, or in other words, the idea of
"scale out by splitting and parallelizing data across
shared-nothing commodity hardware", is not supplied "out of the
box" by "the architecture" (the database, the load balancer, or any
other piece of IT infrastructure). I wonder who it was that
decided that an IT issue like scale-out should
be outsourced from the database to the …
Oh I love these things: http://techcrunch.com/2012/08/22/how-big-is-facebooks-data-2-5-billion-pieces-of-content-and-500-terabytes-ingested-every-day/
Every day there are 2.5B content items shared, and 2.7B "Like"s.
I care less about the GIGO content itself, but the metadata, connections,
and relations are kept transactionally in a relational database. These
two use cases generate 5.2B transactions on the database, and
since there are only 86,400 seconds in a day, we get over 60,000 write
transactions per second on the database from these two use cases
alone, not to mention all the other use cases, such as new profiles,
emails, queries...
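The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes one database write per share or Like and a perfectly even load across the day, neither of which is stated in the TechCrunch article.

```python
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400

shares_per_day = 2_500_000_000      # 2.5B content items shared
likes_per_day = 2_700_000_000       # 2.7B Likes

# Assumption: each share or Like results in exactly one write transaction.
writes_per_day = shares_per_day + likes_per_day
writes_per_second = writes_per_day / SECONDS_PER_DAY

print(f"{writes_per_day:,} writes/day")
print(f"{writes_per_second:,.0f} writes/second on average")
```

Real traffic is bursty, so the peak rate would be higher than this daily average.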
And what's the size of new data, on top of all the existing …
On 8/16 I conducted a webinar titled "Scale Up vs. Scale
Out" (http://www.slideshare.net/ScaleBase/scalebase-webinar-816-scaleup-vs-scaleout):
The webinar was successful: we had many attendees and
great participation in questions and
answers throughout the session and at the
end. Only after the webinar did it occur to me
that one specific graphic was missing from the webinar deck. It
occurred to me after answering
several audience questions about "the difference
between …
In a previous post I wrote about ARM-based servers. Since then,
and thanks to all the comments and responses I got, I looked more
into this ARM thing and it's absolutely fascinating...
Look at this beauty (taken from the site of Calxeda,
the manufacturer):
What is it? A chip? A server? No, it's a cluster of 4
servers...
And this
is the HP Redstone Server: 288 chips, 1,152 cores (Calxeda
quad-core SoC) in a 4U server. “Dramatically reducing the cost and
complexity of cabling and …
Yesterday a customer asked me why he had
failed to achieve scale with a state-of-the-art "shared-storage"
cluster. "It's a scale-out to 4 servers, but with a shared disk.
And after tons of work and effort I got 130% throughput,
not even close to the expected 400%," he said.
Well, scale-out cannot be achieved with shared storage, and the
word "shared" is the key. Scale-out is done with
absolutely nothing shared, a "shared-nothing"
architecture. This is what makes it linear and
unlimited. Any shared resource creates a tremendous burden
on each and every database server in the cluster.
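One rough way to put a number on that burden is Amdahl's law. This model is my own assumption here, not something the customer measured; the function names below are hypothetical. The sketch asks what fraction of the work must be serialized on the shared resource for 4 servers to deliver only 130% throughput.

```python
def amdahl_speedup(serial_fraction: float, servers: int) -> float:
    """Speedup predicted by Amdahl's law for a given serialized fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / servers)


def implied_serial_fraction(speedup: float, servers: int) -> float:
    """Invert Amdahl's law: solve 1/speedup = s + (1 - s)/servers for s."""
    return (1.0 / speedup - 1.0 / servers) / (1.0 - 1.0 / servers)


servers = 4
observed = 1.3  # the customer's 130% throughput

s = implied_serial_fraction(observed, servers)
print(f"Implied serialized fraction: {s:.0%}")
print(f"Shared-nothing ideal (s = 0): {amdahl_speedup(0.0, servers):.1f}x")
```

Under this simplified model, roughly two thirds of the work would have to be waiting on the shared disk to explain the result, while a fraction of zero gives the expected 400%.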
In a previous post, I identified database engine
activities such as buffer management, locking, thread
locks/semaphores, and recovery tasks as the main bottleneck in
the OLTP …
Yesterday (4/19) I attended the AWS Summit in NYC (http://aws.amazon.com/aws-summit-2012/nyc).
I'm a big fan and also a heavy user of AWS, especially S3, EC2,
and naturally, RDS. At any point in time I have several dozen
AWS machines running for me out there in the East region, and
in some cases, when we run special benchmarks and tests, the
number of EC2 and RDS machines can easily reach three digits. As I
said, I'm a fan...
Here are a few quotes I was able to catch and document on my laptop, on my
lap...:
"When you develop an app for Facebook, you must be prepared (and
be afraid) that, to your party, not no one will show up, but
everybody will show up!" So true! Simple and true. We all want
our app to succeed. We have to think about
scaling from day 1.
"Database was bottleneck for building of sophisticated apps. This
is …
There are ways to scale databases; unfortunately, some are
limited, some introduce complexities, and some do not fit the
cloud...
By a scaling solution I mean a solution that helps me scale my
existing environment, my existing RDBMS: some magic or technology
that will take my existing Oracle or MySQL, for example, to the
next level, without porting to a new DB engine/vendor and without
completely recoding my app.
Let's try to organize things a bit in this very summarized table,
just to get the gist of it. I can't imagine covering it all in 1
table or even 100 pages, but this should be the start of a
meaningful discussion to continue in the next posts:
Solution | Scales reads? | Scales writes? | Scales data? | Scales sessions? | … |
In my heart, I'm a DBA, always was and always will be. People say
I'm a database guy, judging by the way I think, keep my car, and file my
music and bank statements... However, I have also done a great deal of
development, design, and architecture on the apps side. I (hope to)
have some perspective.
Applications come and go. The second programming language I
ever learned and worked in was COBOL. Some still say most of
the world's lines of code are written in this language. Maybe so,
but anyway, since then I have learned and written in dozens of
programming languages, from Assembly to Force.com, from Pascal to
Delphi, from functional C to Object
Oriented SmallTalk, C++, Java and , from compiled C/CGI
to interpreted Perl, ASP and Ruby back to compiled node.js... My
first applications ran on a mainframe with a green screen; later I
created beautiful graphical client-server applications; later still I had
to create hideous white web applications …
This week, after 3 months in the works, we’ve finally released version 1.7.0 of the DbCharmer Ruby gem, a Rails plugin that significantly extends ActiveRecord’s ability to work with multiple databases and/or database servers by adding features like multiple-database support, master/slave topology support, sharding, etc.
New features in this release:
- Rails 3.0 support. We’ve worked really hard to bring all the features we supported in Rails 2.X to the new version of Rails, and I’m proud to say that we’ve implemented them all. The implementation also looks much cleaner and more universal: all kinds of relations in Rails 3 work in exactly the same way, so we do not need to implement connection switching for all kinds of weird corner cases in ActiveRecord.
- Forced Slave Reads functionality. Now …
Shard-Query is an open source toolkit that helps
improve the performance of queries against a MySQL database by
distributing the work over multiple machines and/or multiple
cores. This is similar to the divide-and-conquer approach that
Hive takes in combination with Hadoop.
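The divide-and-conquer idea described above can be sketched in a few lines of Python. This is not Shard-Query's implementation or API; `partial_aggregate`, `sharded_average` and the in-memory lists standing in for MySQL shards are hypothetical, and the threads here only illustrate the scatter/gather shape (in a real setup each shard would be a separate server doing its own work).

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(rows):
    """Per-shard work: return (sum, count) so the average can be merged later."""
    return sum(row["amount"] for row in rows), len(rows)


def sharded_average(shards):
    """Scatter the aggregate to every shard, then gather and combine."""
    if not shards:
        return None
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(partial_aggregate, shards))
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Three hypothetical shards, each holding part of one logical table.
shards = [
    [{"amount": 10}, {"amount": 20}],
    [{"amount": 30}],
    [{"amount": 40}, {"amount": 50}, {"amount": 60}],
]
print(sharded_average(shards))
```

The key design point is that each shard returns a mergeable partial result (sum and count) rather than its own average, since averaging the per-shard averages would give the wrong answer when shards hold different numbers of rows.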
Shard-Query applies a clever approach to parallelism which allows
it to significantly improve the performance of queries by
spreading the work over all available compute resources. In this
test, Shard-Query averages a nearly 6x (max over 10x) improvement
over the baseline, as shown in the following graph:
One significant advantage of Shard-Query over Hive is that it works with existing MySQL data sets and queries. Another advantage is that it works with all MySQL …