The primary database architectures—shared-disk and
shared-nothing—each have their advantages. Shared-disk has
functional advantages such as high availability, elasticity, ease
of set-up and maintenance, and the elimination of both
partitioning/sharding and master-slave configurations. The
shared-nothing advantages are better performance and lower costs.
What if you could offer a database that is a hybrid of the two, one
that offers the advantages of both? This sounds too good to be true,
but it is in fact what ScaleDB has done.
The underlying architecture is shared-disk, but in
many situations it can operate like shared-nothing.
You see, the problems with shared-disk arise from the messaging
necessary to (a) ship data among nodes and storage; and (b)
synchronize the nodes in the cluster. The trick is to move the
messaging outside of the transaction so it doesn’t impact
performance. The way to achieve that is to exploit locality. Let …
The CAP Theorem has become a convenient excuse for throwing data
consistency under the bus. It is automatically assumed that every
distributed system falls prey to CAP and therefore must sacrifice
one of the three objectives, with consistency being the
consistent fall guy. This automatic assumption is simply false. I
am not debating the validity of the CAP Theorem, but instead
positing that the onset of CAP limitations—what I call the CAP
event horizon—does not start as soon as you move to a second
master database node. Certain approaches can, in fact, extend the
CAP event horizon.
Physics tells us that different properties apply at different
scales. For example, quantum physics displays properties that do
not apply at larger scale. We see similar nuances in scaling
databases. For example, if you are running a master-slave
database, using synchronous replication with a single slave is no
problem. Add nine more slaves and it slows the …
For decades the debate between shared-disk and shared-nothing
databases has raged. The shared-disk camp points to the laundry
list of functional benefits such as improved data consistency,
high availability, scalability and elimination of
partitioning/replication/promotion. The shared-nothing camp
shoots back with superior performance and reduced costs. Both
sides have a point.
First, let’s look at the performance issue. RAM (average access
time of 200 nanoseconds) is considerably faster than disk
(average access time of 12,000,000 nanoseconds). Let me put this
200:12,000,000 ratio (a factor of 60,000) into perspective. A task
that takes a single minute in RAM would take roughly 41 days on
disk. So why do I bring this
up?
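
The arithmetic behind that comparison can be checked with a few lines of Python. This is only a sketch: it uses the two access times quoted above and assumes, naively, that a task's duration scales linearly with access time. The function name is made up for illustration.

```python
# Average access times quoted in the text, in nanoseconds.
RAM_ACCESS_NS = 200
DISK_ACCESS_NS = 12_000_000


def scaled_duration_days(ram_task_minutes: float) -> float:
    """Scale a RAM-bound task by the disk/RAM ratio and return the result in days.

    Assumption: duration grows linearly with access time, which ignores
    caching, sequential reads, and everything else a real system does.
    """
    ratio = DISK_ACCESS_NS / RAM_ACCESS_NS
    disk_minutes = ram_task_minutes * ratio
    return disk_minutes / (60 * 24)  # minutes per day


ratio = DISK_ACCESS_NS // RAM_ACCESS_NS
days = scaled_duration_days(1)
print(f"disk is {ratio:,}x slower; 1 minute in RAM ~ {days:.1f} days on disk")
```

The result comes out a little under 42 days, which matches the rounded figure of 41 days used above.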
Shared-Nothing: Since the shared-nothing database has sole
ownership of its data—it doesn’t share the data with other
nodes—it can operate in the machine’s local RAM, only writing
infrequently to disk (flushing the data …
Shared-disk databases can be virtualized—making them
cloud-friendly—while shared-nothing databases are tied to a
specific computer and a specific data set or data
partition.
The underlying principle of the shared-nothing RDBMS is that a
single master server owns its specific set of data. That data is
not shared, hence the name shared-nothing. Because there is no
ability to share the data, there is also no ability to virtualize
the computing of that data. Instead the shared-nothing RDBMS ties
the data and the computing to a specific computer. This
association with a physical machine is then reinforced at the
application level. Applications leveraging a shared-nothing
database that is partitioned across more than one server use
routing code. Routing code simply directs the various database
requests to the servers that own the data being requested. In
other words, the application must know which server owns which
piece of data. …
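
To make the idea of routing code concrete, here is a minimal sketch of what it might look like inside an application. The server names, the shard map, and the hash-based partitioning scheme are all assumptions invented for illustration; real deployments may route by key range, lookup table, or some other scheme.

```python
import hashlib

# Hypothetical shard map: partition number -> server that owns it.
# These host names are invented for this example.
SHARD_SERVERS = {
    0: "db0.example.internal",
    1: "db1.example.internal",
    2: "db2.example.internal",
}


def partition_for(key: str, num_partitions: int = len(SHARD_SERVERS)) -> int:
    """Map a key to a partition using a stable hash (an assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def route(key: str) -> str:
    """Return the server that owns the data for this key."""
    return SHARD_SERVERS[partition_for(key)]


# The application must ask "who owns this?" before every request.
print(route("customer:1001"))
```

Note that the mapping depends on the number of servers: add or remove one and many keys resolve to a different owner. That is one way the application ends up coupled to the physical layout of the database.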