I've been playing a little more with Cassandra, an open-source distributed database. It
has several features that make it very compelling for storing
large, write-heavy data sets:
- Write-scaling - adding more nodes increases write capacity
- No single point of failure
- Configurable redundancy
And the most important:
- Key range scans
Key range scans are really important because they let
applications answer the kinds of questions users normally ask:
- What emails did I receive this week?
- Give me all the transactions for customer X in time range Y
Answering these questions without range scans is extremely
difficult; with efficient range scans they become fairly easy
(provided you pick your keys right).
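
To make "picking your keys right" concrete, here is a minimal sketch in Python. It is not Cassandra's API: `SortedStore`, `make_key` and the sample data are hypothetical names invented for illustration, and the store is just an in-memory sorted list. The idea it shows is that if the key is built as customer plus zero-padded timestamp, all of one customer's transactions in a time range sit next to each other in key order, so one range scan fetches them.

```python
import bisect


class SortedStore:
    """Toy in-memory store that keeps its keys sorted (hypothetical, not Cassandra)."""

    def __init__(self):
        self._keys = []    # all keys, kept in sorted order
        self._values = {}  # key -> value

    def put(self, key, value):
        # Insert the key at its sorted position the first time we see it.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def range_scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


def make_key(customer, timestamp):
    # Assumed key layout: zero-padding the timestamp makes string order
    # match numeric order, so a time range becomes a contiguous key range.
    return f"{customer}:{timestamp:010d}"


store = SortedStore()
store.put(make_key("custX", 100), "txn-a")
store.put(make_key("custX", 250), "txn-b")
store.put(make_key("custX", 900), "txn-c")
store.put(make_key("custY", 200), "txn-d")

# "Give me all the transactions for customer X in time range [200, 500)"
in_range = store.range_scan(make_key("custX", 200), make_key("custX", 500))
```

With a key layout that put the timestamp first, or hashed the customer name, the same question would require looking at every key, which is why the choice of key matters so much.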
…