Showing entries 101 to 110 of 117
« 10 Newer Entries | 7 Older Entries »
Displaying posts with tag: blogging (reset)
Tradeoff: Insertions versus Point Queries

I’ve been waving my hands about lower bounds.  Well, sometimes I haven’t been waving my hands, because the lower bounds are tight.  But in other cases (lenient insertions, range queries), the lower bounds are very far from what we’re used to.

So now, for a bit of math:

Brodal and Fagerberg showed in 2003 that there’s a tradeoff between insertions and queries.  The insertions they consider are lenient.  Well, any lower bound for lenient is a lower bound for strict, but they also gave upper bounds, so it matters.  Also, they don’t know from lenient, but if you look at their upper bound, they are implementing lenient insertions.  The queries they consider are, unfortunately, point queries.  That’s too bad for us, because we’ve already seen that point queries are just too slow to be of interest on hard disks.

Still, they have matching upper and lower bounds, so let’s see …

[Read more]
Tradeoff: Insertions versus Point Queries

I’ve been waving my hands about lower bounds.  Well, sometimes I haven’t been waving my hands, because the lower bounds are tight.  But in other cases (lenient insertions, range queries), the lower bounds are very far from what we’re used to.

So now, for a bit of math:

Brodal and Fagerberg showed in 2003 that there’s a tradeoff between insertions and queries.  The insertions they consider are lenient.  Well, any lower bound for lenient is a lower bound for strict, but they also gave upper bounds, so it matters.  Also, they don’t know from lenient, but if you look at their upper bound, they are implementing lenient insertions.  The queries they consider are, unfortunately, point queries.  That’s too bad for us, because we’ve already seen that point queries are just too slow to be of interest on hard disks.

Still, they have matching upper and lower bounds, so let’s see …

[Read more]
Tradeoffs: Updates versus Range Queries

Sorry for the delay, now on to range queries and lenient updates.  Let’s call them queries and updates, for short.  So far, I’ve shown that B-trees (and any of a number of other data structures) are very far from the “tight bound.” I’ll say a bound is a tight if it’s a lower bound and you can come up with data structure that matches it.

So how do we match the bandwidth bound for queries and updates?  I already mentioned in passing how to do this, but let’s look more closely.

Fast Updates

The way to get fast updates is to log them.  You can easily saturate disk bandwidth by writing out the insertion, deletion and update requests with no index. 

A query now will typically start by sorting the data.  Even a point query requires looking at all the data, but a range query requires looking at all the data log times (in order to sort it), or using a large amount of extra …

[Read more]
Tradeoffs: Updates versus Range Queries

Sorry for the delay, now on to range queries and lenient updates.  Let’s call them queries and updates, for short.  So far, I’ve shown that B-trees (and any of a number of other data structures) are very far from the “tight bound.” I’ll say a bound is a tight if it’s a lower bound and you can come up with data structure that matches it.

So how do we match the bandwidth bound for queries and updates?  I already mentioned in passing how to do this, but let’s look more closely.

Fast Updates

The way to get fast updates is to log them.  You can easily saturate disk bandwidth by writing out the insertion, deletion and update requests with no index. 

A query now will typically start by sorting the data.  Even a point query requires looking at all the data, but a range query requires looking at all the data log times (in order to sort it), or using a large amount of extra …

[Read more]
don’t ask too many questions

chyrp is a nice looking piece of blog software. individual posts can have different styles, something it borrowed from the hosted tumblr service. i was interested to read about “the sql query massacre of january 19th, 2008” but the numbers gave me pause — 21 queries to generate the index page? that is down from an astounding 116, but that still seems ridiculous to me.

the number of queries to generate the index of this site? two. one of them is SET NAMES utf8. i could see boosting that to three or four if i moved some things like the list of links in the sidebar into the database, or added archive links. call it five if i had user accounts.

but right now, the number of queries used to load the index page …

[Read more]
How Fast Can Updates Run?

Last time, I introduced the notion of strict and lenient updates.  Now it’s time to see what the performance characteristics are of each.

Just to rehash, we are focusing on the storage engine (a la MySQL) level, and we are looking at a database on a single disk—the one we are using for illustration is the 1TB Hitachi Deskstar 7K1000.  It has a disk seek time 14ms and transfer rate of around 69MB/s [See tomshardware.com] We will insert and delete random pairs, each 8 bytes.  So that’s 62.5 billion pairs to fill the disk.

Strict Updates

These are the easier update types to analyze.  Please review the definition of strict updates from the last blog entry.  Now notice that each insertion or deletion requires a point query.  For example, during an insertion, in order to determine if there’s already a row with a particular key value in the database, one must look up that key.  In order …

[Read more]
How Fast Can Updates Run?

Last time, I introduced the notion of strict and lenient updates.  Now it’s time to see what the performance characteristics are of each.

Just to rehash, we are focusing on the storage engine (a la MySQL) level, and we are looking at a database on a single disk—the one we are using for illustration is the 1TB Hitachi Deskstar 7K1000.  It has a disk seek time 14ms and transfer rate of around 69MB/s [See tomshardware.com] We will insert and delete random pairs, each 8 bytes.  So that’s 62.5 billion pairs to fill the disk.

Strict Updates

These are the easier update types to analyze.  Please review the definition of strict updates from the last blog entry.  Now notice that each insertion or deletion requires a point query.  For example, during an insertion, in order to determine if there’s already a row with a particular key value in the database, one must look up that key.  In order …

[Read more]
Updates & Discipline

So far, I’ve analyzed point and range queries.  Now it’s time to talk about insertions and deletions.  We’ll call the combination updates.  Updates come in two flavors, and today we’ll cover both.

Depending on the exact settings of your database, the updates give a varying amount of feedback.  For example, when a key is deleted, all rows with that key are deleted (assuming the database allows duplicate keys).  The normal behavior is to return the number of rows deleted.  The normal behavior when deleting a key that has no corresponding rows in the database is to return an error message.  On insertion, one can allow duplicate or not.  In the latter case, the storage engine returns an error message if a duplication insertion is attempted. 

We’ll see that the details of error messages have a profound influence on the lower-bound arguments I’ve been making (and we’ll see a bit …

[Read more]
Updates & Discipline

So far, I’ve analyzed point and range queries.  Now it’s time to talk about insertions and deletions.  We’ll call the combination updates.  Updates come in two flavors, and today we’ll cover both.

Depending on the exact settings of your database, the updates give a varying amount of feedback.  For example, when a key is deleted, all rows with that key are deleted (assuming the database allows duplicate keys).  The normal behavior is to return the number of rows deleted.  The normal behavior when deleting a key that has no corresponding rows in the database is to return an error message.  On insertion, one can allow duplicate or not.  In the latter case, the storage engine returns an error message if a duplication insertion is attempted. 

We’ll see that the details of error messages have a profound influence on the lower-bound arguments I’ve been making (and we’ll see a bit …

[Read more]
Range Queries: Is the Bottleneck Seeks or Bandwidth?

Last time I talked about point queries.  The conclusion was that big databases and point queries don’t mix.  It’s ok to do them from time to time, but it’s not how you’re going to use your database, unless you have a lot of time.  Today, I’d like to talk about range queries, which seem much more useful for the analysis of big databases, say in a business intelligence setting.

Recall that the focus is on the storage engine (a la MySQL) level, and a database on a single disk—the one we are using for illustration is the 1TB Hitachi Deskstar 7K1000.  It has a disk seek time 14ms and transfer rate of around 69MB/s [See tomshardware.com] Now imagine filling the disk with random pairs, each 8 bytes.  So that’s 62.5 billion pairs.

Range Queries

Suppose the above data is stored in a B-tree, and that you’d like to iterate over all the data in order by key.  Further suppose that the …

[Read more]
Showing entries 101 to 110 of 117
« 10 Newer Entries | 7 Older Entries »