Did you do multiple runs and average?  And how are you measuring time?  Did you 
do buffered reads?
And your MB/sec isn't accurate for the count() test.  I think it was already 
said that count() doesn't read the entire record.
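
To see how much that matters, here's a rough sketch (not from this thread; 
"test.db", table "t", and column "payload" are made-up names, and it borrows 
the tic() timer from the program further down) comparing count(*) against a 
query that has to read every record:

#include <stdio.h>
#include <sqlite3.h>
double tic(int flag); /* the timer helper from the program further down */
/* Times one query and prints the elapsed seconds. */
static void timed(sqlite3 *db, const char *sql)
{
        sqlite3_stmt *st;
        tic(0);
        if (sqlite3_prepare_v2(db, sql, -1, &st, NULL) != SQLITE_OK) return;
        while (sqlite3_step(st) == SQLITE_ROW)
                ;
        sqlite3_finalize(st);
        printf("%-40s %f sec\n", sql, tic(1));
}
int main(void)
{
        sqlite3 *db;
        if (sqlite3_open("test.db", &db) != SQLITE_OK) return 1;
        timed(db, "SELECT count(*) FROM t");             /* walks the b-tree, skips record bodies */
        timed(db, "SELECT sum(length(payload)) FROM t"); /* has to read every record */
        sqlite3_close(db);
        return 0;
}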

The SQLite numbers are due to SQLite caching.  If you make the page size 4096 you 
are 4 times less likely to need another disk read (SSD or otherwise).  That's 
different from what you're doing.  You're one layer down from that.
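
For reference, setting the page size from code would look roughly like this 
(untested sketch using the C API; PRAGMA page_size only sticks for a brand-new 
database or after a VACUUM, and "new.db" is a made-up name):

sqlite3 *db;
sqlite3_open("new.db", &db);
sqlite3_exec(db, "PRAGMA page_size=4096;", NULL, NULL, NULL);
sqlite3_exec(db, "CREATE TABLE t(x);", NULL, NULL, NULL); /* file gets created with 4K pages */
sqlite3_close(db);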

Your result makes perfect sense if I look at it as sampling error....which is 
what I would expect.

As you increase your read page size, you get a curve that saturates at around 50MB/sec.

Also...you either have a really slow machine or your timing is wrong.

I can do over 70MB/sec just reading from disk (2.6GHz Xeon, 7200RPM SAS drives, 
mirrored):
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
// tic(0) resets the timer; tic(1) returns seconds elapsed since the last tic(0).
double tic(int flag)
{
    static double t1;
    double t2;
    struct timeval tv;
    static long base_seconds;
    gettimeofday(&tv,NULL);
    // To get the resolution we need at high speed we subtract our starting
    // seconds; otherwise 15 digits of accuracy aren't enough.
    if (base_seconds==0) base_seconds=tv.tv_sec;
    if (flag==0) {
        t1 = (tv.tv_sec-base_seconds)+tv.tv_usec/(double)1000000;
    }
    t2 = (tv.tv_sec-base_seconds)+tv.tv_usec/(double)1000000;
    return t2-t1;
}
void test1(int bufsize)
{
        char *buf=malloc(bufsize);
        FILE *fp=fopen("test2.dat","r");
        size_t n;
        long long nn=0;
        if (buf==NULL || fp==NULL) { perror("test1"); exit(1); }
        while((n=fread(buf,1,bufsize,fp))>0) {
                nn+=n;
        }
        fclose(fp);
        free(buf);
        printf("%lld\n",nn);
}
int main(int argc,char *argv[])
{
        if (argc<2) { fprintf(stderr,"usage: %s bufsize\n",argv[0]); return 1; }
        tic(0);
        test1(atoi(argv[1]));
        // 512,000,000 bytes total; divide by 1e6 to report MB/sec
        printf("%f MB/sec\n",512000000/tic(1)/1000000.0);
        return 0;
}
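
Compile with something like "gcc -O2 -o io io.c" (assuming the source is saved 
as io.c); "io" in the runs below is that binary.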

dd if=/dev/zero of=test2.dat bs=1024 count=500000
sync; echo 3 > /proc/sys/vm/drop_caches
io 4096
512000000
73.071591 MB/sec

Now that it's cached:
io 4096
512000000
1884.153971 MB/sec
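
If you want the whole curve in one run, a hypothetical variant of main() could 
sweep the buffer sizes (untested; dropping the page cache between passes needs 
root):

int sizes[]={128,256,512,1024,2048,4096,8192,16384,32768,65536};
int i;
for (i=0;i<10;i++) {
        system("sync; echo 3 > /proc/sys/vm/drop_caches"); /* needs root */
        tic(0);
        test1(sizes[i]);
        printf("%d: %f MB/sec\n",sizes[i],512000000/tic(1)/1000000.0);
}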

I would hope your SSD can do better than my disk on raw throughput.


Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate



________________________________________
From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Max Vlasov [max.vla...@gmail.com]
Sent: Monday, February 28, 2011 8:52 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] EXT :Re: COUNT() extremely slow first time!

Michael,
it's an interesting hypothesis. Maybe you can explain the following
results. Recently I made a test program that just reads a file sequentially
with different block sizes, and the results are still confusing. This was the
same SSD drive, and the functions were the basic Windows API ones:
CreateFile/ReadFile

128, 256: 5 MB/sec
512: 20 MB/sec
1024: 25 MB/sec
2048: 41 MB/sec
4096 - 65536: ~50 MB/sec

Strangely, I didn't see such a big difference between the slowest and fastest
scenarios (if we exclude the sizes below 1024); as you see, the difference is
only 2x. I have only one explanation looking at the numbers. Although it's
well known that sqlite reads only full pages, if it sometimes does partial
reads, the 5 MB/sec rate for <256-byte reads could drag a linear speed of 25
MB/sec down to 12 MB/sec (it would only take roughly a quarter of the bytes
coming through such slow reads for the blended rate to land near 12 MB/sec).
But it's just a guess.

Max


On Mon, Feb 28, 2011 at 4:43 PM, Black, Michael (IS) <michael.bla...@ngc.com
> wrote:

> Those numbers make sense to me.
>
> Since count() is doing a linear walk through the table....correct?
>
> #1 SSDs more than likely don't do "read ahead" like a disk drive.  So what
> you're seeing is what read-ahead does for you.
> #2 Count is doing a linear scan of the table...Probability of 2K containing
> the next page after 1K -- 100% (2X performance improvement)
> #3 Probability of 4K containing the next page after 2K -- 100% (2X
> improvement).
> #4 Probability of 8K containing the next page after 4K -- here the
> probability either drops or we're starting to hit the bandwidth of
> SSD+Sqlite -- I'm tempted to say that it's not the probability that drops.
>
> ________________________________________
> From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on
> behalf of Max Vlasov [max.vla...@gmail.com]
>
> my timing for Select Count for 400-500MB Bases and the table with about
> 1,000,000 records
>
> 1024:
>  40 Seconds, 98% system+user time, 500 MB Read, 12 MB/Sec
> 2048:
>  22 Seconds, 94% system+user time, 500 MB Read, 20 MB/Sec
> 4096 (the NTFS cluster size):
>  11 Seconds, 96% system+user time, 450 MB Read, 32 MB/Sec
> 8192:
>  8 Seconds, 87% system+user time, 420 MB Read, 50 MB/Sec
> 32768:
>  8 Seconds, 56% system+user time, 410 MB Read, 50 MB/Sec
>
> Max
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
