Thanks Jay.

I guess you are right. I have very little experience in dealing with such
huge data sets. So what I meant was: initially I was doing this with C++
data structures like vectors and maps, and I could not even store all the
entries; the program would crash with a bad_alloc exception. With SQLite,
I can at least store all the data for the cases I have tested so far
(600 billion entries for now), but it is awfully slow.

Continuing from my last mail, this is what I am doing:

2. In a loop, read each address, and for that address find the next time
   it was used (to implement the replacement policy).

To do this:

1. I am creating the table:

       CREATE TABLE AddressSet (SeqNo UNSIGNED BIG INT PRIMARY KEY,
                                Address UNSIGNED BIG INT);

2. I am inserting all the addresses:

       INSERT INTO AddressSet VALUES (SeqNo, Address);
       ...

3. In the loop, for each sequence number n = 1, 2, ..., I am doing the
   following (sketched in C++ after this list):

       SELECT SeqNo, Address FROM AddressSet WHERE SeqNo = n;
       -- let the address selected be m

       DELETE FROM AddressSet WHERE SeqNo = n;
       -- delete it so that the next SELECT finds the next time this
       -- address was seen

       SELECT SeqNo, Address FROM AddressSet WHERE Address = m
           ORDER BY SeqNo ASC LIMIT 1;
       -- here SeqNo gives us the next time this address was seen
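
In C++ terms, the loop looks roughly like this. This is only a minimal
sketch using the sqlite3 C API: error handling is omitted, the table is
assumed to be already populated, and "trace.db" is a placeholder file
name.

    #include <sqlite3.h>
    #include <cstdio>

    int main() {
        sqlite3 *db;
        sqlite3_open("trace.db", &db);   /* placeholder file name */

        sqlite3_stmt *sel, *del, *next;
        sqlite3_prepare_v2(db,
            "SELECT Address FROM AddressSet WHERE SeqNo = ?",
            -1, &sel, nullptr);
        sqlite3_prepare_v2(db,
            "DELETE FROM AddressSet WHERE SeqNo = ?",
            -1, &del, nullptr);
        sqlite3_prepare_v2(db,
            "SELECT SeqNo FROM AddressSet WHERE Address = ? "
            "ORDER BY SeqNo ASC LIMIT 1",
            -1, &next, nullptr);

        for (sqlite3_int64 n = 1; ; ++n) {
            /* Step 3a: fetch the address m at position n. */
            sqlite3_bind_int64(sel, 1, n);
            if (sqlite3_step(sel) != SQLITE_ROW) {   /* past end of trace */
                sqlite3_reset(sel);
                break;
            }
            sqlite3_int64 m = sqlite3_column_int64(sel, 0);
            sqlite3_reset(sel);

            /* Step 3b: delete row n so the next query sees only
             * future uses of this address. */
            sqlite3_bind_int64(del, 1, n);
            sqlite3_step(del);
            sqlite3_reset(del);

            /* Step 3c: find the next time address m is used.  Note
             * that with only SeqNo as PRIMARY KEY, this WHERE clause
             * has no index on Address to use. */
            sqlite3_bind_int64(next, 1, m);
            if (sqlite3_step(next) == SQLITE_ROW)
                printf("%lld next used at %lld\n", (long long)m,
                       (long long)sqlite3_column_int64(next, 0));
            else
                printf("%lld is never used again\n", (long long)m);
            sqlite3_reset(next);
        }

        sqlite3_finalize(sel);
        sqlite3_finalize(del);
        sqlite3_finalize(next);
        sqlite3_close(db);
    }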

I was wondering if this flow can be modified to improve performance.

Just to give some context: all these issues arise because we need the
entire history of the trace to do the processing. When an address is
seen, we need to find out when in the future it is used again; in the
worst case, the very first address is never used again, or is used only
at the very end of the address trace. This is because I am trying to
simulate an "optimal" replacement policy in the cache, unlike real ones
such as LRU, for which I could have maintained a limited history and
been done with it.
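
To make "the next time this address was used" concrete: if the whole
trace fit in memory, the next-use positions could be computed in a
single backward pass, keeping a map from each address to the position
where it was last seen. A minimal sketch on a toy trace follows; at my
scale, this in-memory version is exactly what dies with bad_alloc:

    #include <cstdint>
    #include <cstdio>
    #include <unordered_map>
    #include <vector>

    int main() {
        /* Toy trace: position i holds the address accessed at time i. */
        std::vector<uint64_t> trace = {0x10, 0x20, 0x10, 0x30, 0x20};

        std::vector<size_t> next_use(trace.size());
        const size_t NEVER = trace.size();   /* sentinel: not used again */

        /* Walk backward; last_seen[a] holds the next position (going
         * forward) at which address a is accessed. */
        std::unordered_map<uint64_t, size_t> last_seen;
        for (size_t i = trace.size(); i-- > 0; ) {
            auto it = last_seen.find(trace[i]);
            next_use[i] = (it == last_seen.end()) ? NEVER : it->second;
            last_seen[trace[i]] = i;
        }

        for (size_t i = 0; i < trace.size(); ++i) {
            if (next_use[i] == NEVER)
                printf("pos %zu: %#llx never used again\n",
                       i, (unsigned long long)trace[i]);
            else
                printf("pos %zu: %#llx next used at %zu\n",
                       i, (unsigned long long)trace[i], next_use[i]);
        }
    }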

Thanks again,
Abhisek

On Mon, Jul 18, 2011 at 11:23 AM, Jay A. Kreibich <j...@kreibi.ch> wrote:

> On Mon, Jul 18, 2011 at 05:01:55PM +0200, Stephan Beal scratched on the
> wall:
> > On Mon, Jul 18, 2011 at 4:58 PM, abhisek...@gmail.com
> > <abhisek...@gmail.com> wrote:
> >
> > > These are addresses accessed by a program. There will be 100 billion
> > > entries
> > >
> >
> > You won't be able to fit that many in your database - sqlite3 cannot
> > scale to the file size you will need for that. Assuming 10-byte
> > addresses (as you demonstrated), 10 bytes x 100B records = 1 terabyte
> > JUST for the addresses (not including any sqlite3-related overhead per
> > record, which is probably much larger than the 10 bytes you're saving).
>
>   In theory, the maximum size of an SQLite database is 128 TB.
>
>  2^31 pages (2 giga-pages) @ 2^16 bytes (64K) = 128 TB, or ~140e12 bytes.
>
>  (I know http://sqlite.org/limits.html says 14TB, but I think they
>   dropped a digit)
>
>  Whether your file system can handle this is a different story.
>
>  Using SQLite for this type of data seems very questionable, however.
>  As Stephan points out, the database with just the addresses is likely
>  to be in the 3 to 4 TB range. You said "There will be 100 billion
>  entries or so like this, which makes it necessary to use the
>  database," but I think just the opposite is true.  If you have a
>  *very* large number of data points with a very specific access
>  pattern, using a general purpose tool seems like exactly the wrong
>  choice.  You need some custom system that is highly optimized for
>  both storage space and your specific access patterns.
>
>   -j
>
> --
> Jay A. Kreibich < J A Y  @  K R E I B I.C H >
>
> "Intelligence is like underwear: it is important that you have it,
>  but showing it to the wrong people has the tendency to make them
>  feel uncomfortable." -- Angela Johnson
>



-- 
Abhisek
Live Long and Prosper
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
