On 1/20/15, Parakkal, Navin S (Software Engineer) <navin.parak...@hp.com> wrote: > Hello, > > I've few questions about sqlite3 , the database it creates. Actually I'm > finding lot of differences in performance. > > My story: > I have this sqlite3 database called hp.db which is like 100+ million > records for table1. The size of hp.db on Linux x64 (CentOS 7) is like 16 > GB. > When I do a select count(*) on hp_table1 it takes more than 5 mins which > is quite a huge time. > This file is static for now ie we copied it from a production server for > analysis. > Now I create a index by create index nvnhpindex on hp_table1 > (column1,column2) . The primary key of the table hp_table1 is > (column1,column2).
Are you saying that you have a table like this: CREATE TABLE hp_table1( column1 TEXT, column2 TEXT, other_columns MISC, PRIMARY KEY(column1,column2) ); And then you are doing: CREATE INDEX idx1 ON hp_table1(column1,column2); Don't do that!!! The index is redundant. You already have a primary key on those two columns. The primary key is sufficient. The index just makes your database bigger and slower. For a multi-column primary key, your best performance will (probably) result if you say: CREATE TABLE hp_table1( column1 TEXT, column2 TEXT, other_columns MISC, PRIMARY KEY(column1,column2) ) WITHOUT ROWID; Note the use of WITHOUT ROWID at the end of the table declaration. Note also the absence of any indexes. This will likely make a big size and performance difference for you. I say "probably" because there are cases where it might be better to omit the WITHOUT ROWID - specifically if "other_columns" contain very large strings and/or blobs - larger than about 1/5th of your page size. You can run experiments using and omitting WITHOUT ROWID to see which form works best on your system. > It takes around some time (maybe 20 minutes or less , I went for lunch > and came back , really didn't note the time). > > Now I do select count(*) on hp_table1 , it takes around 15 secs. This is > what we want our objective. We want it to be fast. The create index has > increased the size of hp.db to 18 GB. This is OK with us and the customers. > > The problem is this is not a static database. We keep inserting data (insert > rows every 10 secs or like atleast 1 minute ) and occassionally delete rows > (like once in 5 days). This is a 24x7 system. > > > So to identify the problem , I created a empty_database similar to hp.db > with no rows. > I created a index on column1,column2 on empty_table1 inside empty_database. > > Now I inserted the rows from csv (this CSV was created by .mode csv, output > myhp.csv, select * from hp_table1). > > The size of database is around 18GB (empty_database) with rows. Now I do a > select count(*) on empty_table1 (actually it contains lots of rows like > 100M+ records ) and it takes more than 5 mins. 5 mins is too much of a time > for us to bear. The customer wants the information within a minute. > > > Can you please help in resolving this problem ? We are planning to deploy > this across 10000+ nodes on Linux x64 on one customer and many other > customers are going in the similar direction. > > How do we go about resolving this ie what should we do to create a table > with sub minute access for 100-500 million . How do we create the indexes ? > Any other performance incentives. > > Some say we should buy/use Oracle but I just am holding onto sqlite3 > assuming it would help me solve our problem. > > > Regards, > Navin > > -- D. Richard Hipp d...@sqlite.org _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users