On 1/20/15, Parakkal, Navin S (Software Engineer) <[email protected]> wrote:
> Hello,
>
> I've few questions about sqlite3 , the database it creates. Actually I'm
> finding lot of differences in performance.
>
> My story:
> I have this sqlite3 database called hp.db which is like 100+ million
> records for table1. The size of hp.db on Linux x64 (CentOS 7) is like 16
> GB.
> When I do a select count(*) on hp_table1 it takes more than 5 mins which
> is quite a huge time.
> This file is static for now ie we copied it from a production server for
> analysis.
> Now I create a index by create index nvnhpindex on hp_table1
> (column1,column2) . The primary key of the table hp_table1 is
> (column1,column2).
Are you saying that you have a table like this:
CREATE TABLE hp_table1(
column1 TEXT,
column2 TEXT,
other_columns MISC,
PRIMARY KEY(column1,column2)
);
And then you are doing:
CREATE INDEX idx1 ON hp_table1(column1,column2);
Don't do that!!! The index is redundant. You already have a primary
key on those two columns. The primary key is sufficient. The index
just makes your database bigger and slower.
For a multi-column primary key, your best performance will (probably)
result if you say:
CREATE TABLE hp_table1(
column1 TEXT,
column2 TEXT,
other_columns MISC,
PRIMARY KEY(column1,column2)
) WITHOUT ROWID;
Note the use of WITHOUT ROWID at the end of the table declaration.
Note also the absence of any indexes. This will likely make a big
size and performance difference for you.
I say "probably" because there are cases where it might be better to
omit the WITHOUT ROWID - specifically if "other_columns" contain very
large strings and/or blobs - larger than about 1/5th of your page
size. You can run experiments using and omitting WITHOUT ROWID to see
which form works best on your system.
> It takes around some time (maybe 20 minutes or less , I went for lunch
> and came back , really didn't note the time).
>
> Now I do select count(*) on hp_table1 , it takes around 15 secs. This is
> what we want our objective. We want it to be fast. The create index has
> increased the size of hp.db to 18 GB. This is OK with us and the customers.
>
> The problem is this is not a static database. We keep inserting data (insert
> rows every 10 secs or like atleast 1 minute ) and occassionally delete rows
> (like once in 5 days). This is a 24x7 system.
>
>
> So to identify the problem , I created a empty_database similar to hp.db
> with no rows.
> I created a index on column1,column2 on empty_table1 inside empty_database.
>
> Now I inserted the rows from csv (this CSV was created by .mode csv, output
> myhp.csv, select * from hp_table1).
>
> The size of database is around 18GB (empty_database) with rows. Now I do a
> select count(*) on empty_table1 (actually it contains lots of rows like
> 100M+ records ) and it takes more than 5 mins. 5 mins is too much of a time
> for us to bear. The customer wants the information within a minute.
>
>
> Can you please help in resolving this problem ? We are planning to deploy
> this across 10000+ nodes on Linux x64 on one customer and many other
> customers are going in the similar direction.
>
> How do we go about resolving this ie what should we do to create a table
> with sub minute access for 100-500 million . How do we create the indexes ?
> Any other performance incentives.
>
> Some say we should buy/use Oracle but I just am holding onto sqlite3
> assuming it would help me solve our problem.
>
>
> Regards,
> Navin
>
>
--
D. Richard Hipp
[email protected]
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users