In the "all available" case, I recommend invoking a separate statement 
without the timestamp predicate instead of passing sentinel min and max 
values for timestamp. A dummy full-range filter can trick the query 
planner into bad decisions (for example, if you have an index with 
protocol as the first column).
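A quick, untested sketch of what I mean (Python's sqlite3 here just to make it runnable; table and column names are taken from your query, the sample rows are invented):

```python
import sqlite3

# Illustrative in-memory copy of the flows table from the thread.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE flows (timestamp INTEGER, protocol INTEGER, "
            "portLan INTEGER, nbBytesDecompOut INTEGER, nbBytesCompIn INTEGER)")
con.executemany("INSERT INTO flows VALUES (?, ?, ?, ?, ?)",
                [(1383770600, 6, 80, 100, 50),
                 (1383770700, 17, 443, 200, 25)])

# Two statement shapes: one with the timestamp range, one without.
RANGED = ("SELECT SUM(nbBytesDecompOut + nbBytesCompIn) AS vol, portLan AS item "
          "FROM flows WHERE timestamp >= ? AND timestamp <= ? "
          "AND protocol IN (17, 6) "
          "GROUP BY portLan ORDER BY vol DESC LIMIT 6")
ALL = ("SELECT SUM(nbBytesDecompOut + nbBytesCompIn) AS vol, portLan AS item "
       "FROM flows WHERE protocol IN (17, 6) "
       "GROUP BY portLan ORDER BY vol DESC LIMIT 6")

def top_ports(con, ts_min=None, ts_max=None):
    # "All available": run the statement with no timestamp predicate at
    # all, so the planner never sees a fake full-range filter.
    if ts_min is None and ts_max is None:
        return con.execute(ALL).fetchall()
    return con.execute(RANGED, (ts_min, ts_max)).fetchall()

print(top_ports(con))  # -> [(225, 443), (150, 80)]
```

The point is that the planner only ever sees the predicate shapes that are actually meaningful, at the cost of keeping two prepared statements around.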

And what about WAL mode? If concurrency is a concern for you, it is 
definitely worth a try.
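Enabling it is a one-liner; a minimal sketch, assuming an on-disk database file (I use "flows.db" as a placeholder name; WAL does not apply to in-memory databases):

```python
import sqlite3

# journal_mode is a persistent property of the database file: set it
# once and every later connection inherits it.
con = sqlite3.connect("flows.db")
mode = con.execute("PRAGMA journal_mode=WAL").fetchone()[0]
# In WAL mode, readers no longer block the writer (and vice versa).
assert mode == "wal"
con.close()
```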

-- GB

Jeff Roux schrieb am 31.03.2015 um 12:48:
> Thanks everyone for the answers.
>
> I won't be able to run any tests today; I will come back to you soon with
> additional information. Just to note that, in the worst case, the WHERE
> clause selects the entire data set, i.e. 1000000 entries. The user can
> select a time range of 1 hour, 3 hours, 1 day, and "all available". Note:
> before being added to the database, the timestamps are aggregated over a
> 180-second period, and many rows have the same timestamp (~ one row per
> TCP session).
>
> All the columns are defined as INTEGER. There are 41 columns in total in
> the flows table. If I remember correctly, there is no primary key defined
> for this table.
>
> 2015-03-31 8:32 GMT+02:00 GB <gbinfo at web.de>:
>
>>  From what I see, I assume that timestamp gives the highest selectivity.
>> Taking into account that protocol is SELECTed for and portLan is GROUPed
>> BY, I'd try an index on (timestamp, protocol, portLan) (not sure if
>> portLan helps here, but it's worth a try, I think). Don't forget to
>> ANALYZE, of course. Are your columns of INTEGER affinity? If they are of
>> TEXT affinity, they will store everything as TEXT, which can make a
>> difference in both space consumption and speed. Is your SQLite library
>> built with SQLITE_ENABLE_STAT4? If not, give it a try; it sometimes
>> makes a big difference.
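[To make the index/ANALYZE suggestion above concrete, a runnable sketch (Python's sqlite3; table and column names from the thread, sample rows invented):]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE flows (timestamp INTEGER, protocol INTEGER, "
            "portLan INTEGER, nbBytesDecompOut INTEGER, nbBytesCompIn INTEGER)")
con.executemany("INSERT INTO flows VALUES (?, ?, ?, ?, ?)",
                [(1383770600 + i, 6, 80 + i % 3, 100, 50) for i in range(100)])

# Composite index in the suggested order: the range column first, then
# the IN column, then the GROUP BY column.
con.execute("CREATE INDEX idx_flows_ts_proto_port "
            "ON flows (timestamp, protocol, portLan)")
con.execute("ANALYZE")  # collect statistics so the planner can use them

# sqlite_stat1 now holds a row of statistics for the new index.
stats = con.execute("SELECT tbl, idx FROM sqlite_stat1").fetchall()
print(stats)  # -> [('flows', 'idx_flows_ts_proto_port')]
```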
>>
>> Is it possible that data collection and retrieval happen at the same time?
>> If so, try running the database in WAL mode, it should help with
>> concurrency issues.
>>
>> -- GB
>>
>>
>> Jeff Roux schrieb am 30.03.2015 um 11:46:
>>
>>> Hi everyone,
>>>
>>> I have a daemon that collects information and stores it in a SQLite
>>> database. The table has 1 million rows.
>>>
>>> This daemon is running on an HP server with 12 cores, 32 GB of RAM,
>>> and an SSD drive. I have performance issues with some requests. For
>>> instance, the following query takes more than 5 seconds to complete
>>> with SQLite3 (in that particular case, the WHERE clause selects all
>>> the data in the database, i.e. 1000000 rows):
>>>
>>> SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item
>>> FROM  flows
>>> WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN
>>> (17, 6)
>>> GROUP BY portLan ORDER BY vol DESC LIMIT 6;
>>>
>>> I have done some tests with and without "INDEXED BY" clauses and got
>>> nearly the same results.
>>>
>>> I compared the performance with MySQL, and the same query takes
>>> less than 1 second to complete.
>>>
>>> Could you give me some directions for optimizing this kind of query
>>> with SQLite3 when there is a large amount of data in the table? I
>>> need to triple the number of rows, and the performance will then
>>> become unacceptable for my application.
>>>
>>> Thanks in advance.
>>> _______________________________________________
>>> sqlite-users mailing list
>>> sqlite-users at mailinglists.sqlite.org
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>>
>>>
>>
