Re: [sqlite] Boosting insert and indexing performance for 10 billion rows (?)

2018-12-02 Thread Keith Medcalf
On Sunday, 2 December, 2018 12:57, Simon Slavin wrote:

>On 2 Dec 2018, at 7:29pm, E.Pasma wrote:
>> drop table x;
>> create table x(value INTEGER PRIMARY KEY) WITHOUT ROWID;
>>
>> insert into x select random() from generate_series where start=1 and stop=1000;
>> Run Time: real 88.759 user

Re: [sqlite] Boosting insert and indexing performance for 10 billion rows (?)

2018-12-02 Thread Simon Slavin
On 2 Dec 2018, at 7:29pm, E.Pasma wrote:
> drop table x;
> create table x(value INTEGER PRIMARY KEY) WITHOUT ROWID;
>
> insert into x select random() from generate_series where start=1 and stop=1000;
> Run Time: real 88.759 user 36.276227 sys 44.190566

Realtime is 88.759

> create table

Re: [sqlite] Boosting insert and indexing performance for 10 billion rows (?)

2018-12-02 Thread E.Pasma
> 2 dec. 2018, Keith Medcalf:
>
> Well if it is unique and not null, then why not just make it the rowid? In either case, you would still have to permute the storage tree at insert time if the inserts were not in-order. So let us compare them shall we:
>
> sqlite> create table
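A minimal sketch of the comparison Keith sets up here, assuming Python's sqlite3 module and an in-memory database instead of the generate_series extension used in the thread (table names and row count are illustrative only):

import random
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("create table x(value INTEGER PRIMARY KEY) WITHOUT ROWID")
con.execute("create table y(value INTEGER PRIMARY KEY)")  # ordinary rowid table: value becomes the rowid

# random, therefore out-of-order, keys -- the case being discussed;
# duplicates are vanishingly unlikely at this scale
rows = [(random.getrandbits(63),) for _ in range(1_000_000)]

for name in ("x", "y"):
    t0 = time.perf_counter()
    with con:  # one transaction around the whole load
        con.executemany("insert into %s(value) values (?)" % name, rows)
    print(name, round(time.perf_counter() - t0, 3), "s")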

Re: [sqlite] Boosting insert and indexing performance for 10 billion rows (?)

2018-12-02 Thread Keith Medcalf
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of E.Pasma
>Sent: Sunday, 2 December, 2018 03:38
>To: SQLite mailing list
>Subject: Re: [sqlite] Boosting insert and indexing per

Re: [sqlite] Boosting insert and indexing performance for 10 billion rows (?)

2018-12-02 Thread E.Pasma
> 2 dec. 2018, E.Pasma:
>
>> 30 nov. 2018, AJ Miles:
>>
>> Ah, this tool seems very handy. For those curious, I'll paste the results below. The index approximately doubles the storage size, but I am intentionally making that tradeoff to avoid the slow down when enforcing a

Re: [sqlite] Boosting insert and indexing performance for 10 billion rows (?)

2018-12-02 Thread E.Pasma
> 30 nov. 2018, AJ Miles:
>
> Ah, this tool seems very handy. For those curious, I'll paste the results below. The index approximately doubles the storage size, but I am intentionally making that tradeoff to avoid the slow down when enforcing a unique/primary key on the Reference table

Re: [sqlite] Boosting insert and indexing performance for 10 billion rows (?)

2018-11-29 Thread AJ Miles
Ah, this tool seems very handy. For those curious, I'll paste the results below. The index approximately doubles the storage size, but I am intentionally making that tradeoff to avoid the slow down when enforcing a unique/primary key on the Reference table while inserting.

-AJ

/** Disk-Space
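The snippet cuts off at the start of the report, but the "/** Disk-Space" header matches the output of sqlite3_analyzer, which appears to be the tool referred to. A sketch of running it from Python, assuming the sqlite3_analyzer binary from sqlite.org is on PATH and using a hypothetical database file name:

import subprocess

# "reference.db" is a stand-in for the actual database file
report = subprocess.run(
    ["sqlite3_analyzer", "reference.db"],
    capture_output=True, text=True, check=True,
)
print(report.stdout)  # the report begins with the "/** Disk-Space ..." block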

Re: [sqlite] Boosting insert and indexing performance for 10 billion rows (?)

2018-11-29 Thread Richard Hipp
On 11/29/18, AJ Miles wrote:
>
> I misspoke when I said it was 200-300gb for just the integers -- my brain was a little fuzzy. Right now, the integer table has 3 columns: rowid, integer, and foreign row id to a second table (so 8-byte int, 8-byte int, variable-byte int I believe, unless the
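A hypothetical reconstruction of the schema AJ describes; the thread only gives the column roles, so every name below is invented:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE source(id INTEGER PRIMARY KEY);
    CREATE TABLE reference(
        id        INTEGER PRIMARY KEY,           -- alias for the 8-byte rowid
        value     INTEGER NOT NULL,              -- the 8-byte signed integer
        source_id INTEGER REFERENCES source(id)  -- stored as a variable-length int on disk
    );
""")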

Re: [sqlite] Boosting insert and indexing performance for 10 billion rows (?)

2018-11-29 Thread AJ Miles
Simon, Dominique, and Arun - Thank you for the feedback. I'll leave the threading off for inserts since I've seen DB contention issues with other multithreaded/multiprocessed attempts. The indexing improvement is nice though. I misspoke when I said it was 200-300gb for just the integers -- my

Re: [sqlite] Boosting insert and indexing performance for 10 billion rows (?)

2018-11-29 Thread Arun - Siara Logics (cc)
Without going into details of what your structure is, I suggest you look into the "without rowid" option when creating the table. It reduces the overhead drastically.

On Thu, 29 Nov 2018 18:59:26 +0530 Dominique Devienne wrote
> On Wed, Nov 28, 2018 at 6:03 PM AJ M wrote:
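A minimal sketch of the two layouts this suggestion contrasts (hypothetical names). With WITHOUT ROWID the primary-key b-tree is the table itself, so there is no separate index to build or store, which is where the overhead saving comes from:

import sqlite3

con = sqlite3.connect(":memory:")

# rowid table: rows live in a rowid b-tree, plus a second b-tree
# for the unique index on the key
con.execute("CREATE TABLE a(value INTEGER NOT NULL)")
con.execute("CREATE UNIQUE INDEX a_value ON a(value)")

# WITHOUT ROWID: a single b-tree keyed directly on value
con.execute("CREATE TABLE b(value INTEGER PRIMARY KEY) WITHOUT ROWID")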

Re: [sqlite] Boosting insert and indexing performance for 10 billion rows (?)

2018-11-29 Thread Dominique Devienne
On Wed, Nov 28, 2018 at 6:03 PM AJ M wrote:
> [...] The data comes out to 10 billion rows of an 8 byte signed integer (~200-300 gb pre-index), and while insertion takes ~6 hours, indexing takes 8 hours by itself. [...] query speed is fine as-is. [...]

Hi AJ. Your message is quite
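As a sanity check on those numbers (simple arithmetic, not from the thread): 10 billion 8-byte integers is only about 75 GiB of raw payload, so the reported ~200-300 gb pre-index is mostly per-row record and b-tree page overhead:

rows = 10_000_000_000
raw = rows * 8        # 8-byte signed integers
print(raw / 2**30)    # ~74.5 GiB of payload
# the gap up to ~200-300 GB is record headers, rowids,
# and b-tree page overhead, before any index is built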

Re: [sqlite] Boosting insert and indexing performance for 10 billion rows (?)

2018-11-28 Thread Simon Slavin
Ignore multiprocessing for the inserting. You'll just get contention when accessing the database. And I think you are already trying the right PRAGMAs. I think you've done this already, but just in case ... Insert rows in batches. Experiment with the batch size: maybe a thousand INSERTs per
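A minimal sketch of the batching Simon describes, assuming Python's sqlite3 and a hypothetical row source; the batch size is the knob to experiment with:

import sqlite3

BATCH = 1000  # "maybe a thousand INSERTs per" transaction -- tune this

def load(con, rows):
    batch = []
    for row in rows:           # rows: any iterable of 1-tuples
        batch.append(row)
        if len(batch) == BATCH:
            with con:          # one COMMIT per batch, not per row
                con.executemany("INSERT INTO x(value) VALUES (?)", batch)
            batch.clear()
    if batch:                  # flush the final partial batch
        with con:
            con.executemany("INSERT INTO x(value) VALUES (?)", batch)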

Re: [sqlite] Boosting insert and indexing performance for 10 billion rows (?)

2018-11-28 Thread AJ Miles
Thank you for the good suggestions. I've been applying them to a smaller subset of my database to see how it might perform. I had tried fiddling with the cache size, but it seemed to degrade performance slightly in my case. In principle it should work, so perhaps my smaller database isn't
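For reference, SQLite's page cache can be sized two ways, and the sign of the value changes its meaning; a sketch (the message does not say which values were tried):

import sqlite3

con = sqlite3.connect("data.db")  # hypothetical file name
con.execute("PRAGMA cache_size = 100000")    # positive: number of pages
con.execute("PRAGMA cache_size = -1048576")  # negative: size in KiB, so ~1 GiB here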

Re: [sqlite] Boosting insert and indexing performance for 10 billion rows (?)

2018-11-28 Thread David Raymond
-users@mailinglists.sqlite.org
Subject: [sqlite] Boosting insert and indexing performance for 10 billion rows (?)

Hi everyone -

I've been using SQLite through Python (3.7) for a scientific project. The data comes out to 10 billion rows of an 8 byte signed integer (~200-300 gb pre-index

Re: [sqlite] Boosting insert and indexing performance for 10 billion rows (?)

2018-11-28 Thread Keith Medcalf
>boun...@mailinglists.sqlite.org] On Behalf Of AJ M
>Sent: Wednesday, 28 November, 2018 10:03
>To: sqlite-users@mailinglists.sqlite.org
>Subject: [sqlite] Boosting insert and indexing performance for 10 billion rows (?)
>
>Hi everyone -
>
>I've been using SQLite through Python (3.7) for a sci

[sqlite] Boosting insert and indexing performance for 10 billion rows (?)

2018-11-28 Thread AJ M
Hi everyone - I've been using SQLite through Python (3.7) for a scientific project. The data comes out to 10 billion rows of an 8 byte signed integer (~200-300 gb pre-index), and while insertion takes ~6 hours, indexing takes 8 hours by itself. Indexing also seems to slow as it is built. Does
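A sketch of the overall load shape the thread converges on, assuming Python's sqlite3. The PRAGMA choices are an assumption (the thread says the right PRAGMAs were already being tried, but this snippet does not list them), and all names are hypothetical:

import sqlite3

con = sqlite3.connect("reference.db")     # hypothetical file name
con.execute("PRAGMA journal_mode = OFF")  # no rollback journal during a one-shot load
con.execute("PRAGMA synchronous = OFF")   # skip fsyncs; acceptable if the load can be redone

con.execute("CREATE TABLE reference(value INTEGER NOT NULL)")
# ... insert the rows here, in large batches, one transaction per batch ...

# build the index once, after the load, instead of maintaining it per insert
con.execute("CREATE INDEX reference_value ON reference(value)")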