On Fri, Aug 5, 2016 at 1:08 PM, David Raymond <david.raym...@tomtom.com>
wrote:

> There's a deprecated pragma, PRAGMA temp_store_directory =
> 'directory-name'; that appears to still work OK for now, at least.
> http://www.sqlite.org/pragma.html#pragma_temp_store_directory
>
> I've used it for the same reasons you've both mentioned about space,
> though for me it was vacuuming a huge db, and the full-size db copy that
> makes, which ate up my entire C: drive. But with that pragma you can
> specify exactly which folder you want to use for your temp files. I'd
> suggest trying that, then monitoring the folder you give it to see what
> shows up.
>

Since it's deprecated, I'll stick with TMPDIR because it's pretty standard
on Linux (it also works with the sort utility for instance).
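For anyone who wants to try either route, here's a rough, untested C sketch
of both; the database name and the /big/tmp path are placeholders of mine,
not anything from this thread:

#include <stdio.h>
#include <stdlib.h>
#include <sqlite3.h>

int main(void) {
    /* Option 1: TMPDIR, honored by the Unix VFS.  It has to be in the
     * environment before SQLite creates its first temporary file. */
    setenv("TMPDIR", "/big/tmp", 1);

    sqlite3 *db;
    if (sqlite3_open("work.db", &db) != SQLITE_OK) {   /* placeholder name */
        fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
        return 1;
    }

    /* Option 2: the deprecated pragma, set per connection. */
    char *err = NULL;
    if (sqlite3_exec(db, "PRAGMA temp_store_directory = '/big/tmp';",
                     NULL, NULL, &err) != SQLITE_OK) {
        fprintf(stderr, "pragma failed: %s\n", err);
        sqlite3_free(err);
    }

    /* ... load data, CREATE INDEX, etc. ... */

    sqlite3_close(db);
    return 0;
}

Exporting TMPDIR in the shell before launching works just as well, as long
as it's set before the first temp file gets created.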


> Apart from the default location of the files, it reads like your next main
> concern is how many temp files get opened up. My bet is that it'll be a
> very small number, just potentially huge in file size while it's doing its
> thing. But again, try that pragma and take a look.
>

My best guess is the contrary:  it starts with small files and makes
increasingly larger ones, like the sort utility does.  The problem is that
there are too many of them at the beginning for it to work with anonymous
files (which sort does not use).  This at least offers a possible
explanation of its getting wedged on large indexes: an unexpected and
untested error, handled poorly.
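If that theory is right, it may also be worth checking the per-process
open-file limit before the index build.  A hedged sketch of my own (whether
raising the limit actually helps here is only an assumption):

#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("open files: soft=%llu hard=%llu\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max);

    /* Raise the soft limit as far as the hard limit allows. */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }
    return 0;
}

Going past the hard limit needs root (or a change to limits.conf), so the
soft limit is all an ordinary process can adjust.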

> <Insert normal disclaimers about suggested use of a deprecated feature>
>
> -----Original Message-----
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Kevin O'Gorman
> Sent: Friday, August 05, 2016 3:41 PM
> To: SQLite mailing list
> Subject: Re: [sqlite] newbie has waited days for a DB build to complete.
> what's up with this.
>
> On Fri, Aug 5, 2016 at 12:30 PM, Igor Korot <ikoro...@gmail.com> wrote:
>
> > Hi, Kevin,
> >
> > On Fri, Aug 5, 2016 at 3:18 PM, Kevin O'Gorman <kevinogorm...@gmail.com>
> > wrote:
> > > Okay, I followed some of the advice y'all gave and got some results.
> > >
> > > 1. The original problem was compromised by malformed input.  However, it
> > > appears that did not cause the wedging of the process.  See (3) below.
> >
> > Where will the data come from?
> > From the user? The Internet?
> >
> > What I'm getting at is: you need to look out for malformed data in
> > the future as well.
> >
>
> I generate it.  I goofed, and I'll try not to goof in the future.
>
>
> >
> > >
> > > 2. I separated the steps, and started small.  Time increased slightly
> > > sub-linearly with dataset size, so I jumped to doing the whole thing.
> > > With proper input, the data was loaded in 68 minutes.
> > >
> > > 3. The CREATE INDEX steps failed quickly (2 minutes), reporting
> > > "database or disk is full", which seemed odd since most of my
> > > partitions have much more free space than the entire database.  It
> > > turns out that whatever does the creation was using space on my root
> > > partition (this is Linux, so that means "/").  That's the only
> > > partition in my setup without a huge amount of free space.  One would
> > > expect temporary stuff to go to /tmp (which has 3TB free), but it
> > > doesn't go there.  It would go there if the system's native "sort"
> > > program were used.  Fortunately, it turns out that the TMPDIR
> > > environment variable is honored, but while I could see space was being
> > > used, there were no files visible.  I take that to mean that the
> > > tmpfile() function (or equivalent) was used.  This could be a bad idea
> > > for large indexes because anonymous files have to be kept open, and
> > > there's a limit on the number of files that can be open at a time,
> > > around 1,000.  Sure enough, the index creation appears to be wedged
> > > like the original run, and after a few hours I killed it manually.
> > > This is a deal-killer.
> >
> > The failure you saw - is it on the table with the complete data set?
> > Or did you get it during the experimenting?
> >
> Only on the complete data set.
>
>
> > >
> > > So the questions are: Where do bug reports go?  I seem to be running
> > > 3.8.2; is this fixed in any later version?
> >
> > You can try the "3.14" pre-release right now. ;-)
> >
>
> Meh.  I submitted a bug report to this list.  I'll see what happens.
>
>
> > Thank you.
> >
> > >
> > >
> > > On Thu, Aug 4, 2016 at 9:27 AM, Kevin O'Gorman <kevinogorm...@gmail.com>
> > > wrote:
> > >
> > >> The metric for feasibility is coding ease, not runtime.  I'm the
> > >> bottleneck, not the machine, at least at this point.
> > >>
> > >> As for adding rows, it will go much like this time: a billion or so
> > >> at a time.  But there's no need to save the old data.  Each round can
> > >> be separate except for a persistent "solutions" table of much more
> > >> modest size.  I've been doing this for a while now, and the solutions
> > >> file has only 10 million or so lines, each representing a game
> > >> position for which optimum moves are known.  Getting this file to
> > >> include the starting position is the point of the exercise.
> > >>
> > >> If I ever get to anything like "production" in this project, I expect
> > >> it to run for maybe three years...  That's after I tweak it for speed.
> > >>
> > >> Background: in production, this will be running on a dual-Xeon with
> > >> 16 cores (32 hyperthreads) and 1/4 TiB RAM.  It has sequential file
> > >> update through Linux flock() calls at the moment.  The code is bash
> > >> gluing together a collection of UNIX utilities and some custom C code.
> > >> The C is kept as simple as possible, to minimize errors.
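For anyone curious about the flock() arrangement mentioned above, here is a
minimal sketch of that kind of serialization (my own illustration; the file
name work.dat is a placeholder):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/file.h>

int main(void) {
    int fd = open("work.dat", O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Block until we hold the exclusive lock; other writers wait here. */
    if (flock(fd, LOCK_EX) != 0) { perror("flock"); return 1; }

    const char line[] = "one record\n";
    if (write(fd, line, sizeof line - 1) < 0)   /* whole record under the lock */
        perror("write");

    flock(fd, LOCK_UN);
    close(fd);
    return 0;
}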
> > >>
> > >> As you may surmise, this "hobby" is important to me.
> > >>
> > >>
> > >> On Thu, Aug 4, 2016 at 9:09 AM, R Smith <rsm...@rsweb.co.za> wrote:
> > >>
> > >>>
> > >>>
> > >>> On 2016/08/04 5:56 PM, Kevin O'Gorman wrote:
> > >>>
> > >>>> On Thu, Aug 4, 2016 at 8:29 AM, Dominique Devienne <ddevie...@gmail.com>
> > >>>> wrote:
> > >>>>
> > >>>>
> > >>>> It's even less dense than that.  Each character has only 3 possible
> > >>>> values, and thus it's pretty easy to compress down to 2 bits each,
> > >>>> for a 16-byte blob.  It's just hard to do that without a bunch of
> > >>>> SQLite code I'd have to learn how to write.  The current effort
> > >>>> amounts to a feasibility study, and I want to keep it as simple as
> > >>>> possible.
> > >>>>
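For illustration only (my sketch, not code from the thread), the
2-bits-per-character packing described above could look like this in C for
a 64-character position (16 bytes at four values per byte); the '0'/'1'/'2'
character encoding is an assumption:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Pack 64 characters, each '0', '1' or '2', into a 16-byte blob,
 * two bits per character. */
static void pack_position(const char pos[64], uint8_t blob[16]) {
    memset(blob, 0, 16);
    for (int i = 0; i < 64; i++) {
        uint8_t v = (uint8_t)(pos[i] - '0');          /* 0, 1 or 2 */
        blob[i / 4] |= (uint8_t)(v << ((i % 4) * 2)); /* 4 values per byte */
    }
}

int main(void) {
    char pos[65];
    memset(pos, '0', 64);          /* a dummy, mostly-empty position */
    pos[10] = '1'; pos[20] = '2';
    pos[64] = '\0';

    uint8_t blob[16];
    pack_position(pos, blob);
    for (int i = 0; i < 16; i++) printf("%02x", blob[i]);
    printf("\n");
    return 0;
}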
> > >>>
> > >>> A feasibility study using equipment that is hamstrung by weights it
> > >>> won't have in the real situation is not an accurate study.
> > >>>
> > >>> It's like studying fuel consumption on a different kind of road
> > >>> surface, but for the test purposes the cars had to tow caravans
> > >>> containing their testing equipment - the study will not look
> > >>> feasible at all.
> > >>>
> > >>> It might of course be that the feasibility you are studying is
> > >>> completely unrelated to the data handling - in which case the point
> > >>> is moot.
> > >>>
> > >>> Let us know how it goes :)
> > >>> Ryan
> > >>>
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> #define QUESTION ((bb) || (!bb)) /* Shakespeare */
> > >>
> > >
> > >
> > >
> > > --
> > > #define QUESTION ((bb) || (!bb)) /* Shakespeare */
> >
>
>
>
> --
> #define QUESTION ((bb) || (!bb)) /* Shakespeare */
>



-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
