RETRACTED: Not wedged, just somewhat slower than I expected.  I was
expecting about an hour, and it takes 6.  On reflection, it's okay.  My
ideas about what's going on were wrong too -- anonymous files are used, but
only two of them at a time.

I still think it is using a poor choice of default directory for temporary
files.  Is it /var/tmp instead of the more usual /tmp?  That's the only
other choice that would not be just plain wrong, and maybe it was chosen
thinking that it's likely to be on a bigger partition.  Maybe, sometimes.
But a lot of the time, /var/tmp is on the same partition as /tmp, or just
ignored because the sort utility defaults to /tmp and often that's the
biggest user of big temporary files.  That's the reason that I've placed
/tmp on its own huge partition (3 TB) and left /var/tmp alone.  The big
reason to use /var/tmp is that the files there are not deleted
automatically -- the standard says they're to be preserved -- but anonymous
files aren't ever preserved -- they've disappeared from the file system by
definition.

In all, not a big issue since $TMPDIR is honored, but I don't see that it's
documented.  I just guessed it since the sort utility honors it and I
thought it was possible sort was being used under the covers.  It's not,
but it all worked out okay.

Does anybody know where the actual defaults and controlling environment
variables are documented, by operating system?  Or are they?

On Fri, Aug 5, 2016 at 12:36 PM, Kevin O'Gorman <kevinogorm...@gmail.com>
wrote:

> CREATE INDEX has two problems:
> 1) poor default location of temporary storage.
> 2) gets wedged on very large indexes.
>
> I'm using the sqlite that came with Xubuntu 14.04, I think it's version
> 3.8.2.
>
> I created a table, and used .import to populate it with records, about 1.4
> billion of them.  The resulting database is 232 GB.  All seemed well.
>
> I then went to CREATE INDEX on two fields, one CHAR(64) and the other an
> INT.  This initially died quickly, but I determined that it was because it
> was attempting to use my smallest partition for workspace.  It was in my
> root partition, but not in /tmp which is separate.  I submit that /tmp
> would be the natural choice, as it is what the sort command uses by
> default.  That's problem 1.
>
> Fortunately, it honors the $TMPDIR environment variable, but gets wedged
> after a bit.  I have an idea why.  While it was still actively adding
> space, there were no filenames showing, so I infer that it's using
> "anonymous" files, perhaps by calling tmpfile().  This could be bad news,
> as anonymous files have to be held open, but there are limits on how many
> open files any process is allowed.  Unless your merge process starts with
> at least 1.4 million keys in each buffer, this is going to fail on my
> data.  I suggest using tempnam() and closing the files when you're not
> actively using them.
>
> --
> #define QUESTION ((bb) || (!bb)) /* Shakespeare */
>



-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to