Okay, I followed some of the advice y'all gave and got some results.

1. The original problem was compromised by malformed input.  However, it
appears that did not cause the wedging of the process.  See (3) below.

2. I separated the steps, and started small.  Time increased slightly
sub-linearly with dataset size, so I jumped to doing the whole thing.  With
proper input, the data was loaded in 68 minutes.

3. The CREATE INDEX steps failed quickly (2 minutes), reporting "database
or disk is full" which seemed odd since most of my partitions have much
more free space than the entire database.  It turns out that whatever does
the creation was using space on my root partition (this is Linux, so that
means "/").  That's the only partition in my setup without a huge amount of
free space.  One would expect temporary stuff to go to /tmp (which has 3TB
free), but it doesn't go there.  It would go there if the system's native
"sort" program were used.  Fortunately, it turns out that the TMPDIR
environment variable is honored, but while I could see space was being
used, there were no files visible.  I take that to mean that the tmpfile()
function (or equivalent) was used.  This could be a bad idea for large
indexes because anonymous files have to be kept open, and there's a limit
on the number of files that can be open at a time, around 1,000.  Sure
enough, the index creation appears to be wedged like the original run, and
after a few hours I killed it manually.  This is a deal-killer.
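
The "space used but no files visible" observation in (3) can be reproduced
outside SQLite.  What follows is a minimal sketch, assuming Python 3 on Linux
(or another POSIX system); it does not show what SQLite itself does
internally.  The helper name anonymous_file_demo and the 1 MiB size are made
up for illustration.  It writes to an anonymous temporary file, lists the
directory while the file is still open, and also prints the per-process
open-file limit mentioned above.

```python
import os
import resource
import tempfile


def anonymous_file_demo(directory, nbytes=1 << 20):
    """Write nbytes to an anonymous temp file created in `directory`.

    Returns (entries visible in the directory while the file is open,
    size in bytes reported for the open file).
    """
    # On POSIX systems TemporaryFile() has no directory entry once created.
    with tempfile.TemporaryFile(dir=directory) as f:
        f.write(b"\0" * nbytes)
        f.flush()
        visible = os.listdir(directory)
        size = os.fstat(f.fileno()).st_size
    return visible, size


# Use a throwaway scratch directory so the listing contains nothing else.
with tempfile.TemporaryDirectory() as scratch:
    visible, size = anonymous_file_demo(scratch)

# Per-process limit on open file descriptors (soft, hard).
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)

print("visible entries:", visible)
print("bytes held by the open file:", size)
print("open-file limit (soft, hard):", soft_limit, hard_limit)
```

The space taken by such a file is released only when the last open handle to
it is closed, which would match seeing disk usage grow under TMPDIR with
nothing listed there.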

So the questions are: Where do bug reports go?  I seem to be running 3.8.2;
is this fixed in any later version?


On Thu, Aug 4, 2016 at 9:27 AM, Kevin O'Gorman <kevinogorm...@gmail.com>
wrote:

> The metric for feasibility is coding ease, not runtime.  I'm the
> bottleneck, not the machine, at least at this point.
>
> As for adding rows, it will be about like this time: a billion or so at a
> time.  But there's no need to save the old data.  Each round can be
> separate except for a persistent "solutions" table of much more modest
> size.  I've been doing this for a while now, and the solutions file has
> only 10 million or so lines, each representing a game position for which
> optimum moves are known.  Getting this file to include the starting
> position is the point of the exercise.
>
> If I ever get to anything like "production" in this project, I expect it
> to run for maybe three years...  That's after I tweak it for speed.
>
> Background: in production, this will be running on a dual-Xeon with 16
> cores (32 hyperthreads) and 1/4 TiB RAM.  It has sequential file update
> through Linux flock() calls at the moment.  The code is bash gluing
> together a collection of UNIX utilities and some custom C code.  The C is
> kept as simple as possible, to minimize errors.
>
> As you may surmise, this "hobby" is important to me.
>
>
> On Thu, Aug 4, 2016 at 9:09 AM, R Smith <rsm...@rsweb.co.za> wrote:
>
>>
>>
>> On 2016/08/04 5:56 PM, Kevin O'Gorman wrote:
>>
>>> On Thu, Aug 4, 2016 at 8:29 AM, Dominique Devienne <ddevie...@gmail.com>
>>> wrote:
>>>
>>>
>>> It's even less dense than that.  Each character has only 3 possible
>>> values,
>>> and thus it's pretty easy to compress down to 2 bits each, for a 16 byte
>>> blob.
>>> It's just hard to do that without a bunch of SQLite code I'd have to
>>> learn
>>> how to write.  The current effort amounts to a feasibility study, and I
>>> want
>>> to keep it as simple as possible.
>>>
>>
>> A feasibility study using equipment that is hamstrung by weights it
>> won't have in the real situation is not an accurate study.
>>
>> It's like studying fuel consumption on a different kind of road surface,
>> but for the test purposes, the cars had to tow caravans containing their
>> testing equipment - the study will not look feasible at all.
>>
>> It might of course be that the feasibility you are studying is completely
>> unrelated to the data handling - in which case the point is moot.
>>
>> Let us know how it goes :)
>> Ryan
>>
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
>
> --
> #define QUESTION ((bb) || (!bb)) /* Shakespeare */
>



-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */