Hi Kevin,

On Fri, Aug 5, 2016 at 3:18 PM, Kevin O'Gorman <kevinogorm...@gmail.com> wrote:
> Okay, I followed some of the advice y'all gave and got some results.
>
> 1. The original problem was compromised by malformed input.  However, it
> appears that did not cause the wedging of the process.  See (3) below.

Where will the data come from?
From users? The Internet?

What I'm getting at is: you'll need to watch for malformed data in the
future as well, not just this once.
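Catching bad records up front is cheap compared to debugging a wedged load hours later. A minimal sketch in Python's sqlite3, assuming (purely for illustration) that each record is a 64-character position string over a three-symbol alphabet -- adjust the pattern to your real record format:

```python
import re
import sqlite3

# Hypothetical validator: assumes each input line is a 64-character
# position string over the alphabet {'.', 'x', 'o'}.  Substitute the
# real symbols and length for your data.
VALID = re.compile(r'^[.xo]{64}$')

def load_validated(db_path, lines):
    """Insert only well-formed lines; return the rest for inspection."""
    rejected = []
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS pos (p TEXT)")
    with con:  # one transaction for the whole batch
        for n, line in enumerate(lines, 1):
            line = line.rstrip('\n')
            if VALID.match(line):
                con.execute("INSERT INTO pos VALUES (?)", (line,))
            else:
                rejected.append((n, line))
    con.close()
    return rejected
```

Logging the rejects with their line numbers makes it easy to see whether bad input is a one-off or a systematic generator bug.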

>
> 2. I separated the steps, and started small.  Time increased slightly
> sub-linearly with dataset size, so I jumped to doing the whole thing.  With
> proper input, the data was loaded in 68 minutes.
>
> 3. The CREATE INDEX steps failed quickly (2 minutes), reporting "database
> or disk is full" which seemed odd since most of my partitions have much
> more free space than the entire database.  It turns out that whatever does
> the creation was using space on my root partition (this is Linux, so that
> means "/").  That's the only partition in my setup without a huge amount of
> free space.  One would expect temporary stuff to go to /tmp (which has 3TB
> free), but it doesn't go there.  It would go there if the system's native
> "sort" program were used.  Fortunately, it turns out that the TMPDIR
> environment variable is honored, but while I could see space was being
> used, there were no files visible.  I take that to mean that the tmpfile()
> function (or equivalent) was used.  This could be a bad idea for large
> indexes because anonymous files have to be kept open, and there's a limit
> on the number of files that can be open at a time, around 1,000.  Sure
> enough, the index creation appears to be wedged like the original run, and
> after a few hours I killed it manually.  This is a deal-killer.
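For what it's worth, steering those temp files from a script looks roughly like this (a sketch; the paths and row counts are stand-ins for your 3 TB partition and billion-row table). On Unix, SQLite consults SQLITE_TMPDIR, then TMPDIR, when it needs scratch space for the external sort behind CREATE INDEX:

```python
import os
import sqlite3
import tempfile

# Point SQLite's scratch files at a filesystem with room to spare.
# Set the variable before the first temp file is opened;
# tempfile.gettempdir() here stands in for your big partition.
os.environ['SQLITE_TMPDIR'] = tempfile.gettempdir()

db_path = os.path.join(tempfile.mkdtemp(), 'corpus.db')
con = sqlite3.connect(db_path)
con.execute("CREATE TABLE pos (p TEXT)")
con.executemany("INSERT INTO pos VALUES (?)",
                [(str(i),) for i in range(1000)])
con.commit()
# The merge sort behind CREATE INDEX writes its runs to the temp
# directory chosen above once they outgrow the page cache.
con.execute("CREATE INDEX pos_idx ON pos(p)")
```

The files are unlinked immediately after creation (which is why you saw the space used but no names), so `df` rather than `ls` is the way to watch it work.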

The failure you saw: did it happen on the table with the complete data
set, or did you get it while experimenting?

>
> So the questions are: Where do bug reports go?  I seem to be running 3.8.2;
> is this fixed in any later version?

You can try the "3.14" pre-release right now. ;-)

Thank you.

>
>
> On Thu, Aug 4, 2016 at 9:27 AM, Kevin O'Gorman <kevinogorm...@gmail.com>
> wrote:
>
>> The metric for feasibility is coding ease, not runtime.  I'm the
>> bottleneck, not the machine, at least at this point.
>>
>> As for adding rows, it will be about like this time: a billion or so at a
>> time.  But there's no need to save the old data.  Each round can be
>> separate except for a persistent "solutions" table of much more modest
>> size.  I've been doing this for a while now, and the solutions file has
>> only 10 million or so lines, each representing a game position for which
>> optimum moves are known.  Getting this file to include the starting
>> position is the point of the exercise.
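One way to keep each round separate while the solutions table persists is to give every round its own database file and ATTACH it next to the persistent one; the round file can then be deleted wholesale. A sketch (file names, columns, and the placeholder move value are all illustrative):

```python
import os
import sqlite3
import tempfile

# Persistent solutions database plus a disposable per-round database.
workdir = tempfile.mkdtemp()
sol_path = os.path.join(workdir, 'solutions.db')
round_path = os.path.join(workdir, 'round1.db')

con = sqlite3.connect(sol_path)
con.execute("CREATE TABLE IF NOT EXISTS solutions"
            " (pos TEXT PRIMARY KEY, best_move TEXT)")
con.execute("ATTACH DATABASE ? AS round", (round_path,))
con.execute("CREATE TABLE round.positions (pos TEXT)")
con.execute("INSERT INTO round.positions VALUES ('p1')")
# Promote solved positions into the persistent table, then discard
# the round database when the round is done.
con.execute("INSERT OR IGNORE INTO solutions"
            " SELECT pos, 'tbd' FROM round.positions")
con.commit()
con.execute("DETACH DATABASE round")
con.close()
os.remove(round_path)
```

This keeps the modest solutions table compact and indexed while each billion-row round lives and dies in its own file.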
>>
>> If I ever get to anything like "production" in this project, I expect it
>> to run for maybe three years...  That's after I tweak it for speed.
>>
>> Background: in production, this will be running on a dual-Xeon with 16
>> cores (32 hyperthreads) and 1/4 TiB RAM.  It has sequential file update
>> through Linux flock() calls at the moment.  The code is bash gluing
>> together a collection of UNIX utilities and some custom C code.  The C is
>> kept as simple as possible, to minimize errors.
>>
>> As you may surmise, this "hobby" is important to me.
>>
>>
>> On Thu, Aug 4, 2016 at 9:09 AM, R Smith <rsm...@rsweb.co.za> wrote:
>>
>>>
>>>
>>> On 2016/08/04 5:56 PM, Kevin O'Gorman wrote:
>>>
>>>> On Thu, Aug 4, 2016 at 8:29 AM, Dominique Devienne <ddevie...@gmail.com>
>>>> wrote:
>>>>
>>>>
>>>> It's even less dense than that.  Each character has only 3 possible
>>>> values,
>>>> and thus it's pretty easy to compress down to 2 bits each, for a 16 byte
>>>> blob.
>>>> It's just hard to do that without a bunch of SQLite code I'd have to
>>>> learn
>>>> how to write.  The current effort amounts to a feasibility study, and I
>>>> want
>>>> to keep it as simple as possible.
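For the record, the 2-bits-per-character packing doesn't need any SQLite code at all; it can happen before the blob is inserted. A sketch, again assuming a 64-character position over an illustrative '.', 'x', 'o' alphabet (64 characters at 2 bits each is exactly the 16-byte blob mentioned above):

```python
# Pack a 64-character position over a 3-symbol alphabet into a
# 16-byte blob, two bits per character.  The alphabet here is a
# stand-in -- substitute the real symbols.
CODE = {'.': 0, 'x': 1, 'o': 2}
SYM = {v: k for k, v in CODE.items()}

def pack(position):
    """64 chars -> 128 bits -> 16 bytes, first char in the high bits."""
    assert len(position) == 64
    n = 0
    for ch in position:
        n = (n << 2) | CODE[ch]
    return n.to_bytes(16, 'big')

def unpack(blob):
    """Inverse of pack: 16-byte blob back to the 64-char string."""
    n = int.from_bytes(blob, 'big')
    return ''.join(SYM[(n >> shift) & 3]
                   for shift in range(126, -2, -2))
```

Because the packing preserves lexicographic order, an index on the blob column sorts the same way as an index on the raw strings would.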
>>>>
>>>
>>> A feasibility study using equipment that is hamstrung by weights it
>>> won't have in the real situation is not an accurate study.
>>>
>>> It's like studying fuel consumption on a different kind of road surface,
>>> but for the test purposes, the cars had to tow caravans containing their
>>> testing equipment - the study will not look feasible at all.
>>>
>>> It might of course be that the feasibility you are studying is completely
>>> unrelated to the data handling - in which case the point is moot.
>>>
>>> Let us know how it goes :)
>>> Ryan
>>>
>>> _______________________________________________
>>> sqlite-users mailing list
>>> sqlite-users@mailinglists.sqlite.org
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>>
>>
>>
>>
>> --
>> #define QUESTION ((bb) || (!bb)) /* Shakespeare */
>>
>
>
>
> --
> #define QUESTION ((bb) || (!bb)) /* Shakespeare */