On Fri, Aug 5, 2016 at 12:30 PM, Igor Korot <ikoro...@gmail.com> wrote:

> Hi, Kevin,
>
> On Fri, Aug 5, 2016 at 3:18 PM, Kevin O'Gorman <kevinogorm...@gmail.com>
> wrote:
> > Okay, I followed some of the advice y'all gave and got some results.
> >
> > 1. The original problem was compromised by malformed input.  However, it
> > appears that did not cause the wedging of the process.  See (3) below.
>
> Where will the data come from?
> From the user? The Internet?
>
> What I'm getting at is: you need to check for malformed data in
> the future as well.
>

I generate it.  I goofed, and I'll try not to goof in the future.


>
> >
> > 2. I separated the steps, and started small.  Time increased slightly
> > sub-linearly with dataset size, so I jumped to doing the whole thing.
> > With proper input, the data was loaded in 68 minutes.
> >
> > 3. The CREATE INDEX steps failed quickly (2 minutes), reporting "database
> > or disk is full", which seemed odd since most of my partitions have much
> > more free space than the entire database.  It turns out that whatever
> > does the creation was using space on my root partition (this is Linux, so
> > that means "/").  That's the only partition in my setup without a huge
> > amount of free space.  One would expect temporary stuff to go to /tmp
> > (which has 3TB free), but it doesn't go there.  It would go there if the
> > system's native "sort" program were used.  Fortunately, it turns out that
> > the TMPDIR environment variable is honored, but while I could see space
> > was being used, there were no files visible.  I take that to mean that
> > the tmpfile() function (or equivalent) was used.  This could be a bad
> > idea for large indexes because anonymous files have to be kept open, and
> > there's a limit on the number of files that can be open at a time, around
> > 1,000.  Sure enough, the index creation appears to be wedged like the
> > original run, and after a few hours I killed it manually.  This is a
> > deal-killer.
>
> The failure you saw: is it on the table with the complete data set?
> Or did you get it during the experimenting?
>

Only on the complete data set.


> >
> > So the questions are: Where do bug reports go?  I seem to be running
> > 3.8.2; is this fixed in any later version?
>
> You can try the "3.14" pre-released one right now. ;-)
>

Meh.  I submitted a bug report to this list.  I'll see what happens.


> Thank you.
>
> >
> >
> > On Thu, Aug 4, 2016 at 9:27 AM, Kevin O'Gorman <kevinogorm...@gmail.com>
> > wrote:
> >
> >> The metric for feasibility is coding ease, not runtime.  I'm the
> >> bottleneck, not the machine, at least at this point.
> >>
> >> As for adding rows, it will be about like this time: a billion or so at
> >> a time.  But there's no need to save the old data.  Each round can be
> >> separate except for a persistent "solutions" table of much more modest
> >> size.  I've been doing this for a while now, and the solutions file has
> >> only 10 million or so lines, each representing a game position for which
> >> optimum moves are known.  Getting this file to include the starting
> >> position is the point of the exercise.
> >>
> >> If I ever get to anything like "production" in this project, I expect it
> >> to run for maybe three years...  That's after I tweak it for speed.
> >>
> >> Background: in production, this will be running on a dual-Xeon with 16
> >> cores (32 hyperthreads) and 1/4 TiB RAM.  It has sequential file update
> >> through Linux flock() calls at the moment.  The code is bash gluing
> >> together a collection of UNIX utilities and some custom C code.  The C
> >> is kept as simple as possible, to minimize errors.
> >>
> >> As you may surmise, this "hobby" is important to me.
> >>
> >>
> >> On Thu, Aug 4, 2016 at 9:09 AM, R Smith <rsm...@rsweb.co.za> wrote:
> >>
> >>>
> >>>
> >>> On 2016/08/04 5:56 PM, Kevin O'Gorman wrote:
> >>>
> >>>> On Thu, Aug 4, 2016 at 8:29 AM, Dominique Devienne <ddevie...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>
> >>>> It's even less dense than that.  Each character has only 3 possible
> >>>> values, and thus it's pretty easy to compress down to 2 bits each, for
> >>>> a 16-byte blob.  It's just hard to do that without a bunch of SQLite
> >>>> code I'd have to learn how to write.  The current effort amounts to a
> >>>> feasibility study, and I want to keep it as simple as possible.
> >>>>
> >>>
> >>> A feasibility study using equipment that is hamstrung by weights it
> >>> won't have in the real situation is not an accurate study.
> >>>
> >>> It's like studying fuel consumption on a different kind of road
> >>> surface, but for the test purposes, the cars had to tow caravans
> >>> containing their testing equipment: the study will not look feasible
> >>> at all.
> >>>
> >>> It might of course be that the feasibility you are studying is
> >>> completely unrelated to the data handling, in which case the point is
> >>> moot.
> >>>
> >>> Let us know how it goes :)
> >>> Ryan
> >>>
> >>> _______________________________________________
> >>> sqlite-users mailing list
> >>> sqlite-users@mailinglists.sqlite.org
> >>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >>>
> >>
> >>
> >>
> >> --
> >> #define QUESTION ((bb) || (!bb)) /* Shakespeare */
> >>
> >
>


