On Fri, Aug 5, 2016 at 12:30 PM, Igor Korot <ikoro...@gmail.com> wrote:
> Hi, Kevin,
>
> On Fri, Aug 5, 2016 at 3:18 PM, Kevin O'Gorman <kevinogorm...@gmail.com>
> wrote:
> > Okay, I followed some of the advice y'all gave and got some results.
> >
> > 1. The original problem was compromised by malformed input. However, it
> > appears that did not cause the wedging of the process. See (3) below.
>
> Where will the data come from? From the user? The Internet?
>
> What I'm getting at is: you will need to watch for malformed data in the
> future as well.

I generate it. I goofed, and I'll try not to goof in the future.

> > 2. I separated the steps, and started small. Time increased slightly
> > sub-linearly with dataset size, so I jumped to doing the whole thing.
> > With proper input, the data was loaded in 68 minutes.
> >
> > 3. The CREATE INDEX steps failed quickly (2 minutes), reporting "database
> > or disk is full", which seemed odd since most of my partitions have much
> > more free space than the entire database. It turns out that whatever does
> > the creation was using space on my root partition (this is Linux, so that
> > means "/"). That's the only partition in my setup without a huge amount
> > of free space. One would expect temporary stuff to go to /tmp (which has
> > 3 TB free), but it doesn't go there. It would go there if the system's
> > native "sort" program were used. Fortunately, it turns out that the
> > TMPDIR environment variable is honored, but while I could see space being
> > used, there were no files visible. I take that to mean that the tmpfile()
> > function (or equivalent) was used. This could be a bad idea for large
> > indexes, because anonymous files have to be kept open, and there's a
> > limit on the number of files that can be open at a time, around 1,000.
> > Sure enough, the index creation appears to be wedged like the original
> > run, and after a few hours I killed it manually. This is a deal-killer.
>
> The failure you saw - is it on the table with the complete data set?
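[The "space being used but no files visible" symptom in point 3 is exactly what an immediately-unlinked temporary file produces, and the "around 1,000" ceiling is the per-process open-file limit. A minimal C sketch of both mechanisms follows; it illustrates the inference in the message above, not SQLite's actual temp-file code:]

```c
#include <stdio.h>
#include <sys/resource.h>

/* Demonstrate an "anonymous" temp file, then report the per-process
 * open-file soft limit. tmpfile() creates a file and unlinks it at
 * once: it consumes disk space while open, but has no directory entry,
 * so it never shows up in ls. Returns the limit, or -1 on error. */
long open_file_limit_demo(void) {
    FILE *anon = tmpfile();
    if (anon == NULL)
        return -1;
    fputs("scratch data\n", anon);  /* takes space, yet invisible */
    fclose(anon);                   /* space reclaimed automatically */

    /* Each such anonymous file pins an open file descriptor, so the
     * real ceiling is RLIMIT_NOFILE (commonly 1024): */
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    return (long)rl.rlim_cur;
}
```

[The soft limit can be raised with `ulimit -n` in the shell, up to the hard limit, which may be worth trying before calling this a deal-killer.]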
> Or did you get it while experimenting?

Only on the complete data set.

> > So the questions are: Where do bug reports go? I seem to be running
> > 3.8.2; is this fixed in any later version?
>
> You can try the "3.14" pre-release right now. ;-)

Meh. I submitted a bug report to this list. I'll see what happens.
Thank you.

> > On Thu, Aug 4, 2016 at 9:27 AM, Kevin O'Gorman <kevinogorm...@gmail.com>
> > wrote:
> >
> >> The metric for feasibility is coding ease, not runtime. I'm the
> >> bottleneck, not the machine, at least at this point.
> >>
> >> As for adding rows, it will be about like this time: a billion or so at
> >> a time. But there's no need to save the old data. Each round can be
> >> separate, except for a persistent "solutions" table of much more modest
> >> size. I've been doing this for a while now, and the solutions file has
> >> only 10 million or so lines, each representing a game position for which
> >> optimum moves are known. Getting this file to include the starting
> >> position is the point of the exercise.
> >>
> >> If I ever get to anything like "production" in this project, I expect it
> >> to run for maybe three years... That's after I tweak it for speed.
> >>
> >> Background: in production, this will be running on a dual-Xeon with 16
> >> cores (32 hyperthreads) and 1/4 TiB RAM. It has sequential file update
> >> through Linux flock() calls at the moment. The code is bash gluing
> >> together a collection of UNIX utilities and some custom C code. The C is
> >> kept as simple as possible, to minimize errors.
> >>
> >> As you may surmise, this "hobby" is important to me.
> >>
> >> On Thu, Aug 4, 2016 at 9:09 AM, R Smith <rsm...@rsweb.co.za> wrote:
> >>
> >>> On 2016/08/04 5:56 PM, Kevin O'Gorman wrote:
> >>>
> >>>> On Thu, Aug 4, 2016 at 8:29 AM, Dominique Devienne
> >>>> <ddevie...@gmail.com> wrote:
> >>>>
> >>>> It's even less dense than that.
> >>>> Each character has only 3 possible values, and thus it's pretty easy
> >>>> to compress down to 2 bits each, for a 16-byte blob. It's just hard
> >>>> to do that without a bunch of SQLite code I'd have to learn how to
> >>>> write. The current effort amounts to a feasibility study, and I want
> >>>> to keep it as simple as possible.
> >>>
> >>> A feasibility study using equipment that is hamstrung by weights it
> >>> won't have in the real situation is not an accurate study.
> >>>
> >>> It's like studying fuel consumption on a different kind of road
> >>> surface, but for the test purposes, the cars had to tow caravans
> >>> containing their testing equipment - the study will not look feasible
> >>> at all.
> >>>
> >>> It might, of course, be that the feasibility you are studying is
> >>> completely unrelated to the data handling - in which case the point
> >>> is moot.
> >>>
> >>> Let us know how it goes :)
> >>> Ryan
> >>>
> >>> _______________________________________________
> >>> sqlite-users mailing list
> >>> sqlite-users@mailinglists.sqlite.org
> >>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

--
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
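[The 2-bits-per-character packing described above can be sketched in plain C without any SQLite code. The 64-cell position length is implied by the 16-byte blob at 2 bits each; the '.'/'x'/'o' alphabet is an assumption for illustration, since the thread does not name the three symbols:]

```c
#include <stdint.h>
#include <string.h>

/* Pack a 64-character position string over an assumed 3-symbol
 * alphabet ('.', 'x', 'o') into a 16-byte blob, 2 bits per cell.
 * Returns 0 on success, -1 on malformed input. */
int pack_position(const char *pos, uint8_t out[16]) {
    memset(out, 0, 16);
    for (int i = 0; i < 64; i++) {
        uint8_t v;
        switch (pos[i]) {
            case '.': v = 0; break;
            case 'x': v = 1; break;
            case 'o': v = 2; break;
            default:  return -1;       /* reject malformed data early */
        }
        /* Four cells per byte; cell i lands in bits 2*(i%4)..2*(i%4)+1. */
        out[i / 4] |= (uint8_t)(v << (2 * (i % 4)));
    }
    return 0;
}

/* Inverse: expand a 16-byte blob back to a NUL-terminated 65-char buffer. */
void unpack_position(const uint8_t in[16], char pos[65]) {
    static const char sym[4] = ".xo?";  /* '?' is never produced by pack */
    for (int i = 0; i < 64; i++)
        pos[i] = sym[(in[i / 4] >> (2 * (i % 4))) & 3];
    pos[64] = '\0';
}
```

[A blob column holding the 16 packed bytes also indexes faster than a 64-character text column, simply because there is a quarter as much data to compare and sort.]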
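[The flock()-based sequential file update mentioned in the background can be sketched as below. This is not the poster's actual code; the path and record format are made up, and the point is only the lock-append-unlock pattern that serializes writers on Linux:]

```c
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Append one line to a shared results file, serialized across worker
 * processes with an advisory exclusive lock. flock(LOCK_EX) blocks
 * until every other holder releases the lock, so concurrent appenders
 * take turns instead of interleaving partial lines.
 * Returns 0 on success, -1 on error. */
int append_line(const char *path, const char *line) {
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) != 0) {   /* wait for exclusive access */
        close(fd);
        return -1;
    }
    ssize_t n = write(fd, line, strlen(line));
    flock(fd, LOCK_UN);              /* release before closing */
    close(fd);
    return (n == (ssize_t)strlen(line)) ? 0 : -1;
}
```

[Note that flock() locks are advisory: they only serialize processes that also call flock(), which is fine when all writers are your own bash/C tools.]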