Hi, Kevin,

On Fri, Aug 5, 2016 at 3:18 PM, Kevin O'Gorman <kevinogorm...@gmail.com> wrote:
> Okay, I followed some of the advice y'all gave and got some results.
>
> 1. The original problem was compromised by malformed input. However, it
> appears that did not cause the wedging of the process. See (3) below.
Where will the data come from? The user? The Internet? What I'm getting at is: you will need to watch for malformed data in the future as well.

> 2. I separated the steps, and started small. Time increased slightly
> sub-linearly with dataset size, so I jumped to doing the whole thing. With
> proper input, the data was loaded in 68 minutes.
>
> 3. The CREATE INDEX steps failed quickly (2 minutes), reporting "database
> or disk is full", which seemed odd since most of my partitions have much
> more free space than the entire database. It turns out that whatever does
> the creation was using space on my root partition (this is Linux, so that
> means "/"). That's the only partition in my setup without a huge amount of
> free space. One would expect temporary stuff to go to /tmp (which has 3 TB
> free), but it doesn't go there. It would go there if the system's native
> "sort" program were used. Fortunately, it turns out that the TMPDIR
> environment variable is honored, but while I could see space being
> used, there were no files visible. I take that to mean that the tmpfile()
> function (or equivalent) was used. This could be a bad idea for large
> indexes because anonymous files have to be kept open, and there's a limit
> on the number of files that can be open at a time, around 1,000. Sure
> enough, the index creation appears to be wedged like the original run, and
> after a few hours I killed it manually. This is a deal-killer.

The failure you saw - is it on the table with the complete data set, or did you get it during the experimenting?

> So the questions are: Where do bug reports go? I seem to be running 3.8.2;
> is this fixed in any later version?

You can try the "3.14" pre-release one right now. ;-)

Thank you.

> On Thu, Aug 4, 2016 at 9:27 AM, Kevin O'Gorman <kevinogorm...@gmail.com>
> wrote:
>
>> The metric for feasibility is coding ease, not runtime. I'm the
>> bottleneck, not the machine, at least at this point.
>> As for adding rows, it will be about like this time: a billion or so at a
>> time. But there's no need to save the old data. Each round can be
>> separate except for a persistent "solutions" table of much more modest
>> size. I've been doing this for a while now, and the solutions file has
>> only 10 million or so lines, each representing a game position for which
>> optimum moves are known. Getting this file to include the starting
>> position is the point of the exercise.
>>
>> If I ever get to anything like "production" in this project, I expect it
>> to run for maybe three years... That's after I tweak it for speed.
>>
>> Background: in production, this will be running on a dual-Xeon with 16
>> cores (32 hyperthreads) and 1/4 TiB RAM. It has sequential file update
>> through Linux flock() calls at the moment. The code is bash gluing
>> together a collection of UNIX utilities and some custom C code. The C is
>> kept as simple as possible, to minimize errors.
>>
>> As you may surmise, this "hobby" is important to me.
>>
>> On Thu, Aug 4, 2016 at 9:09 AM, R Smith <rsm...@rsweb.co.za> wrote:
>>
>>> On 2016/08/04 5:56 PM, Kevin O'Gorman wrote:
>>>
>>>> On Thu, Aug 4, 2016 at 8:29 AM, Dominique Devienne <ddevie...@gmail.com>
>>>> wrote:
>>>>
>>>> It's even less dense than that. Each character has only 3 possible
>>>> values, and thus it's pretty easy to compress down to 2 bits each,
>>>> for a 16-byte blob. It's just hard to do that without a bunch of
>>>> SQLite code I'd have to learn how to write. The current effort
>>>> amounts to a feasibility study, and I want to keep it as simple as
>>>> possible.
>>>
>>> A feasibility study using equipment that is hamstrung by weights it
>>> won't have in the real situation is not an accurate study.
>>>
>>> It's like studying fuel consumption on a different kind of road surface,
>>> but for the test purposes, the cars had to tow caravans containing their
>>> testing equipment - the study will not look feasible at all.
>>>
>>> It might of course be that the feasibility you are studying is completely
>>> unrelated to the data handling - in which case the point is moot.
>>>
>>> Let us know how it goes :)
>>> Ryan
>>>
>>> _______________________________________________
>>> sqlite-users mailing list
>>> sqlite-users@mailinglists.sqlite.org
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>> --
>> #define QUESTION ((bb) || (!bb)) /* Shakespeare */