Joe Wilson <[EMAIL PROTECTED]> wrote:
>
> As for the stats from sqlite3_analyzer, they seem to be in the right ballpark.
> I'm not sure its heuristic accounts for rows that are significantly larger
> than the page size, though. In such cases I am seeing higher than expected
> fragmentation
Joe Wilson <[EMAIL PROTECTED]> wrote:
>
> See also: Changes to support fragmentation analysis in sqlite3_analyzer.
> http://www.sqlite.org/cvstrac/chngview?cn=3634
>
I'm not really sure those patches are working right.
I need to revisit that whole fragmentation analysis
thing before the next release.
P Kishor wrote:
Mac/Unix person meself, but the Windows XP sort is pretty darn good as
well.
I'll take a look. Last time I used it, it was useless (Win9x days?). These
days (especially for a one-off) I'd probably go straight to doing it in
Python to avoid cross-platform syntax issues.
Martin
Ah yes, I should read more carefully :)
Thanks, right: I was originally guaranteeing uniqueness by just fetching
and then inserting only if there wasn't a match (I needed a rowid if the
row existed anyway). Now I'm guaranteeing uniqueness by letting sort do
the work for me.
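For context, a minimal sketch of that original fetch-then-insert approach, using Python's sqlite3 module and the rawfen table that appears later in the thread (the helper name get_or_insert is made up):

    import sqlite3

    conn = sqlite3.connect("fen.db")
    conn.execute("CREATE TABLE IF NOT EXISTS rawfen ( fen VARCHAR(80) )")
    conn.execute("CREATE INDEX IF NOT EXISTS rawfen_idx_fen ON rawfen (fen)")

    def get_or_insert(fen):
        """Return the rowid for fen, inserting the string only if it is not already present."""
        row = conn.execute("SELECT rowid FROM rawfen WHERE fen = ?", (fen,)).fetchone()
        if row is not None:
            return row[0]                     # already stored: reuse its rowid
        cur = conn.execute("INSERT INTO rawfen (fen) VALUES (?)", (fen,))
        return cur.lastrowid                  # rowid assigned to the new string

The per-row index lookup is what becomes painful at 112 million rows once the index no longer fits in cache, which is the thrashing discussed further down the thread.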
Chris Jones wrote:
I probably should have made this more explicit, but in SQLite every row has
a unique identifier named rowid, which exists even if it isn't explicitly
declared in the schema, and I was depending on that. If you declare an
INTEGER PRIMARY KEY, it becomes an alias for rowid.
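A tiny demonstration of that implicit rowid, as a sketch using Python's sqlite3 module (the inserted value is made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rawfen ( fen VARCHAR(80) )")          # no key declared at all
    conn.execute("INSERT INTO rawfen (fen) VALUES ('some position string')")

    # rowid exists anyway, even though the schema never mentions it;
    # a column declared INTEGER PRIMARY KEY would simply be an alias for it.
    print(conn.execute("SELECT rowid, fen FROM rawfen").fetchall())   # [(1, 'some position string')]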
Derrell.Lipman wrote:
>
> Chris Jones <[EMAIL PROTECTED]> writes:
>
> I don't think that your original solution solves that problem either. You
> first posted this schema:
>
>> My schema looks as follows:
>>
>> CREATE TABLE rawfen ( fen VARCHAR(80) );
>> CREATE INDEX rawfen_idx_fen ON rawfen (fen);
Chris Jones wrote:
So, I did a "sort -u -S 1800M fenout.txt > fenoutsort.txt"
The sort took about 45 minutes, which is acceptable for me (it was much
longer without the -S option to tell it to make use of more memory), and
then loading the table was very efficient. Inserting all the rows into m
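A sketch of that presort-then-load pipeline using Python's sqlite3 module, assuming fenoutsort.txt already exists from the sort -u step (one string per line, duplicates removed) and reusing the rawfen schema from the original post:

    import sqlite3

    conn = sqlite3.connect("fen.db")
    conn.execute("CREATE TABLE rawfen ( fen VARCHAR(80) )")

    # Insert in the file's (already sorted) order inside one transaction.
    with open("fenoutsort.txt", encoding="utf-8") as f, conn:
        conn.executemany(
            "INSERT INTO rawfen (fen) VALUES (?)",
            ((line.rstrip("\n"),) for line in f),
        )

    # Build the index after the bulk load so it is constructed in one pass
    # rather than being updated randomly for every insert.
    conn.execute("CREATE INDEX rawfen_idx_fen ON rawfen (fen)")
    conn.commit()
    conn.close()

Loading in sorted order means each new key lands next to the previous one in the b-tree, so the working set stays small even when the table itself is huge.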
--- Dennis Cote <[EMAIL PROTECTED]> wrote:
> You could also improve the locality in the database file further by
> running a vacuum command after it has been created. This will move the
> pages around so that the pages of the table are contiguous and so are
> the pages of the index.
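That is a single statement; a quick sketch of running it from Python's sqlite3 module (the file name is assumed):

    import sqlite3

    conn = sqlite3.connect("fen.db")
    # VACUUM rewrites the entire database file, leaving the pages of each table
    # and each index stored contiguously instead of interleaved.
    conn.execute("VACUUM")
    conn.close()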
A fast technique to achieve your objective is to perform what I believe
is called a "monkey puzzle" sort. The data is not moved; instead, an
array of descriptors to each element is sorted. The output is realized
by scanning the list of descriptors and picking up the associated record
from the original data.
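A sketch of that descriptor sort in Python (the sample records are invented): the records themselves stay in place and only a list of their indices is sorted.

    # Records stay where they are; we sort a list of indices ("descriptors") instead.
    records = [b"r1bqkbnr/pppp1ppp/...", b"8/8/8/8/8/8/8/8 w - -", b"4k3/4P3/4K3/..."]

    descriptors = sorted(range(len(records)), key=lambda i: records[i])

    # Realize the sorted output by walking the descriptors and fetching each record.
    for i in descriptors:
        print(records[i].decode())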
An issue with caches is cache shadowing: the churning as data is copied
from one cache to another, and then to another.
An example is the speed-up achieved on network accesses by using
sendfile or TransmitFile and bypassing up to four levels of buffering
for a message being despatched to a network interface.
On 3/22/07, Martin Jenkins <[EMAIL PROTECTED]> wrote:
Chris Jones wrote:
> realized that the unix "sort"
If I'd known you were on 'nix I'd have suggested using 'sort' and/or
'md5sum' about 12 hours ago. ;)
Mac/Unix person meself, but the Windows XP sort is pretty darn good as well.
Chris Jones wrote:
realized that the unix "sort"
If I'd known you were on 'nix I'd have suggested using 'sort' and/or
'md5sum' about 12 hours ago. ;)
Martin
You could sort the table then perform a merge which removes duplicates.
Chris Jones wrote:
I don't think that solves my problem. Sure, it guarantees that the IDs are
unique, but not the strings.
My whole goal is to be able to create a unique identifier for each string,
in such a way that I
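A sketch of the sort-then-merge deduplication suggested just above, in Python; heapq.merge combines already-sorted runs and groupby collapses the duplicates (the sample runs are made up):

    import heapq
    from itertools import groupby

    def merge_unique(*sorted_runs):
        """Merge already-sorted runs and yield each distinct value exactly once."""
        for value, _group in groupby(heapq.merge(*sorted_runs)):
            yield value

    # Two pre-sorted runs of strings merged into one duplicate-free stream.
    run_a = ["aaa", "abc", "abc", "zzz"]
    run_b = ["abc", "bbb", "zzz"]
    print(list(merge_unique(run_a, run_b)))   # ['aaa', 'abc', 'bbb', 'zzz']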
Thanks everyone for your feedback.
I ended up doing a presort on the data, and then adding the data in order.
At first I was a little concerned about how I was going to implement an
external sort on a data set that huge, and realized that the unix "sort"
command can handle large files, and in f
Chris Jones wrote:
I've read elsewhere that this is a data locality issue, which certainly
makes sense.
And in those threads, a suggestion has been made to insert in sorted order.
But it's unclear to me exactly what the sorting function would need to be -
it's likely my sorting function (say s
"P Kishor" <[EMAIL PROTECTED]> wrote:
> Richard,
>
> On 3/22/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> ...
> > The problem is that your working set is bigger than your cache
> > which is causing thrashing. I suggest a solution like this:
> >
> > Add entries to table ONE until the table a
Richard,
On 3/22/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
..
The problem is that your working set is bigger than your cache
which is causing thrashing. I suggest a solution like this:
Add entries to table ONE until the table and its unique index get
> > so big that they no longer fit in cache
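Piecing that suggestion together with the INSERT ... SELECT ... ORDER BY statement quoted later in the thread, here is a sketch of the staged two-table idea in Python; the batch threshold, the OR IGNORE conflict handling, and the exact table layouts are assumptions, not part of the original suggestion:

    import sqlite3

    conn = sqlite3.connect("fen.db")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS one ( fen VARCHAR(80) UNIQUE );
        CREATE TABLE IF NOT EXISTS two ( fen VARCHAR(80) UNIQUE );
    """)

    BATCH = 1_000_000        # assumed stand-in for "still fits in cache"
    pending = 0              # rows currently sitting in the small staging table

    def flush():
        """Drain the small table into the big one in key order, then empty it."""
        global pending
        with conn:
            # The ORDER BY is the important part: table two's index grows sequentially.
            conn.execute("INSERT OR IGNORE INTO two SELECT fen FROM one ORDER BY fen")
            conn.execute("DELETE FROM one")
        pending = 0

    def add(fen):
        """Stage one string; flush to the big table whenever the staging table fills up."""
        global pending
        with conn:
            conn.execute("INSERT OR IGNORE INTO one (fen) VALUES (?)", (fen,))
        pending += 1
        if pending >= BATCH:
            flush()

Each flush updates the big index in sorted key order, which is far more cache-friendly than 112 million random single-row inserts.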
Gerry Snyder <[EMAIL PROTECTED]> wrote:
> Chris Jones wrote:
> Hi all,
> I have a very simple schema. I need to assign a unique identifier
> to a large collection of strings, each at most 80-bytes, although
> typically shorter.
Would it help to hash the strings, then save them in the DB, checki
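A sketch of that hashing idea in Python (the digest choice, table, and column names are assumptions, not from the thread): store a fixed-size digest of each string and let it serve as the unique key.

    import hashlib
    import sqlite3

    conn = sqlite3.connect("fen.db")
    conn.execute("CREATE TABLE IF NOT EXISTS hashed ( h BLOB PRIMARY KEY, fen TEXT )")

    def add(fen):
        # A 16-byte MD5 digest is a compact, effectively unique key for an 80-byte string.
        digest = hashlib.md5(fen.encode("utf-8")).digest()
        with conn:
            conn.execute("INSERT OR IGNORE INTO hashed (h, fen) VALUES (?, ?)", (digest, fen))

This shrinks the indexed key to a fixed 16 bytes, at the cost of a vanishingly small (but nonzero) chance of collision; it is the same idea as the md5sum suggestion elsewhere in the thread.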
> drh wrote:
> INSERT INTO two SELECT * FROM one ORDER BY unique_column;
>The ORDER BY is important here.
This is an excerpt from the SQLite documentation:
The second form of the INSERT statement takes its data from a SELECT statement.
The number of columns in the result of the
Chris Jones wrote:
Hi all,
I have a very simple schema. I need to assign a unique identifier to a
large collection of strings, each at most 80-bytes, although typically
shorter.
The problem is I have 112 million of them.
Maybe you could start by breaking the data into 8 equal groups and make
You stated in your OP
I need to assign a unique identifier to a large collection of strings
Obviously I misunderstood that to mean you wanted the strings tagged
uniquely, not that the strings were unique. In your case, it seems, then,
that you will have to put up with checking each string, and as th
I don't think that solves my problem. Sure, it guarantees that the IDs are
unique, but not the strings.
My whole goal is to be able to create a unique identifier for each string,
in such a way that I don't have the same string listed twice, with different
identifiers.
In your solution, there i
Hi all,
I have a very simple schema. I need to assign a unique identifier to a
large collection of strings, each at most 80-bytes, although typically
shorter.
The problem is I have 112 million of them.
My schema looks as follows:
CREATE TABLE rawfen ( fen VARCHAR(80) );
CREATE INDEX rawfen_idx_fen ON rawfen (fen);