On Thu, 28 Jan 2016 19:59:02 +0100 Dominique Devienne <ddevienne at gmail.com> wrote:
> Not sure to understand what you expect this pragma to do, but inserts > typically don't involve automatic indexes, which are used only in queries > (selects). I though it was responsible for the `sqlite_autoindex_"table_name"_N` which gets generated and augments the DB size. That's the name and the documentation which makes me believe this. I can see these index in sqlitebrowser and at the file size. > Your insert does have a query, but it fits the PK, so no automatic index is > needed. (and you can likely use AUTOINCREMENT to avoid that query in the > first place). "count" is not a primary key, and AUTOINCREMENT may be used only on primary key. It indeed does not need a primary key, a unique constraint is enough. Whether or not I define a PK or a unique constraint, it ends into the same speed and DB size. I just wanted to test if it differs when a PK is defined (it does not). > The 2.4x size difference is kinda expected, since your PK includes your > biggest column I suspect (i.e. value), so both the table, and its PK index > most store it, doubling at least the memory, and with page waste, clearly > can account for the difference. > > And the time difference can be in-part at least attributed to the increased > IO. That was my conclusion too, and that's why I would like to prevent the creation of this extraneous index. I can achieve it using `WITHOUT ROWID`, and not with the pragma, and that's what I don't understand. So, I can achieve it, just that it's in a way I don't understand, which give me the feeling I'm using a trick and I don't like it. Unless ROWID is indeed what's responsible for the creation of these extraneous indexes?? > It seems unusual to use the HTML text in the PK. Are you sure you need it? It's short texts (one to ten characters for an average), just element name, attribute name and attribute value, unique triplets with a count of their occurences. There are only a few attribute values which may be a bit long, the href of <a> elements. I must say I simplified my explanations, it's not really the index, it's a prior step which makes some statistics which are later used to decide what elements and attributes will be focused on to index the contents. The text the final index stores is reasonably short title-like texts. -- Yannick Duch?ne