Hi,

First let me say that FTS3 is really awesome. This is my first experience with FTS, and it works very nicely with the Porter stemmer.
My particular use for FTS is not document text but addresses, and it would be very useful if there were a way to analyze the FTS index to get statistics on the keys. I could then use this information to write a custom parser/stemmer that eliminates stop words. For example, Rd, Road, St, Street, etc. would be overrepresented and not very discriminating, so these could be removed.

Ideally the list would be generated by loading the data, then analyzing the index, then updating the stemmer to remove the new stop words, and analyzing and adjusting again if needed.

Is this possible? How? If I had to code this, where would I start? I would like to get a list of the keys and a count of how many rows each key appears in. I assume a token that appears multiple times in a document is represented by a list of offsets, so I should also be able to get a count of the number of times it shows up in each document somehow. I think I have figured this much out by reading all the posts on FTS in the archive.

Thanks,
-Steve
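P.S. To make the question concrete, here is a rough sketch in Python of the statistics I am after, computed directly from the raw rows instead of from the FTS index. Everything in it is an assumption made for illustration: the simple regex tokenizer (a stand-in for the real FTS tokenizer, with no stemming), the function names, the sample addresses, and the 0.5 cutoff for calling a token a stop word candidate.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (stand-in for the FTS tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_stats(rows):
    """Return (doc_freq, term_freq).

    doc_freq[t]  = number of rows that contain token t at least once
    term_freq[t] = total number of occurrences of t across all rows
    """
    doc_freq = Counter()
    term_freq = Counter()
    for text in rows:
        tokens = tokenize(text)
        term_freq.update(tokens)       # count every occurrence
        doc_freq.update(set(tokens))   # count each row at most once
    return doc_freq, term_freq


def stop_word_candidates(rows, max_fraction=0.5):
    """Tokens that appear in more than max_fraction of the rows."""
    doc_freq, _ = token_stats(rows)
    limit = max_fraction * len(rows)
    return sorted(t for t, n in doc_freq.items() if n > limit)


# Made-up sample data, not real addresses.
addresses = [
    "12 Main St",
    "40 Oak St",
    "7 Elm Rd",
    "99 Main St Apt 4",
]

doc_freq, term_freq = token_stats(addresses)
print(doc_freq["st"], term_freq["main"])
print(stop_word_candidates(addresses))
```

The idea would be to run something like this after each load, add the candidates to the stop list used by the custom tokenizer, and repeat until the list stops changing. What I would really like is to pull the same two counts out of the FTS index itself.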