Hi, For my use case I need to index some texts, and I also need be able to assign attributes to sub-sequences/spans of tokens in the texts. I want to be able to search only for keywords/phrases that occur in spans with a certain attribute. As an example, imagine we have a set of rich text documents and we want to find the locations of the "SQLite rocks" phrase in that set, but not just any instance of it- only those that have an attribute "bold".
The general idea, I'm considering at the moment is as follows: 1) create a table 'spans' with columns (doc_id, attrib_id, start_tok, end_tok), where doc_id is equal to the corresponding rowid in the FTS table, and start_tok and end_tok are the 0-based offsets that delimit a span 2) issue a match (sub?)query against the FTS table and obtain a list of (rowid, list-of-offsets in that doc) 3) somehow convert the above results into (rowid, start_offset, end_offset) for each entry of the list-of-offsets 4) join the results from 3) with the "spans" table on "rowid=spans.doc_id" where "spans.attrib_id=X and start_offset >= spans.start_pos and end_offset <= spans.end_pos" Do you think this general approach makes sense, and how would you approach the problem if not? I only have a cursory knowledge of both SQL and SQLite at this point, so it's quite possible I'm missing something obvious. On the implementation side: - as far as I understand FTS5 has some clear advantages over FTS3/4. Apparently there isn't readily available offsets() function in FTS5 yet, but the API should make writing one rather straightforward - step 3) is what I wonder about the most... What would be a good way to convert a (doc_id, <list-of-offsets>) row into (doc_id, start_offset, end_offset) tuples for every entry in the list? I'd guess I will have to implement some sort of virtual table(a.k.a. "table-valued function")? Perhaps a table-valued function that takes a FTS query as its parameter that it then uses to obtain (doc_id, <list-of-offsets>), and expose each hit as (doc_id, start_offset, end_offset) rows in the virtual table? Any thoughts and ideas would be greatly appreciated. Thank you! _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users