Re: Fulltext matching

Ilya Kasnacheev Tue, 11 Sep 2018 02:08:49 -0700

Hello!

The only way to know if it will be accepted is to file those tickets and
pull requests (and then write about it on the developers list).


Regards,
-- 
Ilya Kasnacheev


On Tue, Sep 11, 2018 at 0:04, Courtney Robinson <courtney.robin...@hypi.io> wrote:

> Hi,
> Thanks for the response.
> I went ahead and implemented a custom indexing SPI. Works like a charm. As
> long as Ignite doesn't drop support for the indexing SPI interface this is
> exactly what we need.
> I'm happy to create Jira issues and extract this into something more
> generic for upstream if it'll be accepted.
>
> Regards,
> Courtney Robinson
> CTO, Hypi
> Tel: +4402032870961 (GMT+0)
> https://hypi.io
>
>
> On Thu, Sep 6, 2018 at 4:09 PM Ilya Kasnacheev <ilya.kasnach...@gmail.com>
> wrote:
>
>> Hello!
>>
>> Unfortunately, fulltext doesn't seem to have much traction, so I
>> recommend investigating on your side, possibly creating JIRA issues
>> in the process.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> On Mon, Sep 3, 2018 at 22:34, Courtney Robinson <courtney.robin...@hypi.io> wrote:
>>
>>> Hi,
>>>
>>> We've got Ignite in production and decided to start using some fulltext
>>> matching as well.
>>> I've investigated and can't figure out why my queries are not matching.
>>>
>>> I construct a query entity e.g. new QueryEntity(keyClass, valueClass) and
>>> in debug I can see it generates a list of fields
>>> e.g. a, b, c.a, c.b
>>> I then expected to be able to match on those fields that are marked as
>>> indexed. Everything is annotation driven. The appropriate fields have been
>>> annotated and appear to be detected as such
>>> when I inspect what gets put into the QueryEntityDescriptor. i.e. all
>>> expected indices and indexed fields are present.
>>>
>>> In GridLuceneIndex I see that the Lucene document is generated with fields
>>> a, b (c.a and c.b are not included). Now a couple of questions arise:
>>>
>>> 1. Is there a way to get Ignite to index the nested fields as well so
>>> that c.a and c.b end up in the doc?
>>>
>>> 2. If you use a composite object as a key, its fields are extracted into
>>> the top level, so if you have Key.a and Value.a you cannot index both, since
>>> Key.a becomes a, which collides with Value.a. Can this be changed, or are
>>> there any known reasons why it couldn't be? (I'm happy to send a PR
>>> doing so, but I suspect the answer to this is linked to the answer to the
>>> first question.)
>>>
>>> 3. The docs simply say you can use Lucene syntax. I presume that means the
>>> syntax described at
>>> https://lucene.apache.org/core/2_9_4/queryparsersyntax.html is all
>>> valid. Checking the code, that appears to be the case, as it uses
>>> a MultiFieldQueryParser in GridLuceneIndex. However, when I try to run a
>>> query such as a:<my-text>, none of the indexed documents match. In debug
>>> mode I've enabled parser.setAllowLeadingWildcard(true); and if I do a
>>> simple searcher.search * I get back the list of expected documents.
>>>
>>> What's even more odd is that I tried querying each of the 6 indexed fields
>>> found in idxdFields in GridLuceneIndex and only 1 of them matches. For the
>>> others, I'm typing the values exactly, but neither that nor wildcards or
>>> other free-text forms match.
>>>
>>> 4. I couldn't see a way to provide a custom GridLuceneIndex, I found the
>>> two cases where it's constructed in the code base and it doesn't look like I
>>> can inject instances. Is it ok to construct and use a custom
>>> GridLuceneDirectory/IndexWriter/Searcher and so on in the same way
>>> GridLuceneIndex does it so I can do a custom IndexingSpi to change how
>>> indexing happens?
>>> There are a number of things I'd like to customise and from looking at
>>> the current impl. these things aren't injectable, I guess it's not
>>> considered a prime use case maybe.
>>>
>>> Yeah, the analyzer and a number of things would be handy to change.
>>> Ideally I'd also want to customise how a field is indexed, e.g. to be able
>>> to do term matches with Lucene queries.
>>>
>>> Looking at this impl as well it passes Integer.MAX_VALUE and pulls back
>>> all matches. That'll surely kill our nodes for some of the use cases we're
>>> considering.
>>> I'd also like to implement paging, the searcher API has a nice option to
>>> pass through a last doc it can continue from to potentially implement
>>> something like deep-paging.
>>>
>>> 5. If I were to do a custom IndexingSpi to make all of this happen, how
>>> do I get additional parameters through so that I could have paging params
>>> passed?
>>>
>>> Ideally I could customise the indexing, searching and paging through
>>> standard Ignite means but I can't find any means of doing that in the
>>> current code and short of doing a custom IndexingSpi I think I've gone as
>>> far as I can debugging and could do with a few pointers of how to go about
>>> this.
>>>
>>> FYI, SQL isn't a great option for this part of the product. We're
>>> generating and compiling Java classes at runtime, and generating SQL to do
>>> the queries is an order of magnitude more work than indexing the relatively
>>> few fields we need and then searching. Even so, paging would be an issue
>>> from the start, as there can be several million matches to a query. We
>>> can't have Ignite pulling all of those into memory.
>>>
>>> Thanks in advance
>>>
>>> Courtney
>>>
>>
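
On questions 1 and 2 in the quoted message, here is a minimal sketch of one
way a custom indexing layer might name its fields so that nested fields are
kept and Key.a no longer collides with Value.a. It is plain Java with no
Ignite or Lucene calls. FieldFlattener, flatten, and the "key"/"val" prefixes
are hypothetical names chosen for illustration, not anything Ignite provides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: flattens nested field maps into dotted field names.
final class FieldFlattener {
    private FieldFlattener() {}

    /** Returns a flat map whose names are prefixed, e.g. "val.c.a". */
    static Map<String, Object> flatten(String prefix, Map<String, ?> fields) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto(prefix, fields, out);
        return out;
    }

    private static void flattenInto(String prefix, Map<String, ?> fields,
                                    Map<String, Object> out) {
        for (Map.Entry<String, ?> e : fields.entrySet()) {
            String name = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Object v = e.getValue();
            if (v instanceof Map) {
                // Nested object: recurse so c.a and c.b are not dropped.
                @SuppressWarnings("unchecked")
                Map<String, ?> nested = (Map<String, ?>) v;
                flattenInto(name, nested, out);
            } else {
                out.put(name, v);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> c = new LinkedHashMap<>();
        c.put("a", "nested a");
        c.put("b", "nested b");

        Map<String, Object> value = new LinkedHashMap<>();
        value.put("a", "value a");
        value.put("c", c);

        Map<String, Object> key = new LinkedHashMap<>();
        key.put("a", "key a");

        // Separate prefixes keep the key's "a" apart from the value's "a".
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.putAll(flatten("key", key));
        doc.putAll(flatten("val", value));
        System.out.println(doc.keySet());
    }
}
```

With the prefixes, the key's a and the value's a end up under distinct field
names (key.a and val.a), and the nested fields survive as val.c.a and val.c.b.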

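On the paging concern in the quoted message, here is a minimal sketch of
cursor-style paging: rather than asking for Integer.MAX_VALUE matches, each
request keeps at most pageSize hits in memory and resumes after the last hit
of the previous page. Lucene's IndexSearcher.searchAfter is presumably the
searcher option the quoted message refers to; the code below does not call it
and only models the idea in plain Java. CursorPaging, Hit, and page are
hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model of "continue after the last doc" paging.
final class CursorPaging {
    private CursorPaging() {}

    /** A scored match; stands in for a search hit. */
    static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Best score first; docId breaks ties so the order is total. */
    static final Comparator<Hit> ORDER =
        Comparator.comparingDouble((Hit h) -> -h.score).thenComparingInt(h -> h.docId);

    /** Returns the next pageSize hits that sort strictly after the cursor. */
    static List<Hit> page(Iterable<Hit> hits, Hit after, int pageSize) {
        // Worst hit at the head, so the queue never holds more than pageSize + 1.
        PriorityQueue<Hit> worstFirst = new PriorityQueue<>(ORDER.reversed());
        for (Hit h : hits) {
            if (after != null && ORDER.compare(h, after) <= 0) {
                continue; // at or before the cursor: returned on an earlier page
            }
            worstFirst.add(h);
            if (worstFirst.size() > pageSize) {
                worstFirst.poll(); // evict the current worst candidate
            }
        }
        List<Hit> result = new ArrayList<>(worstFirst);
        result.sort(ORDER);
        return result;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
            new Hit(1, 0.5f), new Hit(2, 0.9f), new Hit(3, 0.9f),
            new Hit(4, 0.1f), new Hit(5, 0.7f));

        Hit last = null; // null cursor means "first page"
        while (true) {
            List<Hit> p = page(hits, last, 2);
            if (p.isEmpty()) {
                break;
            }
            StringBuilder sb = new StringBuilder();
            for (Hit h : p) {
                sb.append(h.docId).append(' ');
            }
            System.out.println(sb.toString().trim());
            last = p.get(p.size() - 1); // cursor for the next request
        }
    }
}
```

The caller only has to carry the last hit of a page between requests, so deep
pages cost a scan but never hold more than one page of results at a time.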