[ https://issues.apache.org/jira/browse/MAILBOX-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15724029#comment-15724029 ]

Tellier Benoit commented on MAILBOX-280:
----------------------------------------

This behaviour is expected.

Please have a look at
https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html
. It explains how scoring works in Elasticsearch. In short, scoring relies
on three factors:
 - Field length: the longer the field, the lower the score.
 - Term frequency: the more often a searched word appears in the field, the
higher the score.
 - Inverse document frequency: the more documents of the index a searched word
appears in, the lower its weight.
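
A minimal Java sketch of these three factors, using the formulas the guide gives for Lucene's classic TF/IDF similarity: tf = sqrt(frequency), idf = 1 + ln(numDocs / (docFreq + 1)), norm = 1 / sqrt(numTerms). The class and method names are hypothetical, query normalization and boosts are left out, and newer Elasticsearch versions use BM25 by default, so treat this as an illustration of the trends rather than the exact score ES computes.

```java
/**
 * Hypothetical sketch of the classic TF/IDF factors described in the
 * Elasticsearch "scoring theory" guide. Not the actual Lucene code path.
 */
public class ClassicScoring {

    /** Term frequency: square root of how often the term occurs in the field. */
    public static double tf(int termFreq) {
        return Math.sqrt(termFreq);
    }

    /** Inverse document frequency: terms found in fewer documents weigh more. */
    public static double idf(long numDocs, long docFreq) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /** Field-length norm: terms in shorter fields weigh more. */
    public static double fieldNorm(int numTermsInField) {
        return 1.0 / Math.sqrt(numTermsInField);
    }

    /** Simplified weight of one term in one field (no query norm, no boosts). */
    public static double termWeight(int termFreq, long numDocs, long docFreq, int fieldLength) {
        return tf(termFreq) * idf(numDocs, docFreq) * fieldNorm(fieldLength);
    }

    public static void main(String[] args) {
        // Same term and frequency: the shorter field gets the higher weight.
        System.out.println(termWeight(1, 1000, 10, 4));
        System.out.println(termWeight(1, 1000, 10, 100));
        // Same field length: a term present in 900 of 1000 documents weighs less.
        System.out.println(termWeight(1, 1000, 900, 4));
    }
}
```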

Another operation you should know about in ES is analysis, which includes
tokenization. Elasticsearch analyzes the content before indexing it, and it
analyzes your query the same way at search time.

Here, "Bodyyyyyy" is reduced to the same indexed term as "body", so it matches
both "body" and "bodyyyyyyy", as they are the same content in the index.
Likewise, the tokenizer treats '-' as a separator, so "Openpaas-Linagora" is
tokenized into "Openpaas" and "Linagora".

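A hypothetical, simplified Java analyzer showing the hyphen splitting described above: it breaks text on any character that is not a letter or digit and lowercases each token. Lucene's StandardTokenizer is far more elaborate, and this sketch does not reproduce the normalization that makes "Bodyyyyyy" match "body"; it only shows why "Openpaas-Linagora" becomes two separate terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical analyzer: splits on non letter/digit characters, then lowercases. */
public class HyphenSplittingAnalyzer {

    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // Any separator ('-', ' ', ...) closes the current token.
                tokens.add(current.toString().toLowerCase(Locale.ROOT));
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Openpaas-Linagora")); // [openpaas, linagora]
        System.out.println(analyze("Test body"));         // [test, body]
    }
}
```

Because the query goes through the same analysis, "Openpaassssss-Linagora" also yields a "linagora" term, which is enough for a partial match.
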
Moreover, for multi-word queries, ES scores a document roughly as the sum of
the scores of the individual words. This is expected: it is the way it works.
It still adds some proximity information.
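
To illustrate why a partially wrong query still returns results, here is a hypothetical Java sketch of OR-style scoring: every query term found in the document contributes its weight, and the document score is the sum. The class name and the per-term weights are invented for illustration; proximity and the real TF/IDF or BM25 weighting are deliberately left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch: document score = sum of weights of matched query terms. */
public class SumOfTermScores {

    /** Lowercases and splits on anything that is not a letter or digit ('-' included). */
    static List<String> tokens(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    /** Query terms absent from the document add nothing; present ones add their weight. */
    public static double score(String query, String document, Map<String, Double> termWeights) {
        List<String> documentTerms = tokens(document);
        double total = 0.0;
        for (String term : tokens(query)) {
            if (documentTerms.contains(term)) {
                total += termWeights.getOrDefault(term, 0.0);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up per-term weights, for illustration only.
        Map<String, Double> weights = Map.of("openpaas", 0.5, "linagora", 1.0, "other", 1.2);
        System.out.println(score("OpenPaas-Other", "OpenPaas-Linagora", weights));    // 0.5
        System.out.println(score("OpenPaas-Linagora", "OpenPaas-Linagora", weights)); // 1.5
    }
}
```

With these weights, "OpenPaas-Other" still scores above zero against a message containing "OpenPaas-Linagora" because "openpaas" matches, while the exact query scores higher.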

Such features are expected from a decent search engine. For instance, if I
cannot remember which company edits OpenPaas, I can query "OpenPaas-Other" and
still get results.

The problem is that we drop the ES relevance score, which is not used at all
for sorting messages. Thus, as "OpenPaas-Other" is a partial match (relevance
0.1) and "OpenPaas-Linagora" a good match (relevance 1.5), if I do not take the
score into account, I might end up with "OpenPaas-Other" being reported before
"OpenPaas-Linagora".

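A hypothetical Java sketch of this ordering problem. Hit is an illustrative class, not a James or Elasticsearch type; the scores 1.5 and 0.1 come from the example in the previous paragraph, and the received dates (and the assumption that messages are otherwise sorted newest first) are invented. Sorting by date alone can put the partial match first; sorting by the retained score does not.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: keep the relevance score and sort on it. */
public class RelevanceOrdering {

    /** Illustrative search hit: message text, ES relevance score, received date. */
    public static final class Hit {
        final String text;
        final double score;
        final long receivedAt; // epoch millis, invented for the example

        public Hit(String text, double score, long receivedAt) {
            this.text = text;
            this.score = score;
            this.receivedAt = receivedAt;
        }
    }

    /** Newest first: the score is ignored, as when it is dropped. */
    public static List<String> byDate(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingLong((Hit h) -> h.receivedAt).reversed());
        return texts(sorted);
    }

    /** Most relevant first: the score is kept and used for sorting. */
    public static List<String> byRelevance(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return texts(sorted);
    }

    static List<String> texts(List<Hit> hits) {
        List<String> out = new ArrayList<>();
        for (Hit h : hits) {
            out.add(h.text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("OpenPaas-Linagora", 1.5, 1_000L),
                new Hit("OpenPaas-Other", 0.1, 2_000L));
        System.out.println(byDate(hits));      // [OpenPaas-Other, OpenPaas-Linagora]
        System.out.println(byRelevance(hits)); // [OpenPaas-Linagora, OpenPaas-Other]
    }
}
```
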
Finally, I would not open a MAILBOX ticket for JMAP-related matters. If JMAP
wants to query differently, we should handle that in the JMAP layer.

> String FilterCondition in getMessageList request should work correctly when 
> including white-spaces or hyphens
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: MAILBOX-280
>                 URL: https://issues.apache.org/jira/browse/MAILBOX-280
>             Project: James Mailbox
>          Issue Type: Bug
>            Reporter: Laura Royet
>
> When making a getMessageList request with a String FilterCondition containing
> a white-space or a hyphen, the integration test becomes green even with a
> wrong ending word or one wrong word.
> Examples:
> When the email contains "Openpaas-Linagora"
> a filtering on "Openpaassssss-Linagora" in the property "text" of
> FilterCondition is matching. It is the same for the following strings:
> "Openpaas-Linagoraaaaaaa", "bla OpenPaas", "OpenPaas anticonstitutionn".
> When the email contains "Test body"
> a filtering on "Testyyy body" in the property "body" of FilterCondition is
> matching. It is the same for: "Testy bodyyyyy", "Test gakayanakj", "halabdp
> body".
> Can be reproduced respectively in the test
> "messageWithComplicatedAttachmentShouldHaveItsEmailBodyIndexed()" in
> "SetMessagesMethodTest.java" in package
> "org.apache.james.jmap.methods.integration".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

