[jira] Commented: (SOLR-1749) debug output should include explanation of what input strings were passed to the analzyers for each field

Hoss Man (JIRA) Tue, 02 Feb 2010 17:49:45 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828888#action_12828888
 ]


Hoss Man commented on SOLR-1749:
--------------------------------

This is an idea that's been rolling arround in my head for a while, and today I 
thought i'd spend some time experimenting with it.

It seemed like the main impelmentation challenge would be that by the time you 
are deep enough down in the code to be using an Analyzer, you don't have access 
to the SolrQueryRequest to record the debugging info.

I thought of two potential solutions...

 * Use ThreadLocal to track the debugging info if needed
 * Use Proxy Wrapper classes to record the debugging info if needed

I initially figured that writing proxy classes for SolrQueryRequest, 
IndexSchema, and Analyzer would be relatively straight forward, so i started 
down that path and discovered two anoying problems...

 # IndexSchema is currently final
 # not all code paths use IndexSchema.getQueryAnalyzer(), many fetch the 
FieldTypes and ask them for their Analyzer directly.

The second problem isn't insurmountable, but it complicates things in that it 
would require Proxy wrappers for FieldType as well.  The first problem requires 
a simple change, but carries with it some baggage that i wasn't ready to 
embrace.  In both cases i started to be very bothered by the long term 
maintenance something like this would introduce.  It would be very easy to 
write these Proxy classes that extend IndexSchema, FieldType, and Analyzer but 
it would be just as easy to forget to add the appropriate Proxy methods to them 
down the road when new methods are added to those base classes.

The issue with the FieldType also exposed a flaw in the idea of using 
ThreadLocal: if we only had to worry about IndexSearcher.getQueryAnalyzer(), we 
could modify it to check ThreadLocal easily enough, but at the FieldType level 
we would only be able to modify FieldTypes that ship with Solr, and we'd be 
missing any "plugin" FieldTypes,


So i aborted the experiment but i figured i should post the feature idea, and 
my existing thoughts, here in case anyone had other suggestions on how it could 
be implemented feasibly.

> debug output should include explanation of what input strings were passed to 
> the analzyers for each field
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1749
>                 URL: https://issues.apache.org/jira/browse/SOLR-1749
>             Project: Solr
>          Issue Type: Wish
>          Components: search
>            Reporter: Hoss Man
>
> Users are frequently confused by the interplay between "Query Parsing" and 
> "Query Time Analysis" (ie: markup meta-characters like whitespace and quotes, 
> multi-word synonyms, Shingles, etc...)  It would be nice if we had more 
> debugging output available that would help eliminate this confusion.  The 
> ideal API that comes to mind would be to include in the debug output of 
> SearchHandler a list of every "string" that was Analyzed, and what list of 
> field names it was analyzed against.  
> This info would not only make it clear to users what exactly they should 
> cut/paste into the analysis.jsp tool to see how their Analyzer is getting 
> used, but also what exactly is being done to their input strings prior to 
> their Analyzer being used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-1749) debug output should include explanation of what input strings were passed to the analzyers for each field

Reply via email to