[ https://issues.apache.org/jira/browse/SOLR-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567960#action_12567960 ]

Ryan McKinley commented on SOLR-477:
------------------------------------

{quote}
I admit I don't fully understand the interplay between the other writers (JSON, 
etc.) so help would be appreciated there.
{quote}

Essentially, the types supported by TextResponseWriter are automatically 
supported by the standard writers.  Check writeVal() at line 109 in: 
http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/request/TextResponseWriter.java
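
To make that concrete, here's a simplified sketch of the instanceof dispatch inside 
writeVal() (illustrative only -- the class and helper names below are trimmed down, 
not the actual source): any type that gets a branch there is rendered by every 
concrete writer (XMLWriter, JSONResponseWriter, the Ruby and Python writers, etc.).

{code:java}
import java.io.IOException;
import java.io.Writer;
import java.util.Collection;
import java.util.Iterator;

// Simplified sketch of the dispatch pattern in TextResponseWriter.writeVal();
// the real method handles many more types (see the linked source, line 109).
abstract class SketchTextResponseWriter {
  protected final Writer writer;

  protected SketchTextResponseWriter(Writer writer) {
    this.writer = writer;
  }

  public void writeVal(String name, Object val) throws IOException {
    if (val == null) {
      writeNull(name);
    } else if (val instanceof String) {
      writeStr(name, (String) val);
    } else if (val instanceof Number) {
      writeNum(name, (Number) val);
    } else if (val instanceof Collection) {
      writeArray(name, ((Collection<?>) val).iterator());
    } else {
      // Fallback: unknown types are written as plain strings.  Adding a
      // dedicated branch here (e.g. for Token) makes the type available
      // to every concrete writer at once.
      writeStr(name, val.toString());
    }
  }

  // Each concrete writer (XML, JSON, Ruby, Python, ...) supplies the syntax.
  protected abstract void writeNull(String name) throws IOException;
  protected abstract void writeStr(String name, String val) throws IOException;
  protected abstract void writeNum(String name, Number val) throws IOException;
  protected abstract void writeArray(String name, Iterator<?> it) throws IOException;
}
{code}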


{quote}
As for a SearchComponent piece, I'd like to hear more.  Does the 
SearchComponent piece handle ContentStreams?  That is, could I just send my 
<add>...</add> to it and it would spit out the tokens?  On the query side of 
things, I think it would be useful to see how the query is analyzed, so that 
makes sense in a SearchComponent.  Perhaps we can find common code?
{quote}

No ContentStreams in the version I'm working with.  I am analyzing stored 
fields so the client can link directly to a valid 'filter'.  To see it in 
action, check:
http://www.digitalcommonwealth.org/browse/archive:C%2FWMARS+Digital+Treasures+Respository/

Note how the subject line gets split into linkable tokens, and that the stored 
content "Mass." actually links to "/subject:Massachusetts/".

I've also found this really useful for debugging what tokens exist for given 
fields -- of course it only works for stored fields.
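
As an aside, here's a minimal sketch of the general technique -- re-analyzing a 
stored field value to see the tokens it produces -- using the Lucene 2.x 
TokenStream API of the time.  The analyzer, field name and sample value are 
placeholders, not what the site above actually uses:

{code:java}
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class StoredFieldTokens {
  public static void main(String[] args) throws Exception {
    // Stand-in for the schema analyzer of the field in question.
    Analyzer analyzer = new StandardAnalyzer();
    String storedValue = "Springfield (Mass.) -- History";

    // Re-run the stored value through analysis to see which tokens come out.
    TokenStream stream = analyzer.tokenStream("subject", new StringReader(storedValue));
    Token token;
    while ((token = stream.next()) != null) {
      // Each token can then be turned into a link target such as /subject:<term>/
      // (synonym handling like "Mass." -> "Massachusetts" would come from the
      // field's analysis chain, not from this sketch).
      System.out.println(token.termText()
          + " [" + token.startOffset() + "," + token.endOffset() + ")"
          + " type=" + token.type()
          + " posInc=" + token.getPositionIncrement());
    }
  }
}
{code}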

After you finish the handler version, I'll see what can be shared.


> AnalysisRequestHandler
> ----------------------
>
>                 Key: SOLR-477
>                 URL: https://issues.apache.org/jira/browse/SOLR-477
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-477.patch
>
>
> Being able to programmatically access tokenization information can be quite 
> useful not only in Solr, but in other NLP applications where token vectors 
> are necessary.
> The patch to follow creates an AnalysisRequestHandler which runs a document 
> through the analysis process and returns a response filled with tokens, their 
> offsets, position increments, types and values.
> The patch also adds some character array processing to XML and adds Token 
> handling to XMLWriter.
> I only implemented XML output, as I don't know JSON or the other types.  If 
> someone else is so motivated, they can add those.
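
The description above mentions adding Token handling to XMLWriter; as a purely 
hypothetical illustration (the actual SOLR-477.patch is not reproduced here, and 
the element names below are invented), per-token output might look something 
like this:

{code:java}
import org.apache.lucene.analysis.Token;

// Hypothetical per-token XML rendering: value, offsets, position increment
// and type, as listed in the issue description.  Real code would escape the
// token text before writing it.
public class TokenXmlSketch {
  public static String toXml(Token token) {
    return "<token>"
        + "<value>" + token.termText() + "</value>"
        + "<start>" + token.startOffset() + "</start>"
        + "<end>" + token.endOffset() + "</end>"
        + "<posInc>" + token.getPositionIncrement() + "</posInc>"
        + "<type>" + token.type() + "</type>"
        + "</token>";
  }
}
{code}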

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
