[jira] Commented: (SOLR-477) AnalysisRequestHandler

Grant Ingersoll (JIRA) Mon, 11 Feb 2008 20:58:30 -0800

    [ https://issues.apache.org/jira/browse/SOLR-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567956#action_12567956 ]


Grant Ingersoll commented on SOLR-477:
--------------------------------------

OK, I will update tomorrow w/ primitives instead of the token tag.  I admit I 
don't fully understand the interplay between the other writers (JSON, etc.) so 
help would be appreciated there.

As for a SearchComponent piece, I'd like to hear more.  Does the 
SearchComponent piece handle ContentStreams?  That is, could I just send my 
<add>...</add> to it and have it spit out the tokens?  On the query side of 
things, I think it would be useful to see how the query is analyzed, so that 
makes sense in a SearchComponent.  Perhaps we can find common code?

{quote}
Grant, quick question, because I can't tell by skimming the patch: is there any 
information currently returned by analysis.jsp that isn't returned by this 
handler? If not, we should consider adding it so that the JSP can go the way 
of the dodo like the other JSPs.
{quote}

It only outputs the final token result, so you wouldn't be able to show the 
output of the intermediate analysis steps.  Also, I think the JSP does some 
fancy sorting for positions, etc.  I leave that up to the application.  But, 
yes, I did have a look at it and originally started using some of the code, 
then moved away from it once I realized I didn't care whether it was a 
regular Analyzer or a TokenChain.
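
For illustration only, here is a minimal standalone sketch of what "final 
token result" means in terms of the per-token fields (value, offsets, position 
increment, type).  It does not use the Lucene or Solr APIs and is not the code 
in the patch: TokenSketch, TokenInfo and analyze are hypothetical names, and 
the whitespace split plus lowercasing is just an assumed stand-in for a real 
analysis chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; not the SOLR-477 patch and not a Lucene/Solr API.
class TokenSketch {

    /** Hypothetical holder for the per-token fields a response could carry. */
    static final class TokenInfo {
        final String value;
        final int startOffset;        // inclusive character offset in the input
        final int endOffset;          // exclusive character offset in the input
        final int positionIncrement;  // distance from the previous token
        final String type;

        TokenInfo(String value, int startOffset, int endOffset,
                  int positionIncrement, String type) {
            this.value = value;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.positionIncrement = positionIncrement;
            this.type = type;
        }

        @Override
        public String toString() {
            return value + " [" + startOffset + "," + endOffset + ")"
                    + " posInc=" + positionIncrement + " type=" + type;
        }
    }

    /**
     * Splits on whitespace, then lowercases each token. Only the final
     * tokens are returned; the pre-lowercase form is not kept, which is
     * the "no intermediate output" limitation in miniature.
     */
    static List<TokenInfo> analyze(String text) {
        List<TokenInfo> tokens = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip any run of whitespace before the next token.
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            int start = i;
            // Consume characters up to the next whitespace.
            while (i < n && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            String value = text.substring(start, i).toLowerCase(Locale.ROOT);
            tokens.add(new TokenInfo(value, start, i, 1, "word"));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (TokenInfo t : analyze("Hello  Solr world")) {
            System.out.println(t);
        }
    }
}
```

Sorting or grouping by position, as the JSP does, would then be something the 
calling application applies to the returned list.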


> AnalysisRequestHandler
> ----------------------
>
>                 Key: SOLR-477
>                 URL: https://issues.apache.org/jira/browse/SOLR-477
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-477.patch
>
>
> Being able to programmatically access tokenization information can be quite 
> useful not only in Solr, but in other NLP applications where token vectors 
> are necessary.
> The patch to follow creates an AnalysisRequestHandler which processes a 
> document through the analysis process and returns a response filled with 
> tokens, their offsets, position inc., type and value.
> Patch also adds some character array processing to Xml and adds Token 
> handling to XMLWriter.
> I only implemented Xml output, as I don't know JSON or the other types.  If 
> someone else is so motivated, they can add those.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
