[ https://issues.apache.org/jira/browse/SOLR-477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-477:
---------------------------------

    Attachment: SOLR-477.patch

First draft of a simple patch that adds an AnalysisRequestHandler to Solr.

Adds an example set of docs under examples/exampleAnalysis, along with a
quick-and-dirty post.sh script to run the analysis.

Adds support to XmlWriter and TextResponseWriter, although TextResponseWriter
currently does nothing when it encounters a Token.
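As a rough illustration of what "Token support" in an XML response writer involves, here is a minimal sketch that renders one token as an XML element. The TokenXml class, the <token> element and its attribute names are hypothetical, chosen only for this example; they are not the format the patch's XmlWriter actually produces. Values are escaped because token text and types can contain characters such as & or <.

```java
// Hypothetical sketch: the <token> element and its attribute names are
// illustrative assumptions, not the output format of the SOLR-477 patch.
final class TokenXml {
    private TokenXml() {}

    /** Escapes the five XML special characters for use in attribute values. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Renders one token (value, offsets, position increment, type) as a self-closing element. */
    static String toXml(String text, int start, int end, int posInc, String type) {
        return "<token text=\"" + escape(text) + "\" start=\"" + start
            + "\" end=\"" + end + "\" posInc=\"" + posInc
            + "\" type=\"" + escape(type) + "\"/>";
    }
}
```

For example, toXml("AT&T", 0, 4, 1, "<ALPHANUM>") yields
<token text="AT&amp;T" start="0" end="4" posInc="1" type="&lt;ALPHANUM&gt;"/>.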

The unit tests show how to invoke it, but in a nutshell it works much like the
<add> functionality: each field in a <doc> is tokenized according to the
analysis semantics defined for that field.  Payloads are currently ignored.  I
suppose we could UUEncode the byte array or something, but since Solr doesn't
currently support payloads, there is not much point in outputting them just
yet.
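If payloads were ever output, the raw bytes would need a printable text form. A minimal sketch, assuming Base64 (java.util.Base64) is used in place of the uuencoding mentioned above; both turn arbitrary bytes into text that is safe inside an XML response. The PayloadText class is hypothetical.

```java
import java.util.Base64;

// Hypothetical helper: Base64 stands in for the uuencoding suggested above.
// Either way, arbitrary payload bytes become printable, XML-safe text.
final class PayloadText {
    private PayloadText() {}

    /** Encodes raw payload bytes as Base64 text; null means "no payload". */
    static String encode(byte[] payload) {
        return payload == null ? null : Base64.getEncoder().encodeToString(payload);
    }

    /** Reverses encode(), recovering the original payload bytes. */
    static byte[] decode(String text) {
        return text == null ? null : Base64.getDecoder().decode(text);
    }
}
```

For instance, the three-byte payload {1, 2, 3} encodes to "AQID" and decodes back to the same bytes. Base64 is chosen here only because it ships with the JDK.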

> AnalysisRequestHandler
> ----------------------
>
>                 Key: SOLR-477
>                 URL: https://issues.apache.org/jira/browse/SOLR-477
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-477.patch
>
>
> Being able to programmatically access tokenization information can be quite
> useful, not only in Solr but also in other NLP applications where token
> vectors are necessary.
> The patch to follow creates an AnalysisRequestHandler, which runs a
> document through the analysis process and returns a response filled with
> tokens: their offsets, position increments, types and values.
> The patch also adds some character array processing to Xml and adds Token
> handling to XMLWriter.
> I only implemented XML output, as I don't know JSON or the other types.  If
> someone else is so motivated, they can add those.
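To make "offsets, position increments, types and values" concrete, here is a toy analysis chain: whitespace splitting, lowercasing and stop-word removal. It is a stand-in written for this example, not Lucene's or Solr's analysis code; the ToyAnalyzer class, its stop-word list and its "word"/"<NUM>" type labels are all hypothetical. Removed stop words leave a gap, so the next token gets a position increment greater than 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy stand-in for an analysis chain (whitespace split, lowercase, stop words).
// It only illustrates the per-token data the handler reports; it is not Solr code.
final class ToyAnalyzer {
    /** One analyzed token: value, [start, end) offsets, position increment, type. */
    static final class Token {
        final String text;
        final int start;
        final int end;
        final int posInc;
        final String type;

        Token(String text, int start, int end, int posInc, String type) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.posInc = posInc;
            this.type = type;
        }
    }

    // Hypothetical stop-word list, for illustration only.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "of");

    static List<Token> analyze(String input) {
        List<Token> tokens = new ArrayList<>();
        int pending = 1; // position increment for the next emitted token
        int i = 0;
        int n = input.length();
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(input.charAt(i))) i++;
            if (i >= n) break;
            int start = i;
            while (i < n && !Character.isWhitespace(input.charAt(i))) i++;
            String term = input.substring(start, i).toLowerCase(Locale.ROOT);
            if (STOP_WORDS.contains(term)) {
                pending++; // leave a positional gap where the stop word was
                continue;
            }
            String type = term.chars().allMatch(Character::isDigit) ? "<NUM>" : "word";
            // Offsets point into the original input, not the lowercased term.
            tokens.add(new Token(term, start, i, pending, type));
            pending = 1;
        }
        return tokens;
    }
}
```

Analyzing "The Quick fox of 42" gives three tokens: quick [4,9) with posInc 2, fox [10,13) with posInc 1, and 42 [17,19) of type <NUM> with posInc 2.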

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
