[ https://issues.apache.org/jira/browse/SOLR-477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Ingersoll updated SOLR-477:
---------------------------------

    Attachment: SOLR-477.patch

First draft at a simple patch to add an AnalysisRequestHandler to Solr. Adds in an example set of docs under examples/exampleAnalysis along with a quick and dirty post.sh script to run the analysis. Adds support into XmlWriter and TextResponseWriter, although TextResponseWriter currently does nothing when encountering a Token.

Unit tests show sample of how to invoke, but in a nutshell it works more or less like the <add> functionality. Each field in a <doc> is tokenized according to the analysis semantics for that field.

Payloads are currently ignored. I suppose we could UUEncode the byte array or something, but, since Solr doesn't currently support payloads, not much point in outputting them just yet.

> AnalysisRequestHandler
> ----------------------
>
>                 Key: SOLR-477
>                 URL: https://issues.apache.org/jira/browse/SOLR-477
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-477.patch
>
>
> Being able to programmatically access tokenization information can be quite
> useful not only in Solr, but in other NLP applications where token vectors
> are necessary.
> The patch to follow creates an AnalysisRequestHandler which processes a
> document through the analysis process and returns a response filled with
> tokens, their offsets, position inc., type and value.
> Patch also adds some character array processing to Xml and adds Token
> handling to XMLWriter.
> I only implemented Xml output, as I don't know JSON or the other types. If
> someone else is so motivated, they can add those.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.