[ https://issues.apache.org/jira/browse/SOLR-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567960#action_12567960 ]
Ryan McKinley commented on SOLR-477:
------------------------------------

{quote}
I admit I don't fully understand the interplay between the other writers (JSON, etc.) so help would be appreciated there.
{quote}

essentially the types supported by TextResponseWriter are automatically supported by the standard writers. Check line 109 writeVal() in:
http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/request/TextResponseWriter.java

{quote}
As for a SearchComponent piece, I'd like to hear more. Does the SearchComponent piece handle ContentStreams? That is, could I just send my <add>...</add> to it and it would spit out the tokens? On the query side of things, I think it would be useful to see how the query is analyzed, so that makes sense in a SearchComponent. Perhaps we can find common code?
{quote}

No ContentStreams in the version I'm working with. I am analyzing stored fields so the client can link directly to a valid 'filter'. To see it in action, check:
http://www.digitalcommonwealth.org/browse/archive:C%2FWMARS+Digital+Treasures+Respository/

Note how the subject line gets split into linkable tokens. Check that stored content "Mass." actually links to "/subject:Massachusetts/"

I've also found this really useful for debugging what tokens exist for given fields -- of course it only works for stored fields.

After you finish the handler version, I'll see what can be shared.

> AnalysisRequestHandler
> ----------------------
>
> Key: SOLR-477
> URL: https://issues.apache.org/jira/browse/SOLR-477
> Project: Solr
> Issue Type: New Feature
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
> Attachments: SOLR-477.patch
>
>
> Being able to programmatically access tokenization information can be quite
> useful not only in Solr, but in other NLP applications where token vectors
> are necessary.
> The patch to follow creates an AnalysisRequestHandler which processes a
> document through the analysis process and returns a response filled with
> tokens, their offsets, position inc., type and value.
> Patch also adds some character array processing to Xml and adds Token
> handling to XMLWriter.
> I only implemented Xml output, as I don't know JSON or the other types. If
> someone else is so motivated, they can add those.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
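
For readers unfamiliar with the per-token data the issue description lists (value, offsets, position increment, type), here is a rough illustration of what such a record might hold. This is only a sketch and is not taken from SOLR-477.patch: TokenSketch and analyze() are hypothetical names, plain whitespace splitting stands in for whatever analyzer a field is actually configured with, and the type string "word" is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the per-token data described in the issue.
// This is NOT Lucene's Token class and NOT the code in the patch.
final class TokenSketch {
    final String value;          // the token text
    final int startOffset;       // index of the first character (inclusive)
    final int endOffset;         // index just past the last character (exclusive)
    final int positionIncrement; // distance from the previous token's position
    final String type;           // placeholder type label

    TokenSketch(String value, int startOffset, int endOffset,
                int positionIncrement, String type) {
        this.value = value;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.positionIncrement = positionIncrement;
        this.type = type;
    }

    // Assumption: whitespace-only splitting; a real analysis chain may
    // lowercase, stem, drop stop words, or inject synonyms.
    static List<TokenSketch> analyze(String text) {
        List<TokenSketch> tokens = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip any run of whitespace before the next token.
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            int start = i;
            // Consume characters until the next whitespace or end of input.
            while (i < n && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            tokens.add(new TokenSketch(text.substring(start, i), start, i, 1, "word"));
        }
        return tokens;
    }

    @Override
    public String toString() {
        return value + " [" + startOffset + "," + endOffset + ") posInc="
                + positionIncrement + " type=" + type;
    }

    public static void main(String[] args) {
        // Print one line per token for a sample field value.
        for (TokenSketch t : analyze("Digital Treasures Mass.")) {
            System.out.println(t);
        }
    }
}
```

A handler along the lines described would presumably emit one such record per token for each analyzed field; how the patch actually structures its response is not shown here.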