Hi Emir,

In this case I need more control at the Lucene level, so I would have to use the Lucene IndexWriter directly, which means I cannot use Solr for importing. Or is there any way to add a TokenStream to a SolrInputDocument (or some other class that Solr exposes during indexing that I could use for this purpose)? Am I correct, or am I still missing something?

Thank you.
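To make the question more concrete, this is the kind of thing I have in mind when I say "use the IndexWriter directly": an untested sketch (class and method names are placeholders, the field names are from my earlier mail) where pre-built TokenStreams are passed as the field values.

import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;

public class SharedAnalysisIndexer {
    // Pre-built TokenStreams are added as field values; Lucene consumes them
    // when the document is indexed, so no second analysis pass is needed.
    public static void index(IndexWriter writer,
                             TokenStream tokens,
                             TokenStream verbs,
                             TokenStream adjectives) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("tokens", tokens));
        doc.add(new TextField("verbs", verbs));
        doc.add(new TextField("adjectives", adjectives));
        writer.addDocument(doc);
    }
}

This is the level of control I don't see how to get through a SolrInputDocument, which as far as I can tell only takes plain field values.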
On Wed, Nov 22, 2017 at 11:33 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
> Hi Roxana,
> I think you can use
> https://lucene.apache.org/core/5_4_0/analyzers-common/org/apache/lucene/analysis/sinks/TeeSinkTokenFilter.html
> like suggested earlier.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
> On 22 Nov 2017, at 11:43, Roxana Danger <roxana.dan...@gmail.com> wrote:
> >
> > Hi Emir,
> > Many thanks for your reply.
> > The UpdateProcessor can do this work, but is analyzer.reusableTokenStream
> > <https://lucene.apache.org/core/3_0_3/api/core/org/apache/lucene/analysis/Analyzer.html#reusableTokenStream(java.lang.String, java.io.Reader)>
> > the way to obtain a previously generated tokenstream? Is it guaranteed
> > to give access to the token stream and not reconstruct it?
> > Thanks,
> > Roxana
> >
> >
> > On Wed, Nov 22, 2017 at 10:26 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
> >
> >> Hi Roxana,
> >> I don't think that it is possible. In some cases (yours seems like a good
> >> fit) you could create a custom update request processor that would do the
> >> shared analysis (you can have it defined in the schema) and after analysis
> >> use those tokens to create new values for those two fields and remove the
> >> source value (or flag it as ignored in the schema).
> >>
> >> HTH,
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>> On 22 Nov 2017, at 11:09, Roxana Danger <roxana.dan...@gmail.com> wrote:
> >>>
> >>> Hello all,
> >>>
> >>> I would like to reuse the tokenstream generated for one field to create a
> >>> new tokenstream (adding a few filters to the available tokenstream) for
> >>> another field, without the need of executing the whole analysis again.
> >>>
> >>> The particular application is:
> >>> - I have a field *tokens* that uses an analyzer that generates the tokens
> >>> (and maintains the token type attributes).
> >>> - I would like to have another two new fields: *verbs* and *adjectives*.
> >>> These should reuse the tokenstream generated for the field *tokens* and
> >>> filter the verbs and adjectives for the respective fields.
> >>>
> >>> Is this feasible? How should it be implemented?
> >>>
> >>> Many thanks.
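P.S. In case it helps to clarify my setup: below is roughly how I read the TeeSinkTokenFilter suggestion when I drive the IndexWriter myself. It is an untested sketch, and the TypeTokenFilter part with the "VERB"/"ADJ" type names is my own addition, standing in for whatever my analyzer writes into the type attribute.

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.TypeTokenFilter;
import org.apache.lucene.analysis.sinks.TeeSinkTokenFilter;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.TextField;

public class TeeSinkSketch {
    // Placeholder type names; they depend on the analyzer that fills the
    // type attribute of the "tokens" field.
    private static final Set<String> VERB_TYPES = new HashSet<>(Arrays.asList("VERB"));
    private static final Set<String> ADJ_TYPES = new HashSet<>(Arrays.asList("ADJ"));

    public static Document buildDocument(TokenStream analyzedTokens) {
        TeeSinkTokenFilter tee = new TeeSinkTokenFilter(analyzedTokens);
        TokenStream verbsSink = tee.newSinkTokenStream();
        TokenStream adjectivesSink = tee.newSinkTokenStream();

        Document doc = new Document();
        // The tee field is added first so it is consumed before its sinks;
        // IndexWriter consumes fields in the order they were added.
        doc.add(new TextField("tokens", tee));
        // useWhiteList = true keeps only tokens whose type is in the set.
        doc.add(new TextField("verbs", new TypeTokenFilter(verbsSink, VERB_TYPES, true)));
        doc.add(new TextField("adjectives", new TypeTokenFilter(adjectivesSink, ADJ_TYPES, true)));
        return doc;
    }
}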