I'm reaching a bit here, since I haven't implemented one myself, but it seems like you're just dealing with some shared memory. So say your filter records all the stuff you want to put into the DB. When you put stuff into the shared memory, you probably have to figure out when you should commit the batch (if you're indexing 100M docs, you probably don't want to use up that much memory, but what do I know). This is all done at the filter.
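The batching idea above can be sketched as a small shared buffer (all names here are hypothetical, and the DB write is stubbed out behind a tiny interface so the sketch stays self-contained):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sink standing in for the real JDBC batch write.
interface BatchSink {
    void write(List<String> batch);
}

// Hypothetical shared buffer: the filter adds an entry per token/doc,
// and the buffer writes a whole batch at once instead of making one
// DB call per document.
public class BatchBuffer {
    private final int batchSize;
    private final BatchSink sink;
    private final List<String> pending = new ArrayList<String>();

    public BatchBuffer(int batchSize, BatchSink sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Called from the filter; synchronized so multiple indexing
    // threads play nicely with each other.
    public synchronized void add(String entry) {
        pending.add(entry);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Also called from a postCommit hook to drain whatever is left.
    public synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.write(new ArrayList<String>(pending));
        pending.clear();
    }
}
```

In incrementToken() the filter would call add(...), and the postCommit hook described below would call flush() after the final commit.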
It seems like you could also create a SolrEventListener on the postCommit event (see: http://wiki.apache.org/solr/SolrPlugins#SolrEventListener) to put whatever remains in your list into your DB. Of course you'd have to do some synchronization so multiple threads play nicely with each other. And you'd have to be sure to fire a commit at the end of your indexing process if you want some certainty that everything is tidied up. If some delay isn't a problem and you have autocommit configured, then your event listener would be called when the next autocommit happens.

FWIW
Erick

On Tue, Aug 21, 2012 at 8:19 PM, ksu wildcats <ksu.wildc...@gmail.com> wrote:
> Jack
>
> Reading through the documentation for UpdateRequestProcessor, my
> understanding is that it's good for handling processing of documents
> before analysis.
> Is it true that processAdd (where we can have custom logic) is invoked
> once per document, before any of the analyzers get invoked?
>
> I couldn't figure out how I can use UpdateRequestProcessor to access the
> tokens stored in memory by CustomFilterFactory/CustomFilter.
>
> Can you please provide more information on how I can use
> UpdateRequestProcessor to handle any post-processing that needs to be
> done after all documents are added to the index?
>
> Also, does CustomFilterFactory/CustomFilter have any way to do
> post-processing after all documents are added to the index?
>
> Here is the code I have for CustomFilterFactory/CustomFilter. This might
> help explain what I am trying to do, and maybe there is a better way to
> do it.
> The main problem I have with this approach is that I am forced to write
> the results stored in memory (customMap) to the database per document,
> and if I have 1 million documents then that's 1 million DB calls. I am
> trying to reduce the number of calls made to the database by storing
> results in memory and writing them to the database once every X
> documents (say, every 10000 docs).
>
> public class CustomFilterFactory extends BaseTokenFilterFactory {
>     public CustomFilter create(TokenStream input) {
>         String databaseName = getArgs().get("paramname");
>         return new CustomFilter(input, databaseName);
>     }
> }
>
> public class CustomFilter extends TokenFilter {
>     private TermAttribute termAtt;
>     Map<TermAttribute, Integer> customMap =
>         new HashMap<TermAttribute, Integer>();
>     String databaseName = null;
>     int commitSize = 10000; // flush threshold
>
>     protected CustomFilter(TokenStream input, String databaseName) {
>         super(input);
>         termAtt = (TermAttribute) addAttribute(TermAttribute.class);
>         this.databaseName = databaseName;
>     }
>
>     public final boolean incrementToken() throws IOException {
>         if (!input.incrementToken()) {
>             writeResultsToDB();
>             return false;
>         }
>
>         if (addWordToCustomMap()) {
>             // do some analysis on the term and then populate customMap
>             // customMap.put(term, someValue);
>         }
>
>         if (customMap.size() > commitSize) {
>             writeResultsToDB();
>         }
>         return true;
>     }
>
>     boolean addWordToCustomMap() {
>         // custom logic - some validation on the term to determine
>         // whether it should be added to customMap
>         return false; // placeholder
>     }
>
>     void writeResultsToDB() throws IOException {
>         // custom logic that reads data from customMap, does some
>         // analysis, and writes it to the database
>     }
> }
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Custom-Filter-Factory-How-to-pass-parameters-tp4002217p4002531.html
> Sent from the Solr - User mailing list archive at Nabble.com.
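For reference, the postCommit listener Erick mentions would be wired up roughly like this. This is an untested plugin skeleton, not a drop-in implementation: DbFlushListener and SharedBuffer are hypothetical names, and the exact SolrEventListener method set varies between Solr versions, so check the interface for the release you're on.

```java
import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.SolrEventListener;
import org.apache.solr.search.SolrIndexSearcher;

public class DbFlushListener implements SolrEventListener {
    public void init(NamedList args) {
        // read DB connection params from the listener config if needed
    }

    public void postCommit() {
        // Drain whatever the filter left in the shared buffer into the DB.
        // SharedBuffer is a hypothetical handle to the filter's in-memory map.
        SharedBuffer.getInstance().flush();
    }

    public void newSearcher(SolrIndexSearcher newSearcher,
                            SolrIndexSearcher currentSearcher) {
        // no-op for this use case
    }
}
```

It would be registered in solrconfig.xml inside the update handler, e.g.:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <listener event="postCommit" class="com.example.DbFlushListener" />
</updateHandler>
```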