I'm reaching a bit here, since I haven't implemented one myself, but it seems like you're just dealing with some shared memory. So say your filter records all the stuff you want to put into the DB. When you put stuff into the shared memory, you probably have to figure out when you should commit the batch (if you're indexing 100M docs, you probably don't want to use up that much memory, but what do I know). This is all done at the filter.
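The batching idea above can be sketched as a small shared buffer (all names here are hypothetical, and the DB write is stubbed out behind a tiny interface so the sketch stays self-contained):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sink standing in for the real JDBC batch write.
interface BatchSink {
    void write(List<String> batch);
}

// Hypothetical shared buffer: the filter adds an entry per token/doc,
// and the buffer writes a whole batch at once instead of making one
// DB call per document.
public class BatchBuffer {
    private final int batchSize;
    private final BatchSink sink;
    private final List<String> pending = new ArrayList<String>();

    public BatchBuffer(int batchSize, BatchSink sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Called from the filter; synchronized so multiple indexing
    // threads play nicely with each other.
    public synchronized void add(String entry) {
        pending.add(entry);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Also called from a postCommit hook to drain whatever is left.
    public synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.write(new ArrayList<String>(pending));
        pending.clear();
    }
}
```

In incrementToken() the filter would call add(...), and the postCommit hook described below would call flush() after the final commit.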
It seems like you could also create a SolrEventListener on the postCommit event (see: http://wiki.apache.org/solr/SolrPlugins#SolrEventListener) to put whatever remains in your list into your DB. Of course you'd have to do some synchronization so multiple threads play nicely with each other. And you'd have to be sure to fire a commit at the end of your indexing process if you want some certainty that everything is tidied up. If some delay isn't a problem and you have autocommit configured, then your event listener would be called when the next autocommit happens.

FWIW
Erick

On Tue, Aug 21, 2012 at 8:19 PM, ksu wildcats <ksu.wildc...@gmail.com> wrote:
> Jack
>
> Reading through the documentation for UpdateRequestProcessor, my
> understanding is that it's good for handling processing of documents
> before analysis.
> Is it true that processAdd (where we can have custom logic) is invoked
> once per document, before any of the analyzers get invoked?
>
> I couldn't figure out how I can use UpdateRequestProcessor to access the
> tokens stored in memory by CustomFilterFactory/CustomFilter.
>
> Can you please provide more information on how I can use
> UpdateRequestProcessor to handle any post-processing that needs to be
> done after all documents are added to the index?
>
> Also, does CustomFilterFactory/CustomFilter have any way to do
> post-processing after all documents are added to the index?
>
> Here is the code I have for CustomFilterFactory/CustomFilter. This might
> help explain what I am trying to do, and maybe there is a better way to
> do it.
> The main problem I have with this approach is that I am forced to write
> the results stored in memory (customMap) to the database per document,
> and if I have 1 million documents then that's 1 million DB calls. I am
> trying to reduce the number of calls made to the database by storing
> results in memory and writing them to the database once every X
> documents (say, every 10000 docs).
>
> public class CustomFilterFactory extends BaseTokenFilterFactory {
>     public CustomFilter create(TokenStream input) {
>         String databaseName = getArgs().get("paramname");
>         return new CustomFilter(input, databaseName);
>     }
> }
>
> public class CustomFilter extends TokenFilter {
>     private TermAttribute termAtt;
>     Map<TermAttribute, Integer> customMap =
>         new HashMap<TermAttribute, Integer>();
>     String databaseName = null;
>     int commitSize = 10000; // flush threshold
>
>     protected CustomFilter(TokenStream input, String databaseName) {
>         super(input);
>         termAtt = (TermAttribute) addAttribute(TermAttribute.class);
>         this.databaseName = databaseName;
>     }
>
>     public final boolean incrementToken() throws IOException {
>         if (!input.incrementToken()) {
>             writeResultsToDB();
>             return false;
>         }
>
>         if (addWordToCustomMap()) {
>             // do some analysis on the term and then populate customMap
>             // customMap.put(term, someValue);
>         }
>
>         if (customMap.size() > commitSize) {
>             writeResultsToDB();
>         }
>         return true;
>     }
>
>     boolean addWordToCustomMap() {
>         // custom logic - some validation on the term to determine
>         // whether it should be added to customMap
>         return false; // placeholder
>     }
>
>     void writeResultsToDB() throws IOException {
>         // custom logic that reads data from customMap, does some
>         // analysis, and writes it to the database
>     }
> }
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Custom-Filter-Factory-How-to-pass-parameters-tp4002217p4002531.html
> Sent from the Solr - User mailing list archive at Nabble.com.
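For reference, the postCommit listener Erick mentions would be wired up roughly like this. This is an untested plugin skeleton, not a drop-in implementation: DbFlushListener and SharedBuffer are hypothetical names, and the exact SolrEventListener method set varies between Solr versions, so check the interface for the release you're on.

```java
import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.SolrEventListener;
import org.apache.solr.search.SolrIndexSearcher;

public class DbFlushListener implements SolrEventListener {
    public void init(NamedList args) {
        // read DB connection params from the listener config if needed
    }

    public void postCommit() {
        // Drain whatever the filter left in the shared buffer into the DB.
        // SharedBuffer is a hypothetical handle to the filter's in-memory map.
        SharedBuffer.getInstance().flush();
    }

    public void newSearcher(SolrIndexSearcher newSearcher,
                            SolrIndexSearcher currentSearcher) {
        // no-op for this use case
    }
}
```

It would be registered in solrconfig.xml inside the update handler, e.g.:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <listener event="postCommit" class="com.example.DbFlushListener" />
</updateHandler>
```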