Either is possible. For the first, you would write a custom update processor
that handled the dual Tika call...

For the second, consider writing a SolrJ program that does it all on
the client. Download Tika from the Apache project (or tease out all
the jars from the Solr distro), run the language detection there, and
then send each document to whichever core you pick.

Here's a sample app:
http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/
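
To make the client-side flow concrete, here is a rough sketch of the "detect first, then pick a core" step. It is only an illustration under assumptions: the class name, the core names, and the toy detector are all made up, and nothing in it is Solr, SolrJ, or Tika API. In a real client the detector function would wrap Tika's language identification, and the chosen core name would decide which core URL the SolrJ client sends the document to.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch only: picks a target core from a detected language code.
// Every name here (LanguageRoutingSketch, core_en, toyDetect, ...) is
// hypothetical and is not part of Solr, SolrJ, or Tika.
class LanguageRoutingSketch {

    private final Map<String, String> coreByLanguage = new HashMap<>();
    private final Function<String, String> detector;
    private final String defaultCore;

    LanguageRoutingSketch(Function<String, String> detector, String defaultCore) {
        this.detector = detector;
        this.defaultCore = defaultCore;
    }

    // Associates a language code with the core that should receive it.
    LanguageRoutingSketch register(String language, String core) {
        coreByLanguage.put(language, core);
        return this;
    }

    // Detection runs first; the result selects the core. Indexing would
    // happen afterwards, against the core returned here.
    String chooseCore(String text) {
        String language = detector.apply(text);
        return coreByLanguage.getOrDefault(language, defaultCore);
    }

    // Toy stand-in for a real detector (assumption: a real client would
    // call Tika here instead of matching a couple of stop words).
    static String toyDetect(String text) {
        String padded = " " + text.toLowerCase() + " ";
        if (padded.contains(" the ")) {
            return "en";
        }
        if (padded.contains(" und ") || padded.contains(" der ")) {
            return "de";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        LanguageRoutingSketch router =
            new LanguageRoutingSketch(LanguageRoutingSketch::toyDetect, "core_other")
                .register("en", "core_en")
                .register("de", "core_de");

        // Each document is routed before anything is sent to Solr.
        System.out.println(router.chooseCore("Where is the library?"));
        System.out.println(router.chooseCore("Hund und Katze"));
        System.out.println(router.chooseCore("12345"));
    }
}
```

Keeping the detector behind a plain function means the routing can be tried out without a running Solr or the Tika jars, and the same routing can be reused at query time to choose which core to search.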

Best
Erick

On Sun, Feb 19, 2012 at 9:44 PM, bing <jsuser1...@hotmail.com> wrote:
> Hi, all,
>
> I am deploying a multicore Solr server running on Tomcat, where I want to
> achieve language detection during index/query.
>
> Solr 3.5.0 wraps a Tika API that can do language detection. Currently,
> the default behavior of Solr 3.5.0 is that every time I index a document,
> Solr calls the Tika API at the same time to get the language detection
> result, i.e. indexing and detection happen together. However, I would like
> to have the language detection result first and then decide which core to
> put the document in, i.e. detection happens before indexing.
>
> It seems that I need to do development in one of the following ways:
>
> 1. I might need to modify Solr itself to change its default behavior;
> 2. Or I might write a Java client outside Solr and call the client through
> the server (JSP maybe) at index/query time.
>
> Can anyone who has met similar conditions give some suggestions about the
> advantages and disadvantages of the two approaches? Any other
> alternatives? Thank you.
>
>
> Best
> Bing
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Development-inside-or-outside-of-Solr-tp3759680p3759680.html
> Sent from the Solr - User mailing list archive at Nabble.com.
