Thanks Erick. This is how I was doing it, but when I saw the Solr Cell
stuff I figured I'd give it a go. What I ended up doing is the following:

ModifiableSolrParams params = indexer.index(artifact);
params.add("fmap.content", "my_custom_field");
params.add("extractFormat", "text");

ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.setParams(params);

FileStream f = new FileStream(new File(""));
up.addContentStream(f);
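
For anyone following along, here is a minimal sketch, using only the JDK, of the query string those two parameters would produce if the same request were sent to /update/extract over plain HTTP. The class and method names (ExtractQuerySketch, bodyOnlyParams, toQuery) are made up for illustration and are not part of SolrJ; only the parameter names and values come from the snippet above. As far as I know, fmap.content renames the extracted "content" field to the given target field. The Solr reference guide describes extractFormat as applying only together with extractOnly=true, so it may have no effect in a normal indexing request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of SolrJ: shows the request parameters as a query string.
class ExtractQuerySketch {

    // Collects the two parameters from the snippet above, in insertion order.
    public static Map<String, String> bodyOnlyParams(String targetField) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("fmap.content", targetField); // map the extracted "content" field to targetField
        params.put("extractFormat", "text");     // taken from the snippet; may only matter with extractOnly=true
        return params;
    }

    // Joins the parameters into a URL-encoded query string.
    public static String toQuery(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return query.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println("/update/extract?" + toQuery(bodyOnlyParams("my_custom_field")));
    }
}
```

Running main prints /update/extract?fmap.content=my_custom_field&extractFormat=text, which is handy for trying the same parameters with curl before wiring them into SolrJ.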


On Fri, Sep 6, 2013 at 9:54 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> It's always frustrating when someone replies with "Why not do it
> a completely different way?".  But I will anyway :).
>
> There's no requirement at all that you send things to Solr to make
> Solr Cell (aka Tika) do its tricks. Since you're already in SolrJ
> anyway, why not just parse on the client? This has the advantage
> of allowing you to offload the Tika processing from Solr which can
> be quite expensive. You can use the same Tika jars that come
> with Solr or download whatever version from the Tika project
> you want. That way, you can exercise much better control over
> what's done.
>
> Here's a skeletal program with indexing from a DB mixed in, but
> it shouldn't be hard at all to pull the DB parts out.
>
> http://searchhub.org/dev/2012/02/14/indexing-with-solrj/
>
> FWIW,
> Erick
>
>
> On Thu, Sep 5, 2013 at 5:28 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>
> > Is it possible to configure solr cell to only extract and store the body of
> > a document when indexing?  I'm currently doing the following which I
> > thought would work
> >
> > ModifiableSolrParams params = new ModifiableSolrParams();
> >
> >  params.set("defaultField", "content");
> >
> >  params.set("xpath", "/xhtml:html/xhtml:body/descendant::node()");
> >
> >  ContentStreamUpdateRequest up = new ContentStreamUpdateRequest(
> > "/update/extract");
> >
> >  up.setParams(params);
> >
> >  FileStream f = new FileStream(new File(".."));
> >
> >  up.addContentStream(f);
> >
> > up.setAction(ACTION.COMMIT, true, true);
> >
> > solrServer.request(up);
> >
> >
> > But the result of content is as follows
> >
> > <arr name="content_mvtxt">
> > <str/>
> > <str>null</str>
> > <str>ISO-8859-1</str>
> > <str>text/plain; charset=ISO-8859-1</str>
> > <str>Just a little test</str>
> > </arr>
> >
> >
> > What I had hoped for was just
> >
> > <arr name="content_mvtxt">
> > <str>Just a little test</str>
> > </arr>
> >
>
