Hi Juan,

I'd try to reproduce as much of the pipeline as possible using a Solr
output connection.  If you include the Tika extractor in the pipeline, you
will want to configure the Solr connection not to use the extracting update
handler; there's a checkbox on the Schema tab that you need to uncheck for
that.  Once you do, you can see almost exactly what is being sent to Solr,
because it all gets logged in the INFO messages written to the Solr log.
This should help you figure out whether the problem is your Tika
configuration or not.

Please give this a try and let me know what happens.

Karl


On Mon, Feb 8, 2016 at 1:28 PM, Juan Pablo Diaz-Vaz <[email protected]>
wrote:

> Hi,
>
> I've successfully sent data to FileSystems and SOLR, but for Amazon
> CloudSearch I'm seeing that only empty messages are being sent to my
> domain. I think this may be an issue with how I've set up the TIKA
> Extractor Transformation or the field mapping. I think the Database where
> the records are supposed to be stored before flushing to Amazon is storing
> empty content.
>
> I've tried to find documentation on how to set up the TIKA Transformation,
> but I haven't been able to find any.
>
> If someone could provide an example of a job setup to send from a
> FileSystem to CloudSearch, that'd be great!
>
> Thanks in advance,
>
> --
> Juan Pablo Diaz-Vaz Varas,
> Full Stack Developer - MC+A Chile
> +56 9 84265890
>
