Hi Juan,

I'd try to reproduce as much of the pipeline as possible using a Solr output connection. If you include the Tika extractor in the pipeline, you will want to configure the Solr connection not to use the extracting update handler; there's a checkbox on the Schema tab that you need to uncheck for that. Once you do, you can see almost exactly what is being sent to Solr, because it all gets logged in the INFO messages written to the Solr log. This should help you figure out whether the problem is your Tika configuration or not.
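If it helps, here is a rough sketch of a complementary check: once a test run has pushed documents into Solr, query the standard /select handler and list the documents whose extracted text came through empty. The host, the core name, and the "content" and "id" field names are assumptions for illustration only; adjust them to whatever your schema and field mapping actually use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, core, rows=10):
    """Build a Solr /select URL returning the first `rows` documents as JSON."""
    params = urlencode({"q": "*:*", "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


def fetch_json(url):
    """Fetch a URL and parse the body as JSON (needs a reachable Solr)."""
    with urlopen(url) as resp:
        return json.load(resp)


def find_empty_docs(response, field="content"):
    """Return the ids of documents whose `field` is missing or blank.

    `field` is an assumed name; use the field your mapping writes to.
    """
    empty = []
    for doc in response.get("response", {}).get("docs", []):
        value = doc.get(field)
        # Multi-valued fields come back as lists; join them before checking.
        if isinstance(value, list):
            value = " ".join(str(v) for v in value)
        if value is None or not str(value).strip():
            empty.append(doc.get("id"))
    return empty
```

For example, find_empty_docs(fetch_json(build_select_url("http://localhost:8983/solr", "collection1"))) would list the ids with no content, where both the URL and the core name are placeholders. If the list is non-empty here too, the extraction or field mapping is the likely culprit rather than anything specific to CloudSearch.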
Please give this a try and let me know what happens.

Karl

On Mon, Feb 8, 2016 at 1:28 PM, Juan Pablo Diaz-Vaz <[email protected]> wrote:

> Hi,
>
> I've successfully sent data to FileSystems and SOLR, but for Amazon
> CloudSearch I'm seeing that only empty messages are being sent to my
> domain. I think this may be an issue on how I've setup the TIKA Extractor
> Transformation or the field mapping. I think the Database where the records
> are supposed to be stored before flushing to Amazon, is storing empty
> content.
>
> I've tried to find documentation on how to setup the TIKA Transformation,
> but I haven't been able to find any.
>
> If someone could provide an example of a job setup to send from a
> FileSystem to CloudSearch, that'd be great!
>
> Thanks in advance,
>
> --
> Juan Pablo Diaz-Vaz Varas,
> Full Stack Developer - MC+A Chile
> +56 9 84265890
>
