I am trying to create a pipeline that takes in PDF files, parses them
using Tika, and processes the extracted data. One problem is that Tika
sometimes doesn't convert certain pieces of text correctly.
I can detect this and would like to fork the output of my pipeline:
for correctly
Hi Mohil,
Thanks for the detailed report. I think most people are at reduced capacity
right now. Filing a Jira would be helpful for tracking this.
Since I am writing, I will add a quick guess, but we should move to Jira.
It seems this has more to do with Dataflow than Elasticsearch. The default
for
I am facing an issue related to side inputs and have described it at this link:
https://stackoverflow.com/questions/60900937/side-input-of-size-around-50mb-causing-long-gc-pause
Any help would be appreciated.
--
Regards,
*Kiran M Hurakadli.*