> Thank you, I will follow Erick's steps.
> BTW, I am also trying to ingest using Flume. Flume uses Morphlines along
> with Tika.
> Will even the Flume SolrSink have the same issue?
Yes, when using Tika you run the risk of it choking on a document, eating CPU and/or RAM until everything dies. This is also true when you run it standalone.

The problem is usually caused by PDF and Office documents that are unusual, corrupt, incomplete (e.g. truncated) or extremely large. Even ordinary HTML can get you into trouble if it is extremely large or has very deeply nested elements.

In general, though, it is not a problem you will experience frequently. We operate broad, large-scale web crawlers, ingesting all kinds of bad stuff all the time. The trick to avoiding problems is to run each Tika parse in a separate thread, set a timer, and kill the thread if it exceeds the time limit. It can still go wrong, but trouble is very rare.

Running Tika standalone and talking to it over the network is safest, but that is not very portable or easy to distribute on Hadoop or other platforms.
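
For illustration, the separate-thread-plus-timer approach could look roughly like the sketch below. This is not our crawler code and not anything Flume or Morphlines ships. The class TimedParse and the method runWithTimeout are hypothetical names, and the Callable stands in for whatever Tika parse call you actually make. Note that Java cannot forcibly kill a thread: cancel(true) only interrupts it, so a parse stuck in a tight loop may keep running, which is one way it can still go wrong.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs one parse task on its own thread with a time limit.
final class TimedParse {

    // parseTask is a stand-in for the real Tika parse call (an assumption here).
    static <T> T runWithTimeout(Callable<T> parseTask, long timeoutMillis) throws Exception {
        // One dedicated worker thread per parse; daemon so that a stuck
        // parse cannot keep the JVM alive on shutdown.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true);
            return t;
        });
        Future<T> future = executor.submit(parseTask);
        try {
            // Wait for the result, but no longer than the limit.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Best effort: this interrupts the worker, it does not kill it.
            future.cancel(true);
            throw e;
        } finally {
            // Release the executor; also interrupts the worker if still running.
            executor.shutdownNow();
        }
    }
}
```

You would call it as TimedParse.runWithTimeout(() -> yourParse(doc), 30000) and treat a TimeoutException as "skip this document". The 30 second limit is only an example value; pick one that fits your documents.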