Hello, I am writing a Hadoop program that processes JSON, so I wrote a CustomInputFormat to handle the data. The custom format's reader extends RecordReader and overrides the nextKeyValue() method.
However, this doesn't solve the problem when a single JSON object is split across two InputSplits. Is there a way to control how the input file is broken into InputSplits, so that a JSON object never breaks across a split boundary?

Any help will be much appreciated. Many thanks in advance!

Warm regards,
Arko
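
For illustration, one way the boundary problem is often handled is to leave the splits alone and make the reader split-aware. The sketch below is plain Java, not Hadoop code, and it rests on an assumption that is not stated in the question: each JSON object sits on its own line. The class SplitAwareJsonLines and the method readSplit are hypothetical names. The rule it follows, similar in spirit to how line-oriented readers cope with split boundaries, is that a record belongs to the split containing its first byte: a reader skips a partial first record and reads past its split end to finish its last one.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a split-aware record reader (not a Hadoop class).
// Assumes newline-delimited JSON: exactly one object per line.
class SplitAwareJsonLines {

    // Returns every record whose first byte lies in [start, end).
    public static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Step back one byte and skip through the next newline. If the
            // split begins exactly on a record boundary nothing is lost;
            // otherwise the partial record is left to the previous split.
            pos = start - 1;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // first byte after the newline
        }

        List<String> records = new ArrayList<>();
        while (pos < end && pos < data.length) {
            // The record starts inside this split, so read it to its end,
            // even if that means reading past the split boundary.
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            String line = new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8);
            if (!line.isEmpty()) {
                records.add(line);
            }
            pos = lineEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "{\"id\":1}\n{\"id\":2,\"name\":\"a\"}\n{\"id\":3}\n"
                .getBytes(StandardCharsets.UTF_8);
        int cut = 15; // falls in the middle of the second object
        System.out.println(readSplit(data, 0, cut));
        System.out.println(readSplit(data, cut, data.length));
    }
}
```

With the cut placed inside the second object, the first call returns the first two objects and the second call returns only the third, so nothing is duplicated or dropped. If the JSON objects span multiple lines this rule does not apply directly. In that case a simpler fallback is to override isSplitable(JobContext, Path) in a FileInputFormat subclass to return false, which keeps each file in a single split at the cost of parallelism within a file.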