Hi,

  My company has been working on a YARN application for a couple of years. We 
essentially take the place of MapReduce and split our data and processing 
ourselves.

  One of the things we've been working to support is Hive access, and the 
HCatalog interfaces and API seemed perfect. Using the information at 
https://hive.apache.org/javadocs/hcat-r0.5.0/readerwriter.html and 
TestReaderWriter.java from the source code, I was able to create and use 
HCatSplits to allow balanced, data-local parallel reading (using the size and 
locations methods available from each HCatSplit).
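
  To make the use case concrete, a minimal sketch of this kind of size- and 
locality-aware assignment follows. It is an illustration under assumptions, not 
actual HCatalog code: SplitInfo is a hypothetical stand-in for the size and 
locations reported by each split, and the greedy strategy (largest split first, 
least-loaded preferred host, otherwise least-loaded node) is just one simple 
choice.

```java
import java.util.*;

class LocalityAssigner {
    // Hypothetical stand-in for the size and locations a split reports.
    static final class SplitInfo {
        final String id;
        final long length;
        final List<String> locations;
        SplitInfo(String id, long length, List<String> locations) {
            this.id = id; this.length = length; this.locations = locations;
        }
        @Override public String toString() { return id; }
    }

    // Greedy plan: largest splits first; each goes to the least-loaded node
    // that holds its data, or to the least-loaded node overall if none does.
    static Map<String, List<SplitInfo>> assign(List<SplitInfo> splits, List<String> nodes) {
        Map<String, List<SplitInfo>> plan = new LinkedHashMap<>();
        Map<String, Long> load = new HashMap<>();
        for (String n : nodes) {
            plan.put(n, new ArrayList<>());
            load.put(n, 0L);
        }
        List<SplitInfo> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator.comparingLong((SplitInfo s) -> s.length).reversed());
        for (SplitInfo s : sorted) {
            String best = null;
            for (String host : s.locations) {       // data-local candidates
                if (load.containsKey(host) && (best == null || load.get(host) < load.get(best))) {
                    best = host;
                }
            }
            if (best == null) {                      // no local node available
                for (String n : nodes) {
                    if (best == null || load.get(n) < load.get(best)) best = n;
                }
            }
            plan.get(best).add(s);
            load.merge(best, s.length, Long::sum);
        }
        return plan;
    }

    public static void main(String[] args) {
        List<SplitInfo> splits = Arrays.asList(
            new SplitInfo("a", 100, Arrays.asList("n1", "n2")),
            new SplitInfo("b", 80, Arrays.asList("n1", "n2")),
            new SplitInfo("c", 50, Arrays.asList("n3")));
        System.out.println(assign(splits, Arrays.asList("n1", "n2")));
    }
}
```

  Without access to per-split size and locations, only the fallback branch of a 
plan like this is possible.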

  Much to my dismay, 0.13 removes a lot of that functionality. The 
ReaderContext class is now an interface that only exposes numSplits, whereas 
all of the other methods are in the inaccessible (package-private) 
ReaderContextImpl class.

  Since I no longer have access to the actual HCatSplits from the 
ReaderContext, I am unable to process them and send them to our YARN app on the 
data-local nodes. My only choice seems to be to distribute the splits to 
slave nodes more or less at random.

  Does anyone know if, as of 0.13, this is the intended way to interface with 
Hive via non-Hadoop YARN applications? Is the underlying HCatSplit now intended 
only for internal use?


Thanks,


Nathan Bamford
