Missed the last line for #2:

HCatSchema schema = HCatInputFormat.getTableSchema(job.getConfiguration());
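
With the schema in hand you can, for example, walk the columns (a small
sketch using the HCatSchema/HCatFieldSchema accessors):

for (HCatFieldSchema field : schema.getFields()) {
    System.out.println(field.getName() + " : " + field.getTypeString());
}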


On Mon, Jun 23, 2014 at 8:11 AM, Dmitry Vasilenko <dvasi...@gmail.com>
wrote:

> Hi Brian,
>
> 1. To enumerate databases and tables and to get the Hive table schema you
> can use the code I provided earlier.
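>
> For reference, a minimal sketch of that enumeration (my assumption here:
> a plain HiveMetaStoreClient, with a HiveConf that points at your
> metastore) looks roughly like this:
>
> import org.apache.hadoop.hive.conf.HiveConf;
> import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
> import org.apache.hadoop.hive.metastore.api.FieldSchema;
>
> HiveConf hiveConf = new HiveConf();
> HiveMetaStoreClient client = new HiveMetaStoreClient(hiveConf);
> try {
>   for (String db : client.getAllDatabases()) {
>     for (String table : client.getAllTables(db)) {
>       // getFields returns the column names and types of the table
>       for (FieldSchema field : client.getFields(db, table)) {
>         System.out.println(db + "." + table + " : "
>             + field.getName() + " " + field.getType());
>       }
>     }
>   }
> } finally {
>   client.close();
> }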
>
> 2. To get the HCatalog flavor of the table schema you will use something
> like this:
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.mapreduce.InputSplit;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
> import org.apache.hive.hcatalog.data.schema.HCatSchemaUtils;
> import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;
> import org.apache.hive.hcatalog.mapreduce.HCatSplit;
>
> ...................
>
>   Job job = new Job(configuration);
>   job.setJarByClass(XXXX.class); // your class name here
>   job.setInputFormatClass(HCatInputFormat.class);
>   job.setOutputFormatClass(TextOutputFormat.class);
>   HCatInputFormat.setInput(job, "data_base_name", "table_name",
>       "partition_filter");
>
>
> 3. To get input splits you will have to do some gymnastics, as the team
> IMHO made an unfortunate decision to change ReaderContext in a
> non-cross-version-compatible way. The new ReaderContext does not provide a
> convenient way to get input splits, so you will have to use HCatInputFormat
> directly:
>
>   ReadEntity.Builder builder = ....
>   ReadEntity entity = builder.build();
>   Job job = new Job(configuration);
>   HCatInputFormat inputFormat = HCatInputFormat.setInput(job,
>       entity.getDbName(), entity.getTableName(), entity.getFilterString());
>   List<InputSplit> splits =
>       inputFormat.getSplits(ShimLoader.getHadoopShims().newJobContext(job));
>
>
>
> 4. To get an iterator over records, more effort is needed. Here is code
> that should do the work for a given input split and the Hadoop
> configuration (returned from the Hadoop job):
>
>
> Iterator<HCatRecord> read() throws HCatException {
>   HCatInputFormat inputFormat = new HCatInputFormat();
>   @SuppressWarnings("rawtypes")
>   final RecordReader<WritableComparable, HCatRecord> recordReader;
>   try {
>     TaskAttemptContext context =
>         ShimLoader.getHadoopShims().newTaskAttemptContext(this.configuration, null);
>     recordReader = inputFormat.createRecordReader(this.inputSplit, context);
>     recordReader.initialize(this.inputSplit, context);
>   } catch (Exception e) {
>     throw new HCatException(ErrorType.ERROR_NOT_INITIALIZED, e);
>   }
>   return new Iterator<HCatRecord>() {
>     @Override
>     public boolean hasNext() {
>       try {
>         boolean retVal = recordReader.nextKeyValue();
>         if (retVal) {
>           return true;
>         }
>         // if it's false, we need to close the recordReader
>         recordReader.close();
>         return false;
>       } catch (Exception e) {
>         throw new RuntimeException(e);
>       }
>     }
>
>     @Override
>     public HCatRecord next() {
>       try {
>         return recordReader.getCurrentValue();
>       } catch (Exception e) {
>         throw new RuntimeException(e);
>       }
>     }
>
>     @Override
>     public void remove() {
>       throw new UnsupportedOperationException();
>     }
>   };
> }
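>
> Putting 3 and 4 together, a rough driver loop (assuming this.configuration
> and this.inputSplit are the fields used by read() above; names as in the
> snippets) would look like:
>
> for (InputSplit split : splits) {
>   this.inputSplit = split;
>   Iterator<HCatRecord> records = read();
>   while (records.hasNext()) {
>     HCatRecord record = records.next();
>     // record.get(i) returns the value of the i-th column for this row
>   }
> }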
>
> Note also that when you create the Hadoop configuration you will have to
> disable the client cache:
>
> configuration.set(HCatConstants.HCAT_HIVE_CLIENT_DISABLE_CACHE,
>     Boolean.TRUE.toString());
>
> otherwise the client process will not terminate (HIVE-6268).
>
> Hope this helps.
>
> On Sat, Jun 21, 2014 at 9:11 AM, Brian Jeltema <
> brian.jelt...@digitalenvoy.net> wrote:
>
>> I’m also experimenting with version 0.13, and see that it differs from
>> 0.12 significantly.
>> Can you give me a code example for 0.13?
>>
>> Thanks
>> Brian
>>
>> On Jun 13, 2014, at 9:25 AM, Brian Jeltema <
>> brian.jelt...@digitalenvoy.net> wrote:
>>
>> Version 0.12.0.
>>
>> I’d like to obtain the table’s schema, scan a table partition, and use
>> the schema to parse the rows.
>>
>> I can probably figure this out by looking at the HCatalog source. My
>> concern was that the HCatalog packages in the Hive distributions are
>> excluded from the JavaDoc, which implies that the API is not public. Is
>> there a reason for this?
>>
>> Brian
>>
>> On Jun 13, 2014, at 9:10 AM, Dmitry Vasilenko <dvasi...@gmail.com> wrote:
>>
>> You should be able to access this information. The exact API depends on
>> the version of Hive/HCat. As you know, the earlier HCat API is deprecated
>> and will be removed in Hive 0.14.0. I can provide you with a code sample
>> if you tell me what you are trying to do and which version of Hive you
>> are using.
>>
>>
>> On Fri, Jun 13, 2014 at 7:33 AM, Brian Jeltema <
>> brian.jelt...@digitalenvoy.net> wrote:
>>
>>> I’m experimenting with HCatalog, and would like to be able to access
>>> tables and their schemas from a Java application (not Hive/Pig/MapReduce).
>>> However, the API seems to be hidden, which leads me to believe that this
>>> is not a supported use case. Is HCatalog use limited to one of the
>>> supported frameworks?
>>>
>>> TIA
>>>
>>> Brian
>>
