regarding:

> 3. To read the HCat records....
> 
> It depends on how you'd like to read the records... will you be reading ALL 
> the records remotely from the client app, 
> or will you get input splits and read the records on mappers?
> 
> The code will be different (somewhat)... let me know...

In this case I’d be reading all of the records remotely from the client app.
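Concretely, something like the following sketch is what I have in mind: set up the table with HCatInputFormat, then iterate over every split's record reader in the client JVM instead of on mappers. This is a sketch, not tested here; it assumes the Hive 0.12-era org.apache.hcatalog packages and the Hadoop 2 mapreduce APIs, and the metastore URI, database, and table names are placeholders.

```java
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;
import org.apache.hcatalog.data.HCatRecord;
import org.apache.hcatalog.mapreduce.HCatInputFormat;
import org.apache.hcatalog.mapreduce.InputJobInfo;

public class RemoteHCatRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // placeholder: point at your own metastore thrift server
        conf.set("hive.metastore.uris", "thrift://metastore-host:9083");

        Job job = new Job(conf);
        HCatInputFormat.setInput(job,
            InputJobInfo.create("my_data_base", "my_table", null));

        HCatInputFormat inputFormat = new HCatInputFormat();
        List<InputSplit> splits = inputFormat.getSplits(job);

        // read every split in this JVM, rather than distributing to mappers
        for (InputSplit split : splits) {
            TaskAttemptContext ctx = new TaskAttemptContextImpl(
                job.getConfiguration(), new TaskAttemptID());
            RecordReader<WritableComparable, HCatRecord> reader =
                inputFormat.createRecordReader(split, ctx);
            reader.initialize(split, ctx);
            while (reader.nextKeyValue()) {
                HCatRecord record = reader.getCurrentValue();
                System.out.println(record);
            }
            reader.close();
        }
    }
}
```

Reading everything through one client obviously loses the parallelism of mappers, but for modest partitions that's fine for my purposes.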

TIA
Brian

On Jun 13, 2014, at 9:51 AM, Dmitry Vasilenko <dvasi...@gmail.com> wrote:

> I am not sure about java docs... ;-]
> I have spent the last three years integrating with HCat, and to make it work 
> I had to go through the code...
> 
> So here are some samples that can be helpful to start with. If you are using 
> Hive 0.12.0 I would not bother with the new APIs... I had to create some shim 
> classes for HCat to make my code version independent but I cannot share that. 
> 
> So 
> 
> 1. To enumerate tables... just use the Hive client... this seems to be version 
> independent:
> 
>    // the conf should contain the "hive.metastore.uris" property that points
>    // to your Hive Metastore thrift server
>    HiveMetaStoreClient hiveMetastoreClient = new HiveMetaStoreClient(conf);
> 
>    // this will get you all the databases
>    List<String> databases = hiveMetastoreClient.getAllDatabases();
> 
>    // this will get you all the tables for the given database
>    List<String> tables = hiveMetastoreClient.getAllTables(database);
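Put together as a runnable program, step 1 might look like the following. This is a sketch: the metastore URI is a placeholder you would replace with your own, and it assumes the HiveConf/HiveMetaStoreClient classes from the Hive 0.12 distribution are on the classpath.

```java
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;

public class ListTables {
    public static void main(String[] args) throws Exception {
        HiveConf conf = new HiveConf();
        // placeholder: point at your own metastore thrift server
        conf.set("hive.metastore.uris", "thrift://metastore-host:9083");

        HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
        try {
            // walk every database and print its tables as db.table
            for (String db : client.getAllDatabases()) {
                for (String table : client.getAllTables(db)) {
                    System.out.println(db + "." + table);
                }
            }
        } finally {
            client.close();
        }
    }
}
```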
> 
> 2. To get the table schema... I assume that you are after the HCat schema:
> 
> 
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
> import org.apache.hcatalog.data.schema.HCatSchema;
> import org.apache.hcatalog.mapreduce.HCatInputFormat;
> import org.apache.hcatalog.mapreduce.InputJobInfo;
> 
>    Job job = new Job(config);
>    job.setJarByClass(XXXXXX.class); // this will be your class
>    job.setInputFormatClass(HCatInputFormat.class);
>    job.setOutputFormatClass(TextOutputFormat.class);
>    InputJobInfo inputJobInfo =
>        InputJobInfo.create("my_data_base", "my_table", "partition filter");
>    HCatInputFormat.setInput(job, inputJobInfo);
>    HCatSchema s = HCatInputFormat.getTableSchema(job);
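Once you have the schema, you can use it to pull fields out of an HCatRecord by name. A minimal sketch, assuming the same 0.12-era org.apache.hcatalog classes (the helper class and method names here are made up for illustration):

```java
import org.apache.hcatalog.data.HCatRecord;
import org.apache.hcatalog.data.schema.HCatFieldSchema;
import org.apache.hcatalog.data.schema.HCatSchema;

public class SchemaDump {
    // print every field of a record, using the table schema
    // to map field names to positions in the record
    public static void dump(HCatSchema schema, HCatRecord record)
            throws Exception {
        for (HCatFieldSchema field : schema.getFields()) {
            int pos = schema.getPosition(field.getName());
            System.out.println(field.getName() + " = " + record.get(pos));
        }
    }
}
```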
> 
> 
> 3. To read the HCat records....
> 
> It depends on how you'd like to read the records... will you be reading ALL 
> the records remotely from the client app, 
> or will you get input splits and read the records on mappers?
> 
> The code will be different (somewhat)... let me know...
> 
> On Fri, Jun 13, 2014 at 8:25 AM, Brian Jeltema 
> <brian.jelt...@digitalenvoy.net> wrote:
> Version 0.12.0.
> 
> I’d like to obtain the table’s schema, scan a table partition, and use the 
> schema to parse the rows.
> 
> I can probably figure this out by looking at the HCatalog source. My concern 
> was that
> the HCatalog packages in the Hive distributions are excluded in the JavaDoc, 
> which implies
> that the API is not public. Is there a reason for this?
> 
> Brian
> 
> On Jun 13, 2014, at 9:10 AM, Dmitry Vasilenko <dvasi...@gmail.com> wrote:
> 
>> You should be able to access this information. The exact API depends on the 
>> version of Hive/HCat. As you know, the earlier HCat API is being deprecated 
>> and will be removed in Hive 0.14.0. I can provide you with a code sample if 
>> you tell me what you are trying to do and which version of Hive you are 
>> using. 
>> 
>> 
>> On Fri, Jun 13, 2014 at 7:33 AM, Brian Jeltema 
>> <brian.jelt...@digitalenvoy.net> wrote:
>> I’m experimenting with HCatalog, and would like to be able to access tables 
>> and their schema
>> from a Java application (not Hive/Pig/MapReduce). However, the API seems to 
>> be hidden, which leads me to believe that this is not a supported use case. 
>> Is HCatalog use limited to one of the supported frameworks?
>> 
>> TIA
>> 
>> Brian
>> 
> 
> 
