Re: Stored By

Gabriel Balan Thu, 28 Jan 2016 15:27:52 -0800

Hi

Why not write your own storage handler that extends AccumuloStorageHandler and 
overrides getInputFormatClass() to return your HiveAccumuloTableInputFormat 
subclass?
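
From the Hive side this might look roughly as follows. It is a sketch only, 
and it assumes the handler subclass has been compiled into a jar that Hive can 
load. The class name com.example.MyAccumuloStorageHandler, the jar path and the 
table name test_text3 are hypothetical; the SERDEPROPERTIES are copied from the 
STORED BY statement in the original message.

```sql
-- Hypothetical jar containing the custom handler and input format classes.
ADD JAR /path/to/my-accumulo-handler.jar;

-- Same table definition as the original STORED BY example, but pointing at
-- the (hypothetical) handler subclass instead of AccumuloStorageHandler.
CREATE EXTERNAL TABLE test_text3 (rowid STRING, testint INT, testbig BIGINT,
  testfloat FLOAT, testdouble DOUBLE, teststring STRING, testbool BOOLEAN)
STORED BY 'com.example.MyAccumuloStorageHandler'
WITH SERDEPROPERTIES (
  'accumulo.table.name' = 'test_table_text',
  'accumulo.columns.mapping' = ':rowid,testint:v,testbig:v,testfloat:v,testdouble:v,teststring:v,testbool:v');
```

Because the handler still supplies the SerDe and output format, only the input 
format should change with this approach.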


hth
Gabriel Balan

On 1/21/2016 10:46 AM, peter.mar...@baesystems.com wrote:


Hi,

So I am using the AccumuloStorageHandler to allow me to access Accumulo tables 
from Hive.

This works fine. So typically I would use something like this:

CREATE EXTERNAL TABLE test_text (rowid STRING, testint INT, testbig BIGINT,
  testfloat FLOAT, testdouble DOUBLE, teststring STRING, testbool BOOLEAN)
STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
WITH SERDEPROPERTIES (
  'accumulo.table.name' = 'test_table_text',
  'accumulo.columns.mapping' = ':rowid,testint:v,testbig:v,testfloat:v,testdouble:v,teststring:v,testbool:v');

Now for many reasons I am planning to have my own InputFormat.

I don’t want to start from scratch so I plan to have my class derive from the 
existing class HiveAccumuloTableInputFormat and pick up a lot of functionality 
for free.

Now it was my understanding that “STORED BY” was a sort of optimization that 
saved the user having to specify the input format and output format and so on 
explicitly.

Given that I eventually want to use my own input format class, in the short 
term I just want to confirm that I can create a Hive table that uses Accumulo 
while specifying the input format explicitly.

I’ve looked at the source of AccumuloStorageHandler and I can see what 
inputformat and outputformat it returns.

So my best guess at creating the same table as above, but without using “STORED 
BY” is as follows:

CREATE EXTERNAL TABLE test_text2 (rowid STRING, testint INT, testbig BIGINT,
  testfloat FLOAT, testdouble DOUBLE, teststring STRING, testbool BOOLEAN)
ROW FORMAT SERDE 'org.apache.hadoop.hive.accumulo.serde.AccumuloSerDe'
WITH SERDEPROPERTIES (
  'accumulo.table.name' = 'test_table_text',
  'accumulo.columns.mapping' = ':rowid,testint:v,testbig:v,testfloat:v,testdouble:v,teststring:v,testbool:v')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat';

This fails with:

FAILED: SemanticException [Error 10055]: Output Format must implement 
HiveOutputFormat, otherwise it should be either IgnoreKeyTextOutputFormat or 
SequenceFileOutputFormat

Which seems plausible, because 
'org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat' really 
doesn’t seem to implement HiveOutputFormat.

However, this raises the question: how can the storage handler get away with 
it if I can’t?

So, before I go off and implement my own storage handler class as well as my 
own input format class, can anyone tell me whether I am doing something silly, 
or whether there is some other way around this problem?

Regards,

Z



