Hi Michael,

Statistics for columns in Hive are kept in Hive metadata table
tab_col_stats.

When I am looking at this table in Oracle, I only see statistics for
primitives columns here. STRUCT columns do not have it as a STRUCT column
will have to be broken into its primitive columns.  I don't think Hive has
the means to do that.

desc tab_col_stats;
 Name
Null?    Type
 ------------------------------------------------------------------------
-------- -------------------------------------------------
 CS_ID
NOT NULL NUMBER
 DB_NAME
NOT NULL VARCHAR2(128)
 TABLE_NAME
NOT NULL VARCHAR2(128)
 COLUMN_NAME
NOT NULL VARCHAR2(1000)
 COLUMN_TYPE
NOT NULL VARCHAR2(128)
 TBL_ID
NOT NULL NUMBER
 LONG_LOW_VALUE
NUMBER
 LONG_HIGH_VALUE
NUMBER
 DOUBLE_LOW_VALUE
NUMBER
 DOUBLE_HIGH_VALUE
NUMBER
 BIG_DECIMAL_LOW_VALUE
VARCHAR2(4000)
 BIG_DECIMAL_HIGH_VALUE
VARCHAR2(4000)
 NUM_NULLS
NOT NULL NUMBER
 NUM_DISTINCTS
NUMBER
 AVG_COL_LEN
NUMBER
 MAX_COL_LEN
NUMBER
 NUM_TRUES
NUMBER
 NUM_FALSES
NUMBER
 LAST_ANALYZED
NOT NULL NUMBER



 So in summary although column type STRUCT do exit, I don't think Hive can
cater for their statistics. Actually I don't think Oracle itself does it.

HTH

P.S. I am on Hive 2 and it does not.

hive> analyze table foo compute statistics for columns;
FAILED: UDFArgumentTypeException Only primitive type arguments are accepted
but array<bigint> is passed.


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 14 June 2016 at 09:57, Michael Häusler <mich...@akatose.de> wrote:

> Hi there,
>
> you can reproduce the messages below with Hive 1.2.1.
>
> Best regards
> Michael
>
>
> On 2016-06-13, at 22:21, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> which version of Hive are you using?
>
> Dr Mich Talebzadeh
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 13 June 2016 at 16:00, Michael Häusler <mich...@akatose.de> wrote:
>
>> Hi there,
>>
>>
>> when testing column statistics I stumbled upon the following error
>> message:
>>
>> DROP TABLE IF EXISTS foo;
>> CREATE TABLE foo (foo BIGINT, bar ARRAY<BIGINT>, foobar
>> STRUCT<key:STRING,value:STRING>);
>>
>> ANALYZE TABLE foo COMPUTE STATISTICS FOR COLUMNS;
>> FAILED: UDFArgumentTypeException Only primitive type arguments are
>> accepted but array<bigint> is passed.
>>
>> ANALYZE TABLE foo COMPUTE STATISTICS FOR COLUMNS foobar, bar;
>> FAILED: UDFArgumentTypeException Only primitive type arguments are
>> accepted but struct<key:string,value:string> is passed.
>>
>>
>> 1) Basically, it seems that column statistics don't work for
>> non-primitive types. Are there any workarounds or any plans to change this?
>>
>> 2) Furthermore, the convenience syntax to compute statistics for all
>> columns does not work as soon as there is a non-supported column. Are there
>> any plans to change this, so it is easier to compute statistics for all
>> supported columns?
>>
>> 3) ANALYZE TABLE will only provide the first failing *type* in the error
>> message. Especially for wide tables it would be much easier if all
>> non-supported column *names* would be printed.
>>
>>
>> Best regards
>> Michael
>>
>>
>
>

Reply via email to