Hi Michael,

Column statistics in Hive are kept in the Hive metadata table tab_col_stats.

When I look at this table in Oracle, I only see statistics for primitive columns. STRUCT columns have none, since a STRUCT column would have to be broken down into its primitive fields, and I don't think Hive has the means to do that.

desc tab_col_stats;
 Name                     Null?    Type
 ------------------------ -------- ----------------
 CS_ID                    NOT NULL NUMBER
 DB_NAME                  NOT NULL VARCHAR2(128)
 TABLE_NAME               NOT NULL VARCHAR2(128)
 COLUMN_NAME              NOT NULL VARCHAR2(1000)
 COLUMN_TYPE              NOT NULL VARCHAR2(128)
 TBL_ID                   NOT NULL NUMBER
 LONG_LOW_VALUE                    NUMBER
 LONG_HIGH_VALUE                   NUMBER
 DOUBLE_LOW_VALUE                  NUMBER
 DOUBLE_HIGH_VALUE                 NUMBER
 BIG_DECIMAL_LOW_VALUE             VARCHAR2(4000)
 BIG_DECIMAL_HIGH_VALUE            VARCHAR2(4000)
 NUM_NULLS                NOT NULL NUMBER
 NUM_DISTINCTS                     NUMBER
 AVG_COL_LEN                       NUMBER
 MAX_COL_LEN                       NUMBER
 NUM_TRUES                         NUMBER
 NUM_FALSES                        NUMBER
 LAST_ANALYZED            NOT NULL NUMBER

So in summary, although columns of type STRUCT do exist, I don't think Hive can cater for their statistics. Actually, I don't think Oracle itself does either.

HTH

P.S. I am on Hive 2 and it does not support this either:

hive> analyze table foo compute statistics for columns;
FAILED: UDFArgumentTypeException Only primitive type arguments are accepted but array<bigint> is passed.

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 14 June 2016 at 09:57, Michael Häusler <mich...@akatose.de> wrote:

> Hi there,
>
> you can reproduce the messages below with Hive 1.2.1.
>
> Best regards
> Michael
>
>
> On 2016-06-13, at 22:21, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> which version of Hive are you using?
>
> Dr Mich Talebzadeh
>
> On 13 June 2016 at 16:00, Michael Häusler <mich...@akatose.de> wrote:
>
>> Hi there,
>>
>> when testing column statistics I stumbled upon the following error message:
>>
>> DROP TABLE IF EXISTS foo;
>> CREATE TABLE foo (foo BIGINT, bar ARRAY<BIGINT>, foobar STRUCT<key:STRING,value:STRING>);
>>
>> ANALYZE TABLE foo COMPUTE STATISTICS FOR COLUMNS;
>> FAILED: UDFArgumentTypeException Only primitive type arguments are accepted but array<bigint> is passed.
>>
>> ANALYZE TABLE foo COMPUTE STATISTICS FOR COLUMNS foobar, bar;
>> FAILED: UDFArgumentTypeException Only primitive type arguments are accepted but struct<key:string,value:string> is passed.
>>
>> 1) Basically, it seems that column statistics don't work for non-primitive types. Are there any workarounds or any plans to change this?
>>
>> 2) Furthermore, the convenience syntax to compute statistics for all columns does not work as soon as there is a non-supported column. Are there any plans to change this, so it is easier to compute statistics for all supported columns?
>>
>> 3) ANALYZE TABLE will only provide the first failing *type* in the error message. Especially for wide tables it would be much easier if all non-supported column *names* would be printed.
>>
>> Best regards
>> Michael
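Regarding points 2) and 3) above, one possible client-side workaround is to name only the primitive columns in the ANALYZE statement and to report the skipped column names yourself. The following is a minimal sketch in Python, not anything Hive provides: the helper names is_complex and build_analyze are hypothetical, the column list is typed in by hand from the CREATE TABLE above (in practice it might be parsed from DESCRIBE output), and it assumes that every type other than ARRAY, MAP, STRUCT and UNIONTYPE is accepted for column statistics, which may not hold for every primitive type on every Hive version.

```python
# Hypothetical helper, not part of Hive: build an ANALYZE statement that
# names only the columns whose declared type is not a complex type.

# Hive's complex types; everything else is treated as primitive (assumption).
COMPLEX_PREFIXES = ("array<", "map<", "struct<", "uniontype<")


def is_complex(hive_type):
    """Return True if the declared Hive type is ARRAY, MAP, STRUCT or UNIONTYPE."""
    return hive_type.strip().lower().startswith(COMPLEX_PREFIXES)


def build_analyze(table, columns):
    """Split columns into supported and skipped ones.

    columns is a list of (name, hive_type) pairs. Returns a tuple
    (statement, skipped): the ANALYZE statement for the primitive columns
    (None if there are none) and the names of the complex columns left out.
    """
    supported = [name for name, hive_type in columns if not is_complex(hive_type)]
    skipped = [name for name, hive_type in columns if is_complex(hive_type)]
    statement = None
    if supported:
        statement = "ANALYZE TABLE {} COMPUTE STATISTICS FOR COLUMNS {};".format(
            table, ", ".join(supported)
        )
    return statement, skipped


# Column definitions copied from the CREATE TABLE in the original question.
columns = [
    ("foo", "BIGINT"),
    ("bar", "ARRAY<BIGINT>"),
    ("foobar", "STRUCT<key:STRING,value:STRING>"),
]

statement, skipped = build_analyze("foo", columns)
print(statement)  # ANALYZE TABLE foo COMPUTE STATISTICS FOR COLUMNS foo;
print("skipped:", ", ".join(skipped))  # skipped: bar, foobar
```

The generated statement can then be run from the hive prompt. The skipped list gives all non-supported column names at once, rather than only the first failing type.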