Raymond - you were the closest. Parquet field names contained '::' ex. bag1::user_name
Hope it will help anyone in the future Thanks for all your help Tor On Sun, Aug 17, 2014 at 7:50 PM, Raymond Lau <raymond.lau...@gmail.com> wrote: > Do your field names in your parquet files contain upper case letters by > any chance ex. userName? Hive will not read the data of external tables if > they are not completely lower case field names, it doesn't convert them > properly in the case of external tables. > On Aug 17, 2014 8:00 AM, "hadoop hive" <hadooph...@gmail.com> wrote: > >> Take a small set of data like 2-5 line and insert it... >> >> After that you can try insert first 10 column and then next 10 till you >> fund your problematic column >> On Aug 17, 2014 8:37 PM, "Tor Ivry" <tork...@gmail.com> wrote: >> >>> Is there any way to debug this? >>> >>> We are talking about many fields here. >>> How can I see which field has the mismatch? >>> >>> >>> >>> On Sun, Aug 17, 2014 at 4:30 PM, hadoop hive <hadooph...@gmail.com> >>> wrote: >>> >>>> Hi, >>>> >>>> You check the data type you have provided while creating external >>>> table, it should match with data in files. >>>> >>>> Thanks >>>> Vikas Srivastava >>>> On Aug 17, 2014 7:07 PM, "Tor Ivry" <tork...@gmail.com> wrote: >>>> >>>>> Hi >>>>> >>>>> >>>>> >>>>> I have a hive (0.11) table with the following create syntax: >>>>> >>>>> >>>>> >>>>> CREATE EXTERNAL TABLE events( >>>>> >>>>> … >>>>> >>>>> ) >>>>> >>>>> PARTITIONED BY(dt string) >>>>> >>>>> ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe' >>>>> >>>>> STORED AS >>>>> >>>>> INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat" >>>>> >>>>> OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat" >>>>> >>>>> LOCATION '/data-events/success’; >>>>> >>>>> >>>>> >>>>> Query runs fine. >>>>> >>>>> >>>>> I add hdfs partitions (containing snappy.parquet files). >>>>> >>>>> >>>>> >>>>> When I run >>>>> >>>>> hive >>>>> >>>>> > select count(*) from events where dt=“20140815” >>>>> >>>>> I get the correct result >>>>> >>>>> >>>>> >>>>> *Problem:* >>>>> >>>>> When I run >>>>> >>>>> hive >>>>> >>>>> > select * from events where dt=“20140815” limit 1; >>>>> >>>>> I get >>>>> >>>>> OK >>>>> >>>>> NULL NULL NULL NULL NULL NULL NULL 20140815 >>>>> >>>>> >>>>> >>>>> *The same query in Impala returns the correct values.* >>>>> >>>>> >>>>> >>>>> Any idea what could be the issue? >>>>> >>>>> >>>>> >>>>> Thanks >>>>> >>>>> Tor >>>>> >>>> >>>