Hello Everyone,

We are using the "--as-parquetfile" option to transfer data from MySQL to HDFS in Parquet format directly. The import itself works fine, and we are able to run queries against the table through Impala, but not through the Hive shell.
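For example (the exact queries are illustrative; the table name is taken from the Sqoop command below):

    # works from both impala-shell and the hive CLI
    hive -e "select * from test.employee"
    # fails from the hive CLI only, with the exception pasted below
    hive -e "select count(1) from test.employee"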
Sqoop version is 1.4.6, running on an Amazon EMR cluster. The Sqoop command is:

    sqoop import \
      --connect jdbc:mysql://test.hostname.com:3306/test \
      --username root --password test \
      --table employee \
      --hive-import --hive-table employee --hive-database test \
      --as-parquetfile --hive-overwrite \
      -m 1 \
      --fields-terminated-by '\001' \
      --null-string '\\N' --null-non-string '\\N'

In the Hive shell we can read all records (for example, select * from tbl_name), but queries such as select count(1), analyze commands, etc. fail. Please see the exception below:

    Caused by: java.lang.RuntimeException: hdfs://10.2.20.193:9000/user/hive/warehouse/dev_pavan_db.db/employee/.metadata/schemas/1.avsc is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [101, 34, 10, 125]
        at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:288)
        at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:254)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:200)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
        ... 16 more

Also, the source table (employee) has an int column (employee id); the Hive shell shows it as NULL values, while Impala shows the correct values.

I went through some articles that suggested renaming the .avsc file and the .metadata directory, but that didn't help.

Can you please take a look?

Thanks,
Mani
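P.S. For reference, this is roughly what we tried based on those articles (paths taken from the exception above; adjust for your warehouse):

    # inspect the .metadata directory Sqoop created inside the table directory
    hdfs dfs -ls -R /user/hive/warehouse/dev_pavan_db.db/employee/.metadata
    # move it out of the table directory so Hive no longer scans the .avsc files
    hdfs dfs -mv /user/hive/warehouse/dev_pavan_db.db/employee/.metadata /tmp/employee.metadata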
