Hello Everyone,

We are using the "--as-parquetfile" option to transfer data from MySQL to HDFS in Parquet format directly. The import itself works fine, and we are able to run queries against the table through Impala, but not through the Hive shell.
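For example (the exact queries are illustrative; the table name is taken from the Sqoop command below):

    # works from both impala-shell and the hive CLI
    hive -e "select * from test.employee"
    # fails from the hive CLI only, with the exception pasted below
    hive -e "select count(1) from test.employee"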
Sqoop version is 1.4.6, running on an Amazon EMR cluster. The Sqoop command is:

    sqoop import \
      --connect jdbc:mysql://test.hostname.com:3306/test \
      --username root --password test \
      --table employee \
      --hive-import --hive-table employee --hive-database test \
      --as-parquetfile --hive-overwrite \
      -m 1 \
      --fields-terminated-by '\001' \
      --null-string '\\N' --null-non-string '\\N'

In the Hive shell we can read all records (for example, select * from tbl_name), but queries such as select count(1), analyze commands, etc. fail. Please see the exception below:

    Caused by: java.lang.RuntimeException: hdfs://10.2.20.193:9000/user/hive/warehouse/dev_pavan_db.db/employee/.metadata/schemas/1.avsc is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [101, 34, 10, 125]
        at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:288)
        at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:254)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:200)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
        ... 16 more

Also, the source table (employee) has an int column (employee id); the Hive shell shows it as NULL values, while Impala shows the correct values.

I went through some articles that suggested renaming the .avsc file and the .metadata directory, but that didn't help.

Can you please take a look?

Thanks,
Mani
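P.S. For reference, this is roughly what we tried based on those articles (paths taken from the exception above; adjust for your warehouse):

    # inspect the .metadata directory Sqoop created inside the table directory
    hdfs dfs -ls -R /user/hive/warehouse/dev_pavan_db.db/employee/.metadata
    # move it out of the table directory so Hive no longer scans the .avsc files
    hdfs dfs -mv /user/hive/warehouse/dev_pavan_db.db/employee/.metadata /tmp/employee.metadata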
