Impala 1.2.4. We are using an Amazon EMR cluster.

Thanks,
Mani
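For reference, the running Impala version and the metadata recorded for the problem table can be confirmed directly on the cluster; a rough sketch, with the impalad host name as a placeholder:

  # Print the shell and server build versions (the "version" command is run
  # inside impala-shell; the impalad host name below is a placeholder).
  impala-shell -i impalad-host:21000
  version;

  # From Hive, inspect the table type, InputFormat, and SerDe that the import
  # recorded for the table Impala refuses to load.
  hive -e "DESCRIBE FORMATTED gwynniebee_bi.mi_test;"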
On Sun, Jun 28, 2015 at 11:37 PM, Abraham Elmahrek <[email protected]> wrote:

> Oh, that makes more sense. Seems like a format mismatch. You might have to
> upgrade Impala. Mind providing the version of Impala you're using?
>
> -Abe
>
> On Fri, Jun 26, 2015 at 12:52 AM, Manikandan R <[email protected]> wrote:
>
>> The actual errors are:
>>
>> Query: select * from gwynniebee_bi.mi_test
>> ERROR: AnalysisException: Failed to load metadata for table: gwynniebee_bi.mi_test
>> CAUSED BY: TableLoadingException: Unrecognized table type for table: gwynniebee_bi.mi_test
>>
>> On Fri, Jun 26, 2015 at 1:21 PM, Manikandan R <[email protected]> wrote:
>>
>>> It should be the same, since I have created many tables in Hive before and
>>> read them in Impala without any issues.
>>>
>>> I am running Oozie-based workflows in the production environment that take
>>> data from MySQL to HDFS (via Sqoop Hive imports) in raw format, then store
>>> the same data again in Parquet format using the Impala shell; on top of
>>> that, reports run as Impala queries. This has been working for a few weeks
>>> without any issues.
>>>
>>> Now I am trying to see whether I can import the data from MySQL to Impala
>>> (Parquet) directly, to avoid the intermediate step.
>>>
>>> On Fri, Jun 26, 2015 at 1:02 PM, Abraham Elmahrek <[email protected]> wrote:
>>>
>>>> Check your config. They should use the same metastore.
>>>>
>>>> On Fri, Jun 26, 2015 at 12:26 AM, Manikandan R <[email protected]> wrote:
>>>>
>>>>> Yes, it works. I set HCAT_HOME to HIVE_HOME/hcatalog.
>>>>>
>>>>> I can read the data from Hive, but not from the Impala shell. Any
>>>>> workaround?
>>>>>
>>>>> Thanks,
>>>>> Mani
>>>>>
>>>>> On Thu, Jun 25, 2015 at 7:27 PM, Abraham Elmahrek <[email protected]> wrote:
>>>>>
>>>>>> Make sure HIVE_HOME and HCAT_HOME are set.
>>>>>>
>>>>>> For the datetime/timestamp issue... this is because Parquet doesn't
>>>>>> support timestamp types yet. Avro schemas support them as of 1.8.0,
>>>>>> apparently: https://issues.apache.org/jira/browse/AVRO-739. Try
>>>>>> casting to a numeric or string value first?
>>>>>>
>>>>>> -Abe
>>>>>>
>>>>>> On Thu, Jun 25, 2015 at 6:49 AM, Manikandan R <[email protected]> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am running
>>>>>>>
>>>>>>> ./sqoop import --connect jdbc:mysql://ups.db.gwynniebee.com/gwynniebee_bats --username root --password gwynniebee --table bats_active --hive-import --hive-database gwynniebee_bi --hive-table test_pq_bats_active --null-string '\\N' --null-non-string '\\N' --as-parquetfile -m1
>>>>>>>
>>>>>>> and getting the exception below. I have learned from various sources that
>>>>>>> $HIVE_HOME has to be set properly to avoid this kind of error. In my case,
>>>>>>> the corresponding home directory exists, but it still throws the exception:
>>>>>>>
>>>>>>> 15/06/25 13:24:19 WARN spi.Registration: Not loading URI patterns in org.kitesdk.data.spi.hive.Loader
>>>>>>> 15/06/25 13:24:19 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive:/gwynniebee_bi/test_pq_bats_active. Check that JARs for hive datasets are on the classpath.
>>>>>>> org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive:/gwynniebee_bi/test_pq_bats_active. Check that JARs for hive datasets are on the classpath.
>>>>>>>     at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:109)
>>>>>>>     at org.kitesdk.data.Datasets.create(Datasets.java:228)
>>>>>>>     at org.kitesdk.data.Datasets.create(Datasets.java:307)
>>>>>>>     at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:107)
>>>>>>>     at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:89)
>>>>>>>     at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:108)
>>>>>>>     at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
>>>>>>>     at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
>>>>>>>     at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
>>>>>>>     at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
>>>>>>>     at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
>>>>>>>     at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
>>>>>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>>>>>     at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
>>>>>>>     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
>>>>>>>     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
>>>>>>>     at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
>>>>>>>
>>>>>>> So I tried an alternative solution: creating a Parquet file first without
>>>>>>> any Hive-related options, and then creating a table in Impala that points
>>>>>>> at the same location. That worked fine, but it throws the issue below
>>>>>>> (I think it is because of the date-related columns):
>>>>>>>
>>>>>>> ERROR: File hdfs://10.183.138.137:9000/data/gwynniebee_bi/test_pq_bats_active/a4a65639-ae38-417e-bbd9-56f4eb76c06b.parquet has an incompatible type with the table schema for column create_date. Expected type: BYTE_ARRAY. Actual type: INT64
>>>>>>>
>>>>>>> Then I tried a table without the datetime columns, and that works fine.
>>>>>>>
>>>>>>> I am using Hive 0.13 and sqoop-1.4.6.bin__hadoop-2.0.4-alpha.
>>>>>>>
>>>>>>> I would prefer the first approach for my requirements. Can anyone please
>>>>>>> help me in this regard?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Mani
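Following the casting suggestion earlier in the thread, one possible way around the BYTE_ARRAY/INT64 mismatch in the second approach is to force the datetime column to be imported as a string, so it lands in the Parquet file as BYTE_ARRAY and matches a STRING column in the Impala table. A rough sketch reusing the connection details above; --map-column-java is a standard Sqoop option, but this particular combination with --as-parquetfile is untested here:

  # Import straight to HDFS as Parquet, overriding the Java type of the
  # datetime column so it is written as a string rather than INT64.
  ./sqoop import \
    --connect jdbc:mysql://ups.db.gwynniebee.com/gwynniebee_bats \
    --username root --password gwynniebee \
    --table bats_active \
    --target-dir /data/gwynniebee_bi/test_pq_bats_active \
    --map-column-java create_date=String \
    --null-string '\\N' --null-non-string '\\N' \
    --as-parquetfile -m 1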

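Alternatively, the imported data can be left as-is: the "Actual type: INT64" in the error suggests the datetime is being written as an epoch-milliseconds long, so the Impala table can declare the column as BIGINT and convert at query time. A minimal sketch, assuming the HDFS location from the error above; the _raw table name and the impalad host are placeholders, and the remaining bats_active columns still need to be listed:

  # Create an external table over the Sqoop-written Parquet files, declaring
  # the datetime column as BIGINT (epoch milliseconds). List every remaining
  # bats_active column, in the same order as the Parquet schema.
  impala-shell -i impalad-host:21000 -q "
    CREATE EXTERNAL TABLE gwynniebee_bi.test_pq_bats_active_raw (
      create_date BIGINT
      -- remaining bats_active columns go here
    )
    STORED AS PARQUET
    LOCATION 'hdfs://10.183.138.137:9000/data/gwynniebee_bi/test_pq_bats_active';"

  # Convert the epoch-millisecond value to a readable timestamp at query time.
  impala-shell -i impalad-host:21000 -q "
    SELECT from_unixtime(CAST(create_date / 1000 AS BIGINT)) AS create_date
    FROM gwynniebee_bi.test_pq_bats_active_raw LIMIT 5;"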