Ok, thanks.

On Thu, Jul 2, 2015 at 11:38 AM, Abraham Elmahrek <[email protected]> wrote:
I'd check with the Impala user group! But I think 1.2.4 is an older version. Upgrading might make your headaches go away in general.

-Abe

On Wed, Jul 1, 2015 at 11:04 PM, Manikandan R <[email protected]> wrote:

Ok, Abe. I will try that.

Also, for the past two days impalad has been crashing on one node in particular. Because of this, Oozie workflows are taking a huge amount of time to complete; some are not finishing even after 24 hours. We restart the daemon and it works fine for a while, then it crashes again. It doesn't seem very stable.

I've attached the error report file. Please check.

Thanks,
Mani

On Thu, Jul 2, 2015 at 11:11 AM, Abraham Elmahrek <[email protected]> wrote:

Could you try upgrading Impala?

-Abe

On Wed, Jul 1, 2015 at 10:27 PM, Manikandan R <[email protected]> wrote:

Hello Abe,

Can you please update on this? Also let me know if you need any more info.

Thanks,
Mani

On Tue, Jun 30, 2015 at 11:39 AM, Manikandan R <[email protected]> wrote:

Impala 1.2.4. We are using an Amazon EMR cluster.

Thanks,
Mani

On Sun, Jun 28, 2015 at 11:37 PM, Abraham Elmahrek <[email protected]> wrote:

Oh, that makes more sense. Seems like a format mismatch. You might have to upgrade Impala. Mind providing the version of Impala you're using?

-Abe

On Fri, Jun 26, 2015 at 12:52 AM, Manikandan R <[email protected]> wrote:

The actual errors are:

Query: select * from gwynniebee_bi.mi_test
ERROR: AnalysisException: Failed to load metadata for table: gwynniebee_bi.mi_test
CAUSED BY: TableLoadingException: Unrecognized table type for table: gwynniebee_bi.mi_test

On Fri, Jun 26, 2015 at 1:21 PM, Manikandan R <[email protected]> wrote:

It should be the same, as I have created many tables in Hive before and read them in Impala without any issues.

I am running Oozie-based workflows in the production environment that take data from MySQL to HDFS in raw format (via Sqoop Hive imports), then store the same data in Parquet format using the Impala shell; reports run on top of that using Impala queries. This has been working for a few weeks without any issues.

Now I am trying to see whether I can import the data from MySQL into Impala (Parquet) directly, to avoid the intermediate step.

On Fri, Jun 26, 2015 at 1:02 PM, Abraham Elmahrek <[email protected]> wrote:

Check your config. They should use the same metastore.

On Fri, Jun 26, 2015 at 12:26 AM, Manikandan R <[email protected]> wrote:

Yes, it works. I set HCAT_HOME to HIVE_HOME/hcatalog.

I am able to read the data from Hive, but not from the Impala shell. Any workaround?

Thanks,
Mani

On Thu, Jun 25, 2015 at 7:27 PM, Abraham Elmahrek <[email protected]> wrote:

Make sure HIVE_HOME and HCAT_HOME are set.

For the datetime/timestamp issue... this is because Parquet doesn't support timestamp types yet. Avro schemas support them as of 1.8.0, apparently: https://issues.apache.org/jira/browse/AVRO-739. Try casting to a numeric or string value first?

-Abe
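A minimal sketch of that cast-first idea at the Sqoop end: --map-column-java can force the datetime column to a Java String in the generated schema, so no timestamp type reaches Parquet. The assumption here is that create_date is the only datetime column; repeat the mapping for any others, and verify that the override is actually honored for --as-parquetfile in Sqoop 1.4.6.

    # Sketch only: force the MySQL datetime column to a string in the generated
    # schema so the Parquet file carries strings instead of timestamps.
    ./sqoop import \
      --connect jdbc:mysql://ups.db.gwynniebee.com/gwynniebee_bats \
      --username root --password gwynniebee \
      --table bats_active \
      --map-column-java create_date=String \
      --hive-import --hive-database gwynniebee_bi --hive-table test_pq_bats_active \
      --null-string '\\N' --null-non-string '\\N' \
      --as-parquetfile -m 1

If the override turns out to be ignored for Parquet imports, a free-form --query import that does CAST(create_date AS CHAR) on the MySQL side gets to the same place.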
On Thu, Jun 25, 2015 at 6:49 AM, Manikandan R <[email protected]> wrote:

Hello,

I am running

./sqoop import --connect jdbc:mysql://ups.db.gwynniebee.com/gwynniebee_bats --username root --password gwynniebee --table bats_active --hive-import --hive-database gwynniebee_bi --hive-table test_pq_bats_active --null-string '\\N' --null-non-string '\\N' --as-parquetfile -m1

and getting the exception below. I have come to know from various sources that $HIVE_HOME has to be set properly to avoid this kind of error. In my case the corresponding home directory exists, but it still throws the exception:

15/06/25 13:24:19 WARN spi.Registration: Not loading URI patterns in org.kitesdk.data.spi.hive.Loader
15/06/25 13:24:19 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive:/gwynniebee_bi/test_pq_bats_active. Check that JARs for hive datasets are on the classpath.
org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive:/gwynniebee_bi/test_pq_bats_active. Check that JARs for hive datasets are on the classpath.
    at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:109)
    at org.kitesdk.data.Datasets.create(Datasets.java:228)
    at org.kitesdk.data.Datasets.create(Datasets.java:307)
    at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:107)
    at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:89)
    at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:108)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
    at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
    at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
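On the "Check that JARs for hive datasets are on the classpath" part, this is usually where the HIVE_HOME/HCAT_HOME advice above applies. A rough sketch of the environment before invoking Sqoop; the paths are assumptions for a typical layout and will differ on an EMR cluster:

    # Sketch only -- adjust the paths to wherever Hive is installed on the cluster.
    export HIVE_HOME=/usr/lib/hive
    export HCAT_HOME=$HIVE_HOME/hcatalog
    # Kite resolves hive:/ dataset URIs only when the Hive/HCatalog jars are
    # visible, so expose them to the job's classpath as well.
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*:$HCAT_HOME/share/hcatalog/*
    ./sqoop import ...   # same import command as above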
So I tried an alternative: creating a Parquet file first, without any Hive-related options, and then creating a table in Impala that points at the same location. That worked, but it throws the issue below (I think it is because of the date-related columns):

ERROR: File hdfs://10.183.138.137:9000/data/gwynniebee_bi/test_pq_bats_active/a4a65639-ae38-417e-bbd9-56f4eb76c06b.parquet has an incompatible type with the table schema for column create_date. Expected type: BYTE_ARRAY. Actual type: INT64

Then I tried the table without the datetime columns, and it works fine in that case.

I am using Hive 0.13 and sqoop-1.4.6.bin__hadoop-2.0.4-alpha.

I would prefer the first approach for my requirements. Can anyone please help me in this regard?

Thanks,
Mani
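On the BYTE_ARRAY vs. INT64 error above: it usually means the Impala table declares create_date as a string while the Parquet file stores it as a 64-bit integer; Kite/Avro commonly writes datetimes as epoch milliseconds, though that is an assumption worth checking against a sample row. One workaround sketch (the *_raw table name and the id column are placeholders) is to declare the column as BIGINT so the table matches the file, and convert at query time:

    # Sketch only: placeholders as noted above; run from a node with impala-shell.
    impala-shell -q "CREATE EXTERNAL TABLE gwynniebee_bi.test_pq_bats_active_raw (
        id BIGINT,
        create_date BIGINT)
      STORED AS PARQUET
      LOCATION '/data/gwynniebee_bi/test_pq_bats_active'"

    # Convert the epoch value back to a readable timestamp when querying.
    impala-shell -q "SELECT id,
        from_unixtime(cast(create_date / 1000 AS BIGINT)) AS create_date
      FROM gwynniebee_bi.test_pq_bats_active_raw LIMIT 5"

If the stored values turn out to be epoch seconds rather than milliseconds, drop the / 1000.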
