Hey man, can you start a separate thread for this? I'd add details like:

- version
- command
- --verbose output

-Abe
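For reference, a minimal sketch of gathering those details; the import line is only a placeholder for whatever command actually failed:

    sqoop version            # prints the Sqoop and Hadoop build versions
    sqoop import --verbose --connect jdbc:sqlserver://... --table ...   # re-run the failing command with --verbose and attach the output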
On Fri, Jul 17, 2015 at 7:35 AM, Anupam sinha <[email protected]> wrote:

I'm facing a similar issue: Sqoop is not working on the access node, and I get the error "SQLServer test failed (1)".

Do I need to change any settings?

On Thu, Jul 2, 2015 at 11:59 AM, Manikandan R <[email protected]> wrote:

Ok, thanks.

On Thu, Jul 2, 2015 at 11:38 AM, Abraham Elmahrek <[email protected]> wrote:

I'd check with the Impala user group! But I think 1.2.4 is an older version; upgrading might make your headaches go away in general.

-Abe

On Wed, Jul 1, 2015 at 11:04 PM, Manikandan R <[email protected]> wrote:

Ok, Abe. I will try that.

Also, for the past two days Impalad has been crashing on one particular node. Because of this, Oozie workflows are taking a huge amount of time to complete; some don't finish even after 24 hours. We restart the daemon and it works fine for a while, then it crashes again. It doesn't seem very stable.

I've attached the error report file. Please check.

Thanks,
Mani

On Thu, Jul 2, 2015 at 11:11 AM, Abraham Elmahrek <[email protected]> wrote:

Could you try upgrading Impala?

-Abe

On Wed, Jul 1, 2015 at 10:27 PM, Manikandan R <[email protected]> wrote:

Hello Abe,

Can you please give an update on this? Also let me know if you need any more info.

Thanks,
Mani

On Tue, Jun 30, 2015 at 11:39 AM, Manikandan R <[email protected]> wrote:

Impala 1.2.4. We are using an Amazon EMR cluster.

Thanks,
Mani

On Sun, Jun 28, 2015 at 11:37 PM, Abraham Elmahrek <[email protected]> wrote:

Oh, that makes more sense. Seems like a format mismatch; you might have to upgrade Impala. Mind providing the version of Impala you're using?

-Abe

On Fri, Jun 26, 2015 at 12:52 AM, Manikandan R <[email protected]> wrote:

The actual errors are:

Query: select * from gwynniebee_bi.mi_test
ERROR: AnalysisException: Failed to load metadata for table: gwynniebee_bi.mi_test
CAUSED BY: TableLoadingException: Unrecognized table type for table: gwynniebee_bi.mi_test

On Fri, Jun 26, 2015 at 1:21 PM, Manikandan R <[email protected]> wrote:

It should be the same, as I have created many tables in Hive before and read them in Impala without any issues.

I am running Oozie-based workflows in our production environment that take data from MySQL to HDFS in raw format (via Sqoop Hive imports), then store the same data in Parquet format using the Impala shell; reports run on top of that as Impala queries. This has been working for a few weeks without any issues.

Now I am trying to see whether I can import the data from MySQL into Impala (Parquet) directly, to avoid the intermediate step.

On Fri, Jun 26, 2015 at 1:02 PM, Abraham Elmahrek <[email protected]> wrote:

Check your config. They should use the same metastore.
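One quick way to verify that, as a sketch; the hive-site.xml path below is typical for a packaged install and may differ on EMR:

    # Hive and Impala should resolve the same metastore URI
    grep -A 1 'hive.metastore.uris' /etc/hive/conf/hive-site.xml

    # after creating a table through Hive, force Impala to reload its catalog
    impala-shell -q 'INVALIDATE METADATA;'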
On Fri, Jun 26, 2015 at 12:26 AM, Manikandan R <[email protected]> wrote:

Yes, that works. I set HCAT_HOME to $HIVE_HOME/hcatalog.

I am able to read the data from Hive, but not from the Impala shell. Any workaround?

Thanks,
Mani

On Thu, Jun 25, 2015 at 7:27 PM, Abraham Elmahrek <[email protected]> wrote:

Make sure HIVE_HOME and HCAT_HOME are set.

For the datetime/timestamp issue... this is because Parquet doesn't support timestamp types yet. Avro schemas apparently support them as of 1.8.0: https://issues.apache.org/jira/browse/AVRO-739. Try casting to a numeric or string value first?

-Abe
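Putting those two suggestions together, something like the following might work; the export paths are guesses for a typical install, and --map-column-java is Sqoop's standard switch for overriding a column's Java type (create_date is the column named in the schema error further down):

    # point Sqoop at Hive/HCatalog -- adjust paths for your cluster
    export HIVE_HOME=/usr/lib/hive
    export HCAT_HOME=$HIVE_HOME/hcatalog

    # re-run the import with the datetime column forced to a Java String,
    # so the generated Parquet schema carries no timestamp type
    sqoop import --connect jdbc:mysql://ups.db.gwynniebee.com/gwynniebee_bats \
        --username root -P --table bats_active \
        --hive-import --hive-database gwynniebee_bi --hive-table test_pq_bats_active \
        --map-column-java create_date=String \
        --as-parquetfile -m 1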
On Thu, Jun 25, 2015 at 6:49 AM, Manikandan R <[email protected]> wrote:

Hello,

I am running

./sqoop import --connect jdbc:mysql://ups.db.gwynniebee.com/gwynniebee_bats --username root --password gwynniebee --table bats_active --hive-import --hive-database gwynniebee_bi --hive-table test_pq_bats_active --null-string '\\N' --null-non-string '\\N' --as-parquetfile -m1

and getting the exception below. I have learned from various sources that $HIVE_HOME has to be set properly to avoid this kind of error. In my case the corresponding home directory exists, but it still throws the exception.

15/06/25 13:24:19 WARN spi.Registration: Not loading URI patterns in org.kitesdk.data.spi.hive.Loader
15/06/25 13:24:19 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive:/gwynniebee_bi/test_pq_bats_active. Check that JARs for hive datasets are on the classpath.
org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive:/gwynniebee_bi/test_pq_bats_active. Check that JARs for hive datasets are on the classpath.
    at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:109)
    at org.kitesdk.data.Datasets.create(Datasets.java:228)
    at org.kitesdk.data.Datasets.create(Datasets.java:307)
    at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:107)
    at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:89)
    at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:108)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
    at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
    at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

So I tried an alternative solution: creating a Parquet file first, without any Hive-related options, and then creating a table in Impala that refers to the same location. That worked, but it throws the issue below (I think because of the date-related columns):

ERROR: File hdfs://10.183.138.137:9000/data/gwynniebee_bi/test_pq_bats_active/a4a65639-ae38-417e-bbd9-56f4eb76c06b.parquet has an incompatible type with the table schema for column create_date. Expected type: BYTE_ARRAY. Actual type: INT64

Then I tried a table without the datetime columns, and it works fine in that case.

I am using Hive 0.13 and sqoop-1.4.6.bin__hadoop-2.0.4-alpha.

I would prefer the first approach for my requirements. Can anyone please help me in this regard?

Thanks,
Mani
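A rough sketch of that two-step fallback, in case it is useful to others; the column list is illustrative, and the target directory matches the error path in this message:

    # step 1: import straight to HDFS as Parquet, with no Hive options
    sqoop import --connect jdbc:mysql://ups.db.gwynniebee.com/gwynniebee_bats \
        --username root -P --table bats_active \
        --as-parquetfile -m 1 \
        --target-dir /data/gwynniebee_bi/test_pq_bats_active

    # step 2: point an external Impala table at the imported files
    # (older Impala releases spell the format STORED AS PARQUETFILE)
    impala-shell -q "
    CREATE EXTERNAL TABLE gwynniebee_bi.test_pq_bats_active (
      id BIGINT,           -- illustrative columns; match your MySQL schema
      create_date STRING   -- only works if the column was imported as a string (see the --map-column-java sketch above)
    )
    STORED AS PARQUET
    LOCATION '/data/gwynniebee_bi/test_pq_bats_active';"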
