Glad to hear it. Let us know how it goes.
Markus Kemper
Customer Operations Engineer
www.cloudera.com

On Tue, Oct 25, 2016 at 1:55 PM, Douglas Spadotto <[email protected]> wrote:

> Hi Markus,
>
> Thank you!
>
> I tried this (the --hcatalog options) myself a few minutes after I hit
> "send" on the e-mail. It worked fine: Sqoop was able to read the Parquet
> structure. Only my MR job crashed, and that was due to my unstable
> environment.
>
> It looks like I'm on the way to the solution.
>
> Cheers,
>
> Douglas
>
> On Tue, Oct 25, 2016 at 3:32 PM, Markus Kemper <[email protected]> wrote:
>
>> Hello Douglas,
>>
>> The only workaround that I am aware of is to use the Sqoop --hcatalog
>> options, for example:
>>
>> sqoop export --connect <jdbc_connection_string> --table <rdbms_table>
>> --hcatalog-database <hive_database> --hcatalog-table <hive_table>
>>
>> Markus Kemper
>> Customer Operations Engineer
>> www.cloudera.com
>>
>> On Tue, Oct 25, 2016 at 1:21 PM, Douglas Spadotto <[email protected]> wrote:
>>
>>> Hello everyone,
>>>
>>> In the past few months I have seen quite a few messages about Parquet
>>> support in Sqoop, all of them about importing. Some of those worked well.
>>>
>>> But for exporting, I am receiving the error below when trying to export
>>> a Hive table stored as Parquet to PostgreSQL:
>>>
>>> [cloudera@quickstart ~]$ sqoop export --connect
>>> jdbc:postgresql://localhost/postgres --table test1 --export-dir
>>> /user/hive/warehouse/teste1
>>> Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports
>>> will fail.
>>> Please set $ACCUMULO_HOME to the root of your Accumulo installation.
>>> 16/10/25 09:19:09 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
>>> 16/10/25 09:19:09 INFO manager.SqlManager: Using default fetchSize of 1000
>>> 16/10/25 09:19:09 INFO tool.CodeGenTool: Beginning code generation
>>> 16/10/25 09:19:10 INFO manager.SqlManager: Executing SQL statement:
>>> SELECT t.* FROM "test1" AS t LIMIT 1
>>> 16/10/25 09:19:10 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
>>> /usr/lib/hadoop-mapreduce
>>> Note: /tmp/sqoop-cloudera/compile/019c3435216213411e2de14c483af692/test1.java
>>> uses or overrides a deprecated API.
>>> Note: Recompile with -Xlint:deprecation for details.
>>> 16/10/25 09:19:11 INFO orm.CompilationManager: Writing jar file:
>>> /tmp/sqoop-cloudera/compile/019c3435216213411e2de14c483af692/test1.jar
>>> 16/10/25 09:19:11 INFO mapreduce.ExportJobBase: Beginning export of test1
>>> 16/10/25 09:19:12 INFO Configuration.deprecation: mapred.jar is
>>> deprecated. Instead, use mapreduce.job.jar
>>> 16/10/25 09:19:12 INFO Configuration.deprecation: mapred.map.max.attempts
>>> is deprecated. Instead, use mapreduce.map.maxattempts
>>> 16/10/25 09:19:13 INFO manager.SqlManager: Executing SQL statement:
>>> SELECT t.* FROM "test1" AS t LIMIT 1
>>> 16/10/25 09:19:13 ERROR sqoop.Sqoop: Got exception running Sqoop:
>>> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
>>> exist: hdfs://quickstart.cloudera:8020/user/hive/warehouse/teste1/.metadata
>>> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
>>> exist: hdfs://quickstart.cloudera:8020/user/hive/warehouse/teste1/.metadata
>>> at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.checkExists(FileSystemMetadataProvider.java:562)
>>> at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.find(FileSystemMetadataProvider.java:605)
>>> at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.load(FileSystemMetadataProvider.java:114)
>>> at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:197)
>>> at org.kitesdk.data.Datasets.load(Datasets.java:108)
>>> at org.kitesdk.data.Datasets.load(Datasets.java:140)
>>> at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:92)
>>> at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:139)
>>> at org.apache.sqoop.mapreduce.JdbcExportJob.configureInputFormat(JdbcExportJob.java:84)
>>> at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:432)
>>> at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
>>> at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
>>> at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
>>> at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>> at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
>>> at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
>>> at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
>>> at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
>>>
>>> I saw a recent JIRA opened about this,
>>> https://issues.apache.org/jira/browse/SQOOP-2907, and I am wondering if
>>> there is any workaround for it.
>>>
>>> Thanks in advance,
>>>
>>> Douglas
>
> --
> Visit: http://canseidesercowboy.wordpress.com/
> Follow: @dougspadotto or @excowboys
> -----
> Frodo: "I wish none of this had happened."
> Gandalf: "So do all who live to see such times, but that is not for them
> to decide. All we have to decide is what to do with the time that is given
> to us."
> -- Lord of the Rings: The Fellowship of the Ring (2001)
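
A note on why the original command fails: with --export-dir pointing at a
Parquet directory, Sqoop 1.4.6 reads the data through the Kite SDK, and Kite
only recognizes directories that carry a .metadata descriptor, which Kite
itself writes. Tables created by Hive with STORED AS PARQUET normally contain
only data files, so Kite throws the DatasetNotFoundException seen above.
Assuming the same quickstart paths as in the thread, a quick listing should
confirm that the descriptor is missing:

  # Hive-written Parquet tables usually hold only data files; the absence
  # of a .metadata directory here is exactly what the exception complains about.
  hdfs dfs -ls /user/hive/warehouse/teste1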
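
Spelled out against the names used in the thread, Markus's --hcatalog
workaround would look roughly like the sketch below. It assumes the Parquet
table teste1 lives in Hive's default database and that the PostgreSQL table
test1 already exists with a compatible schema; adjust both names to match
your environment.

  # Read the table through HCatalog, which understands the Parquet layout,
  # instead of reading raw files from HDFS with --export-dir.
  # Note: --export-dir cannot be combined with the --hcatalog options.
  sqoop export \
    --connect jdbc:postgresql://localhost/postgres \
    --table test1 \
    --hcatalog-database default \
    --hcatalog-table teste1

Because HCatalog supplies the table schema, Sqoop does not need to infer the
column layout from the files themselves, which is what trips up the Kite path.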
