Re: Exporting parquet, issues with schema

Jarek Jarcec Cecho Thu, 10 Dec 2015 01:11:41 -0800

Hi Brian,
Sqoop uses a library called Kite to work with parquet files. The library requires 
the .metadata directory, hence the dependency.


A workaround, if you need to export an arbitrary parquet directory, is to run 
the "kite create" command on the directory first. This will create the required 
.metadata directory, and you should then be able to proceed with the export.
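Before running the export, it may help to confirm that the directory really has the metadata Kite expects. Below is a minimal sketch in Python, not part of Sqoop or Kite. It assumes the dataset directory is reachable as a local path (for example a local copy or a mounted view of HDFS) and that the schema is stored as one or more .avsc JSON files somewhere under .metadata. The function names are hypothetical.

```python
import json
from pathlib import Path


def find_kite_schemas(dataset_dir):
    """Return the .avsc files found under <dataset_dir>/.metadata (may be empty)."""
    metadata_dir = Path(dataset_dir) / ".metadata"
    if not metadata_dir.is_dir():
        return []
    # Assumption: the schema files may sit in a subdirectory, so search recursively.
    return sorted(metadata_dir.rglob("*.avsc"))


def check_export_ready(dataset_dir):
    """Return (ok, message) saying whether the directory has readable schema metadata."""
    schemas = find_kite_schemas(dataset_dir)
    if not schemas:
        return False, "no .avsc schema under .metadata"
    for schema_path in schemas:
        try:
            # An Avro schema file is JSON, so it should at least parse as JSON.
            json.loads(schema_path.read_text(encoding="utf-8"))
        except ValueError as exc:
            return False, f"{schema_path.name} is not valid JSON: {exc}"
    return True, f"found {len(schemas)} schema file(s)"
```

This only checks that the files exist and parse as JSON. It does not verify that the schema matches the parquet data files.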

We’re choosing a different route in Sqoop 2, so this unfortunate requirement 
won’t be there.

Jarcec

> On Dec 9, 2015, at 9:07 PM, Brian Henriksen <[email protected]> 
> wrote:
> 
> I am trying to use sqoop to export some parquet data from HDFS to oracle.  
> The first problem I ran into is that parquet export requires a .metadata 
> directory that is created by a sqoop parquet IMPORT.  (Can anyone explain this 
> to me?  It seems odd that one can only send data to a database that was just 
> grabbed from a database.)  I got around this by converting a small subset of 
> my parquet data to text, exporting the text to oracle with sqoop export, and 
> then importing the data back to HDFS as parquet with sqoop import, which also 
> created the .metadata directory.  Here is the error I'm getting:
> 
> 
> java.lang.NullPointerException
> at java.io.StringReader.<init>(StringReader.java:50)
> at org.apache.avro.Schema$Parser.parse(Schema.java:917)
> at parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:54)
> at parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:142)
> at parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:118)
> at parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:107)
> at org.kitesdk.data.spi.AbstractKeyRecordReaderWrapper.initialize(AbstractKeyRecordReaderWrapper.java:50)
> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:478)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:671)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroup
> 
> It looks like sqoop gets as far as starting up the mappers, but they are 
> not aware of my parquet / avro schema.  Where does sqoop look for these 
> schemas?  As far as I know, parquet files include the schema within the 
> data files themselves.  In addition, there is the .metadata directory, 
> which contains a .avsc JSON file with the same schema.  Any ideas?
