You are right. Sqoop has had support for Avro export files from the beginning. Sorry, I overlooked that - it has been a while since I changed these for HCatalog support!
Generally we should be able to support export of all formats (there is of course a bug that still needs fixing for Parquet in Hive, and maybe for some other formats) in a storage-location- and format-agnostic way with HCatalog.

Venkat

On 3/2/15, 9:30 AM, "Gwen Shapira" <[email protected]> wrote:

>I'm pretty sure we support Avro exports, otherwise we wouldn't have
>JIRAs like this:
>
>https://issues.apache.org/jira/browse/SQOOP-1283
>
>On Sun, Mar 1, 2015 at 6:13 PM, Venkat Ranganathan
><[email protected]> wrote:
>> If you are exporting from an HDFS location, the default is text file
>> format. I don't think we support export in any format other than text
>> using the --export-dir option (on the import side you could use
>> --as-avrodatafile for Avro, like other formats).
>>
>> But you can use the --hcatalog-table option to export a Hive table
>> without worrying about the storage location or format.
>>
>> Venkat
>>
>> From: Mark Grover
>> Reply-To: "[email protected]"
>> Date: Sunday, March 1, 2015 at 4:30 PM
>> To: "[email protected]"
>> Subject: Re: How does sqoop export detect Avro schema?
>>
>> Forgot to mention, here's the error I am getting:
>> https://gist.github.com/markgrover/113196fecd1ec5bd0b38
>>
>> And, please include me on cc. I am not on the list. Thanks again!
>>
>> On Sun, Mar 1, 2015 at 4:29 PM, Mark Grover <[email protected]> wrote:
>>> Hi Sqoop folks,
>>> I am trying to better understand how sqoop export works.
>>>
>>> In the sqoop export command, we don't put any information about the
>>> metadata of the HDFS data being exported. So, how does Sqoop figure
>>> out the Avro schema of the data being exported?
>>>
>>> Does it use Kite's .metadata directory for this? If so, that'd mean you
>>> can't export data not populated by Kite. I don't think that's the case.
>>> Does it parse out the file header or look at file extensions? If so,
>>> that doesn't work: I just populated a Hive table which stores data in
>>> Avro, and its file extension is not .avro.
>>> Does it do something else that I am missing?
>>>
>>> I created a Hive Avro table using some new syntax supported in Hive
>>> 0.14+:
>>>
>>> CREATE EXTERNAL TABLE avg_movie_rating2(movie_id INT, rating DOUBLE)
>>> STORED AS AVRO
>>> LOCATION '/data/movielens/aggregated_ratings'
>>>
>>> And, I just haven't been able to get Sqoop to export that data.
>>> Here's the sqoop export command that I ran:
>>>
>>> sqoop export --connect \
>>> jdbc:mysql://mgrover-haa-2.vpc.cloudera.com:3306/movie_dwh \
>>> --username root --table avg_movie_rating --export-dir \
>>> /data/movielens/aggregated_ratings -m 16 \
>>> --update-key movie_id --update-mode allowinsert
>>>
>>> Any thoughts/insights would be much appreciated!
>>>
>>> Thanks!
>>> Mark
>>
>>
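[Editor's note] On Mark's question of how the schema can be detected at all: Avro object container files are self-describing. Per the Avro specification, the writer schema is stored in the file header's metadata map under the key "avro.schema", so a reader can recover it from the data file alone, with no Kite metadata directory and no reliance on file extensions. Below is a minimal sketch that decodes just the header of a container file to pull out that schema. It illustrates the file format only; it is not Sqoop's actual code.

```python
import json

def read_varlong(buf, pos):
    """Decode an Avro zigzag varint long starting at pos; return (value, new_pos)."""
    shift, acc = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        acc |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1), pos

def read_avro_schema(data: bytes) -> dict:
    """Extract the writer schema embedded in an Avro object container file header."""
    assert data[:4] == b"Obj\x01", "not an Avro object container file"
    pos = 4
    meta = {}
    while True:
        count, pos = read_varlong(data, pos)
        if count == 0:          # end of the metadata map
            break
        if count < 0:           # negative count: a block byte-size follows
            count = -count
            _, pos = read_varlong(data, pos)
        for _ in range(count):
            klen, pos = read_varlong(data, pos)
            key = data[pos:pos + klen].decode()
            pos += klen
            vlen, pos = read_varlong(data, pos)
            meta[key] = data[pos:pos + vlen]
            pos += vlen
    return json.loads(meta["avro.schema"])
```

This also explains Mark's observation: the extension does not matter, because any tool that understands the container format can read the schema straight out of the header.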

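[Editor's note] For anyone finding this thread later: the HCatalog route Venkat recommends would look roughly like the sketch below. With --hcatalog-table, Sqoop obtains the schema and storage format from the Hive metastore rather than from the files themselves, so --export-dir is not used. The connection string and table names are taken from Mark's example and are placeholders; adjust to your environment.

```shell
# Hypothetical sketch: export the Hive table via HCatalog, letting the
# metastore supply the schema and storage format (no --export-dir).
sqoop export \
  --connect jdbc:mysql://mgrover-haa-2.vpc.cloudera.com:3306/movie_dwh \
  --username root \
  --table avg_movie_rating \
  --hcatalog-table avg_movie_rating2 \
  -m 16
```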