I've also found that Parquet as an output format doesn't work properly with hcatalog import, and it can't handle timestamp or decimal types without crashing. This was with the 1.4.5 and 1.4.6 clients on CDH 5.3.
Even when just importing as text fields to HDFS folders, ending up with files of a suitable size (Parquet files aren't splittable) that won't require rewriting to redistribute evenly is a guessing game. You need to know how much data you expect to pull out and how many mappers that implies, and you will also run into problems if the split-by column does not have an even distribution - unless you are using the OraOop connector, in which case splitting by a column is unnecessary and the distribution is pretty uniform.

We found the most hassle-free method for pulling from an Oracle DB was to do an hcatalog import as a sequence file, which correctly mapped the data types, and then do an insert or create-table-as-select from the imported table, converting to Parquet at that point instead. Impala is convenient for the second step if it's available, as it manages Parquet output file sizes without any effort, regardless of the input data or the requested output compression type. (Rough command sketches for both steps are below the quoted thread.)

Jish

On 28 May 2015 01:49, "Brett Medalen" <[email protected]> wrote:

> Not available until Sqoop 1.4.5 or 1.4.6
>
> On May 27, 2015, at 6:40 PM, Kumar Jayapal <[email protected]> wrote:
>
> Hi,
>
> Can I use the --as-parquetfile argument while importing data to Hive?
>
> I have checked the site
>
> https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_basic_usage
>
> but I don't see this option mentioned anywhere.
>
> Thanks
> Jay
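To make the sizing discussion above concrete, here is a minimal sketch of a plain text import. The JDBC URL, credentials, table, split column, and paths are all made up for illustration, and the mapper count is the value you end up guessing at from the expected data volume:

    # Plain text import to an HDFS directory. The output file count (and so
    # the file sizes) is driven by --num-mappers, which you have to estimate
    # up front to avoid lots of tiny files or a few huge ones.
    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username scott --password-file /user/etl/.pw \
      --table EXAMPLE.ORDERS \
      --as-textfile \
      --target-dir /data/staging/orders \
      --num-mappers 8 \
      --split-by ORDER_ID   # needs a fairly even distribution, or you get skew
    # With the OraOop connector against Oracle, --split-by is generally
    # unnecessary and the splits come out reasonably uniform.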
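And a sketch of the two-step flow that worked for us: an hcatalog import as a sequence file, then an Impala create-table-as-select to convert to Parquet. The database and table names (staging.orders_seq, warehouse.orders) are illustrative, and it assumes Impala can see the Hive metastore; adjust for your own environment:

    # Step 1: hcatalog import into a sequence file table; this path mapped
    # the Oracle types correctly for us, unlike the direct Parquet import.
    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username scott --password-file /user/etl/.pw \
      --table EXAMPLE.ORDERS \
      --hcatalog-database staging \
      --hcatalog-table orders_seq \
      --create-hcatalog-table \
      --hcatalog-storage-stanza 'stored as sequencefile'

    # Step 2: let Impala do the Parquet conversion; it manages the output
    # file sizes on its own regardless of the input layout.
    impala-shell -q "INVALIDATE METADATA staging.orders_seq"
    impala-shell -q "CREATE TABLE warehouse.orders STORED AS PARQUET AS SELECT * FROM staging.orders_seq"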
