[jira] [Commented] (SQOOP-318) Add support for splittable lzo files with Hive

[email protected] (JIRA) Fri, 19 Aug 2011 16:49:56 -0700

    [ 
https://issues.apache.org/jira/browse/SQOOP-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088076#comment-13088076
 ]

[email protected] commented on SQOOP-318:
-----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1597/#review1563
-----------------------------------------------------------

Great patch, Joey! One high-level suggestion: add a mapping in the 
com.cloudera.sqoop.io.CodecMap implementation that aliases "lzop" to the 
codec "com.hadoop.compression.lzo.LzopCodec". If you do that, the tests you 
have added in HiveImport and TableDefWriter will likely have to be modified 
to accommodate the use of the alias.
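
As a rough illustration of the alias idea, a minimal sketch follows. It is 
not the actual CodecMap code: the class name CodecAliasSketch and the method 
resolve() are hypothetical, and only the "lzop" alias and the LzopCodec class 
name come from the comment above.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of an alias table; not the real
 * com.cloudera.sqoop.io.CodecMap implementation.
 */
final class CodecAliasSketch {

  /** Short alias to fully qualified codec class name. */
  private static final Map<String, String> ALIASES =
      new TreeMap<String, String>();

  static {
    // The only mapping taken from the review comment.
    ALIASES.put("lzop", "com.hadoop.compression.lzo.LzopCodec");
  }

  private CodecAliasSketch() { }

  /**
   * Returns the codec class name for a known alias. Any other
   * input is assumed to already be a class name and is returned
   * unchanged; null stays null.
   */
  static String resolve(String nameOrAlias) {
    if (nameOrAlias == null) {
      return null;
    }
    String className = ALIASES.get(nameOrAlias);
    return className != null ? className : nameOrAlias;
  }
}
```

With something like this in place, callers that compare against the full 
class name would resolve the user-supplied value first, which is why the 
tests may need to change.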

Also, it would be great to have a blurb about this in the user guide under 
src/docs/user.

Some minor checkstyle issues noted below.

src/java/com/cloudera/sqoop/hive/HiveImport.java
<https://reviews.apache.org/r/1597/#comment3536>

    Indent.

src/java/com/cloudera/sqoop/hive/HiveImport.java
<https://reviews.apache.org/r/1597/#comment3537>

    Line longer than 80.

src/java/com/cloudera/sqoop/hive/HiveImport.java
<https://reviews.apache.org/r/1597/#comment3538>

    Line longer than 80.

src/java/com/cloudera/sqoop/hive/TableDefWriter.java
<https://reviews.apache.org/r/1597/#comment3539>

    Lines longer than 80.

- Arvind

On 2011-08-19 18:49:06, Joey Echeverria wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1597/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-08-19 18:49:06)
bq.  
bq.  
bq.  Review request for Sqoop.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  I added a check when generating the create table string to see if the 
bq.  LzopCodec is in use. If it is, it outputs
bq.  
bq.  STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
bq.  OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
bq.  
bq.  at the end of the create table command, otherwise it outputs the standard
bq.  
bq.  STORED AS TEXTFILE
bq.  
bq.  I also added a call to the DistributedLzoIndexer before the data is 
bq.  imported into Hive.
bq.  
bq.  
bq.  This addresses bug SQOOP-318.
bq.      https://issues.apache.org/jira/browse/SQOOP-318
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/java/com/cloudera/sqoop/hive/HiveImport.java 36c17ba 
bq.    src/java/com/cloudera/sqoop/hive/TableDefWriter.java 7dd9135 
bq.    src/test/com/cloudera/sqoop/hive/TestTableDefWriter.java 43b755e 
bq.  
bq.  Diff: https://reviews.apache.org/r/1597/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  It includes a test for the create table syntax. I manually tested calling 
bq.  the indexer. I'm not sure how to automate that without making LZO required 
bq.  to build.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Joey
bq.  
bq.
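
The check described in the quoted summary could be sketched roughly as 
below. This is an assumption-laden illustration, not the code from the 
patch: the class StorageClauseSketch and the method storageClause() are 
hypothetical, and only the codec name and the two clause strings are taken 
from the summary.

```java
/**
 * Hypothetical sketch of choosing the storage clause for the
 * generated CREATE TABLE statement; not the real TableDefWriter.
 */
final class StorageClauseSketch {

  /** Codec class name that triggers the LZO input format. */
  static final String LZOP_CODEC =
      "com.hadoop.compression.lzo.LzopCodec";

  private StorageClauseSketch() { }

  /**
   * Returns the clause appended to the end of CREATE TABLE.
   * A null or non-LZOP codec falls back to plain text storage.
   */
  static String storageClause(String codecClassName) {
    if (LZOP_CODEC.equals(codecClassName)) {
      return "STORED AS INPUTFORMAT "
          + "\"com.hadoop.mapred.DeprecatedLzoTextInputFormat\" "
          + "OUTPUTFORMAT "
          + "\"org.apache.hadoop.hive.ql.io."
          + "HiveIgnoreKeyTextOutputFormat\"";
    }
    return "STORED AS TEXTFILE";
  }
}
```

Comparing with LZOP_CODEC.equals(...) rather than the other way around 
keeps the method safe when no codec is configured.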

> Add support for splittable lzo files with Hive
> ----------------------------------------------
>
>                 Key: SQOOP-318
>                 URL: https://issues.apache.org/jira/browse/SQOOP-318
>             Project: Sqoop
>          Issue Type: Improvement
>          Components: hive-integration
>    Affects Versions: 1.3.0
>            Reporter: Joey Echeverria
>            Assignee: Joey Echeverria
>            Priority: Minor
>         Attachments: SQOOP-318-1.patch
>
>
> When importing LZO-compressed files into Hive, it would be useful to create 
> the Hive table with the com.hadoop.mapred.DeprecatedLzoTextInputFormat. It 
> would also be nice to automatically run the DistributedLzoIndexer so that 
> the LZO files can be split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
