[ 
https://issues.apache.org/jira/browse/PIG-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy reopened PIG-4591:
-------------------------------------

> Drop use of the internal Bzip2TextInputFormat
> ---------------------------------------------
>
>                 Key: PIG-4591
>                 URL: https://issues.apache.org/jira/browse/PIG-4591
>             Project: Pig
>          Issue Type: Wish
>          Components: data, tools
>    Affects Versions: 0.14.0
>         Environment: set pig.noSplitCombination to false and
> pig.maxCombinedSplitSize high enough that input files do get combined.
>            Reporter: Remi Catherinot
>            Priority: Minor
>              Labels: easyfix
>
> When loading multiple files that do not all share the same compressor (gz +
> bz2 + raw data files, for example), PigStorage picks its input format from
> the last file listed. If the last file ends with .bz2, PigStorage uses
> Bzip2TextInputFormat and fails; in any other case it uses TextInputFormat
> and succeeds in reading all types of files (including the bz2 one).
> A = LOAD 'file1.gz,file2.bz2' USING PigStorage(); <-- this will fail
> B = LOAD 'file2.bz2,file1.gz' USING PigStorage(); <-- this will succeed
> I think another person suggested on the dev mailing list to drop the use of
> Pig's internal Bzip2TextInputFormat because Hadoop now handles those cases
> (bz2 compression & co) better. I haven't pushed the patch yet because I don't
> have a fully compliant Pig test environment, so I can't be sure this won't
> introduce a regression with the minimum Hadoop version supported by
> Pig 0.14/0.15. I also need to know whether you agree to drop the internal
> Bzip2 stuff and rely on the Hadoop implementation.
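
The order dependence described above can be illustrated with a small sketch. This is not Pig's actual code: it assumes the input format is chosen by testing the suffix of the whole comma-separated location string, which would explain why only the last file matters. The class name InputFormatChoice and the method choose are hypothetical, and the returned strings merely name the formats mentioned in the issue.

```java
// Hypothetical model of the selection behaviour described in the issue.
// Assumption: the decision looks only at the end of the full location
// string, so with 'a,b' only the extension of b is ever inspected.
class InputFormatChoice {

    // Returns the name of the input format that would be selected.
    static String choose(String loadLocation) {
        if (loadLocation.endsWith(".bz2")) {
            return "Bzip2TextInputFormat";
        }
        return "TextInputFormat";
    }

    public static void main(String[] args) {
        // Same two files, different order, different format.
        System.out.println(choose("file1.gz,file2.bz2"));
        System.out.println(choose("file2.bz2,file1.gz"));
    }
}
```

Under that assumption, the two LOAD statements from the description select different formats even though they name the same files, which matches the reported failure and success.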



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
