[ https://issues.apache.org/jira/browse/PIG-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates updated PIG-1576:
----------------------------

    Fix Version/s: 0.9.0

> Difference in Semantics between Load statement in Pig and HDFS client on Command line
> -------------------------------------------------------------------------------------
>
>                 Key: PIG-1576
>                 URL: https://issues.apache.org/jira/browse/PIG-1576
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.6.0, 0.7.0
>            Reporter: Viraj Bhat
>             Fix For: 0.9.0
>
>
> Here is the directory structure on HDFS that I want to access using Pig. This is a sample; in the real use case I have more than 100 of these directories.
> {code}
> $ hadoop fs -ls /user/viraj/recursive/
> Found 3 items
> drwxr-xr-x   - viraj supergroup          0 2010-08-26 11:25 /user/viraj/recursive/20080615
> drwxr-xr-x   - viraj supergroup          0 2010-08-26 11:25 /user/viraj/recursive/20080616
> drwxr-xr-x   - viraj supergroup          0 2010-08-26 11:25 /user/viraj/recursive/20080617
> {code}
> From the command line I can access them in a variety of ways:
> {code}
> $ hadoop fs -ls /user/viraj/recursive/{200806}{15..17}/
> -rw-r--r--   1 viraj supergroup       5791 2010-08-26 11:25 /user/viraj/recursive/20080615/kv2.txt
> -rw-r--r--   1 viraj supergroup       5791 2010-08-26 11:25 /user/viraj/recursive/20080616/kv2.txt
> -rw-r--r--   1 viraj supergroup       5791 2010-08-26 11:25 /user/viraj/recursive/20080617/kv2.txt
> $ hadoop fs -ls /user/viraj/recursive/{20080615..20080617}/
> -rw-r--r--   1 viraj supergroup       5791 2010-08-26 11:25 /user/viraj/recursive/20080615/kv2.txt
> -rw-r--r--   1 viraj supergroup       5791 2010-08-26 11:25 /user/viraj/recursive/20080616/kv2.txt
> -rw-r--r--   1 viraj supergroup       5791 2010-08-26 11:25 /user/viraj/recursive/20080617/kv2.txt
> {code}
> I have written a Pig script, but neither of the following load statements works:
> {code}
> --A = load '/user/viraj/recursive/{200806}{15..17}/' using PigStorage('\u0001') as (k:int, v:chararray);
> A = load '/user/viraj/recursive/{20080615..20080617}/' using PigStorage('\u0001') as (k:int, v:chararray);
> AL = limit A 10;
> dump AL;
> {code}
> I get the following error in Pig 0.8:
> {noformat}
> 2010-08-27 16:34:27,704 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2010-08-27 16:34:27,711 [main] INFO  org.apache.pig.tools.pigstats.PigStats - Script Statistics:
> HadoopVersion   PigVersion      UserId  StartedAt               FinishedAt              Features
> 0.20.2          0.8.0-SNAPSHOT  viraj   2010-08-27 16:34:24     2010-08-27 16:34:27     LIMIT
> Failed!
> Failed Jobs:
> JobId   Alias   Feature Message Outputs
> N/A     A,AL            Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: /user/viraj/recursive/{20080615..20080617}/
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
>         at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>         at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
>         at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>         at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input Pattern hdfs://localhost:9000/user/viraj/recursive/{20080615..20080617} matches 0 files
>         at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
>         at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:268)
>         ... 7 more
>         hdfs://localhost:9000/tmp/temp241388470/tmp987803889,
> {noformat}
> The following works:
> {code}
> A = load '/user/viraj/recursive/{200806}{15,16,17}/' using PigStorage('\u0001') as (k:int, v:chararray);
> AL = limit A 10;
> dump AL;
> {code}
> Why is there an inconsistency between the HDFS client and Pig?
> Viraj
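
A likely explanation for the difference reported above: on the command line, bash performs brace expansion on `{15..17}` and `{20080615..20080617}` before `hadoop` is started, so the HDFS client only ever sees already-expanded paths. Pig passes the load string to Hadoop's glob matching unchanged, and that matching understands comma-separated alternatives such as `{15,16,17}` but, as far as I know, not `{A..B}` ranges. The sketch below shows the shell-side expansion and a small helper that builds the comma form for a numeric range. The helper name `build_glob` and the parameter name `dirs` are made up for illustration, and the `pig -param` invocation in the final comment is an untested suggestion.

```shell
#!/usr/bin/env bash
# In bash, {A..B} is expanded by the shell itself, so a command such as
# "hadoop fs -ls" receives three separate, already-expanded paths:
echo /user/viraj/recursive/{20080615..20080617}/

# build_glob (hypothetical helper): print "{FIRST,...,LAST}" for an
# inclusive integer range. The values are treated as plain integers,
# not calendar dates, so a range that crosses a month boundary will
# include numbers that are not valid dates.
build_glob() {
  i=$1
  list=$1
  while [ "$i" -lt "$2" ]; do
    i=$((i + 1))
    list="$list,$i"
  done
  printf '{%s}\n' "$list"
}

dirs=$(build_glob 20080615 20080617)
echo "$dirs"

# Possible use with Pig parameter substitution (assumes the script
# contains: A = load '/user/viraj/recursive/$dirs/' ...):
#   pig -param "dirs=$dirs" script.pig
```

The second `echo` prints the same comma-separated alternation used in the load statement that is reported to work, so the range no longer has to be written out by hand for 100+ directories.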