Do you see the failure in the first job (sampling) or the second job? Do you
see the exception right after the job kicks off?
If the replicated side is too large, you will probably see a "Java heap
exception" rather than a job setup exception. This looks more like an
environment issue. Check whether you can run a regular join, and whether you
have another Hadoop config file in your classpath.
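As a rough sketch of that check, assuming the proy_sR and proy_cD aliases
from the script quoted below and a made-up output path
('queryResults/regular.join.check' is hypothetical), the same join without
the USING clause would look something like this:
{code}
-- Same inputs and keys as the failing script, but with Pig's default
-- (reduce-side) join instead of the fragment-replicate join.
-- Assumes proy_sR and proy_cD are defined exactly as in the quoted script.
join_check = JOIN proy_sR BY sr_cde_sk, proy_cD BY cd_dem_sk;
-- Hypothetical output path, used only for this test.
STORE join_check INTO 'queryResults/regular.join.check' USING PigStorage('|');
{/code}
If this also fails with ERROR 2017, the join type is probably not the cause.
Separately, as far as I know, in a 'replicated' join Pig loads the relations
listed after the first one into memory, so the order of the aliases in the
JOIN decides which side has to fit.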
Daniel
On 04/27/2011 05:26 PM, Renato Marroquín Mogrovejo wrote:
Now that the Apache server is ok with me again, I can write back to
the list. I wrote to the Apache Infra team and they told me to write
messages just in plain text, disabling any html within the message
(not that I ever sent html but oh well), I guess that worked :)
Well, first, thanks for answering. I am using Pig 0.7 and my Pig script
is as follows:
{code}
sr = LOAD 'pigData/sr.dat' using PigStorage('|') AS
(sr_ret_date_sk:int, sr_ret_tim_sk:int, sr_ite_sk:int, sr_cus_sk:int,
sr_cde_sk:int, sr_hde_sk:int, sr_add_sk:int, sr_sto_sk:int,
sr_rea_sk:int, sr_tic_num:int, sr_ret_qua:int, sr_ret_amt:double,
sr_ret_tax:double, sr_ret_amt_inc_tax:double, sr_fee:double,
sr_ret_sh_cst:double, sr_ref_csh:double, sr_rev_cha:double,
sr_sto_cred:double, sr_net_lss:double);
cd = LOAD 'pigData/cd.dat' using PigStorage('|') AS (cd_dem_sk:int,
cd_gnd:chararray, cd_mrt_sts:chararray, cd_edt_sts:chararray,
cd_pur_est:int, cd_cred_rtg:chararray, cd_dep_cnt:int,
cd_dep_emp_cnt:int, cd_dep_col_count:int);
proy_sR = FOREACH sr GENERATE sr_cde_sk;
proy_cD = FOREACH cd GENERATE cd_dem_sk;
join_sR_cD = JOIN proy_sR BY sr_cde_sk, proy_cD BY cd_dem_sk USING 'replicated';
STORE join_sR_cD INTO 'queryResults/query.11.sr.cd.5.1' using PigStorage('|');
{/code}
Here "cd" is the 77MB relation and "sr" is the 32MB relation. I had
some other similar queries in which the 32MB relation was joined with
smaller relations (<10MB), and they gave the same problem. I modified
those so that the relations <10MB would be the ones being replicated.
Thanks again.
Renato M.
2011/4/27 Thejas M Nair<te...@yahoo-inc.com>:
The exception indicates that the Hadoop job creation failed. Are you able to
run simple MR queries using each of the inputs?
It could also be caused by some problem Pig is having with copying the file
being replicated to the distributed cache.
-Thejas
On 4/27/11 3:42 PM, "Renato Marroquín Mogrovejo"
<renatoj.marroq...@gmail.com> wrote:
Does anybody have any suggestions? Please???
Thanks again.
Renato M.
2011/4/26 Alan Gates<ga...@yahoo-inc.com>
Sent for Renato, since Apache's mail system has decided it doesn't like
him.
Alan.
I am getting an error while trying to execute a simple fragment replicated
join on two files (one of 77MB and the other one of 32MB). I am using the
32MB file as the small one to be replicated, but I keep getting this
error.
Does anybody know how this count is done? I mean, how does Pig determine that
the small file is not small enough, and how could I modify this?
I am executing these on four PCs with 3GB of RAM running Debian Lenny.
Thanks in advance.
Renato M.
Pig Stack Trace
---------------
ERROR 2017: Internal error creating job configuration.
org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:332)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
    at org.apache.pig.PigServer.execute(PigServer.java:828)
    at org.apache.pig.PigServer.access$100(PigServer.java:105)
    at org.apache.pig.PigServer$Graph.execute(PigServer.java:1080)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:288)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:109)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:391)
Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:624)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:246)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.
--