Hi fellow data crunchers,
I am running a JobFlow with one step using
"org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob"
followed by a step using
"org.apache.mahout.cf.taste.hadoop.item.RecommenderJob". The first step
works without problems, but the second one throws an exception:
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory temp/itemIDIndex already exists and is not empty
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:124)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:818)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:165)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:328)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
It looks like the second job is using the same temporary output
directories as the first job. How can I avoid this? Or, even better: if
some of the tasks were already done in the first step and their output
is still cached, how could I reuse it so that it doesn't have to be
recomputed in the second step?
Best regards,
Thomas
PS: This is the actual JobFlow definition in JSON:
[
  [......],
  {
    "Name": "MR Step 2: Find similiar items",
    "HadoopJarStep": {
      "MainClass": "org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob",
      "Jar": "s3n://recommendertest/mahout-core/mahout-core-0.4-job.jar",
      "Args": [
        "--input", "s3n://recommendertest/data/<jobid>/aggregateWatched/",
        "--output", "s3n://recommendertest/data/<jobid>/similiarItems/",
        "--similarityClassname", "SIMILARITY_PEARSON_CORRELATION",
        "--maxSimilaritiesPerItem", "100"
      ]
    }
  },
  {
    "Name": "MR Step 3: Find items for user",
    "HadoopJarStep": {
      "MainClass": "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob",
      "Jar": "s3n://recommendertest/mahout-core/mahout-core-0.4-job.jar",
      "Args": [
        "--input", "s3n://recommendertest/data/<jobid>/aggregateWatched/",
        "--output", "s3n://recommendertest/data/<jobid>/userRecommendations/",
        "--similarityClassname", "SIMILARITY_PEARSON_CORRELATION",
        "--numRecommendations", "100"
      ]
    }
  }
]