Hi fellow data crunchers,

I am running a JobFlow with one step that runs "org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob", followed by a step that runs "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob". The first step works without problems, but the second one throws an exception:

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory temp/itemIDIndex already exists and is not empty
        at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:124)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:818)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
        at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:165)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:328)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

It looks like the second job is using the same temporary output directories as the first job. How can I avoid this? Or even better: if some of the intermediate results were already computed in the first step, how can I reuse them so that they don't have to be recomputed in the second step?

Best regards,
Thomas

PS: This is the actual JobFlow definition in JSON:

[
   [......],
  {
    "Name": "MR Step 2: Find similiar items",
    "HadoopJarStep": {
      "MainClass": "org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob",
      "Jar": "s3n://recommendertest/mahout-core/mahout-core-0.4-job.jar",
      "Args": [
         "--input", "s3n://recommendertest/data/<jobid>/aggregateWatched/",
         "--output", "s3n://recommendertest/data/<jobid>/similiarItems/",
         "--similarityClassname",    "SIMILARITY_PEARSON_CORRELATION",
         "--maxSimilaritiesPerItem",    "100"
      ]
    }
  },
  {
    "Name": "MR Step 3: Find items for user",
    "HadoopJarStep": {
      "MainClass": "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob",
      "Jar": "s3n://recommendertest/mahout-core/mahout-core-0.4-job.jar",
      "Args": [
         "--input", "s3n://recommendertest/data/<jobid>/aggregateWatched/",
         "--output", "s3n://recommendertest/data/<jobid>/userRecommendations/",
         "--similarityClassname",    "SIMILARITY_PEARSON_CORRELATION",
         "--numRecommendations",    "100"
      ]
    }
  }
]
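
A possible workaround, as an untested sketch only: give the second step its own temporary directory so it no longer collides with temp/itemIDIndex from the first step. This assumes that RecommenderJob in Mahout 0.4 accepts a "--tempDir" argument, which I have not verified. The directory name "temp/recommenderJob" is just a placeholder I made up. Step 3 would then look like this:

```json
{
  "Name": "MR Step 3: Find items for user",
  "HadoopJarStep": {
    "MainClass": "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob",
    "Jar": "s3n://recommendertest/mahout-core/mahout-core-0.4-job.jar",
    "Args": [
      "--input", "s3n://recommendertest/data/<jobid>/aggregateWatched/",
      "--output", "s3n://recommendertest/data/<jobid>/userRecommendations/",
      "--similarityClassname", "SIMILARITY_PEARSON_CORRELATION",
      "--numRecommendations", "100",
      "--tempDir", "temp/recommenderJob"
    ]
  }
}
```

Even if this works, it would only avoid the directory clash. It would not reuse anything the first step has already computed.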
