Re: Performance Issue using item-based approach!

Najum Ali Thu, 17 Apr 2014 07:44:38 -0700

Ted,

Is it also possible to use ItemSimilarityJob in a non-distributed environment?


On 17.04.2014 at 16:22, Ted Dunning <ted.dunn...@gmail.com> wrote:

> Najum,
> 
> You should also be able to use the ItemSimilarityJob to compute a limited
> indicator set.
> 
> This is stepping off of the path you have been on, but it would allow you
> to deploy the recommender via a search engine.
> 
> That makes a lot of code simply vanish. This is also a well-trodden
> production path.
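Ted's search-engine deployment can be pictured without any search engine at all. The sketch below is a hypothetical in-memory stand-in, not Mahout or search-engine code: it assumes each item already has a limited indicator set (for example produced by ItemSimilarityJob), treats each item as a "document" whose indicator field lists those items, and uses the user's history as the query. Items are ranked by how many indicators match; a real engine would typically also weight the matches.

```java
import java.util.*;

/**
 * Hypothetical in-memory stand-in for a search engine: each item is a
 * "document" whose indicator field lists the items it co-occurs with,
 * and the user's history is the query.
 */
final class IndicatorSearchSketch {

    static List<Long> recommend(Map<Long, Set<Long>> indicatorsByItem,
                                Set<Long> userHistory, int howMany) {
        Map<Long, Integer> scores = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> doc : indicatorsByItem.entrySet()) {
            if (userHistory.contains(doc.getKey())) {
                continue; // skip items the user already interacted with
            }
            int hits = 0;
            for (Long indicator : doc.getValue()) {
                if (userHistory.contains(indicator)) {
                    hits++; // one query term matched this document
                }
            }
            if (hits > 0) {
                scores.put(doc.getKey(), hits);
            }
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        // Most matches first; ties broken by item ID so the order is deterministic
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : Long.compare(a, b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }
}
```

The point of the design is that, once indicators are precomputed, serving a recommendation is just a query; no similarity computation happens at request time.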
> 
> On Thu, Apr 17, 2014 at 3:57 AM, Najum Ali <naju...@googlemail.com> wrote:
> 
>> @Sebastian
>> 
>> Wow, you are right. The original CSV file is about 21 MB and the
>> corresponding precomputed item-item similarity file is about 260 MB!
>> And yes, there are far more than 50 "most similar items" for an item.
>> 
>> Restricting this to the 50 (or so) most similar items per item could
>> do the trick, as you said.
>> OK, I will give it a try and reply later.
>> 
>> By the way, what about the SamplingCandidateItemsStrategy or something
>> like that, using this constructor:
>> GenericItemBasedRecommender(DataModel dataModel, ItemSimilarity similarity,
>>     CandidateItemsStrategy candidateItemsStrategy,
>>     MostSimilarItemsCandidateItemsStrategy mostSimilarItemsCandidateItemsStrategy)
>> <https://builds.apache.org/job/mahout-quality/javadoc/org/apache/mahout/cf/taste/impl/recommender/GenericItemBasedRecommender.html#GenericItemBasedRecommender(org.apache.mahout.cf.taste.model.DataModel,%20org.apache.mahout.cf.taste.similarity.ItemSimilarity,%20org.apache.mahout.cf.taste.recommender.CandidateItemsStrategy,%20org.apache.mahout.cf.taste.recommender.MostSimilarItemsCandidateItemsStrategy)>
>> 
>> 
>> On 17.04.2014 at 12:41, Sebastian Schelter <s...@apache.org> wrote:
>> 
>> Hi Najum,
>> 
>> I think I found the problem. Remember: Two items are similar whenever at
>> least one user interacted with both of them ("the items co-occur").
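To see why nearly every pair ends up with a similarity under this definition, one can count the item pairs that share at least one user. A minimal sketch, assuming preferences are already grouped as user ID to set of item IDs (the class and method names are hypothetical, and the pair set is only suitable for toy-sized data). For scale: with 3706 items there are at most 3706 × 3705 / 2 = 6,865,365 unordered pairs, and each one gets a similarity as soon as a single user rated both items.

```java
import java.util.*;

/**
 * Hypothetical helper: counts distinct item pairs that "co-occur",
 * i.e. share at least one user.
 */
final class CoOccurrence {

    static long countCoOccurringPairs(Map<Long, Set<Long>> itemsByUser) {
        Set<List<Long>> pairs = new HashSet<>();
        for (Set<Long> items : itemsByUser.values()) {
            List<Long> sorted = new ArrayList<>(items);
            Collections.sort(sorted); // store each unordered pair once, smaller ID first
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    pairs.add(Arrays.asList(sorted.get(i), sorted.get(j)));
                }
            }
        }
        return pairs.size();
    }

    /** Upper bound on unordered pairs among numItems items: n(n-1)/2. */
    static long maxPairs(long numItems) {
        return numItems * (numItems - 1) / 2;
    }
}
```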
>> 
>> In the MovieLens dataset this is, unfortunately, true for almost all
>> pairs of items: from 3706 items, more than 11 million similarities are
>> created. A common remedy (unfortunately not yet implemented in our
>> precomputation) is to retain only the top-k similar items per
>> item.
>> 
>> A solution would be to take the CSV file that is created by the
>> MultithreadedBatchItemSimilarities and post-process it so that only the 50
>> most similar items per item are retained. That should help with your
>> problem.
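The post-processing step described here could look roughly like the following. It is a sketch, not Mahout code: it assumes each line of the precomputed file has the form itemID,otherItemID,similarity (the same format Najum's mapToItemItemSimilarity parser reads) and that each item's neighbours appear with that item in the first column. The class TopKSimilarities and its members are hypothetical. A bounded min-heap per item keeps at most k entries per item, so the weakest neighbour can be evicted in O(log k).

```java
import java.util.*;

/**
 * Hypothetical post-processing helper: keeps only the k most similar
 * neighbours per item from "itemID,otherItemID,similarity" lines.
 */
final class TopKSimilarities {

    static final class Pair {
        final long itemA;
        final long itemB;
        final double similarity;

        Pair(long itemA, long itemB, double similarity) {
            this.itemA = itemA;
            this.itemB = itemB;
            this.similarity = similarity;
        }
    }

    // Orders pairs from weakest to strongest similarity
    private static final Comparator<Pair> WEAKEST_FIRST =
            Comparator.comparingDouble(p -> p.similarity);

    static List<Pair> retainTopK(Iterable<String> lines, int k) {
        Map<Long, PriorityQueue<Pair>> heaps = new HashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] row = line.split(",");
            Pair pair = new Pair(Long.parseLong(row[0].trim()),
                    Long.parseLong(row[1].trim()), Double.parseDouble(row[2].trim()));
            // Min-heap per item: its head is the weakest neighbour retained so far
            PriorityQueue<Pair> heap =
                    heaps.computeIfAbsent(pair.itemA, id -> new PriorityQueue<>(WEAKEST_FIRST));
            heap.offer(pair);
            if (heap.size() > k) {
                heap.poll(); // evict the weakest once more than k are held
            }
        }
        List<Pair> retained = new ArrayList<>();
        for (PriorityQueue<Pair> heap : heaps.values()) {
            retained.addAll(heap);
        }
        return retained;
    }
}
```

For a file of around 260 MB, passing a streaming Iterable (for example `Iterable<String> lines = bufferedReader.lines()::iterator;`) avoids holding every line in memory at once; the retained pairs could then be turned into GenericItemSimilarity.ItemItemSimilarity objects as before.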
>> 
>> Unfortunately, we don't have code for that yet, maybe you want to try to
>> write that yourself?
>> 
>> Best,
>> Sebastian
>> 
>> PS: The user-based recommender restricts the number of similar users; I
>> guess that's why it is so fast here.
>> 
>> 
>> On 04/17/2014 12:18 PM, Najum Ali wrote:
>> 
>> Ok, here you go:
>> 
>> I have created a simple class with a main method (no server or other infrastructure):
>> 
>> import java.io.BufferedReader;
>> import java.io.File;
>> import java.io.FileReader;
>> import java.io.IOException;
>> import java.util.Collection;
>> import java.util.List;
>> import java.util.function.Function;
>> import java.util.stream.Collectors;
>>
>> import org.apache.mahout.cf.taste.common.TasteException;
>> import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
>> import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
>> import org.apache.mahout.cf.taste.impl.similarity.GenericItemSimilarity;
>> import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
>> import org.apache.mahout.cf.taste.impl.similarity.precompute.FileSimilarItemsWriter;
>> import org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities;
>> import org.apache.mahout.cf.taste.model.DataModel;
>> import org.apache.mahout.cf.taste.recommender.ItemBasedRecommender;
>> import org.apache.mahout.cf.taste.recommender.RecommendedItem;
>> import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
>> import org.apache.mahout.cf.taste.similarity.precompute.BatchItemSimilarities;
>>
>> public class RecommenderTest {
>>
>>     public static void main(String[] args) throws IOException, TasteException {
>>         DataModel dataModel = new FileDataModel(new File(
>>                 "/Users/najum/Documents/recommender-console/src/main/webapp/resources/preference_csv/1mil.csv"));
>>         ItemSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
>>         ItemBasedRecommender recommender = new GenericItemBasedRecommender(dataModel, similarity);
>>
>>         // simItemsPerItem = number of items, i.e. no effective per-item limit
>>         String pathToPreComputedFile = preComputeSimilarities(recommender, dataModel.getNumItems());
>>
>>         // Parse the "itemID,otherItemID,similarity" lines; the reader is closed automatically
>>         Collection<GenericItemSimilarity.ItemItemSimilarity> correlations;
>>         try (BufferedReader bufferedReader = new BufferedReader(new FileReader(pathToPreComputedFile))) {
>>             correlations = bufferedReader.lines().map(mapToItemItemSimilarity).collect(Collectors.toList());
>>         }
>>         ItemSimilarity precomputedSimilarity = new GenericItemSimilarity(correlations);
>>         ItemBasedRecommender recommenderWithPrecomputation =
>>                 new GenericItemBasedRecommender(dataModel, precomputedSimilarity);
>>
>>         recommend(recommender);
>>         recommend(recommenderWithPrecomputation);
>>     }
>>
>>     private static String preComputeSimilarities(ItemBasedRecommender recommender,
>>                                                  int simItemsPerItem) throws TasteException {
>>         String pathToAbsolutePath = "";
>>         try {
>>             File resultFile = new File(System.getProperty("java.io.tmpdir"), "similarities.csv");
>>             if (resultFile.exists()) {
>>                 resultFile.delete();
>>             }
>>             BatchItemSimilarities batchJob =
>>                     new MultithreadedBatchItemSimilarities(recommender, simItemsPerItem);
>>             // Arguments: degree of parallelism, max duration in hours, output writer
>>             int numSimilarities = batchJob.computeItemSimilarities(
>>                     Runtime.getRuntime().availableProcessors(), 1, new FileSimilarItemsWriter(resultFile));
>>             pathToAbsolutePath = resultFile.getAbsolutePath();
>>             System.out.println("Computed " + numSimilarities + " similarities and saved them to "
>>                     + pathToAbsolutePath);
>>         } catch (IOException e) {
>>             System.out.println("Error while writing pre computed similarities to file");
>>         }
>>         return pathToAbsolutePath;
>>     }
>>
>>     private static void recommend(ItemBasedRecommender recommender) throws TasteException {
>>         long start = System.nanoTime();
>>         List<RecommendedItem> recommendations = recommender.recommend(1, 10); // user 1, top 10
>>         long end = System.nanoTime();
>>         System.out.println("Created recommendations in "
>>                 + getCalculationTimeInMilliseconds(start, end) + " ms. Recommendations:" + recommendations);
>>     }
>>
>>     private static double getCalculationTimeInMilliseconds(long start, long end) {
>>         double calculationTime = (end - start);
>>         return (calculationTime / 1_000_000);
>>     }
>>
>>     // Parses one "itemID,otherItemID,similarity" line of the precomputed file
>>     private static Function<String, GenericItemSimilarity.ItemItemSimilarity>
>>             mapToItemItemSimilarity = (line) -> {
>>         String[] row = line.split(",");
>>         return new GenericItemSimilarity.ItemItemSimilarity(
>>                 Long.parseLong(row[0]), Long.parseLong(row[1]), Double.parseDouble(row[2]));
>>     };
>> }
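The test program above scores item pairs with LogLikelihoodSimilarity. A minimal sketch of that score, assuming it follows the log-likelihood ratio over a 2×2 co-occurrence table as in Mahout's LogLikelihood.logLikelihoodRatio, and that the ratio is mapped to a similarity as 1 - 1/(1 + LLR); the class name LogLikelihoodSketch is hypothetical. Here k11 counts users who interacted with both items, k12 and k21 users who interacted with only one of them, and k22 users who interacted with neither.

```java
/** Hypothetical sketch of the log-likelihood ratio (G-test) on a 2x2 table. */
final class LogLikelihoodSketch {

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalised entropy: xLogX(sum) minus the sum of xLogX over the elements
    private static double entropy(long... elements) {
        long sum = 0;
        double result = 0.0;
        for (long element : elements) {
            result += xLogX(element);
            sum += element;
        }
        return xLogX(sum) - result;
    }

    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        if (rowEntropy + columnEntropy < matrixEntropy) {
            return 0.0; // only reachable through floating-point round-off
        }
        return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
    }

    // Maps an LLR in [0, inf) to a similarity in [0, 1)
    static double similarity(long k11, long k12, long k21, long k22) {
        return 1.0 - 1.0 / (1.0 + logLikelihoodRatio(k11, k12, k21, k22));
    }
}
```

Note that any pair with at least one shared user gets some similarity value, which is consistent with how large the precomputed file becomes when nothing limits the neighbours per item.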
>> 
>> And that's the output log:
>> 
>> 3 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Creating FileDataModel for file /Users/najum/Documents/recommender-console/src/main/webapp/resources/preference_csv/1mil.csv
>> 63 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Reading file info...
>> 1207 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Processed 1000000 lines
>> 1208 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Read lines: 1000209
>> 1475 [main] INFO org.apache.mahout.cf.taste.impl.model.GenericDataModel - Processed 6040 users
>> 1599 [main] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - Queued 3706 items in 38 batches
>> 10928 [pool-1-thread-8] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 7 processed 5 batches
>> 10928 [pool-1-thread-8] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 7 processed 5 batches. done.
>> 10978 [pool-1-thread-5] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 4 processed 4 batches. done.
>> 11589 [pool-1-thread-4] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 3 processed 5 batches
>> 11589 [pool-1-thread-4] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 3 processed 5 batches. done.
>> 11592 [pool-1-thread-6] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 5 processed 5 batches
>> 11592 [pool-1-thread-6] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 5 processed 5 batches. done.
>> 11707 [pool-1-thread-7] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 6 processed 5 batches
>> 11707 [pool-1-thread-7] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 6 processed 5 batches. done.
>> 11730 [pool-1-thread-3] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 2 processed 4 batches. done.
>> 11849 [pool-1-thread-1] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 0 processed 5 batches
>> 11849 [pool-1-thread-1] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 0 processed 5 batches. done.
>> 11854 [pool-1-thread-2] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 1 processed 5 batches
>> 11854 [pool-1-thread-2] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 1 processed 5 batches. done.
>> Computed 9174333 similarities and saved them to /var/folders/9g/4h38v1tj3ps9j21skc72b56r0000gn/T/similarities.csv
>> Created recommendations in *1683.613 ms*. Recommendations:[RecommendedItem[item:3890, value:4.6771617], RecommendedItem[item:3530, value:4.662509], RecommendedItem[item:127, value:4.660716], RecommendedItem[item:3323, value:4.660716], RecommendedItem[item:3382, value:4.660716], RecommendedItem[item:3123, value:4.603366], RecommendedItem[item:3233, value:4.5707765], RecommendedItem[item:1434, value:4.553473], RecommendedItem[item:989, value:4.5263577], RecommendedItem[item:2343, value:4.524066]]
>> Created recommendations in *985.679 ms*. Recommendations:[RecommendedItem[item:3530, value:5.0], RecommendedItem[item:3382, value:5.0], RecommendedItem[item:3890, value:4.6771617], RecommendedItem[item:127, value:4.660716], RecommendedItem[item:3323, value:4.660716], RecommendedItem[item:3123, value:4.603366], RecommendedItem[item:3233, value:4.5707765], RecommendedItem[item:1434, value:4.553473], RecommendedItem[item:989, value:4.5263577], RecommendedItem[item:2343, value:4.524066]]
>> 
>> Again, almost the same results. What I also don't understand is why
>> I am getting different RecommendedItems.
>> That really frustrates me…
>> 
>> You can find the Java file in the attachment.
>> 
>> 
>> 
>> Greetings from Germany,
>> Najum
>> 
>> On 17.04.2014 at 11:44, Sebastian Schelter <s...@apache.org> wrote:
>> 
>> Yes, just to make sure the problem is in the mahout code and not in the
>> surrounding environment.
>> 
>> On 04/17/2014 11:43 AM, Najum Ali wrote:
>> 
>> @Sebastian
>> What do you mean by a standalone recommender? A simple offline Java main
>> program?
>> 
>> On 17.04.2014 at 11:41, Sebastian Schelter <s...@apache.org> wrote:
>> 
>> Could you take the output of the precomputation, feed it into a standalone
>> recommender and test it there?
>> 
>> 
>> On 04/17/2014 11:37 AM, Najum Ali wrote:
>> 
>> @sebastian
>> 
>> Are you sure that the precomputation is done only once and not in every
>> request?
>> 
>> Yes, in Spring a @Bean-annotated object is a singleton by default.
>> I also just tested it out using a System.out.println()
>> Here is my log:
>> 
>> System.out.println("----> precomputation done!") is called before returning
>> the GenericItemSimilarity.
>> 
>> The first two recommendations are item-based, using Pearson similarity.
>> The third and fourth logs are also item-based, using the precomputed similarities.
>> The last log is the user-based recommender, using Pearson similarity.
>> 
>> Look at the huge time difference!
>> 
>> On 17.04.2014 at 11:23, Sebastian Schelter <s...@apache.org> wrote:
>> 
>> Najum,
>> 
>> this is really strange: feeding an item-based recommender with precomputed
>> similarities should give you very fast recommendations.
>> 
>> Are you sure that the precomputation is done only once and not in every
>> request?
>> 
>> --sebastian
>> 
>> On 04/17/2014 11:17 AM, Najum Ali wrote:
>> 
>> Hi guys,
>> 
>> I have created a precomputed item-item-similarity collection for a
>> GenericItemBasedRecommender.
>> Using the 1M MovieLens data, my item-based recommender is only 40-50%
>> faster than without precomputation (589.5 ms instead of 1222.9 ms).
>> The user-based recommender, by contrast, is really fast: about 24.2 ms.
>> How can this happen?
>> 
>> Here are more details to my Implementation:
>> 
>> CSV file: 1M preferences, 6040 users, 3706 items
>> 
>> I'm showing my implementation as screenshots because they have better
>> syntax highlighting.
>> My recommender runs inside a web server (Jetty) using Spring 4 and Java 8.
>> Recommendations are returned by a web service as JSON.
>> 
>> For the DataModel, I'm using FileDataModel.
>> 
>> 
>> The code below creates a precomputed ItemSimilarity when I start the
>> web server and the property isItemPreComputationEnabled is set to true:
>> 
>> 
>> For timing I'm using AOP: I measure the whole time from entering my
>> controller to sending the response, as the difference between two
>> System.nanoTime() calls. The user-based recommender is timed the same way.
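A single nanoTime difference can be noisy, especially for the first calls in a freshly started JVM, where class loading and JIT compilation are still happening. A minimal sketch of a more repeatable measurement, assuming it is acceptable to call the measured code several times; the Timing class, the warm-up count, and the use of the median are illustrative choices, not what was used in this thread.

```java
import java.util.*;

/** Hypothetical helper: warm up, then report the median of several timed runs. */
final class Timing {

    static double medianMillis(Runnable task, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            task.run(); // untimed runs give class loading and JIT a chance to settle
        }
        double[] samplesMs = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samplesMs[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        Arrays.sort(samplesMs);
        int mid = runs / 2;
        // Median: middle sample for odd counts, mean of the two middle samples otherwise
        return runs % 2 == 1 ? samplesMs[mid] : (samplesMs[mid - 1] + samplesMs[mid]) / 2.0;
    }
}
```

The median is less sensitive than a single sample or the mean to one-off pauses such as garbage collection.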
>> 
>> I have tried caching the recommender and the similarity, with no big
>> difference. I also tried CandidateItemsStrategy and
>> MostSimilarItemsCandidateItemsStrategy, but again with no performance boost:
>> 
>> public RecommenderBuilder createRecommenderBuilder(ItemSimilarity similarity)
>>         throws TasteException {
>>     final int numberOfUsers = dataModel.getNumUsers();
>>     final int numberOfItems = dataModel.getNumItems();
>>     CandidateItemsStrategy candidateItemsStrategy =
>>             new SamplingCandidateItemsStrategy(numberOfUsers, numberOfItems);
>>     MostSimilarItemsCandidateItemsStrategy mostSimilarStrategy =
>>             new SamplingCandidateItemsStrategy(numberOfUsers, numberOfItems);
>>     return model -> new GenericItemBasedRecommender(model, similarity,
>>             candidateItemsStrategy, mostSimilarStrategy);
>> }
>> 
>> I don't know why item-based takes so much longer than user-based;
>> user-based is extremely fast. I also tried the 100K and the 10M
>> MovieLens datasets. Every time, user-based is much faster, for any
>> similarity.
>> 
>> I hope someone can help me understand this. Maybe I'm doing something
>> wrong.
>> 
>> Thanks!! :))


