[ https://issues.apache.org/jira/browse/SPARK-27758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-27758.
----------------------------------
    Resolution: Incomplete

> Features won't generate after 1M rows
> -------------------------------------
>
> Key: SPARK-27758
> URL: https://issues.apache.org/jira/browse/SPARK-27758
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 2.1.0
> Reporter: Rakesh Partapsing
> Priority: Major
>
> I am trying to fit a huge dataset with ALS. The model I use:
> val als = new ALS()
>   .setImplicitPrefs(true)
>   .setNonnegative(true)
>   .setUserCol("userIndex")
>   .setItemCol("itemIndex")
>   .setRatingCol("count")
>   .setMaxIter(20)
>   .setRank(40)
>   .setRegParam(0.5)
>   .setNumUserBlocks(20)
>   .setNumItemBlocks(20)
>   .setAlpha(5)
>
> val alsModel = als.fit(data)
>
> Now I see that if a userIndex or itemIndex has more than 1M rows, features
> are not calculated for that user/item id. No error is returned either. Is this
> a known issue for Spark 2.1.0?
> So what I do now is randomSplit my data into about 4 batches, process each
> batch through ALS, and then average each feature element across the 4 batches.
> Is this a valid approach?
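
The averaging step of the workaround described in the issue could look roughly like the sketch below. It only shows the mechanics and does not settle whether averaged factors are meaningful, which is the question the reporter leaves open. `FactorAveraging`, `Factors`, and `averageFactors` are hypothetical names, not part of Spark, and the sketch assumes the per-batch factors (for example the `id` and `features` columns of `ALSModel.userFactors`) have already been collected into in-memory maps.

```scala
// Hypothetical helper, not part of Spark: element-wise mean of the factor
// vectors that each id received from independently fitted batches.
object FactorAveraging {
  // Assumed in-memory shape: id -> latent factor vector of length `rank`.
  type Factors = Map[Int, Array[Float]]

  def averageFactors(batches: Seq[Factors]): Factors = {
    // Collect every vector produced for an id, across all batches.
    val grouped: Map[Int, Seq[Array[Float]]] =
      batches
        .flatMap(_.toSeq)
        .groupBy(_._1)
        .map { case (id, pairs) => id -> pairs.map(_._2) }

    grouped.map { case (id, vectors) =>
      val rank = vectors.head.length
      require(vectors.forall(_.length == rank), s"rank mismatch for id $id")
      // An id missing from some batches is averaged only over the batches
      // in which it appears.
      val mean = Array.tabulate(rank) { k => vectors.map(_(k)).sum / vectors.size }
      id -> mean
    }
  }
}
```

Calling `FactorAveraging.averageFactors(Seq(batch1, batch2))` returns one vector per id seen in any batch. An id that got no features in a given batch simply contributes nothing from that batch.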