On June 11, 2015, at 3:17 PM, Sean Owen so...@cloudera.com wrote:
Yep, you need to use a transformation of the raw value; use toString, for
example.
On Thu, Jun 11, 2015, 8:54 PM Crystal Xing crystalxin...@gmail.com
wrote:
That is a little scary.
So you mean in general, we shouldn't use
I load a list of ids from a text file as NLineInputFormat, and when I do
distinct(), it returns incorrect number.
JavaRDD<Text> idListData = jvc
    .hadoopFile(idList, NLineInputFormat.class,
        LongWritable.class, Text.class).values().distinct();
I should have
to them since they change. So you may
have a bunch of copies of one object at the end that become just one in
each partition.
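
The object-reuse problem described above can be reproduced without Spark or Hadoop. The following is a minimal sketch in plain Java: `MutableText` is a hypothetical stand-in for a mutable Hadoop `Text`, and `readReusing` imitates a record reader that hands back the same instance for every record. It is not the Spark or Hadoop implementation, and the exact wrong count you see in Spark will depend on partitioning; the sketch only shows why holding references goes wrong and why copying with `toString` helps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical stand-in for a mutable Writable such as Hadoop's Text.
final class MutableText {
    private String value = "";

    void set(String v) { value = v; }

    @Override public String toString() { return value; }

    @Override public boolean equals(Object o) {
        return o instanceof MutableText && ((MutableText) o).value.equals(value);
    }

    @Override public int hashCode() { return value.hashCode(); }
}

final class ReuseDemo {
    // Imitates a reader that reuses one object: every list slot holds the
    // same reference, so all slots show the last value that was set.
    static List<MutableText> readReusing(List<String> lines) {
        MutableText shared = new MutableText();
        List<MutableText> out = new ArrayList<>();
        for (String line : lines) {
            shared.set(line);
            out.add(shared);
        }
        return out;
    }

    // Same reader, but each record is copied to an immutable String first.
    static List<String> readCopying(List<String> lines) {
        MutableText shared = new MutableText();
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            shared.set(line);
            out.add(shared.toString());
        }
        return out;
    }

    static int distinctCount(List<?> items) {
        return new HashSet<Object>(items).size();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "a");
        System.out.println(distinctCount(readReusing(lines))); // 1, not 2
        System.out.println(distinctCount(readCopying(lines))); // 2
    }
}
```

In the Spark job from this thread, the analogous change would be to map each `Text` to a `String` before calling `distinct()`.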
On Thu, Jun 11, 2015, 8:36 PM Crystal Xing crystalxin...@gmail.com
wrote:
I load a list of ids from a text file as NLineInputFormat, and when I
do distinct
...@cloudera.com wrote:
You can flatMap:
rdd.flatMap { in =>
  if (condition(in)) {
    Some(transformation(in))
  } else {
    None
  }
}
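
The same filter-and-map-in-one-pass idea can be sketched in plain Java with `java.util.stream`. This is only an illustration of the pattern, not the Spark Java API (whose `flatMap` takes a different function type); `condition` and `transformation` are hypothetical placeholders, as in the snippet above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class FilterMapDemo {
    // Hypothetical condition: keep non-blank lines that are not comments.
    static boolean condition(String line) {
        return !line.trim().isEmpty() && !line.startsWith("#");
    }

    // Hypothetical transformation: trim and upper-case the line.
    static String transformation(String line) {
        return line.trim().toUpperCase();
    }

    // Each input yields either one transformed element or nothing, so the
    // filter and the map happen in a single flatMap step.
    static List<String> parse(List<String> lines) {
        return lines.stream()
            .flatMap(in -> condition(in)
                ? Stream.of(transformation(in))
                : Stream.<String>empty())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parse(Arrays.asList("a ", "#x", "", " b"))); // [A, B]
    }
}
```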
On Thu, Feb 26, 2015 at 6:39 PM, Crystal Xing crystalxin...@gmail.com
wrote:
Hi,
I have a text file input and I want to parse line by line and map each line
to another format. But at the same time, I want to filter out some lines I
do not need.
I wonder if there is a way to filter out those lines in the map function.
Do I have to do it in two steps, filter and then map? In that
this is not something to do, if you can avoid it
architecturally. For example, consider precomputing recommendations
only for users whose probability of needing recommendations soon is
not very small. Usually, only a small number of users are active.
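
One way to picture the suggestion above is the following plain-Java sketch. It assumes the factorization model is available as in-memory maps of user and product factor vectors and that some estimate of each user's activity probability exists; the names `precompute`, `activity`, and `minProb` are hypothetical, and this is not how MLlib stores or scores its model internally. Scores are plain dot products of factor vectors.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TopNDemo {
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Ranks all products for one user by descending score and keeps the first n.
    static List<Integer> topN(double[] user, Map<Integer, double[]> products, int n) {
        List<Integer> ids = new ArrayList<>(products.keySet());
        ids.sort((p, q) -> Double.compare(
            dot(user, products.get(q)), dot(user, products.get(p))));
        return new ArrayList<>(ids.subList(0, Math.min(n, ids.size())));
    }

    // Precomputes recommendations only for users whose (assumed) activity
    // probability is at least minProb; everyone else is skipped.
    static Map<Integer, List<Integer>> precompute(
            Map<Integer, double[]> users,
            Map<Integer, double[]> products,
            Map<Integer, Double> activity,
            double minProb,
            int n) {
        Map<Integer, List<Integer>> out = new LinkedHashMap<>();
        for (Map.Entry<Integer, double[]> e : users.entrySet()) {
            double p = activity.getOrDefault(e.getKey(), 0.0);
            if (p >= minProb) {
                out.put(e.getKey(), topN(e.getValue(), products, n));
            }
        }
        return out;
    }
}
```

Sorting every product per user is the simplest approach; a bounded heap would avoid the full sort when the catalogue is large.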
On Thu, Feb 12, 2015 at 10:26 PM, Crystal Xing crystalxin
Hi,
I wonder if there is a way to do fast top N product recommendations for all
users in training using mllib's ALS algorithm.
I am currently calling
public Rating[] recommendProducts(int user,
(Rating: http://spark.apache.org/docs/1.2.0/api/java/org/apache/spark/mllib/recommendation/Rating.html)
no
interaction user_product pairs ?
On Thu, Feb 12, 2015 at 3:13 PM, Sean Owen so...@cloudera.com wrote:
Where there is no user-item interaction, you provide no interaction,
not an interaction with strength 0. Otherwise your input is fully
dense.
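
As a rough illustration of that point, the sketch below builds training input from purchase counts and emits a record only where an interaction was actually observed. The `Interaction` class and the nested-map input layout are assumptions made for this example; they are not MLlib types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical record for one observed user-product interaction.
final class Interaction {
    final int user;
    final int product;
    final double strength;

    Interaction(int user, int product, double strength) {
        this.user = user;
        this.product = product;
        this.strength = strength;
    }
}

final class SparseInputDemo {
    // Input: user id -> (product id -> purchase count). Pairs with a count
    // of zero are skipped entirely rather than emitted with strength 0,
    // so the resulting input stays sparse.
    static List<Interaction> observedOnly(Map<Integer, Map<Integer, Integer>> purchaseCounts) {
        List<Interaction> out = new ArrayList<>();
        for (Map.Entry<Integer, Map<Integer, Integer>> u : purchaseCounts.entrySet()) {
            for (Map.Entry<Integer, Integer> p : u.getValue().entrySet()) {
                int count = p.getValue();
                if (count > 0) {
                    out.add(new Interaction(u.getKey(), p.getKey(), (double) count));
                }
            }
        }
        return out;
    }
}
```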
On Thu, Feb 12, 2015 at 11:09 PM, Crystal
but it's all taken care of by the implementation.
On Thu, Feb 12, 2015 at 11:29 PM, Crystal Xing crystalxin...@gmail.com
wrote:
Hi Sean,
I am reading the paper on implicit training,
"Collaborative Filtering for Implicit Feedback Datasets".
It mentions:
To this end, let us introduce
Hi,
I have some implicit rating data, such as purchase data. I read the
paper about the implicit training algorithm used in Spark, and it mentions
that for user-product pairs which do not have implicit rating data, such as
no purchase, we need to provide the value as 0.
This is different