Do you want to compare titles within the RDD, or do you have some external
list or data coming in?

For matching, you could look at string edit distances or cosine similarity
if you are only comparing title strings.
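
To make the two suggestions concrete, here is a rough sketch in plain Python (no Spark dependency). It assumes whitespace tokenization with lowercasing for the cosine score and a normalized Levenshtein distance for the edit score; the helper names cosine_similarity, edit_similarity and rank are made up for illustration and are not part of Spark or MLlib. The scores it produces will not match the numbers in the question, which presumably come from a different model.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words vectors of two titles (0 to 1)."""
    va = Counter(a.lower().split())
    vb = Counter(b.lower().split())
    # Dot product over the tokens the two titles share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def edit_similarity(a, b):
    """1 - (Levenshtein distance / longer length), so 1.0 means identical."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - prev[-1] / longest


def rank(query, titles, score=cosine_similarity):
    """Return (title, score) pairs sorted by descending similarity to query."""
    scored = [(t, score(query, t)) for t in titles]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


titles = [
    "About Spark Streaming",
    "About Spark MLlib",
    "About Spark SQL",
    "About Spark Installation",
    "Kafka Streaming",
    "Kafka Setup",
]
ranked = rank("About Spark", titles)
```

With the titles in an RDD, the same scoring function could be applied per element, e.g. rdd.map(lambda t: (t, cosine_similarity(query, t))), and the result sorted or filtered by score. Word-level cosine gives every "About Spark ..." title the same score here, so character n-grams or TF-IDF weighting may be worth trying if you need a finer ranking.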
On Oct 20, 2015 9:09 PM, "Ascot Moss" <ascot.m...@gmail.com> wrote:

> Hi,
>
> I have my RDD that stores the titles of some articles:
> 1. "About Spark Streaming"
> 2. "About Spark MLlib"
> 3. "About Spark SQL"
> 4. "About Spark Installation"
> 5. "Kafka Streaming"
> 6. "Kafka Setup"
> 7. ....
>
> I need to build a model to find titles by similarity.
> For example, given "About Spark", I hope to get:
>
> "About Spark Installation", 0.98622 (where 0.98622 is the similarity
> score, ranging from 0 to 1)
> "About Spark MLlib", 0.95394
> "About Spark Streaming", 0.94332
> "About Spark SQL", 0.9111
>
> Any idea or reference to do so?
>
> Thanks
> Ascot