Thanks Holden and Chetan.

Holden - Have you tried this yourself, and do you know the right way to do it?
Chetan - Yes, if we use a Java NLP library, integrating it with Spark
Streaming should not be an issue, but as I pointed out earlier, we want to
give data scientists the flexibility to use the language and library of
their choice, instead of restricting them to a library of our choice.
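
One option that may be worth evaluating is Spark's RDD.pipe, which streams
each partition's records through an external process over stdin/stdout. On
the Scala side this could look roughly like
stream.transform(_.pipe("python3 tokenize_worker.py")), where the script
name is hypothetical. Below is a minimal sketch of such a worker. It uses a
regex tokenizer as a stand-in so it runs without extra packages; the
assumption is that a data scientist would swap in nltk.word_tokenize or any
other library call. I have not benchmarked this, and it pays a serialization
cost for every record.

```python
import json
import re
import sys

# Stand-in tokenizer (an assumption for this sketch): words and single
# punctuation marks. Replace with nltk.word_tokenize or similar as needed.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return _TOKEN_RE.findall(text)


def process_lines(lines, tokenizer=tokenize):
    """Yield one JSON array of tokens per input line.

    One output line per input line keeps records aligned for the
    process on the other end of the pipe.
    """
    for line in lines:
        yield json.dumps(tokenizer(line.rstrip("\n")))


def main():
    # The piping process writes one record per line to stdin and reads
    # one result per line from stdout.
    for out in process_lines(sys.stdin):
        sys.stdout.write(out + "\n")
    sys.stdout.flush()


# Only start reading when input is actually piped in.
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

Emitting JSON keeps the output to one line per record, so the Scala side can
parse each result back into a token list.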

On Sun, Nov 26, 2017 at 9:42 PM, Chetan Khatri <chetan.opensou...@gmail.com>
wrote:

> But you can still use the Stanford NLP library and distribute it through
> Spark, right?
>
> On Sun, Nov 26, 2017 at 3:31 PM, Holden Karau <hol...@pigscanfly.ca>
> wrote:
>
>> So it’s certainly doable (it’s not super easy, mind you), but until the
>> Arrow UDF release goes out it will be rather slow.
>>
>> On Sun, Nov 26, 2017 at 8:01 AM ashish rawat <dceash...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Has anyone tried running NLTK (Python) with Spark Streaming (Scala)? I
>>> was wondering whether this is a good idea and which Spark operators are
>>> the right ones for it. The reason we want to try this combination is that
>>> we don't want to run our transformations in Python (PySpark), but after the
>>> transformations we need to run some natural language processing operations,
>>> and we don't want to restrict the functions data scientists can use to
>>> Spark's natural language library. So Spark Streaming with NLTK looks like
>>> the right option, from the perspective of fast data processing and data
>>> science flexibility.
>>>
>>> Regards,
>>> Ashish
>>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>
>
