Hi,

Has anyone tried running NLTK (Python) together with Spark Streaming (Scala)? Is this a good idea, and which Spark operators are the right ones for it?

The reason we want to try this combination is that we don't want to run our transformations in Python (PySpark), but after the transformations we need to run some natural language processing, and we don't want to restrict the functions our data scientists can use to the natural language libraries available in Spark. So Spark Streaming with NLTK looks like the right option, both for fast data processing and for data science flexibility.
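One candidate operator for this kind of Scala-to-Python handoff is RDD.pipe(), which forks an external process, feeds it each record as a line on stdin, and reads one output line per record back. Below is a minimal sketch of the Python side of such a worker, under stated assumptions: the script name (nlp_worker.py) is hypothetical, and a plain whitespace tokenizer stands in for NLTK's word_tokenize so the sketch runs without NLTK installed.

```python
# nlp_worker.py -- hypothetical worker script driven by Spark's RDD.pipe().
# pipe() writes each RDD record as one line to this process's stdin and
# expects one line per record on stdout.
import io
import sys


def tokenize(line: str) -> str:
    # Stand-in tokenizer; with NLTK this would be:
    #   from nltk.tokenize import word_tokenize
    #   tokens = word_tokenize(line)
    tokens = line.strip().split()
    return " ".join(tokens)


def run(stream_in, stream_out):
    # Process the stream line by line, mirroring the pipe() protocol.
    for line in stream_in:
        stream_out.write(tokenize(line) + "\n")


if __name__ == "__main__":
    # In production, pipe() drives this with real stdin/stdout.
    run(sys.stdin, sys.stdout)
```

On the Scala side, this could be wired into the stream with something like dstream.transform(rdd => rdd.pipe("python nlp_worker.py")), keeping the main transformations in Scala while only the NLP step shells out to Python.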
Regards,
Ashish