Hello,

I'm looking for a way to use R in Nutch, particularly in the HTML parser, but 
using it in other parts could be interesting as well. For each parsed document 
I would like to run an R script and feed its results back into the system, 
e.g. topic detection for the document.
 
NB: I'm not looking for a way of scaling R to Hadoop or HDFS, like Microsoft R 
Server. That approach uses Hadoop as an execution engine after the crawling 
process. In other words, the computationally intensive full crawl comes first, 
followed by another computationally intensive R/Hadoop process.
 
Instead, I'm looking for a way of calling R scripts directly from the Java code 
of map or reduce jobs. Any ideas on how to do this? One option is "Rserve - 
Binary R server", but I'm looking for alternatives so that I can compare 
efficiency.
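
For comparison, here is a minimal sketch of one possible alternative: starting 
an external process (for example Rscript) from Java for each document, passing 
the parsed text on stdin and reading the result from stdout. The class name 
ExternalScriptRunner and the script name topic.R are hypothetical, and nothing 
here is Nutch or Hadoop API; it only uses the JDK's ProcessBuilder.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch or Hadoop.
public class ExternalScriptRunner {

    // Runs the given command, feeds `text` to its stdin, and returns its
    // stdout as a trimmed UTF-8 string.
    public static String run(List<String> command, String text)
            throws IOException, InterruptedException {
        // The text goes through a temp file so that the child process can
        // read stdin at its own pace while we drain its stdout.
        Path input = Files.createTempFile("doc", ".txt");
        try {
            Files.write(input, text.getBytes(StandardCharsets.UTF_8));

            ProcessBuilder pb = new ProcessBuilder(new ArrayList<>(command));
            pb.redirectInput(input.toFile());
            // Let the script's error messages show up in the task's stderr.
            pb.redirectError(ProcessBuilder.Redirect.INHERIT);
            Process process = pb.start();

            // Read everything the script prints to stdout.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (InputStream stdout = process.getInputStream()) {
                byte[] buffer = new byte[8192];
                int n;
                while ((n = stdout.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            }

            int exit = process.waitFor();
            if (exit != 0) {
                throw new IOException("command exited with status " + exit);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8).trim();
        } finally {
            Files.deleteIfExists(input);
        }
    }
}
```

From a map or reduce method this could be called as 
ExternalScriptRunner.run(Arrays.asList("Rscript", "topic.R"), parsedText), 
assuming Rscript is on the PATH of every task node and topic.R reads the 
document from stdin and prints a topic label. Each call starts a new R 
interpreter, so the per-document startup cost is probably the main thing to 
measure against Rserve.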

Semyon.
