Re: Nutch as crawler for text analysis: setup ? version ?

2012-03-09 Thread Mathijs Homminga
Dear Piet,

First, you're absolutely right about the state of the documentation. We have to deal with this in the near future. Now, although nutchgora is currently developed on a branch, it is actually still alive and kicking. Moreover, a first Nutch 2.0 release, based on the nutchgora branch, is in

Re: Nutch as crawler for text analysis: setup ? version ?

2012-03-09 Thread Markus Jelsma
Behemoth [1] eats Nutch 1.x segments and can push them to GATE, among others. Nutch comes with its own Tika parser.

[1]: https://github.com/jnioche/behemoth

cheers

On Friday 09 March 2012 16:19:03 Piet van Remortel wrote:
> Hi all,
>
> Pretty new to nutch. Trying to create a setup where nutch repeated