Philippe,

I have run GATE on Hadoop with Behemoth, but I take the annotations it produces and push them to Solr so they can be searched later. I am not familiar with Mimir. It should be possible to feed documents fetched with Nutch to GATE for text analysis.
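To illustrate the annotations-to-Solr step, here is a minimal sketch and not the actual Behemoth or GATE code. It assumes the entity annotations are already available as (type, start, end) character spans over the document text. The function name `annotations_to_solr_doc` and the `*_ss` field names are hypothetical; the suffix only mirrors the multi-valued string dynamic fields in Solr's default schema.

```python
import json
from collections import defaultdict


def annotations_to_solr_doc(doc_id, text, annotations):
    """Turn (type, start, end) entity spans into one Solr-style document.

    Each entity type becomes a multi-valued field such as "person_ss".
    The field naming is an assumption, not something Behemoth or GATE dictates.
    """
    fields = defaultdict(list)
    for ann_type, start, end in annotations:
        value = text[start:end]
        key = ann_type.lower() + "_ss"
        # Keep each distinct entity string once per field.
        if value not in fields[key]:
            fields[key].append(value)
    doc = {"id": doc_id, "text": text}
    doc.update(fields)
    return doc


# Hypothetical example document and spans, for illustration only.
text = "John Paul visited Paris on 1 January 2014."
annotations = [("Person", 0, 9), ("Location", 18, 23), ("Date", 27, 41)]
doc = annotations_to_solr_doc("doc-1", text, annotations)

# A JSON array of documents like this is the shape Solr's JSON update handler accepts.
print(json.dumps([doc], indent=2))
```

With the entities stored in their own fields, a search can be restricted to a given entity type by querying that field instead of the full text.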
If you want to chat about using GATE with Nutch, feel free to contact me off list.

Alex

On 1 January 2014 13:12, Philippe de Rochambeau <[email protected]> wrote:

> Hello,
>
> can you use Nutch to crawl PDFs and extract person, location, dates, times
> and money amounts as entities, as opposed to plain text strings?
>
> In GATE mimir-cloud (http://gate.ac.uk/mimir/), you can search for
> {People}, {Location}, {Date}, and {Money} entities (if you have previously
> used the appropriate Processing Resources to index your data sources, in
> GATE Developer 7.1.) For instance, you can run search queries such as:
>
> « JOHN PAUL » IN {People}
> Paris IN {Location},
> {Date normalized>20010101 normalize<20100101}
> {Money > 2000}
> ...
>
> Can you do such things in Nutch?
>
> Many thanks.
>
> Philippe

