It depends on your use case. For example, you can use the name finder to detect entities in those articles. The detected names could then be used to build a graph that tells you which names are frequently mentioned together.
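As a rough sketch of the graph idea, assuming the names found in each article have already been collected into one list per article (for instance from the name finder output), the edge weights could be counted like this. The class name `CoOccurrence`, the method `countPairs`, and the sample names are made up for illustration and are not part of OpenNLP.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not an OpenNLP class.
public class CoOccurrence {

    // Counts, for every unordered pair of names, the number of articles
    // in which both names appear. Keys have the form "A | B", where A
    // sorts before B.
    public static Map<String, Integer> countPairs(List<List<String>> namesPerArticle) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> names : namesPerArticle) {
            // De-duplicate and sort so each pair is counted once per article.
            List<String> unique = new ArrayList<>(new TreeSet<>(names));
            for (int i = 0; i < unique.size(); i++) {
                for (int j = i + 1; j < unique.size(); j++) {
                    String key = unique.get(i) + " | " + unique.get(j);
                    counts.merge(key, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Placeholder data: one list of detected names per article.
        List<List<String>> articles = List.of(
            List.of("Name A", "Name B", "Name A"),
            List.of("Name A", "Name B", "Name C")
        );
        countPairs(articles).forEach((pair, n) -> System.out.println(pair + " -> " + n));
    }
}
```

Pairs with a high count are the strong edges of the graph. With tens of thousands of articles you would probably want to drop pairs below some threshold before looking at the result.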
Topic modeling might help to search for articles based on their topic.

Jörn

On Thu, 2015-02-19 at 21:29 +0100, Philippe de Rochambeau wrote:
> Hello,
>
> In the past few months, I have indexed tens of thousands of PDFs containing
> newspaper articles from 1887 until 1940 using SOLR for my company.
>
> Every day, my colleagues in the Archive Department spend hours searching
> through the archives using SOLR, looking for potentially-interesting articles
> from a social and historical point of view.
>
> Can OpenNLP be used to automate their work and/or to analyze patterns in the
> data?
>
> Many thanks.
>
> Philippe
