Hi, I'm currently working on a platform for crawling a large number of PDF files. Using Nutch (and Tika) I'm able to extract the textual content of the files and store it in Solr, but now we want to extract the content of the PDFs page by page; that is, we want to store several Solr fields, one per page of the document. Is there a recommended way to accomplish this in Nutch/Solr? With a parse plugin I could store the text of each page in the document's metadata; would anything else be needed?
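In case it helps frame the question, here is how I imagine modeling this on the Solr side: a dynamic field in schema.xml, so each page lands in its own field without having to declare every field up front. This is only a sketch, not a tested config, and the `page_*_t` naming pattern is my own invention:

```xml
<!-- schema.xml: sketch of a dynamic field for per-page content.
     The "page_*_t" naming is hypothetical; any prefix/suffix would do. -->
<dynamicField name="page_*_t" type="text_general"
              indexed="true" stored="true"/>
```

The parse plugin would then extract each page separately (PDFBox, which Tika uses under the hood, has a `PDFTextStripper` with `setStartPage`/`setEndPage` for this), write the text into metadata keys like `page_1_t`, `page_2_t`, and an indexing filter would map those keys onto the dynamic field.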
Regards -- "It is only in the mysterious equation of love that any logical reasons can be found." "Good programmers often confuse halloween (31 OCT) with christmas (25 DEC)"

