I could not solve my problem on my own, so I am asking for help again.
I searched the web but could not find anything related to it. A mailing
list post [1] explains the problem I am trying to solve, but I cannot find
a solution for it.

To repeat briefly: I need to write a Nutch plugin that creates two (or more)
documents from one crawled document, all with the same URL and each
containing part of the original document's content.
I can then parse and index the generated documents as usual.
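
To make the requirement concrete, here is a minimal sketch in plain Java of
the splitting step I have in mind. It does not use any Nutch API: the class
DocumentSplitter, the SubDocument type and the idea of cutting the content at
a fixed delimiter are all assumptions made up for illustration. The open
question is where such logic would hook into Nutch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only: none of these names are Nutch classes.
class DocumentSplitter {

    // One generated document: the original URL plus a slice of the content.
    static final class SubDocument {
        final String url;
        final String content;

        SubDocument(String url, String content) {
            this.url = url;
            this.content = content;
        }
    }

    // Splits the content at every occurrence of the (assumed) delimiter and
    // returns one SubDocument per non-empty piece, all sharing the same URL.
    static List<SubDocument> split(String url, String content, String delimiter) {
        List<SubDocument> parts = new ArrayList<>();
        // Pattern.quote makes the delimiter a literal, not a regular expression.
        for (String piece : content.split(Pattern.quote(delimiter))) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                parts.add(new SubDocument(url, trimmed));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String page = "first part\n---\nsecond part";
        for (SubDocument doc : split("http://example.com/page", page, "\n---\n")) {
            System.out.println(doc.url + " -> " + doc.content);
        }
    }
}
```

Each SubDocument would then have to be parsed and indexed as if it were a
separate crawled page, even though all of them carry the same URL.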

Can anyone point me in the right direction? Where should I start
looking? Classes, wikis, homepages, books?
Nutch does a great job for what I need right now, but I think it lacks
documentation, especially when it comes to plugin development.

What would a bare-bones plugin look like?

Is it even possible to change this behaviour with a Nutch plugin? It is
essential for my app; otherwise I will need to switch technology...



[1] http://osdir.com/ml/nutch-user.lucene.apache.org/2009-09/msg00117.html

