Markus Jelsma-2 wrote:
> ... i'd strongly suggest not to index multiple entities into a single document.
Unfortunately this is not possible; other parties are involved and I cannot force them to put one entity per page. All I can do for now is use the knowledge I have about the structure. I thought it was fairly common that, even when building a fulltext index with Nutch/Solr, one would want to preserve some information about the original document structure.

Looking at the available plugins, I found that the feed plugin should do what I need, since its parser returns more than one document. This is what I plan to implement: split the document being parsed into several documents, one per entity. From each of them I can then read out the desired values and fill the index fields. The search queries should then become simple, since each index entry contains information about exactly one entity.

Before I start changing my current plugin (it currently implements HtmlParseFilter, but it seems I need to implement Parser instead), I would like to ask whether this sounds like a workable solution. Are there any pitfalls or tricks I should be aware of?

And another question: FeedParser.java in the feed plugin contains a main() method, but how can I execute it? During development it seems simpler to test with this method than to build the plugin and crawl/index everything again. After building the plugin with ant I cannot execute it, even if I manually change the manifest to contain the Main-Class attribute. How can I run it with all libraries and dependencies in place?

--
View this message in context: http://lucene.472066.n3.nabble.com/indexing-hierarchical-data-schema-design-tp3052894p3062775.html
Sent from the Nutch - User mailing list archive at Nabble.com.
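To make the plan above concrete, here is a minimal, Nutch-independent sketch of the splitting step I have in mind. It assumes each entity block on the page starts with a known marker (the `<div class="entity">` marker below is purely hypothetical; the real pages will have their own structure). In the actual Parser implementation, each fragment returned here would then become its own entry in the ParseResult, analogous to what FeedParser does for feed items.

```java
import java.util.ArrayList;
import java.util.List;

public class EntitySplitter {
    // Splits raw page content into one fragment per entity. Each entity
    // is assumed to begin with a fixed marker string; everything up to
    // the next marker (or end of page) belongs to that entity.
    static List<String> split(String page, String marker) {
        List<String> fragments = new ArrayList<>();
        int start = page.indexOf(marker);
        while (start >= 0) {
            int next = page.indexOf(marker, start + marker.length());
            fragments.add(next >= 0 ? page.substring(start, next)
                                    : page.substring(start));
            start = next;
        }
        return fragments;
    }

    public static void main(String[] args) {
        // Hypothetical page containing two entity blocks.
        String page = "<div class=\"entity\">A</div>"
                    + "<div class=\"entity\">B</div>";
        List<String> parts = split(page, "<div class=\"entity\">");
        System.out.println(parts.size()); // one fragment per entity
        for (String p : parts) {
            System.out.println(p);
        }
    }
}
```

This is just the mechanical split; the open question for me is whether wrapping each fragment in its own Parse entry (the FeedParser pattern) has pitfalls I am not seeing.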

