Does this solve your problem, Jann? Is this worth filing an issue for? It is rather trivial to address, but it could help users who are unfamiliar with the specifics of the Nutch (or Solr) schema(s).
On Mon, Sep 19, 2011 at 3:06 PM, Markus Jelsma wrote:
> (The previous message was sent by accident.)
>
> On Monday 19 September 2011 15:58:35 lewis john mcgibbney wrote:
> > Yes, I think what Markus has pointed out is the problem, Jann. It means
> > you need to change the stored and indexed values of the field to true
> > and re-index your data.
> >
> > Markus, out of interest, do you know the pros/cons if we were to make
> > this the default in the Nutch schema? For example, with small indexes I
> > wouldn't imagine there would be much difference, but with non-trivially
> > sized indexes I would imagine it would be a different story...
>
> The index grows to roughly 2.1 times its size, depending on the analyzers
> etc. (stopwords mostly). However, users who set up very large indexes are
> expected to be at least intermediate Solr users with a proper
> understanding of the schema.
>
> They will toggle settings as they see fit, whereas new users don't, yet
> they still expect output.
>
> >
> > Any thoughts?
> >
> > On Mon, Sep 19, 2011 at 2:54 PM, Markus Jelsma wrote:
> > > Check line 79 of your Solr schema:
> > >
> > > http://svn.apache.org/viewvc/nutch/branches/branch-1.3/conf/schema.xml?view=markup
> > >
> > > Maybe we should configure the field to be stored in 1.4. I can imagine
> > > this causes a lot of headaches for new users. Also, highlighting will
> > > never work with unstored fields.
> > >
> > > On Monday 19 September 2011 11:02:17 Jann Forrer wrote:
> > > > Hi
> > > >
> > > > I tried to run nutch-1.3 together with solr 3.x according to
> > > > http://wiki.apache.org/nutch/NutchTutorial.
> > > >
> > > > That worked as described, but if I try to search the index using the
> > > > Solr admin interface I always get an empty result.
> > > >
> > > > http://localhost:8983/solr/admin/schema.jsp
> > > >
> > > > Using the Schema Browser I see entries in different fields (e.g. the
> > > > url field), but the content field is empty. I looked for a similar
> > > > problem on the mailing list but didn't find a solution.
> > > >
> > > > Here is what I did:
> > > >
> > > > 1.) ./bin/nutch crawl urls -dir crawl -depth 3 -topN 5
> > > > 2.) Dumped the segment (./bin/nutch readseg -dump
> > > >     crawl/segments/20110916124747 test). The dump also contained the
> > > >     content of the web pages, so everything seems to be OK here.
> > > > 3.) Copied the nutch schema.xml to the solr conf directory
> > > > 4.) bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb
> > > >     crawl/linkdb crawl/segments/*
> > > > 5.) Then tried to search using http://localhost:8983/solr/admin/,
> > > >     but didn't find any HTML content.
> > > >
> > > > However, if there was a PDF file to crawl, its content is found.
> > > >
> > > > BTW, using Nutch 1.1 and solr 1.4.1 everything worked as expected. I
> > > > could use these versions, but I am upgrading from an older Nutch
> > > > version and it would be nice to use the newer versions, where nutch
> > > > and solr are better integrated.
> > > >
> > > > Any ideas what might be wrong?
> > > >
> > > > Jann
> > >
> > > --
> > > Markus Jelsma - CTO - Openindex
> > > http://www.linkedin.com/in/markus17
> > > 050-8536620 / 06-50258350
--
*Lewis*

