Grant, I'm not a Java developer but a sysadmin, and I've been struggling for a 
couple of months now to build a full web search engine stack based on Hadoop + 
Nutch + Solr.

I don't know much about the documentation for developers so I trust you if you 
say it's good. 

What I do know is that I found good docs for the very first steps (installing 
and performing simple "single-run" crawling and indexing with near-default 
configurations) but I'm now facing a great lack of information about features 
useful in production scenarios. 

Now I need to dig deeper into how data is managed, into the workflow, and into 
all the features that are needed in the real world, and I find that the 
documentation is scarce, rather confusing, incomplete, often quite old, and 
spread across too many disconnected pieces. Stumbling around the Net, I 
discovered that I'm actually not alone.

I'm having a lot of difficulty understanding and implementing even quite 
basic features such as consistent incremental recrawling/reindexing, adding 
custom fields, data parsing, duplicate detection, automatic removal of old 
indexed documents based on insertion date, and so on. 

I mean, I would like a more coherent set of use cases suited to real-world 
scenarios (as starting points) and some in-depth explanation of exactly how the 
data "flows" from the crawler into the complex structure of Solr and how it is 
handled by the different components of the stack. (The Nutch documentation is 
probably even worse, but this is not the right place to complain about that.)

S

---------------------------------- 
"Anyone proposing to run Windows on servers should be prepared to explain 
what they know about servers that Google, Yahoo, and Amazon don't."
Paul Graham


"A mathematician is a device for turning coffee into theorems."
Paul Erdos (who obviously never met a sysadmin)



----- Original Message -----
> From: Grant Ingersoll <gsing...@apache.org>
> To: solr-user@lucene.apache.org
> Sent: Wed, February 24, 2010, 18:54:32
> Subject: Re: If you could have one feature in Solr...
> 
> 
> On Feb 24, 2010, at 11:08 AM, Stefano Cherchi wrote:
> 
> > Decent documentation. 
> 
> What parts do you feel are lacking?  Or is it just across the board?  Wikis
> are both good and bad for documentation, IMO.
> 
> -Grant



