Ivane,
Yes, you can use Lucene for this. 10 million documents is not much if
you use adequate hardware. You can boost individual documents
(check the javadocs for the Document and Field classes).
Are you aware of Nutch, though? It sounds like you are not, and Nutch
is probably the best tool for
Hello!
I am currently choosing technology for a web crawler and search engine
that will index between 1 and 10 million documents (including storing the
documents). For some parts of the project I'll most likely choose
existing software, for some I'll have to write new code, but in the end
it should be