Hello Otis, thanks for the info. Would it be an improvement to be able to specify in schema.xml whether or not the URI should be stored, in a field whose name you can also specify in the schema? It may well be that you do not "own" the XML documents you index over HTTP and, at the same time, do not want to store their contents in the index. Since the URI is known at indexing time, adding it to the index is trivial.
Regards Ard

You have to store the URI in a field yourself. That means you need to define that field in the schema and set its value when adding documents.

Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/ - Tag - Search - Share

----- Original Message ----
From: Ard Schrijvers <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, June 12, 2007 9:02:25 AM
Subject: RE: storing the document URI in the index

Hello Erik,

thanks for the fast answer (sorry for my mail not indenting, but I must use webmail :-( ), but the problem I am facing is that I do not see Solr storing the location of the documents it indexed. So I need to store the location of a document in a field, but I do not see where Solr would do this. Fetching the document will be done with the simple Cocoon generator, so that is no problem, but of course I need the URL/URI to be in the index. I know I need it as a UN_TOKENIZED STORED field, but I see with Luke that the location is not present in the Lucene index when Solr "crawls" some directory with XML files.

Regards
Ard Schrijvers

Yes. Set the field to be stored and non-indexed; field type "string" is what I use.

> Or is everybody used to storing the contents of a document in the
> lucene index (doesn't this imply a much larger index though?), so
> instead of retrieving the document's content through a separate
> fetch over http/filesystem just show the result from the stored
> content field?

This all depends on the needs of your project. It's perfectly fine to store the text outside of the index, and that is the way it really has to be done for very large indexes, where as few fields as possible are "stored".

If you're also asking about Solr fetching the remote resource, that is a different story altogether, and no, it does not do that. [Though with the streaming capability you can feed in a document entirely from a URL, but I haven't experimented with that feature yet myself.]

Erik
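
As a rough illustration of the advice in this thread (define a field in the schema yourself and set its value when adding documents), here is a minimal Python sketch that builds a Solr XML add message carrying the document's location. The field name "uri", the helper build_add_message, the example URL and the other field names are assumptions made up for illustration; none of them come from the thread. The schema line in the comment follows the suggestion of a stored, non-indexed "string" field.

```python
import xml.etree.ElementTree as ET

# Assumed schema.xml entry (hypothetical field name "uri"), stored but not indexed:
#   <field name="uri" type="string" indexed="false" stored="true"/>


def build_add_message(doc_uri, fields, uri_field="uri"):
    """Build a Solr XML <add> message that includes the document URI as a field."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    # Copy the caller's fields and attach the location known at indexing time.
    all_fields = dict(fields)
    all_fields[uri_field] = doc_uri

    for name, value in all_fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value

    return ET.tostring(add, encoding="unicode")


# Hypothetical document: only its id, title and location go into the index.
message = build_add_message(
    "http://example.org/docs/report.xml",
    {"id": "report-1", "title": "Quarterly report"},
)
print(message)
```

The resulting string would then be posted to the Solr update handler in the usual way. The document body itself stays outside the index and can be fetched later from the stored location.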