RE: storing the document URI in the index

Ard Schrijvers Tue, 12 Jun 2007 06:23:39 -0700

Hello Otis, 

thanks for the info. Would it be an improvement to be able to specify in 
schema.xml whether the URI should be stored, in a field whose name 
you can also specify in the schema? It may well be that you do 
not "own" the XML documents you index over HTTP and, at the same time, do 
not want to store their contents in the index. Since the URI is known at 
indexing time, adding it to the index is trivial.


Regards Ard




You have to store the URI in a Field yourself.  That means you need to define 
that field in the schema and you have to set its value when adding documents.
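
As a minimal sketch of the "set its value when adding documents" step, the snippet below builds a Solr XML add message that carries the URI as an ordinary field. The field names `id`, `uri` and `text`, the example URL, and the helper `build_add_message` are assumptions made for illustration; the matching field declaration in schema.xml is shown only as a comment and would need to fit your own schema.

```python
import xml.etree.ElementTree as ET

# Assumed schema.xml declaration for the URI field (stored, not indexed):
#   <field name="uri" type="string" indexed="false" stored="true"/>
# The field names used here are hypothetical and must match your schema.


def build_add_message(doc_id, uri, text):
    """Build a Solr XML <add> message that includes the document's URI."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (("id", doc_id), ("uri", uri), ("text", text)):
        field = ET.SubElement(doc, "field", name=name)
        field.text = value  # ElementTree escapes &, < and > on serialization
    return ET.tostring(add, encoding="unicode")


message = build_add_message(
    "doc-1",
    "http://example.com/docs/a.xml",
    "body text to index",
)
print(message)
```

The resulting string is what would be posted to Solr's update handler. Because the client builds the message, it already knows the URI and can pass it along without Solr having to track where the document came from.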

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

----- Original Message ----
From: Ard Schrijvers <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, June 12, 2007 9:02:25 AM
Subject: RE: storing the document URI in the index

Hello Erik, 

thanks for the fast answer (sorry that my mail is not indented, but I must use 
webmail :-( ), but the problem I am facing is that I do not see Solr storing the 
location of the documents it indexed. So, I need to store the location of a 
document in a field, but I do not see where Solr would do this. Fetching the 
document will be done with the simple Cocoon generator, so that is no problem, 
but of course, I need the URL/URI to be in the index. I know I need it as an 
UN_TOKENIZED STORED field, but I can see with Luke that the location is not 
present in the Lucene index when Solr "crawls" some directory with XML files.

Regards Ard Schrijvers


Yes.  Set the field to be stored and non-indexed; field type "string"  
is what I use.

> Or is everybody used to storing the contents of a document in the  
> lucene index (doesn't this imply a much larger index though?), so  
> instead of retrieving the document's content through a separate  
> fetch over http/filesystem just show the result from the stored  
> content field?

This all depends on the needs of your project.  It's perfectly fine to  
store the text outside of the index, and that is the way it really  
has to be done for very large indexes where as few fields as possible  
are "stored".

If you're also asking about Solr fetching the remote resource, that  
is a different story altogether, and no it does not do that.  [though  
with the streaming capability you can feed in a document entirely  
from a URL, but I haven't experimented with that feature yet myself]

    Erik
