Thanks Otis. I tweaked the Solr example app a little and then uploaded a
~55KB document to it 2000 times (changing the ID each time). The solr/data
directory was 72MB on disc afterwards, so the index seems to be growing by
approximately 36KB (72MB / 2000) for each document. That seems reasonable.

I guess I need to do some research into expected data volumes now, and
limits on Lucene index size.
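
For reference, the arithmetic above can be sketched in a few lines of Python.
This is only a rough estimate: it assumes 1MB = 1000KB, assumes the index keeps
growing linearly, and the figure of one million logged requests is purely
hypothetical. The test also indexed the same document over and over, so real
logs with varied referring pages may well grow at a different rate.

```python
def index_growth_per_doc_kb(index_size_mb, num_docs, kb_per_mb=1000):
    """Average index growth per document, in KB (assumes 1MB = 1000KB)."""
    return index_size_mb * kb_per_mb / num_docs


def projected_index_size_gb(per_doc_kb, num_docs, kb_per_gb=1000 * 1000):
    """Linear extrapolation of index size in GB (an assumption, not a measurement)."""
    return per_doc_kb * num_docs / kb_per_gb


# Figures from the test: 72MB on disc after 2000 uploads of a ~55KB document.
per_doc_kb = index_growth_per_doc_kb(72, 2000)

# Index growth relative to the size of the source document.
ratio = per_doc_kb / 55

# Hypothetical volume: one million logged requests.
projected_gb = projected_index_size_gb(per_doc_kb, 1_000_000)

print(f"Index growth per document: {per_doc_kb:.1f} KB")
print(f"Index size relative to source document: {ratio:.0%}")
print(f"Projected index size for 1,000,000 documents: {projected_gb:.1f} GB")
```

Swapping in measured numbers from a more realistic sample of log data would
give a better basis for sizing than this single repeated document.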

Cheers,
Phil


Otis Gospodnetic wrote:
> 
> Phil,
> 
> From what you described so far, I don't see any red flags.  I would pay
> attention to reading those timestamps (covered on the Wiki and ML
> archives), that's all.
> 
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> ----- Original Message ----
>> From: philmccarthy <philmccar...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Sent: Tuesday, January 13, 2009 8:49:33 PM
>> Subject: Indexing the same data in many records
>> 
>> 
>> Hi,
>> 
>> I'd like to use Solr to index some webserver logs, in order to allow easy
>> ad-hoc querying and analysis. Each Solr Document will represent a single
>> request to the webserver, with fields for time, request URL, referring URL,
>> etc.
>> 
>> I'm also planning to fetch the page source of each referring URL, and add
>> that as an indexed field in the Solr document. The aim is to allow queries
>> like "find hits to /xyz.html where the referring page contains the word
>> 'foobar'".
>> 
>> Since hundreds or even thousands of hits may all come from the same
>> referring page, would this approach be horribly inefficient? (Note the page
>> source won't be stored in each Document, just indexed). Am I going to
>> dramatically increase the index size if I do this?
>> 
>> If so, is there a more elegant way to do what I want?
>> 
>> Many thanks,
>> Phil
>> 
>> 
>> 
>> -- 
>> View this message in context: 
>> http://www.nabble.com/Indexing-the-same-data-in-many-records-tp21448465p21448465.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Indexing-the-same-data-in-many-records-tp21448465p21468706.html
Sent from the Solr - User mailing list archive at Nabble.com.
