I changed the heap size (-Xmx1582m was as high as I could go). The import is at 
about 5% now, and from that I now estimate about 13 hours in total. It's hard to 
say, though; the estimate keeps going up little by little.

If I get approval to use Solr for this project, I'll have them install a 64-bit 
JVM instead, but is there anything else I can do?

Devon Baumgarten
Application Developer

-----Original Message-----
From: Devon Baumgarten [mailto:dbaumgar...@nationalcorp.com] 
Sent: Wednesday, February 22, 2012 10:32 AM
To: 'solr-user@lucene.apache.org'
Subject: RE: Unusually long data import time?

Oh sure! As best as I can, anyway.

I have not set the Java heap size, or really configured it at all. 

The server running both SQL Server and Solr has:
* 2 Intel Xeon X5660 (each one is 2.8 GHz, 6 cores, 12 logical processors)
* 64 GB RAM
* One Solr instance (no shards)

I'm not using faceting.
My schema has these fields:
  <field name="Id" type="string" indexed="true" stored="true" />
  <field name="RecordId" type="int" indexed="true" stored="true" />
  <field name="RecordType" type="string" indexed="true" stored="true" />
  <field name="Name" type="LikeText" indexed="true" stored="true" termVectors="true" />
  <field name="NameFuzzy" type="FuzzyText" indexed="true" stored="true" termVectors="true" />
  <copyField source="Name" dest="NameFuzzy" />
  <field name="NameType" type="string" indexed="true" stored="true" />

Custom types:

        PatternReplaceCharFilterFactory ("\W+" => "")
        StopFilterFactory (~40 words in stoplist)
        LengthFilterFactory (min:3, max:512)

        PatternReplaceCharFilterFactory ("\W+" => "")
        StopFilterFactory (~40 words in stoplist)
        LengthFilterFactory (min:3, max:512)

Devon Baumgarten

-----Original Message-----
From: Glen Newton [mailto:glen.new...@gmail.com] 
Sent: Wednesday, February 22, 2012 9:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Unusually long data import time?

Import times will depend on:
- hardware (speed of disks, cpu, # of cpus, amount of memory, etc)
- Java configuration (heap size, etc)
- Lucene/Solr configuration (many ...)
- Index configuration - how many fields, indexed how; faceting, etc
- OS configuration (this usually matters to a lesser degree; _usually_)
- Network issues if non-local
- DB configuration (driver, etc)

If you can give more information about the above, people on this list
should be able to better indicate whether 18 hours sounds right for
your situation.

-Glen Newton

On Wed, Feb 22, 2012 at 10:14 AM, Devon Baumgarten
<dbaumgar...@nationalcorp.com> wrote:
> Hello,
> Would it be unusual for an import of 160 million documents to take 18 hours?  
> Each document is less than 1 KB and I have the DataImportHandler using the 
> JDBC driver to connect to SQL Server 2008. The full-import query calls a 
> stored procedure that contains only a select from my target table.
> Is there any way I can speed this up? I saw recently someone on this list 
> suggested a new user could get all their Solr data imported in under an hour. 
> I sure hope that's true!
> Devon Baumgarten
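
As a rough sanity check on the figures quoted in this thread, the sketch below
converts "160 million documents in N hours" into documents per second, for the
18-hour and 13-hour estimates and for the hoped-for one hour. The function
names are hypothetical, and the 1024-byte document size is only the upper bound
implied by "less than 1kb"; nothing here is measured.

```python
# Back-of-the-envelope throughput from the numbers quoted in the thread.
# All inputs are the posters' estimates, not measurements.

TOTAL_DOCS = 160_000_000   # "160 million documents"
MAX_DOC_BYTES = 1024       # assumption: upper bound from "less than 1kb"


def docs_per_second(total_docs: int, hours: float) -> float:
    """Average indexing rate needed to finish total_docs in the given hours."""
    return total_docs / (hours * 3600)


def max_mb_per_second(total_docs: int, hours: float,
                      doc_bytes: int = MAX_DOC_BYTES) -> float:
    """Upper bound on raw data rate (MB/s) if every document were doc_bytes."""
    return docs_per_second(total_docs, hours) * doc_bytes / 1_000_000


if __name__ == "__main__":
    for hours in (18, 13, 1):
        rate = docs_per_second(TOTAL_DOCS, hours)
        mb = max_mb_per_second(TOTAL_DOCS, hours)
        print(f"{hours:>2} h: {rate:,.0f} docs/s, at most {mb:.1f} MB/s")
```

At 18 hours this works out to roughly 2,469 documents per second, and at 13
hours roughly 3,419; finishing in one hour would need about 44,444 per second.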

