I doubt you'll find any significant difference in indexing speed. But the
post.jar file is really intended as a demo program to quickly get the
examples working; it was never intended to be a production-ready
program. I'd consider using SolrJ or something similar to index the docs.
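For what it's worth, post.jar essentially sends each file to Solr's update handler as an HTTP POST. Below is a minimal stdlib-only Java sketch of that idea. It assumes a Solr 3.6 example server at http://localhost:8983/solr/update (adjust for your setup) and files that are already in Solr's XML format. It is not SolrJ and not a substitute for it; the names SimplePost and postXml are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Sketch of the idea behind post.jar: POST Solr-format XML files, then commit once. */
public class SimplePost {

    /** POSTs an XML body to the given update URL and returns the HTTP status code. */
    static int postXml(String updateUrl, byte[] body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        int status = conn.getResponseCode();
        // Read and discard the response body so the connection can be reused.
        try (InputStream in = status < 400 ? conn.getInputStream() : conn.getErrorStream()) {
            if (in != null) {
                in.readAllBytes();
            }
        }
        return status;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the Solr 3.6 example server on its default host and port.
        String updateUrl = "http://localhost:8983/solr/update";
        for (String file : args) {
            int status = postXml(updateUrl, Files.readAllBytes(Paths.get(file)));
            System.out.println(file + " -> HTTP " + status);
        }
        // A single commit after all files, rather than one per file.
        int status = postXml(updateUrl, "<commit/>".getBytes(StandardCharsets.UTF_8));
        System.out.println("commit -> HTTP " + status);
    }
}
```

You would run it as `java SimplePost 1.xml 2.xml ...`. SolrJ gives you a proper client API on top of this, which is why it's the better route for anything beyond a quick test.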

And I'm assuming your documents are in the XML format Solr expects, something
like
<add>
  <doc>
    <field name="myfield">value for field</field>
    .
    .
  </doc>
  <doc>
    .
    .
    .
  </doc>
</add>
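If you generate these messages yourself, the main thing to get right is XML escaping of field values. A minimal Java sketch, assuming one value per field name (a Map can't represent multi-valued fields) and hypothetical field names id and myfield; AddXmlBuilder and buildAdd are illustrative names, not part of any Solr API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: build a Solr <add> message from simple field/value maps. */
public class AddXmlBuilder {

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Each map becomes one <doc>; each entry becomes <field name="key">value</field>. */
    static String buildAdd(List<Map<String, String>> docs) {
        StringBuilder sb = new StringBuilder("<add>\n");
        for (Map<String, String> doc : docs) {
            sb.append("<doc>\n");
            for (Map.Entry<String, String> f : doc.entrySet()) {
                sb.append("  <field name=\"").append(escape(f.getKey())).append("\">")
                  .append(escape(f.getValue())).append("</field>\n");
            }
            sb.append("</doc>\n");
        }
        return sb.append("</add>\n").toString();
    }

    public static void main(String[] args) {
        // Hypothetical fields; a real schema.xml defines which fields exist.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("myfield", "value for field");
        System.out.print(buildAdd(List.of(doc)));
    }
}
```

A LinkedHashMap keeps the fields in insertion order, which makes the output easy to compare by eye with the format above.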

Solr will not index arbitrary XML. If that's what you're trying to do, you'll
need to transform your XML into the above format first; consider SolrJ or
something like that in this case.
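One way to do that transformation without writing a parser is XSLT, which the JDK can run out of the box via javax.xml.transform. This is a sketch under an assumed input layout (`<records><record>...</record></records>`, invented here for illustration) in which every child element of `<record>` becomes a Solr field with the same name; a real mapping would depend on your data and your schema.xml. ToSolrXml is a hypothetical name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch: convert a hypothetical <records> XML layout into Solr's <add> format. */
public class ToSolrXml {

    // Assumed input: <records><record><title>..</title>...</record></records>.
    // Each child element of <record> becomes <field name="elementName">text</field>.
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" indent=\"yes\"/>"
        + "<xsl:template match=\"/records\">"
        + "<add><xsl:for-each select=\"record\"><doc>"
        + "<xsl:for-each select=\"*\">"
        + "<field name=\"{local-name()}\"><xsl:value-of select=\".\"/></field>"
        + "</xsl:for-each>"
        + "</doc></xsl:for-each></add>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to an XML string and returns the Solr-format XML. */
    static String transform(String inputXml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(inputXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String in = "<records><record><title>First</title><author>Smith</author></record></records>";
        System.out.println(transform(in));
    }
}
```

The XSLT processor takes care of escaping text values, so characters like `&` in the source data come out correctly encoded.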

Best
Erick

On Wed, Jun 20, 2012 at 10:40 AM, Bruno Mannina <bmann...@free.fr> wrote:
> A quick question, please:
>
> I have directories with around 30 files of 40 MB each, with around 17,000
> docs per file.
>
> Is it better to index:
> - file by file, with java -jar post.jar 1.xml, java -jar post.jar 2.xml, etc.
> or
> - all at the same time, with java -jar post.jar *.xml
>
> All files have been validated, so my question only concerns speed.
>
> Thx for your comments,
> Bruno
>
>
>
> On 20/06/2012 05:44, Lance Norskog wrote:
>>
>> M. Della Bitta is right: we're not talking about post.jar, but about
>> starting Solr:
>>
>>
>> java -Xmx300m -jar start.jar
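A quick way to sanity-check what a given -Xmx actually gives a JVM (and that the flag is spelled right, since JVM options are case-sensitive) is a throwaway class like the sketch below; HeapCheck is a hypothetical name. Note that JVM options have to come before -jar: anything after the jar name is passed to the program, not to the JVM.

```java
/** Sketch: report the maximum heap the running JVM will try to use. */
public class HeapCheck {

    /** Maximum heap in megabytes, as reported by the JVM itself. */
    static long maxHeapMb() {
        // Runtime.maxMemory() returns Long.MAX_VALUE if there is no inherent limit.
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

After compiling, `java -Xmx300m HeapCheck` should report something close to 300 MB; the figure is often a little lower than the -Xmx value, depending on the garbage collector.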
>>
>> On Tue, Jun 19, 2012 at 10:05 AM, Erick Erickson
>> <erickerick...@gmail.com>  wrote:
>>>
>>> Well, it _used_ to have a default in the code, but looking at 3.6, it
>>> seems to default to Integer.MAX_VALUE, so you're fine....
>>>
>>> And it's all deprecated in 4.x and will go away.
>>>
>>> Best
>>> Erick
>>>
>>> On Tue, Jun 19, 2012 at 7:07 AM, Bruno Mannina<bmann...@free.fr>  wrote:
>>>>
>>>> I actually tried -Xmx512m, with no effect.
>>>>
>>>> Concerning maxFieldLength, no problem: it's commented out.
>>>>
>>>> On 19/06/2012 13:02, Erick Erickson wrote:
>>>>
>>>>> Then try -Xmx600M,
>>>>> then try -Xmx900M,
>>>>> etc. The idea is to bump the heap size up on separate runs.
>>>>>
>>>>> But be a little cautious here. Look in your solrconfig.xml file; you'll
>>>>> see a commented-out line
>>>>> <maxFieldLength>10000</maxFieldLength>
>>>>>
>>>>> The default behavior for Solr/Lucene is to index the first 10,000
>>>>> tokens (not characters; think of tokens as words for now) of each field
>>>>> in each document and throw the rest on the floor. At the sizes you're
>>>>> talking about, that's probably not a problem, but do be aware of it.
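To picture what that truncation means, here is a toy Java sketch that keeps only the first N tokens of a field's text and drops the rest. It splits on whitespace purely as a simplifying assumption; in Lucene the tokens come from the field's analyzer, so real token counts will differ. TokenLimit and firstTokens are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration of a maxFieldLength-style limit (whitespace tokens only). */
public class TokenLimit {

    /** Returns at most maxTokens whitespace-separated tokens from text. */
    static List<String> firstTokens(String text, int maxTokens) {
        List<String> kept = new ArrayList<>();
        for (String tok : text.trim().split("\\s+")) {
            if (tok.isEmpty()) continue;         // empty or all-whitespace input
            if (kept.size() == maxTokens) break; // everything past the limit is dropped
            kept.add(tok);
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(firstTokens("the quick brown fox jumps", 3)); // [the, quick, brown]
    }
}
```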
>>>>>
>>>>> Best
>>>>> Erick
>>>>>
>>>>> On Tue, Jun 19, 2012 at 5:44 AM, Bruno Mannina<bmann...@free.fr>
>>>>>  wrote:
>>>>>>
>>>>>> Like that?
>>>>>>
>>>>>> java -Xmx300m -jar post.jar myfile.xml
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 19/06/2012 11:11, Lance Norskog wrote:
>>>>>>
>>>>>>> Ah! The Java memory size is set with a java command-line option:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://javahowto.blogspot.com/2006/06/6-common-errors-in-setting-java-heap.html
>>>>>>>
>>>>>>> You could try increasing the memory size in stages, up to maybe 300m.
>>>>>>>
>>>>>>> On Tue, Jun 19, 2012 at 2:04 AM, Bruno Mannina<bmann...@free.fr>
>>>>>>>  wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 19/06/2012 10:51, Lance Norskog wrote:
>>>>>>>>
>>>>>>>>> 675 doc/s is respectable for that server. You might move the memory
>>>>>>>>> allocated to Java up and down: there is a balance between the amount
>>>>>>>>> of memory given to Java vs. the OS disk buffer.
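For reference, the 675 doc/s figure follows from the numbers in the original post below (about 27,000 docs in 40 seconds). The small Java sketch below redoes that arithmetic and, purely on the assumption that the rate stays constant, extrapolates it to the 30 files of about 17,000 docs each from Bruno's later message. Throughput is a hypothetical class name.

```java
/** Back-of-the-envelope indexing throughput, using the numbers from this thread. */
public class Throughput {

    static double docsPerSecond(long docs, double seconds) {
        return docs / seconds;
    }

    /** Seconds needed for totalDocs at a given rate (assumes the rate stays constant). */
    static double secondsFor(long totalDocs, double docsPerSecond) {
        return totalDocs / docsPerSecond;
    }

    public static void main(String[] args) {
        double rate = docsPerSecond(27_000, 40.0); // 675.0 docs/s
        long batch = 30L * 17_000;                  // 510,000 docs
        System.out.printf("rate = %.1f docs/s%n", rate);
        System.out.printf("30 x 17,000 docs at that rate: about %.0f s%n", secondsFor(batch, rate));
    }
}
```

Real indexing rates will vary with document size, analysis, commits and memory, so treat the extrapolated figure as a rough order of magnitude only.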
>>>>>>>>
>>>>>>>>
>>>>>>>> How can I do that? Is there an option on the command line, or in a
>>>>>>>> config file?
>>>>>>>> Sorry for this newbie question :(
>>>>>>>>
>>>>>>>>
>>>>>>>>> And, of course, use the latest trunk.
>>>>>>>>
>>>>>>>> Solr 3.6
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Tue, Jun 19, 2012 at 12:10 AM, Bruno Mannina<bmann...@free.fr>
>>>>>>>>>  wrote:
>>>>>>>>>>
>>>>>>>>>> Correction: the file size is 40 MB!!!
>>>>>>>>>>
>>>>>>>>>> On 19/06/2012 09:09, Bruno Mannina wrote:
>>>>>>>>>>
>>>>>>>>>>> Dear All,
>>>>>>>>>>>
>>>>>>>>>>> I would like to know whether my indexing speed is normal.
>>>>>>>>>>>
>>>>>>>>>>> I have a 40 GB file with around 27,000 docs inside,
>>>>>>>>>>> and I index around 20 fields.
>>>>>>>>>>>
>>>>>>>>>>> My (old) test server is a dual-core 3.06 GHz Intel Xeon with only
>>>>>>>>>>> 1 GB of RAM.
>>>>>>>>>>>
>>>>>>>>>>> The file takes 40 seconds with the command line:
>>>>>>>>>>> java -jar post.jar myfile.xml
>>>>>>>>>>>
>>>>>>>>>>> Could I increase this speed or reduce this time?
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot,
>>>>>>>>>>> PS: Newbie user
>>>>>>>>>>>
>>>>>>>>>>>
>>
>>
>
