Re: Fastest way to use solrj

Noble Paul നോബിള്‍ नोब्ळ् Wed, 27 Jan 2010 00:24:49 -0800

How many fields are there in each doc? The binary format just reduces the
format overhead. It does not touch or compress the payload itself.
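
A rough back-of-the-envelope check of the sizes quoted below, as a sketch only. It assumes one byte per character (mostly ASCII text in UTF-8) and a hypothetical 100 bytes of XML markup per document; neither figure comes from this thread, and the class and method names are illustrative.

```java
public class PayloadEstimate {

    // Total bytes of field text, ASSUMING one byte per character.
    public static long payloadBytes(long docCount, long charsPerDoc) {
        return docCount * charsPerDoc;
    }

    // Share of each document's size taken by format overhead (tags, field names).
    public static double overheadFraction(long charsPerDoc, long overheadBytesPerDoc) {
        return (double) overheadBytesPerDoc / (charsPerDoc + overheadBytesPerDoc);
    }

    public static void main(String[] args) {
        long docs = 3_000_000L;   // document count quoted in the thread
        long chars = 5_000L;      // characters per document quoted in the thread
        long xmlOverhead = 100L;  // HYPOTHETICAL markup bytes per document

        System.out.printf("payload: %d bytes%n", payloadBytes(docs, chars));
        System.out.printf("overhead share: %.1f%%%n",
                100 * overheadFraction(chars, xmlOverhead));
    }
}
```

With 3 million docs of 5000 chars each, the payload alone is 15,000,000,000 bytes, which matches the roughly 15GB reported for both files. Removing the per-document overhead therefore cannot change the total much.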


2010/1/27 Tim Terlegård <tim.terleg...@gmail.com>:
> I have 3 million documents, each having 5000 chars. The xml file is
> about 15GB. The binary file is also about 15GB.
>
> I was a bit surprised about this. It doesn't bother me much though. At
> least it performs better.
>
> /Tim
>
> 2010/1/27 Noble Paul നോബിള്‍  नोब्ळ् <noble.p...@corp.aol.com>:
>> if you write only a few docs you may not observe much difference in
>> size. if you write a large number of docs you may observe a big difference.
>>
>> 2010/1/27 Tim Terlegård <tim.terleg...@gmail.com>:
>>> I got the binary format to work perfectly now. Performance is better
>>> than with xml. Thanks!
>>>
>>> Although, it doesn't look like a binary file is smaller in size than
>>> an xml file?
>>>
>>> /Tim
>>>
>>> 2010/1/27 Noble Paul നോബിള്‍  नोब्ळ् <noble.p...@corp.aol.com>:
>>>> 2010/1/21 Tim Terlegård <tim.terleg...@gmail.com>:
>>>>> Yes, it worked! Thank you very much. But do I need to use curl or can
>>>>> I use CommonsHttpSolrServer or StreamingUpdateSolrServer? If I can't
>>>>> use BinaryWriter then I don't know how to do this.
>>>> if your data is serialized using JavaBinUpdateRequestCodec, you may
>>>> POST it using curl.
>>>> If you are writing directly, use CommonsHttpSolrServer
>>>>>
>>>>> /Tim
>>>>>
>>>>> 2010/1/20 Noble Paul നോബിള്‍  नोब्ळ् <noble.p...@corp.aol.com>:
>>>>>> 2010/1/20 Tim Terlegård <tim.terleg...@gmail.com>:
>>>>>>>>>> BinaryRequestWriter does not read from a file and post it
>>>>>>>>>
>>>>>>>>> Is there any other way or is this use case not supported? I tried 
>>>>>>>>> this:
>>>>>>>>>
>>>>>>>>> $ curl <host>/solr/update/javabin -F stream.file=/tmp/data.bin
>>>>>>>>> $ curl <host>/solr/update -F stream.body=' <commit />'
>>>>>>>>>
>>>>>>>>> Solr did read the file, because solr complained when the file wasn't
>>>>>>>>> in the format the JavaBinUpdateRequestCodec expected. But no data is
>>>>>>>>> added to the index for some reason.
>>>>>>>
>>>>>>>> how did you create the file /tmp/data.bin ? what is the format?
>>>>>>>
>>>>>>> I wrote this in the first email. It's in the javabin format (I think).
>>>>>>> I did it like this (groovy code):
>>>>>>>
>>>>>>>   fieldId = new NamedList()
>>>>>>>   fieldId.add("name", "id")
>>>>>>>   fieldId.add("val", "9-0")
>>>>>>>   fieldId.add("boost", null)
>>>>>>>   fieldText = new NamedList()
>>>>>>>   fieldText.add("name", "text")
>>>>>>>   fieldText.add("val", "Some text")
>>>>>>>   fieldText.add("boost", null)
>>>>>>>   fieldNull = new NamedList()
>>>>>>>   fieldNull.add("boost", null)
>>>>>>>   doc = [fieldNull, fieldId, fieldText]
>>>>>>>   docs = [doc]
>>>>>>>   root = new NamedList()
>>>>>>>   root.add("docs", docs)
>>>>>>>   fos = new FileOutputStream("data.bin")
>>>>>>>   new JavaBinCodec().marshal(root, fos)
>>>>>>>
>>>>>>> /Tim
>>>>>>>
>>>>>> JavaBin is a format.
>>>>>> use this method:
>>>>>> JavaBinUpdateRequestCodec#marshal(UpdateRequest updateRequest, OutputStream os)
>>>>>>
>>>>>> The output of this can be posted to solr and it should work
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> -----------------------------------------------------
>>>>>> Noble Paul | Systems Architect| AOL | http://aol.com
>>>>>>



-- 
-----------------------------------------------------
Noble Paul | Systems Architect| AOL | http://aol.com
