How many fields are there in each doc? The binary format only reduces the per-field overhead; it does not touch or compress the payload.
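A rough back-of-the-envelope check of why both files come out at about 15GB. This is only a sketch: it assumes one byte per character, and documents with just an "id" and a "text" field as in the Groovy snippet quoted in this thread. The class and method names are made up for illustration and are not part of Solr.

```java
// Rough size estimate. All inputs are assumptions based on figures quoted in this thread.
class SizeEstimate {
    // Bytes of XML markup wrapped around one document that has an "id" and a "text" field.
    static long xmlMarkupBytesPerDoc() {
        String markup = "<doc>"
                + "<field name=\"id\"></field>"
                + "<field name=\"text\"></field>"
                + "</doc>";
        return markup.length(); // ASCII only, so chars == bytes in UTF-8
    }

    // Total bytes of field values, which neither format compresses.
    static long payloadBytes(long docs, long charsPerDoc, int bytesPerChar) {
        return docs * charsPerDoc * bytesPerChar;
    }

    // Fraction of one XML document that is markup rather than payload.
    static double markupShare(long charsPerDoc, int bytesPerChar) {
        long markup = xmlMarkupBytesPerDoc();
        long payload = charsPerDoc * bytesPerChar;
        return (double) markup / (markup + payload);
    }

    public static void main(String[] args) {
        System.out.println("payload bytes: " + payloadBytes(3_000_000L, 5_000L, 1));
        System.out.println("xml markup share: " + markupShare(5_000L, 1));
    }
}
```

With 3 million docs of 5000 chars each, the payload alone is about 15GB and the XML markup is on the order of one percent of each doc, so even a format with zero overhead could not shrink the file noticeably. Docs with many small fields would show a much larger difference.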

2010/1/27 Tim Terlegård <tim.terleg...@gmail.com>:
> I have 3 million documents, each having 5000 chars. The xml file is
> about 15GB. The binary file is also about 15GB.
>
> I was a bit surprised about this. It doesn't bother me much though. At
> least it performs better.
>
> /Tim
>
> 2010/1/27 Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>:
>> if you write only a few docs you may not observe much difference in
>> size. if you write a large number of docs you may observe a big difference.
>>
>> 2010/1/27 Tim Terlegård <tim.terleg...@gmail.com>:
>>> I got the binary format to work perfectly now. Performance is better
>>> than with xml. Thanks!
>>>
>>> Although, it doesn't look like a binary file is smaller in size than
>>> an xml file?
>>>
>>> /Tim
>>>
>>> 2010/1/27 Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>:
>>>> 2010/1/21 Tim Terlegård <tim.terleg...@gmail.com>:
>>>>> Yes, it worked! Thank you very much. But do I need to use curl or can
>>>>> I use CommonsHttpSolrServer or StreamingUpdateSolrServer? If I can't
>>>>> use BinaryWriter then I don't know how to do this.
>>>> if your data is serialized using JavaBinUpdateRequestCodec, you may
>>>> POST it using curl.
>>>> If you are writing directly, use CommonsHttpSolrServer
>>>>>
>>>>> /Tim
>>>>>
>>>>> 2010/1/20 Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>:
>>>>>> 2010/1/20 Tim Terlegård <tim.terleg...@gmail.com>:
>>>>>>>>>> BinaryRequestWriter does not read from a file and post it
>>>>>>>>>
>>>>>>>>> Is there any other way or is this use case not supported? I tried
>>>>>>>>> this:
>>>>>>>>>
>>>>>>>>> $ curl <host>/solr/update/javabin -F stream.file=/tmp/data.bin
>>>>>>>>> $ curl <host>/solr/update -F stream.body=' <commit />'
>>>>>>>>>
>>>>>>>>> Solr did read the file, because solr complained when the file wasn't
>>>>>>>>> in the format the JavaBinUpdateRequestCodec expected. But no data is
>>>>>>>>> added to the index for some reason.
>>>>>>>
>>>>>>>> how did you create the file /tmp/data.bin ? what is the format?
>>>>>>>
>>>>>>> I wrote this in the first email. It's in the javabin format (I think).
>>>>>>> I did it like this (groovy code):
>>>>>>>
>>>>>>> fieldId = new NamedList()
>>>>>>> fieldId.add("name", "id")
>>>>>>> fieldId.add("val", "9-0")
>>>>>>> fieldId.add("boost", null)
>>>>>>> fieldText = new NamedList()
>>>>>>> fieldText.add("name", "text")
>>>>>>> fieldText.add("val", "Some text")
>>>>>>> fieldText.add("boost", null)
>>>>>>> fieldNull = new NamedList()
>>>>>>> fieldNull.add("boost", null)
>>>>>>> doc = [fieldNull, fieldId, fieldText]
>>>>>>> docs = [doc]
>>>>>>> root = new NamedList()
>>>>>>> root.add("docs", docs)
>>>>>>> fos = new FileOutputStream("data.bin")
>>>>>>> new JavaBinCodec().marshal(root, fos)
>>>>>>>
>>>>>>> /Tim
>>>>>>>
>>>>>> JavaBin is a format.
>>>>>> use this method: JavaBinUpdateRequestCodec#marshal(UpdateRequest
>>>>>> updateRequest, OutputStream os)
>>>>>>
>>>>>> The output of this can be posted to solr and it should work
>>>>>>
>>>>>> --
>>>>>> -----------------------------------------------------
>>>>>> Noble Paul | Systems Architect| AOL | http://aol.com

--
-----------------------------------------------------
Noble Paul | Systems Architect| AOL | http://aol.com