So I whipped up a quick SolrJ client and ran it against the document that I referenced earlier. When I retrieve the doc and just print its field/value pairs to stdout, it ends like this:
http://brockwine.com/images/output1.png

It appears to end in some kind of garbage characters.

-Rupert

On Tue, Aug 25, 2009 at 12:19 PM, Uri Boness<ubon...@gmail.com> wrote:
> Hi,
>
> This is very strange behavior, and the fact that it is caused by one
> specific field, again, leads me to believe it's still a data issue. Did you
> try using SolrJ to query the data as well? If the same thing happens when
> using the binary protocol, then it's probably not a data issue. On the other
> hand, if it works fine, then at least you can inspect the data to see where
> things go wrong. Sorry for insisting on that, but I cannot think of anything
> else that can cause this problem.
>
> If anyone else has a better idea, I'm actually very curious to hear about
> it.
>
> Uri
>
> Rupert Fiasco wrote:
>>
>> The text file at:
>>
>> http://brockwine.com/solr.txt
>>
>> represents one of these truncated responses (this one in XML). It
>> starts out great, then look at the bottom: boom, game over. :)
>>
>> I found this document by first running our bigger search, which breaks,
>> and then zeroing in on a specific broken document using the rows/start
>> parameters. But there is an unknown number of these "broken"
>> documents - a lot, I presume.
>>
>> -Rupert
>>
>> On Tue, Aug 25, 2009 at 9:40 AM, Avlesh Singh<avl...@gmail.com> wrote:
>>
>>> Can you copy-paste the source data indexed in this field which causes the
>>> error?
>>>
>>> Cheers
>>> Avlesh
>>>
>>> On Tue, Aug 25, 2009 at 10:01 PM, Rupert Fiasco <rufia...@gmail.com>
>>> wrote:
>>>
>>>> Using wt=json also yields an invalid document. So after more
>>>> investigation it appears that I can always "break" the response by
>>>> pulling back a specific field via the "fl" parameter. If I leave that
>>>> field off, the response is valid; if I include it, Solr returns an
>>>> invalid, truncated document. This happens in every response
>>>> format (xml, json, ruby).
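The `fl` experiment described here (same query, with and without the suspect field) can be checked mechanically rather than by eye: save each wt=xml body and hand it to a strict XML parser. The sketch below is a hypothetical helper built only on the JDK's DOM parser; fetching the body (e.g. with curl) is left out, and the class and method names are illustrative, not part of Solr or SolrJ.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical diagnostic: does a saved wt=xml response body parse as well-formed XML?
class ResponseCheck {
    static boolean isWellFormedXml(String body) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler so malformed input fails quietly instead of
            // printing "[Fatal Error]" lines to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // truncated or otherwise malformed document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory stream
        }
    }

    public static void main(String[] args) {
        String complete = "<response><result numFound=\"1\"><doc>"
                + "<str name=\"title_t\">Arthritis</str></doc></result></response>";
        // The same response cut off partway through a field value.
        String truncated = complete.substring(0, complete.indexOf("Arthritis") + 4);
        System.out.println(isWellFormedXml(complete));  // true
        System.out.println(isWellFormedXml(truncated)); // false
    }
}
```

A check like this only confirms that the body ends early; it says nothing about why the writer stopped.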
>>>>
>>>> I am using the SolrJ client to add documents to my index. My field
>>>> is a normal "text" field type and the text itself is the first 1000
>>>> characters of an article.
>>>>
>>>>> It can very well be an issue with the data itself. For example, the
>>>>> data may contain un-escaped characters which invalidate the response
>>>>
>>>> When I look at the document using wt=xml, all XML entities are
>>>> escaped. When I look at it under wt=ruby, all single quotes are
>>>> escaped, and the same goes for json, so it appears that all escaping is
>>>> taking place. The core problem seems to be that the document is just
>>>> truncated: it just plain hits end of file. Jetty's log says it's sending
>>>> back an HTTP 200, so as far as it knows all is well.
>>>>
>>>> Any ideas on how I can dig deeper?
>>>>
>>>> Thanks
>>>> -Rupert
>>>>
>>>> On Mon, Aug 24, 2009 at 4:31 PM, Uri Boness<ubon...@gmail.com> wrote:
>>>>
>>>>> It can very well be an issue with the data itself. For example, the
>>>>> data may contain un-escaped characters which invalidate the response. I
>>>>> don't know much about ruby, but what do you get with wt=json?
>>>>>
>>>>> Rupert Fiasco wrote:
>>>>>
>>>>>> I am seeing our responses getting truncated if and only if I search on
>>>>>> our main text field.
>>>>>>
>>>>>> E.g. I just do something basic like
>>>>>>
>>>>>> title_t:arthritis
>>>>>>
>>>>>> Then I get a valid document back. But if I add in our larger text
>>>>>> field:
>>>>>>
>>>>>> title_t:arthritis OR text_t:arthritis
>>>>>>
>>>>>> then the resultant document is NOT valid XML (if using wt=xml) or Ruby
>>>>>> (using wt=ruby). If I run these through curl on the command line, the
>>>>>> output is truncated, and if I run the search through the web-based
>>>>>> admin panel, I get an XML parse error.
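Rupert reports that entities and quotes are escaped correctly, but escaping cannot rescue characters that XML 1.0 does not allow at all: the C0 control characters other than tab, line feed and carriage return (not even as character references), U+FFFE/U+FFFF, and unpaired UTF-16 surrogates. A minimal sketch, assuming the Java indexer can run a check on each field value before sending it: the hypothetical helper below reports the offsets of such code points, which could locate suspect documents without bisecting with rows/start. Whether characters like these cause this particular truncation is not established in the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic (not part of Solr/SolrJ): find code points that
// XML 1.0 cannot carry, even when escaped.
class XmlCharScan {
    // The XML 1.0 "Char" production.
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the UTF-16 index of every offending code point. An unpaired
    // surrogate comes back from codePointAt as its own value (0xD800-0xDFFF),
    // which falls outside the allowed ranges and is therefore reported.
    static List<Integer> invalidOffsets(String value) {
        List<Integer> offsets = new ArrayList<>();
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            if (!isXml10Char(cp)) {
                offsets.add(i);
            }
            i += Character.charCount(cp);
        }
        return offsets;
    }

    public static void main(String[] args) {
        String clean = "Arthritis is inflammation of the joints.";
        String dirty = "Arthritis\u0001 is \uD83D"; // control char plus a lone high surrogate
        System.out.println(invalidOffsets(clean)); // []
        System.out.println(invalidOffsets(dirty)); // [9, 14]
    }
}
```

Iterating by code point rather than by `char` matters here: a valid surrogate pair (an emoji, say) is one legal code point and is not flagged.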
>>>>>>
>>>>>> This appears to have started recently, and the only thing we have
>>>>>> done is change our indexer from a PHP one to a Java one, but
>>>>>> functionally they are identical.
>>>>>>
>>>>>> Any thoughts? Thanks in advance.
>>>>>>
>>>>>> - Rupert
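Since the indexer was just ported to Java and the stored text is "the first 1000 characters of an article", one thing worth ruling out is how the port cuts that text. `String.substring(0, 1000)` counts UTF-16 code units, so it can split a character outside the Basic Multilingual Plane (many emoji, for example) and leave an unpaired high surrogate as the last character of the value. That the indexer uses `substring` this way is an assumption, not something stated in the thread. A minimal sketch of a cut that never splits a surrogate pair, with a hypothetical class and method name:

```java
// Hypothetical indexer helper: keep at most maxChars UTF-16 units without
// leaving half of a surrogate pair at the cut point.
class SafeTruncate {
    static String truncate(String text, int maxChars) {
        if (text.length() <= maxChars) {
            return text;
        }
        int end = maxChars;
        // If the last kept char is a high surrogate, its low half would be cut off,
        // so drop the whole pair instead.
        if (end > 0 && Character.isHighSurrogate(text.charAt(end - 1))) {
            end--;
        }
        return text.substring(0, end);
    }

    public static void main(String[] args) {
        String text = "abc\uD83D\uDE00def"; // U+1F600 occupies indices 3 and 4
        String naive = text.substring(0, 4); // ends with a lone high surrogate
        System.out.println(Character.isHighSurrogate(naive.charAt(naive.length() - 1))); // true
        System.out.println(truncate(text, 4)); // abc
    }
}
```

The result may be one unit shorter than requested; for a 1000-character excerpt that trade-off seems preferable to storing a half character.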