Thank you for clarifying.

The logging line is this:

logger.info("db retrieve time=" + (System.currentTimeMillis() - start)
        + ", query=" + rb.getQuery().toString().replaceAll("\\p{Cntrl}", "_")
        + ", indexIds=" + getIndexIds(rb));

(The replaceAll call is used to clean out the binary.)
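
For anyone who wants to try that cleanup step in isolation, here is a minimal sketch. The class name and the sample string are made up for illustration (the real bytes were already lost in the log); only String.replaceAll and the \p{Cntrl} character class come from the logging line above.

```java
public class ControlCharCleaner {

    // Replace each ASCII control character (0x00-0x1F and 0x7F) with '_',
    // the same substitution the logging line applies to the query string.
    static String clean(String raw) {
        return raw.replaceAll("\\p{Cntrl}", "_");
    }

    public static void main(String[] args) {
        // Hypothetical term text: the three control characters below are
        // placeholders, not the actual bytes of any collection id.
        String raw = "collection_id:`\u0001\u0002z\u0003[^0.027";
        System.out.println(clean(raw)); // prints collection_id:`__z_[^0.027
    }
}
```

Printable characters such as the backtick, 'z' and '[' pass through untouched, which is why the cleaned terms still look half-readable.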

A complete log entry looks like this (I removed some values and inserted Zs):

2013-03-19 01:36:58,648 INFO  
[org.apache.solr.handler.component.DatabaseComponent] (http-8080-19) [] [] [] 
[] [] ip-10-212-91-229/10.212.91.229   db retrieve time=53, 
query=+(+(givenname:ZZZZ^1.8 | givenname_standard:ZZZZ^1.08 | 
givenname:?^-3.6179998 | givenname:Z^0.17999999) +(surname:ZZZZ^1.8 | 
surname_standard:ZZZZ^1.08) +(birth_year:1855^0.495 | birth_year:1856^0.495 | 
(-marriage_year:[1850 TO 1854]^1.0E-4 -death_year:[1850 TO 1854]^1.0E-4 
-residence_year:[1850 TO 1854]^1.0E-4 -other_year:[1850 TO 1854]^1.0E-4 
+est_birth_year_range:[180 TO 185]^-1.005)) +((+(birth_place:amherst,1929953 | 
birth_place_ancestors:amherst,1929953^0.99 | birth_place:amherst,6279984 | 
birth_place_ancestors:amherst,6279984^0.99 | birth_place:novascotia,1927164^0.7 
| birth_place_ancestors:novascotia,1927164^0.69 | 
birth_place:cumberland,1929953^0.7 | 
birth_place_ancestors:cumberland,1929953^0.69 | birth_place:canada,-1^0.2)) | 
(+birth_place:?^-2.01 +((record_place:amherst,1929953^0.7 | 
record_place_ancestors:amherst,1929953^0.69299996 | 
record_place:amherst,6279984^0.7 | 
record_place_ancestors:amherst,6279984^0.69299996 | 
record_place:novascotia,1927164^0.48999998 | 
record_place_ancestors:novascotia,1927164^0.48299998 | 
record_place:cumberland,1929953^0.48999998 | 
record_place_ancestors:cumberland,1929953^0.48299998 | 
record_place:canada,-1^0.14))))) is_principal:T^0.01 
(collection_id:`__z_[^0.027 collection_id:`__nB+^0.026 
collection_id:`__Zl_^0.025 collection_id:`__i49^0.024 
collection_id:`__Pq%^0.023 collection_id:`__VCS^0.022 
collection_id:`__WbH^0.021 collection_id:`__Yu_^0.02 collection_id:`__UF&^0.019 
collection_id:`__I2g^0.018 collection_id:`__PP_^0.016999999 
collection_id:`__Ysv^0.015999999 collection_id:`__Oe_^0.014999999 
collection_id:`__Ysw^0.013999999 collection_id:`__Wi_^0.012999998 
collection_id:`__fLi^0.011999998 collection_id:`__XRk^0.010999998 
collection_id:`__Uz[^0.009999998 collection_id:`__SE_^0.008999998 
collection_id:`__Ysx^0.007999998 collection_id:`__Ysh^0.0069999974 
collection_id:`__fLh^0.0059999973 collection_id:`__f _^0.004999997 
collection_id:`__`^C^0.003999997 collection_id:`__fKM^0.002999997 
collection_id:`__Szo^0.001999997 collection_id:`__f ]^9.99997E-4) 
record_type:`_____^0.11 record_country:Canada^0.1 record_subcountry:Canada,Nova 
Scotia^0.1, indexIds=5649621248770, 5649707485955, 5649774056450, 
5650368372995, 5650800358658, 40314148353, 17914147586, 77849158944, 
77849158945, 77849158946, 77849158947, 77849158948, 77849158949, 77849158950, 
77849158951, 77849158952, 77849158953, 77849158954, 77849158955, 77849158956  


We have seen these types of issues (though the opposite) when querying with 
non-encoded ints.  

When preparing the query we have to encode the collection IDs like this:

        Query q = new TermQuery(new Term(SolrTag.COLLECTION_ID.getName(),
                type.readableToIndexed(Integer.toString(collectionId))));

So perhaps "encoded" was the wrong term, and I should have said "indexed"?  
But that word has other meanings and would potentially be more confusing.  
These are the Terms printed above; they remain in the non-readable format 
when toString is called.  (Perhaps we should be using something other than 
readableToIndexed?)
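
To make the "indexed" form concrete, here is a rough, self-contained sketch of how I understand a full-precision int to be prefix-coded. It assumes collection_id is a Trie int field (the schema is not shown in this thread) and is my own re-implementation for illustration, not the Lucene source; the class and method names are made up. It would explain the leading backtick (0x60) followed by five 7-bit characters, some of which fall in the control range.

```java
public class PrefixCodedIntDemo {

    // Assumption: marker character for a prefix-coded int with shift 0.
    static final char SHIFT_START_INT = 0x60; // '`'

    // Encode an int as the marker plus five characters of 7 bits each.
    static String encode(int value) {
        int sortableBits = value ^ 0x80000000; // flip sign bit so terms sort numerically
        char[] out = new char[6];
        out[0] = SHIFT_START_INT;
        for (int i = 5; i >= 1; i--) {
            out[i] = (char) (sortableBits & 0x7f); // low 7 bits go last
            sortableBits >>>= 7;
        }
        return new String(out);
    }

    // Reverse of encode: rebuild the int from the five 7-bit characters.
    static int decode(String indexed) {
        if (indexed.length() != 6 || indexed.charAt(0) != SHIFT_START_INT) {
            throw new IllegalArgumentException("not a full-precision prefix-coded int");
        }
        int sortableBits = 0;
        for (int i = 1; i < 6; i++) {
            sortableBits = (sortableBits << 7) | (indexed.charAt(i) & 0x7f);
        }
        return sortableBits ^ 0x80000000;
    }

    public static void main(String[] args) {
        int collectionId = 12345; // hypothetical id
        String indexed = encode(collectionId);
        // Show the indexed form the way the log line does, with control chars masked.
        System.out.println(indexed.replaceAll("\\p{Cntrl}", "_"));
        System.out.println(decode(indexed));
    }
}
```

In our real code the readable form should come from the field type rather than from hand-rolled decoding: FieldType.indexedToReadable is the counterpart of the readableToIndexed call above, so a custom print function could look up each term's field in the schema and call that.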


Thanks!


-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Monday, March 18, 2013 7:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Query.toString printing binary in the output...

If you simply attach &debug=all to your URL, you should see the query come back 
in your response, XML, JSON, whatever. If that also shows bizarre characters, 
then that will give you some idea whether it's in Solr or not.

But you haven't given us much info about how/where you call toString. You may 
be getting into trouble with character sets (although I'd find that quite odd, 
it's a possibility).

What I'm really finding confusing is that you're mentioning Term alongside
query.toString() (at least that's what I think you're saying), which has 
nothing at all to do with Terms, it's just the query string passed in. So I'm 
really puzzled as to what you're doing to get this kind of output, it almost 
looks like you're trying to print out the _results_ of a query, not the query.

So some clarification would be helpful...

Best
Erick


On Mon, Mar 18, 2013 at 12:01 PM, Andrew Lundgren 
<lundg...@familysearch.org> wrote:

> I am sorry, I don't follow what you mean by debug=query.  Can you 
> elaborate on that a bit?
>
> Thanks!
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Sunday, March 17, 2013 8:09 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Query.toString printing binary in the output...
>
> Hmmm, without looking at the code, somehow when you specify 
> debug=query you get readable results, maybe that code would be a place to 
> start?
>
> And are you looking for the parsed output? Otherwise you could print 
> the original query.
>
> Not much help....
> Erick
>
>
> On Fri, Mar 15, 2013 at 3:24 PM, Andrew Lundgren
> <lundg...@familysearch.org> wrote:
>
> > We use the toString call on the query in our logs.  For some numeric 
> > types, the encoded form of the number is being printed instead of 
> > the readable form.
> >
> > This makes tail and some other tools very unhappy...
> >
> > Here is a partial example of a query.toString() that would have had 
> > binary in it.  As a short term work around I replaced all 
> > non-printable characters in the string with an '_'.
> >
> > (collection_id:`__z_[^0.027 collection_id:`__nB+^0.026
> > collection_id:`__Zl_^0.025 collection_id:`__i49^0.024
> > collection_id:`__Pq%^0.023 collection_id:`__VCS^0.022
> > collection_id:`__WbH^0.021 collection_id:`__Yu_^0.02
> > collection_id:`__UF&^0.019 collection_id:`__I2g^0.018
> > collection_id:`__PP_^0.016999999 collection_id:`__Ysv^0.015999999
> > collection_id:`__Oe_^0.014999999 collection_id:`__Ysw^0.013999999
> > collection_id:`__Wi_^0.012999998 collection_id:`__fLi^0.011999998
> > collection_id:`__XRk^0.010999998 collection_id:`__Uz[^0.009999998
> > collection_id:`__SE_^0.008999998 collection_id:`__Ysx^0.007999998
> > collection_id:`__Ysh^0.0069999974 collection_id:`__fLh^0.0059999973 
> > collection_id:`__f _^0.004999997 collection_id:`__`^C^0.003999997
> > collection_id:`__fKM^0.002999997 collection_id:`__Szo^0.001999997 
> > collection_id:`__f ]^9.99997E-4)
> >
> > But, as you can see, that is less than useful...
> >
> > I spent some time looking at the source and found that Term does not 
> > contain the type of the embedded data.  Any possible solutions to 
> > this short of walking the query and getting the type of each field 
> > from the schema and creating my own print function?
> >
> > Thanks!
> >
> > --
> > Andrew
> >
> >
> >
> >
> >  NOTICE: This email message is for the sole use of the intended
> > recipient(s) and may contain confidential and privileged information.
> > Any unauthorized review, use, disclosure or distribution is 
> > prohibited. If you are not the intended recipient, please contact 
> > the sender by reply email and destroy all copies of the original message.
> >
> >


