Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-20 Thomas Mueller
Hi,
>1) Next to getSize() iterator we also added getTotalSize(). I don't
>like the name because it is actually more something like:
>getTotalSizeWithoutCheckingACLs().
Hm, wouldn't that be a security problem? Couldn't it be better (from a security perspective) if you can only get this number if …
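
The security concern can be made concrete with a small sketch. Assume some rows in a result are hidden from the current session by access control. A count that skips the ACL check then reveals how many hidden rows exist. The class AclCountSketch, the paths and the canRead predicate are hypothetical illustrations, not Hippo, Jackrabbit or Oak APIs:

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only: not a Hippo, Jackrabbit or Oak API.
class AclCountSketch {
    // ACL-aware count: only rows the session is allowed to read.
    static <T> long visibleCount(List<T> rows, Predicate<T> canRead) {
        return rows.stream().filter(canRead).count();
    }

    // Count that ignores access control, i.e. what a method like
    // "getTotalSizeWithoutCheckingACLs()" would report.
    static <T> long totalCountIgnoringAcls(List<T> rows) {
        return rows.size();
    }

    // The difference is the number of rows the caller cannot read,
    // which is exactly the information an ACL-blind count exposes.
    static <T> long hiddenCount(List<T> rows, Predicate<T> canRead) {
        return totalCountIgnoringAcls(rows) - visibleCount(rows, canRead);
    }
}
```

With rows "/public/a", "/secret/b" and "/secret/c" and a session that may only read "/public/", the visible count is 1 but the unfiltered count is 3. The caller learns that two unreadable nodes match the query.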

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-20 Ard Schrijvers
Hello,
Sorry to chime in so late in this thread; I hope my remarks are still welcome. I did read the entire thread and won't reply inline, but will just try to recap and explain how we got around it in the Hippo repository. The problem is obvious: *** How to efficiently get a correct count of total hits …

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-19 Thomas Mueller
Hi,
>How do I know it's for sure more than 20
Because the PrefetchIterator will try to prefetch 20 nodes.
>or whatever my page size happens to be?
If you have a higher page size then you need getSize(max).
>>Please note if you use offset and limit, getSize() will return the size
>>of
>> the re…
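
The reply points to getSize(max) for page sizes other than the 20-row prefetch. The sketch below assumes the semantics proposed elsewhere in this thread: the exact size, but never more than max. Under that assumption, an application could decide whether to show a "next page" link as follows. CappedSize and Pager are hypothetical names used only for illustration:

```java
// Hypothetical interface modelling the proposed getSize(int max):
// returns the exact size, but never more than max (max >= 0).
interface CappedSize {
    long getSize(int max);
}

class Pager {
    // True if at least one row exists after the page [offset, offset + pageSize).
    // Only offset + pageSize + 1 rows ever need to be counted.
    static boolean hasNextPage(CappedSize result, int offset, int pageSize) {
        int needed = offset + pageSize + 1;
        return result.getSize(needed) >= needed;
    }
}
```

Counting exactly one row past the current page keeps the call cheap. The cap bounds the work regardless of how large the full result is, and no -1 "no idea" case is left to interpret.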

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-19 Alexander Klimetschek
On 14.09.2012, at 07:41, Thomas Mueller wrote:
> If getSize() returns -1 then you know for sure there are more than 20
> results, so you know you have to display 'next page'.
How do I know it's for sure more than 20, or whatever my page size happens to be? -1 simply means: "no idea". That the i…

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-13 Thomas Mueller
Hi,
>The idea with the timeout sounds good, but what should we recommend an
>application to do if getSize() takes too long and returns -1?
>
>Imagine while paging search results, the first page query is fast enough
>(getSize() returns something), but the second is too long and now returns
>-1: sho…

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-13 Alexander Klimetschek
On 13.09.2012, at 01:40, Thomas Mueller wrote:
> Sounds good. I guess we could do both: always prefetch 20 nodes, and if
> there is still time, fetch more up to 0.1 seconds or so, or at most 200
> nodes. I guess 200 should be enough for a GUI to decide what to display
> (10 pages for 20 nodes…

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-13 Jukka Zitting
Hi,
On Thu, Sep 13, 2012 at 10:40 AM, Thomas Mueller wrote:
> Sounds good. I guess we could do both: always prefetch 20 nodes, and if
> there is still time, fetch more up to 0.1 seconds or so, or at most 200
> nodes. I guess 200 should be enough for a GUI to decide what to display
> (10 pages…

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-13 Thomas Mueller
Hi,
Sounds good. I guess we could do both: always prefetch 20 nodes, and if there is still time, fetch more up to 0.1 seconds or so, or at most 200 nodes. I guess 200 should be enough for a GUI to decide what to display (10 pages for 20 nodes per page).
Regards, Thomas
On 9/13/12 10:27 AM, "…
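
The combined proposal in this message can be sketched as a single prefetch loop. It always reads 20 rows, keeps reading while a time budget of about 0.1 seconds lasts, and never reads more than 200. This is a minimal illustration assuming the result is exposed as a plain java.util.Iterator. PrefetchingSize and prefetchAndSize are hypothetical names, not Oak's implementation:

```java
import java.util.Iterator;
import java.util.List;

// Sketch of "prefetch 20, then more within ~0.1 s, at most 200".
// Names and signature are hypothetical, not Oak's implementation.
class PrefetchingSize {
    static final int MIN_ROWS = 20;                // always prefetch at least this many
    static final int MAX_ROWS = 200;               // never prefetch more than this many
    static final long BUDGET_NANOS = 100_000_000L; // about 0.1 seconds

    // Moves rows from source into buffer (expected to start empty) and
    // returns the exact size if the whole result was read, otherwise -1.
    static <T> long prefetchAndSize(Iterator<T> source, List<T> buffer,
                                    int minRows, int maxRows, long budgetNanos) {
        long start = System.nanoTime();
        while (source.hasNext() && buffer.size() < maxRows) {
            boolean minimumReached = buffer.size() >= minRows;
            if (minimumReached && System.nanoTime() - start > budgetNanos) {
                break; // out of time, and the minimum has already been prefetched
            }
            buffer.add(source.next());
        }
        return source.hasNext() ? -1 : buffer.size();
    }

    // Convenience overload using the values suggested in the thread.
    static <T> long prefetchAndSize(Iterator<T> source, List<T> buffer) {
        return prefetchAndSize(source, buffer, MIN_ROWS, MAX_ROWS, BUDGET_NANOS);
    }
}
```

The time check is applied only after the minimum is reached. This preserves the guarantee that -1 means "more than 20 rows", which is what the paging discussion in this thread relies on.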

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-13 Jukka Zitting
Hi,
On Thu, Sep 13, 2012 at 10:22 AM, Thomas Mueller wrote:
> Yes. Let's discuss the value now! 1000 sounds OK in general; however, there
> is a potential performance problem.
How about instead of a size limit we set a time limit on getSize()? I.e. we limit the method to use no more than the ment…

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-13 Thomas Mueller
Hi,
>return the correct size if the result set has fewer than
>something like 1000 entries. That should cover most practical cases
Yes. Let's discuss the value now! 1000 sounds OK in general; however, there is a potential performance problem. For Jackrabbit 2.x, if there are more than a few millio…

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-13 Thomas Mueller
Hi,
>>(and getSize() would still return -1 I guess?)
>
>For backward compatibility I'd leave the behaviour "as unchanged as
>possible". That is, return -1 if the size is not quickly available.
Yes, that's what I had in mind. The question now is what exactly *is* quickly available. I suggest that t…

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-13 Jukka Zitting
Hi,
On Thu, Sep 13, 2012 at 9:56 AM, Thomas Mueller wrote:
> [...] When I opened OAK-300, it was for the JCR API implementation. [...]
Ah, I see, sorry for confusing the matter. For the JCR API I'd simply start by ensuring that getSize() will always return the correct size if the result set has…

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-13 Thomas Mueller
Hi,
The main question is still what the JCR API method getSize() should return. A new method getSize(int max) is nice, and of course we can do that. But I guess people will not use it in the near future because it's not part of the JCR API.
Regards, Thomas
On 9/12/12 8:06 PM, "Michael Marth…

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-13 Thomas Mueller
Hi,
>There's no need for the Oak API to reflect JCR in all its details.
Sure. First we need to define how the JCR API implementation is supposed to behave. Based on that we can then still decide what the Oak API should look like. The Oak API is (more or less) an implementation detail. Of course t…

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-13 Michael Dürig
On 12.9.12 19:06, Michael Marth wrote:
As an alternative: we could use a separate method getSize(int max) which
* if called with max == -1 returns the exact size if quickly available,
* returns -1 otherwise, and
* returns the exact size but not more than max when called with max >= 0.
This al…
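
The three rules of the proposed getSize(int max) can be sketched over a lazily read result. This is a minimal, hypothetical illustration, not an Oak API. It assumes "quickly available" means that the rows already prefetched cover the whole result; the class name CappedSizeResult and the prefetch parameter are illustrative:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the proposed getSize(int max) contract; not an Oak API.
// "Quickly available" is modelled as: the prefetched rows already cover the
// whole result.
class CappedSizeResult<T> {
    private final Iterator<T> source;
    private final List<T> fetched = new ArrayList<>(); // rows read so far

    CappedSizeResult(Iterator<T> source, int prefetch) {
        this.source = source;
        readUpTo(prefetch);
    }

    // Reads from the source until `count` rows are held or it is exhausted.
    private void readUpTo(int count) {
        while (fetched.size() < count && source.hasNext()) {
            fetched.add(source.next());
        }
    }

    long getSize(int max) {
        if (max == -1) {
            // Exact size only if no further reads are needed, otherwise -1.
            return source.hasNext() ? -1 : fetched.size();
        }
        if (max < 0) {
            throw new IllegalArgumentException("max must be -1 or >= 0");
        }
        // Read until max rows are held: exact size, but never more than max.
        readUpTo(max);
        return Math.min(fetched.size(), max);
    }
}
```

The cap bounds the cost. For example, getSize(100) reads at most 100 rows beyond the prefetch, however large the result is. The max == -1 case keeps the old "-1 if not quickly available" behaviour.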

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-12 Alexander Klimetschek
On 12.09.2012, at 01:23, Michael Dürig wrote:
> As an alternative: we could use a separate method getSize(int max) which
>
> * if called with max == -1 returns the exact size if quickly available,
> * returns -1 otherwise, and
> * returns the exact size but not more than max when called with max…

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-12 Alexander Klimetschek
On 12.09.2012, at 00:33, Thomas Mueller wrote:
>> Display the actual number of search results to the user?
>
> Do you want to risk that the method getSize() takes 1.5 hours just to
> display the actual number of search results to the user?
Well, the application can decide if there are other opt…

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-12 Jukka Zitting
Hi,
On Wed, Sep 12, 2012 at 10:37 AM, Thomas Mueller wrote:
> Yes, that would work as well. I have added that to OAK-300. The
> disadvantage is that it's a new API (not part of the JCR specification).
> The two options I proposed don't require a new API.
There's no need for the Oak API to reflec…

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-12 Michael Marth
> As an alternative: we could use a separate method getSize(int max) which
>
> * if called with max == -1 returns the exact size if quickly available,
> * returns -1 otherwise, and
> * returns the exact size but not more than max when called with max >= 0.
>
> This allows for estimates but leaves…

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-12 Thomas Mueller
Hi,
>As an alternative: we could use a separate method getSize(int max) which
Yes, that would work as well. I have added that to OAK-300. The disadvantage is that it's a new API (not part of the JCR specification). The two options I proposed don't require a new API.
Regards, Thomas

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-12 Michael Dürig
Hi,
On 11.9.12 11:08, Jukka Zitting wrote:
Instead I'd propose the following design:
* The getSize() method always returns the size, by buffering all results in memory if necessary.
* A separate hasSize() method can be used to check if the size is quickly available (i.e. if getSize() will co…
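
The hasSize()/getSize() split quoted here can be sketched as a wrapper around a lazy row iterator. This is a minimal, hypothetical illustration (BufferingRowIterator is not from the thread or from Oak). It assumes the underlying result is a plain java.util.Iterator, and it models "quickly available" as "the source has no unread rows left":

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: getSize() is always exact, buffering unread rows in
// memory if needed; hasSize() reports whether the size is already known.
class BufferingRowIterator<T> implements Iterator<T> {
    private final Iterator<T> source;                 // underlying lazy result
    private final List<T> buffer = new ArrayList<>(); // rows read ahead by getSize()
    private int bufferPos = 0;                        // next buffered row to return
    private long returned = 0;                        // rows already handed out

    BufferingRowIterator(Iterator<T> source) {
        this.source = source;
    }

    // True if the total size is known without reading more rows from the source.
    boolean hasSize() {
        return !source.hasNext();
    }

    // Always returns the exact total size, buffering unread rows if necessary.
    long getSize() {
        while (source.hasNext()) {
            buffer.add(source.next());
        }
        return returned + (buffer.size() - bufferPos);
    }

    @Override
    public boolean hasNext() {
        return bufferPos < buffer.size() || source.hasNext();
    }

    @Override
    public T next() {
        T row;
        if (bufferPos < buffer.size()) {
            row = buffer.get(bufferPos++);
            if (bufferPos == buffer.size()) { // release buffered rows once consumed
                buffer.clear();
                bufferPos = 0;
            }
        } else {
            row = source.next(); // throws NoSuchElementException when exhausted
        }
        returned++;
        return row;
    }
}
```

The trade-off debated in the thread is visible here. getSize() is always correct, but for a very large result it holds every remaining row in memory. A caller can check hasSize() first to avoid that cost.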

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-12 Thomas Mueller
Hi,
>>>2. The client does need to know the size, so it calls getSize() and
>>
>> I currently can't come up with a convincing use case - what is your use
>> case?
>
>Display the actual number of search results to the user?
Do you want to risk that the method getSize() takes 1.5 hours just to disp…

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-11 Alexander Klimetschek
On 11.09.2012, at 07:18, Thomas Mueller wrote:
>> 2. The client does need to know the size, so it calls getSize() and
>
> I currently can't come up with a convincing use case - what is your use
> case?
Display the actual number of search results to the user?
>> has to iterate through all resul…

Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-11 Thomas Mueller
Hi,
I'm worried about queries that return a huge number of rows, for example 1 million nodes. If getSize() is supposed to return the correct result, it could potentially take hours (when reading 100 nodes per second). I'm more in favour of returning -1 if there are more than just a few rows (for e…
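
For scale, the figures in this message work out as follows. This is a rough estimate that assumes a constant read rate. Counting one million nodes at 100 nodes per second takes

$$
\frac{10^{6}\ \text{nodes}}{100\ \text{nodes/s}} = 10^{4}\ \text{s} \approx 2.8\ \text{h}
$$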

The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())

2012-09-11 Jukka Zitting
Hi,
[moving this to oak-dev@ for a broader discussion]
On Tue, Sep 11, 2012 at 9:55 AM, Thomas Mueller (JIRA) wrote:
> [...] For compatibility with Jackrabbit 2.0, and for ease of use, it would be
> good to
> have a clearly defined way to get the size of the result. [...]
I've always found the…