Re: [Neo4j] REST results pagination

Rick Bullotta Thu, 21 Apr 2011 07:21:19 -0700

Fwiw, we use an "idiot resistant" (no such thing as "idiot proof") approach 
that clamps the number of returned items on the server side by default. We 
allow the user to explicitly request to do something foolish and ask for more 
data, but it requires a conscious effort.
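
A minimal sketch of that kind of server-side clamp, in Java. The class name,
the default of 100 items and the explicit opt-in flag are assumptions made for
illustration; nothing above says how the real system exposes them.

```java
// Hypothetical helper: decides how many items a request may return.
final class PageLimits {
    // Assumed default cap; the message does not give a number.
    static final int DEFAULT_LIMIT = 100;

    /**
     * @param requested  the limit the client asked for, or null if none was sent
     * @param allowLarge true only when the client explicitly opted in to
     *                   exceeding the default cap (the "conscious effort")
     */
    static int effectiveLimit(Integer requested, boolean allowLarge) {
        if (requested == null) {
            return DEFAULT_LIMIT;               // no limit given: clamp by default
        }
        if (requested < 1) {
            throw new IllegalArgumentException("limit must be positive");
        }
        if (allowLarge) {
            return requested;                   // explicit opt-in: honour the request
        }
        return Math.min(requested, DEFAULT_LIMIT);
    }
}
```

With these assumptions, a request for 5000 items without the opt-in flag gets
100, and the same request with the flag gets all 5000.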



----- Reply message -----
From: "Jacob Hansson" <ja...@voltvoodoo.com>
Date: Thu, Apr 21, 2011 10:06 am
Subject: [Neo4j] REST results pagination
To: "Neo4j user discussions" <user@lists.neo4j.org>

On Thu, Apr 21, 2011 at 2:59 PM, Rick Bullotta
<rick.bullo...@thingworx.com> wrote:

> Fwiw, I think paging is an outdated "crutch", for a few reasons:
>
> 1) bandwidth and browser processing/parsing are largely non issues,
> although they used to be
>

I disagree. They have improved significantly, for sure, but that is no
reason to download massive amounts of data that will never be used.


>
> 2) human users rarely have the patience (and usability sucks) to go beyond
> 2-4 pages of information.  It is far better to allow incrementally refined
> filters and searches to get to a workable subset of data.
>

I agree with the suckiness of paging and the awesomeness of filtering - but
what do you do when the user's filter returns 40 million results? You somehow
have to tell the user that "damn, that filter, it returned forty freaking
million results, you need to refine your search buddy".

The way the user expects that to happen is through a paged, infinite-scroll
or similar interface, where she can see how many results were returned and
act on that feedback.
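
One way to give the user that feedback is to return a single page together
with the total number of matches. The Java sketch below is only an
illustration: PagedResult and its fields are hypothetical names, and it walks
the whole result iterator to get the count, which is exactly the cost the
rest of this thread worries about.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical response envelope: one page of items plus the total match count.
final class PagedResult<T> {
    final List<T> items;
    final long total;

    PagedResult(List<T> items, long total) {
        this.items = items;
        this.total = total;
    }

    // Keeps only the requested page in memory, but still consumes the whole
    // iterator so that the total can be reported to the user.
    static <T> PagedResult<T> of(Iterator<T> results, long offset, int limit) {
        List<T> page = new ArrayList<>();
        long seen = 0;
        while (results.hasNext()) {
            T item = results.next();
            if (seen >= offset && page.size() < limit) {
                page.add(item);
            }
            seen++;
        }
        return new PagedResult<>(page, seen);
    }
}
```

The interface can then show something like "4 of 40,000,000 results" and
nudge the user to refine the filter.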


> 3) machine users could care less about paging
>
>
Agreed, streaming is a much better way for machines to talk about data that
doesn't fit in memory.
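
As a rough illustration of what streaming means here, the Java sketch below
writes each result to the output as soon as it is produced instead of
collecting everything first. LineStreamer and the one-result-per-line format
are assumptions for the example, not anything the Neo4j REST API defines.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Iterator;

// Hypothetical helper: streams results one line at a time.
final class LineStreamer {
    // Writes every result followed by a newline and returns how many were
    // written. Only one result is held in memory at any moment.
    static long stream(Iterator<String> results, Writer out) {
        long written = 0;
        try {
            while (results.hasNext()) {
                out.write(results.next());
                out.write('\n');
                written++;
            }
            out.flush();
        } catch (IOException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return written;
    }
}
```

In a server, the Writer would wrap the HTTP response body, so the client can
start consuming results before the traversal has finished.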


> 4) when doing visualization of a large dataset, you generally want the
> whole dataset, not a page of it, so that's another "non use case"
>

Not necessarily true. You need all the data that you want to visualize, but
that is not necessarily all the data the user has asked for. You can be
clever about the visualization to keep it uncluttered, and "paging"-like
behaviours may be a way to do that.


>
> Discuss and debate please!
>
> Rick
>
>
>
> ----- Reply message -----
> From: "Craig Taverner" <cr...@amanzi.com>
> Date: Thu, Apr 21, 2011 8:52 am
> Subject: [Neo4j] REST results pagination
> To: "Neo4j user discussions" <user@lists.neo4j.org>
>
> >
> > I assume this:
> >    Traverser x = Traversal.description().traverse( someNode );
> >    x.nodes();
> >    x.nodes(); // Not necessarily in the same order as previous call.
> >
> > If that assumption is false or there is some workaround, then I agree that
> > this is a valid approach, and a good efficient alternative when sorting is
> > not relevant. Glancing at the code in TraverserImpl though, it really looks
> > like the call to .nodes will re-run the traversal, and I thought that would
> > mean the two calls can yield results in different order?
> >
>
> OK. My assumptions were different. I assume that while the order is not
> easily predictable, it is reproducible as long as the underlying graph has
> not changed. If the graph changes, then the order can change also. But I
> think this is true of a relational database also, is it not?
>
> So, obviously pagination is expected (by me at least) to give page X as it
> is at the time of the request for page X, not at the time of the request for
> page 1.
>
> But my assumptions could be incorrect too...
>
> > I understand, and completely agree. My problem with the approach is that I
> > think it's harder than it looks at first glance.
> >
>
> I guess I cannot argue that point. My original email said I did not know if
> this idea had been solved yet. Since some of the key people involved in this
> have not chipped into this discussion, either we are reasonably correct in
> our ideas, or so wrong that they don't know where to begin correcting us ;-)
>
> > This is what makes me push for the sorted approach - relational databases
> > are doing this. I don't know how they do it, but they are, and we should be
> > at least as good.
> >
>
> Absolutely. We should be as good. Relational databases manage to serve a page
> deep down the list quite fast. I have to believe that if they had to complete
> the traversal, sort the results and extract the page on every single page
> request, they could not be so fast. I think my ideas for the traversal are
> 'supposed' to be performance enhancements, and that is why I like them ;-)
>
> > I agree the issue of what should be indexed to optimize sorting is a
> > domain-specific problem, but I think that is how relational databases treat
> > it as well. If you want sorting to be fast, you have to tell them to index
> > the field you will be sorting on. The only difference contra having the user
> > put the sorting index in the graph is that relational databases will handle
> > the indexing for you, saving you a *ton* of work, and I think we should too.
> >
>
> Yes. I was discussing automatic indexing with Mattias recently. I think (and
> hope I am right) that once we move to automatic indexes, it will be
> possible to put external indexes (à la Lucene) and graph indexes (like the
> ones I favour) behind the same API. In this case perhaps the database will
> more easily be able to make the right optimized decisions, and use the index
> for providing sorted results fast and with a low memory footprint where
> possible, based on the existence or non-existence of the necessary indices.
> Then all the developer needs to do to make things really fast is put in the
> right index. For some data, that would be Lucene and for others it would be
> a graph index. If we get to this point, I think we will have closed a key
> usability gap with relational databases.
>
> > There are cases where you need to add this sort of meta data to your domain
> > model, where the sorting logic is too complex, and you see that in
> > relational dbs as well, where people create lookup tables for various
> > things. There are for sure valid uses for that too, but the generic approach
> > I believe covers the *vast* majority of the common use cases.
> >
>
> Perhaps. But I'm not sure the two extremes are as lop-sided as you think. I
> think large data users are very interested in Neo4j.
>
> > I agree, this is important. I'd like to change "the need for pagination on
> > very large result sets" to "the ability to return very large result sets
> > over the wire". That opens up the debate to solutions like HTTP streaming,
> > which do not have the problems that come with keeping state on the server
> > between calls.
> >
>
> I think there are two separate, but related, problems to solve. One is the
> transfer of large result-sets over the wire for people that need that. The
> other is efficiently providing the small page of results from a large
> dataset. Most of our discussion has so far focused on the latter.
>
> For the former, I did a bit of experimenting last year and was able to
> compact my JSON by several times by moving all meta-data into a header
> section. This works very well for data that has a repeating structure, for
> example a large number of records with similar schema. I know schema is a
> nasty word in the NoSQL world, but it is certainly common for data to have a
> repeating pattern, especially when dealing with very large numbers. Then you
> find that something like CSV is actually an efficient format, since the bulk
> of the text is only the data. We did this in JSON by simply specifying a
> meta-data element (with the headers) and then a contents section with a long
> array of values. It worked very well indeed: even though we have a half-dozen
> different schemas in the document, it was still much more efficient than
> specifying the meaning of every field, as is usually done in JSON or XML.
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>
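
The header-plus-contents JSON layout described at the end of the quoted
message could look roughly like the Java sketch below. The key names "meta"
and "contents", the CompactJson class and the hand-rolled encoder are all
assumptions for illustration; the original format is not shown in the thread.

```java
import java.util.List;

// Hypothetical encoder: field names are written once, then rows of bare values.
final class CompactJson {
    static String encode(List<String> headers, List<List<Object>> rows) {
        StringBuilder sb = new StringBuilder("{\"meta\":[");
        for (int i = 0; i < headers.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(quote(headers.get(i)));
        }
        sb.append("],\"contents\":[");
        for (int r = 0; r < rows.size(); r++) {
            if (r > 0) sb.append(',');
            sb.append('[');
            List<Object> row = rows.get(r);
            for (int c = 0; c < row.size(); c++) {
                if (c > 0) sb.append(',');
                Object v = row.get(c);
                // Numbers, booleans and null are written bare; the rest as strings.
                // NaN and infinite values are not handled in this sketch.
                boolean bare = v == null || v instanceof Number || v instanceof Boolean;
                sb.append(bare ? String.valueOf(v) : quote(v.toString()));
            }
            sb.append(']');
        }
        return sb.append("]}").toString();
    }

    // Minimal escaping (backslash and double quote only); control characters
    // are not handled, so this is not a complete JSON string encoder.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Compared with repeating every field name in every record, the saving grows
with the number of rows, which matches the CSV comparison in the quoted text.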



--
Jacob Hansson
Phone: +46 (0) 763503395
Twitter: @jakewins
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
