[google-appengine] Re: A difficult app engine optimisation problem - selecting distinct entities across a large table

2011-02-16 Thread Edward Hartwell Goose
Thanks very much all. We've had discussions with our client and have opted to make a short term fix to this problem by changing the requirements (the easy option ;) ). So, this means that we're not going to be trying to solve this problem over the next 1-2 weeks. But we will be looking into it aft

Re: [google-appengine] Re: A difficult app engine optimisation problem - selecting distinct entities across a large table

2011-02-15 Thread Stephen Johnson
Hi Ed, One issue arises when you query over a large time frame and some cars make infrequent journeys while others make very frequent ones: you'll have to scan a lot of key entries to find all cars. So, as a further optimization, if you are able to somehow classify a car

[google-appengine] Re: A difficult app engine optimisation problem - selecting distinct entities across a large table

2011-02-15 Thread Edward Hartwell Goose
The query window is between now and 2 months ago, with a granularity of one minute, so 15/02/2011 09:28:59 is treated the same as 15/02/2011 09:28:01. The number of cars is likely to increase very slowly; we probably won't see big jumps at all. The majority will start at the beginning of our beta period. Once
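The minute granularity described above could be sketched as truncating each timestamp before it is used as a query bound. This is only an illustration: `to_minute` is a hypothetical helper name, not something from the thread.

```python
from datetime import datetime

def to_minute(moment):
    """Drop seconds and microseconds so all instants within a minute compare equal."""
    return moment.replace(second=0, microsecond=0)

# The two timestamps from the message above collapse to the same minute.
a = to_minute(datetime(2011, 2, 15, 9, 28, 59))
b = to_minute(datetime(2011, 2, 15, 9, 28, 1))
```

Truncating the bounds this way also means repeated requests within the same minute ask for an identical window, which may make results easier to cache.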

[google-appengine] Re: A difficult app engine optimisation problem - selecting distinct entities across a large table

2011-02-15 Thread Edward Hartwell Goose
Copied and pasted from StackOverflow so everyone sees the answer: It's a really good idea, but because we're looking for the latest journey in an unknown time frame, you can't store how many days old it is (or equivalent): if a car stopped making journeys for a week, the last journey would

[google-appengine] Re: A difficult app engine optimisation problem - selecting distinct entities across a large table

2011-02-15 Thread Edward Hartwell Goose
We're returning to a web browser. Specifically as JSON. I'll look into that video! On Feb 14, 8:35 pm, Calvin wrote: > Are you returning results to a web browser, or a specialized client?  One of > the Google I/O talks demonstrates spawning a crapload of tasks in parallel > to collect results, a

Re: [google-appengine] Re: A difficult app engine optimisation problem - selecting distinct entities across a large table

2011-02-14 Thread Robert Kluin
Hi Ed, I think Stephen's most recent idea (key name as date + car id) might work well. I have a suggestion that is somewhat similar to Nick's idea. How wide and variable is the query window? How variable are the occurrences of car journey data? Once a car starts having journeys, will it con

[google-appengine] Re: A difficult app engine optimisation problem - selecting distinct entities across a large table

2011-02-14 Thread nickmilon
I faced the same kind of problem some time ago. I tried some of the solutions suggested here (in-memory sort and filtering, encoding things into keys, etc.) and benchmarked them for both latency and CPU cycles using test data of around 100K entities. Another approach I have taken is encod

Re: [google-appengine] Re: A difficult app engine optimisation problem - selecting distinct entities across a large table

2011-02-14 Thread Stephen Johnson
Okay, I think I've got something that might work. Reverse the StartDate and CarId in the key from what I said above, so the key would look like this: 2011:02:14:17:13:33:123, and the KEYS ONLY query is then: select __key__ where __key__ >= MAKEKEY(StartDate + CarId) && __key__ <= MAKEKEY(EndDate + CarI
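A minimal sketch of the key layout described above, using a sorted in-memory list as a stand-in for the datastore's key index. The function names `make_key` and `distinct_cars` are hypothetical, and no App Engine API is used here; the point is only to show why putting the timestamp first lets a range scan over key names recover the distinct car ids in a window.

```python
import bisect
from datetime import datetime

STAMP = "%Y:%m:%d:%H:%M:%S"

def make_key(start, car_id):
    """Build a key name like 2011:02:14:17:13:33:123 (timestamp first, then car id)."""
    return start.strftime(STAMP) + ":" + str(car_id)

def distinct_cars(sorted_keys, window_start, window_end):
    """Range-scan sorted key names and return the distinct car ids (as strings)."""
    lo = bisect.bisect_left(sorted_keys, window_start.strftime(STAMP))
    # ";" sorts just after ":", so this bound includes every car id at window_end.
    hi = bisect.bisect_right(sorted_keys, window_end.strftime(STAMP) + ";")
    return {key.rsplit(":", 1)[1] for key in sorted_keys[lo:hi]}

# Stand-in for the key index: four journeys by three cars.
keys = sorted([
    make_key(datetime(2011, 1, 1, 0, 0, 0), 789),
    make_key(datetime(2011, 2, 14, 17, 13, 33), 123),
    make_key(datetime(2011, 2, 14, 18, 0, 0), 456),
    make_key(datetime(2011, 2, 15, 9, 0, 0), 123),
])
cars = distinct_cars(keys, datetime(2011, 2, 14), datetime(2011, 2, 15, 9, 0, 0))
```

Because the zero-padded timestamp sorts lexicographically in time order, only keys inside the window are touched. The scan still visits one key per journey, so a window with many frequent journeys remains expensive.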

Re: [google-appengine] Re: A difficult app engine optimisation problem - selecting distinct entities across a large table

2011-02-14 Thread Stephen Johnson
Or maybe it blocks on different result sets just not on getting the next fetch block?? Hmmm. Sounds like a tough problem. On Mon, Feb 14, 2011 at 2:09 PM, Stephen Johnson wrote: > Are you using .asList (which I think blocks like you describe), but I > thought asIterable or asIterator wasn't suppo

Re: [google-appengine] Re: A difficult app engine optimisation problem - selecting distinct entities across a large table

2011-02-14 Thread Stephen Johnson
Are you using .asList (which I think blocks like you describe)? I thought asIterable and asIterator weren't supposed to (if you're using Java). On Mon, Feb 14, 2011 at 12:38 PM, Edward Hartwell Goose wrote: > Hi Calvin & Stephen, > > Thanks for the ideas. > > Calvin: > We can't do the filtering

[google-appengine] Re: A difficult app engine optimisation problem - selecting distinct entities across a large table

2011-02-14 Thread Calvin
Are you returning results to a web browser, or a specialized client? One of the Google I/O talks demonstrates spawning a crapload of tasks in parallel to collect results, and in that demo most of the tasks had completed by the time the web page had refreshed (it might be the one someone linked

[google-appengine] Re: A difficult app engine optimisation problem - selecting distinct entities across a large table

2011-02-14 Thread Edward Hartwell Goose
Hi Calvin & Stephen, Thanks for the ideas. Calvin: We can't do the filtering in memory. We potentially have a car (the car analogy isn't so good...) making a journey every 3 seconds, and we could have up to 2,000 cars. We need to be able to look back up to 2 months, so it could
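A rough worst-case bound can be worked out from the figures quoted above (one journey every 3 seconds, up to 2,000 cars, up to 2 months). This is only back-of-the-envelope arithmetic: it assumes "2 months" means 60 days and that every car reports at the maximum rate for the whole period, and `worst_case_journeys` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def worst_case_journeys(cars, seconds_between_journeys, days):
    """Upper bound on stored journeys if every car reports at the maximum rate."""
    per_car = days * SECONDS_PER_DAY // seconds_between_journeys
    return cars * per_car

# Assumption: "2 months" is taken as 60 days.
upper_bound = worst_case_journeys(cars=2000, seconds_between_journeys=3, days=60)
```

Under those assumptions the bound is 1,728,000 journeys per car and about 3.5 billion journeys in total, which is why loading the window into memory and filtering there is not practical.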

[google-appengine] Re: A difficult app engine optimisation problem - selecting distinct entities across a large table

2011-02-14 Thread Calvin
Can you do filtering in memory? This query would give you all of the journeys for a list of cars within the date range:
    carlist = ['123','333','543','753','963','1236']
    start_date = datetime.datetime(2011, 1, 30)
    end_date = datetime.datetime(2011, 2, 10)
    journeys = Journey.all().filter('start >', start_d
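The in-memory step suggested above could look roughly like the sketch below, which keeps only the most recent journey per car once the rows have been fetched. It uses plain `(car_id, start)` tuples instead of datastore entities. The function name `latest_journey_per_car` and the exclusive bounds on both ends of the window are assumptions for illustration, not details taken from the message.

```python
from datetime import datetime

def latest_journey_per_car(journeys, car_ids, start_date, end_date):
    """Return {car_id: latest start time} for the wanted cars inside the window."""
    wanted = set(car_ids)
    latest = {}
    for car_id, start in journeys:
        # Skip cars we were not asked about and journeys outside the window.
        if car_id not in wanted or not (start_date < start < end_date):
            continue
        if car_id not in latest or start > latest[car_id]:
            latest[car_id] = start
    return latest

# Hypothetical sample rows standing in for fetched journeys.
rows = [
    ("123", datetime(2011, 2, 1, 8, 0)),
    ("123", datetime(2011, 2, 5, 9, 30)),
    ("333", datetime(2011, 2, 3, 12, 0)),
    ("999", datetime(2011, 2, 4, 12, 0)),   # not in the car list
    ("333", datetime(2011, 1, 15, 12, 0)),  # before the window
]
result = latest_journey_per_car(
    rows, ["123", "333"], datetime(2011, 1, 30), datetime(2011, 2, 10)
)
```

This is a single pass with one dictionary entry per car, so memory use depends on the number of cars rather than the number of journeys, but every journey in the window still has to be fetched first.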