Re: selecting data for computation

2010-07-01 Thread Tim Gustafson
A general-case good design for this is to have status fields, as suggested, and who-owns-this-row fields. Set the rows initially to un-owned (NULL), and status 'new' or similar. The get_lock approach was ultimately successful, and I think better than using a status field as it is entirely

Re: selecting data for computation

2010-06-28 Thread Baron Schwartz
Tim, What you have described is really a queue, although you are not seeing it from that angle. This is a very common pattern in this kind of usage. It's also tricky to make it correct (as you have discovered), and both performance and scalability are problems in most implementations. A

Re: selecting data for computation

2010-06-25 Thread Dan Nelson
In the last episode (Jun 24), Tim Gustafson said: Hrmm, something didn't work quite right. Here's the actual query: select x, y, zoom from map_images where image_data_size = 0 and get_lock(concat(x, ',', y, ',', zoom), 0) = 1 order by zoom , sqrt(pow(x, 2) + pow(y, 2)) limit 1 I've added

Re: selecting data for computation

2010-06-25 Thread Joerg Bruehe
Hi Tim, all! Tim Gustafson wrote: [[...]] There is not an index on the work_data column as it's a longblob and I was under the impression that indexing a longblob field wasn't helpful. Maybe I should add a work_data_size field as an integer, index that, and search for records where

Re: selecting data for computation

2010-06-25 Thread Tim Gustafson
That shouldn't happen, since get_lock() will only succeed for one mysql connection at a time. I figured it out. Because there was an order by clause, the get_lock function was being called for each row in the result set, and since each call to get_lock releases any previous locks you had,

selecting data for computation

2010-06-24 Thread Tim Gustafson
I'm working on a compute job that queries mySQL for work packets, processes the work packet and then updates the row in mySQL with the results of the job. Currently, from each compute instance I'm doing something like: select * from work_packets where work_data is null order by rand() limit 1;

Re: selecting data for computation

2010-06-24 Thread Dan Nelson
In the last episode (Jun 24), Tim Gustafson said: I'm working on a compute job that queries mySQL for work packets, processes the work packet and then updates the row in mySQL with the results of the job. Currently, from each compute instance I'm doing something like: select * from

Re: selecting data for computation

2010-06-24 Thread Tim Gustafson
select * from work_packets where work_data is null and get_lock(id,0)=1 limit 1; Sorry, I should have said that my example was pseudo-code that I was typing from memory. The query you've suggested is certainly an improvement - I don't know why I though that the get_lock needed to be in a

Re: selecting data for computation

2010-06-24 Thread Tim Gustafson
Hrmm, something didn't work quite right. Here's the actual query: select x, y, zoom from map_images where image_data_size = 0 and get_lock(concat(x, ',', y, ',', zoom), 0) = 1 order by zoom , sqrt(pow(x, 2) + pow(y, 2)) limit 1 I've added the image_data_size column and indexed it, and set its