When you look in the logs to see why new instances were started, you should
see, right before a warmup request, a request that took longer than the
minimum pending latency. Is that not the case? I too am trying to understand
how instances are started so I can avoid extra instance hours and this i
I have been playing around with the settings for a while now and have
come to the conclusion that enabling concurrent requests (i.e. threadsafe set
to true) does not mean that the scheduler chooses to send multiple requests
to active instances; it still starts more instances. I
am also confident that Min Pending Latency is not
At the moment I've found that the best way to keep it capped at 1 instance
is to ensure that the 1 instance you have does not die. The problem is that
when there are 0 instances running and 2 quick requests come in, the scheduler
will start up 2 instances to handle both requests. So what I did was
I have an app whose Instances page looks like this:
http://i.imgur.com/YROrD.png
It's a very small app with billing disabled. It will not fit within the free
quota after the pricing change, simply because the scheduler is no good.
I think one way to fix this would be to open-source the schedu