I was actually looking at Spring Batch (and a couple of other solutions). I
don’t think Spring Batch could be of much help here.

My conclusion is similar to what you are saying - implementing a
lightweight job coordinator is much easier.

Row-level locking works well when you are dealing with a simple queue
table - you take a pessimistic lock on N rows, process them, and then
give another host in the cluster a chance. Unfortunately, only one of my
background jobs is suitable for this type of refactoring.
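
For reference, the queue-table variant is roughly this in plain JDBC
(table and column names are made up, and SKIP LOCKED assumes
PostgreSQL 9.5+ or a database with an equivalent):

    import java.sql.*;
    import javax.sql.DataSource;

    class QueueDrainer {
        // Grab up to N unprocessed rows; rows already locked by another
        // host in the cluster are skipped instead of blocking.
        void drainBatch(DataSource ds) throws SQLException {
            try (Connection con = ds.getConnection()) {
                con.setAutoCommit(false);
                String sql = "SELECT id, payload FROM task_queue"
                           + " WHERE state = 'NEW' ORDER BY id"
                           + " LIMIT 10 FOR UPDATE SKIP LOCKED";
                try (Statement st = con.createStatement();
                     ResultSet rs = st.executeQuery(sql)) {
                    while (rs.next()) {
                        // a real version would also mark the row as done
                        // before committing
                        process(rs.getLong("id"), rs.getString("payload"));
                    }
                }
                con.commit(); // releases the row locks for the next host
            }
        }

        void process(long id, String payload) { /* app-specific */ }
    }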

Other jobs process records that shouldn’t be locked for a considerable
amount of time.

So currently I’m thinking of the following scenario:

- pass the deployment ID to all containers via an environment variable
(ECS can do this quite easily)
- use a simple table whose records contain the job name, current cluster
deployment ID and state
- the first background executor that manages to lock the matching job row
starts working; the other(s) are cancelled (see the sketch below)
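
The locking step itself could be a single atomic UPDATE - a minimal
sketch, assuming a hypothetical table, column and env var naming:

    import java.sql.*;
    import javax.sql.DataSource;

    class JobLock {
        // assumed schema: background_job(job_name PK, deployment_id, state)
        boolean tryAcquire(DataSource ds, String jobName) throws SQLException {
            String deploymentId = System.getenv("DEPLOYMENT_ID"); // set by ECS
            try (Connection con = ds.getConnection();
                 PreparedStatement ps = con.prepareStatement(
                     "UPDATE background_job SET state = 'RUNNING'"
                   + " WHERE job_name = ? AND deployment_id = ? AND state = 'IDLE'")) {
                ps.setString(1, jobName);
                ps.setString(2, deploymentId);
                // a single atomic UPDATE: only one executor in the cluster
                // can flip IDLE -> RUNNING, the rest see 0 rows and cancel
                return ps.executeUpdate() == 1;
            }
        }
    }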



On Tue, Jun 27, 2017 at 10:16 PM, Dmitry Gusev <dmitry.gu...@gmail.com>
wrote:

> Hi Ilya,
>
> If you have Spring in your classpath you may look at Spring Batch.
>
> For our projects we've built something similar -- a custom jobs framework
> on top of PostgreSQL.
>
> The idea is that there is a coordinator service (a Tapestry service)
> that runs in a thread pool and constantly polls special DB tables for
> new records. For every new unit of work it creates an instance of a
> worker (using `ObjectLocator.autobuild()`) that's capable of processing
> the job.
>
> The polling can be optimised for performance using row-level locks and
> DB indexing.
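>
> In sketch form the loop is something like this (JobRecord and Worker
> are illustrative names, not our real API):
>
>     // invoked periodically from the coordinator's thread pool
>     private void pollOnce() {
>         // fetchNewJobs() does the SELECT ... FOR UPDATE SKIP LOCKED
>         for (JobRecord job : fetchNewJobs()) {
>             // fresh worker per unit of work, with services injected
>             Worker worker = objectLocator.autobuild(Worker.class);
>             worker.execute(job);
>         }
>     }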
>
> The coordinator runs in the same JVM as the rest of the app, so there's
> no dedicated process.
> It integrates with Tapestry's EntityManager so that you can create a
> job within a transaction.
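>
> For example (the Job entity here is illustrative):
>
>     entityManager.persist(new Job("send-report", payload));
>     // the job row commits or rolls back with the rest of the unit of work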
>
> When running in a cluster every JVM has its own coordinator -- this is
> how the jobs get distributed.
>
> But you're saying that row-level locking doesn't work for some of your
> use cases -- can you be more concrete here?
>
>
> On Tue, Jun 27, 2017 at 9:35 PM, Ilya Obshadko <ilya.obsha...@gmail.com>
> wrote:
>
> > I’ve recently expanded my Tapestry application to run on multiple
> > hosts. While it’s quite OK for the web-facing part (a sticky load
> > balancer does most of the job), it’s not very straightforward with
> > background jobs.
> >
> > Some of them can be quite easily distributed using database row-level
> > locks, but this doesn’t work for every use case I have.
> >
> > Are there any suggestions about this? I’d prefer not to have a
> > dedicated process running background tasks. Ideally, I want to
> > dynamically distribute background jobs between hosts in the cluster,
> > based on current load.
> >
> >
> > --
> > Ilya Obshadko
> >
>
>
>
> --
> Dmitry Gusev
>
> AnjLab Team
> http://anjlab.com
>



-- 
Ilya Obshadko
