Hi all!
I've been doing a couple of little PRs here and there on the Airflow project,
and I'd like to propose some slightly bigger changes.
The summary is that I would like to propose using Swagger to define the Airflow
API and as a wrapper for the endpoint handlers, passing off most of the he
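To illustrate the spec-first idea, here is a toy, stdlib-only Python sketch. It is not Swagger tooling such as connexion and not an existing Airflow API. A Swagger/OpenAPI-shaped `paths` mapping names an `operationId` for each method and path, and a small dispatcher resolves that id to a plain handler function. The paths, operation ids, and handlers are all hypothetical.

```python
import re

# Hypothetical spec fragment in OpenAPI (Swagger) shape: path -> method -> operationId.
SPEC = {
    "paths": {
        "/dags": {"get": {"operationId": "list_dags"}},
        "/dags/{dag_id}/dag_runs": {"post": {"operationId": "trigger_dag"}},
    }
}


# Hypothetical handlers: plain functions that know nothing about HTTP.
def list_dags():
    return {"dags": ["example_dag"]}


def trigger_dag(dag_id):
    return {"dag_id": dag_id, "state": "running"}


HANDLERS = {"list_dags": list_dags, "trigger_dag": trigger_dag}


def _template_to_regex(template):
    # Turn "/dags/{dag_id}" into "^/dags/(?P<dag_id>[^/]+)$".
    # Assumes templates contain no other regex metacharacters.
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")


def dispatch(method, path):
    """Find the operation the spec declares for (method, path) and call its handler."""
    for template, operations in SPEC["paths"].items():
        match = _template_to_regex(template).match(path)
        if match and method.lower() in operations:
            op_id = operations[method.lower()]["operationId"]
            # Path parameters from the template become keyword arguments.
            return HANDLERS[op_id](**match.groupdict())
    raise LookupError(f"no operation for {method} {path}")
```

The point of the design is that routing and parameter extraction live in the spec, so the handlers stay small and testable on their own.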
@Max
What I've been thinking about recently is creating an abstraction for the
serialization process. In general, I think it makes sense, e.g. for
dynamic DAGs, to have a service that periodically serializes DAGs and
uploads them to, e.g., a database via some new Airflow DAG Uploader Service.
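A minimal sketch of the serialize-and-upload step, assuming a DAG can be reduced to an id plus a task dependency map. `SimpleDag`, the `dag_store` table, and the helper functions are hypothetical stand-ins, not Airflow classes, and SQLite stands in for "a database".

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field


@dataclass
class SimpleDag:
    """Hypothetical stand-in for a DAG: an id plus task dependencies."""
    dag_id: str
    # Maps each task id to the list of task ids it depends on.
    upstream: dict = field(default_factory=dict)


def serialize(dag):
    # sort_keys makes the output stable, so an unchanged DAG yields identical JSON.
    return json.dumps(asdict(dag), sort_keys=True)


def deserialize(payload):
    return SimpleDag(**json.loads(payload))


def upload(conn, dag):
    """Insert or overwrite the serialized DAG in the hypothetical dag_store table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dag_store (dag_id TEXT PRIMARY KEY, data TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO dag_store (dag_id, data) VALUES (?, ?)",
        (dag.dag_id, serialize(dag)),
    )
    conn.commit()


def load(conn, dag_id):
    """Return the stored DAG, or None if it was never uploaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dag_store (dag_id TEXT PRIMARY KEY, data TEXT)"
    )
    row = conn.execute(
        "SELECT data FROM dag_store WHERE dag_id = ?", (dag_id,)
    ).fetchone()
    return deserialize(row[0]) if row else None
```

An uploader service would call `upload` for every parsed DAG on a timer. Using JSON rather than pickle keeps the stored payload independent of the Python code that produced it, and avoids unpickling data read back from a shared database.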
Th
In my experience, there are two major wins to chase here. Neither is
simple, nor is this the first discussion around them. In the past there was
an attempt to use pickling to handle these challenges.
The first is that with dynamic DAGs (they are evaluated as Python code
after all), it is possible
The Airflow CLI can function as this REST client and does not need to be on the
same server where Airflow is running.
Direct DB access is bad from a separation-of-concerns perspective, as you can
change task statuses, insert arbitrary things, etc.
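A minimal sketch of what a CLI command could do instead of touching the DB: build an HTTP request against the server's REST API using only the standard library. The `/dags/<dag_id>/dag_runs` path, the `{"conf": ...}` body, and the function name are hypothetical, not Airflow's actual API.

```python
import json
import urllib.parse
import urllib.request


def build_trigger_request(base_url, dag_id, conf=None):
    """Build (but do not send) a POST asking the server to start a DAG run.

    The endpoint path and JSON body shape are hypothetical.
    """
    # Quote the DAG id so unusual characters cannot break the URL path.
    safe_id = urllib.parse.quote(dag_id, safe="")
    url = f"{base_url.rstrip('/')}/dags/{safe_id}/dag_runs"
    body = json.dumps({"conf": conf or {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A CLI subcommand would then send it with `urllib.request.urlopen(req)`, so the client only needs network access to the webserver, not database credentials, and every state change goes through the same validation the API applies.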
B.
> On 1 Feb 2019, at
Hi,
I am looking for a way to replace Apache Sqoop.
I am analyzing SparkJDBCOperator, but I don't understand how I have to use it.
Is it a version of the SparkSubmit operator that includes a JDBC
connection?
Do I need to include Spark code?
Is there an example?
Thanks, I am very lost.
Regards,
Iván Robl
Actually the opposite. If we're going to have a REST API, users should
interact with it over HTTP(S), using a REST client. If a user has SSH
access to the server running Airflow, I don't see the security concern with
having the CLI access the DB in the same manner the REST API does.
In my diagram all
The rescheduling sensors are available in Airflow 1.10.2 and can be used by
setting the argument mode="reschedule" on your sensor.
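To show why reschedule mode helps with thousands of long waits, here is a toy cost model in plain Python (not Airflow code). It counts how many seconds one sensor occupies a worker slot. In poke mode the task sleeps `poke_interval` between checks while still holding its slot; in reschedule mode the slot is released between checks. The parameter names echo the sensor's `poke_interval`, but `ready_at` and `check_seconds` are hypothetical inputs, and scheduler latency between reschedules is ignored.

```python
import math


def worker_slot_seconds(mode, ready_at, poke_interval, check_seconds):
    """Toy model: seconds a worker slot is occupied while one sensor waits.

    ready_at: time (s) at which the sensed condition becomes true.
    poke_interval: wait (s) between the end of one check and the next.
    check_seconds: duration (s) of a single check.
    """
    period = poke_interval + check_seconds
    # Check k starts at k * period; the first check starting at or after
    # ready_at succeeds, so checks 0..ceil(ready_at / period) are run.
    checks = math.ceil(ready_at / period) + 1
    if mode == "poke":
        # Slot held from the first check until the successful one finishes.
        return (checks - 1) * period + check_seconds
    if mode == "reschedule":
        # Slot held only while a check is actually running.
        return checks * check_seconds
    raise ValueError(f"unknown mode: {mode!r}")
```

Under this model, a sensor waiting one hour with `poke_interval=60` and one-second checks holds a slot for 3,661 s in poke mode versus 61 s in reschedule mode. With 1,000 such sensors that is roughly 1,017 versus 17 slot-hours.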
Cheers, Bas
> On 1 Feb 2019, at 10:57, raman gupta wrote:
Thanks Fokko,
We are exploring the Kubernetes executor, but the number of such long-running jobs would
be in the 1000s, so having some non-blocking mechanism would help.
Rescheduling in sensors sounds good; we will explore it. Is it available in
Airflow 1.10.1?
Thanks,
Raman Gupta
On Fri, Feb 1, 2019 at 2:57 PM Dr
Hi Raman,
Right now this is the way to go.
Recently there has been a change to sensors so that they can be
rescheduled instead of blocking. This is something that you might want
to explore. Otherwise, you might want to choose a more scalable executor
such as the Celery or Kubernetes execut