Airflow would be a good fit, but you will probably have to modify it to support
stream processing. Any DAG-based workflow manager would be useful in your case.
Luigi works too, but Airflow has a sleeker UI.
You could also try StreamSets. GCP provides Composer (managed Airflow) and
Dataflow for Beam. AWS has Glue.
Apache Airflow is a scheduling system that can help manage data pipelines.
I have seen Airflow used to manage a few thousand Hive/Spark/Presto
pipelines.
-Rui
On Fri, Feb 8, 2019 at 4:08 PM Sridevi Nookala <
snook...@parallelwireless.com> wrote:
> Hi,
>
>
> Our analytics app has many data
Hi,
Our analytics app has many data pipelines, some in Python, some in Java
(using Beam), etc.
Any suggestions for a pipeline manager/scheduler framework that
manages/orchestrates these different pipelines?
thanks
Sri
Hi all!
Scio 0.7.1 has just been released. It includes a few new features and
improvements over 0.7.0:
https://github.com/spotify/scio/releases/tag/v0.7.1
*"Taxidea Taxus"*
Features
- New HashCode-based partitioning method for keyed SCollections (#1654
Hi folks,
I am running the query select * from VIEW_1, then VIEW_2, against a database, and
the next step is to collect the rows and export them to CSV.
I am currently at this point:
PCollection<KV<String, String>> view1 = p.apply(JdbcIO.<KV<String, String>>read() // generics were stripped by the archive; KV<String, String> is assumed
    .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
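The snippet above cuts off before the row-to-CSV step. As a minimal sketch of that
missing piece (the class and method names `CsvFormat`, `escape`, and `toCsvLine` are
hypothetical, not from the thread), each row could be turned into an RFC 4180-style
CSV line inside a MapElements or DoFn before writing the lines out with TextIO.write():

```java
import java.util.List;
import java.util.stream.Collectors;

public class CsvFormat {

    // Quote a single field per RFC 4180: wrap it in double quotes if it
    // contains a comma, quote, or line break, and double any embedded quotes.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String doubled = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + doubled + "\"" : field;
    }

    // Join one row's fields into a single CSV line.
    static String toCsvLine(List<String> fields) {
        return fields.stream()
                .map(CsvFormat::escape)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(List.of("a", "b,c", "say \"hi\"")));
        // a,"b,c","say ""hi"""
    }
}
```

The same helper works for any row shape: map each row to a List of string
fields, call toCsvLine, and write the resulting PCollection<String> with
TextIO.write().to("output.csv").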