Hi,

Many thanks for initiating the discussion!

Also +1 to dropping the current DataSet-based Gelly library so that we can
finally drop the legacy DataSet API.

As for whether to keep the graph computing capability: from my side, graph
query / graph computing, and chaining them with the preprocessing pipeline, is
a real, existing requirement.
We also already have the basis for a graph computing library on the DataStream
API with the new iteration library [1], so it is already feasible to build a
stream / batch unified graph computing library on top of the DataStream API.
And it would indeed be most suitable as a separate ecosystem project.

Best,
Yun

[1] https://cwiki.apache.org/confluence/x/hAEBCw


 ------------------Original Mail ------------------
Sender:Martijn Visser <mart...@ververica.com>
Send Date:Wed Jan 5 02:58:53 2022
Recipients:Zhipeng Zhang <zhangzhipe...@gmail.com>
CC:David Anderson <dander...@apache.org>, Till Rohrmann <trohrm...@apache.org>, 
dev <d...@flink.apache.org>, User <user@flink.apache.org>
Subject:Re: [DISCUSS] Drop Gelly

Hi Zhipeng,

Given that we're seeing more code being externalised, for example with the
Flink Remote Shuffle service [1] and the ongoing discussion on the external
connector repository [2], I think it makes sense to go for your second option.
Maybe it fits under Flink Extended [3].

The main question becomes who can contribute and maintain this library. Another 
(intermediate) solution might also be to find someone who can migrate/move the 
current Gelly codebase to use Flink's DataStream API in batch mode, so it 
wouldn't be using the DataSet API anymore. This has recently also happened with 
the State Processor API [4]. 

Best regards,

Martijn

[1] https://github.com/flink-extended/flink-remote-shuffle
[2] https://lists.apache.org/thread/bywh947r2f5hfocxq598zhyh06zhksrm
[3] https://github.com/flink-extended/
[4] https://issues.apache.org/jira/browse/FLINK-24912
On Tue, 4 Jan 2022 at 14:01, Zhipeng Zhang <zhangzhipe...@gmail.com> wrote:

Hi Martijn,

Thanks for the feedback. I am not proposing to bundle the new graph library
with Alink. I am +1 for dropping the DataSet-based Gelly library, but we
probably need a new graph library in Flink to support a possible migration.

We haven't decided what to do yet and probably need more discussion. There are 
some possible solutions:
1. We include a new DataStream-based graph library in FlinkML [1], given that
graphs and machine learning algorithms are often used together [2][3][4].
To achieve this, we could reuse the `AlgoOperator` interface in FlinkML.
2. We include a new DataStream-based graph library as a separate module/repo. 
This is consistent with existing libraries like Spark [5].

What do you think?


[1] https://github.com/apache/flink-ml
[2] https://arxiv.org/abs/1403.6652
[3] https://arxiv.org/abs/1503.03578
[4] https://github.com/apache/spark

Best,
Zhipeng
On Tue, 4 Jan 2022 at 15:27, Martijn Visser <mart...@ververica.com> wrote:

Hi Zhipeng,

Good that you've reached out; I wasn't aware that Gelly is being used in Alink.
Are you proposing to write a new graph library as a successor to Gelly and
bundle that with Alink?

Best regards,

Martijn
On Tue, 4 Jan 2022 at 02:57, Zhipeng Zhang <zhangzhipe...@gmail.com> wrote:

Hi everyone,

Thanks for starting the discussion :)

We (the Alink team [1]) are actually using part of the Gelly library to support
graph algorithms (connected components, single source shortest path, etc.) for
users at Alibaba Inc.

As the DataSet API is going to be dropped, shall we also provide a new graph
library based on the DataStream runtime (similar to what we did for machine
learning)?

[1] https://github.com/Alibaba/alink
On Tue, 4 Jan 2022 at 00:01, David Anderson <dander...@apache.org> wrote:

Most of the inquiries I've had about Gelly in recent memory have been from 
folks looking for a streaming solution, and it's only been a handful. 

+1 for dropping Gelly

David
On Mon, Jan 3, 2022 at 2:41 PM Till Rohrmann <trohrm...@apache.org> wrote:

I haven't seen any changes or requests to/for Gelly in ages. Hence, I would 
assume that it is not really used and can be removed.

+1 for dropping Gelly.

Cheers,
Till
On Mon, Jan 3, 2022 at 2:20 PM Martijn Visser <mart...@ververica.com> wrote:

Hi everyone,

Flink is bundled with Gelly, a Graph API library [1]. This has been marked as 
approaching end-of-life for quite some time [2].

Gelly is built on top of Flink's DataSet API, which is deprecated and slowly 
being phased out [3]. It only works on batch jobs. Based on the activity in the 
Dev and User mailing lists, I don't see a lot of questions popping up regarding 
the usage of Gelly. Removing Gelly would reduce CI time and resources because 
we won't need to run tests for this anymore. 

I'm cross-posting this to the User mailing list to see if there are any users 
of Gelly at the moment. 

Let me know your thoughts.

Martijn Visser | Product Manager
mart...@ververica.com

[1] https://nightlies.apache.org/flink/flink-docs-stable/docs/libs/gelly/overview/
[2] https://flink.apache.org/roadmap.html
[3] https://lists.apache.org/thread/b2y3xx3thbcbtzdphoct5wvzwogs9sqz





-- 
best,
Zhipeng
