Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-28 Thread Jing Zhang
Hi Lincoln, Thanks a lot for the feedback. > Regarding the hint name ‘USE_HASH’, could we consider more candidates? Things are a little different from RDBMS in the distributed world, and we also aim to solve the data skew problem, so all these incoming hint names should be considered together.

[jira] [Created] (FLINK-25474) Idea Scala plugin can not compile RexExplainUtil

2021-12-28 Thread godfrey he (Jira)
godfrey he created FLINK-25474: Summary: Idea Scala plugin can not compile RexExplainUtil Key: FLINK-25474 URL: https://issues.apache.org/jira/browse/FLINK-25474 Project: Flink Issue Type:

Re: [VOTE] Apache Flink ML Release 2.0.0, release candidate #2

2021-12-28 Thread Dong Lin
+1 (non-binding) - Verified that the checksums and GPG files match the corresponding release files - Verified that the source distributions do not contain any binaries - Built the source distribution with Maven to ensure all source files have Apache headers - Verified that all POM files point to

Re: [DISCUSS] FLIP-188: Introduce Built-in Dynamic Table Storage

2021-12-28 Thread Jingsong Li
Thanks Till for your suggestions. Personally, I like flink-warehouse, this is what we want to convey to the user, but it indicates a bit too much scope. How about just calling it flink-store? Simply to convey an impression: this is flink's store project, providing a built-in store for the flink

Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-28 Thread Jing Zhang
Hi Yuan and Lincoln, thanks a lot for the attention. I will answer the emails one by one. To Yuan > How shall we deal with CDC data? If there is CDC data in the pipeline, IMHO, shuffle by join key will cause CDC data disorder. Will it be better to use primary key in this case? Good question. The

[jira] [Created] (FLINK-25473) Azure pipeline failed due to stopped hearing from agent Azure Pipelines 11

2021-12-28 Thread Yun Gao (Jira)
Yun Gao created FLINK-25473: Summary: Azure pipeline failed due to stopped hearing from agent Azure Pipelines 11 Key: FLINK-25473 URL: https://issues.apache.org/jira/browse/FLINK-25473 Project: Flink

[jira] [Created] (FLINK-25472) Update to Log4j 2.17.1

2021-12-28 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-25472: Summary: Update to Log4j 2.17.1 Key: FLINK-25472 URL: https://issues.apache.org/jira/browse/FLINK-25472 Project: Flink Issue Type: Technical Debt

[jira] [Created] (FLINK-25471) wrong result if table toDataStream then keyey by sum

2021-12-28 Thread zhangzh (Jira)
zhangzh created FLINK-25471: Summary: wrong result if table toDataStream then keyey by sum Key: FLINK-25471 URL: https://issues.apache.org/jira/browse/FLINK-25471 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-25470) Add/Expose/differentiate metrics of checkpoint size between changelog size vs materialization size

2021-12-28 Thread Yuan Mei (Jira)
Yuan Mei created FLINK-25470: Summary: Add/Expose/differentiate metrics of checkpoint size between changelog size vs materialization size Key: FLINK-25470 URL: https://issues.apache.org/jira/browse/FLINK-25470

Re: [DISCUSS] FLIP-203: Incremental savepoints

2021-12-28 Thread Yu Li
Thanks for the proposal Piotr! Overall I'm +1 for the idea, and below are my two cents: 1. How about adding a "Term Definition" section and clarify what "native format" (the "native" data persistence format of the current state backend) and "canonical format" (the "uniform" format that supports

[DISCUSS] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2021-12-28 Thread Xingbo Huang
Hi everyone, I would like to start a discussion thread on "Support PyFlink Runtime Execution in Thread Mode". We have provided the PyFlink Runtime framework to support Python user-defined functions since Flink 1.10. The PyFlink Runtime framework is called Process Mode, which depends on an

Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-28 Thread Lincoln Lee
Hi Jing, Thanks for bringing up this discussion! I agree that these join hints should benefit both bounded and unbounded cases, as Martijn mentioned. I also agree that implementing the query hint is the right way for a more general purpose, since the dynamic table options have a limited scope.

Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-28 Thread zst...@163.com
Hi Jing, Thanks very much for your FLIP. I have some points: - How shall we deal with CDC data? If there is CDC data in the pipeline, IMHO, shuffle by join key will cause CDC data disorder. Will it be better to use primary key in this case? - If the shuffle keys can be customized when

Re: [DISCUSS] FLIP-200: Support Multiple Rule and Dynamic Rule Changing (Flink CEP)

2021-12-28 Thread Nicholas Jiang
Hi David, Thanks for your feedback on the FLIP. I have addressed the comments above and share my thoughts on the question mentioned: *About how this will work with the OperatorCoordinator for re-processing of the historical data using the OperatorCoordinator?* OperatorCoordinator will

[jira] [Created] (FLINK-25469) FlinkKafkaProducerITCase.testScaleUpAfterScalingDown fails on AZP

2021-12-28 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-25469: Summary: FlinkKafkaProducerITCase.testScaleUpAfterScalingDown fails on AZP Key: FLINK-25469 URL: https://issues.apache.org/jira/browse/FLINK-25469 Project: Flink

Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-28 Thread Jing Zhang
Hi Martijn, Thanks a lot for your attention. I'm sorry I didn't explain the motivation clearly. I would like to explain it in detail, and then respond to your questions. A lookup join is typically used to enrich a table with data that is queried from an external system. Many Lookup table

[jira] [Created] (FLINK-25468) Local recovery fails if local state storage and RocksDB working directory are not on the same volume

2021-12-28 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-25468: Summary: Local recovery fails if local state storage and RocksDB working directory are not on the same volume Key: FLINK-25468 URL:

[DISCUSS] FLIP-201: Persist local state in working directory

2021-12-28 Thread Till Rohrmann
Hi everyone, I would like to start a discussion about using the working directory to persist local state for faster recovery (FLIP-201) [1]. Persisting the local state will be beneficial if a crashed process is restarted with the same working directory. In this case, Flink does not have to

Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-28 Thread Martijn Visser
Hi Jing, Thanks a lot for the explanation and the FLIP. I definitely learned something when reading more about `use_hash`. My interpretation would be that the primary benefit of a hash lookup join would be improved performance by allowing the user to explicitly optimise the planner. I have a

[VOTE] Apache Flink ML Release 2.0.0, release candidate #2

2021-12-28 Thread Yun Gao
Hi everyone, Please review and vote on the release candidate #2 for the version 2.0.0 of Apache Flink ML, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) **Testing Guideline** You can find here [1] a page in the project wiki on

[jira] [Created] (FLINK-25467) Job failed during initialization of JobManager

2021-12-28 Thread tim.yuan (Jira)
tim.yuan created FLINK-25467: Summary: Job failed during initialization of JobManager Key: FLINK-25467 URL: https://issues.apache.org/jira/browse/FLINK-25467 Project: Flink Issue Type: Bug

Re: Re: [VOTE] Apache Flink ML Release 2.0.0, release candidate #1

2021-12-28 Thread Yun Gao
Ah, sorry, I used the wrong key... I'll cancel this candidate and initiate a new release candidate... --Original Mail -- Sender:Dong Lin Send Date:Tue Dec 28 16:12:29 2021 Recipients:Yun Gao CC:dev Subject:Re: [VOTE] Apache Flink ML Release 2.0.0,

Re: [DISCUSS] FLIP-188: Introduce Built-in Dynamic Table Storage

2021-12-28 Thread Till Rohrmann
Hi Jingsong, I think that developing flink-dynamic-storage as a separate sub project is a very good idea since it allows us to move a lot faster and decouple releases from Flink. Hence big +1. Do we want to name it flink-dynamic-storage or shall we use a more descriptive name? dynamic-storage

Re: [DISCUSS] FLIP-188: Introduce Built-in Dynamic Table Storage

2021-12-28 Thread Martijn Visser
Hi Jingsong, That sounds promising! +1 from my side to continue development under flink-dynamic-storage as a Flink subproject. I think having a more in-depth interface will benefit everyone. Best regards, Martijn On Tue, 28 Dec 2021 at 04:23, Jingsong Li wrote: > Hi all, > > After some

Re: [VOTE] Apache Flink ML Release 2.0.0, release candidate #1

2021-12-28 Thread Dong Lin
Thank you Yun for releasing Flink ML! I downloaded the artifacts from [1], installed the keys from [2] and then tried "gpg --verify apache-flink-ml-2.0.0.tar.gz.asc apache-flink-ml-2.0.0.tar.gz". It seems that the signature could not be verified due to "No public key". [1]