Re: Evenly Spreading Out Source Tasks

2021-03-14 Thread Xintong Song
Hi Aeden, IIUC, the topic being read having 36 partitions means that your source task has a parallelism of 36. What's the parallelism of the other tasks? Is the job making use of all 72 (18 TMs * 4 slots/TM) slots? I'm afraid there's currently no good way to guarantee subtasks of a task are spread
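
One knob that is often suggested in this situation (though it balances slot allocations across TaskManagers rather than guaranteeing one source subtask per TM) is the cluster.evenly-spread-out-slots option; a minimal flink-conf.yaml sketch:

    # flink-conf.yaml
    # Ask the scheduler to spread slot allocations evenly across all
    # registered TaskManagers instead of filling one TM before the next.
    cluster.evenly-spread-out-slots: true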

Got “pyflink.util.exceptions.TableException: findAndCreateTableSource failed.” when running PyFlink example

2021-03-14 Thread Yik San Chan
(The question is cross-posted on StackOverflow https://stackoverflow.com/questions/66632765/got-pyflink-util-exceptions-tableexception-findandcreatetablesource-failed-w ) I am running the PyFlink program below (copied from
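
For context, findAndCreateTableSource failures in PyFlink 1.12 are commonly caused by running DDL-defined connectors on the legacy planner, or by a missing connector/format dependency; a minimal sketch (table name, path, and format are placeholders, not taken from the thread) that pins the blink planner explicitly:

    from pyflink.table import EnvironmentSettings, BatchTableEnvironment

    # Explicitly select the blink planner, which supports the
    # 'connector' = '...' style DDL used below.
    env_settings = (EnvironmentSettings.new_instance()
                    .in_batch_mode()
                    .use_blink_planner()
                    .build())
    t_env = BatchTableEnvironment.create(environment_settings=env_settings)

    # Hypothetical source table for illustration.
    t_env.execute_sql("""
        CREATE TABLE my_source (
            word STRING
        ) WITH (
            'connector' = 'filesystem',
            'path' = '/tmp/input',
            'format' = 'csv'
        )
    """)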

Re: 1.12 yarn-per-job job submission fails

2021-03-14 Thread smq
Thanks for the explanation. -- Original message -- From: Paul Lam https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/config.html#yarn-application-queue

Re: 1.12 yarn-per-job job submission fails

2021-03-14 Thread smq
Thanks for the answer. -- Original message -- From: Paul Lam https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/config.html#yarn-application-queue

Re: 1.12 yarn-per-job job submission fails

2021-03-14 Thread Paul Lam
Starting from Flink 1.12, YARN-related parameters such as -yqu have been removed; you can use [1] instead. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/config.html#yarn-application-queue Best, Paul Lam >
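
In practice that means passing the queue as a configuration option on the command line; a sketch (queue name and jar path are placeholders):

    # Flink 1.12: -yqu is gone; set the queue via yarn.application.queue instead
    flink run -t yarn-per-job -Dyarn.application.queue=my_queue ./my-job.jar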

Why does Upsert Kafka require the format to be INSERT-ONLY?

2021-03-14 Thread 刘首维
Hi all, I have recently been testing Upsert Kafka, and during verification I found that validation requires the format's changelog-mode to be insert-only. What is the reason for this? If I have it wrong, please correct me directly, thanks. Flink version 1.12.1
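
For reference, a typical upsert-kafka DDL from the 1.12 docs looks like the sketch below (table, topic, and columns are placeholders); the connector itself derives updates and deletes from the Kafka record key and tombstones, which is why the key/value formats it wraps (json, csv, avro, ...) are plain insert-only formats:

    CREATE TABLE user_scores (
        user_id BIGINT,
        score DOUBLE,
        PRIMARY KEY (user_id) NOT ENFORCED
    ) WITH (
        'connector' = 'upsert-kafka',
        'topic' = 'scores',
        'properties.bootstrap.servers' = 'localhost:9092',
        'key.format' = 'json',
        'value.format' = 'json'
    );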

Re: Python DataStream API Questions -- Java/Scala Interoperability?

2021-03-14 Thread Shuiqiang Chen
Hi Kevin, Sorry for the late reply. Actually, you are able to pass arguments to the constructor of the Java object when instantiating it in Python. Basic data types (char/boolean/int/long/float/double/string, etc.) can be passed directly, while complex types (array/list/map/POJO, etc.) must be converted
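
A minimal sketch of what passing basic constructor arguments through the py4j gateway can look like (com.example.MyJavaThing is a hypothetical class used only for illustration):

    from pyflink.java_gateway import get_gateway

    gateway = get_gateway()
    # Basic types (str, int, float, bool, ...) are passed straight through
    # py4j to the Java constructor; complex types need explicit conversion.
    j_obj = gateway.jvm.com.example.MyJavaThing("some-name", 42)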

Implementing a global row_number() ranking in Flink SQL

2021-03-14 Thread Tian Hengyu
While building a real-time data warehouse, we need to implement a global row_number() with flink sql. Could anyone suggest an approach? The current idea is to apply row_number processing to the stream and store the result in hbase, then join every incoming stream record with hbase, apply row_number again, and write the latest result back to hbase, i.e. implement a global row_number() through real-time reads and writes against hbase. Is this approach feasible? Will joining against hbase in real time and then writing the latest data back cause performance problems? Can it meet the real-time requirement? -- Sent from: http://apache-flink.147419.n8.nabble.com/
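
It may also be worth comparing that design against the blink planner's built-in Top-N pattern, which keeps the ranking state inside Flink; a sketch with hypothetical table and column names:

    SELECT id, ts, rn
    FROM (
        SELECT id, ts,
               -- no PARTITION BY clause => a single global ranking
               ROW_NUMBER() OVER (ORDER BY ts DESC) AS rn
        FROM events
    )
    WHERE rn <= 100;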

[Thread title garbled by an encoding error; it appears to concern Flink, Kafka, and MySQL]

2021-03-14 Thread Asahi Lee
[Message body garbled by the same encoding error; only the word "flink" survives]

Can I use PyFlink together with PyTorch/Tensorflow/Scikit-learn

2021-03-14 Thread Yik San Chan
Hi community, I am exploring PyFlink and I wonder if it is possible to use PyFlink together with all these ML libs that ML engineers normally use: PyTorch, Tensorflow, Scikit Learn, Xgboost, LightGBM, etc. According to this SO thread
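
One common pattern, since a Python UDF body is ordinary Python and can import whatever is installed on the workers, is to wrap model inference in a UDF; a minimal sketch where the doubling stub stands in for a real model.predict call:

    from pyflink.table import DataTypes
    from pyflink.table.udf import udf

    @udf(input_types=[DataTypes.FLOAT()], result_type=DataTypes.FLOAT())
    def predict(feature: float) -> float:
        # A real job would load a trained model (scikit-learn, XGBoost, ...)
        # once per worker and call model.predict(feature) here.
        return feature * 2.0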

Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-03-14 Thread Yang Wang
If the HA-related ConfigMaps still exist, then I am afraid the data located on the distributed storage also still exists. So I suggest deleting the HA-related storage as well. Deleting all the HA-related data manually should help in your current situation. After that you could recover from the
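
A sketch of that cleanup, following the selector documented for Flink's Kubernetes HA (<cluster-id> and the storage path are placeholders):

    # Delete the HA ConfigMaps for one cluster-id
    kubectl delete configmaps \
        --selector='app=<cluster-id>,configmap-type=high-availability'
    # ...then remove the HA data on the distributed storage, e.g.
    hadoop fs -rm -r '<high-availability.storageDir>/<cluster-id>'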

Re: Extracting state keys for a very large RocksDB savepoint

2021-03-14 Thread Andrey Bulgakov
If anyone is interested, I realized that the State Processor API was not the right tool for this, since it spends a lot of time rebuilding RocksDB tables and then a lot of memory trying to read from them. All I really needed was the operator keys. So I used SavepointLoader.loadSavepointMetadata to get

1.12 yarn-per-job job submission fails

2021-03-14 Thread smq
When I submit with this command I get "flink Application rejected by queue placement policy", which should mean no queue was specified. But I did add the -yqu parameter to the command, and when I look at the queue in the web UI it is not the one I specified but default. Also, the job runs fine when I submit it with the old command. Has anyone run into this problem?

Re: Question about session_aggregate.merging-window-set.rocksdb_estimate-num-keys

2021-03-14 Thread Vishal Santoshi
All I can think is that any update on a state key, which I do in my ProcessFunction, creates an update (essentially an append in RocksDB) which renders the previous value for the key a tombstone, but that need not be reflected in the count (hence the double or triple counts) atomically, thus the

Re: Question about session_aggregate.merging-window-set.rocksdb_estimate-num-keys

2021-03-14 Thread Vishal Santoshi
The reason I ask is that I have a "Process Window Function" on that Session Window and I keep key-scoped Global State. I maintain a TTL on that state (that is outside the Window state) that is roughly the current WM + lateness. I would imagine that keys for that custom state are *roughly*

Re: Evenly Spreading Out Source Tasks

2021-03-14 Thread Chesnay Schepler
Is this a brand-new job, with the cluster having all 18 TMs at the time of submission? (or did you add more TMs while the job was running) On 3/12/2021 5:47 PM, Aeden Jameson wrote: Hi Matthias, Yes, all the task managers have the same hardware/memory configuration. Aeden On Fri, Mar 12,

Question about session_aggregate.merging-window-set.rocksdb_estimate-num-keys

2021-03-14 Thread Vishal Santoshi
Hey folks, Was looking at this very specific metric "session_aggregate.merging-window-set.rocksdb_estimate-num-keys". Does this metric also represent session windows (it is a session window) that have lateness on them? In essence, if the session window was closed but has a lateness of a

Re: Questions with State Processor Api

2021-03-14 Thread Maminspapin
Please, can someone help me understand whether the State Processor API is a solution for my task or not. I want to add some target user actions to an 'Events' state and remove them when a cancel action is received. Every X period I need to check this state to see if it's time to make some communication with the user. If yes,

Re: Handling Bounded Sources with KafkaSource

2021-03-14 Thread Maciej Obuchowski
Hey Rion, We solved this issue by using the usual, unbounded streams, and the awaitility library to express the conditions that would end the test - for example, having particular data in a table. IMO this type of testing has the advantage that you won't have divergent behavior from production as you

Some questions about using pyflink

2021-03-14 Thread qian he
Hello, our project wants to use flink for distributed computing. It was previously a Python pandas project, and we'd like to try rebuilding it with pyflink. When using DataSet for batch processing, there are no map/reduce functions corresponding to the Java version's, so I have the following questions: 1. Does the Python flink SDK not support DataSet yet? 2. Is there an alternative? 3. If it's not supported yet, is there a planned date for it? 4. Why doesn't flink Table support map/reduce operations? 5. Our project uses dataframes to process data; can that run on flink as a distributed computation? If a dataframe is converted directly to a table, the table doesn't support map
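
On the pandas side of the question, PyFlink 1.12 does ship from_pandas/to_pandas conversions on the Table API; a minimal sketch:

    import pandas as pd
    from pyflink.table import EnvironmentSettings, BatchTableEnvironment

    env_settings = (EnvironmentSettings.new_instance()
                    .in_batch_mode()
                    .use_blink_planner()
                    .build())
    t_env = BatchTableEnvironment.create(environment_settings=env_settings)

    pdf = pd.DataFrame({'name': ['a', 'b'], 'value': [1, 2]})
    table = t_env.from_pandas(pdf)   # pandas DataFrame -> Flink Table
    result = table.to_pandas()       # ...and back after Table operations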