Task Assignment

2020-04-22 Thread Navneeth Krishnan
Hi All, Is there a way for an upstream operator to know how the downstream operator tasks are assigned? Basically I want to group my messages to be processed on slots in the same node based on some key. Thanks

Re: Flink

2020-04-22 Thread Navneeth Krishnan
Thanks a lot Timo. I will take a look at it. But does Flink automatically scale up and down at this point with native integration? Thanks On Tue, Apr 14, 2020 at 11:27 PM Timo Walther wrote: > Hi Navneeth, > > it might also be worth looking into the Ververica Platform for this. The > community

Re: batch range sort support

2020-04-22 Thread Jingsong Li
Hi, Benchao, Glad to see your requirement for range partition. I have a branch to support range partition: [1] Can you describe your scenario in more detail? What sink did you use for your jobs? A simple and complete business scenario? This can help the community judge the importance of the range

batch range sort support

2020-04-22 Thread Benchao Li
Hi, Currently the sort operator in the blink planner is global, which becomes a bottleneck if we sort a lot of data. And I found the 'table.exec.range-sort.enabled' config in BatchExecSortRule, which I found very exciting. After enabling this config, I found that it's not implemented completely yet. This
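The global-sort bottleneck described above is what range partitioning addresses. As a rough illustration (plain Java, not Flink's implementation): sample the keys, derive split boundaries, and route each record to the partition whose range contains its key, so each partition can be sorted locally instead of funnelling all data through one global sort:

```java
import java.util.Arrays;

// A minimal sketch of the idea behind range partitioning; all names here are
// illustrative, this is not Flink code.
public class RangePartitionSketch {
    // Derive (numPartitions - 1) boundaries from a sample of keys.
    static int[] boundaries(int[] sampledKeys, int numPartitions) {
        int[] sorted = sampledKeys.clone();
        Arrays.sort(sorted);
        int[] bounds = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            bounds[i - 1] = sorted[i * sorted.length / numPartitions];
        }
        return bounds;
    }

    // Route a key to the first range whose upper boundary exceeds it.
    static int partitionOf(int key, int[] bounds) {
        for (int i = 0; i < bounds.length; i++) {
            if (key < bounds[i]) {
                return i;
            }
        }
        return bounds.length; // last partition
    }

    public static void main(String[] args) {
        int[] sample = {5, 1, 9, 3, 7, 2, 8, 4, 6, 0};
        int[] bounds = boundaries(sample, 2); // one boundary, two ranges
        System.out.println(partitionOf(1, bounds) + " " + partitionOf(9, bounds));
    }
}
```

Each partition then only needs a local sort of its own key range, which is what makes the overall sort parallelizable.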

Re: About Flink 1.10 Standalone mode job submission

2020-04-22 Thread 宇张
Do you mean that, with the uber jar not placed in lib, the KafkaTableSourceSinkFactory can be loaded through the thread context ClassLoader in the user program (with child-first class loading)? >> Yes On Thu, Apr 23, 2020 at 11:42 AM tison wrote: > > After getting the ClassLoader, check whether the KafkaTableSourceSinkFactory class can be obtained > > Yes, it can > > Do you mean that, with the uber jar not placed in lib, the KafkaTableSourceSinkFactory can be loaded through

Re: About Flink 1.10 Standalone mode job submission

2020-04-22 Thread tison
> After getting the ClassLoader, check whether the KafkaTableSourceSinkFactory class can be obtained > Yes, it can Do you mean that, with the uber jar not placed in lib, the KafkaTableSourceSinkFactory can be loaded through the thread context ClassLoader in the user program (with child-first class loading)? If so, the client's classloading strategy sounds fine, and the problem seems to be the ClassLoader used on the SPI loading side. FileSystem-related resolution had a similar ClassLoader BUG before
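The diagnostic suggested in this thread (check what the thread context ClassLoader can actually see) can be scripted roughly like this. The factory name is just the one under discussion; in a JVM without Flink on the classpath, both checks print false. Note that SPI discovery needs the services file to be visible, not just the class:

```java
// A quick diagnostic sketch: from the thread context ClassLoader, check
// whether a given factory class and its SPI services file are both visible.
public class ClassVisibilityCheck {
    public static void main(String[] args) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        String factory =
            "org.apache.flink.streaming.connectors.kafka.KafkaTableSourceSinkFactory";
        boolean classVisible;
        try {
            // false = do not initialize, we only care about visibility
            Class.forName(factory, false, cl);
            classVisible = true;
        } catch (ClassNotFoundException e) {
            classVisible = false;
        }
        // java.util.ServiceLoader reads this resource to discover factories:
        boolean servicesVisible = cl.getResource(
            "META-INF/services/org.apache.flink.table.factories.TableFactory") != null;
        System.out.println("class=" + classVisible + " services=" + servicesVisible);
    }
}
```

If the class is visible but the services file is not (or lacks the Kafka entry), SPI discovery fails exactly as described in this thread.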

Re: About Flink 1.10 Standalone mode job submission

2020-04-22 Thread 宇张
I tried adding it, but the program still fails to run with the same exception as above. Here is my shade configuration: org.apache.maven.plugins maven-shade-plugin package shade com.akulaku.data.main.StreamMain

Re: About Flink 1.10 Standalone mode job submission

2020-04-22 Thread Jingsong Li
Hi, Flink's connector discovery works through the Java SPI service discovery mechanism, so if the files under your services directory contain nothing Kafka-related, the factory will not be loaded. > And both packaging methods can load the KafkaFactory class file at runtime The class file alone is useless if nothing references it. Could you try the approach in [1]? Add combine.children [1] https://github.com/apache/flink/blob/master/flink-table/flink-table-uber-blink/pom.xml#L104 Best, Jingsong Lee On Thu, Apr
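For reference, a common way to make the shade plugin merge META-INF/services files from all dependencies is the ServicesResourceTransformer. The fragment below is a generic sketch, not taken from the linked pom:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- Merges META-INF/services files from all dependencies so that
               SPI entries such as the Kafka TableFactory survive shading -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Without such a transformer, the services files of multiple jars overwrite each other and entries are silently lost.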

Re: About java.sql.SQLException: No value specified for parameter 1 when writing a RetractStream to MySQL

2020-04-22 Thread 1101300123
Let me give you some data and code; the error is the same as in my real scenario. Order main table: orders, two records at 13:00; order_state is the status, 0 = cancelled, 1 = pending payment {"order_no":"order1","order_state":1,"pay_time":"","create_time":"2020-04-01 13:00:00","update_time":"2020-04-01 13:00:00"} {"order_no":"order2","order_state":1,"pay_time":"","create_time":"2020-04-01

Re: About Flink 1.10 Standalone mode job submission

2020-04-22 Thread 宇张
I tried the shade packaging approach and the job still fails at runtime, with the same error log as with assembly packaging, i.e. the error mentioned above. The META-INF/services/org.apache.flink.table.factories.TableFactory files produced by shade and assembly are identical in content, and both packaging methods can load the KafkaFactory class file at runtime, so it does not look like a packaging problem but more like a bug. Here is my Maven plugin configuration: org.apache.maven.plugins

Re: Flink 1.10.0 stop command

2020-04-22 Thread seeksst
Thanks a lot. I’m glad to hear that and am looking forward to 1.10.1. Is there more of a plan for the stop command? Will it replace cancel in the future? Is state.savepoints.dir required in the end? Original message From: tisonwander4...@gmail.com To: Yun tangmyas...@live.com Cc: seeksstseek...@163.com;

Re: RocksDB default logging configuration

2020-04-22 Thread Bajaj, Abhinav
Bumping this one again to catch some attention. From: "Bajaj, Abhinav" Date: Monday, April 20, 2020 at 3:23 PM To: "user@flink.apache.org" Subject: RocksDB default logging configuration Hi, Some of our teams ran into the disk space issues because of RocksDB default logging configuration -

flink couchbase sink

2020-04-22 Thread 令狐月弦
Greetings, We have been using couchbase for large scale data storage. I saw there is a flink sink for couchbase: https://github.com/isopropylcyanide/flink-couchbase-data-starter But it is for Flink 1.6.0 and has not been updated for some months. Do you know if there is another available flink

Re: Running 2 instances of LocalExecutionEnvironments with same consumer group.id

2020-04-22 Thread Som Lima
I followed the link; maybe the same Suraj is advertising the Databricks webinar going on live right now. On Wed, 22 Apr 2020, 18:38 Gary Yao, wrote: > Hi Suraj, > > This question has been asked before: > > >

Re: Running 2 instances of LocalExecutionEnvironments with same consumer group.id

2020-04-22 Thread Gary Yao
Hi Suraj, This question has been asked before: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-kafka-consumers-don-t-honor-group-id-td21054.html Best, Gary On Wed, Apr 22, 2020 at 6:08 PM Suraj Puvvada wrote: > > Hello, > > I have two JVMs that run

RE: History Server Not Showing Any Jobs - File Not Found?

2020-04-22 Thread Hailu, Andreas
Hi Chesnay, thanks for responding. We're using Flink 1.9.1. I enabled DEBUG level logging and this is something relevant I see: 2020-04-22 13:25:52,566 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream - Connecting to datanode 10.79.252.101:1019 2020-04-22 13:25:52,567

Re: Flink incremental checkpointing - how long does data is kept in the share folder

2020-04-22 Thread Yun Tang
Hi Shachar The basic rules for removing old data: 1. No two running Flink jobs use the same checkpoint directory (at the job-id level). 2. Files that are not recorded in the latest checkpoint metadata are candidates for removal. 3. If the checkpoint directory is still being written by Flink
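Rule 2 above is essentially set arithmetic over file references. A toy illustration in plain Java (not Flink code; Flink actually reference-counts shared state handles rather than computing a set difference):

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Sketch of rule 2: files in the shared folder that are not referenced by any
// retained checkpoint's metadata are the candidates for removal.
public class SharedStateSweep {
    static Set<String> removalCandidates(Set<String> filesOnDisk,
                                         Set<String> referencedByRetainedCheckpoints) {
        Set<String> candidates = new TreeSet<>(filesOnDisk);
        candidates.removeAll(referencedByRetainedCheckpoints);
        return candidates;
    }

    public static void main(String[] args) {
        Set<String> onDisk = new TreeSet<>(Arrays.asList("sst-1", "sst-2", "sst-3"));
        Set<String> referenced = new TreeSet<>(Arrays.asList("sst-2", "sst-3"));
        // sst-1 is no longer referenced by any retained checkpoint:
        System.out.println(removalCandidates(onDisk, referenced));
    }
}
```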

Re: Flink 1.10.0 stop command

2020-04-22 Thread tison
To be precise, the cancel command would succeed on the cluster side, but the response *might* be lost, so that the client throws a TimeoutException. If that is the case, this is the root cause, which will be fixed in 1.10.1. Best, tison. tison wrote on Thu, Apr 23, 2020 at 1:20 AM: > 'flink cancel' broken because of >

Re: Flink 1.10.0 stop command

2020-04-22 Thread tison
'flink cancel' is broken because of https://issues.apache.org/jira/browse/FLINK-16626 Best, tison. Yun Tang wrote on Thu, Apr 23, 2020 at 1:18 AM: > Hi > > I think you could still use ./bin/flink cancel to cancel the job. > What is the exception thrown? > > Best > Yun Tang > -- >

Re: Flink 1.10.0 stop command

2020-04-22 Thread Yun Tang
Hi I think you could still use ./bin/flink cancel to cancel the job. What is the exception thrown? Best Yun Tang From: seeksst Sent: Wednesday, April 22, 2020 18:17 To: user Subject: Flink 1.10.0 stop command Hi, When I test 1.10.0, I found I must

Re: Writing to Elasticsearch fails after midnight every day and Kafka messages pile up

2020-04-22 Thread oliver yunchang
Many thanks to Leonard Xu and zhisheng for the replies. > Was the mapping of the es index set up in advance? Yes, it was; the mapping of the pre-created index is as follows: { "xxx-2020.04.23": { "mappings": { "doc": { "dynamic_templates": [ { "string_fields": { "match": "*", "match_mapping_type": "string",

Re: Changing number of partitions for a topic

2020-04-22 Thread Suraj Puvvada
Thanks, I'll check it out. On Mon, Apr 20, 2020 at 6:23 PM Benchao Li wrote: > Hi Suraj, > > There is a config option[1] to enable partition discovery, which is > disabled by default. > The community discussed to enable it by default[2], but only aims to the > new Source API. > > [1] >
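For reference, the option referred to as [1] is a property set on the Kafka consumer's Properties object; partition discovery stays disabled when the key is absent. A minimal sketch (broker address and group name are placeholders):

```java
import java.util.Properties;

// Sketch of enabling FlinkKafkaConsumer partition discovery via the consumer
// Properties; the value is the discovery interval in milliseconds.
public class PartitionDiscoveryConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "my-group");                // placeholder
        props.setProperty("flink.partition-discovery.interval-millis", "30000");
        System.out.println(props.getProperty("flink.partition-discovery.interval-millis"));
    }
}
```

The same Properties object is then passed to the FlinkKafkaConsumer constructor.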

Re: Two questions about Async

2020-04-22 Thread Gary Yao
> Bytes Sent but Records Sent is always 0 Sounds like a bug. However, I am unable to reproduce this using the AsyncIOExample [1]. Can you provide a minimal working example? > Is there an Async Sink? Or do I just rewrite my Sink as an AsyncFunction followed by a dummy sink? You will have to

Re: About java.sql.SQLException: No value specified for parameter 1 when writing a RetractStream to MySQL

2020-04-22 Thread Leonard Xu
Great detailed analysis! I could not reproduce the problem you described, and the last step of your analysis seems slightly off: I looked at the MySQL JDBC implementation, com/mysql/jdbc/PreparedStatement.java#executeBatchInternal(), and line 1289 checks the size of the batchedArgs array and returns directly, so it should not execute. You can debug further to confirm: ``` if (this.batchedArgs == null || this.batchedArgs.size() == 0) { return new long[0]; } ``` Best,

Running 2 instances of LocalExecutionEnvironments with same consumer group.id

2020-04-22 Thread Suraj Puvvada
Hello, I have two JVMs that run LocalExecutionEnvironments, each using the same consumer group.id. I noticed that the consumers in each instance have all partitions assigned. I was expecting that the partitions would be split across consumers across the two JVMs. Any help on what might be happening
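For context on this behavior: Flink's Kafka consumer assigns partitions to its own parallel subtasks statically rather than joining Kafka's consumer-group rebalancing, so two independent jobs each claim every partition even when they share a group.id. A simplified sketch of such static assignment (illustrative only, not Flink's exact code):

```java
import java.util.ArrayList;
import java.util.List;

// Each parallel subtask independently computes which partitions it owns,
// without asking the Kafka group coordinator.
public class StaticAssignmentSketch {
    static List<Integer> partitionsFor(int subtaskIndex, int parallelism, int numPartitions) {
        List<Integer> mine = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            if (p % parallelism == subtaskIndex) {
                mine.add(p);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        // A single job with parallelism 1 claims all 4 partitions; a second,
        // independent job would compute exactly the same set.
        System.out.println(partitionsFor(0, 1, 4));
    }
}
```

This is why the group.id only affects offset-committing back to Kafka, not partition ownership across separate Flink jobs.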

Re: About java.sql.SQLException: No value specified for parameter 1 when writing a RetractStream to MySQL

2020-04-22 Thread 1101300123
OK, I will try the change first and file a JIRA afterwards. On Apr 22, 2020 at 22:38, Jingsong Li wrote: Hi, - JDBC is an upsert sink, so you need toUpsertStream rather than toRetractStream; I recommend using a complete DDL to insert into the MySQL table. - This exception looks like a JDBC bug; could you file a JIRA to track it? Best, Jingsong Lee On Wed, Apr 22, 2020 at 9:58 PM 1101300123 wrote: I got a No value specified error when writing the SQL join result to MySQL

Re: Using Stateful Functions within a Flink pipeline

2020-04-22 Thread Oytun Tez
@Igal, this sounds more comprehensive (better) than just opening DataStreams: "basically exposing the core Flink job that is the heart of stateful functions. " Great! -- [image: MotaWord] Oytun Tez M O T A W O R D | CTO & Co-Founder oy...@motaword.com

Re: Using Stateful Functions within a Flink pipeline

2020-04-22 Thread Igal Shilman
Hi Annemarie, There are plans to make stateful functions more easily embeddable within a Flink job, perhaps skipping the ingress/egress routing abstraction altogether and basically exposing the core Flink job that is the heart of stateful functions. Although these plans are not concrete yet I

Re: About java.sql.SQLException: No value specified for parameter 1 when writing a RetractStream to MySQL

2020-04-22 Thread Jingsong Li
Hi, - JDBC is an upsert sink, so you need toUpsertStream rather than toRetractStream; I recommend using a complete DDL to insert into the MySQL table. - This exception looks like a JDBC bug; could you file a JIRA to track it? Best, Jingsong Lee On Wed, Apr 22, 2020 at 9:58 PM 1101300123 wrote: > > > I got a No value specified for parameter 1 error when writing the SQL join result to MySQL? > My version is 1.10.0, the code is as follows >

Re: JDBC Table and parameters provider

2020-04-22 Thread Jingsong Li
> Specify "query" and "provider" Yes, your proposal looks reasonable to me. The key can be "scan.***" like in [1]. > specify parameters Maybe we need to add something like "scan.parametervalues.provider.type", which can be "bound, specify, custom": - when bound, using the old partitionLowerBound and

Re: Using Stateful Functions within a Flink pipeline

2020-04-22 Thread Oytun Tez
Hi Annemarie, Unfortunately this is not possible at the moment, but DataStream as in/egress is in the plans as much as I know. -- [image: MotaWord] Oytun Tez M O T A W O R D | CTO & Co-Founder oy...@motaword.com On Wed, Apr 22, 2020 at 10:26 AM

Using Stateful Functions within a Flink pipeline

2020-04-22 Thread Annemarie Burger
I was wondering if it is possible to use a Stateful Function within a Flink pipeline. I know they work with different API's, so I was wondering if it is possible to have a DataStream as ingress for a Stateful Function. Some context: I'm working on a streaming graph analytics system, and want to

About java.sql.SQLException: No value specified for parameter 1 when writing a RetractStream to MySQL

2020-04-22 Thread 1101300123
I got a No value specified for parameter 1 error when writing the SQL join result to MySQL? My version is 1.10.0, the code is as follows: JDBCUpsertTableSink build = JDBCUpsertTableSink.builder() .setTableSchema(results.getSchema()) .setOptions(JDBCOptions.builder() .setDBUrl("MultiQueries=true=true=UTF-8")

Re: About “distribute the function jar I submitted to all taskManager” question

2020-04-22 Thread Yang Wang
I think you could use Flink distributed cache to make the files available on all TaskManagers. For example, *env.registerCachedFile(cacheFilePath, "cacheFile", false);* and then, using the following code to get the registered file in the operator.

Re: How to use OpenTSDB as Source?

2020-04-22 Thread Som Lima
Thanks for the link. On Wed, 22 Apr 2020, 12:19 Jark Wu, wrote: > Hi Som, > > You can have a look at this documentation: > https://ci.apache.org/projects/flink/flink-docs-master/dev/table/common.html#create-a-tableenvironment > It describes how to create different TableEnvironments based > on

Re: Change to StreamingFileSink in Flink 1.10

2020-04-22 Thread Averell
Thanks @Seth Wiesman and all. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Re: About the Subscribe mode of flink-connector-kafka

2020-04-22 Thread dixingxin...@163.com
??kafka balance??PK?? kafka partition??offset??kafka balance dixingxin...@163.com i'mpossible ?? 2020-04-22

Re: About Flink 1.10 Standalone mode job submission

2020-04-22 Thread Jingsong Li
Hi, if org.apache.flink.table.factories.TableFactory does not contain KafkaTableSourceSinkFactory, then the packaging is broken. I am not sure how the 1.9 job managed to run: if none of the jars' META-INF/services files contain KafkaTableSourceSinkFactory, it should not have been able to run either. I recommend packaging with shade; shade merges the META-INF/services files. Best, Jingsong Lee On Wed, Apr 22, 2020 at 7:31 PM 宇张 wrote: > >

Re: About Flink 1.10 Standalone mode job submission

2020-04-22 Thread 宇张
> Also confirm the content of org.apache.flink.table.factories.TableFactory, and whether it contains KafkaTableSourceSinkFactory I checked my previous Flink 1.9 project: its application jar does not contain this class either, yet the program loads it fine at runtime, so by comparison it does not seem to be a Maven packaging problem. On Wed, Apr 22, 2020 at 7:22 PM 宇张 wrote: > >

About the "distribute the function jar I submitted to all taskManager" question

2020-04-22 Thread ??????
I want to package a UDTF into a jar, then load the jar in the main method of the job through dynamic loading and get the UDTF class. But this way, Flink does not automatically distribute the jar to the TaskManagers, so it caused an error. I find that the Flink client provides the -C option. Bringing -C with

Re: About Flink 1.10 Standalone mode job submission

2020-04-22 Thread 宇张
> Also confirm the content of org.apache.flink.table.factories.TableFactory, and whether it contains KafkaTableSourceSinkFactory It does not; it only contains org.apache.flink.formats.json.JsonRowFormatFactory > After getting the ClassLoader, check whether the KafkaTableSourceSinkFactory class can be obtained Yes, it can So it looks like the Maven packaging is the problem: mvn clean package -DskipTests with default dependency scopes On Wed, Apr 22, 2020 at 7:05 PM

Re: How to use OpenTSDB as Source?

2020-04-22 Thread Jark Wu
Hi Som, You can have a look at this documentation: https://ci.apache.org/projects/flink/flink-docs-master/dev/table/common.html#create-a-tableenvironment It describes how to create different TableEnvironments based on EnvironmentSettings. EnvironmentSettings is a setting to set up a table's

Re: About Flink 1.10 Standalone mode job submission

2020-04-22 Thread Jingsong Li
Hi, also confirm the content of org.apache.flink.table.factories.TableFactory, and whether it contains KafkaTableSourceSinkFactory > Does this mean printing the current thread's classloader, Thread.currentThread().getContextClassLoader() Yes; after getting the ClassLoader, check whether the KafkaTableSourceSinkFactory class can be obtained Best, Jingsong Lee On Wed, Apr 22, 2020 at 7:00 PM 宇张 wrote: > Check whether your packaged

Re: JDBC Table and parameters provider

2020-04-22 Thread Jingsong Li
Hi, You can configure table name for JDBC source. So this table name can be a rich sql: "(SELECT public.A.x, public.B.y FROM public.A JOIN public.B on public.A.pk = public.B.fk )" So the final scan query statement will be: "select ... from (SELECT

Re: About Flink 1.10 Standalone mode job submission

2020-04-22 Thread 宇张
Check whether your uber jar contains a file whose content includes: 1. The file META-INF/services/org.apache.flink.table.factories.TableFactory exists and contains org.apache.flink.streaming.connectors.kafka.KafkaTableSourceSinkFactory 2. Flink version 1.10, Standalone mode started with start-cluster.sh, submitted with flink run (/software/flink-1.10.0/bin/flink run -c

Re: Latency tracking together with broadcast state can cause job failure

2020-04-22 Thread Lasse Nedergaard
Hi Yun Thanks for looking into it and forwarding it to the right place. Med venlig hilsen / Best regards Lasse Nedergaard > On Apr 22, 2020 at 11:06, Yun Tang wrote: > > Hi Lasse > > After debugging locally, this should be a bug in Flink (even the latest version). > However, the bug

Re: JDBC Table and parameters provider

2020-04-22 Thread Flavio Pompermaier
Sorry Jingsong, but I didn't understand your reply. Can you explain the following sentences better, please? Probably I miss some Table API background here (I used only JDBCOutputFormat). "We can not use a simple "scan.query.statement", because in JDBCTableSource, it also deals with project pushdown

Flink 1.10.0 stop command

2020-04-22 Thread seeksst
Hi, When I test 1.10.0, I found I must set a savepoint path, otherwise I can't stop the job. I'm confused about this because, as I know, a savepoint is often larger than a checkpoint, so I usually resume a job from a checkpoint. Another problem is sometimes the job throws an exception and I can't trigger a
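For reference, the 1.10 CLI shape being discussed looks roughly like this (job id and path are placeholders; stop takes its target directory from -p or, if omitted, falls back to state.savepoints.dir in flink-conf.yaml):

```shell
# 'stop' always takes a savepoint before stopping the job:
./bin/flink stop -p hdfs:///savepoints <jobId>
# 'cancel' stops the job without taking a savepoint:
./bin/flink cancel <jobId>
```

This is why stop fails when neither -p nor a configured default savepoint directory is available.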

Re: About Flink 1.10 Standalone mode job submission

2020-04-22 Thread Jingsong Li
Hi, first confirm whether your jar contains the META-INF/services files, and whether Kafka is in them. If so, then confirm the thread context ClassLoader at the time TableEnvironmentImpl.sqlQuery is called, because factories are currently discovered through the thread context ClassLoader by default. Best, Jingsong Lee On Wed, Apr 22, 2020 at 5:30 PM 宇张 wrote: > I run Flink jobs in Standalone mode, but the Uber >

Re: JDBC Table and parameters provider

2020-04-22 Thread Jingsong Li
Thanks for the explanation. You can create a JIRA for this. For "SELECT public.A.x, public.B.y FROM public.A JOIN public.B on public.A.pk = public.B.fk ." We can not use a simple "scan.query.statement", because in JDBCTableSource, it also deals with

Re: How to use OpenTSDB as Source?

2020-04-22 Thread Som Lima
For the sake of brevity, the code example does not show the complete code for setting up the environment using the EnvironmentSettings class: EnvironmentSettings settings = EnvironmentSettings.newInstance()... As you can see, comparatively, the same protocol is not followed when showing how to set up the

Re: About Flink 1.10 Standalone mode job submission

2020-04-22 Thread tison
That putting it under lib makes it work does sound like a BUG; could you share your Flink version and the exact startup command? FLINK-13749 may be missing in earlier versions, and Standalone class loading has been changed for per-job mode. Best, tison. tison wrote on Wed, Apr 22, 2020 at 5:48 PM: > Check whether your uber jar contains a file whose content includes > > org.apache.flink.streaming.connectors.kafka.KafkaTableSourceSinkFactory > >

Re: About Flink 1.10 Standalone mode job submission

2020-04-22 Thread tison
Check whether your uber jar contains the file META-INF/services/org.apache.flink.table.factories.TableFactory with content including org.apache.flink.streaming.connectors.kafka.KafkaTableSourceSinkFactory Best, tison. 宇张 wrote on Wed, Apr 22, 2020 at 5:30 PM: > I run Flink jobs in Standalone mode, but the Uber >

Re: Jar conflict problems with Flink 1.10

2020-04-22 Thread 宇张
OK. In upcoming projects I will exclude the transitive dependencies that are not needed, and I will record the jar conflicts I run into. Thanks for the help. On Wed, Apr 22, 2020 at 2:16 PM tison wrote: > Could you show the exact error? Generally, the dependencies Flink itself needs are shaded, and unneeded transitive dependencies should be excluded. Classes exposed as API generally need to be wrapped or to use stable interfaces. > > This may be an engineering problem; could you list the jar conflicts you ran into, and we can look at how to solve them. > > Best, > tison. > > > 宇张 wrote on Wed, Apr 22

About Flink 1.10 Standalone mode job submission

2020-04-22 Thread 宇张
I run Flink jobs in Standalone mode, but the TableSourceFactory inside the uber jar cannot be loaded, even with classloader.resolve-order: child-first set; it is only loaded when placed in the lib directory. I saw that the release notes changed the class loading policy, but I do not know why the factory inside the uber jar cannot be loaded. Flink Client respects Classloading Policy (FLINK-13749 )

Re: JDBC Table and parameters provider

2020-04-22 Thread Flavio Pompermaier
Because in my use case the parallelism was not based on a range of keys/numbers but on a range of dates, I needed a custom parameters provider. As for pushdown, I don't know how Flink/Blink currently works. For example, let's say I have a Postgres catalog containing 2 tables (public.A
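The date-range case mentioned above can be sketched like this (the class and method names are illustrative only, not Flink's ParameterValuesProvider interface, which returns the parameter rows in its own array form):

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Sketch of a date-based parameters provider: each Object[] becomes the bind
// parameters of one parallel "... WHERE day = ?" query split.
public class DateParameters {
    static List<Object[]> forRange(LocalDate startInclusive, LocalDate endExclusive) {
        List<Object[]> params = new ArrayList<>();
        for (LocalDate d = startInclusive; d.isBefore(endExclusive); d = d.plusDays(1)) {
            params.add(new Object[]{d}); // one split per day
        }
        return params;
    }

    public static void main(String[] args) {
        List<Object[]> splits = forRange(LocalDate.of(2020, 4, 20), LocalDate.of(2020, 4, 23));
        System.out.println(splits.size());
    }
}
```

Each parallel input-format instance then executes the query with its own day as the bind parameter, which parallelizes the fetch without numeric key ranges.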

Re: Re: Blink SQL java.lang.ArrayIndexOutOfBoundsException

2020-04-22 Thread Jingsong Li
Hi, Sorry for the mistake, [1] is related, but this bug has been fixed completely only in [2], so the safe versions should be 1.9.3+ and 1.10.1+; there is no safe released version right now. 1.10.1 will be released very soon. [1]https://issues.apache.org/jira/browse/FLINK-13702

Re: About the Subscribe mode of flink-connector-kafka

2020-04-22 Thread i'mpossible
Hi?? ??kafka??Flink??kafka??rebalance??

Re: JDBC Table and parameters provider

2020-04-22 Thread Jingsong Li
Hi, You are right about the lower and upper bounds; they are a must to parallelize the fetch of the data. And filter pushdown is used to filter more data at the JDBC server. Yes, we can provide "scan.query.statement" and "scan.parameter.values.provider.class" for the jdbc connector. But maybe we need to be careful

Re: Latency tracking together with broadcast state can cause job failure

2020-04-22 Thread Yun Tang
Hi Lasse After debugging locally, this should be a bug in Flink (even the latest version). However, the bug should be caused in the network stack, with which I am not very familiar, so it is not so easy to find the root cause directly. After discussion with our network guys in Flink, we decided to first create

Re: Writing to Elasticsearch fails after midnight every day and Kafka messages pile up

2020-04-22 Thread Oliver
The mapping, created via template: "xxx-2020.04.23": { "mappings": { "doc": { "dynamic_templates": [ { "string_fields": { "match": "*", "match_mapping_type": "string", "mapping": { "type": "keyword" } } } ], "properties": {

Re: Re: Blink SQL java.lang.ArrayIndexOutOfBoundsException

2020-04-22 Thread Jingsong Li
Hi, Just like Jark said, it may be FLINK-13702[1]. Has been fixed in 1.9.2 and later versions. > Can it be a thread-safe problem or something else? Yes, it is a thread-safe problem with lazy materialization. [1]https://issues.apache.org/jira/browse/FLINK-13702 Best, Jingsong Lee On Tue, Apr

Re: Writing to Elasticsearch fails after midnight every day and Kafka messages pile up

2020-04-22 Thread zhisheng
hi, was the mapping of the es index set up in advance? The exception: > failed to process cluster event (put-mapping) within 30s looks like automatic mapping creation timed out. Leonard Xu wrote on Wed, Apr 22, 2020 at 4:41 PM: > Hi, > > What is the shard configuration of the pre-created index? What are the cluster's reallocation and relocation settings? > You could start troubleshooting from there. > > Best, > Leonard Xu > > > > > On Apr 22, 2020 at 16:10, Oliver

Re: JDBC Table and parameters provider

2020-04-22 Thread Flavio Pompermaier
Maybe I am wrong, but pushdown support for JDBC is one thing (that is probably useful), while parameters providers are required if you want to parallelize the fetch of the data. You are not mandated to use NumericBetweenParametersProvider; you can use whichever ParametersProvider you prefer, depending on

Re: About StreamingFileSink

2020-04-22 Thread Jingsong Li
Hi, as I understand it, .part-4-13.inprogressx/part-4-14.inprogressx are leftover files: the checkpoint they belong to did not finish, so they will not be read, will not affect the running job, and will not change any further. Best, Jingsong Lee On Tue, Apr 21, 2020 at 4:38 PM Leonard Xu wrote: > Hello, the image is broken; consider using an image host and posting the link to the mailing list. > Also, why not resume the job from the latest checkpoint? Otherwise I would expect dirty data. > > On

Re: Writing to Elasticsearch fails after midnight every day and Kafka messages pile up

2020-04-22 Thread Leonard Xu
Hi, What is the shard configuration of the pre-created index? What are the cluster's reallocation and relocation settings? You could start troubleshooting from there. Best, Leonard Xu > On Apr 22, 2020 at 16:10, Oliver wrote: > > hi, > > I have a job that uses Flink to write Kafka data into ES, a pure ETL process. > The problem is: every day after midnight, writing to ES fails and Kafka messages start to pile up; after restarting the job, the backlog gradually recovers, but without a restart it persists. > > How should I troubleshoot and handle this kind of problem? > > >

Re: How to use OpenTSDB as Source?

2020-04-22 Thread Marta Paes Moreira
Hi, Lucas. There was a lot of refactoring in the Table API / SQL in the last release, so the user experience is not ideal at the moment — sorry for that. You can try using the DDL syntax to create your table, as shown in [1,2]. I'm CC'ing Timo and Jark, who should be able to help you further.

Re: JDBC Table and parameters provider

2020-04-22 Thread Jingsong Li
Hi, Now in JDBCTableSource.getInputFormat, it is written explicitly: WHERE XXX BETWEEN ? AND ?. So we must use `NumericBetweenParametersProvider`. I don't think this is a good long-term solution. I think we should support filter push-down for JDBCTableSource; in this way, we can write the
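The BETWEEN pattern above implies numeric splits roughly like the following sketch (simplified arithmetic mirroring what a NumericBetweenParametersProvider-style provider computes; this is not Flink source):

```java
// Split the inclusive range [min, max] into numSplits contiguous
// [lower, upper] pairs; each pair becomes the two bind parameters of one
// parallel "WHERE col BETWEEN ? AND ?" query.
public class NumericSplits {
    static long[][] splits(long min, long max, int numSplits) {
        long total = max - min + 1;
        long[][] result = new long[numSplits][2];
        long start = min;
        for (int i = 0; i < numSplits; i++) {
            // distribute the remainder over the first splits
            long size = total / numSplits + (i < total % numSplits ? 1 : 0);
            result[i][0] = start;            // lower bound, inclusive
            result[i][1] = start + size - 1; // upper bound, inclusive
            start += size;
        }
        return result;
    }

    public static void main(String[] args) {
        for (long[] s : splits(1, 10, 3)) {
            System.out.println(s[0] + "-" + s[1]);
        }
    }
}
```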

Writing to Elasticsearch fails after midnight every day and Kafka messages pile up

2020-04-22 Thread Oliver
hi, I have a job that uses Flink to write Kafka data into ES, a pure ETL process. The problem is: every day after midnight, writing to ES fails and Kafka messages start to pile up; after restarting the job, the backlog gradually recovers, but without a restart it persists. How should I troubleshoot and handle this kind of problem? Flink version: 1.10, ES version: 6.x

Re: Failed to parse the date type in JSON

2020-04-22 Thread tison
There should be a built-in UDF FROM_UNIXTIME you can use. Best, tison. Leonard Xu wrote on Wed, Apr 22, 2020 at 1:15 PM: > Hi > The error occurs because the 'format.ignore-parse-errors' option is only supported in the latest community version; FLINK-16725 should also be fixed in 1.11. If you need it, you can wait for the 1.11 release or build the master branch yourself. > Even with this option your problem would remain unsolved: in your case every record fails to parse, so all data would be filtered out. >

Re: History Server Not Showing Any Jobs - File Not Found?

2020-04-22 Thread Chesnay Schepler
Which Flink version are you using? Have you checked the history server logs after enabling debug logging? On 21/04/2020 17:16, Hailu, Andreas [Engineering] wrote: Hi, I’m trying to set up the History Server, but none of my applications are showing up in the Web UI. Looking at the console, I

Re: Jar conflict problems with Flink 1.10

2020-04-22 Thread tison
Could you show the exact error? Generally, the dependencies Flink itself needs are shaded, and unneeded transitive dependencies should be excluded. Classes exposed as API generally need to be wrapped or to use stable interfaces. This may be an engineering problem; could you list the jar conflicts you ran into, and we can look at how to solve them. Best, tison. 宇张 wrote on Wed, Apr 22, 2020 at 11:52 AM: > When using Flink 1.10, the most common problem is jar conflicts: flink-parent alone pulls in four versions of okio, and some packages cannot be excluded. Does the community have any proposal for mitigating jar conflicts? >

Re: About calling createAccumulator in a Flink SQL UDAF

2020-04-22 Thread Benchao Li
Hi 首维, I am very glad my answer helped you. 1. I feel your idea is essentially what setting the state backend to filesystem gives you: with the filesystem state backend, state lives in memory, so updating state needs no serialization/deserialization and performance is relatively higher. With rocksdb, state is stored in RocksDB. I understand it is designed this way because some state can be large and may not fit in memory; with rocksdb as the state backend, it does not occupy heap memory. 2. "Destroyed" here just means clear
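The filesystem-vs-rocksdb choice in point 1 is cluster/job configuration; a minimal flink-conf.yaml sketch (paths are placeholders):

```yaml
# Keep working state on the JVM heap (no per-access (de)serialization),
# write checkpoints to a durable filesystem:
state.backend: filesystem
state.checkpoints.dir: hdfs:///flink/checkpoints
# Alternatively, keep state in embedded RocksDB for larger-than-memory state:
# state.backend: rocksdb
```

The trade-off is exactly the one described above: heap state is faster but bounded by memory, while RocksDB state trades (de)serialization cost for capacity.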