Re: EXTERNAL: Re: Should I use a Sink or Connector? Or Both?

2020-03-04 Thread Arvid Heise
Hi Fernando, How much data are you trying to write? If you are only writing single test messages, it could be that the default bulk flush settings are holding them back. If so, could you please adjust the following settings and report back? public enum SinkOption { BULK_FLUSH_MAX_ACTIONS,
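
For reference, lowering the bulk flush thresholds on the sink builder would look roughly like this (a sketch assuming the Elasticsearch 6 connector; the hosts, element type, and sink function are placeholders):

    import org.apache.flink.streaming.connectors.elasticsearch6.ElasticsearchSink;
    import org.apache.http.HttpHost;
    import java.util.Collections;
    import java.util.List;

    // 'stream' and 'sinkFunction' are assumed to be defined elsewhere in the job.
    List<HttpHost> hosts = Collections.singletonList(new HttpHost("localhost", 9200, "http"));
    ElasticsearchSink.Builder<String> builder = new ElasticsearchSink.Builder<>(hosts, sinkFunction);
    builder.setBulkFlushMaxActions(1);   // flush after every single action, so test messages appear immediately
    builder.setBulkFlushInterval(1000L); // additionally flush at least once per second
    stream.addSink(builder.build());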

Re: Teradata as JDBC Connection

2020-03-04 Thread Arvid Heise
Hi Norm, the error message already points to the main issue: your property names are not correct. *Unsupported property keys: drivername update-mode password dburl username* You should use the builder to properly configure the sink [1]. [1]
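
A sketch of that builder route with the flink-jdbc API in 1.10 (the Teradata driver class, URL, and credentials below are untested placeholders):

    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.io.jdbc.JDBCAppendTableSink;

    JDBCAppendTableSink sink = JDBCAppendTableSink.builder()
        .setDrivername("com.teradata.jdbc.TeraDriver")     // placeholder driver class
        .setDBUrl("jdbc:teradata://myhost/database=mydb")  // placeholder URL
        .setUsername("user")
        .setPassword("pass")
        .setQuery("INSERT INTO mytable (id, name) VALUES (?, ?)")
        .setParameterTypes(Types.LONG, Types.STRING)
        .build();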

Re: RocksDB serialization issue

2020-03-04 Thread Arvid Heise
Hi David, the obvious reason is that your state stored an enum value that is no longer present. It tries to deserialize the 512th entry of your enum, which is not available. However, since it's highly unlikely that you actually have that many values in the same enum class, we are actually

Re: How to use a self-defined JSON format when creating a table from a Kafka stream?

2020-03-04 Thread Jark Wu
Hi Lei, Currently, Flink SQL doesn't support registering a binlog format (i.e. just defining "order_id" and "order_no" while the JSON schema has other binlog fields). This is exactly what we want to support in FLIP-105 [1] and FLIP-95. For now, if you want to consume such JSON data, you have to

Re: History server UI not working

2020-03-04 Thread Yang Wang
If all the REST API endpoints can be viewed successfully, then the reason may be a stale JavaScript cache. You could try to force a refresh (e.g. Cmd+Shift+R on Mac). It solved my problem before. Best, Yang pwestermann wrote on Wed, Mar 4, 2020 at 8:40 PM: > We recently upgraded from Flink 1.7 to Flink 1.9.2 and the history server

Re: Hive Source With Kerberos Authentication Issue

2020-03-04 Thread Rui Li
Could you first try the doAs approach, e.g. do the HiveCatalog registration inside UserGroupInformation.getLoginUser().doAs(), to check whether the HiveMetaStoreClient is not picking up your login user's credentials. Also, is your Hive version 2.1.1? The stack trace doesn't match the 2.1.1 code; for example, line 562 of HiveMetaStoreClient.java:
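
A rough sketch of that doAs wrapping (the catalog name and the tableEnv/hiveCatalog variables are placeholders):

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.security.UserGroupInformation;

    // Register the HiveCatalog as the Kerberos login user, so the
    // HiveMetaStoreClient is created with that user's credentials.
    // (doAs declares IOException/InterruptedException.)
    UserGroupInformation.getLoginUser().doAs((PrivilegedExceptionAction<Void>) () -> {
        tableEnv.registerCatalog("myhive", hiveCatalog);
        tableEnv.useCatalog("myhive");
        return null;
    });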

Re: How to test Flink job recovery from checkpoint

2020-03-04 Thread Eleanore Jin
Hi Zhu Zhu and Abhinav, I was able to verify the recovery from checkpoint based on your suggestions, thanks a lot for the help! Eleanore On Wed, Mar 4, 2020 at 5:40 PM Bajaj, Abhinav wrote: > I implemented a custom function that throws a runtime exception. > You can extend the simpler

Re: Question about runtime filter

2020-03-04 Thread Jingsong Li
Great exploration, and thanks for the information. I believe you have a deep understanding of Flink's internal mechanisms. Best, Jingsong Lee On Thu, Mar 5, 2020 at 12:09 PM faaron zheng wrote: > I finally got the runtime filter working in 1.10; the reason it didn't > call the commit method is

Re: Question about runtime filter

2020-03-04 Thread faaron zheng
I finally got the runtime filter working in 1.10. The reason it didn't call the commit method is in OperatorCodeGenerator: it should call the endInput() method correctly in generateOneInputStreamOperator. The complete runtime filter process is: 1. Add insert and remove rules to the batch rules. 2. Build side

How to use a self-defined JSON format when creating a table from a Kafka stream?

2020-03-04 Thread wangl...@geekplus.com.cn
I want to register a table from a MySQL binlog like this: tEnv.sqlUpdate("CREATE TABLE order(\n" + "order_id BIGINT,\n" + "order_no VARCHAR,\n" + ") WITH (\n" + "'connector.type' = 'kafka',\n" ... + "'update-mode' = 'append',\n" + "

Re: JobMaster does not register with ResourceManager in high availability setup

2020-03-04 Thread Xintong Song
Hi Abhinav, Do you mind sharing the complete 'jobmanager.log'? org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Cannot serve slot > request, no ResourceManager connected. > Sometimes you see this log because the ResourceManager is not yet connected when the slot request arrives at the

Re: How to test Flink job recovery from checkpoint

2020-03-04 Thread Bajaj, Abhinav
I implemented a custom function that throws a runtime exception. You can extend the simpler MapFunction or the more complex RichParallelSourceFunction, depending on your use case. You can add logic to throw a runtime exception on a certain condition in the map or run method.
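
A minimal sketch of such a failure-injecting mapper (the record threshold and the once-per-JVM flag are arbitrary choices that suit a local MiniCluster test):

    import org.apache.flink.api.common.functions.MapFunction;

    public class FailingMapper implements MapFunction<String, String> {
        // Static flag: fail only once per JVM, which is enough to observe
        // a restart from the last checkpoint in a local MiniCluster.
        private static volatile boolean hasFailed = false;
        private int count = 0;

        @Override
        public String map(String value) throws Exception {
            if (!hasFailed && ++count >= 1000) { // arbitrary failure condition
                hasFailed = true;
                throw new RuntimeException("Injected failure for checkpoint recovery test");
            }
            return value;
        }
    }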

Re: How to test Flink job recovery from checkpoint

2020-03-04 Thread Zhu Zhu
Hi Eleanore, You can change your application tasks to throw exceptions at a certain frequency. Alternatively, if the application has external dependencies (e.g. a source), you can trigger failures manually by manipulating the status of the external service (e.g. shut down the source service, or

Re: StreamingFileSink Not Flushing All Data

2020-03-04 Thread Austin Cawley-Edwards
Hey Kostas, We’re a little bit off from a 1.10 update but I can certainly see if that CompressWriterFactory might solve my use case for when we do. If there is anything I can do to help document that feature, please let me know. Thanks! Austin On Wed, Mar 4, 2020 at 4:58 AM Kostas Kloudas

How to test Flink job recovery from checkpoint

2020-03-04 Thread Eleanore Jin
Hi, I have a Flink application with checkpointing enabled, running locally using MiniCluster. I just wonder if there is a way to simulate a failure and verify that the Flink job restarts from the checkpoint? Thanks a lot! Eleanore

RocksDB serialization issue

2020-03-04 Thread David Morin
Hello, I have this exception in my DataStream app and I can't find the root cause. I consume data from Kafka, and the job fails when I try to get a value from my MapState in RocksDB. It was working in a previous release of my app, but I can't find the cause of this error.

RE: Teradata as JDBC Connection

2020-03-04 Thread Norm Vilmer (Contractor)
Same error with this change:

    public class Teradata extends ConnectorDescriptor {
        /** Constructs a {@link ConnectorDescriptor}. */
        public Teradata() {
            super("jdbc", 1, false);
        }

        @Override
        protected Map toConnectorProperties() {
            Map map = new
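
The preview cuts off above. For reference, a hedged sketch of how the method might be completed, using the 'connector.'-prefixed keys the Flink 1.10 JDBC connector validates (the Teradata values are placeholders); connector.type and the property version are already contributed by the super("jdbc", 1, false) call:

    import java.util.HashMap;
    import java.util.Map;

    @Override
    protected Map<String, String> toConnectorProperties() {
        Map<String, String> map = new HashMap<>();
        map.put("connector.url", "jdbc:teradata://myhost/database=mydb"); // placeholder URL
        map.put("connector.table", "mytable");                            // placeholder table
        map.put("connector.driver", "com.teradata.jdbc.TeraDriver");      // placeholder driver
        map.put("connector.username", "user");
        map.put("connector.password", "pass");
        return map;
    }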

Re: JobMaster does not register with ResourceManager in high availability setup

2020-03-04 Thread Bajaj, Abhinav
While I set up to reproduce the issue with debug logs, I would like to share more information I noticed in the INFO logs. Below is the sequence of events/exceptions I noticed during the time ZooKeeper was disrupted. I apologize in advance as they are a bit verbose. * Zookeeper seems to be down

Re: JobMaster does not register with ResourceManager in high availability setup

2020-03-04 Thread Bajaj, Abhinav
Thanks Xintong for pointing that out. I will dig deeper and get back with my findings. ~ Abhinav Bajaj From: Xintong Song Date: Tuesday, March 3, 2020 at 7:36 PM To: "Bajaj, Abhinav" Cc: "user@flink.apache.org" Subject: Re: JobMaster does not register with ResourceManager in high

Re: Unable to recover from savepoint and checkpoint

2020-03-04 Thread Puneet Kinra
I killed the TaskManager and JobManager forcefully with the kill -9 command, and while recovering I am checking the flag returned by the isRestored() method in the initializeState() function. Anyway, I figured out the issue and fixed it; thanks for the support. On Tue, Mar 3, 2020 at 7:24 PM Gary Yao

Re: Flink's Either type information

2020-03-04 Thread Arvid Heise
Hi Jacopo, to prevent type erasure in Java, you need to create a sub-type that contains only reified types. Instead of using a generic type with bound variables in stream.process(new MyKeyedBroadcastProcessFunction()); you can use stream.process(new MyKeyedBroadcastProcessFunction()
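
Assuming MyKeyedBroadcastProcessFunction is the generic class from the earlier mails, the idea looks roughly like this (Event, Rule, and Alert are placeholder types):

    // The trailing {} creates an anonymous subclass. Its superclass declaration
    // carries the concrete type arguments, so they survive erasure and Flink's
    // type extractor can recover them.
    connectedStream.process(new MyKeyedBroadcastProcessFunction<String, Event, Rule, Alert>() {});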

Re: checkpoint _metadata file has a >20x difference in size among checkpoints

2020-03-04 Thread Arvid Heise
Hi Yu, are you using incremental checkpoints [1]? If so, then the smaller checkpoints would be the deltas and the larger the complete state. [1] https://flink.apache.org/features/2018/01/30/incremental-checkpointing.html On Wed, Mar 4, 2020 at 6:41 PM Yu Yang wrote: > Hi all, > > We have a
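
For reference, a sketch of enabling incremental checkpoints programmatically (the checkpoint path is a placeholder; the flink-conf.yaml equivalent is state.backend.incremental: true):

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // The boolean flag enables incremental checkpoints: periodic checkpoints then
    // contain only the RocksDB delta rather than the full state.
    // (The constructor declares IOException.)
    env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));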

Re: Very large _metadata file

2020-03-04 Thread Jacob Sevart
Kostas and Gordon, Thanks for the suggestions! I'm on RocksDB. We don't have that setting configured, so it should be at the default of 1024 bytes. This is the full "state.*" section showing in the JobManager UI. [image: Screen Shot 2020-03-04 at 9.56.20 AM.png] Jacob On Wed, Mar 4, 2020 at 2:45 AM

checkpoint _metadata file has a >20x difference in size among checkpoints

2020-03-04 Thread Yu Yang
Hi all, We have a Flink job that checkpoints every 10 minutes. We noticed that the _metadata file size can vary a lot across the checkpoints of this job. In some checkpoints, we observed a _metadata file size of >900MB, while in other checkpoints of the same job, the _metadata file

Re: Building with Hadoop 3

2020-03-04 Thread Stephan Ewen
Have you tried to just export Hadoop 3's classpath to `HADOOP_CLASSPATH` and see if that works out of the box? If the main use case is HDFS access, then there is a fair chance it might just work, because Flink uses only a small subset of the Hadoop FS API which is stable between 2.x and 3.x, as

Teradata as JDBC Connection

2020-03-04 Thread Norm Vilmer (Contractor)
Using Flink 1.10 and coding in Java 11, is it possible to write to Teradata in append mode? MySQL, PostgreSQL, and Derby are the only supported drivers listed. Thanks. https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connect.html#connectors I created the ConnectorDescriptor

CFP: Workshop on Large Scale RDF Analytics (LASCAR-20) at ESWC'20

2020-03-04 Thread Hajira Jabeen
** We apologize for cross-postings. We appreciate your great help in forwarding this CFP to your colleagues and friends.

RE: Flink's Either type information

2020-03-04 Thread jacopo.gobbi
Hi all, Yes, my problem is that I do not create the function inline but instantiate it directly when creating the data stream job. My code (which I cannot share) is exactly like your example. Yun, are you aware of a way to prevent type erasure? Kind regards, Jacopo Gobbi From:

Re: EXTERNAL: Re: Should I use a Sink or Connector? Or Both?

2020-03-04 Thread Castro, Fernando C.
Thank you guys. So I have no idea why data is not being pushed to Elasticsearch… ☹ My complete code is at https://stackoverflow.com/questions/60512064/flink-is-not-adding-any-data-to-elasticsearch-but-no-errors Btw, for some reason I still need to pass .documentType to the Elasticsearch

Re: CREATE TABLE with Schema derived from format

2020-03-04 Thread Gyula Fóra
Hi! Initially we were looking at 2), but 1) would be the best solution. I think both would be very valuable. My only concern with using the Schema Registry as a Catalog is the interaction with other Catalogs in the system. Maybe you are using a Hive catalog to track a bunch of tables,

Re: CREATE TABLE with Schema derived from format

2020-03-04 Thread Jark Wu
Yes. From my perspective, deriving the schema from the schema registry is the most important use case of FLINK-16420. Some initial ideas about this: 1) introduce a SchemaRegistryCatalog to allow users to run queries on existing topics without manual table definitions, see FLINK-12256; 2) provide a connector

History server UI not working

2020-03-04 Thread pwestermann
We recently upgraded from Flink 1.7 to Flink 1.9.2 and the history server UI now seems to be broken. It doesn't load and always just displays a blank screen. The individual endpoints (e.g. /jobs/overview) still work. Could this be an issue caused by the Angular update for the regular UI?

Flink Serialization as stable (kafka) output format?

2020-03-04 Thread Theo Diefenthal
Hi, Without knowing too much about Flink serialization, I know that Flink states that it serializes POJO types much faster than even the fast Kryo for Java. I further know that it supports schema evolution in the same way as Avro. In our project, we have a star architecture, where one Flink

Re: CREATE TABLE with Schema derived from format

2020-03-04 Thread Gyula Fóra
Hi Jark, Thank you for the clarification, this is exactly what I was looking for, especially the second part regarding schema registry integration. This question came up as we were investigating how the schema registry integration should look :) Cheers, Gyula On Wed, Mar 4, 2020 at

Re: CREATE TABLE with Schema derived from format

2020-03-04 Thread Jark Wu
Hi Gyula, That's a good point and is on the roadmap. In 1.10, the JSON and CSV formats can derive the format schema from the table schema, so you don't need to specify the format schema in the properties anymore if you are using 1.10. Conversely, we are planning to derive the table schema from the format schema if it
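
In the style of the DDL from the original question, a 1.10 sketch could simply omit the format schema properties and let the JSON format reuse the declared columns (other required Kafka properties are omitted here):

    tEnv.sqlUpdate("CREATE TABLE orders (\n" +
        "  order_id BIGINT,\n" +
        "  order_no VARCHAR\n" +
        ") WITH (\n" +
        "  'connector.type' = 'kafka',\n" +
        "  'connector.topic' = 'orders',\n" +
        "  'format.type' = 'json'\n" + // no format schema properties: derived from the columns above
        ")");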

Re: Very large _metadata file

2020-03-04 Thread Tzu-Li (Gordon) Tai
Hi Jacob, Apart from what Klou already mentioned, one other possible reason: if you are using the FsStateBackend, your state may be small enough to be stored inline within the metadata file. That is governed by the "state.backend.fs.memory-threshold"
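
If that threshold is the cause, it can be adjusted via state.backend.fs.memory-threshold in flink-conf.yaml, or in code; a sketch with placeholder path and value:

    import java.net.URI;
    import org.apache.flink.runtime.state.filesystem.FsStateBackend;

    // Second argument: state chunks smaller than this many bytes are stored
    // inline in the checkpoint's _metadata file (default 1024).
    env.setStateBackend(new FsStateBackend(URI.create("hdfs:///flink/checkpoints"), 2048));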

Reply: Re: Question about the incompatibility of StreamExecutionEnvironment with multiple file paths for FileInputFormat in Flink 1.8

2020-03-04 Thread 王智
My requirement is case 2. Right now I am using execEnv.createInput(inputFormat()); I will go try env.addSource(new InputFormatSourceFunction(..)...). Thanks! Original message From: "JingsongLee" <lzljs3620...@aliyun.com.INVALID>; Sent: 2020/3/4 17:40; To: "user-zh" <user-zh@flink.apache.org>; Subject: Re: StreamExecutionEnvironment in flink 1.8 for

Re: Question on the Kafka connector parameter "connector.properties.zookeeper.connect"

2020-03-04 Thread Jark Wu
Hi Weike, You are right. It is not needed since Kafka 0.9+. We already have an issue to make it optional. See https://issues.apache.org/jira/browse/FLINK-16125. We are planning to fix it in 1.10.1 too. Best, Jark On Wed, 4 Mar 2020 at 18:23, Weike Dong wrote: > Hi, > > > > Recently I have

Question on the Kafka connector parameter "connector.properties.zookeeper.connect"

2020-03-04 Thread Weike Dong
Hi, Recently I have found that in the Flink Kafka Connector, the parameter "connector.properties.zookeeper.connect" is made mandatory for users. Therefore without it, Flink would throw an exception saying "Caused by: org.apache.flink.table.api.ValidationException: Could not find required

CREATE TABLE with Schema derived from format

2020-03-04 Thread Gyula Fóra
Hi All! I am wondering if it would be possible to change the CREATE TABLE statement so that it would also work without specifying any columns. The format generally defines the available columns, so maybe we could simply use them as-is if we want. This would be very helpful when exploring

Re: Very large _metadata file

2020-03-04 Thread Kostas Kloudas
Hi Jacob, Could you specify which StateBackend you are using? The reason I am asking is that, from the documentation in [1]: "Note that if you use the MemoryStateBackend, metadata and savepoint state will be stored in the _metadata file. Since it is self-contained, you may move the file and

Re: StreamingFileSink Not Flushing All Data

2020-03-04 Thread Kostas Kloudas
Hi Austin, I will have a look at your repo. In the meantime, given that [1] is already merged in 1.10, would upgrading to 1.10 and using the newly introduced CompressWriterFactory be an option for you? It is unfortunate that this feature was not documented. Cheers, Kostas [1]

Re: Question about the incompatibility of StreamExecutionEnvironment with multiple file paths for FileInputFormat in Flink 1.8

2020-03-04 Thread JingsongLee
Hi, what is your requirement? Which of the following: 1) you want an unbounded, continuous file source that monitors a directory, emits new files, and supports multiple directories; or 2) you just want a bounded input format that supports multiple files. If it's 1, this is still not supported. If it's 2, you can use env.addSource(new InputFormatSourceFunction(..)...) to read multiple files. Best, Jingsong Lee
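
A rough sketch of option 2 under those assumptions (TextInputFormat as an example format; the paths are placeholders):

    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.api.java.io.TextInputFormat;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction;

    // Wrapping the format in InputFormatSourceFunction bypasses the special
    // single-path handling in StreamExecutionEnvironment.createInput().
    TextInputFormat format = new TextInputFormat(new Path("hdfs:///data/dir1"));
    format.setFilePaths(new Path("hdfs:///data/dir1"), new Path("hdfs:///data/dir2"));
    env.addSource(new InputFormatSourceFunction<>(format, BasicTypeInfo.STRING_TYPE_INFO));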

[DISCUSS] FLIP-111: Docker image unification

2020-03-04 Thread Andrey Zagrebin
Hi All, If you have ever touched the Docker topic in Flink, you probably noticed that we have multiple places in the docs and repos which address its various concerns. We have prepared a FLIP [1] to simplify how users perceive the Docker topic in Flink. It mostly advocates an approach of

Question about the incompatibility of StreamExecutionEnvironment with multiple file paths for FileInputFormat in Flink 1.8

2020-03-04 Thread 王智
While using a custom FileInputFormat with Flink 1.8 I ran into an incompatibility problem. From a first look at the source code, I am not sure whether my conclusion is correct, and I would like to know the future plans and progress here. Any pointers would be appreciated; thanks in advance! Question 1: Why does StreamExecutionEnvironment impose this restriction? What is the role of ContinuousFileMonitoringFunction? The relevant code: StreamExecutionEnvironment has special handling logic for FileInputFormat objects: if (inputFormat instanceof