Re: Does operator uid() have to be unique across all jobs?

2019-10-24 Thread Victor Wong
Hi, “uid” is mainly useful when you upgrade your application. It’s used to match the operator state stored in the savepoint. As suggested in [1], “it is highly recommended to assign unique IDs to all operators of an application that might be upgraded in the future.” [1]. https://ci.apache

Re: Could not load the native RocksDB library

2019-10-24 Thread Congxian Qiu
FYI, maybe this is an env problem. I encountered this problem when running Flink 1.9 on k8s, but it succeeded when running on YARN. I did not figure out why this happened; I will update here after I find it out. Best, Congxian Thad Truman wrote on Wed, 23 Oct 2019 at 1:33 AM: > Hi Samya, > > > > Were you able

Re: How to create an empty test stream

2019-10-24 Thread Dmitry Minaev
Thanks, I'll check it out. Actually I realized I can always put a filter operator that'll effectively remove everything from the stream. -- Dmitry On Thu, Oct 24, 2019 at 2:29 AM vino yang wrote: > Hi Dmitry, > > Perhaps an easy way is to customize a source function. Then in the run > method, s

Re: Does operator uid() have to be unique across all jobs?

2019-10-24 Thread Dian Fu
Just adding one more point: changing the parallelism of the operators may affect the chaining of the operators, which will also affect the generated uid. So the uid of stateful operators should also be set in this case. > On 25 Oct 2019 at 9:51 AM, Dian Fu wrote: > > Hi Min, > > The uid is used to mat

Re: Does operator uid() have to be unique across all jobs?

2019-10-24 Thread Dian Fu
Hi Min, The uid is used to match the operator state stored in the checkpoint/savepoint to an operator[1]. So you only need to specify the uid for stateful operators. 1) If you have not specified the uid for an operator, it will generate a uid for it in a deterministic way[2]. The gene
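Dian Fu's advice can be sketched in a minimal job. Note this is only an illustration: the operator ids and names below are my own, and the snippet assumes the Flink streaming dependency (flink-streaming-java) on the classpath, so treat it as a sketch rather than a self-contained program.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UidSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "b", "c")
            .uid("words-source")   // stable id so savepoint state can be matched after an upgrade
            .map(new MapFunction<String, String>() {
                @Override
                public String map(String value) {
                    return value.toUpperCase();
                }
            })
            .uid("to-upper")       // set a uid on every stateful operator you may upgrade later
            .print()
            .uid("stdout-sink");

        env.execute("uid sketch");
    }
}
```

The uids only need to be unique within this job; another job can reuse the same ids.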

Flink 1.5+ performance in a Java standalone environment

2019-10-24 Thread Jakub Danilewicz
Hi, I have recently tried to upgrade Flink from 1.2.0 to the newest version and noticed that starting from the version 1.5 the performance is much worse when processing fixed graphs in a standalone JVM environment (Java 8). This affects all the use-cases when a Gelly graph (pre-built from a fixed

RE: Does operator uid() have to be unique across all jobs?

2019-10-24 Thread min.tan
Hi, I have some simple questions on the uid as well. 1) Do we add a uid for every operator, e.g. print(), addSink and addSource? 2) For chained operators, do we need uids for each operator? Or just the last operator? e.g. .map().uid("some-id").print().uid("print-id"); Rega

Re: Does operator uid() have to be unique across all jobs?

2019-10-24 Thread John Smith
Ok cool, thanks. BTW this seems a bit cumbersome... .map().uid("some-id").name("some-id"); On Wed, 23 Oct 2019 at 21:13, Dian Fu wrote: > Yes, you can use it in another job. The uid needs only to be unique within > a job. > > > On 24 Oct 2019 at 5:42 AM, John Smith wrote: > > > > When setting uid()

Re: Using STSAssumeRoleSessionCredentialsProvider for cross account access

2019-10-24 Thread Vinay Patil
Hi, Can someone please help here, we are facing issues in prod. I see the following ticket in an unresolved state. https://issues.apache.org/jira/browse/FLINK-8417 Regards, Vinay Patil On Thu, Oct 24, 2019 at 11:01 AM Vinay Patil wrote: > Hi, > > I am trying to access dynamo streams from a different a

Guarantee of event-time order in FlinkKafkaConsumer

2019-10-24 Thread Wojciech Indyk
Hi! I use Flink 1.8.0 with Kafka 2.2.1. I need to guarantee the correct order of events by event timestamp. I generate periodic watermarks every 1s. I use FlinkKafkaConsumer with AscendingTimestampExtractor. The code (and the same question) is here: https://stackoverflow.com/questions/58539379/guara
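The ordering guarantee asked about here ultimately comes from buffering out-of-order events until a watermark passes their timestamps. The class below is not Flink API, just a self-contained sketch of that buffering idea (all names are mine): events are held in a priority queue keyed by timestamp and released, in timestamp order, only once the watermark reaches them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch of event-time ordering: buffer events in a priority queue by
 *  timestamp and emit only those at or below the current watermark. */
public class WatermarkBuffer {
    // Each entry is {timestamp, value}, ordered by timestamp.
    private final PriorityQueue<long[]> queue =
        new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));

    public void add(long timestamp, long value) {
        queue.add(new long[] {timestamp, value});
    }

    /** Emit all buffered values with timestamp <= watermark, in timestamp order. */
    public List<Long> onWatermark(long watermark) {
        List<Long> out = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek()[0] <= watermark) {
            out.add(queue.poll()[1]);
        }
        return out;
    }
}
```

Events with timestamps above the last watermark stay buffered, which is why watermark frequency (here, every 1s) bounds how long ordered output lags behind ingestion.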

Can a Flink query outputs nested json?

2019-10-24 Thread srikanth flink
I'm working on the Flink SQL Client. The input data is in json format and contains nested json. I'm trying to query the nested json from the table and expect the output to be nested json instead of a string. I've built the environment file to define a table schema as: > format: > type: json >
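For reference, a nested JSON object can be declared with a ROW type in the SQL Client environment file. The fragment below is only a sketch from memory: the table and field names are invented, and the exact keys and type syntax differ between Flink versions, so check the SQL Client docs for your release before using it.

```yaml
# Hypothetical sketch of a SQL Client environment file; names are illustrative.
tables:
  - name: nested_json_source
    type: source
    format:
      type: json
      derive-schema: true
    schema:
      - name: id
        type: VARCHAR
      - name: address            # nested JSON object mapped to a ROW, not a string
        type: ROW(city VARCHAR, zip VARCHAR)
```

Declaring the field as a ROW keeps its structure queryable (e.g. `address.city`) instead of flattening it to a string.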

Re: How to create an empty test stream

2019-10-24 Thread vino yang
Hi Dmitry, Perhaps an easy way is to customize a source function. Then in the run method, start an empty loop? But I don't understand the meaning of starting a stream pipeline without generating data. Best, Vino Dmitry Minaev wrote on Thu, 24 Oct 2019 at 6:16 AM: > Hi everyone, > > I have a pipeline where
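The suggestion above can be sketched as a SourceFunction whose run loop emits nothing and simply blocks until the job is cancelled. The class name is my own, and the snippet assumes the Flink streaming dependency on the classpath, so treat it as a sketch:

```java
import org.apache.flink.streaming.api.functions.source.SourceFunction;

/** A source that never emits a record; useful for tests that need an
 *  empty stream. */
public class EmptySource implements SourceFunction<String> {
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        // Emit nothing; sleep until the job is cancelled.
        while (running) {
            Thread.sleep(100);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
```

Compared to the filter workaround mentioned earlier in the thread, this avoids generating and discarding records at all.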

Re: Docker flink 1.9.1

2019-10-24 Thread vino yang
OK, sounds good to me. Farouk wrote on Thu, 24 Oct 2019 at 3:23 PM: > Hi > > I checked, there are 2 PRs on GitHub for Flink 1.9.1. > > It should come soon enough :) > > Thanks > > Farouk > > On Wed, 23 Oct 2019 at 17:46, vino yang wrote: > >> Hi Farouk, >> >> Not long after Flink 1.9.1 was released, the com

Re: JDBCInputFormat does not support json type

2019-10-24 Thread Fanbin Bu
It looks like SnowflakeColumnMetadata treats VARIANT as VARCHAR: case VARIANT: colType = Types.VARCHAR; extColTypeName = "VARIANT"; break; and SnowflakeResultSet just returns the string value of the field: switch(type) { case Types.VARCHAR: case Types.CHAR: return getString(columnIndex); What

JDBCInputFormat does not support json type

2019-10-24 Thread Fanbin Bu
Hi there, Flink Version: 1.8.1 JDBC driver: net.snowflake.client.jdbc.SnowflakeDriver Here is the code snippet: val rowTypeInfo = new RowTypeInfo( Array[TypeInformation[_]]( new RowTypeInfo( Array[TypeInformation[_]](BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TY

Re: Docker flink 1.9.1

2019-10-24 Thread Farouk
Hi, I checked, there are 2 PRs on GitHub for Flink 1.9.1. It should come soon enough :) Thanks Farouk On Wed, 23 Oct 2019 at 17:46, vino yang wrote: > Hi Farouk, > > Not long after Flink 1.9.1 was released, the community may not have time > to provide the corresponding Dockerfiles. I can give

Re: Flink StreamingFileSink part file behavior

2019-10-24 Thread Paul Lam
Hi, StreamingFileSink can write to many buckets at the same time, and it uses BucketAssigner to determine the bucket for each record. WRT your questions, the records would be written to the expected bucket even if they arrive out of order. You can refer to [1] for more information. [1] https:
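The behavior described above can be illustrated with a custom BucketAssigner that routes each record by its event-time date. The `MyEvent` type and all names below are my own invention, and the snippet assumes the StreamingFileSink API of Flink 1.6+ with the Flink dependency on the classpath, so it is a sketch only:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

import org.apache.flink.core.io.SimpleVersionedSerializer;
import org.apache.flink.streaming.api.functions.sink.filesystem.BucketAssigner;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.SimpleVersionedStringSerializer;

/** Routes each record to a bucket named after its event date, so out-of-order
 *  records still land in the bucket for their own timestamp. */
public class EventDateBucketAssigner implements BucketAssigner<MyEvent, String> {

    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    @Override
    public String getBucketId(MyEvent element, Context context) {
        // Derive the bucket from the event's own timestamp, not processing time.
        return FORMAT.format(Instant.ofEpochMilli(element.getTimestampMillis()));
    }

    @Override
    public SimpleVersionedSerializer<String> getSerializer() {
        return SimpleVersionedStringSerializer.INSTANCE;
    }
}
```

Because bucketing is per record, a late record for yesterday reopens (or appends a new part file to) yesterday's bucket rather than polluting today's.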

Flink 1.9 measuring time taken by each operator in DataStream API

2019-10-24 Thread Komal Mariam
Hello, I have a few questions regarding Flink’s dashboard and monitoring tools. I have a fixed number of records that I process through the DataStream API on my standalone cluster and want to know how long it takes to process them. My questions are: 1) How can I see the time taken in milli