Re: migrate AbstractStreamOperator from 1.0 to 1.4

2018-03-13 Thread Filippo Balicchia
Hi, in case anyone is interested: restoreState was removed in https://issues.apache.org/jira/browse/FLINK-4196 and snapshotOperatorState was replaced by snapshotState in 1.2 --Filippo 2018-03-12 15:37 GMT+01:00 Filippo Balicchia : > Hi, > > I'm a newbie in Flink and in streaming engines and
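
A minimal sketch of how the two hooks map onto the 1.2+ operator API (the operator class, its types, and its body are hypothetical, not taken from the thread):

import org.apache.flink.runtime.state.StateInitializationContext;
import org.apache.flink.runtime.state.StateSnapshotContext;
import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

// Hypothetical migrated operator: initializeState/snapshotState replace the
// removed restoreState and the renamed snapshotOperatorState hooks.
public class MyMigratedOperator extends AbstractStreamOperator<String>
        implements OneInputStreamOperator<String, String> {

    @Override
    public void processElement(StreamRecord<String> element) throws Exception {
        output.collect(element);
    }

    @Override
    public void snapshotState(StateSnapshotContext context) throws Exception {
        super.snapshotState(context);
        // custom snapshot logic goes here (previously snapshotOperatorState)
    }

    @Override
    public void initializeState(StateInitializationContext context) throws Exception {
        super.initializeState(context);
        // state restore logic goes here (previously restoreState)
    }
}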

Re: Record Delivery Guarantee with Kafka 1.0.0

2018-03-13 Thread Chirag Dewan
Hi, still stuck on this. My understanding is that this is something Flink can't handle. If the batch size of the Kafka producer is non-zero (which it ideally should be), there will be in-memory records and data loss (boundary cases). The only way I can handle this with Flink is my checkpointing interval, w
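
For reference, a hedged sketch of the usual mitigation (the 0.11 connector, topic, and broker address are assumptions; the connector also works against 1.0.0 brokers): with AT_LEAST_ONCE semantics the producer flushes its in-memory batch before a checkpoint completes, so batched records are not lost on failure, at the cost of possible duplicates.

import java.util.Properties;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class AtLeastOnceSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(10_000); // flushing happens on every checkpoint

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // assumed address

        FlinkKafkaProducer011<String> producer = new FlinkKafkaProducer011<>(
                "output-topic", // assumed topic
                new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()),
                props,
                FlinkKafkaProducer011.Semantic.AT_LEAST_ONCE);

        env.fromElements("a", "b", "c").addSink(producer);
        env.execute("at-least-once sink sketch");
    }
}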

Re: Share state across operators

2018-03-13 Thread m@xi
Thanks a lot Timo! Best, Max -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

flink sql: "slow" performance on Over Window aggregation with time attribute set to event time

2018-03-13 Thread Yan Zhou [FDS Science]
Hi, I am using Flink SQL in my application. It simply reads records from a Kafka source, converts them to a table, then runs a query with an over window aggregation for each record. A time-lag watermark assigner with a 10 ms time lag is used. The performance is not ideal. The end-to-end latency, which is th
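
A hedged sketch of that pipeline (the Event POJO, its fields, and the table name are assumptions): a 10 ms bounded-out-of-orderness watermark assigner plus an event-time OVER aggregation in SQL.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;

// Event is a hypothetical POJO with fields userId, amount, and ts (epoch millis).
public static Table overAggregation(StreamTableEnvironment tableEnv, DataStream<Event> events) {
    DataStream<Event> withTimestamps = events.assignTimestampsAndWatermarks(
            new BoundedOutOfOrdernessTimestampExtractor<Event>(Time.milliseconds(10)) {
                @Override
                public long extractTimestamp(Event e) {
                    return e.ts;
                }
            });
    // "rowtime.rowtime" appends an event-time attribute to the table schema.
    tableEnv.registerDataStream("events", withTimestamps, "userId, amount, rowtime.rowtime");
    return tableEnv.sqlQuery(
            "SELECT userId, " +
            "       SUM(amount) OVER (PARTITION BY userId ORDER BY rowtime " +
            "           RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW) AS amt_1h " +
            "FROM events");
}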

Re: State serialization problem when we add a new field in the object

2018-03-13 Thread Aljoscha Krettek
Hi, I'm afraid Flink currently does not support changing the schema of state when restoring from a savepoint. Best, Aljoscha > On 13. Mar 2018, at 07:36, kla wrote: > > Hi guys, > > I have a Flink streaming job running (version 1.2.0) which has the > following state: > > private transient

Re: PartitionNotFoundException when restarting from checkpoint

2018-03-13 Thread Fabian Hueske
Hi Seth, Thanks for sharing how you resolved the problem! The problem might have been related to Flink's key groups which are used to assign key ranges to tasks. Not sure why this would be related to ZooKeeper being in a bad state. Maybe Stefan (in CC) has an idea about the cause. Also, it would

Re: Partial aggregation result sink

2018-03-13 Thread Fabian Hueske
Hi, Chesnay is right. SQL and the Table API support neither early window results nor allowed lateness to update results with late-arriving data. If you need such features, you should use the DataStream API. Best, Fabian 2018-03-13 12:10 GMT+01:00 Chesnay Schepler : > I don't think you can specif
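
A hedged DataStream sketch of early results (the input type, key, window size, and emit interval are assumptions): a ContinuousEventTimeTrigger fires the window periodically before the watermark passes its end, so partial aggregates are emitted and later refined.

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.ContinuousEventTimeTrigger;

// `input` is an existing DataStream<Tuple2<String, Long>> of (key, value) pairs.
DataStream<Tuple2<String, Long>> earlyResults = input
        .keyBy(0)
        .window(TumblingEventTimeWindows.of(Time.hours(1)))
        // emit a partial aggregate every 10 seconds of event time
        .trigger(ContinuousEventTimeTrigger.of(Time.seconds(10)))
        .sum(1);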

sorting data into sink

2018-03-13 Thread Telco Phone
Does anyone know if this is a correct assumption: DataStream sorted = stream.keyBy("partition"); Will this automatically send the same records to the same sink thread? The behavior I am seeing is that a sink set up with multiple threads is seeing data from the same hour. Any good examples of how to sort data so
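
For reference, a hedged sketch of the behavior being asked about (the Record type and its "partition" field are assumptions): keyBy hash-partitions by key, so every record with the same key goes to the same downstream sink subtask, but it does not sort, and one subtask can still receive several different key values.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.PrintSinkFunction;

// `stream` is an existing DataStream<Record>; Record is a hypothetical POJO with a "partition" field.
DataStream<Record> keyed = stream.keyBy("partition");
// The sink directly downstream of keyBy inherits the hash partitioning:
// identical "partition" values always land on the same sink subtask.
keyed.addSink(new PrintSinkFunction<Record>()).setParallelism(4);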

Re: Dependency Injection and Flink

2018-03-13 Thread Steven Wu
Xiaochuan, We are doing exactly as you described. We keep the injector as a global static variable. But we extend FlinkJobManager and FlinkTaskManager to override the main method and initialize the injector (and other things) during JVM startup, which does cause tight code coupling. It is a little pa
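
A hedged sketch of the "global static injector" part (the Guice module is hypothetical): a holder that builds one injector per JVM, so task code can look dependencies up locally instead of serializing them with the job.

import com.google.inject.Guice;
import com.google.inject.Injector;

public final class InjectorHolder {

    private static volatile Injector injector;

    private InjectorHolder() {}

    // Lazily create a single injector per task manager JVM (double-checked locking).
    public static Injector get() {
        if (injector == null) {
            synchronized (InjectorHolder.class) {
                if (injector == null) {
                    injector = Guice.createInjector(new MyAppModule()); // hypothetical module
                }
            }
        }
        return injector;
    }
}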

Dependency Injection and Flink

2018-03-13 Thread XiaoChuan Yu
Hi, I'm evaluating Flink with the intent to integrate it into a Java project that uses a lot of dependency injection via Guice. What would be the best way to work with DI/Guice given that injected fields aren't Serializable? I looked at this StackOverflow answer so far. To my understanding the str
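
One commonly suggested pattern, sketched here under assumptions (Guice, a hypothetical MyService, and a static InjectorHolder like the one described in the reply above): keep the dependency in a transient field and resolve it in open(), which runs on the task manager after the function has been deserialized, so nothing non-serializable is shipped with the job graph.

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class EnrichingMapper extends RichMapFunction<String, String> {

    // Not serialized with the function; re-created on each task manager.
    private transient MyService service; // hypothetical injected dependency

    @Override
    public void open(Configuration parameters) {
        // Resolve the dependency locally instead of serializing it.
        service = InjectorHolder.get().getInstance(MyService.class);
    }

    @Override
    public String map(String value) {
        return service.enrich(value); // hypothetical method
    }
}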

Re: Extremely large job serialization produced by union operator

2018-03-13 Thread Fabian Hueske
Hi Bill, The size of the program depends on the number and complexity of the SQL queries that you are submitting. Each query might be translated into a sequence of multiple operators. Each operator has a string with generated code that will be compiled on the worker nodes. The size of the code depends on

Re: Event time join

2018-03-13 Thread Fabian Hueske
Hi, A Flink application does not have a problem if it ingests two streams with very different throughput as long as they are somewhat synced on their event-time. This is typically the case when ingesting real-time data. In such scenarios, an application would not buffer more data than necessary.

State serialization problem when we add a new field in the object

2018-03-13 Thread kla
Hi guys, I have a Flink streaming job running (version 1.2.0) which has the following state: private transient ValueState>> userState; With the following configuration: final ValueStateDescriptor>> descriptor = new ValueStateDescriptor<>("userState", TypeInformation.of(new UserTyp
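
For reference, a hedged reconstruction of that kind of setup (the archive stripped the generics, so the UserEvent element type and the surrounding function are assumptions): a keyed ValueState whose TypeInformation is built from a TypeHint.

import java.util.List;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Must run on a keyed stream (after keyBy) to use keyed state.
public class UserStateFunction extends RichFlatMapFunction<UserEvent, UserEvent> {

    private transient ValueState<List<UserEvent>> userState; // element type is assumed

    @Override
    public void open(Configuration parameters) {
        ValueStateDescriptor<List<UserEvent>> descriptor = new ValueStateDescriptor<>(
                "userState",
                TypeInformation.of(new TypeHint<List<UserEvent>>() {}));
        userState = getRuntimeContext().getState(descriptor);
    }

    @Override
    public void flatMap(UserEvent value, Collector<UserEvent> out) throws Exception {
        // read/update userState here
        out.collect(value);
    }
}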

Re: UUIDs generated by Flink SQL

2018-03-13 Thread Fabian Hueske
Hi Gregory, Your understanding is correct. It is not possible to assign UUIDs to the operators generated by the SQL/Table planner. To be honest, I am not sure whether the use case that you are describing should be in the scope of the "officially" supported use cases of the API. It would require in dep

Too many open files on Bucketing sink

2018-03-13 Thread galantaa
Hey all, I'm using the bucketing sink with a bucketer that creates a partition per customer per day. I sink the files to S3. It is supposed to work on around 500 files at the same time (according to my partitioning). I have a critical 'Too many open files' problem. I've brought up two taskmanagers, each with
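
A hedged sketch of the knobs that usually matter here (the S3 path, roll size, and intervals are assumptions, and the custom per-customer/per-day Bucketer is stood in for by a date-based one): closing inactive buckets sooner keeps the number of simultaneously open part files, and therefore file descriptors, down.

import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

// `stream` is an existing DataStream<String>.
BucketingSink<String> sink = new BucketingSink<>("s3://my-bucket/base-path"); // assumed path
sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd")); // stand-in for the custom Bucketer
sink.setBatchSize(1024L * 1024L * 128L);       // roll a part file at roughly 128 MB
sink.setInactiveBucketCheckInterval(60_000L);  // look for idle buckets every minute
sink.setInactiveBucketThreshold(60_000L);      // close buckets that were idle for a minute
stream.addSink(sink);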

Re: PartitionNotFoundException when restarting from checkpoint

2018-03-13 Thread Seth Wiesman
It turns out the issue was due to our ZooKeeper installation being in a bad state. I am not clear enough on Flink's networking internals to explain how this manifested as a PartitionNotFoundException, but hopefully this can serve as a starting point for others who run into the same issue. [

Re: Global Window, Trigger and Watermarks, Parallelism

2018-03-13 Thread dim5b
I have looked into the CEP library. I have posted an issue on StackOverflow: https://stackoverflow.com/questions/49047879/global-windows-in-flink-using-custom-triggers-vs-flink-cep-pattern-api However, the pattern matches all possible solutions on the stream of events. Does pattern have a notion o

Re: HDFS data locality and distribution

2018-03-13 Thread Chesnay Schepler
Hello, You said that "data is distributed very badly across slots"; do you mean that only a small number of subtasks is reading from HDFS, or that the keyed data is only processed by a few subtasks? Flink does prioritize data locality over data distribution when reading the files, but the fu
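
If the symptom is that only a few source subtasks receive splits because of locality, one commonly suggested remedy, sketched here (the path is an assumption), is to add a rebalance() after the source so records are redistributed round-robin to all downstream subtasks:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> lines = env.readTextFile("hdfs:///data/input"); // assumed path
// Spread records evenly across all downstream subtasks, regardless of
// which source subtasks happened to read the (locally stored) splits.
DataStream<String> balanced = lines.rebalance();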

Re: Partial aggregation result sink

2018-03-13 Thread Chesnay Schepler
I don't think you can specify custom triggers when using pure SQL, but maybe Fabian or Timo know a SQL way of implementing your goal. On 12.03.2018 13:16, 李玥 wrote: Hi Chirag, Thanks for your reply! I found that the provided ContinuousEventTimeTrigger should work in my situation. Most examples a

Re: Global Window, Trigger and Watermarks, Parallelism

2018-03-13 Thread Chesnay Schepler
Hello, Event-time and watermarks can be used to deal with out-of-order events, but since you're using global windows (as opposed to time-based windows) you have to implement the logic for doing this yourself. Conceptually, what you would have to do is to not create your TripEv when receiving a
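
A hedged skeleton of that approach (the TripEvent type and the firing condition are assumptions): with GlobalWindows the trigger alone decides when the buffered elements are evaluated, so any handling of out-of-order events has to be implemented in this logic (for example by also registering event-time timers).

import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;

public class TripTrigger extends Trigger<TripEvent, GlobalWindow> { // TripEvent is hypothetical

    @Override
    public TriggerResult onElement(TripEvent element, long timestamp,
                                   GlobalWindow window, TriggerContext ctx) {
        // Decide here whether the trip can be emitted; out-of-order events
        // must be accounted for by this condition (and/or by timers).
        return element.isTripEnd() ? TriggerResult.FIRE_AND_PURGE : TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, GlobalWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, GlobalWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(GlobalWindow window, TriggerContext ctx) {
        // delete any registered timers or trigger state here
    }
}

The trigger would then be attached with something like stream.keyBy(...).window(GlobalWindows.create()).trigger(new TripTrigger()).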

Flink web UI authentication

2018-03-13 Thread Sampath Bhat
Hello, I would like to know if Flink supports any user-level authentication, such as username/password, for the Flink web UI. Regards Sampath S

Re: What's the best way to clean up the expired rocksdb state

2018-03-13 Thread Chesnay Schepler
Hello, yes, I think you'll need to use a ProcessFunction and clean up the state manually. On 11.03.2018 15:13, sundy wrote: hi: my streaming application always keys by keys that include an event timestamp, such as keyBy("qps_1520777430"), so the expired keys (from 1 hour ago) are useless. And
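
A hedged sketch of that manual cleanup (the Event type, the counter state, and the one-hour expiry are assumptions): register an event-time timer per key and clear the state when it fires.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

// Must be applied after keyBy; assumes event-time timestamps are assigned upstream.
public class ExpiringCounter extends ProcessFunction<Event, Long> { // Event is hypothetical

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void processElement(Event value, Context ctx, Collector<Long> out) throws Exception {
        Long current = count.value();
        count.update(current == null ? 1L : current + 1L);
        // Schedule cleanup one hour after this element's event time.
        ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 60L * 60L * 1000L);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Long> out) {
        // Drop the expired entry instead of letting it pile up in RocksDB.
        count.clear();
    }
}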

Flink kafka connector with JAAS configurations crashed

2018-03-13 Thread sundy
Hi all, I use the code below to set the Kafka JAAS config; serverConfig.jasspath is /data/apps/spark/kafka_client_jaas.conf, but on a Flink standalone deployment it crashes. I am sure the kafka_client_jaas.conf is valid, because other applications (Spark Streaming) are still working fine with
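
One hedged workaround sketch (the topic, broker address, group id, and the JAAS line are assumptions, not from the thread): with Kafka clients 0.10.2 and later, the JAAS configuration can be passed per client through the sasl.jaas.config property, so it travels with the job to every task manager instead of depending on a JVM-wide -Djava.security.auth.login.config setting on each node.

import java.util.Properties;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "broker:9092");  // assumed address
props.setProperty("group.id", "my-consumer-group");     // assumed group
props.setProperty("security.protocol", "SASL_PLAINTEXT");
props.setProperty("sasl.mechanism", "PLAIN");
// Inline JAAS configuration (example credentials) instead of a kafka_client_jaas.conf file:
props.setProperty("sasl.jaas.config",
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        + "username=\"alice\" password=\"secret\";");

FlinkKafkaConsumer011<String> consumer =
        new FlinkKafkaConsumer011<>("input-topic", new SimpleStringSchema(), props);
// then: env.addSource(consumer)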