Re: Apache Flink Reading CSV Files ,Transform and Writting Back to CSV using Paralliesm

2017-08-25 Thread Lokesh Gowda
Hi Robert my question was if I need to read and write the csv file of size which will be in gb how i can distribute the data sink to write into files 1gb exactly and since I am New to flink I am not sure about this Regards Lokesh.r On Sat, Aug 26, 2017 at 2:56 AM Robert Metzger wrote: > Hi

Re: akka timeout

2017-08-25 Thread Steven Wu
this is a stateless job. so we don't use RocksDB. yeah. network can also be a possibility. will keep it in the radar. unfortunately, our metrics system don't have the tcp metrics when running inside containers. On Fri, Aug 25, 2017 at 2:09 PM, Robert Metzger wrote: > Hi, > are you using the Roc

Re: Even out the number of generated windows

2017-08-25 Thread Bowen Li
Hi Robert, We use kinesis sink (FlinkKinesisProducer). The main pain is the Kinesis Producer Library (KPL) that FlinkKinesisProducer uses. KPL is basically a java wrapper with a c++ core. It's slow, unstable, easy to crash, memory-and-CPU-consuming (it sends traffic via HTTP), and can't handle hig

Thoughts - Monitoring & Alerting if a Running Flink job ever kills

2017-08-25 Thread Raja . Aravapalli
Hi, Is there a way to set alerting when a running Flink job kills, due to any reasons? Any thoughts please? Regards, Raja.

Re: [EXTERNAL] Re: Security Control of running Flink Jobs on Flink UI

2017-08-25 Thread Ted Yu
Logged FLINK-7525, referring to this thread. On Fri, Aug 25, 2017 at 3:23 PM, Raja.Aravapalli wrote: > Ability to disable it will be a super helpful. > > > > +1 to the idea. > > > > > > Regards, > > Raja. > > > > > > *From: *Ted Yu > *Date: *Friday, August 25, 2017 at 4:56 PM > *To: *Robert Met

Re: [EXTERNAL] Re: Security Control of running Flink Jobs on Flink UI

2017-08-25 Thread Raja . Aravapalli
Ability to disable it will be a super helpful. +1 to the idea. Regards, Raja. From: Ted Yu Date: Friday, August 25, 2017 at 4:56 PM To: Robert Metzger Cc: Raja Aravapalli , "user@flink.apache.org" Subject: [EXTERNAL] Re: Security Control of running Flink Jobs on Flink UI bq. introduce a

Re: Security Control of running Flink Jobs on Flink UI

2017-08-25 Thread Ted Yu
bq. introduce a special config flag to disable the Cancel functionality +1 Similar config is used in other project(s) such as hbase. On Fri, Aug 25, 2017 at 2:54 PM, Robert Metzger wrote: > Hi Raja, > > you can actually disable the UI by setting the port to a negative number. > The configurat

Re: Security Control of running Flink Jobs on Flink UI

2017-08-25 Thread Robert Metzger
Hi Raja, you can actually disable the UI by setting the port to a negative number. The configuration property is "jobmanager.web.port". I'm not sure how well this is tested, but from the code it seems that this is the behavior of Flink. If that doesn't work, I would propose to add a change to Fli

Re: Reset Kafka Consumer using Flink Consumer 10 API

2017-08-25 Thread Robert Metzger
Hi, it seems from the stack trace, that you are calling the restoreState() method yourself in ReplayTest.getKafkaSource(ReplayTest.java:50): org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase. restoreState(FlinkKafkaConsumerBase.java:388) at in.dailyhunt.cis.enrichment.pin

Re: Flink-HTM integration

2017-08-25 Thread Robert Metzger
Hi, maybe creating an issue in the GitHub project will notify the project members? On Thu, Aug 24, 2017 at 5:19 PM, AndreaKinn wrote: > Hi, > Is there here someone who used Flink-HTM library > https://github.com/htm-community/flink-htm > ? > I'm tryi

Re: Even out the number of generated windows

2017-08-25 Thread Robert Metzger
Hi Bowen, (very nice graphics :) ) I don't think you can do anything about the windows itself (unless you are able to build the windows yourself using the ProcessFunction, playing some tricks because you know your data), so I should focus on reducing the pain in "burning down your sink". Are ther

Re: Apache Flink Reading CSV Files ,Transform and Writting Back to CSV using Paralliesm

2017-08-25 Thread Robert Metzger
Hi Lokesh, I'm not sure if I fully understood your question. But you can not write the result in a single file from multiple writers. If you want to process the data fully distributed, you'll also have to write it distributed. On Wed, Aug 23, 2017 at 8:07 PM, Lokesh R wrote: > Hi Team, > > I am

Re: Possible conflict between in Flink connectors

2017-08-25 Thread Federico D'Ambrosio
Hi, At the beginning, I was wondering myself that too, and I don't know why hbase-common wasn''t being downloaded and included, so I added it explicitly. I was in the process to write that maybe I've solved this weird issue: apparently the shading worked and the ClassDefNotFound issue was caused

Re: Possible conflict between in Flink connectors

2017-08-25 Thread Ted Yu
Looking at dependencies for flink-hbase, we have : [INFO] +- org.apache.hbase:hbase-server:jar:1.3.1:compile [INFO] | +- org.apache.hbase:hbase-common:jar:1.3.1:compile [INFO] | +- org.apache.hbase:hbase-protocol:jar:1.3.1:compile [INFO] | +- org.apache.hbase:hbase-procedure:jar:1.3.1:compile [

Re: akka timeout

2017-08-25 Thread Robert Metzger
Hi, are you using the RocksDB state backend already? Maybe writing the state to disk would actually reduce the pressure on the GC (but of course it'll also reduce throughput a bit). Are there any known issues with the network? Maybe the network bursts on restart cause the timeouts? On Fri, Aug 25

Re: Possible conflict between in Flink connectors

2017-08-25 Thread Robert Metzger
Hi, why do you need to add hbase-common as a separate dependency? Doesn't the "flink-hbase" dependency transitively pull in hbase? On Fri, Aug 25, 2017 at 6:35 PM, Ted Yu wrote: > If Guava 18.0 is used to build hbase 1.3, there would be compilation > errors such as the following: > > [ERROR] /m

Re: Serialization issues with DataStreamUtils

2017-08-25 Thread Robert Metzger
Hi Vinay, could you provide the full stack trace and the Data types you are using in your streaming application, to fully understand the problem? Regards, Robert On Fri, Aug 25, 2017 at 6:51 PM, vinay patil wrote: > Hi Guys, > > I am using DataStreamUtils for unit testing, the test case succe

Re: Which window function to use to start a window at anytime

2017-08-25 Thread Bowen Li
Hi Aljoscha, Thank you very much! We imagined it's going to be very expensive to achieve that, and your answer verified our understanding of how Flink works. Regards, Bowen On Fri, Aug 25, 2017 at 8:18 AM, Aljoscha Krettek wrote: > Hi, > > I'm afraid this is not possible right now because it

Serialization issues with DataStreamUtils

2017-08-25 Thread vinay patil
Hi Guys, I am using DataStreamUtils for unit testing, the test case succeeds when it is run individually but I get the following error when all the tests are run: Serialization trace: fieldMap (org.apache.avro.Schema$RecordSchema) schema (org.apache.avro.generic.GenericData$Record) at org

Re: Possible conflict between in Flink connectors

2017-08-25 Thread Ted Yu
If Guava 18.0 is used to build hbase 1.3, there would be compilation errors such as the following: [ERROR] /mnt/disk2/a/1.3-h/hbase-server/src/main/java/org/ apache/hadoop/hbase/replication/regionserver/ReplicationSource.java:[271,25] error: cannot find symbol [ERROR] symbol: method stopAndWai

Re: akka timeout

2017-08-25 Thread Steven Wu
Bowen, Heap size is ~50G. CPU was actually pretty low (like <20%) when high GC pause and akka timeout was happening. So maybe memory allocation and GC wasn't really an issue. I also recently learned that JVM can pause for writing to GC log for disk I/O. that is another lead I am pursuing. Thanks,

Possible conflict between in Flink connectors

2017-08-25 Thread Federico D'Ambrosio
Hello everyone, I'm new to Flink and am encountering a nasty problem while trying to submit a streaming Flink Job. I'll try to explain it as thoroughly as possible. Premise: I'm using an HDP 2.6 hadoop cluster, with hadoop version 2.7.3.2.6.1.0-129, Flink compiled from sources accordingly (maven 3

Re: Reload DistributedCache file?

2017-08-25 Thread Aljoscha Krettek
Hi, You are right in your analysis that updating the entry in the DistributedCache will not in fact update the file on already running operators. The DistributedCache is only read when initially setting up operations. You might be able to use a file source (with CONTINUOUS processing mode) and

Re: stream partitioning to avoid network overhead

2017-08-25 Thread Aljoscha Krettek
Hi, Quick remark: operator chaining is only possible when the parallelism of the upstream and downstream operators is the same. So having the same parallelism is not the standard or desired way, it's the only way to achieve chaining. Best, Aljoscha > On 16. Aug 2017, at 18:26, Karthik Deivasig

Re: JobManager HA behind load balancer

2017-08-25 Thread Aljoscha Krettek
Hi Shannon, I think this will be reworked as part of the FLIP-6 efforts. Your problem comes up in Kubernetes where several JMs would sit behind a K8s service. I think the solution is for the JMs to act as proxies (they contact the JM leader, get the data and return to the client) instead of red

Re: Time zones problem

2017-08-25 Thread Aljoscha Krettek
Hi, I'm afraid you do indeed need to have a separate window for each timezone, yes. You can probably use an hourly window to pre-aggregate the results and then have specific daily windows for the different timezones with aggregate different (pre-aggregated) hourly results. Does that work for y

Re: Which window function to use to start a window at anytime

2017-08-25 Thread Aljoscha Krettek
Hi, I'm afraid this is not possible right now because it would require keeping state in the WindowAssigner (per key) about what the start timestamp for a specific key is. I think you could emulate that behaviour by having a stateful FlatMap that keeps track of all keys and their respective tim

Re: Avoiding duplicates in joined stream

2017-08-25 Thread Aljoscha Krettek
Hi, The problem with reduplication in a streaming pipeline is that you need to keep all data that you ever saw or do the de-duplication only on a window. You can do the first by writing a keyed FlatMap operation that keeps state and only emits an incoming element if it hasn't been seen so far.

Re: Distribute crawling of a URL list using Flink

2017-08-25 Thread Aljoscha Krettek
Hi, It is not available for the Batch API, you would have to use the DataStream API. Best, Aljoscha > On 15. Aug 2017, at 01:16, Kien Truong wrote: > > Hi, > > Admittedly, I have not suggested this because I thought it was not available > for batch API. > > Regards, > Kien > On Aug 15,

Re: Flink isn't logging once we use RollingFileAppender

2017-08-25 Thread Aljoscha Krettek
Hi, I just tried this with your settings and it worked on my machine. Can your Flink processes write to the specified location? I think /opt might not be writable by your users. Maybe try putting something like /tmp/flink.log there. Best, Aljoscha > On 9. Aug 2017, at 16:45, Hussein Baghdadi

Re: Question regarding check memory and CPU used for job

2017-08-25 Thread Aljoscha Krettek
Hi, This might be a bit late but I think you can only get that data from the Flink web dashboard or by manually checking on the machines. Best, Aljoscha > On 14. Jul 2017, at 20:16, Claire Yuan wrote: > > Hi all, > I am currently running a flink session on YARN and try to access the > coun

Re: trigger testing

2017-08-25 Thread Aljoscha Krettek
Hi, Could you please post a code example of how you set up your window/trigger? Maybe also with some more code around it? Best, Aljoscha > On 12. Jul 2017, at 14:31, jad mad wrote: > > I'm testing with ContinuousEventTimeTrigger with a TumblingWindow. > > let's say in time frame A, B, C ther

Re: How can I cancel a Flink job safely without a special stop message in the stream?

2017-08-25 Thread Aljoscha Krettek
Hi, I don't think implementing a StoppableSource is an option here since you want to use the Flink Kafka source. What is your motivation for this? Especially, why are you using Kafka if the input is bounded and you will shut down the job at some point? Also, regarding StoppableSource: this wil

Re: Apache beam and Flink

2017-08-25 Thread Aljoscha Krettek
Hi, It's currently not possible to mix-and-match and I doubt that this will be possible in the future. Because Beam is a portable API we try and ensure that no API of the underlying Runner can bleed through. Best, Aljoscha > On 13. Aug 2017, at 16:03, Ted Yu wrote: > > I found this: > http://

Re: Problem with Flink restoring from checkpoints

2017-08-25 Thread Aljoscha Krettek
Hi, Sorry for getting back to this so late but I think the underlying problem is that S3 does not behave as expected by the Bucketing Sink. See this message by Stephan on the topic: https://lists.apache.org/thread.html/34b8ede3affb965c7b5ec1e404918b39c282c258809dfd7a6c257a61@%3Cuser.flink.apach

Re: Flink doesn't free YARN slots after restarting

2017-08-25 Thread Bowen Li
Hi Till, What I mean is: can the sliding windows for different item have different start time? Here's an example of what we want: - for item A: its first event arrives at 2017/8/24-01:*12:24*, so the 1st window should be 2017/8/24-01:*12:24* - 2017/8/25-01:*12:23*, the 2nd window would be 2017/8/2

Re: Flink doesn't free YARN slots after restarting

2017-08-25 Thread Till Rohrmann
Hi Bowen, having a sliding window of one day with a slide of one hour basically means that each window is closed after 24 hours and the next closing happens one hour later. Only when the window is closed/triggered, you compute the window function which generates the window output. That's why you s