Re: Let BucketingSink roll file on each checkpoint

2018-03-20 Thread XilangYan
Thank you! Fabian HDFS small file problem can be avoid with big checkpoint interval. Meanwhile, there is potential data lose problem in current BucketingSink. Say we consume data in kafka, when checkpoint is requested, kafka offset is update, but in-progress file in BucketingSink is remained. If

Is Hadoop 3.0 integration planned?

2018-03-20 Thread Jayant Ameta
Jayant Ameta

Re: Restart hook and checkpoint

2018-03-20 Thread Ashish Pokharel
I definitely like the idea of event based checkpointing :) Fabian, I do agree with your point that it is not possible to take a rescue checkpoint consistently. The basis here however is not around the operator that actually failed. It’s to avoid data loss across 100s (probably 1000s of paralle

Re: CsvSink

2018-03-20 Thread karim amer
Never mind I found the error and has nothing to do with flink. Sorry On Tue, Mar 20, 2018 at 12:12 PM, karim amer wrote: > here is the output after fixing the scala issues > > https://gist.github.com/karimamer/9e3bcf0a6d9110c01caa2ebd14aa7a8c > > On Tue, Mar 20, 2018 at 11:39 AM, karim amer > w

Re: [ANNOUNCE] Weekly community update #12

2018-03-20 Thread Stephan Ewen
Great initiative, highly appreciated, Till! On Mon, Mar 19, 2018 at 7:06 PM, Till Rohrmann wrote: > Dear community, > > I've noticed that Flink has grown quite a bit in the past. As a > consequence it can be quite challenging to stay up to date. Especially for > community members who don't foll

Re: Strange behavior on filter, group and reduce DataSets

2018-03-20 Thread Stephan Ewen
To diagnose that, can you please check the following: - Change the Person data type to be immutable (final fields, no setters, set fields in constructor instead). Does that make the problem go away? - Change the Person data type to not be a POJO by adding a dummy fields that is never used, bu

Re: CsvSink

2018-03-20 Thread karim amer
here is the output after fixing the scala issues https://gist.github.com/karimamer/9e3bcf0a6d9110c01caa2ebd14aa7a8c On Tue, Mar 20, 2018 at 11:39 AM, karim amer wrote: > Never mind after importing > > import org.apache.flink.api.scala._ > > theses errors went away and i still have the original

Re: CsvSink

2018-03-20 Thread karim amer
Never mind after importing import org.apache.flink.api.scala._ theses errors went away and i still have the original problem. Sorry my bad On Tue, Mar 20, 2018 at 11:04 AM, karim amer wrote: > To clarify should i file a bug report on sbt hiding the errors in the > previous email ? > > On Tue,

Re: CsvSink

2018-03-20 Thread karim amer
To clarify should i file a bug report on sbt hiding the errors in the previous email ? On Tue, Mar 20, 2018 at 9:44 AM, karim amer wrote: > After switching to Maven from Sbt I got these errors > Error:(63, 37) could not find implicit value for evidence parameter of > type org.apache.flink.api.co

Re: CsvSink

2018-03-20 Thread karim amer
After switching to Maven from Sbt I got these errors Error:(63, 37) could not find implicit value for evidence parameter of type org.apache.flink.api.common.typeinfo.TypeInformation[org.apache.flink.quickstart.DataStreamtotableapi.Calls] val namedStream = dataStream.map((value:String) => { Er

Re: CsvSink

2018-03-20 Thread karim amer
Hi Fabian Sorry if i confused you The first error is from Nico's code Not my code or snippet I am still having the original problem in my snippet where it's writing a blank csv file even though i get [success] Total time: 26 s, completed Mar 20, 2018 9:28:06 AM After running the job Cheers, karim

Re: Flink CEP window for 1 working day

2018-03-20 Thread shishal
Thanks Fabian, So by non working day, I mean, I have a list of non working day in a year, which I can use to compare. I am very new to Flink and Flick CEP. Initially I thought there is a way to have within(time) value expression dynamically. So now I guess that's not possible. If I understand c

Re: Flink CEP window for 1 working day

2018-03-20 Thread Fabian Hueske
Hi, I'm afraid, Flink CEP does not distinguish work days from non-work days. Of course, you could implement the logic in a DataStream program (probably using ProcessFunction). Best, Fabian 2018-03-20 15:44 GMT+01:00 shishal : > I am using flink CEP , and to match a event pattern in given time w

Flink CEP window for 1 working day

2018-03-20 Thread shishal
I am using flink CEP , and to match a event pattern in given time window we use *.within(Time.days(1))* Now in one of the case I need to wait for 1 working day instead of 1 day. Is there any way to do that in Flink CEP? -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabb

Re: Strange behavior on filter, group and reduce DataSets

2018-03-20 Thread Fabian Hueske
Hi Simone and Flavio, I created FLINK-9031 [1] for this issue. Please have a look and add any detail that you think could help to resolve the problem. Thanks, Fabian [1] https://issues.apache.org/jira/browse/FLINK-9031 2018-03-19 16:35 GMT+01:00 simone : > Hi Fabian, > > This simple code repro

Re: Metric Registry Warnings

2018-03-20 Thread PedroMrChaves
I have multiple sources but with distinct names and UIDs. More information about my execution environment: Flink Version: 1.4.2 bundled with hadoop 2.8 State backend: Hadoop 2.8 Job compiled for version 1.4.2 using the Scala version libs from Scala version 2.11. Am using the com.github.davidb t

Re: Metric Registry Warnings

2018-03-20 Thread Chesnay Schepler
FLINK-7100 is about taskmanager metrics being registered twice, whereas here we're dealing with job metrics. Do you have multiple sources? If so, do they have unique names? On 20.03.2018 15:06, Fabian Hueske wrote: Hi Pedro, Can you reopen FLINK-7100 and post a comment with your error message

Re: Help Required for Log4J

2018-03-20 Thread Puneet Kinra
Hi Fabin thanks for reply I fixed the issue that i was facing. On Tue, Mar 20, 2018 at 7:31 PM, Fabian Hueske wrote: > Hi, > > TBH, I don't have much experience with logging, but you might want to > consider using Side Outputs [1] to route invalid records into a separate > stream. > The stream

Re: Error while reporting metrics - ConcorrentModificationException

2018-03-20 Thread Chesnay Schepler
A wrapped Kafka metric was accessing state of the consumer while said state was modified. As far as I can tell this is a Kafka issue and there's nothing we can do. Unless this happens frequently it should be safe to ignore it. On 20.03.2018 15:02, PedroMrChaves wrote: Hello, I have the follo

Re: Metric Registry Warnings

2018-03-20 Thread Fabian Hueske
Hi Pedro, Can you reopen FLINK-7100 and post a comment with your error message and environment? Thanks, Fabian 2018-03-20 14:58 GMT+01:00 PedroMrChaves : > Hello, > > I still have the same issue with Flink Version 1.4.2. > > java.lang.IllegalArgumentException: A metric named > .taskmanager.6aa8

Re: Help Required for Log4J

2018-03-20 Thread Fabian Hueske
Hi, TBH, I don't have much experience with logging, but you might want to consider using Side Outputs [1] to route invalid records into a separate stream. The stream can then separately handled, be written to files or Kafka or wherever. Best, Fabian [1] https://ci.apache.org/projects/flink/flink

Error while reporting metrics - ConcorrentModificationException

2018-03-20 Thread PedroMrChaves
Hello, I have the following error while trying to report metrics to influxdb using the DropwizardReporter. 2018-03-20 13:51:00,288 WARN org.apache.flink.runtime.metrics.MetricRegistryImpl - Error while reporting metrics java.util.ConcurrentModificationException at java.util.Lin

Re: Metric Registry Warnings

2018-03-20 Thread PedroMrChaves
Hello, I still have the same issue with Flink Version 1.4.2. java.lang.IllegalArgumentException: A metric named .taskmanager.6aa8d13575228d38ae4abdfb37fa229e.CDC.Source: EVENTS.1.numRecordsIn already exists at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91) at

Re: Updating Job dependencies without restarting Flink

2018-03-20 Thread Fabian Hueske
Hi, I'm quite sure that is not supported. You'd have to take a savepoint and restart the application. Depending on the sink system, you could start the new job before shutting the old job down. Best, Fabian 2018-03-20 10:31 GMT+01:00 Rohil Surana : > Hi, > > We have a lot of jobs on Flink clust

Re: Restart hook and checkpoint

2018-03-20 Thread Fabian Hueske
Well, that's not that easy to do, because checkpoints must be coordinated and triggered the JobManager. Also, the checkpointing mechanism with flowing checkpoint barriers (to ensure checkpoint consistency) won't work once a task failed because it cannot continue processing and forward barriers. If

Re: unable to addSource to StreamExecutionEnvironment?

2018-03-20 Thread Fabian Hueske
Thanks for reporting back! 2018-03-20 10:42 GMT+01:00 James Yu : > Just found out that IDE seems auto import wrong class. > While "org.apache.flink.streaming.api.datastream.DataStream" is required, > "org.apache.flink.streaming.api.scala.DataStream" was imported. > > This is a UTF-8 formatted mai

Re: Let BucketingSink roll file on each checkpoint

2018-03-20 Thread Fabian Hueske
Hi, The BucketingSink closes files once they reached a certain size (BatchSize) or have not been written to for a certain amount of time (InactiveBucketThreshold). While being written to, files are in an in-progress state and only moved to to completed once being closed. When that happens, other s

Flink remote debug not working

2018-03-20 Thread Ankit Chaudhary
Hey Guys, >From flink 1.4.+ onwards , I some how not able to use JVM args for remote debug, i.e., "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=". I am using: env.java.opts: "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=" in flink-conf.yaml. When I try to restart

Re: ListCheckpointed function - what happens prior to restoreState() being called?

2018-03-20 Thread Fabian Hueske
Hi Ken, The documentation page describes that first the state is restored / initialized and then the function's open() method is called [1]. I had a look at the code and it looks like the docs are correct [2] Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/internals/

Re: CsvSink

2018-03-20 Thread Fabian Hueske
Hi Karim, I cannot find a method invocation "tableEnv.registerDataStream("myTable2", set, 'A, 'B, 'C )" as shown in the error message in your example. It would help if you would keep error message and code consistent. Otherwise it's not possible to figure out what's going on. Best, Fabian 2018-0

Re: Kafka ProducerFencedException after checkpointing

2018-03-20 Thread Piotr Nowojski
Hi, Please increase transaction.timeout.ms to a greater value or decrease Flink’s checkpoint interval, I’m pretty sure the issue here is that those two values are overlapping. I think that’s even visible on the screenshots. First checkpoint completed started at 14:28:48 and ended at 14:30:43, w

Re: Kafka ProducerFencedException after checkpointing

2018-03-20 Thread Dongwon Kim
Hi Piotr, We have set producer's [transaction.timeout.ms] to 15 minutes and have used the default setting for broker (15 mins). As Flink's checkpoint interval is 15 minutes, it is not a situation where Kafka's timeout is smaller than Flink's checkpoint interval. As our first checkpoint just takes

Re: [ANNOUNCE] Apache Flink 1.3.3 released

2018-03-20 Thread Chesnay Schepler
Whoops, looks like we forgot to push the release button :) Thank you for notifying us. The artifacts should be available soon. On 20.03.2018 11:35, Philip Luppens wrote: Hi everyone, Thanks, but I don’t see the binaries for 1.3.3 being pushed anywhere in the Maven repositories [1]. Can we expe

Re: [ANNOUNCE] Apache Flink 1.3.3 released

2018-03-20 Thread Philip Luppens
Hi everyone, Thanks, but I don’t see the binaries for 1.3.3 being pushed anywhere in the Maven repositories [1]. Can we expect them to show up over there as well eventually? [1] https://repo.maven.apache.org/maven2/org/apache/flink/flink-java/ Kind regards, -Phil On Fri, Mar 16, 2018 at 3:36

Re: unable to addSource to StreamExecutionEnvironment?

2018-03-20 Thread James Yu
Just found out that IDE seems auto import wrong class. While "org.apache.flink.streaming.api.datastream.DataStream" is required, "org.apache.flink.streaming.api.scala.DataStream" was imported. This is a UTF-8 formatted mail --- James C.-C.Yu +88698871327

Help Required for Log4J

2018-03-20 Thread Puneet Kinra
Hi I have a use case in which i want to log bad records in the log file. I have configured the log4j property file is getting generated as well but it also going to flink logs as well i want to detach it from flink logs want to write to log file. .Here is configuration *(Note :AMSSource is the cu

Re: Kafka ProducerFencedException after checkpointing

2018-03-20 Thread Piotr Nowojski
Hi, What’s your Kafka’s transaction timeout setting? Please both check Kafka producer configuration (transaction.timeout.ms property) and Kafka broker configuration. The most likely cause of such error message is when Kafka's timeout is smaller then Flink’s checkpoint interval and transactions

Updating Job dependencies without restarting Flink

2018-03-20 Thread Rohil Surana
Hi, We have a lot of jobs on Flink cluster that are using some common dependencies and I wanted to know if there is a way to place those dependencies in the flink lib folder so they will be available to the applications without restarting the flink cluster, so that next time a job is started lates

Re: Migration to Flip6 Kubernetes

2018-03-20 Thread Till Rohrmann
Hi Edward, you're right that Flink's Kubernetes documentation has not been updated with respect to Flip-6. This will be one of the tasks during the Flink 1.5 release testing and is still pending. A Flink cluster can be run in two modes: session mode vs per-job mode. The former starts a cluster to

Re: pyflink not working

2018-03-20 Thread Chesnay Schepler
I've commented in the linked JIRA, let's move this discussion there. On 20.03.2018 10:00, Ganesh Manal wrote: Hi, Not able to execute the pyflink job using the pyflink script. Similar to already logged issue – https://issues.apache.org/jira/browse/FLINK-8909

RE: pyflink not working

2018-03-20 Thread Ganesh Manal
Forgot to mention, pyflink job is executing locally but not when executed with the yarn. Same is mentioned in - https://issues.apache.org/jira/browse/FLINK-8909 Thanks & Regards, Ganesh Manal From: Ganesh Manal Sent: Tuesday, March 20, 2018 2:31 PM To: user@fli

pyflink not working

2018-03-20 Thread Ganesh Manal
Hi, Not able to execute the pyflink job using the pyflink script. Similar to already logged issue – https://issues.apache.org/jira/browse/FLINK-8909 My question would be: how we will be able to execute the pyflink job? I am running flink-1.4.0. Thanks & Regards, Ganesh Manal

unable to addSource to StreamExecutionEnvironment?

2018-03-20 Thread James Yu
Hi, I am following the Taxi example provided on " http://training.data-artisans.com/exercises/taxiData.html";, however, I got the following error message when I copy addSource line into my Intellij IDE. error message --> Incompatible types. Required DataStream but 'addSource' was inferred to Data