Re: Connection refused while trying to query state

2019-07-02 Thread Kostas Kloudas
No problem! Glad I could help. Kostas On Tue, Jul 2, 2019 at 12:11 PM Avi Levi wrote: > No, it doesn't. Thanks for pointing it. I just noticed that I wasn't using > the proxy server address. > Thanks !!! > > > On Tue, Jul 2, 2019 at 12:16 PM Kostas Kloudas wrote: > >

Re: Connection refused while trying to query state

2019-07-02 Thread Kostas Kloudas
Hi Avi, Do you point the client to the correct address? That is, the one that the "Queryable State Proxy Server @ ..." log line reports? Cheers, Kostas On Sun, Jun 30, 2019 at 4:37 PM Avi Levi wrote: > Hi, > I am trying to query state (cluster 1.8.0 is running on my local machine) . > I do see in the logs

Re: [Discuss] Semantics of event time for state TTL

2019-04-08 Thread Kostas Kloudas
Hi all, For GDPR: I am not sure about the regulatory requirements of GDPR but I would assume that the time for deletion starts counting from the time an organisation received the data (i.e. the wall-clock ingestion time of the data), and not the "event time" of the data. In other case, an

Re: StreamingFileSink seems to be overwriting existing part files

2019-03-29 Thread Kostas Kloudas
No problem! Cheers, Kostas On Fri, Mar 29, 2019 at 4:38 PM Bruno Aranda wrote: > Hi Kostas, > > Put that way, sounds fair enough. Many thanks for the clarification, > > Cheers, > > Bruno > > On Fri, 29 Mar 2019 at 15:32, Kostas Kloudas wrote: > >>

Re: StreamingFileSink seems to be overwriting existing part files

2019-03-29 Thread Kostas Kloudas
Hi Bruno, This is the expected behaviour as the job starts "fresh", given that you did not specify any savepoint/checkpoint to start from. As for the note that "One would expect that it finds the last part and gets the next free number?", I am not sure how this can be achieved safely and

Re: ingesting time for TimeCharacteristic.IngestionTime on unit test

2019-03-25 Thread Kostas Kloudas
Hi Avi, Good to hear that! Cheers, Kostas On Mon, Mar 25, 2019 at 3:37 PM Avi Levi wrote: > Thanks, I'll check it out. I got a bit confused with the Ingesting time > equals to null in tests but all is ok now , I appreciate that > > On Mon, Mar 25, 2019 at 1:01 PM Kostas Kl

Re: ingesting time for TimeCharacteristic.IngestionTime on unit test

2019-03-25 Thread Kostas Kloudas
Hi Avi, Just to verify your ITCase, I wrote the following dummy example and it seems to be "working" (ie. I can see non null timestamps and timers firing). StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
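The snippet above is cut off after the environment setup. A minimal, self-contained sketch of the kind of dummy ingestion-time job described (Flink 1.6–1.8-era APIs; the elements and job name are illustrative) might look like:

```java
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class IngestionTimeSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // With IngestionTime, sources attach wall-clock timestamps automatically,
        // so element timestamps are non-null and event-time timers can fire.
        env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);

        env.fromElements("a", "b", "c")
           .process(new ProcessFunction<String, String>() {
               @Override
               public void processElement(String value, Context ctx, Collector<String> out) {
                   // ctx.timestamp() is non-null here because ingestion time assigned one at the source
                   out.collect(value + " @ " + ctx.timestamp());
               }
           })
           .print();

        env.execute("ingestion-time-sketch");
    }
}
```

In a plain unit test that calls `processElement` directly (outside a running job), no timestamp is ever assigned, which matches the null-timestamp confusion discussed in this thread.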

Re: Streaming Protobuf into Parquet file not working with StreamingFileSink

2019-03-21 Thread Kostas Kloudas
Hi Rafi, Piotr is correct. In-progress files are not necessarily readable. The valid files are the ones that are "committed" or finalized. Cheers, Kostas On Thu, Mar 21, 2019 at 2:53 PM Piotr Nowojski wrote: > Hi, > > I’m not sure, but shouldn’t you be just reading committed files and ignore

Re: Timer question

2019-03-07 Thread Kostas Kloudas
, 2019 at 11:27 AM Kostas Kloudas wrote: > >> Hi Flavio, >> >> In general, deleting the redundant timers is definitely more >> memory-friendly. >> The reason why in the docs the code is presented the way it is, is: >> 1) it is mainly for pedagogical purpos

Re: Timer question

2019-03-07 Thread Kostas Kloudas
Hi Flavio, In general, deleting the redundant timers is definitely more memory-friendly. The reason the code is presented the way it is in the docs is that: 1) it is mainly for pedagogical purposes, and 2) when the docs were written, Flink's mechanism for deleting timers was not as efficient as it
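The memory-friendly variant mentioned here — keeping a single pending timer per key and deleting the stale one when a new element arrives — can be sketched roughly as follows (Flink 1.7-era APIs; the 60-second delay and state name are illustrative):

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Sketch: one live timer per key instead of one timer per element.
public class SingleTimerFunction extends KeyedProcessFunction<String, String, String> {
    private transient ValueState<Long> pendingTimer;

    @Override
    public void open(Configuration parameters) {
        pendingTimer = getRuntimeContext().getState(
            new ValueStateDescriptor<>("pendingTimer", Long.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        long newDeadline = ctx.timerService().currentProcessingTime() + 60_000L;
        Long old = pendingTimer.value();
        if (old != null) {
            // delete the now-redundant timer instead of letting it accumulate in state
            ctx.timerService().deleteProcessingTimeTimer(old);
        }
        ctx.timerService().registerProcessingTimeTimer(newDeadline);
        pendingTimer.update(newDeadline);
    }
}
```

The extra `ValueState<Long>` is the cost of remembering which timer to delete; for most jobs that is far cheaper than accumulating one timer per element.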

Re: S3 parquet sink - failed with S3 connection exception

2019-03-05 Thread Kostas Kloudas
Hi Averell, Did you have other failures before (from which you managed to resume successfully)? Can you share a bit more details about your job and potentially the TM/JM logs? The only thing I found about this is here https://forums.aws.amazon.com/thread.jspa?threadID=130172 but Flink does not

Re: StreamingFileSink on EMR

2019-02-26 Thread Kostas Kloudas
Hi Kevin, I cannot find anything obviously wrong from what you describe. Just to eliminate the obvious, you are specifying "hdfs" as the scheme for your file path, right? Cheers, Kostas On Tue, Feb 26, 2019 at 3:35 PM Till Rohrmann wrote: > Hmm good question, I've pulled in Kostas who worked

Re: StreamingFileSink causing AmazonS3Exception

2019-02-18 Thread Kostas Kloudas
>> This whole thing is documented here: >> https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/best-practices.html >> >> However, I found that just using the documented property didn't appear to >> work and I had to wrap the InputStream in the BufferedInputStr

Re: fllink 1.7.1 and RollingFileSink

2019-02-14 Thread Kostas Kloudas
> } > @Override > public boolean shouldRollOnEve

Re: [ANNOUNCE] New Flink PMC member Thomas Weise

2019-02-12 Thread Kostas Kloudas
long time contributor and member of our community. >>>> He is starting and participating in lots of discussions on our mailing >>>> lists, working on topics that are of joint interest of Flink and Beam, and >>>> giving talks on Flink at many events. >>>>

Re: Exactly Once Guarantees with StreamingFileSink to S3

2019-02-07 Thread Kostas Kloudas
restore as well, which > led me to believe that we were only calling commit and not > commitAfterRecovery. > > Thanks for the clarification! > -Kaustubh > > On Wed, Feb 6, 2019 at 2:16 AM Kostas Kloudas wrote: > >> Hi Kaustubh, >> >> Your general understanding

Re: Avro serialization and deserialization to Kafka in Scala

2019-02-07 Thread Kostas Kloudas
Hi Wouter, I think Gordon or Igal are the best to answer this question. Cheers, Kostas On Thu, Feb 7, 2019 at 11:04 AM Wouter Zorgdrager wrote: > Hello all, > > > I saw the recent updates in Flink related to supporting Avro schema > evolution in state. I'm curious how Flink handles this

Re: Flink and S3 AWS keys rotation

2019-02-07 Thread Kostas Kloudas
Hi Antonio, I am cc'ing Till who may have something to say on this. Cheers, Kostas On Thu, Feb 7, 2019 at 1:32 PM Antonio Verardi wrote: > Hi there, > > I'm trying out to run Flink on Kubernetes and I run into a problem with > the way Flink sets up AWS credentials to talk with S3 and the way

Re: Exactly Once Guarantees with StreamingFileSink to S3

2019-02-06 Thread Kostas Kloudas
Hi Kaustubh, Your general understanding is correct. In this case though, the sink will call the S3Committer#commitAfterRecovery() method. This method, after failing to commit the MPU, will check if the file is there and if the length is correct, and if everything is ok (which is the case in

Re: Is there a way to find the age of an element in a Global window?

2019-01-18 Thread Kostas Kloudas
Hi Harshith, The evictor has 2 methods: void evictBefore(Iterable> elements, int size, W window, EvictorContext evictorContext); void evictAfter(Iterable> elements, int size, W window, EvictorContext evictorContext); In the iterables, you have access to the elements and their timestamps, and the
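Using the timestamps exposed through those iterables, an age-based eviction for a global window can be sketched as below (a hypothetical evictor, not Flink-provided; the max-age policy is illustrative):

```java
import java.util.Iterator;
import org.apache.flink.streaming.api.windowing.evictors.Evictor;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
import org.apache.flink.streaming.runtime.operators.windowing.TimestampedValue;

// Drops elements older than maxAgeMillis before the window function runs.
public class AgeEvictor<T> implements Evictor<T, GlobalWindow> {
    private final long maxAgeMillis;

    public AgeEvictor(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    @Override
    public void evictBefore(Iterable<TimestampedValue<T>> elements, int size,
                            GlobalWindow window, EvictorContext ctx) {
        long cutoff = ctx.getCurrentProcessingTime() - maxAgeMillis;
        for (Iterator<TimestampedValue<T>> it = elements.iterator(); it.hasNext(); ) {
            if (it.next().getTimestamp() < cutoff) {
                it.remove(); // element exceeded the allowed age
            }
        }
    }

    @Override
    public void evictAfter(Iterable<TimestampedValue<T>> elements, int size,
                           GlobalWindow window, EvictorContext ctx) {
        // no-op: we only evict before the window function is applied
    }
}
```

The element's age is simply the difference between the current time and the timestamp carried in `TimestampedValue`, which answers the original question for elements sitting in a global window.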

Re: StreamingFileSink cannot get AWS S3 credentials

2019-01-16 Thread Kostas Kloudas
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems.html#built-in-file-systems > [2] https://issues.apache.org/jira/browse/FLINK-10383 > > Cheers, > Till > > On Wed, Jan 16, 2019 at 10:10 AM Kostas Kloudas > wrote: > >> Hi Taher, >> >

Re: StreamingFileSink cannot get AWS S3 credentials

2019-01-16 Thread Kostas Kloudas
Hi Taher, So you are using the same configuration files and everything and the only thing you change is the "s3://" to "s3a://" and the sink cannot find the credentials? Could you please provide the logs of the Task Managers? Cheers, Kostas On Wed, Jan 16, 2019 at 9:13 AM Dawid Wysakowicz

Re: SteamingFileSink with TwoPhaseCommit

2019-01-10 Thread Kostas Kloudas
Thanks you for the clarification, also can you please point > how StreamingFileSink uses TwoPhaseCommit. Can you also point out the > implementing class for that? > > > Regards, > Taher Koitawala > GS Lab Pune > +91 8407979163 > > > On T

Re: Multiple MapState vs single nested MapState in stateful Operator

2019-01-10 Thread Kostas Kloudas
Hi Gagan, I agree with Congxian! In MapState, when accessing the state/value associated with a key in the map, then the whole value is de-serialized (and serialized in case of a put()). Given this, it is more efficient to have many keys, with small state, than fewer keys with huge state. Cheers,
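The "many small keys" advice follows from MapState (de)serializing one entry per access. A sketch of the preferred layout (the counting logic and state name are illustrative):

```java
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// One small MapState entry per item, rather than one huge value holding everything.
public class CountPerItem extends KeyedProcessFunction<String, String, String> {
    private transient MapState<String, Long> counts;

    @Override
    public void open(Configuration parameters) {
        counts = getRuntimeContext().getMapState(
            new MapStateDescriptor<>("counts", String.class, Long.class));
    }

    @Override
    public void processElement(String item, Context ctx, Collector<String> out) throws Exception {
        // Only the single entry for `item` is deserialized on get() and
        // re-serialized on put() -- not the whole map. This is why many
        // small entries are cheaper than few entries with huge values.
        Long current = counts.get(item);
        counts.put(item, current == null ? 1L : current + 1L);
    }
}
```

The anti-pattern would be a `ValueState<Map<String, Long>>`, where every access pays the serialization cost of the entire map.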

Re: [DISCUSS] Dropping flink-storm?

2019-01-10 Thread Kostas Kloudas
+1 to drop as well. On Thu, Jan 10, 2019 at 10:15 AM Ufuk Celebi wrote: > +1 to drop. > > I totally agree with your reasoning. I like that we tried to keep it, > but I don't think the maintenance overhead would be justified. > > – Ufuk > > On Wed, Jan 9, 2019 at 4:09 PM Till Rohrmann wrote: >

Re: SteamingFileSink with TwoPhaseCommit

2019-01-10 Thread Kostas Kloudas
Hi Taher, The StreamingFileSink implements a version of TwoPhaseCommit. Can you elaborate a bit on what you mean by "TwoPhaseCommit is not being used"? Cheers, Kostas On Thu, Jan 10, 2019 at 9:29 AM Taher Koitawala wrote: > Hi All, > As per my understanding and the API of

Re: S3 StreamingFileSink never completes multipart uploads

2019-01-07 Thread Kostas Kloudas
ave > worded it better that it is one possible explanation. > > Sorry for the confusion! > > Addison > > > > > > On Fri, Jan 4, 2019 at 11:24 PM Kostas Kloudas wrote: > >> Hi Addison, >> >> From the information that Nick provides, how can you

Re: S3 StreamingFileSink never completes multipart uploads

2019-01-04 Thread Kostas Kloudas
Hi Addison, From the information that Nick provides, how can you be sure that the root cause is the same? Cheers, Kostas On Fri, Jan 4, 2019, 22:10 Addison Higham Hi Nick, > > This is a known issue with 1.7.0, I have an issue opened up here: > https://issues.apache.org/jira/browse/FLINK-11187

Re: StreamingFileSink causing AmazonS3Exception

2018-12-14 Thread Kostas Kloudas
Hi Steffen, Thanks for reporting this. Internally Flink does not keep any open connections to S3. It only buffers data internally until it reaches a min-size limit (by default 5MB) and then uploads it as a part of an MPU in one go. Given this, I will have to dig a bit deeper

Re: Stream in loop and not getting to sink (Parquet writer )

2018-12-03 Thread Kostas Kloudas
I guess I am doing something wrong there . if there is a > good example for that - it will be great . > > BR > Avi > > On Mon, Dec 3, 2018 at 4:11 PM Kostas Kloudas > wrote: > >> Hi Avi, >> >> For Bulk Formats like Parquet, unfortunately, we do not suppor

Re: Stream in loop and not getting to sink (Parquet writer )

2018-12-03 Thread Kostas Kloudas
. Cheers, Kostas On Sun, Dec 2, 2018 at 9:29 PM Avi Levi wrote: > Thanks Kostas. I will definitely look into that. but is the > StreamingFileSink also support setting the batch size by size and/or by > time interval like bucketing sink ? > > On Sun, Dec 2, 2018 at 5:09 PM Kostas K

Re: Stream in loop and not getting to sink (Parquet writer )

2018-12-02 Thread Kostas Kloudas
is is 400 MB, > bucketingSink.setBatchRolloverInterval(20 * 60 * 1000); // this is 20 mins > > > On Fri, Nov 30, 2018 at 3:59 PM Kostas Kloudas < > k.klou...@data-artisans.com> wrote: > >> And for a Java example which is actually similar to your pipeline, >> you

Re: Stream in loop and not getting to sink (Parquet writer )

2018-11-30 Thread Kostas Kloudas
And for a Java example which is actually similar to your pipeline, you can check the ParquetStreamingFileSinkITCase. On Fri, Nov 30, 2018 at 2:39 PM Kostas Kloudas wrote: > Hi Avi, > > At a first glance I am not seeing anything wrong with your code. > Did you verify that there

Re: Stream in loop and not getting to sink (Parquet writer )

2018-11-30 Thread Kostas Kloudas
> genericReocrd.put("ts", r.ts) > genericReocrd > } > stream.addSink { r => > println(s"In Sink $r") //getting this line > streamingSink > } > env.execute() > } > > Cheers > Avi > > On Thu, Nov 29, 20

Re: Stream in loop and not getting to sink (Parquet writer )

2018-11-29 Thread Kostas Kloudas
close > } > in this case only the first record will be included in the file but not > the rest of the stream. > > > On Thu, Nov 29, 2018 at 11:07 AM Kostas Kloudas < > k.klou...@data-artisans.com> wrote: > >> Hi again Avi, >> >> In the first example tha

Re: Looking for relevant sources related to connecting Apache Flink and Edgent.

2018-11-29 Thread Kostas Kloudas
to implement a custom source. Cheers, Kostas On Thu, Nov 29, 2018 at 11:08 AM Kostas Kloudas wrote: > Hi Felipe, > > This seems related to your previous question about a custom scheduler that > knows which task to run on which machine. > As Chesnay said, this is a rather involved and

Re: Looking for relevant sources related to connecting Apache Flink and Edgent.

2018-11-29 Thread Kostas Kloudas
Hi Felipe, This seems related to your previous question about a custom scheduler that knows which task to run on which machine. As Chesnay said, this is a rather involved and laborious task, if you want to do it as a general framework. But if you know what operation to push down, then why not

Re: Memory does not be released after job cancellation

2018-11-29 Thread Kostas Kloudas
Hi Nastaran, Can you specify what more information you need? From the discussion that you posted: 1) If you have batch jobs, then Flink does its own memory management (outside the heap, so it is not subject to JVM's GC) and although when you cancel the job, you do not see the memory

Re: number of files in checkpoint directory grows endlessly

2018-11-29 Thread Kostas Kloudas
Hi Bernd, I think Till, Stefan or Stephan (cc'ed) are the best to answer your question. Cheers, Kostas

Re: Stream in loop and not getting to sink (Parquet writer )

2018-11-29 Thread Kostas Kloudas
Hi again Avi, In the first example that you posted (the one with the Kafka source), do you call env.execute()? Cheers, Kostas On Thu, Nov 29, 2018 at 10:01 AM Kostas Kloudas wrote: > Hi Avi, > > In the last snippet that you posted, you have not activated checkpoints. > &

Re: Stream in loop and not getting to sink (Parquet writer )

2018-11-29 Thread Kostas Kloudas
Hi Avi, In the last snippet that you posted, you have not activated checkpoints. Checkpoints are needed for the StreamingFileSink to produce results, especially in the case of BulkWriters (like Parquet) where the part file is rolled upon reception of a checkpoint and the part is finalised (i.e.
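The fix described here — enabling checkpointing so that bulk-format part files get rolled and finalized — can be sketched as a fragment (the `stream` variable, output path, Avro `schema`, and 60-second interval are illustrative placeholders; `ParquetAvroWriters` comes from the `flink-parquet` module):

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Without checkpointing, a BulkWriter (e.g. Parquet) part file is never
// finalized, so the sink appears to produce no output.
env.enableCheckpointing(60_000); // parts are rolled and committed on each checkpoint

StreamingFileSink<GenericRecord> sink = StreamingFileSink
    .forBulkFormat(new Path("hdfs:///data/out"),
                   ParquetAvroWriters.forGenericRecord(schema))
    .build();

stream.addSink(sink);
```

The checkpoint interval effectively becomes the rolling interval for bulk formats, which is also why finer-grained rolling policies are not available for them.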

Re: The best way to get processing time of each operator?

2018-10-23 Thread Kostas Kloudas
Hi Folani, Metrics is definitely one way, while the other can be that, depending on your job, if you have e.g. processFunctions, you can always attach different timestamps (depending on what you want to measure) and based on these, do the computations you need. Based on this you can for example

Re: FlinkKafkaProducer and Confluent Schema Registry

2018-10-22 Thread Kostas Kloudas
Hi Olga, Sorry for the late reply. I think that Gordon (cc’ed) could be able to answer your question. Cheers, Kostas > On Oct 13, 2018, at 3:10 PM, Olga Luganska wrote: > > Any suggestions? > > Thank you > > Sent from my iPhone > > On Oct 9, 2018, at 9:28 PM, Olga Luganska

Re: Are savepoints / checkpoints co-ordinated?

2018-10-22 Thread Kostas Kloudas
Hi Anand, Did the suggestion solve your issue? Essentially when you cancel with savepoint, as Congxian suggested, you stop emitting checkpoints, but data keep flowing from the source to the sink. So if you do not set the producer to exactly once, you will almost certainly end up with

Re: Mapstatedescriptor

2018-10-22 Thread Kostas Kloudas
Hi Szymon, Dominik is right. The “name” refers to the whole state described by the specified descriptor. Kostas > On Oct 13, 2018, at 10:30 AM, Dominik Wosiński wrote: > > Hey, > It's the name for the whole descriptor. Not the keys, it means that no other > descriptor should be created with

Re: Error restoring from savepoint while there's no modification to the job

2018-10-15 Thread Kostas Kloudas
Hi Averell, This could be the root cause of your problem! Thanks for digging into it. Would it be possible for you to verify that this is your problem by manually setting the UUID and seeing if the problem disappears? In addition, please file a JIRA. Thanks a lot, Kostas > On Oct 15, 2018,

Re: Flink 1.4: Queryable State Client

2018-10-15 Thread Kostas Kloudas
Hi Seye, Thanks for digging into the problem. As Vino and Jorn suggested, this looks like a bug and please file a JIRA issue. It would be also nice if you could post it here so that we know the related discussion. Cheers, Kostas > On Oct 14, 2018, at 9:46 AM, Jörn Franke wrote: > > You

Re: [BucketingSink] notify on moving into pending/ final state

2018-10-12 Thread Kostas Kloudas
Hi Rinat, I have commented on your PR and on the JIRA. Let me know what you think. Cheers, Kostas > On Oct 11, 2018, at 4:45 PM, Dawid Wysakowicz wrote: > > Hi Ribat, > I haven't checked your PR but we introduced a new connector in flink 1.6 > called StreamingFileSink that is supposed to

Re: Error restoring from savepoint while there's no modification to the job

2018-10-10 Thread Kostas Kloudas
You restore your job with the custom source from a savepoint taken without the custom source? > On Oct 10, 2018, at 11:34 AM, Averell wrote: > > Hi Kostas, > > Yes, I modified ContinuousFileMonitoringFunction to add one more > ListState. The error might/should have come from that, but I

Re: Error restoring from savepoint while there's no modification to the job

2018-10-10 Thread Kostas Kloudas
Hi Averell, In the logs there are some “Split Reader: Custom File Source:” This is a custom source you implemented? Also is your keySelector deterministic with proper equals and hashcode methods? Cheers, Kostas > On Oct 10, 2018, at 10:50 AM, Averell wrote: > > Hi Stefan, Dawid, > > I

Re: BroadcastStream vs Broadcasted DataStream

2018-10-09 Thread Kostas Kloudas
Hi Pieter-Jan, The second variant stores the elements of the broadcasted stream in operator (thus non-keyed) state. On the differences: The Broadcast stream is not a keyed stream, so you are not in a keyed context, thus you have no access to keyed state. Given this, and assuming that you are

Re: flink memory management / temp-io dir question

2018-10-08 Thread Kostas Kloudas
Sorry, I forgot to cc’ Till. > On Oct 8, 2018, at 2:17 PM, Kostas Kloudas > wrote: > > Hi Anand, > > I think that Till is the best person to answer your question. > > Cheers, > Kostas > >> On Oct 5, 2018, at 3:44 PM, anand.gopin...@ubs.com >

Re: flink memory management / temp-io dir question

2018-10-08 Thread Kostas Kloudas
Hi Anand, I think that Till is the best person to answer your question. Cheers, Kostas > On Oct 5, 2018, at 3:44 PM, anand.gopin...@ubs.com wrote: > > Hi , > I had a question with respect flink memory management / overspill to /tmp. > > In the docs >

Re: error in using kafka in flink

2018-10-08 Thread Kostas Kloudas
Hi Marzieh, This is because of a mismatch between your Kafka version and the one your job assumes (0.8). You should use an older Kafka version (0.8) for the job to run out-of-the-box or update your job to use FlinkKafkaProducer011. Cheers, Kostas > On Oct 6, 2018, at 2:13 PM, marzieh

Re: Streaming to Parquet Files in HDFS

2018-10-07 Thread Kostas Kloudas
Hi, Yes, please enable DEBUG logging for the streaming package to see all the logs, also from the StreamTask. A checkpoint is “valid” as soon as it gets acknowledged. As the documentation says, the job will restart from “the last **successful** checkpoint”, which is the most recent acknowledged one. Cheers, Kostas

Re: Streaming to Parquet Files in HDFS

2018-10-07 Thread Kostas Kloudas
. Kostas > On Oct 7, 2018, at 12:37 PM, Kostas Kloudas > wrote: > > Hi Averell, > > Could you set your logging to DEBUG? > This may shed some light on what is happening as it will contain more logs. > > Kostas > >> On Oct 7, 2018, at 11:03 AM, Averell wrote

Re: Streaming to Parquet Files in HDFS

2018-10-07 Thread Kostas Kloudas
Hi Averell, Could you set your logging to DEBUG? This may shed some light on what is happening as it will contain more logs. Kostas > On Oct 7, 2018, at 11:03 AM, Averell wrote: > > Hi Kostas, > > I'm using a build with your PR. However, it seemed the issue is not with S3, > as when I tried

Re: Streaming to Parquet Files in HDFS

2018-10-07 Thread Kostas Kloudas
Hi Averell, From the logs, only checkpoint 2 was acknowledged (search for “Received completion notification for checkpoint with id=”) and this is why no more files are finalized. So only checkpoint 2 was successfully completed. BTW, are you using the PR you mentioned before or Flink 1.6? I am

Re: Streaming to Parquet Files in HDFS

2018-10-05 Thread Kostas Kloudas
Hi Averell, There is no such “out-of-the-box” solution, but there is an open PR for adding S3 support to the StreamingFileSink [1]. Cheers, Kostas [1] https://github.com/apache/flink/pull/6795 > On Oct 5, 2018, at 11:14 AM, Averell wrote: > > Hi

Re: Streaming to Parquet Files in HDFS

2018-10-05 Thread Kostas Kloudas
Hi Averell, You are right that for Bulk Formats like Parquet, we roll on every checkpoint. This is currently a limitation that has to do with the fact that bulk formats gather and rely on metadata that they keep internally and which we cannot checkpoint in Flink, as they do not expose it.

Re: [DISCUSS] Dropping flink-storm?

2018-09-29 Thread Kostas Kloudas
+1 to drop it as nobody seems to be willing to maintain it and it also stands in the way for future developments in Flink. Cheers, Kostas > On Sep 29, 2018, at 8:19 AM, Tzu-Li Chen wrote: > > +1 to drop it. > > It seems few people use it. Commits history of an experimental > module sparse

Re: Metrics: (1) Histogram in prometheus and (2) meter by event_time

2018-09-27 Thread Kostas Kloudas
Hi Averell, > On Sep 27, 2018, at 3:09 PM, Averell wrote: > > Hi Kostas, > > Yes, I want them as metrics, as they are purely for monitoring purpose. > There's no need of fault tolerance. > > If I use side-output, for example for that metric no.1, I would need a > tumbling AllWindowFunction,

Re: New received containers silently lost during job auto restarts

2018-09-27 Thread Kostas Kloudas
Hi Paul, I am also cc’ing Till and Gary who may be able to help, but to give them more information, it would help if you told us which Flink version you are using. Cheers, Kostas > On Sep 27, 2018, at 1:24 PM, Paul Lam wrote: > > Hi, > > One of my Flink on YARN jobs got into a weird

Re: Question about sharing resource among slots with a TM

2018-09-27 Thread Kostas Kloudas
Hi Vishal, Currently there is no way to share (user-defined) resources between tasks on the same TM. So I suppose that a singleton is the best way to go for now. Cheers, Kostas > On Sep 27, 2018, at 3:43 AM, Hequn Cheng wrote: > > Hi vishal, > > Yes, we can define a static connection to

Re: Metrics: (1) Histogram in prometheus and (2) meter by event_time

2018-09-27 Thread Kostas Kloudas
Hi Averell, From what I understand for your use case, it is possible to do what you want with Flink. If you are implementing a function, then you have access to the metric system through the runtime context (see [1] for more information). Some things to take into consideration: 1) Metrics

Re: Scheduling sources

2018-09-26 Thread Kostas Kloudas
Hi Averell, If the 2a fits in memory, then you can load the data to all TMs in the open() method of any rich function, eg. ProcessFunction [1]. The open() runs before any data is allowed to flow in your pipeline from the sources. Cheers, Kostas [1]
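Loading the side data set in `open()`, as suggested above, can be sketched like this (the `loadFromFile` helper and the reference-data path are hypothetical placeholders):

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Sketch: load a small in-memory lookup data set once per parallel task,
// before any records are allowed to flow through the pipeline.
public class EnrichFunction extends RichFlatMapFunction<String, String> {
    private transient Map<String, String> lookup;

    @Override
    public void open(Configuration parameters) throws Exception {
        // open() runs on every parallel instance before processing starts
        lookup = loadFromFile("/path/to/reference-data");
    }

    @Override
    public void flatMap(String value, Collector<String> out) {
        String enriched = lookup.getOrDefault(value, "unknown");
        out.collect(value + "," + enriched);
    }

    private Map<String, String> loadFromFile(String path) {
        // placeholder: read the reference data set into memory
        return new HashMap<>();
    }
}
```

Note that each task manager slot holds its own copy of the map, so this only works while the side data set fits comfortably in memory, as the thread assumes.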

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-25 Thread Kostas Kloudas
I see, Thanks for the clarification. Cheers, Kostas > On Sep 25, 2018, at 8:51 AM, Averell wrote: > > Hi Kostas, > > I use PROCESS_CONTINUOUSLY mode, and checkpoint interval of 20 minutes. When > I said "Within that 15 minutes, checkpointing process is not triggered > though" in my previous

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-24 Thread Kostas Kloudas
Hi Averell, Happy to hear that the problem is no longer there and if you have more news from your debugging, let us know. The thing that I wanted to mention is that from what you are describing, the problem does not seem to be related to checkpointing, but to the fact that applying your

Re: Setting a custom Kryo serializer in Flink-Python

2018-09-17 Thread Kostas Kloudas
Hi Joe, Probably Chesnay (cc’ed) may have a better idea on why this is happening. Cheers, Kostas > On Sep 14, 2018, at 7:30 PM, Joe Malt wrote: > > Hi, > > I'm trying to write a Flink job (with the Python streaming API) that handles > a custom type that needs a custom Kryo serializer. > >

Re: Question regarding state cleaning using timer

2018-09-17 Thread Kostas Kloudas
Hi Bhaskar, If you want different TTLs per key, then you should use timers with a process function as shown in [1]. This is though an old presentation, so now the RichProcessFunction is a KeyedProcessFunction. Also please have a look at the training material in [2] and the process function
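A per-key TTL with timers, as recommended here, can be sketched as follows (the `ttlForKey` policy and state name are illustrative placeholders):

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Sketch: each key clears its own state after a key-specific TTL.
public class PerKeyTtlFunction extends KeyedProcessFunction<String, String, String> {
    private transient ValueState<String> payload;

    @Override
    public void open(Configuration parameters) {
        payload = getRuntimeContext().getState(
            new ValueStateDescriptor<>("payload", String.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        payload.update(value);
        // register an expiration timer with a TTL chosen per key
        ctx.timerService().registerProcessingTimeTimer(
            ctx.timerService().currentProcessingTime() + ttlForKey(value));
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
        payload.clear(); // TTL reached: drop the state for this key
    }

    private long ttlForKey(String value) {
        return 60_000L; // placeholder: per-key TTL policy goes here
    }
}
```

This gives finer control than the built-in state TTL, at the cost of managing the timers yourself (see the previous "Timer question" threads about deleting redundant ones).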

Re: questions about YARN deployment and HDFS integration

2018-09-17 Thread Kostas Kloudas
Hi Chiang, Some of the answers you can find in line: > On Sep 17, 2018, at 3:47 PM, Chang Liu wrote: > > Dear All, > > I am helping my team setup a Flink cluster and we would like to have high > availability and easy to scale. > > We would like to setup a minimal cluster environment but can

Re: ListState - elements order

2018-09-17 Thread Kostas Kloudas
Oops, Sorry but I lost part of the discussion that had already been made. Please ignore my previous answer. Kostas > On Sep 17, 2018, at 4:37 PM, Kostas Kloudas wrote: > Hi all, > > Flink does not provide any guarantees about the order of the elements in a

Re: ListState - elements order

2018-09-17 Thread Kostas Kloudas
Hi all, Flink does not provide any guarantees about the order of the elements in a list and it leaves it to the state-backends. This means that semantics between different backends may differ, and even if something holds now for one of them, if RocksDB or a filesystem decides to change its

Re: Lookup from state

2018-09-17 Thread Kostas Kloudas
Hi Taher, To understand your use case, you have something like the following: stream1.keyBy(…) .connect(stream2.keyBy(…)) .window(…).apply(MyWindowFunction) and you want from within the MyWindowFunction to access the state for a FIRED window when a late element arrives for that

Re: Flink 1.3.2 RocksDB map segment fail if configured as state backend

2018-09-17 Thread Kostas Kloudas
Hi Andrea, I think that Andrey and Stefan (cc’ed) may be able to help. Kostas > On Sep 17, 2018, at 11:37 AM, Andrea Spina wrote: > > Hi everybody, > > I run with a Flink 1.3.2 installation on a Red Hat Enterprise Linux Server > and I'm not able to set rocksdb as state.backend due to this

Re: Problem with querying state on Flink 1.6.

2018-09-12 Thread Kostas Kloudas
ml > > https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/queryable_state.html > [2] https://github.com/jolson787/qs > On Mon, Sep 10, 2018 at 7:13 AM Kostas Kloudas wrot

Re: Problem with querying state on Flink 1.6.

2018-09-10 Thread Kostas Kloudas
Hi Joe, Did the problem get resolved at the end? Thanks, Kostas > On Aug 30, 2018, at 9:06 PM, Eron Wright wrote: > > I took a brief look as to why the queryable state server would bind to the > loopback address. Both the qs server and the >

Re: Question about QueryableState

2018-08-27 Thread Kostas Kloudas
> On Thu, Aug 23, 2018 at 14:31, Kostas Kloudas wrote: > Hi Pierre, > > You are right that this should not happen. > It seems like a bug. > Could you open a JIRA and post it here? > > Thanks, > Kostas >

Re: Question about QueryableState

2018-08-23 Thread Kostas Kloudas
Hi Pierre, You are right that this should not happen. It seems like a bug. Could you open a JIRA and post it here? Thanks, Kostas > On Aug 21, 2018, at 9:35 PM, Pierre Zemb wrote: > > Hi! > > I’ve started to deploy a small Flink cluster (4tm and 1jm for now on 1.6.0), > and deployed a small

Re: Some questions about the StreamingFileSink

2018-08-22 Thread Kostas Kloudas
Hi Benoit, Thanks for using the StreamingFileSink. My answers/explanations are inlined. In most of your observations, you are correct. > On Aug 21, 2018, at 11:45 PM, Benoit MERIAUX wrote: > > Hi, > > I have some questions about the new StreamingFileSink in 1.6. > > My usecase is pretty

Re: Slide Window Compute Optimization

2018-07-05 Thread Kostas Kloudas
Hi, You are correct that with sliding windows you will have 3600 “open windows” at any point. Could you describe a bit more what you want to do? If you simply want to have an update of something like a counter every second, then you can implement your own logic with a ProcessFunction that

Re: [Flink-Forward]Why cant get video of 2018 forward in youtube?

2018-06-14 Thread Kostas Kloudas
Hi Aitozi, You can find the videos here: https://data-artisans.com/flink-forward-san-francisco-2018 Kostas > On Jun 14, 2018, at 11:18 AM, aitozi wrote: > > Hi, community, > > Why cant we get the talk of 2018 Flink forward

Re: Writing stream to Hadoop

2018-06-05 Thread Kostas Kloudas
Hi Miki, Have you enabled checkpointing? Kostas > On Jun 5, 2018, at 11:14 AM, miki haiat wrote: > > Im trying to write some data to Hadoop by using this code > > The state backend is set without time > StateBackend sb = new >

Re: Best way to clean-up states in memory

2018-05-14 Thread Kostas Kloudas
Hi Ashish, It would be helpful to share the code of your custom trigger for the first case. Without that, we cannot tell what state you create and how/when you update/clear it. Cheers, Kostas > On May 14, 2018, at 1:04 AM, ashish pok wrote: > > Hi Till, > > Thanks for

Re: Streaming and batch jobs together

2018-05-09 Thread Kostas Kloudas
Hi Flavio, Flink has no inherent limitations as far as state size is concerned, apart from the fact that the state associated to a single key (not the total state) should fit in memory. For production use, it is also advised to use the RocksDB state backend, as this will allow you to spill on

Re: jvm options for individual jobs / topologies

2018-05-08 Thread Kostas Kloudas
Hi Benjamin, I do not think you can set per job memory configuration in a shared cluster. The reason is that if different jobs share the same TM, there are resources that are shared between them, e.g, network buffers. If you are ok with having a separate cluster per job then this will allow you

Re: Streaming and batch jobs together

2018-05-08 Thread Kostas Kloudas
Hi Flavio, If I understand correctly, you have a set of keys which evolves in two ways: keys may be added/deleted, and values associated with the keys can also be updated. If this is the case, you can use a streaming job that: 1. has as a source the stream of events

Re: Taskmanager with multiple slots vs running multiple taskmanagers with 1 slot each

2018-05-08 Thread Kostas Kloudas
Hi Andre, I cannot speak on behalf of everyone but I would recommend 1 TM with multiple slots. This way you pay the “fixed costs” of running a TM (like allocating memory for network buffers, launching thread pools, exchanging heartbeat messages etc) only once. On the flip-side, this means that

Re: sharebuffer prune code

2018-04-05 Thread Kostas Kloudas
Hi Aitozi, I think you are correct. Could you open a JIRA and share it here? Thanks, Kostas > On Apr 2, 2018, at 7:19 AM, aitozi wrote: > > Hi, > > i am running into a cep bug : it always running into failed to find previous > sharebufferEntry, i think it may be caused

Re: Job restart hook

2018-04-04 Thread Kostas Kloudas
o explore this options and see if it > works. One small question on the same is can we restore from checkpoints with > different parallelism? > > On Tue, Apr 3, 2018 at 2:48 AM, Kostas Kloudas <k.klou...@data-artisans.com> wrote: >

Re: bad data output

2018-04-03 Thread Kostas Kloudas
Hi Darshan, You can use side outputs [1] and a process function to split the data in as many streams as you want, e.g. correct, fixable and wrong. Each side output will be a separate stream that your can process individually. You can always send the “bad data” directly from your process
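The split into correct, fixable, and wrong streams via side outputs can be sketched as a fragment (the `input` stream, `badDataSink`, and the `isValid`/`isFixable` predicates are illustrative assumptions):

```java
// One tag per side stream; the anonymous subclass keeps the type information.
final OutputTag<String> fixableTag = new OutputTag<String>("fixable") {};
final OutputTag<String> badTag = new OutputTag<String>("bad") {};

SingleOutputStreamOperator<String> main = input.process(
    new ProcessFunction<String, String>() {
        @Override
        public void processElement(String record, Context ctx, Collector<String> out) {
            if (isValid(record)) {
                out.collect(record);            // correct records continue downstream
            } else if (isFixable(record)) {
                ctx.output(fixableTag, record); // route to a repair pipeline
            } else {
                ctx.output(badTag, record);     // send bad data straight to its own sink
            }
        }
    });

// Each side output is an independent stream with its own processing/sink.
main.getSideOutput(badTag).addSink(badDataSink);
```

This matches the advice in the reply: the "bad data" never has to travel through the main pipeline; it goes directly from the process function to its own sink.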

Re: Job restart hook

2018-04-03 Thread Kostas Kloudas
Hi Navneeth, If I understand correctly, you have a job with parallelism p=20, a TM goes down (e.g. with 4 slots), and until the TM comes back up you want to run the job with p=16, re-running it with p=20 again once the TM returns. If this is the case, one important thing to keep in mind is

Re: Query regarding to CountinousFileMonitoring operator

2018-03-26 Thread Kostas Kloudas
Hi Puneet, If you mean that after processing a file, you want to move it to another directory outside the one containing the data to be processed, then I am afraid that this is currently not possible. This is because the whole logic of how to treat files is included in your FileInputFormat.

Re: Queryable State

2018-03-21 Thread Kostas Kloudas
Hi Vishal, As Fabian said, queryable state is just a feature that exposes the state kept within Flink, and it is not made to replace functionality that would otherwise be made by a sink. In the future the functionality will definitely evolve but for there are no discussions currently, for

Re: State serialization problem when we add a new field in the object

2018-03-14 Thread Kostas Kloudas
Hi Konstantin, What you could do is write an intermediate job that has the old ValueState “oldState” and the new one “newState”, with the new format. When an element comes in this intermediate job, you check whether the oldState is empty for that key or not. If it is null (empty),
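The intermediate migration job described here can be sketched as follows (the `Event`, `OldFormat`, and `NewFormat` types and the `NewFormat.from` converter are hypothetical placeholders for the user's own classes):

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Sketch: lazily migrate each key's state from the old format to the new one.
public class MigratingFunction extends RichFlatMapFunction<Event, Event> {
    private transient ValueState<OldFormat> oldState;
    private transient ValueState<NewFormat> newState;

    @Override
    public void open(Configuration parameters) {
        oldState = getRuntimeContext().getState(
            new ValueStateDescriptor<>("oldState", OldFormat.class));
        newState = getRuntimeContext().getState(
            new ValueStateDescriptor<>("newState", NewFormat.class));
    }

    @Override
    public void flatMap(Event event, Collector<Event> out) throws Exception {
        if (newState.value() == null && oldState.value() != null) {
            // first element for this key since the upgrade: convert and drop the old entry
            newState.update(NewFormat.from(oldState.value()));
            oldState.clear();
        }
        // ... normal processing against newState ...
        out.collect(event);
    }
}
```

Once a savepoint taken from this job shows all keys migrated, the `oldState` descriptor can be removed in the follow-up job version.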

Re: Dynamic CEP https://issues.apache.org/jira/browse/FLINK-7129?subTaskView=all

2018-03-08 Thread Kostas Kloudas
Hi Vishal, Dawid (cc’ed) who was working on that stopped because in the past Flink did not support broadcast state. This is now added (in the master) and the implementation of FLINK-7129 will continue hopefully soon. Cheers, Kostas > On Mar 8, 2018, at 4:09 PM, Vishal Santoshi

Re: Simple CEP pattern

2018-03-07 Thread Kostas Kloudas
> Yes I have access to the flink source code, but could you explain little bit > more what to do with it in this case ? > > Best, Esa > > From: Kostas Kloudas [mailto:k.klou...@data-artisans.com] > Sent: Wednesday, March 7, 2018 3:51 PM > To: Esa Heikki

Re: CEP issue

2018-03-07 Thread Kostas Kloudas
tever the reason may be. > > Again I would also advise ( though not a biggy ) that strategic debug > statements in the CEP core would help folks to see what actually happens. We > instrumented the code to follow the construction of NFA that was very > helpful. > > On W

Re: CEP issue

2018-03-07 Thread Kostas Kloudas
es with false but that can be expensive ) and stop > a pattern. One question I had is that an NFA can be in a FinalState or a > StopState. > > What would constitute a StopState ? > > On Wed, Mar 7, 2018 at 8:47 AM, Kostas Kloudas <k.klou...@data-artisans.com>

Re: Simple CEP pattern

2018-03-07 Thread Kostas Kloudas
be because I am new with FlinkCEP. > > Often I don’t know is it problem with “pattern” or “select”, because no > results.. Is there any way to debug CEP’s operations ? > > Best, Esa > > From: Kostas Kloudas [mailto:k.klou...@data-artisans.com] > Sent: Wednesday, March
