Re: NPE when using spring bean in custom input format

2019-01-21 Thread Piotr Nowojski
Hi, You have to use the `open()` method to handle initialisation of the things required by your code/operators. By the nature of the LocalEnvironment, the life cycle of the operators is different there compared to what happens when submitting a job to the real cluster. With remote environments your
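A minimal sketch of the pattern described in this reply, assuming a hypothetical Spring-backed lookup bean (`MyLookupService` and `MySpringConfig` are illustrative names, not from the original thread); a custom RichInputFormat can apply the same idea in its openInputFormat() method:

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;

// Build non-serializable dependencies (e.g. a Spring context/bean) in open(),
// which runs on the task managers, instead of in the constructor or a field
// initializer, which run on the client before the function is serialized.
public class EnrichingMapper extends RichMapFunction<String, String> {

    private transient AnnotationConfigApplicationContext springContext;
    private transient MyLookupService lookupService; // hypothetical Spring bean

    @Override
    public void open(Configuration parameters) {
        // Hypothetical Spring bootstrap; replace with your own wiring.
        springContext = new AnnotationConfigApplicationContext(MySpringConfig.class);
        lookupService = springContext.getBean(MyLookupService.class);
    }

    @Override
    public String map(String value) {
        return lookupService.enrich(value);
    }

    @Override
    public void close() {
        if (springContext != null) {
            springContext.close();
        }
    }
}
```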

Re: Query on retract stream

2019-01-21 Thread Piotr Nowojski
Hi, There is a missing feature in Flink Table API/SQL of supporting retraction streams as the input (or conversions from an append stream to a retraction stream) at the moment. With that, your problem would simplify to one simple `SELECT uid, count(*) FROM Changelog GROUP BY uid`. There is an ongoing
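For context, a small sketch of what the suggested query looks like when the input is a plain append stream, using the Table API of that era; the `tEnv` setup and the registration of a `Changelog` table are assumed to exist already:

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class RetractExample {

    // Assumes a StreamTableEnvironment with a registered "Changelog" table.
    static DataStream<Tuple2<Boolean, Row>> countPerUid(StreamTableEnvironment tEnv) {
        // A grouped aggregation produces an updating result ...
        Table counts = tEnv.sqlQuery("SELECT uid, COUNT(*) AS cnt FROM Changelog GROUP BY uid");
        // ... which can only be emitted as a retract stream:
        //   Tuple2.f0 == true  -> add this row
        //   Tuple2.f0 == false -> retract (remove) a previously emitted row
        return tEnv.toRetractStream(counts, Row.class);
    }
}
```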

Re: Query on retract stream

2019-01-21 Thread Piotr Nowojski
. > And then you just need to analyze the events in this window. > > Piotr Nowojski <pi...@da-platform.com> wrote on Mon, 21 Jan 2019 at 8:44 PM: > Hi, > > There is a missing feature in Flink Table API/SQL of supporting retraction > streams as the input (or conversions

Re: Sampling rate higher than 1Khz

2019-01-28 Thread Piotr Nowojski
Hi, Maybe a stupid idea, but does anything prevent a user from pretending that watermarks/event times are in a different unit, for example microseconds? Of course, assuming you use row/event time and not processing time for anything? Piotrek > On 28 Jan 2019, at 14:58, Tzu-Li (Gordon) Tai wr
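To illustrate the "different unit" trick, a hypothetical sketch where event timestamps are interpreted as microseconds; Flink does not care about the unit as long as timestamps, watermarks, and window sizes are expressed consistently (the POJO and field names are illustrative):

```java
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

// Event with a microsecond-resolution timestamp (illustrative POJO).
class SensorReading {
    long timestampMicros;
    double value;
}

class MicrosTimestampExtractor extends BoundedOutOfOrdernessTimestampExtractor<SensorReading> {

    MicrosTimestampExtractor() {
        // "Time.milliseconds(1000)" here really means 1000 *microseconds* of
        // allowed out-of-orderness, because every timestamp is in micros.
        super(Time.milliseconds(1000));
    }

    @Override
    public long extractTimestamp(SensorReading element) {
        // Return microseconds directly; windows defined as Time.milliseconds(n)
        // then effectively become n-microsecond windows.
        return element.timestampMicros;
    }
}
```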

Re: Flink Custom SourceFunction and SinkFunction

2019-03-04 Thread Piotr Nowojski
Hi, I couldn’t find any references to your question, nor have I seen such a use case, but: Re 1. It looks like it could work. Re 2. It should work as well, but just try to use StreamingFileSink. Re 3. For a custom source/sink function, if you do not care about data processing guarantees it’s quite

Re: Setting source vs sink vs window parallelism with data increase

2019-03-04 Thread Piotr Nowojski
Hi, What Flink version are you using? Generally speaking, Flink might not be the best fit if you have record fan-out; this may significantly increase checkpointing time. However you might want to first identify what’s causing the long GC times. If there are long GC pauses, this should be the first thing

Re: Task slot sharing: force reallocation

2019-03-04 Thread Piotr Nowojski
Hi, Are you asking whether that’s the behaviour, or have you actually observed this issue? I’m not entirely sure, but I would guess that the Sink tasks would be distributed randomly across the cluster, but maybe I’m mixing this issue with resource allocations for Task Managers. Maybe Til

Re: Command exited with status 1 in running Flink on marathon

2019-03-04 Thread Piotr Nowojski
Hi, With just this information it might be difficult to help. Please look for some additional logs (has Flink managed to log anything?) or some standard output/errors. I would guess this might be some relatively simple mistake in configuration, like file/directory read/write/execute permis

Re: [Flink-Question] In Flink parallel computing, how do different windows receive the data of their own partition, that is, how does Window determine which partition number the current Window belongs

2019-03-04 Thread Piotr Nowojski
Hi, I’m not sure if I understand your question/concerns. As Rong Rong explained, the key selector is used to assign records to window operators. Within the key context, you do not have access to other keys/values in your operator/functions, so your reduce/process/… functions when processing key:1 won’t b

Re: StochasticOutlierSelection

2019-03-04 Thread Piotr Nowojski
Hi, I have never used this code, but the ML library depends heavily on Scala, so I wouldn’t recommend using it with Java. However if you want to go this way (I’m not sure if that’s possible), you would have to pass the implicit parameters manually somehow (I don’t know how to do that from Java).

Re: event time timezone is not correct

2019-03-04 Thread Piotr Nowojski
Hi, I think that Flink SQL works currently only in UTC, so the 8 hours difference is a result of you using GMT+8 time stamps somewhere. Please take a look at this thread: http://mail-archives.apache.org/mod_mbox/flink-user/201711.mbox/%3c2e1eb190-26a0-b288-39a4-683b463f4...@apache.org%3E I thi

Re: Flink parallel subtask affinity of taskmanager

2019-03-04 Thread Piotr Nowojski
Hi, You should be able to use legacy mode for this: https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#legacy However note that this option will disappear in the near future and there is a JIRA ticket to address this issue: https://issues.apache.org/jira/browse/FLINK-11815

Re: Task slot sharing: force reallocation

2019-03-05 Thread Piotr Nowojski
> placing map 1 of job 2 on machine 2, map 1 of job 3 on machine 3 so we end up > with sinks sit evenly throughout the cluster). > > Thanks. > > Le > > On Mon, Mar 4, 2019 at 6:49 AM Piotr Nowojski <pi...@ververica.com> wrote: > Hi, > > Are yo

Re: event time timezone is not correct

2019-03-05 Thread Piotr Nowojski
n > >> On 4 Mar 2019, at 11:29 PM, Piotr Nowojski <pi...@ververica.com> wrote: >> >> Hi, >> >> I think that Flink SQL works currently only in UTC, so the 8 hours >> difference is a result of you using GMT+8 time stamps somewhere. Please take >

Re: Command exited with status 1 in running Flink on marathon

2019-03-05 Thread Piotr Nowojski
.org/projects/flink/flink-docs-stable/ops/deployment/mesos.html> > > But I did not install Hadoop. Is that the problem? Since HDFS was commented out, > I did not change it. > > On Mon, Mar 4, 2019 at 4:40 PM Piotr Nowojski <pi...@ververica.com> wrote: > Hi,

Re: Setting source vs sink vs window parallelism with data increase

2019-03-06 Thread Piotr Nowojski
causes > a problem, or > 2) Is it the shuffle of all the records after the expansion which is taking a > large time - if so, is there anything I can do to mitigate this other than > trying to ensure less shuffle. > > Thanks, > Padarn > > > On Tue, Mar 5, 2019 a

Re: How to join stream and dimension data in Flink?

2019-03-14 Thread Piotr Nowojski
out, o.currency, r.rate, o.amount * r.rate > FROM > Orders AS o > LEFT JOIN LatestRates FOR SYSTEM_TIME AS OF o.proctime AS r > ON r.currency = o.currency > > CC @Piotr Nowojski <pi...@data-artisans.com> Would be great to have > your opinions here. >

Re: [VOTE] Release 1.8.0, release candidate #3

2019-03-20 Thread Piotr Nowojski
<http://codespeed.dak8s.net:8000/timeline/?ben=tumblingWindow&env=2> Piotr Nowojski > On 20 Mar 2019, at 10:09, Aljoscha Krettek wrote: > > Thanks Jincheng! It would be very good to fix those but as you said, I would > say they are not blockers. > >> On 20. Mar 2019,

Re: StochasticOutlierSelection

2019-03-21 Thread Piotr Nowojski
Java, I chose the MOA library in combination with the Flink API for > anomaly detection streaming, which gives quite satisfactory results. > > Best, > > Anissa > > > > On Mon, 4 Mar 2019 at 16:08, Piotr Nowojski <pi...@ververica.com> wrote: >

Re: Streaming Protobuf into Parquet file not working with StreamingFileSink

2019-03-21 Thread Piotr Nowojski
Hi, I’m not sure, but shouldn’t you be just reading committed files and ignoring in-progress ones? Maybe Kostas could add more insight to this topic. Piotr Nowojski > On 20 Mar 2019, at 12:23, Rafi Aroch wrote: > > Hi, > > I'm trying to stream events in Protobuf format into

Re: Best practice to handle update messages in stream

2019-03-21 Thread Piotr Nowojski
anyway (like join). Piotr Nowojski [1] https://issues.apache.org/jira/browse/FLINK-8545 > On 21 Mar 2019, at 09:39, 徐涛 wrote: > > Hi Experts, > Assuming there is a stream whose content is like this: >Seq

Re: Streaming Protobuf into Parquet file not working with StreamingFileSink

2019-03-27 Thread Piotr Nowojski
further information that could be helpful. > > Would appreciate your help. > > Thanks, > Rafi > > > On Thu, Mar 21, 2019 at 5:20 PM Kostas Kloudas <kklou...@gmail.com> wrote: > Hi Rafi, > > Piotr is correct. In-progress files are not necessa

Re: Setting source vs sink vs window parallelism with data increase

2019-03-27 Thread Piotr Nowojski
r help again. > > On Wed, Mar 6, 2019 at 9:44 PM Padarn Wilson <pad...@gmail.com> wrote: > Thanks a lot for your suggestion. I’ll dig into it and update for the mailing > list if I find anything useful. > > Padarn > > On Wed, 6 Mar 2019 at 6:03 PM, Piotr

Re: End-to-end exactly-once semantics in simple streaming app

2019-04-08 Thread Piotr Nowojski
Hi, Regarding JDBC and the two-phase commit (2PC) protocol: as Fabian mentioned, it is not supported by the JDBC standard out of the box. With some workarounds I guess you could make it work, for example by following one of these ideas: 1. Write records using JDBC with at-least-once semantics, by f

Re: Default Kafka producers pool size for FlinkKafkaProducer.Semantic.EXACTLY_ONCE

2019-04-11 Thread Piotr Nowojski
Hi Min and Fabian, The pool size is independent of the parallelism, task slots count or task managers count. The only thing that you should consider is how many simultaneous checkpoints you might have in your setup. As Fabian wrote, with > env.getCheckpointConfig().setMaxConcurrentCheckpoints(1
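A small sketch of the relevant setup: with at most one concurrent checkpoint, the default transactional-producer pool of an EXACTLY_ONCE Kafka producer is more than sufficient (the interval value is illustrative):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // One checkpoint every minute (illustrative interval).
        env.enableCheckpointing(60_000);
        // At most one checkpoint in flight at a time: an EXACTLY_ONCE Kafka
        // producer then needs only a couple of transactional producers from
        // its pool (roughly maxConcurrentCheckpoints + 1), well below the default.
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);

        // ... build the job here and attach a FlinkKafkaProducer configured
        // with Semantic.EXACTLY_ONCE ...

        env.execute("exactly-once-kafka-job");
    }
}
```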

Re: RocksDB native checkpoint time

2019-05-03 Thread Piotr Nowojski
Hi Gyula, Have you read our tuning guide? https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#tuning-rocksdb Synchronous part is mostly about flushing d

Re: DateTimeBucketAssigner using Element Timestamp

2019-05-03 Thread Piotr Nowojski
Hi Peter, It sounds like this should work; however, my question would be: do you want exactly-once processing? If yes, then you would have to somehow know which exact events need re-processing, or deduplicate them somehow. Keep in mind that in case of an outage in the original job, you probably w

Re: update the existing Keyed value state

2019-05-03 Thread Piotr Nowojski
m/king/bravo <https://github.com/king/bravo> Piotr Nowojski > On 3 May 2019, at 11:14, Selvaraj chennappan > wrote: > > Hi Users, > We want to have a real time aggregation (KPI) . > we are maintaining aggregation counters in the keyed value state . > key could be

Re: CoProcessFunction vs Temporal Table to enrich multiple streams with a slow stream

2019-05-03 Thread Piotr Nowojski
Hi Averell, I will be referring to your original two options: 1 (duplicating stream_C) and 2 (multiplexing stream_A and stream_B). Both of them could be expressed using Temporal Table Join. You could multiplex stream_A and stream_B in Table API, temporal table join them with stream_C and then
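A sketch of the temporal-table-join route in the Table API/SQL of that Flink version, assuming the slow stream_C has been registered as a rates-history-style table with a processing-time attribute, and the multiplexed stream_A/stream_B as an `Events` table (all table, column, and function names are illustrative):

```java
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.table.functions.TemporalTableFunction;

public class TemporalJoinSketch {

    // tEnv, ratesHistory (stream_C) and the "Events" table (multiplexed
    // stream_A/stream_B) are assumed to be registered already.
    static Table enrich(StreamTableEnvironment tEnv, Table ratesHistory) {
        // Versioned view over the slow stream: time attribute + primary key.
        TemporalTableFunction rates =
                ratesHistory.createTemporalTableFunction("r_proctime", "r_currency");
        tEnv.registerFunction("Rates", rates);

        // Each event is joined with the version of the rate valid at its own time.
        return tEnv.sqlQuery(
                "SELECT e.id, e.amount * r.r_rate AS converted "
                + "FROM Events AS e, LATERAL TABLE (Rates(e.e_proctime)) AS r "
                + "WHERE e.currency = r.r_currency");
    }
}
```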

Re: CoProcessFunction vs Temporal Table to enrich multiple streams with a slow stream

2019-05-14 Thread Piotr Nowojski
Hi, Sorry for the late response, somehow I wasn’t notified about your e-mail. > > So you meant an implementation in the DataStream API with cutting corners would, > generally, be shorter than a Table join. I thought that using Tables would be > more intuitive and shorter, hence my initial question :) It depends

Re: State migration into multiple operators

2019-05-14 Thread Piotr Nowojski
Hi, Currently there is no native Flink support for modifying the state in such a manner. However, there is an on-going effort [1] and a third-party project [2] to address exactly this. Both allow you to read a savepoint, modify it and write back the new modified savepoint from which you can r

Re: Possilby very slow keyBy in program with no parallelism

2019-05-21 Thread Piotr Nowojski
Hi Theo, Regarding the performance issue. > None of my machine resources is fully utilized, i.e. none of the cluster CPU > runs at 100% utilization (according to htop). And the memory is virtually > available, but the RES column in htop states the processes uses 5499MB. By nature of stream pro

Re: Error while using session window

2019-06-17 Thread Piotr Nowojski
Hi, Thanks for reporting the issue. I think this might be caused by System.currentTimeMillis() not being monotonic [1] and the fact that Flink accesses this function multiple times per element (at least twice: first to create a window, second to perform the check that has failed in your case)

Re: How to trigger the window function even there's no message input in this window?

2019-06-17 Thread Piotr Nowojski
Hi, As far as I know, this is currently impossible. You can work around this issue by implementing your own custom post-processing operator/flatMap function that would: - track the output of the window operator - register a processing-time timer with some desired timeout - every time the process
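A minimal sketch of such a post-processing function (not an existing Flink feature): a KeyedProcessFunction placed after the window operator, keyed the same way, that re-arms a processing-time timer on every window result and emits the last value (or a hypothetical default) if nothing arrives within the timeout:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class IdleWindowFiller extends KeyedProcessFunction<String, Long, Long> {

    private final long timeoutMs;
    private transient ValueState<Long> lastResult; // last window output per key
    private transient ValueState<Long> lastSeen;   // processing time of that output

    public IdleWindowFiller(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    @Override
    public void open(Configuration parameters) {
        lastResult = getRuntimeContext().getState(new ValueStateDescriptor<>("lastResult", Long.class));
        lastSeen = getRuntimeContext().getState(new ValueStateDescriptor<>("lastSeen", Long.class));
    }

    @Override
    public void processElement(Long windowResult, Context ctx, Collector<Long> out) throws Exception {
        long now = ctx.timerService().currentProcessingTime();
        lastResult.update(windowResult);
        lastSeen.update(now);
        out.collect(windowResult);
        // Arm a timeout; stale timers are ignored in onTimer().
        ctx.timerService().registerProcessingTimeTimer(now + timeoutMs);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Long> out) throws Exception {
        Long seen = lastSeen.value();
        if (seen == null || timestamp - seen >= timeoutMs) {
            // No window output within the timeout: emit the previous value (or a default).
            Long previous = lastResult.value();
            out.collect(previous != null ? previous : 0L);
            // Keep firing for subsequent empty periods as well.
            ctx.timerService().registerProcessingTimeTimer(timestamp + timeoutMs);
        }
    }
}
```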

Re: Timeout about local test case

2019-06-17 Thread Piotr Nowojski
Hi, I don’t know what the reason is (also there are no open issues with this test in our Jira). This test seems to be working on Travis/CI and it works for me when I’ve just tried running it locally. There might be some bug in the test/production code that is triggered only under some specific condition

Re: Best practice to process DB stored log (is Flink the right choice?)

2019-06-17 Thread Piotr Nowojski
Hi, Those are good questions. > A datastream to connect to a table is available? What table, what database system do you mean? You can check the list of existing connectors provided by Flink in the documentation. About reading from a relational DB (for example by using JDBC) you can read a little

Re: Why the current time of my window never reaches the end time when I use KeyedProcessFunction?

2019-06-18 Thread Piotr Nowojski
Hi, Isn’t your problem that the source is constantly emitting the data and bumping your timers? Keep in mind that the code that you are basing on has the following characteristic: > In the following example a KeyedProcessFunction maintains counts per key, and > emits a key/count pair whenever

Re: Flink Kafka ordered offset commit & unordered processing

2019-07-02 Thread Piotr Nowojski
Hi, If your async operations are stalled, this will eventually cause problems. Either this will back-pressure the sources (the async operator's queue will become full) or you will run out of memory (if you configured the queue's capacity too high). I think the only possible solution is to either dr

Re: Flink Kafka ordered offset commit & unordered processing

2019-07-03 Thread Piotr Nowojski
Hi, > Will Flink be able to recover under this scenario? I’m not sure exactly what you mean. Flink will be able to restore the state to the last successful checkpoint, and it could well be that some records after this initial “stuck record” were processed and emitted down the stream. In th

Re: Checkpoints very slow with high backpressure

2019-07-31 Thread Piotr Nowojski
Hi, For Flink 1.8 (and 1.9) the only thing you can do is to try to limit the amount of data buffered between the nodes (check the Flink network configuration [1] for the number of buffers and/or buffer pool sizes). This can reduce maximal throughput (but only if the network transfer is a significant

Re: Will broadcast stream affect performance because of the absence of operator chaining?

2019-08-06 Thread Piotr Nowojski
Hi, Broadcasting will break an operator chain. However, my best guess is that the Kafka source will still be the performance bottleneck in your job. Also, network exchanges add some measurable overhead only if your records are very lightweight and easy to process (for example if you are using RocksDB t

Re: Will broadcast stream affect performance because of the absence of operator chaining?

2019-08-06 Thread Piotr Nowojski
sage will carry its schema info among operators, it will cost about 2x for > serialization and deserialization between operators. > > Is there a better workaround that all the operators could notice the schema > change and at the same time not breaking the operator chaining? > > Than

Re: Will broadcast stream affect performance because of the absence of operator chaining?

2019-08-06 Thread Piotr Nowojski
he operator (for > example, operatorA), and other subtasks of the operator are not aware of it. > In this case, is there anything I have missed? > > Thank you! > > ------ Original ------ > From: "Piotr Nowojski

Re: [ANNOUNCE] Hequn becomes a Flink committer

2019-08-07 Thread Piotr Nowojski
Congratulations :) > On 7 Aug 2019, at 12:09, JingsongLee wrote: > > Congrats Hequn! > > Best, > Jingsong Lee > > -- > From: Biao Liu > Send Time: Wed, 7 Aug 2019 12:05 > To: Zhu Zhu > Cc: Zili Chen ; Jeff Zhang ; Paul Lam > ; jinch

Re: Kafka ProducerFencedException after checkpointing

2019-08-12 Thread Piotr Nowojski
rts after the last >> external checkpoint is committed. >> Now that I have 15min for Producer's transaction timeout and 10min for >> Flink's checkpoint interval, and every checkpoint takes less than 5 minutes, >> everything is working fine. >> Am I right?

Re: Kafka ProducerFencedException after checkpointing

2019-08-12 Thread Piotr Nowojski
ccurred. > > Best, > Tony Wei > > Piotr Nowojski <pi...@data-artisans.com> wrote on Mon, 12 Aug 2019 at 3:27 PM: > Hi, > > Yes, if it’s due to a transaction timeout you will lose the data. > > Whether you can fall back to at-least-once, that depends on Kafka,

Re: How flink monitor source stream task(Time Trigger) is running?

2017-09-29 Thread Piotr Nowojski
We use Akka's DeathWatch mechanism to detect dead components. A TaskManager failure shouldn’t prevent recovering from state (as long as there are enough task slots). I’m not sure if I understand what you mean by a "source stream thread" crash. If it was some error while performing a checkpoint so

Re: How flink monitor source stream task(Time Trigger) is running?

2017-09-29 Thread Piotr Nowojski
Any exception thrown by your SourceFunction will be caught by Flink and that will mark the task (that was executing this SourceFunction) as failed. If you started some custom threads in your SourceFunction, you have to manually propagate their exceptions to the SourceFunction. Piotrek > On Sep 29

Re: How flink monitor source stream task(Time Trigger) is running?

2017-09-29 Thread Piotr Nowojski
I am still not sure what you mean by “thread crash without throw”. If the SourceFunction.run method returns without an exception, Flink assumes that it has cleanly shut down and that there were simply no more elements to collect/create by this task. If it continues working without throwing an exc
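A sketch of the pattern discussed in this thread: a SourceFunction with a custom worker thread that hands any failure back to run(), so Flink sees the exception and fails the task instead of assuming a clean shutdown (the record-fetching logic is an illustrative placeholder):

```java
import org.apache.flink.streaming.api.functions.source.SourceFunction;

import java.util.concurrent.atomic.AtomicReference;

public class ThreadedSource implements SourceFunction<String> {

    private volatile boolean running = true;
    // Failure from the worker thread, handed over to run().
    private final AtomicReference<Throwable> workerFailure = new AtomicReference<>();

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        Thread worker = new Thread(() -> {
            try {
                while (running) {
                    String record = fetchNextRecord(); // illustrative I/O
                    synchronized (ctx.getCheckpointLock()) {
                        ctx.collect(record);
                    }
                }
            } catch (Throwable t) {
                workerFailure.set(t); // don't let the exception die silently
            }
        });
        worker.start();

        // Keep run() alive and re-throw the worker's failure so Flink marks
        // the task as failed.
        while (running && worker.isAlive()) {
            rethrowIfFailed();
            Thread.sleep(100);
        }
        rethrowIfFailed();
    }

    private void rethrowIfFailed() {
        Throwable t = workerFailure.get();
        if (t != null) {
            throw new RuntimeException("Worker thread failed", t);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    private String fetchNextRecord() {
        // Hypothetical stand-in for real I/O.
        return "record";
    }
}
```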

Re: state of parallel jobs when one task fails

2017-09-29 Thread Piotr Nowojski
Hi, Yes, by default Flink will restart all of the tasks. I think that since Flink 1.3, you can configure a FailoverStrategy to change this behavior. Tha

Re: starting query server when running flink embedded

2017-09-29 Thread Piotr Nowojski
Hi, You can take a look at how it is done in the exercises here . There are example solutions that run on a local environment. I hope that helps :) Piotrek > On Sep 28, 2017, at 11:22 PM, Henri Heiskanen > wrote: > > Hi, > > I wou

Re: Avoid duplicate messages while restarting a job for an application upgrade

2017-10-02 Thread Piotr Nowojski
tively developed. Piotr Nowojski > On Oct 2, 2017, at 3:35 PM, Antoine Philippot > wrote: > > Hi, > > I'm working on a flink streaming app with a kafka09 to kafka09 use case which > handles around 100k messages per seconds. > > To upgrade our application we use

Re: Avoid duplicate messages while restarting a job for an application upgrade

2017-10-02 Thread Piotr Nowojski
hint or details on this part of that "todo list"? > > > On Mon, 2 Oct 2017 at 16:50, Piotr Nowojski <pi...@data-artisans.com> wrote: > Hi, > > For failure recovery with Kafka 0.9 it is not possible to avoid duplicated > messages. Using Flink

Re: How flink monitor source stream task(Time Trigger) is running?

2017-10-04 Thread Piotr Nowojski
You are welcome :) Piotrek > On Oct 2, 2017, at 1:19 PM, yunfan123 wrote: > > Thank you. > "If the SourceFunction.run method returns without an exception Flink assumes > that it has cleanly shut down and that there were simply no more elements to > collect/create by this task." > This sentence so

Re: Sink buffering

2017-10-04 Thread Piotr Nowojski
Hi, Do you mean buffering on state, and you want to achieve an exactly-once HBase sink? If so, keep in mind that you will need some kind of transaction support in HBase to make it 100% reliable. Without transactions, buffering messages on state only reduces the chance of duplicated records. How much “red

Re: Sink buffering

2017-10-04 Thread Piotr Nowojski
What do you mean by “This always depends on checkpointing interval right?”? In TwoPhaseCommitSinkFunction, transactions are committed on each Flink checkpoint. I guess the same applies to GenericWriteAheadSink. The first one just commits/pre-commits the data on checkpoint, the second rewrites them

Re: Sink buffering

2017-10-04 Thread Piotr Nowojski
Interval - Yes. TwoPhaseCommitSinkFunction - yes, but it depends on how you implement your “Transaction” class; it wouldn’t make a lot of sense, but you could store events inside the transaction “pojo”. Piotrek > On Oct 4, 2017, at 12:45 PM, nragon > wrote: > > checkpointing interval ~=
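A sketch of what "storing events inside the transaction pojo" could look like with TwoPhaseCommitSinkFunction; the HBase-specific write is left as a hypothetical placeholder and the Kryo-based serializer is just one convenient choice, not the only option:

```java
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeutils.base.VoidSerializer;
import org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer;
import org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction;

import java.util.ArrayList;
import java.util.List;

public class BufferingHBaseSink
        extends TwoPhaseCommitSinkFunction<String, BufferingHBaseSink.Txn, Void> {

    // The "transaction pojo": simply buffers records until the checkpoint completes.
    public static class Txn {
        public final List<String> buffered = new ArrayList<>();
    }

    public BufferingHBaseSink() {
        super(new KryoSerializer<>(Txn.class, new ExecutionConfig()), VoidSerializer.INSTANCE);
    }

    @Override
    protected Txn beginTransaction() {
        return new Txn();
    }

    @Override
    protected void invoke(Txn txn, String value, Context context) {
        txn.buffered.add(value); // buffer instead of writing immediately
    }

    @Override
    protected void preCommit(Txn txn) {
        // Nothing to flush yet; the buffered records are persisted with the
        // checkpoint as part of the transaction object.
    }

    @Override
    protected void commit(Txn txn) {
        // Checkpoint completed: write the buffered records out. Without real
        // transactions in the target system, replayed commits can still
        // produce duplicates, i.e. this only reduces them.
        writeToHBase(txn.buffered);
    }

    @Override
    protected void abort(Txn txn) {
        txn.buffered.clear();
    }

    private void writeToHBase(List<String> records) {
        // Hypothetical placeholder for the actual HBase client code.
    }
}
```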

Re: Avoid duplicate messages while restarting a job for an application upgrade

2017-10-10 Thread Piotr Nowojski
cated before. > > I planned to reapply/adapt this patch for the 1.3.2 release when we migrate > to it, and maybe later to 1.4. > > I'm open to suggestions, or to help/develop this feature upstream if you want. > > > On Mon, 2 Oct 2017 at 19:09, Piotr Nowojski <

Re: Write each group to its own file

2017-10-12 Thread Piotr Nowojski
Hi, There is no straightforward way to do that. First of all, the error you are getting is because you are trying to start a new application ( env.fromElements(items) ) inside your reduce function. To do what you want, you have to hash-partition the products based on category (instead of groupin

Re: R/W traffic estimation between Flink and Zookeeper

2017-10-12 Thread Piotr Nowojski
Hi, Are you asking how to measure records/s, or whether it is possible to achieve it? To measure it you can check the numRecordsInPerSecond metric. As for whether 1000 records/s is possible, it depends on many things like the state backend used, state size, complexity of your application, size of the records, numbe

Re: Implement bunch of transformations applied to same source stream in Apache Flink in parallel and combine result

2017-10-12 Thread Piotr Nowojski
Hi, What is the number of events per second that you wish to process? If it’s high enough (~ number of machines * number of cores) you should be just fine: instead of scaling with the number of features, scale with the number of events. If you have a single data source you could still randomly shuffle
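A tiny sketch of the "scale with events" idea: redistribute the single source's output evenly, then run each feature computation on the redistributed stream (the feature itself is an illustrative stand-in):

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;

public class FanOutSketch {

    // Spread events from a single (possibly low-parallelism) source across all
    // downstream subtasks, then apply a feature on the rebalanced stream.
    static DataStream<Double> featureOne(DataStream<String> source) {
        return source
                .rebalance() // round-robin redistribution of the source's output
                .map(new MapFunction<String, Double>() {
                    @Override
                    public Double map(String event) {
                        return (double) event.length(); // stand-in for a real feature
                    }
                })
                .name("feature-1");
    }
}
```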

Re: Kafka 11 connector on Flink 1.3

2017-10-12 Thread Piotr Nowojski
Hi, The Kafka 0.11 connector depends on some API changes for Flink 1.4, so without rebasing the code and solving some small issues it is not possible to use it with 1.3.x. We are about to finalize the timeframe for the 1.4 release; it would be great if you could come back with this question after the

Re: Writing an Integration test for flink-metrics

2017-10-12 Thread Piotr Nowojski
Hi, Doing as you proposed using JMXReporter (or custom reporter) should work. I think there is no easier way to do this at the moment. Piotrek > On 12 Oct 2017, at 04:58, Colin Williams > wrote: > > I have a RichMapFunction and I'd like to ensure Meter fields are properly > incremented. I'v

Re: Writing to an HDFS file from a Flink stream job

2017-10-12 Thread Piotr Nowojski
Hi, Maybe this is an access rights issue? Could you try to create and write to the same file (same directory) in some other way (manually?), using the same user and the same machine as the Flink job would? Maybe there will be some hint in the HDFS logs? Piotrek > On 12 Oct 2017, at 00:19, Isuru Suriar

Re: Writing to an HDFS file from a Flink stream job

2017-10-12 Thread Piotr Nowojski
tml > > <https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/filesystem_sink.html> > > Best, > Aljoscha > >> On 12. Oct 2017, at 14:55, Piotr Nowojski <pi...@data-artisans.com> wrote: >> >> Hi, >>

Re: Beam Application run on cluster setup (Kafka+Flink)

2017-10-12 Thread Piotr Nowojski
Hi, What do you mean by: > With standalone beam application kafka can receive the message, But in cluster setup it is not working. In your example you are reading the data from Kafka and printing them to the console. There doesn’t seem to be anything that writes back to Kafka, so what do you mean

Re: Monitoring job w/LocalStreamEnvironment

2017-10-12 Thread Piotr Nowojski
Hi, Have you read the following doc? https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/testing.html There are some hints regarding testing your application. Especially take a look at the

Re: Submitting a job via command line

2017-10-12 Thread Piotr Nowojski
Have you tried this http://mail-archives.apache.org/mod_mbox/flink-user/201705.mbox/%3ccagr9p8bxhljseexwzvxlk+drotyp1yxjy4n4_qgerdzxz8u...@mail.gmail.com%3E

Re: Beam Application run on cluster setup (Kafka+Flink)

2017-10-13 Thread Piotr Nowojski
Hi, What version of Flink are you using? In earlier 1.3.x releases there were some bugs in the Kafka consumer code. Could you change the log level in Flink to debug? Did you check the Kafka logs for some hint maybe? I guess that metrics like bytes read/input records of this Flink application are not

Re: Writing an Integration test for flink-metrics

2017-10-13 Thread Piotr Nowojski
ler wrote: > You could also write a custom reporter that opens a socket or similar for > communication purposes. > > You can then either query it for the metrics, or even just trigger the > verification in the reporter, > and fail with an error if the reporter returns an error. &

Re: Submitting a job via command line

2017-10-13 Thread Piotr Nowojski
Good to hear that :) > On 13 Oct 2017, at 14:40, Alexander Smirnov wrote: > > Thank you so much, it helped! > > From: Piotr Nowojski <pi...@data-artisans.com> > Date: Thursday, October 12, 2017 at 6:00 PM > To: Alexander Smirnov <as

Re: Monitoring job w/LocalStreamEnvironment

2017-10-16 Thread Piotr Nowojski
hanks for responding, see below. > >> On Oct 12, 2017, at 7:51 AM, Piotr Nowojski <pi...@data-artisans.com> wrote: >> >> Hi, >> >> Have you read the following doc? >> https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/strea

Re: SLF4j logging system gets clobbered?

2017-10-19 Thread Piotr Nowojski
Hi, What versions of Flink/logback are you using? Have you read this: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/best_practices.html#use-logback-when-running-flink-out-of-the-ide--from-a-java-application

Re: Set heap size

2017-10-19 Thread Piotr Nowojski
Hi, Just log into the machine and check its memory consumption using htop or a similar tool under load. Remember to subtract Flink’s memory usage and the file system cache. Piotrek > On 19 Oct 2017, at 10:15, AndreaKinn wrote: > > About task manager heap size Flink doc says: > >

Re: java.lang.NoSuchMethodError and dependencies problem

2017-10-19 Thread Piotr Nowojski
Hi, What is the full stack trace of the error? Are you sure that there is no commons-compress somewhere in the classpath (like in the lib directory)? How are you running your Flink cluster? Piotrek > On 19 Oct 2017, at 13:34, r. r. wrote: > > Hello > I have a job that runs an Apache Tika pip

Re: Watermark on connected stream

2017-10-19 Thread Piotr Nowojski
Hi, As you can see in org.apache.flink.streaming.api.operators.AbstractStreamOperator#processWatermark1, it takes the minimum of both of the inputs. Piotrek > On 19 Oct 2017, at 14:06, Kien Truong wrote: > > Hi, > > If I connect two streams with different watermarks, how are the watermarks of >

Re: Watermark on connected stream

2017-10-19 Thread Piotr Nowojski
With aitozi we have a hat trick oO > On 19 Oct 2017, at 17:08, Tzu-Li (Gordon) Tai wrote: > > Ah, sorry for the duplicate answer, I didn’t see Piotr’s reply. Slight delay > on the mail client. > > > On 19 October 2017 at 11:05:01 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org >

Re: java.lang.NoSuchMethodError and dependencies problem

2017-10-19 Thread Piotr Nowojski
application pom.xml. I’m not sure if this is solvable in some way, or not. Maybe as a workaround, you could shade commons-compress usages in your pom.xml? Piotr Nowojski > On 19 Oct 2017, at 17:36, r. r. wrote: > > flink is started with bin/start-local.sh > > there is no classp

Re: Flink BucketingSink, subscribe on moving of file into final state

2017-10-20 Thread Piotr Nowojski
Hi, Maybe you can just list files in your basePath and filter out those that have inProgress or pending suffixes? I think you could wrap/implement your own Bucketer and track all the paths that it returns. However some of those might be pending or in progress files that will be committed in t

Re: java.lang.NoSuchMethodError and dependencies problem

2017-10-20 Thread Piotr Nowojski
s path affect this? > > by shade commons-compress do you mean : > > it doesn't have effect either > > as a last resort i may try to rebuild Flink to use 1.14, but don't want to go > there yet =/ > > > Best regards > > > > > > >> -

Re: Flink BucketingSink, subscribe on moving of file into final state

2017-10-20 Thread Piotr Nowojski
of moving the file into final state. > I thought that maybe someone has already implemented such a thing or knows any > other approaches that will help me to not copy/paste the existing sink impl )) > > Thx ! > > >> On 20 Oct 2017, at 14:37, Piotr Nowojski <pi...@data-artisans.com> wrote: >> >> Piotrek >

Re: HBase config settings go missing within Yarn.

2017-10-20 Thread Piotr Nowojski
Hi, What do you mean by saying: > When I open the logfiles on the Hadoop cluster I see this: The error doesn’t come from Flink? Where do you execute hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml")); ? To me it seems like it is a problem with misconfigured HBase and n

Re: Does heap memory gets released after computing windows?

2017-10-20 Thread Piotr Nowojski
Hi, Memory used by session windows should be released once the window is triggered (allowedLateness can prolong a window’s life). Unless your code introduces some memory leak (by not releasing references) everything should be garbage collected. Keep in mind that session windows with a time gap of 10 m

Re: java.lang.NoSuchMethodError and dependencies problem

2017-10-20 Thread Piotr Nowojski
> >> ---- Original message > >> From: Piotr Nowojski pi...@data-artisans.com > >> Subject: Re: java.lang.NoSuchMethodError and dependencies problem > >> To: "r. r." > >> Sent on: 20.10.2017 14:46 > > > >

Re: flink can't read hdfs namenode logical url

2017-10-20 Thread Piotr Nowojski
Hi, Please double check the content of the config files in YARN_CONF_DIR and HADOOP_CONF_DIR (the first one has priority over the latter) and that they are pointing to the correct files. Also check the logs (WARN and INFO) for any relevant entries. Piotrek > On 20 Oct 2017, at 06:07, 邓俊华 wrote: >

Re:

2017-10-20 Thread Piotr Nowojski
Hi, Only the batch API uses managed memory. If you are using the streaming API, you can do two things: - estimate the max cache size based on, for example, a fraction of the max heap size - use WeakReference to implement your cache In the batch API, you could estimate the max cache size based on: - fraction of (heapSi
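A minimal sketch of the second suggestion: a cache whose values are held only weakly, so the garbage collector can reclaim them under memory pressure (plain Java, no Flink-specific API; a SoftReference variant is a common alternative with less aggressive eviction):

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Values are referenced weakly, so the GC may drop them instead of letting
// the cache push the heap towards an OutOfMemoryError.
public class WeakValueCache<K, V> {

    private final Map<K, WeakReference<V>> cache = new ConcurrentHashMap<>();

    public V get(K key, Function<K, V> loader) {
        WeakReference<V> ref = cache.get(key);
        V value = (ref != null) ? ref.get() : null;
        if (value == null) {
            // Entry missing or already collected: recompute and re-insert.
            value = loader.apply(key);
            cache.put(key, new WeakReference<>(value));
        }
        return value;
    }
}
```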

Re: HBase config settings go missing within Yarn.

2017-10-20 Thread Piotr Nowojski
e settings. > So it seems in the transition into the cluster the application does not copy > everything it has available locally for some reason. > > There is a very high probability I did something wrong, I'm just not seeing > it at this moment. > > Niels > > &

Re: flink can't read hdfs namenode logical url

2017-10-23 Thread Piotr Nowojski
nt.java:619) > at > org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149) > -- > From: Piotr Nowojski > Send time: Fri, 20 Oct 2017 21:39 > To: 邓俊华 > Cc: user > Subject: Re: flink can't read hdfs namenode logica

Re: SLF4j logging system gets clobbered?

2017-10-23 Thread Piotr Nowojski
Till could you take a look at this? Piotrek > On 18 Oct 2017, at 20:32, Jared Stehler > wrote: > > I’m having an issue where I’ve got logging setup and functioning for my > flink-mesos deployment, and works fine up to a point (the same point every > time) where it seems to fall back to “defa

Re: HBase config settings go missing within Yarn.

2017-10-23 Thread Piotr Nowojski
mple project that reproduces the problem on my setup: > https://github.com/nielsbasjes/FlinkHBaseConnectProblem > > Niels Basjes > > > On Fri, Oct 20, 2017 at 6:54 PM, Piotr Nowojski <pi...@data-artis

Re: Write each group to its own file

2017-10-23 Thread Piotr Nowojski
You’re welcome :) > On 23 Oct 2017, at 20:43, Rodrigo Lazoti wrote: > > Piotr, > > I did as you suggested and it worked perfectly. > Thank you! :) > > Best, > Rodrigo > > On Thu, Oct 12, 2017 at 5:11 AM, Piotr Nowojski <pi...@data-artisans

Re: Flink flick cancel vs stop

2017-10-24 Thread Piotr Nowojski
I would propose implementations of NewSource to be non-blocking/asynchronous. For example something like public abstract Future getCurrent(); which would allow us to perform certain actions while there is no data available to process (for example flush output buffers). Something like this

Re: Docker-Flink Project: TaskManagers can't talk to JobManager if they are on different nodes

2017-11-02 Thread Piotr Nowojski
Did you try to expose required ports that are listed in the README when starting the containers? https://github.com/apache/flink/tree/master/flink-contrib/docker-flink Ports: • The Web Client is on port 48081

Re: Docker-Flink Project: TaskManagers can't talk to JobManager if they are on different nodes

2017-11-06 Thread Piotr Nowojski
.StreamInputProcessor.processInput(StreamInputProcessor.java:213) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:69) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:

Re: Flink memory leak

2017-11-08 Thread Piotr Nowojski
Hi Ebru and Javier, Yes, if you could share this example job it would be helpful. Ebru: could you explain in a little more detail what your job(s) look like? Could you post some code? If you are just using maps and filters there shouldn’t be any network transfers involved, aside from Sourc

Re: Flink memory leak

2017-11-08 Thread Piotr Nowojski
"; and nothing seems to work. > > Thanks for your help. > > On 8 November 2017 at 13:26, ÇETİNKAYA EBRU ÇETİNKAYA EBRU > <b20926...@cs.hacettepe.edu.tr> wrote: > On 2017-11-08 15:20, Piotr Nowojski wrote: > Hi Ebru and Javier, > > Yes, if you could shar

Re: Flink memory leak

2017-11-08 Thread Piotr Nowojski
die faster. I tested as well with a > small data set, using the fromElements source, but it will take some time to > die. It's better with some data. > > On 8 November 2017 at 14:54, Piotr Nowojski <pi...@data-artisans.com> wrote: > Hi, > > Thanks for

Re: Flink memory leak

2017-11-08 Thread Piotr Nowojski
heap size. Piotrek > On 8 Nov 2017, at 15:28, Javier Lopez wrote: > > Yes, I tested with just printing the stream. But it could take a lot of time > to fail. > > On Wednesday, 8 November 2017, Piotr Nowojski <pi...@data-artisans.com> wrote: > > Thanks

Re: Flink memory leak

2017-11-08 Thread Piotr Nowojski
time with `fromElements` source instead of >> > Kafka, right? >> > Did you try it also without a Kafka producer? >> > Piotrek >> > >> > On 8 Nov 2017, at 14:57, Javier Lopez > > <javier.lo...@zalando.de> wrote: >> > Hi, >

Re: How to best create a bounded session window ?

2017-11-09 Thread Piotr Nowojski
It might be more complicated if you want to take into account events coming in out of order. For example, you limit the length of the window to 5 and you get the following events: 1 2 3 4 6 7 8 5 Do you want to emit windows [1 2 3 4 5] (length limit exceeded) + [6 7 8]? Or are you fine with interlea
