Re: Best practice for creating/restoring savepoint in standalone k8 setup

2022-07-05 Thread jonas eyob
has both manual and periodic > savepoint triggering also included in the latest upcoming version :) > > Cheers, > Gyula > > On Tue, Jul 5, 2022 at 5:34 PM Weihua Hu wrote: > >> Hi, jonas >> >> If you restart flink cluster by delete/create deployment directly, i

Best practice for creating/restoring savepoint in standalone k8 setup

2022-07-05 Thread jonas eyob
Hi! We are running a Standalone job on Kubernetes using application deployment mode, with HA enabled. We have attempted to automate how we create and restore savepoints by running a script for generating a savepoint (using k8 preStop hook) and another one for restoring from a savepoint (located i

Re: Log4j2 configuration

2022-02-15 Thread jonas eyob
o your .xml > file. > 2) > Have you made modifications to the distribution (e.g., removing other > logging jars from the lib directory)? > Are you using application mode, or session clusters? > > On 15/02/2022 16:41, jonas eyob wrote: > > Hey, > > We are depl

Log4j2 configuration

2022-02-15 Thread jonas eyob
ot;org.apache.logging.log4j" % "log4j-core" % "2.17.0", "org.apache.logging.log4j" %% "log4j-api-scala" % "12.0", "io.sentry" % "sentry-log4j2" % "5.6.0", Confused about what is going on here, possible this might not be Flink related matter but I am not sure..any tips on how to best debug this would be much appreciated. -- *Thanks,* *Jonas*

Re: Cannot consum from Kinesalite using FlinkKinesisConsumer

2021-12-04 Thread jonas eyob
re 3 dec. 2021 kl 12:47 skrev Mika Naylor : > Hey Jonas, > > May I ask what version of Kinesalite you're targeting? With 3.3.3 and > STREAM_INITIAL_POSITION = "LATEST", I received a "The timestampInMillis > parameter cannot be greater than the currentTimestamp

Cannot consum from Kinesalite using FlinkKinesisConsumer

2021-12-02 Thread jonas eyob
:1.14.0] at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) ~[flink-runtime-1.14.0.jar:1.14.0] at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) ~[flink-runtime-1.14.0.jar:1.14.0] at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292] -- *Med Vänliga Hälsningar* *Jonas Eyob*

Re: Checkpoints aborted - Job is not in state RUNNING but FINISHED

2021-11-26 Thread jonas eyob
problem for me: is this caused by having tasks parallelism > 1, but only of them is RUNNING (other in FINISHED state)? Would there be a problem if say, we have two tasks to consume events from a kinesis source but the stream has only 1 shard? Den fre 26 nov. 2021 kl 03:14 skrev Yun Gao : > Hi

Checkpoints aborted - Job is not in state RUNNING but FINISHED

2021-11-25 Thread jonas eyob
: 3 # try n times before job is considered failed >From what I can see the job is still running, and the checkpointing keeps failing. After finding this (https://issues.apache.org/jira/browse/FLINK-2491) I updated the default parallelism from 2 -> 1 since our current kinesis steam consists of 1 shard. But problem persists. Any ideas? Jonas

High availability - leader election not working?

2021-09-01 Thread jonas eyob
{configMapName='thoros--jobmanager-leader'}. 2021-08-31 15:00:02,784 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - All 0 checkpoints found are already downloaded. 2021-08-31 15:00:02,784 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - No checkpoint found during restore. -- *Med Vänliga Hälsningar* *Jonas Eyob*

Re: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden)

2021-08-26 Thread jonas eyob
:// schema in our case Thanks both! Den tors 26 aug. 2021 kl 17:59 skrev Gil De Grove : > Hi Jonas, > > > > Just wondering, are you trying to deploy via iam service account > annotations in a AWS eks cluster? > > We noticed that when using presto, the iam service account was usi

Re: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden)

2021-08-26 Thread jonas eyob
iguration>, by adding the configurations to your flink-conf.yaml. The Presto S3 implementation is the recommended file system for checkpointing to S3 Its possible I am misunderstanding it? Best, Jonas [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/filesystems/s3/#

Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden)

2021-08-26 Thread jonas eyob
nk.runtime.dispatcher.Dispatcher.persistAndRunJob(Dispatcher.java:392) ~[flink-dist_2.12-1.12.5.jar:1.12.5] at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$waitForTerminatingJob$29(Dispatcher.java:971) ~[flink-dist_2.12-1.12.5.jar:1.12.5] at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedConsumer$3(FunctionUtils.java:93) ~[flink-dist_2.12-1.12.5.jar:1.12.5] ... 27 more -- *Med Vänliga Hälsningar* *Jonas Eyob*

Re: NullPointerException when using KubernetesHaServicesFactory

2021-08-26 Thread jonas eyob
-presto/ ls: cannot access '/opt/flink/plugins/s3-fs-presto/': No such file or directory It appears, this had to do with minikube using cached version of the docker image, so the new changes (i.e. adding the plugin) never was reflected. After pulling down the latest the error stopped :) B

Re: NullPointerException when using KubernetesHaServicesFactory

2021-08-25 Thread jonas eyob
as needed. Den ons 25 aug. 2021 kl 11:37 skrev David Morávek : > Hi Jonas, > > Where does the exception pop-up? In job driver, TM, JM? You need to make > sure that the plugin folder is setup for all of them, because they all may > need to access s3 at some point. > > Best

Re: NullPointerException when using KubernetesHaServicesFactory

2021-08-25 Thread jonas eyob
w I would check it? Den ons 25 aug. 2021 kl 10:12 skrev Thms Hmm : > Hey Jonas, > you could also try to use the ´s3p://´ scheme to directly specify that > presto should be used. Also check if your user that executes the process is > able to read the jars. > > Am Mi., 25. Aug.

Re: NullPointerException when using KubernetesHaServicesFactory

2021-08-25 Thread jonas eyob
fs-presto-1.12.5.jar Any idea? [1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/config/#high-availability [2] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/filesystems/s3/#hadooppresto-s3-file-systems-plugins Den ons 25 aug. 2021 kl 08:00 skr

NullPointerException when using KubernetesHaServicesFactory

2021-08-24 Thread jonas eyob
ctory.java:37) at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createCustomHAServices(HighAvailabilityServicesUtils.java:265) ... 9 more . -- Many thanks, Jonas

Re: Connect more than two streams

2017-07-25 Thread Jonas Gröger
Hello Govindarajan, one way to merge multiple streams is to union them. You can do that with the union operator described at [1]. Is that what you are looking for? [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/datastream_api.html -- Jonas Am Mo, 24. Jul 2017, um 23:18

Re: Anomaly Detection with Flink-ML

2017-07-07 Thread Jonas Gröger
/projects/flink/flink-docs-release-1.3/dev/windows.html) and basic transformations (https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/datastream_api.html#datastream-transformations). Regards, Jonas -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050

Re: Related datastream

2017-06-21 Thread Jonas Gröger
Hi nragon, apparently I didn't read the P.S. since I assumed its not important. Silly me. So you are trying to join stream A and B to stream C with stream A and B being keyed. Alright. Are how often do matching elements (matched by primary key) from A and B arrive on your operator to-be-implement

Re: Related datastream

2017-06-21 Thread Jonas Gröger
no problem but we need a little bit more clarification here. -- Jonas -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Related-datastream-tp13901p13903.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: How to sessionize stream with Apache Flink?

2017-06-18 Thread Jonas
Hey Milad, since you cannot look into the future which element comes next, you have to "lag" one behind. This requires building an operator that creates 2-tuples from incoming elements containing (current-1, current), so basically a single value state that emits the last and the current element in

Re: Start streaming tuples depending on another streams rate

2017-02-12 Thread Jonas
For 2: You can also NOT read the Source (i.e. Kafka) while doing that. This way you don't have to buffer. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Start-streaming-tuples-depending-on-another-streams-rate-tp11542p11590.html Sent from th

Re: Start streaming tuples depending on another streams rate

2017-02-10 Thread Jonas
c in flatMap2 to actually start doing stuff. This has the issue that while stream A is being processed, I lose tuples from stream B because it is not "stopped".I think my use case is currently not really doable in Flink.-- Jonas -- View this message in context: http://apache-flink-

Re: Remove old <1.0 documentation from google

2017-02-09 Thread Jonas
Doesen't Google offer wildcard removal? -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Remove-old-1-0-documentation-from-google-tp11541p11551.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: Remove old <1.0 documentation from google

2017-02-09 Thread Jonas
Maybe add "This documentation is outdated. Please switch to a newer version by clicking here ". -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Remove-old-1-0-documentation-from-google-tp11541p11544.html Sent from the Apache Flink User Mailin

Re: How about Discourse (https://www.discourse.org/) for this mailing list

2017-02-09 Thread Jonas
I might want to add that although these two are available, the content of the submissions is still often unreadable and not properly formatted. At least for me this is annoying to read. Additionally we have Stackoverflow which has a nice UI for editing but not really good for discussions. -- Vie

Start streaming tuples depending on another streams rate

2017-02-09 Thread Jonas
empties the buffer. However, this would make my RichCoFlatMapFunction much bigger and would not allow for operator reuse in other scenarios.I'm of course happy to answer if something is unclear.-- Jonas -- View this message in context: http://apache-flink-user-mailing-list-archive.23360

Remove old <1.0 documentation from google

2017-02-09 Thread Jonas
Hi! Its really annoying that if you search for something in Flink, you often get old documentation from Google. Example: Google "flink quickstart scala" and you get https://ci.apache.org/projects/flink/flink-docs-release-0.8/scala_api_quickstart.html -- View this message in context: http://ap

Re: How about Discourse (https://www.discourse.org/) for this mailing list

2017-02-06 Thread Jonas
Instead of Nabble I will use PonyMail now :) Thanks. Didn't know it existed. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-about-Discourse-https-www-discourse-org-for-this-mailing-list-tp11448p11457.html Sent from the Apache Flink User

Re: Many streaming jobs vs one

2017-02-05 Thread Jonas
I recommend multiple Jobs. You can still share most of the code by creating Java / Scala packages. THis makes it easier to update Jobs. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Many-streaming-jobs-vs-one-tp11449p11450.html Sent from th

How about Discourse (https://www.discourse.org/) for this mailing list

2017-02-05 Thread Jonas
https://www.discourse.org/about/ for the features -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-about-Discourse-https-www-discourse-org-for-this-mailing-list-tp11448.html Sent from the Apache Flink User Mailing List archive. mailing lis

Re: Improving Flink Performance

2017-02-05 Thread Jonas
Using a profiler I found out that the main performance problem (80%) was spent in a domain specific data structure. After implementing it with a more efficient one, the performance problems are gone. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabbl

Re: Connection refused error when writing to socket?

2017-01-31 Thread Jonas
Can you try opening a socket with netcat on localhost? nc -lk 9000 and see it this works? For me this works. -- Jonas -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Connection-refused-error-when-writing-to-socket-tp11372p11376.html Sent

Re: Calling external services/databases from DataStream API

2017-01-30 Thread Jonas
I have a similar usecase where I (for the purposes of this discussion) have a GeoIP Database that is not fully available from the start but will eventually be "full". The GeoIP tuples are coming in one after another. After ~4M tuples the GeoIP database is complete. I also need to do the same query

Re: Regarding Flink as a web service

2017-01-29 Thread Jonas
You could write your data back to Kafka using the FlinkKafkaProducer and then use websockets to read from kafka using NodeJS or other. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Regarding-Flink-as-a-web-service-tp11364p11365.html Sent fr

Re: setParallelism() for addSource() in streaming

2017-01-28 Thread Jonas
env.setParallelism(5).addSource(???) will set the default parallelism for this Job to 5 and then add the source. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/setParallelism-for-addSource-in-streaming-tp11343p11356.html Sent from the Apache

Re: Improving Flink Performance

2017-01-26 Thread Jonas
JProfiler -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Improving-Flink-Performance-tp11248p11311.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: Improving Flink Performance

2017-01-25 Thread Jonas
Images: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n11305/Tv6KnR6.png and http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n11305/Tv6KnR6.png -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.co

Re: Improving Flink Performance

2017-01-25 Thread Jonas
I ran a profiler on my Job and it seems that most of the time, its waiting :O See here: Also, the following code snippet executes unexpectedly slow: as you can see in this call graph:

Re: Improving Flink Performance

2017-01-25 Thread Jonas
I tried and it added a little performance (~10%) but nothing outstanding. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Improving-Flink-Performance-tp11248p11301.html Sent from the Apache Flink User Mailing List archive. mailing list archiv

Re: Improving Flink Performance

2017-01-24 Thread Jonas
The performance hit due to decoding the JSON is expected and there is not a lot (except for changing the encoding that I can do about that). Alright. When joining the above stream with another stream I get another performance hit by ~80% so that in the end I have only 1k msgs/s remaining. Do you k

Improving Flink Performance

2017-01-24 Thread Jonas
Hello!I'm reposting this since the other thread had some formatting issues apparently. I hope this time it works.I'm having performance problems with a Flink job. If there is anything valuable missing, please ask and I will try to answer ASAP. My job looks like this:First, I read data from Kafka. T

Re: Improving Flink performance

2017-01-24 Thread Jonas
I don't even have images in there :O Will delete this thread and create a new one. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Improving-Flink-performance-tp11211p11245.html Sent from the Apache Flink User Mailing List archive. mailing li

Re: Improving Flink performance

2017-01-23 Thread Jonas
I received it well-formatted. May it be that the issue is your Mail reader? -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Improving-Flink-performance-tp11211p11225.html Sent from the Apache Flink User Mailing List archive. mailing list arch

Improving Flink performance

2017-01-23 Thread Jonas
Hello! I'm having performance problems with a Flink job. If there is anything valuable missing, please ask and I will try to answer ASAP. My job looks like this: First, I read data from Kafka. This is very fast at 100k msgs/s. The data is decoded, a type is added (we have multiple message types

Re: Expected behaviour of windows

2017-01-23 Thread Jonas
you want a PurgingTrigger. You also didn't state what version of Flink you are using :) -- Jonas -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Expected-behaviour-of-windows-tp11200p11205.html Sent from the Apache Flink User Mailing Lis

Re: weird client failure/timeout

2017-01-23 Thread Jonas
The exception says that Did you already try that? -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/weird-client-failure-timeout-tp11201p11204.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: keyBy called twice. Second time, INetAddress and Array[Byte] are empty

2017-01-20 Thread Jonas
Hey Jamie, It turns out you were right :) I wrote my own implementation of IPAddress and then it worked. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/keyBy-called-twice-Second-time-INetAddress-and-Array-Byte-are-empty-tp10907p11179.html S

Re: Kafka Fetch Failed / DisconnectException

2017-01-18 Thread Jonas
Hallo Fabian, that IS the error message. The job continues to run without restarting. There is not really more to see from the logs. -- Jonas -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-Fetch-Failed-DisconnectException

Kafka Fetch Failed / DisconnectException

2017-01-18 Thread Jonas
Hi! According to the output, I'm having some problems with the KafkaConsumer09. It reports the following on stdout: Is that something I should worry about? -- Jonas -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-Fetch-F

Re: How to read from a Kafka topic from the beginning

2017-01-16 Thread Jonas
You also need to have a new for this to work. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-read-from-a-Kafka-topic-from-the-beginning-tp3522p11087.html Sent from the Apache Flink User Mailing List archive. mailing list archive at

Re: Multiple Sources and Isolated Analytics

2017-01-16 Thread Jonas
mbine them or something, I would suggest you to create three jobs. -- Jonas fms wrote > I am new and just starting out with Flink so please forgive if the > question > doesn't quite make sense. I would like to evaulate flink for a big data > pipeline. The part I am confused

Re: keyBy called twice. Second time, INetAddress and Array[Byte] are empty

2017-01-09 Thread Jonas
So I created a minimal working example where this behaviour can still be seen. It is 15 LOC and can be downloaded here: https://github.com/JonasGroeger/flink-inetaddress-zeroed To run it, use sbt: If you don't want to do the above fear not, here is the code: For some reason, java.net.InetAddress

keyBy called twice. Second time, INetAddress and Array[Byte] are empty

2017-01-07 Thread Jonas
Hi! I have two streams that I connect and call keyBy after. I put some debugging code in the bKeySelector. Turns out, it gets called twice from different areas. Stacktrace here: https://gist.github.com/JonasGroeger/8ce218ee1c19f0639fa990f43b5f9e2b It contains 1 package which gets keyed twice fo