Re: map JSON to scala case class & off-heap optimization

2020-07-16 Thread Georg Heiler
Many thanks! On Wed, Jul 15, 2020 at 15:58, Aljoscha Krettek < aljos...@apache.org> wrote: > On 11.07.20 10:31, Georg Heiler wrote: > > 1) similarly to Spark, the Table API works on some optimized binary > > representation > > 2) this is only available in the SQL way of interaction - there i

Re: Pyflink sink rowtime field

2020-07-16 Thread Xingbo Huang
Hi Jesse, I think the type of rowtime you declared on the source schema is DataTypes.TIMESTAMP(), so you should also use DataTypes.TIMESTAMP() on the sink schema. Best, Xingbo Jesse Lord wrote on Wed, Jul 15, 2020 at 11:41 PM: > I am trying to sink the rowtime field in pyflink 1.10. I get the following > error > >

Re: Performance test Flink vs Storm

2020-07-16 Thread Xintong Song
> > From this exercise, I understand that increasing JVM memory would > directly support/increase throughput. Am I correct? > It depends. Smaller heap space means more frequent GCs, which occupy CPU processing time and also introduce more pauses in your program. If you already have large en

Re: Performance test Flink vs Storm

2020-07-16 Thread Prasanna kumar
Hi, after setting taskmanager.memory.managed.fraction to 0, I see that the JVM heap memory increased from 512 MB to 1 GB. Earlier I was getting a maximum of 4-6k records per second throughput on the Kafka source for an ingestion rate of 12k+/second. Now I see that improved to 11k per task (parallelism of 1) and 16.5k

Re: Performance test Flink vs Storm

2020-07-16 Thread Xintong Song
> > *I had set checkpointing to use the job manager backend.* The JobManager backend also runs in JVM heap space and does not use managed memory. Setting the managed memory fraction to 0 will give you larger JVM heap space and thus less GC pressure. Thank you~ Xintong Song On Thu, Jul 16, 2020 at 10:38 P
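As a reference for this thread, a minimal flink-conf.yaml sketch of the setting under discussion (assuming the Flink 1.10 memory configuration keys; the sizes are illustrative):

```yaml
# Hand the task manager's managed-memory budget back to the JVM heap.
# Only sensible when nothing uses managed memory (e.g. heap state backend, no RocksDB).
taskmanager.memory.managed.fraction: 0.0
# Total Flink memory of the task executor; the heap grows as managed memory shrinks.
taskmanager.memory.flink.size: 1536m
```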

Flink 1.11 throws Unrecognized field "error_code"

2020-07-16 Thread Lian Jiang
Hi, I am using Java 1.8 and Flink 1.11 by following https://ci.apache.org/projects/flink/flink-docs-release-1.11/try-flink/local_installation.html on my Mac (Mojave 10.14.6), but " ./bin/flink run examples/streaming/WordCount.jar " throws the error below. These are the Java versions that I tried: 1.8

CEP use case ?

2020-07-16 Thread Aissa Elaffani
Hello guys, I have some sensors generating some data (temperature, humidity, positioning, ...) and I want to apply some rules (simple conditions, e.g. if temperature > 25, ...) in order to determine whether the sensor is in "Normal" status or "Alert" status. Do I need to use Flink CEP, or just the
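For a simple stateless threshold like this, a plain map (or ProcessFunction) is usually enough; CEP only pays off when matching sequences of events over time. A hedged sketch, where SensorReading, SensorStatus and the readings stream are hypothetical placeholders:

```java
import org.apache.flink.streaming.api.datastream.DataStream;

// readings is a DataStream<SensorReading>; both POJOs are placeholders for illustration.
DataStream<SensorStatus> statuses = readings
        .map(r -> new SensorStatus(r.getSensorId(),
                r.getTemperature() > 25 ? "Alert" : "Normal"))
        .returns(SensorStatus.class); // explicit result type, since a lambda is used
```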

Re: Flink Pojo Serialization for Map Values

2020-07-16 Thread Theo Diefenthal
Hi Krzysztof, that's why my goal is to always set env.getConfig().disableGenericTypes(); in my streaming jobs. This way, you will receive an early crash if GenericTypes are used somewhere. (They are bad for performance, so I try to avoid them everywhere.) Sadly, if you build up streaming
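A short sketch of the early-crash switch mentioned above, so the fallback to Kryo fails loudly at job construction instead of silently degrading performance:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Throws as soon as any type would be handled as a GenericType (i.e. serialized with Kryo).
env.getConfig().disableGenericTypes();
```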

Beam flink runner job not keeping up with input rate after downscaling

2020-07-16 Thread Kathula, Sandeep
Hi, we started a Beam application with the Flink runner with parallelism 50. It is a stateful application which uses RocksDB as the state store. We are using timers and Beam's value state and bag state (which is the same as Flink's ListState). We are doing incremental checkpointing. With initial par

Re: Question on Pattern Matching

2020-07-16 Thread Chesnay Schepler
Have you read this part of the documentation? From what I understand, it provides you with hooks for processing matched/timed-out patterns. On 16/07/2020 20:23, Basanth Gowda wrote

Re: Using md5 hash while sinking files to s3

2020-07-16 Thread Chesnay Schepler
I only quickly skimmed the Hadoop docs and found this (although it is not documented very well, I might add). If this does not do the trick, I'd suggest reaching out to the Hadoop project, since we're using their S3 filesystem. On 16/07/2020 19:32, nikita Balakrishnan wrote: Hey Chesnay, Than

Re: backup configuration in Flink doc

2020-07-16 Thread Chesnay Schepler
They should be public yes; I do not know what the "Backup" category is supposed to mean, and I suspect this was a WIP title. On 16/07/2020 18:01, Steven Wu wrote: The configuration page has this "backup" section. Can I assume that they are public interfaces? The name "backup" is a little conf

Re: Flink Pojo Serialization for Map Values

2020-07-16 Thread KristoffSC
Theo, thank you for the clarification and code examples. I was actually suspecting that this is because of Java type erasure. The thing that bothers me, though, is the fact that Flink was failing over to Kryo silently in my case, without any information in the logs. And actually we found it just by luck.

Question on Pattern Matching

2020-07-16 Thread Basanth Gowda
Hello, we have a use case where we want to know when a certain pattern doesn't complete within a given time frame. For example: A -> B -> C -> D (needs to complete in 10 minutes). Now with Flink, if event D doesn't happen within 10 minutes, the pattern is discarded and we can get notified. We also want
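A hedged sketch of one way to get notified about incomplete matches with Flink CEP: declare the sequence with within() and handle timed-out partial matches via a side output. Event, its getType() accessor and the events stream are illustrative assumptions; the conditions for "C" and "D" are elided for brevity.

```java
import java.util.List;
import java.util.Map;
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.functions.PatternProcessFunction;
import org.apache.flink.cep.functions.TimedOutPartialMatchHandler;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

final OutputTag<String> timedOutTag = new OutputTag<String>("timed-out") {};

Pattern<Event, ?> pattern = Pattern.<Event>begin("A")
        .where(new SimpleCondition<Event>() {
            @Override public boolean filter(Event e) { return "A".equals(e.getType()); }
        })
        .followedBy("B").where(new SimpleCondition<Event>() {
            @Override public boolean filter(Event e) { return "B".equals(e.getType()); }
        })
        // ... "C" and "D" declared the same way ...
        .within(Time.minutes(10));

// Handles complete matches and, via the extra interface, partial matches that timed out.
class AbcdProcessFunction extends PatternProcessFunction<Event, String>
        implements TimedOutPartialMatchHandler<Event> {
    @Override
    public void processMatch(Map<String, List<Event>> match, Context ctx, Collector<String> out) {
        out.collect("completed: " + match.keySet());
    }
    @Override
    public void processTimedOutMatch(Map<String, List<Event>> match, Context ctx) {
        ctx.output(timedOutTag, "did not complete within 10 min, reached: " + match.keySet());
    }
}

SingleOutputStreamOperator<String> completed =
        CEP.pattern(events, pattern).process(new AbcdProcessFunction());
DataStream<String> incomplete = completed.getSideOutput(timedOutTag);
```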

Re: Hadoop FS when running standalone

2020-07-16 Thread Lorenzo Nicora
Thanks Alessandro, I think I solved it. I cannot set any HADOOP_HOME as I have no Hadoop installed on the machine running my tests. But adding *org.apache.flink:flink-shaded-hadoop-2:2.8.3-10.0* as a compile dependency to the Maven profile building the standalone version fixed the issue. Lorenzo
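For readers hitting the same issue, a hedged sketch of the dependency described above, attached to a hypothetical `standalone` Maven profile:

```xml
<profile>
  <id>standalone</id>
  <dependencies>
    <!-- Provides the shaded Hadoop classes when no Hadoop installation is available on the machine. -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-shaded-hadoop-2</artifactId>
      <version>2.8.3-10.0</version>
    </dependency>
  </dependencies>
</profile>
```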

Re: Using md5 hash while sinking files to s3

2020-07-16 Thread nikita Balakrishnan
Hey Chesnay, Thank you for getting back with that! I tried setting that too, it still gives me the same exception. Is there something else that I'm missing? I also have fs.s3a.bucket..server-side-encryption-algorithm=SSE-KMS and fs.s3a.bucket..server-side-encryption.key set. Is there no need to s

backup configuration in Flink doc

2020-07-16 Thread Steven Wu
The configuration page has this "backup" section. Can I assume that they are public interfaces? The name "backup" is a little confusing to me. There are some important pipeline and execution checkpointing configs here. https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#backup T

Flink Jobs are failing for running testcases when trying to build in Jenkins server

2020-07-16 Thread bujjirahul45
Hi, I am trying to build a Flink job on a Jenkins server, and when it runs the test cases it gives me the error below. I am doing a simple pattern validation, where I am testing data against a set of patterns. It builds fine locally with Gradle 6.3, but when trying to build on the Jenkins server it fails; below is st

UnsupportedOperatorException with TensorFlow on checkpointing

2020-07-16 Thread Sung Gon Yi
Hi, the following code throws an UnsupportedOperatorException on checkpointing (every time). Could you suggest any solution? Example code: A.java -- public class A extends RichWindowFunction { private transient MapState state; @Override

Re: Performance test Flink vs Storm

2020-07-16 Thread Prasanna kumar
Xintong Song, - Which version of Flink is used? *1.10* - Which deployment mode is used? *Standalone* - Which cluster mode is used? *Job* - Do you mean you have a 4-core/16 GB node for each task manager, and each task manager has 4 slots? *Yeah*. *There are a total of 3 task managers in

Re: Hadoop FS when running standalone

2020-07-16 Thread Alessandro Solimando
Hi Lorenzo, IIRC I had the same error message when trying to write snappified Parquet on HDFS with a standalone fat jar. Flink could not "find" the Hadoop native/binary libraries (specifically, I think for me the issue was related to Snappy), because my HADOOP_HOME was not (properly) set. I have n

AllwindowStream and RichReduceFunction

2020-07-16 Thread Flavio Pompermaier
Hi to all, I'm trying to apply a rich reduce function after a countWindowAll but Flink says "ReduceFunction of reduce can not be a RichFunction. Please use reduce(ReduceFunction, WindowFunction) instead." Is there any good reason for this? Or am I doing something wrong? Best, Flavio
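A hedged sketch of the workaround the error message suggests: keep a plain ReduceFunction for the incremental aggregation and put any rich logic into the window function variant of reduce (the Long element type is illustrative):

```java
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.windowing.ProcessAllWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
import org.apache.flink.util.Collector;

// input is a DataStream<Long>; countWindowAll(100) yields GlobalWindow windows.
DataStream<Long> result = input
        .countWindowAll(100)
        .reduce(
                (ReduceFunction<Long>) (a, b) -> a + b,
                new ProcessAllWindowFunction<Long, Long, GlobalWindow>() {
                    @Override
                    public void process(Context ctx, Iterable<Long> preAggregated, Collector<Long> out) {
                        // ProcessAllWindowFunction is a rich function, so getRuntimeContext() is available here.
                        out.collect(preAggregated.iterator().next());
                    }
                });
```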

Re: Byte arrays in Avro

2020-07-16 Thread Timo Walther
I further investigated this issue. We are analyzing the class as a POJO in another step here which produces the warning: https://github.com/apache/flink/blob/master/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroTypeInfo.java#L71 However, the serializer is de

Re: Byte arrays in Avro

2020-07-16 Thread Timo Walther
Hi Lasse, are you using Avro specific records? A look into the code shows that the warnings in the log are generated after the Avro check: https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/java/typeutils/TypeExtractor.java#L1741 Somehow your Avro object
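One way to check what Flink actually derives for the record class (hedged sketch; MyAvroRecord stands in for the generated specific record):

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;

// Printing the derived TypeInformation shows whether the class ends up as an
// Avro/POJO type or falls back to a GenericType (Kryo), regardless of the
// ByteBuffer warning logged by the TypeExtractor.
TypeInformation<MyAvroRecord> info = TypeInformation.of(MyAvroRecord.class);
System.out.println(info);
```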

Re: pyFlink UDTF function registration

2020-07-16 Thread Manas Kale
Hi Xingbo, Thank you for the elaboration. Note that all of this is for a streaming job. I used this code to create a SQL VIEW : f""" CREATE VIEW TMP_TABLE AS SELECT monitorId, featureName, featureData, time_st FROM ( SELECT monitorId, featureName, featureData, time_st FROM {INPUT_T

Byte arrays in Avro

2020-07-16 Thread Lasse Nedergaard
Hi. We have some Avro objects, and some of them contain the primitive data type bytes, which is translated into java.nio.ByteBuffer in the Avro objects. When using our Avro objects we get these warnings: org.apache.flink.api.java.typeutils.TypeExtractor [] - class java.nio.ByteBuffer does not conta

Hadoop FS when running standalone

2020-07-16 Thread Lorenzo Nicora
Hi, I need to run my streaming job as a *standalone* Java application, for testing. The job uses the Hadoop S3 FS and I need to test it (not a unit test). The job works fine when deployed (I am using AWS Kinesis Data Analytics, so Flink 1.8.2). I have *org.apache.flink:flink-s3-fs-hadoop* as a "com

Re: Is there a way to access the number of 'keys' that have been used for partioning?

2020-07-16 Thread orionemail
Thanks for the response, I thought as much. Sent with ProtonMail (https://protonmail.com) Secure Email. --- Original Message --- On Wednesday, 15 July 2020 17:12, Chesnay Schepler wrote: > This information is not readily available; in fact Flink itself doesn't know > how many keys the

Re: HELP!!! Flink1.10 sql insert into HBase, error like validateSchemaAndApplyImplicitCast

2020-07-16 Thread Danny Chan
I suspect there is some inconsistency in the nullability of the whole record field. Can you compare the 2 schemas and see the diff? For a table, you can get the TableSchema first and print it out. Best, Danny Chan On Jul 16, 2020 at 10:56 AM +0800, Leonard Xu wrote: > Hi, Jim > > Could you post error mes
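A minimal sketch of the check suggested above, assuming the Flink 1.10 Table API; `tableEnv` and the query text are placeholders for whatever feeds the HBase sink:

```java
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableSchema;

Table table = tableEnv.sqlQuery("SELECT id, cf FROM source_table"); // the query feeding the HBase sink
TableSchema schema = table.getSchema();
// The printed field types include nullability, so diffing this against the sink
// table's schema should surface the mismatch behind validateSchemaAndApplyImplicitCast.
System.out.println(schema);
```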

Re: Manual allocation of slot usage

2020-07-16 Thread Mu Kong
Hi Song, Guo, we updated our cluster to 1.10.1 and the cluster.evenly-spread-out-slots option works pretty well now. Thanks for your help! Best regards, Mu On Wed, Jul 8, 2020 at 9:35 PM Mu Kong wrote: > Hi Song, Guo, > > Thanks for the information. > I will first upgrade our flink cluster to 1.10.0
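For anyone landing on this thread, the option in question is a single flink-conf.yaml entry (available from Flink 1.10 on):

```yaml
# Spread a job's slots across all registered task managers instead of filling up one TM first.
cluster.evenly-spread-out-slots: true
```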

Accumulators in Table API

2020-07-16 Thread Flavio Pompermaier
Hi to all, in my legacy code (using the DataSet API) I used to add a map function just after the source read and keep a count of the rows. In this way I had a very light and unobtrusive way of counting the rows of a dataset. Can I do something similar in the Table API? Is there a way to use accumulators?
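For context, a hedged sketch of the legacy DataSet-style counting described above: a pass-through RichMapFunction registering an accumulator (names are illustrative):

```java
import org.apache.flink.api.common.accumulators.LongCounter;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class CountingMapper<T> extends RichMapFunction<T, T> {
    private final LongCounter rowCount = new LongCounter();

    @Override
    public void open(Configuration parameters) {
        // The aggregated count is later available from the JobExecutionResult.
        getRuntimeContext().addAccumulator("row-count", rowCount);
    }

    @Override
    public T map(T value) {
        rowCount.add(1L);
        return value; // pass-through, so counting stays unobtrusive
    }
}
```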

Re: Print table content in Flink 1.11

2020-07-16 Thread Flavio Pompermaier
Thanks for the suggestions Kurt and Jingsong! Very helpful On Thu, Jul 16, 2020 at 4:30 AM Jingsong Li wrote: > Hi Flavio, > > For print: > - As Kurt said, you can use `table.execute().print();`, records will be > collected to the client (NOTE it is client) and print to client console. > - But i
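For reference, the client-side print mentioned above, in the Flink 1.11 Table API (a minimal sketch; `tableEnv` and the table name are assumed):

```java
import org.apache.flink.table.api.Table;

Table table = tableEnv.sqlQuery("SELECT * FROM my_table");
// Collects the result to the client and prints it to the client's console.
table.execute().print();
```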

Re: Using md5 hash while sinking files to s3

2020-07-16 Thread Chesnay Schepler
Please try configuring: fs.s3a.etag.checksum.enabled: true On 16/07/2020 03:11, nikita Balakrishnan wrote: Hello team, I’m developing a system where we are trying to sink to an immutable S3 bucket. This bucket has server-side encryption set as KMS. The DataStream sink works perfectly fine wh
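A hedged flink-conf.yaml sketch combining the suggestion above with the SSE-KMS keys already in use, assuming Flink forwards fs.s3a.* keys to the bundled Hadoop S3A filesystem; the KMS key value is a placeholder:

```yaml
fs.s3a.etag.checksum.enabled: true
fs.s3a.server-side-encryption-algorithm: SSE-KMS
fs.s3a.server-side-encryption.key: <your-kms-key-arn>
```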