Re: Wired behavior of DATE_FORMAT UDF in Flink SQL

2018-10-31 Thread Timo Walther
Hi Henry, the DATE_FORMAT function is in a very bad state right now. I would recommend implementing your own custom function for now. This issue is tracked here: https://issues.apache.org/jira/browse/FLINK-10032 Regards, Timo On 31.10.18 at 07:44, 徐涛 wrote: Hi Experts, I found that DA
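A minimal sketch of what such a custom formatting function might compute, in plain Java with no Flink dependency. The class name `SimpleDateFormatFn`, the UTC time zone, and the use of `java.time` pattern letters (rather than MySQL-style `%Y-%m-%d` patterns) are assumptions made for illustration; the thread does not say how the replacement should behave. In Flink, logic like this would typically live in the `eval` method of a `ScalarFunction` subclass registered with the table environment.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: formats an epoch-millisecond timestamp with a java.time pattern.
final class SimpleDateFormatFn {

    // Assumption: timestamps are interpreted in UTC.
    static String eval(long epochMillis, String pattern) {
        DateTimeFormatter formatter =
                DateTimeFormatter.ofPattern(pattern).withZone(ZoneOffset.UTC);
        return formatter.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // 1540944000000 ms is midnight UTC on 2018-10-31.
        System.out.println(eval(1540944000000L, "yyyy-MM-dd HH:mm:ss"));
    }
}
```

Fixing the zone explicitly keeps the output independent of the JVM default time zone on each TaskManager.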

Re: Table API and AVG on dates

2018-10-31 Thread Flavio Pompermaier
I've opened a ticket for this: https://issues.apache.org/jira/browse/FLINK-10731 On Wed, Oct 31, 2018 at 7:41 AM Hequn Cheng wrote: > Hi Flavio, > > You are right. Avg on dates is not supported. It requires numeric types. > As a workaround, you can transform the datetime into a numeric type usin
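The idea behind the workaround (convert the date to a number, average, convert back) can be sketched in plain Java. The snippet above is cut off before it names the conversion Hequn had in mind, so the epoch-day representation and the class name `DateAverage` here are assumptions, not the suggestion from the thread.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: averages dates through a numeric (epoch-day) representation.
final class DateAverage {

    static LocalDate average(List<LocalDate> dates) {
        if (dates.isEmpty()) {
            throw new IllegalArgumentException("no dates to average");
        }
        long sumOfEpochDays = 0;
        for (LocalDate date : dates) {
            sumOfEpochDays += date.toEpochDay(); // date -> numeric type
        }
        // Average on the numeric values, rounding down to a whole day.
        long meanEpochDay = Math.floorDiv(sumOfEpochDays, (long) dates.size());
        return LocalDate.ofEpochDay(meanEpochDay); // numeric type -> date
    }

    public static void main(String[] args) {
        List<LocalDate> dates =
                Arrays.asList(LocalDate.of(2018, 10, 29), LocalDate.of(2018, 10, 31));
        System.out.println(average(dates));
    }
}
```

In a query, the same pattern would mean casting the date column to a numeric type before calling AVG and converting the result back afterwards.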

Kafka offset behaviour when restarting job from savepoint

2018-10-31 Thread Bernd.Winterstein
Hi Let's say we have a job which reads from a Flink kafka source (FlinkKafkaConsumer011) and commits the offsets on each checkpoint. When the job is started from an older savepoint, will it take the latest offsets stored in Kafka for the consumer group or are the offsets taken from the savepoin

Re: Kafka offset behaviour when restarting job from savepoint

2018-10-31 Thread Paul Lam
Hi Bend, The offsets would be restored from the savepoint. Flink would only fall back to the offsets on the brokers if there are no offsets in its state. Best, Paul Lam > On Oct 31, 2018, at 17:13, > wrote: > > Hi > Let’s say we have a job which reads from a Flink kafka source > (FlinkKafkaCons
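The precedence Paul describes can be illustrated with a small plain-Java sketch. This is not Flink's implementation (that lives in FlinkKafkaConsumerBase); the class `StartOffsetResolver`, its method, and the map-based representation of state are hypothetical and only model the rule "restored state first, broker offsets as fallback".

```java
import java.util.Collections;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical model of the start-offset precedence described in the thread.
final class StartOffsetResolver {

    static OptionalLong resolve(int partition,
                                Map<Integer, Long> restoredState,
                                Map<Integer, Long> committedOnBroker) {
        // 1. An offset restored from the savepoint/checkpoint always wins.
        Long restored = restoredState.get(partition);
        if (restored != null) {
            return OptionalLong.of(restored);
        }
        // 2. Otherwise fall back to the consumer group's committed offset.
        Long committed = committedOnBroker.get(partition);
        return committed != null ? OptionalLong.of(committed) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        Map<Integer, Long> savepoint = Collections.singletonMap(0, 100L);
        Map<Integer, Long> broker = Collections.singletonMap(0, 250L);
        // Partition 0 is in the savepoint, so the newer broker offset is ignored.
        System.out.println(resolve(0, savepoint, broker).getAsLong());
    }
}
```

Under this rule, restarting from an older savepoint rewinds the job to the savepoint's offsets even though the group has committed newer ones to Kafka.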

Re: Savepoint failed with error "Checkpoint expired before completing"

2018-10-31 Thread Gagan Agrawal
Hi Henry, Thanks for your response. However, we don't face this issue during normal runs as we have incremental checkpoints. Only when we try to take a savepoint (which tries to save the entire state in one go) do we face this problem. Gagan On Wed, Oct 31, 2018 at 11:41 AM 徐涛 wrote: > Hi Gagan, >

Flink SQL string literal does not support double quotation?

2018-10-31 Thread 徐涛
Hi Experts, When I am running the following SQL in Flink 1.6.2, I got org.apache.calcite.sql.parser.impl.ParseException select BUYER_ID, AMOUNT, concat( from_unixtime(unix_timestamp(CREATE_TIME

Re: Avro serialization problem after updating to flink 1.6.0

2018-10-31 Thread Aljoscha Krettek
Hmm, so if there is this code that wants the $SCHEMA field, I wonder how it ever worked. Did you try the newly rebuilt avro model against the older Flink version (1.3.2)? If possible, you could send the code and we could look into it? OR, could you try running it in a debugger and set a breakpo

Re: CsvInputFormat - read header line first

2018-10-31 Thread Ken Krugler
Hi Madan, If your source has a parallelism > 1, then when the CSV file is split, only one of the operators will get the split with the header row. So in that case, how would you communicate the column name->index information to the other operators? If you force a parallelism of 1 for the sourc

Non deterministic result with Table API SQL

2018-10-31 Thread Flavio Pompermaier
Hi to all, I'm using Flink 1.6.1 and I get different results when running the same query on the same static dataset. There are times that I get a 'NaN' as result of a select field-expression, while other times I get a valid double. How is this possible? This seems to happen only when I execute a co

Re: Non deterministic result with Table API SQL

2018-10-31 Thread Timo Walther
Hi Flavio, do you execute this query in a batch or stream execution environment? In any case this sounds very strange to me. But is it guaranteed that it is not the fault of the connector? Regards, Timo On 31.10.18 at 14:54, Flavio Pompermaier wrote: Hi to all, I'm using Flink 1.6.1 and I g

1.6 UI issues

2018-10-31 Thread Juan Gentile
Hello! We are migrating to the latest 1.6 version and all the jobs seem to work fine, but when we check individual jobs through the web interface we encounter the issue that after clicking on a job, either it takes too long to load the information of the job or it never loads at all. Has anyone

Re: Non deterministic result with Table API SQL

2018-10-31 Thread Flavio Pompermaier
I read a Parquet file from the filesystem. The input rows are always read in the same way, but the results are different. My query is very big and maybe this somehow affects the query execution: UM(CASE WHEN isComplete(nome,sesso,cfPiva,cognome,immobili,deceduto,dataNascita,luogoNascita,impostePagate,

Re: Savepoint failed with error "Checkpoint expired before completing"

2018-10-31 Thread Yun Tang
Hi Gagan A savepoint generally takes more time than a usual incremental checkpoint; you could try increasing the checkpoint timeout [1] env.getCheckpointConfig().setCheckpointTimeout(90); If you just want to resume from the previous job without changing the state backend, I think you coul

Re: Non deterministic result with Table API SQL

2018-10-31 Thread Flavio Pompermaier
Adding more rows to the dataset leads to a deterministic error. My tests say that the problem arises when adding STDDEV_POP to the query. Do you think it could be possible that there's a concurrency problem in its implementation?

Re: Non deterministic result with Table API SQL

2018-10-31 Thread Timo Walther
As far as I know STDDEV_POP is translated into basic aggregate functions (SUM/AVG/COUNT). But if this error is reproducible in a little test case, we should definitely track this in JIRA. On 31.10.18 at 16:43, Flavio Pompermaier wrote: Adding more rows to the dataset lead to a deterministic e
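To make the decomposition concrete, here is a plain-Java sketch of a population standard deviation built only from SUM and COUNT style aggregates. The exact expression Flink generates is not given in the thread, so the formula sqrt(E[x^2] - E[x]^2), the class name, and the clamped variant are assumptions. One property of this textbook formula is worth knowing: in floating point the subtraction can come out slightly negative for nearly constant data, and the square root of a negative number is NaN. Whether that is the cause of the NaN reported in this thread is not established here.

```java
// Hypothetical sketch: STDDEV_POP expressed through SUM(x), SUM(x*x) and COUNT(x).
final class PopulationStdDev {

    // Naive form: may return NaN if rounding makes the variance slightly negative.
    static double naive(double[] values) {
        return Math.sqrt(variance(values));
    }

    // Defensive form: clamps a (rounding-induced) negative variance to zero.
    static double clamped(double[] values) {
        return Math.sqrt(Math.max(0.0, variance(values)));
    }

    private static double variance(double[] values) {
        double sum = 0.0;
        double sumOfSquares = 0.0;
        for (double v : values) {
            sum += v;              // SUM(x)
            sumOfSquares += v * v; // SUM(x * x)
        }
        double count = values.length; // COUNT(x); an empty input yields NaN (0/0)
        double mean = sum / count;
        return sumOfSquares / count - mean * mean;
    }

    public static void main(String[] args) {
        double[] sample = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println(naive(sample));
    }
}
```

A small reproducible test case, as Timo asks for, could compare the query result against a reference computation like this one on the same rows.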

Flink 1.6, User Interface response time

2018-10-31 Thread Oleksandr Nitavskyi
Hello! We are migrating to the latest 1.6.2 version and all the jobs seem to work fine, but when we check individual jobs through the web interface we encounter the issue that after clicking on a job, either it takes too long to load the information of the job or it never loads at all. Has anyo

Re: Flink weird checkpointing behaviour

2018-10-31 Thread Yun Tang
Hi Pawel First of all, I don't think the akka timeout exception is related to the checkpoint taking a long time. Both RocksDBStateBackend and FsStateBackend can have an async part of the checkpoint, which generally uploads data to DFS. That's why async part would take more time than

Ask about counting elements per window

2018-10-31 Thread Rad Rad
Hi All, I have a GPS stream consumed by FlinkKafkaConsumer which contains GPS records of different users. I need to count the number of users per window of this stream. Could anyone help me? Part of my code is below // read data from Kafka DataStream st
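The computation being asked for (distinct users per fixed-size window) can be sketched in plain Java, independent of the DataStream code that is cut off above. The class `WindowedUserCount`, the parallel-array input, and the choice of tumbling windows aligned to the epoch are assumptions made for illustration; in a Flink job the same grouping would be expressed with a window operator or a SQL group window.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: counts distinct user ids per tumbling window.
final class WindowedUserCount {

    // timestamps[i] and userIds[i] describe the i-th GPS record.
    static Map<Long, Integer> distinctUsersPerWindow(long[] timestamps,
                                                     String[] userIds,
                                                     long windowMillis) {
        Map<Long, Set<String>> usersByWindow = new TreeMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            // Window start = timestamp rounded down to a multiple of the window size.
            long windowStart = timestamps[i] - Math.floorMod(timestamps[i], windowMillis);
            usersByWindow.computeIfAbsent(windowStart, k -> new HashSet<>()).add(userIds[i]);
        }
        Map<Long, Integer> counts = new TreeMap<>();
        for (Map.Entry<Long, Set<String>> entry : usersByWindow.entrySet()) {
            counts.put(entry.getKey(), entry.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] timestamps = {0L, 500L, 999L, 1000L, 1500L};
        String[] users = {"a", "b", "a", "a", "a"};
        // Two windows of 1000 ms: [0, 1000) and [1000, 2000).
        System.out.println(distinctUsersPerWindow(timestamps, users, 1000L));
    }
}
```

Collecting ids into a set per window is what makes this a count of users rather than a count of GPS records.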

Flink cluster security conf.: keberos.keytab add to run yarn-cluster

2018-10-31 Thread Marke Builder
Hi, So far I have added my keytab and principal in the flink-conf.yaml: security.kerberos.login.keytab: security.kerberos.login.principal: But is there a way that I can add this to the "start script" -> run yarn-cluster . Thanks!

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-31 Thread Shuyi Chen
Hi Xuefu, Thanks a lot for driving this big effort. I would suggest converting your proposal and design doc into a google doc, and sharing it on the dev mailing list for the community to review and comment, with a title like "[DISCUSS] ... Hive integration design ..." . Once approved, we can document it

Re: Flink cluster security conf.: keberos.keytab add to run yarn-cluster

2018-10-31 Thread Shuyi Chen
Do you mean having these two options as command line options? If so, AFAIK, I don't think it's supported now. Why do you need it? Thanks. On Wed, Oct 31, 2018 at 11:43 AM Marke Builder wrote: > Hi, > > So far I have added my keytab and principal in the flink-conf.yaml: > security.kerberos.log

akka timeout exception

2018-10-31 Thread Anil
getting this error in my job manager too frequently. any help. Thanks! java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/jobmanager#1927353472]] after [1 ms]. Sender[null] sent message of type "org.apache.flink.runtime.message

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-31 Thread Zhang, Xuefu
Hi Shuiyi, Good idea. Actually the PDF was converted from a google doc. Here is its link: https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing Once we reach an agreement, I can convert it to a FLIP. Thanks, Xuefu

Re: Kafka offset behaviour when restarting job from savepoint

2018-10-31 Thread Anil
Hey Paul. Can you please point me to the code in Flink. Thanks! -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Question about serialization and performance

2018-10-31 Thread TechnoMage
In running tests of flink jobs we are seeing some that yield really good performance (2.5M records in minutes) and others that are struggling to get past 200k records processed. In the latter case there are a large number of keys, and each key gets state in the form of 3 value states. One hold

TaskManagers cannot contact JobManager in Kubernetes when JobManager HA is enabled

2018-10-31 Thread John Stone
I have successfully managed to deploy a Flink cluster in Kubernetes without JobManager high availability. Everything works great. The moment I enable high availability, TaskManagers fail to contact the JobManager. My configurations and logs are below. Can someone point me in the correct dir

Re: Flink fails continuously: Couldn't retrieve the JobExecutionResult from the JobManager.

2018-10-31 Thread vino yang
Hi marke, My advice is not to keep your client connected to JM. If you expect continuous output, you can sink it out. In addition, it does not rule out that your JM load is too high, such as the emergence of full GC and so on. So, make sure your JM has enough resources to use and monitor it. Than

Re: Ask about counting elements per window

2018-10-31 Thread Hequn Cheng
Hi Rad, You can take a look at the group window[1] of SQL. I think it may help you. Best, Hequn [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql.html#aggregations On Thu, Nov 1, 2018 at 12:53 AM Rad Rad wrote: > Hi All, > > I have a GPS stream consumed by FlinkKafkaCons

Re: Kafka offset behaviour when restarting job from savepoint

2018-10-31 Thread Paul Lam
Hi Anil, Sure. You could have a look at FlinkKafkaConsumerBase.java. Best, Paul Lam > On Nov 1, 2018, at 03:37, Anil wrote: > > Hey Paul. Can you please point me to the code in Flink. Thanks! > > > > -- > Sent from: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: CsvInputFormat - read header line first

2018-10-31 Thread madan
Hi Ken, Yep correct. Thank you. On Wed, Oct 31, 2018 at 7:24 PM Ken Krugler wrote: > Hi Madan, > > If your source has a parallelism > 1, then when the CSV file is split, > only one of the operators will get the split with the header row. > > So in that case, how would you communicate the colum

Re: Savepoint failed with error "Checkpoint expired before completing"

2018-10-31 Thread Gagan Agrawal
Thanks Yun for your inputs. Yes, increasing the checkpoint timeout helps and we are able to take savepoints now. In our case we wanted to increase parallelism so I believe a savepoint is the only option as a checkpoint doesn't support code/parallelism changes. Gagan On Wed, Oct 31, 2018 at 8:46 PM Yun Tang wro