Re: Issue with running Flink Python jobs on cluster

2016-07-18 Thread Geoffrey Mon
Hello Chesnay, Thank you very much! With your help I've managed to set up a Flink cluster that can run Python jobs successfully. I solved my issue by removing local=True and installing HDFS in a separate cluster. I don't think it was clearly mentioned in the documentation that HDFS was required f

Re: Error using S3a State Backend: Window Operators sending directory instead of fully qualified file?

2016-07-18 Thread Clifford Resnick
In 1.1, AbstractYarnClusterDescriptor pushes contents of flink/lib (local to where the yarn app is launched) to Yarn with a single directory copy. In 1.0.3 it looked like it was copying the individual jars. So, yes I did actually change HDFSCopyToLocal, which was easy, but the job staging in t

Re: Error using S3a State Backend: Window Operators sending directory instead of fully qualified file?

2016-07-18 Thread Ufuk Celebi
Hey Cliff! Good to see that we came to the same conclusion :-) What do you mean with copying of the "lib" folder? This issue should be the same for both 1.0 and 1.1. Another workaround could be to use the fully async RocksDB snapshots with Flink 1.1-SNAPSHOT. If you like, you could also work on t

Re: Error using S3a State Backend: Window Operators sending directory instead of fully qualified file?

2016-07-18 Thread Clifford Resnick
Hi Ufuk, My mail was down, so I missed this response. Thanks for that. On 7/18/16, 10:38 AM, "Ufuk Celebi" wrote: Hey Cliff! I was able to reproduce this by locally running a job and RocksDB semi asynchronous checkpoints (current default) to S3A. I've created an issue here:

Re: Error using S3a State Backend: Window Operators sending directory instead of fully qualified file?

2016-07-18 Thread Clifford Resnick
The root cause of this problem seems to be that Flink is copying directories with the FileSystem. Unfortunately, unlike the default HDFS implementation, org.apache.hadoop.fs.s3a.S3AFileSystem does not implement a recursive copyFromLocalFile, and Flink 1.0.3 fails when it tries to copy a Window O
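A minimal sketch of the recursive directory copy that the HDFS FileSystem provides but S3AFileSystem's copyFromLocalFile apparently does not. Plain java.nio stands in for the Hadoop API here, so this only illustrates the pattern; the paths and file names are invented for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

/** Recursive directory copy: the behavior Flink relies on when a
 *  checkpoint is a directory (e.g. RocksDB window operator state). */
public class RecursiveCopy {
    public static void copyRecursively(Path src, Path dst) throws IOException {
        try (Stream<Path> paths = Files.walk(src)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path target = dst.resolve(src.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(target); // keep directory structure
                } else {
                    Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical checkpoint layout, just for demonstration.
        Path src = Files.createTempDirectory("chk-src");
        Files.createDirectories(src.resolve("window-op"));
        Files.writeString(src.resolve("window-op").resolve("state-1"), "data");
        Path dst = Files.createTempDirectory("chk-dst");
        copyRecursively(src, dst);
        System.out.println(Files.readString(dst.resolve("window-op").resolve("state-1")));
    }
}
```

A FileSystem whose copy handles only single files would fail on the nested `window-op` directory above, which matches the symptom described in the thread.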

Re: Error using S3a State Backend: Window Operators sending directory instead of fully qualified file?

2016-07-18 Thread Ufuk Celebi
Hey Cliff! I was able to reproduce this by locally running a job and RocksDB semi asynchronous checkpoints (current default) to S3A. I've created an issue here: https://issues.apache.org/jira/browse/FLINK-4228. Running with S3N it is working as expected. You can use that implementation as a work

RE: 1.1 release

2016-07-18 Thread aris kol
Dropping the binary on my qa cluster as we speak. Thanks for the prompt response > From: u...@apache.org > Date: Mon, 18 Jul 2016 15:47:25 +0200 > Subject: Re: 1.1 release > To: user@flink.apache.org > > We are in the process of fixing the last issues before starting the > 1.1 RC1 release vote.

Re: 1.1 release

2016-07-18 Thread Ufuk Celebi
We are in the process of fixing the last issues before starting the 1.1 RC1 release vote. The latest discussion can be found in these threads: RC0 1.1. preview (with known issues): https://mail-archives.apache.org/mod_mbox/flink-dev/201607.mbox/%3cCAKiyyaH_USuVD6etfehaRm6GLkGie=dczg_g+5+bt5x+8s

1.1 release

2016-07-18 Thread aris kol
Hi, Any clues as to when 1.1 will be released? Thanks, Aris

Unable to get the value of datatype in datastream

2016-07-18 Thread subash basnet
Hello all, I am trying to cluster datastream points around a centroid. My input is stock data where the centroid id I have taken as the timestamp of the stock. The error I am facing is in getting *id* of the *centroid* within *flatMap2*. Below is my code if you could look: ConnectedIterativeStrea

Re: how to get rid of null pointer exception in collection in DataStream

2016-07-18 Thread subash basnet
Hello Robert, Yup thank you, I used an array in place of a collection to get rid of that error. Best Regards, Subash Basnet On Thu, Jul 14, 2016 at 4:03 PM, Robert Metzger wrote:

Re: Elasticsearch connector and number of shards

2016-07-18 Thread Flavio Pompermaier
Indeed, we've tried with the parameter *index.number_of_shards* but it didn't work, so I fear that this parameter is not handled by the current implementation... am I wrong? On Mon, Jul 18, 2016 at 11:37 AM, Ufuk Celebi wrote: > I've never used the Elasticsearch sink, but the docs say: > > "Note ho

Re: Elasticsearch connector and number of shards

2016-07-18 Thread Ufuk Celebi
I've never used the Elasticsearch sink, but the docs say: "Note how a Map of Strings is used to configure the Sink. The configuration keys are documented in the Elasticsearch documentation here. Especially important is the cluster.name parameter that must correspond to the name of your cluster." T
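A sketch of the Map-of-Strings configuration the docs describe, assuming the documented `cluster.name` and `bulk.flush.max.actions` keys. Note that, per this thread, index-level settings such as `index.number_of_shards` appear not to be applied by the sink; shard count is an index-creation setting, so the index may need to be created (or templated) in Elasticsearch beforehand.

```java
import java.util.HashMap;
import java.util.Map;

/** Builds the String-to-String configuration map that the Flink
 *  Elasticsearch sink expects. The cluster name is illustrative. */
public class EsSinkConfig {
    public static Map<String, String> buildConfig(String clusterName) {
        Map<String, String> config = new HashMap<>();
        // Must match the name of the target Elasticsearch cluster.
        config.put("cluster.name", clusterName);
        // Flush after every element (handy when testing the sink).
        config.put("bulk.flush.max.actions", "1");
        return config;
    }

    public static void main(String[] args) {
        System.out.println(buildConfig("my-cluster").get("cluster.name"));
    }
}
```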

Re: Intermediate Data Caching

2016-07-18 Thread Ufuk Celebi
Hey Saliya, the result of each iteration (super step) that is fed back to the iteration is cached. For the iterate operator that is the last partial solution and for the delta iterate operator it's the current solution set (https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/batch/it
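The feedback Ufuk describes can be pictured with a toy loop: each superstep consumes the cached partial solution produced by the previous one and emits the next. This is plain Java, not the Flink runtime, and the numbers are arbitrary.

```java
/** Toy model of a bulk iteration: the step function's result is
 *  "cached" between supersteps by being carried in a local variable. */
public class BulkIterationSketch {
    public static void main(String[] args) {
        double partialSolution = 100.0; // initial data set
        for (int superstep = 0; superstep < 3; superstep++) {
            // step function; its output is fed back as the next input
            partialSolution = partialSolution / 2.0;
        }
        System.out.println(partialSolution);
    }
}
```

In real Flink jobs the delta iterate operator additionally keeps a solution set that is updated incrementally rather than recomputed.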

Re: Different results on local and on cluster

2016-07-18 Thread Ufuk Celebi
Thanks for reporting back! On Mon, Jul 18, 2016 at 10:13 AM, Flavio Pompermaier wrote: > Hi to all, > I forgot to close this thread. In the end the error was (fortunately) in my > code, since I use the "reuse strategy" and in one case I forgot to reset the > field of a POJO I was filling in a map

Elasticsearch connector and number of shards

2016-07-18 Thread Flavio Pompermaier
Hi to all, we tried to use the streaming ES connector of Flink 1.1-SNAPSHOT and we wanted to set the number of shards when creating a new index. Is that possible at the moment? Best, Flavio

Re:

2016-07-18 Thread Ufuk Celebi
PS: Please provide a subject line in the future, it makes it easier for the community to assess whether they can help or not without looking into the message. On Mon, Jul 18, 2016 at 11:14 AM, Ufuk Celebi wrote: > I would discourage from using the GlobalJobParameters. I think their > main purpose

Re:

2016-07-18 Thread Ufuk Celebi
I would discourage from using the GlobalJobParameters. I think their main purpose was to display configuration keys on the web interface. Instead, I would simply do it as a class field which you set in the constructor. public static final class Tokenizer extends RichFlatMapFunction> { privat
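The quoted snippet is cut off; a minimal sketch of the pattern Ufuk suggests follows. Plain Java stands in for Flink's RichFlatMapFunction here, and the `separator` field is an invented example parameter. In a real job the class would extend RichFlatMapFunction and be serializable, so the field travels to the task managers with the function instance.

```java
import java.util.ArrayList;
import java.util.List;

/** Configuration passed through the constructor into a class field,
 *  instead of going through GlobalJobParameters. */
public class Tokenizer {
    private final String separator; // set once in the constructor

    public Tokenizer(String separator) {
        this.separator = separator;
    }

    public List<String> flatMap(String value) {
        List<String> out = new ArrayList<>();
        for (String token : value.split(separator)) {
            if (!token.isEmpty()) {
                out.add(token); // skip empty tokens
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(new Tokenizer(",").flatMap("a,b,,c"));
    }
}
```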

Re: Different results on local and on cluster

2016-07-18 Thread Flavio Pompermaier
Hi to all, I forgot to close this thread. In the end the error was (fortunately) in my code, since I use the "reuse strategy" and in one case I forgot to reset the field of a POJO I was filling in a map function. So, every time I was running the job the error was in a different output object. Than
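The pitfall Flavio describes can be reproduced in miniature: when a single POJO instance is reused across map calls, any field that is not reset keeps its value from the previous record. All names below are illustrative, not taken from the original job.

```java
import java.util.ArrayList;
import java.util.List;

/** Object-reuse bug: a stale field value leaks from one record
 *  into the next because the reused POJO is never fully reset. */
public class ReuseBug {
    static class Record {
        String id;
        String note; // only conditionally set -- the forgotten field
    }

    static final Record reuse = new Record(); // one instance, reused

    static Record map(String id, String note) {
        reuse.id = id;
        if (note != null) {
            reuse.note = note; // BUG: never cleared when note == null
        }
        // FIX would be unconditional: reuse.note = note;
        return reuse;
    }

    public static void main(String[] args) {
        List<String> notes = new ArrayList<>();
        map("r1", "hello");
        notes.add(map("r2", null).note); // stale "hello" leaks into r2
        System.out.println(notes);
    }
}
```

This also explains why the wrong value moved to a different output object on every run: it depends on which record happens to follow the one that set the field.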