Re: set yarn jvm options

2016-08-19 Thread Robert Metzger
Hi, yes, using "env.java.opts": https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html On Thu, Aug 4, 2016 at 4:03 AM, Prabhu V wrote: > Hi, > > Is there a way to set JVM options on the YARN application-manager and > task-manager with Flink? > > Thanks, > Prabhu >
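
A minimal flink-conf.yaml sketch of this, assuming only the "env.java.opts" key from the linked configuration page; the JVM flags shown are placeholders:

    # Extra JVM options passed to the Flink processes started on YARN.
    env.java.opts: -XX:+UseG1GC -Dexample.property=value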

Re: 1.1.1: JobManager config endpoint no longer supplies port

2016-08-19 Thread Robert Metzger
Hi Shannon, the problem is that YARN's proxy only allows GET HTTP requests, but for uploading files, a different request type is needed. I've filed a JIRA for the problem you've reported: https://issues.apache.org/jira/browse/FLINK-4432 Regards, Robert On Mon, Aug 15, 2016 at 6:03 PM, Shannon Ca

Re: Flink Cluster setup

2016-08-19 Thread Robert Metzger
Hi, Flink allocates the blob server at an ephemeral port, so it'll change every time you start Flink. However, the "blob.server.port" configuration [1] allows you to use a predefined port, or even a port range. If your Kafka topic has only one partition, only one of the 8 tasks will read the data
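
A flink-conf.yaml sketch of the fixed-port setup, assuming the "blob.server.port" key referenced as [1]; the port range is only an example:

    # Bind the BLOB server to a fixed port range instead of an ephemeral port,
    # e.g. so firewalls between the client and the cluster can be configured.
    blob.server.port: 50100-50200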

Re: Compress DataSink Output

2016-08-19 Thread Wesley Kerr
That looks good. Thanks! On Fri, Aug 19, 2016 at 6:15 AM Robert Metzger wrote: > Hi Wes, > > Flink's own OutputFormats don't support compression, but we have some > tools to use Hadoop's OutputFormats with Flink [1], and those support > compression: > https://hadoop.apache.org/docs/stable/api/o

Flink Cluster setup

2016-08-19 Thread Alam, Zeeshan
Hi All, I have set up a Flink standalone cluster with one master and two slaves, all RedHat-7 machines. In the master dashboard http://flink-master:8081/ I can see 2 Task Managers and 8 task slots, as I have set taskmanager.numberOfTaskSlots: 4 in flink-conf.yaml on each of the slaves. Now when
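
For reference, the slot count shown in the dashboard is the number of TaskManagers times the slots configured per TaskManager; a sketch of the flink-conf.yaml entry on each slave, using the value from the setup described above:

    # With 2 TaskManagers, this yields 2 x 4 = 8 task slots in total.
    taskmanager.numberOfTaskSlots: 4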

Re: Checking for existence of output directory/files before running a batch job

2016-08-19 Thread Robert Metzger
Oops. Looks like Google Mail / Apache / the internet needs 13 minutes to deliver an email. Sorry for double answering. On Fri, Aug 19, 2016 at 3:07 PM, Maximilian Michels wrote: > Hi Niels, > > Have you tried specifying the fully-qualified path? The default is the > local file system. > > For e

Re: Checking for existence of output directory/files before running a batch job

2016-08-19 Thread Robert Metzger
Hi Niels, I assume the directoryName you are passing doesn't have the file system prefix (hdfs:// or s3://, ...) specified. In that case, Path.getFileSystem() looks up the default file system from the configuration. Probably the environment where you are submitting the job from doesn
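
A small sketch of the suggestion, assuming Flink's org.apache.flink.core.fs API; the HDFS path is only an example. With the fully-qualified prefix, Path.getFileSystem() resolves HDFS directly instead of falling back to the configured default file system:

    import org.apache.flink.core.fs.FileSystem;
    import org.apache.flink.core.fs.Path;

    public class OutputDirCheck {
        public static void main(String[] args) throws Exception {
            // Fully-qualified URI, so the scheme decides the file system.
            Path outputDir = new Path("hdfs:///path/to/output");
            FileSystem fs = outputDir.getFileSystem();
            if (fs.exists(outputDir)) {
                throw new IllegalStateException(
                        "Output directory already exists: " + outputDir);
            }
        }
    }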

Re: Batch jobs with a very large number of input splits

2016-08-19 Thread Robert Metzger
Hi Niels, In Flink, you don't need one task per file, since splits are assigned lazily to reading tasks. What exactly is the error you are getting when trying to read that many input splits? (Is it on the JobManager?) Regards, Robert On Thu, Aug 18, 2016 at 1:56 PM, Niels Basjes wrote: > Hi, >
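
A minimal batch-job sketch illustrating the point: a single source covers the whole directory and the splits are handed out lazily, so the parallelism does not have to match the number of files. The path is only an example:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class ManyFilesJob {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            // Reading a directory creates roughly one input split per file,
            // and splits are assigned lazily to the parallel reader tasks.
            DataSet<String> lines = env.readTextFile("hdfs:///data/many-small-files");
            System.out.println("Lines read: " + lines.count());
        }
    }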

Re: Compress DataSink Output

2016-08-19 Thread Robert Metzger
Hi Wes, Flink's own OutputFormats don't support compression, but we have some tools to use Hadoop's OutputFormats with Flink [1], and those support compression: https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html Let me know if you need more info
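
A sketch of the approach from [1], assuming Flink's Hadoop mapreduce OutputFormat wrapper; the output path is a placeholder and the module providing HadoopOutputFormat may differ between Flink versions:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class CompressedSinkJob {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Wrap the records as (key, value) pairs for the Hadoop format.
            DataSet<Tuple2<NullWritable, Text>> data = env
                    .fromElements("first line", "second line")
                    .map(new MapFunction<String, Tuple2<NullWritable, Text>>() {
                        @Override
                        public Tuple2<NullWritable, Text> map(String value) {
                            return new Tuple2<>(NullWritable.get(), new Text(value));
                        }
                    });

            // The compression settings come from Hadoop's FileOutputFormat.
            Job job = Job.getInstance();
            TextOutputFormat.setOutputPath(job, new Path("hdfs:///path/to/output"));
            TextOutputFormat.setCompressOutput(job, true);
            TextOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

            data.output(new HadoopOutputFormat<NullWritable, Text>(
                    new TextOutputFormat<NullWritable, Text>(), job));
            env.execute("compressed sink");
        }
    }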

Re: problem running flink using remote environment

2016-08-19 Thread Robert Metzger
Hi Baswaraj, when you are using the ./bin/flink run client for submitting jobs, the StreamExecutionEnvironment.getExecutionEnvironment() call is the correct one to retrieve the execution environment. So you cannot use the RemoteEnvironment with the ./bin/flink tool. The purpose of the remote e
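
A small sketch of the difference; the host, port and jar path are hypothetical:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class EnvironmentChoice {
        public static void main(String[] args) throws Exception {
            // Submitted via ./bin/flink run, this picks up the environment the
            // CLI client has prepared; run from an IDE, it falls back to a
            // local environment.
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // A remote environment is only for submitting straight from a plain
            // Java program to a running cluster, bypassing ./bin/flink:
            // StreamExecutionEnvironment remote = StreamExecutionEnvironment
            //         .createRemoteEnvironment("jobmanager-host", 6123,
            //                 "/path/to/program.jar");

            env.fromElements(1, 2, 3).print();
            env.execute("environment choice");
        }
    }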

Re: Checking for existence of output directory/files before running a batch job

2016-08-19 Thread Maximilian Michels
Hi Niels, Have you tried specifying the fully-qualified path? The default is the local file system. For example, hdfs:///path/to/foo If that doesn't work, do you have the same Hadoop configuration on the machine where you test? Cheers, Max On Thu, Aug 18, 2016 at 2:02 PM, Niels Basjes wrote:

Re: Flink redshift table lookup and updates

2016-08-19 Thread Robert Metzger
Hi Harshith, Welcome to the Flink community ;) I would recommend using approach 2. Keeping the state in Flink and just sending updates to the dashboard store should give you better performance and consistency. I don't know whether it's better to download the full state snapshot from Redshift in th
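
A rough sketch of approach 2 with the DataStream API: the running aggregate stays inside Flink and only updated values are forwarded, so a sink can push them to the dashboard store instead of re-reading a full snapshot from Redshift. The source and the key extraction below are placeholders:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class RunningAggregates {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Placeholder for the real incoming event stream.
            DataStream<String> events = env.fromElements("a", "b", "a");

            // The per-key running count is kept as Flink state; downstream
            // operators only see the updated value.
            DataStream<Tuple2<String, Long>> updates = events
                    .map(new MapFunction<String, Tuple2<String, Long>>() {
                        @Override
                        public Tuple2<String, Long> map(String key) {
                            return new Tuple2<>(key, 1L);
                        }
                    })
                    .keyBy(new KeySelector<Tuple2<String, Long>, String>() {
                        @Override
                        public String getKey(Tuple2<String, Long> t) {
                            return t.f0;
                        }
                    })
                    .sum(1);

            // Replace print() with a SinkFunction writing to the dashboard store.
            updates.print();
            env.execute("running aggregates");
        }
    }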

Re: flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

2016-08-19 Thread Maximilian Michels
Hi Mira, If I understood correctly, the log output should be for Flink 1.1.1. However, there are classes present in the log which don't exist in Flink 1.1.1, e.g. FlinkYarnClient. Could you please check if you posted the correct log? Also, it would be good to have not only the client log but also