Get minimum or maximum value from a Dataset

2016-08-10 Thread Punit Naik
Hi I have a dataset like this: val x : Dataset[Long]… I wanted to get the minimum or the maximum Long value. How do I do it?
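For archive readers: assuming `Dataset[Long]` here means Flink's `DataSet[Long]`, one way is a binary `reduce`, e.g. `x.reduce((a, b) => math.min(a, b))`. The runnable sketch below mirrors that reduce with plain Scala collections, since a full Flink program needs the Flink dependency and an execution environment (sample data is made up):

```scala
// Sketch: min/max via a binary reduce, mirroring what Flink's
// DataSet.reduce does over a distributed dataset. In Flink this
// would be roughly x.reduce((a, b) => math.min(a, b)) followed
// by collect()/print().
object MinMaxSketch {
  def main(args: Array[String]): Unit = {
    val x: Seq[Long] = Seq(7L, 2L, 42L, -5L, 13L)
    val min = x.reduce((a, b) => math.min(a, b))
    val max = x.reduce((a, b) => math.max(a, b))
    println(s"min=$min max=$max") // min=-5 max=42
  }
}
```

The same reduce function works in Flink because `math.min`/`math.max` are associative and commutative, which is what a distributed reduce requires.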

Re: How to choose the 'parallelism.default' value

2016-05-09 Thread Punit Naik
Perfect👍 On Mon, May 9, 2016 at 3:12 PM, Ufuk Celebi wrote: > Yes, I did just that and I used the relevant Flink terminology instead > of #cores and #machines: > > #cores => #slots per TM > #machines => #TMs > > On Mon, May 9, 2016 at 11:33 AM, Punit Naik > wro

Re: How to choose the 'parallelism.default' value

2016-05-09 Thread Punit Naik
Yeah, thanks a lot for that. Also if you could, please write the formula, *#cores^2* * *#machines* * 4, in a different form so that it's more readable and understandable. On 09-May-2016 2:54 PM, "Ufuk Celebi" wrote: > On Mon, May 9, 2016 at 11:05 AM, Punit Naik > wrote: &g
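In the Flink terminology Ufuk suggests later in this thread, the heuristic reads: numberOfBuffers = (#slots per TaskManager)² × #TaskManagers × 4. A tiny sketch of the arithmetic (the function name and the example numbers are illustrative, not from the thread):

```scala
object NetworkBuffersSketch {
  // Heuristic from the thread: slotsPerTM^2 * numTMs * 4
  def recommendedBuffers(slotsPerTM: Int, numTMs: Int): Int =
    slotsPerTM * slotsPerTM * numTMs * 4

  def main(args: Array[String]): Unit = {
    // e.g. 8 slots per TaskManager, 5 TaskManagers:
    println(recommendedBuffers(8, 5)) // 8*8*5*4 = 1280
  }
}
```

The quadratic term reflects that every parallel subtask on a TaskManager may need a channel to every other subtask during shuffles.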

Re: How to choose the 'parallelism.default' value

2016-05-09 Thread Punit Naik
> > – Ufuk > > > On Sat, May 7, 2016 at 10:50 AM, Punit Naik > wrote: > > I am afraid not. > > > > On 07-May-2016 1:24 PM, "Aljoscha Krettek" wrote: > >> > >> Could it be that the TaskManagers are configured with not-enou

Re: How to choose the 'parallelism.default' value

2016-05-07 Thread Punit Naik
ttp://www.slideshare.net/robertmetzger1/apache-flink-hands-on#37 >> >> >> On Thu, May 5, 2016 at 1:30 PM, Punit Naik >> wrote: >> >>> Yes I followed it and changed it to 298 but again it said the same >>> thing. The only change was that it now said "r

Re: How to choose the 'parallelism.default' value

2016-05-05 Thread Punit Naik
od initial value for the parallelism. > The higher the parallelism, the more network buffers are needed. I would > follow the recommendation from the exception and increase the number of > network buffers. > > On Thu, May 5, 2016 at 11:23 AM, Punit Naik > wrote: > >> Hello >

Re: Flink - start-cluster.sh

2016-05-05 Thread Punit Naik
Yes. On Thu, May 5, 2016 at 3:04 PM, Flavio Pompermaier wrote: > Do you run the start-cluster.sh script with the same user having the ssh > passwordless login? > > > On Thu, May 5, 2016 at 11:03 AM, Punit Naik > wrote: > >> Okay, so it was a configuration mistake o

How to choose the 'parallelism.default' value

2016-05-05 Thread Punit Naik
berOfBuffers'. What does this mean? And how to choose a proper value for parallelism? -- Thank You Regards Punit Naik
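The key cut off at the start of this snippet is presumably `taskmanager.network.numberOfBuffers` from the Flink 1.x configuration, which, like the default parallelism this thread is about, is set in `flink-conf.yaml`. An illustrative fragment (the values are made up, not recommendations):

```yaml
# flink-conf.yaml (Flink 1.x) -- illustrative values only
taskmanager.network.numberOfBuffers: 1280
parallelism.default: 8
```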

Re: Flink - start-cluster.sh

2016-05-05 Thread Punit Naik
May 4, 2016 at 1:33 PM, Punit Naik wrote: > Passwordless SSH has been setup across all the machines. And when I > execute the spark-clsuter.sh script, I can see the master logging into the > slaves but it does not start anything. It just logs in and logs out. > > I have referred to

Re: Flink - start-cluster.sh

2016-05-04 Thread Punit Naik
e. >>> Logs are written only on the master node. Slaves don't have any logs. And >>> when I ran a program it said: >>> >>> Resources available to scheduler: Number of instances=0, total number of >>> slots=0, available slots=0 >>> >>> Can anyone help please? >>> >>> -- >>> Thank You >>> >>> Regards >>> >>> Punit Naik >>> >> > -- Thank You Regards Punit Naik

Flink - start-cluster.sh

2016-05-03 Thread Punit Naik
er of instances=0, total number of slots=0, available slots=0 Can anyone help please? -- Thank You Regards Punit Naik

Re: Sink - writeAsText problem

2016-05-03 Thread Punit Naik
Yeah thanks for letting me know. On 03-May-2016 2:40 PM, "Fabian Hueske" wrote: > Yes, but be aware that your program runs with parallelism 1 if you do not > configure the parallelism. > > 2016-05-03 11:07 GMT+02:00 Punit Naik : > >> Hi Stephen, Fabian >>
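Context for this sub-thread: with a sink parallelism of 1, `writeAsText` writes a single file at the given path, while a parallelism greater than 1 produces a directory at that path containing one numbered file per parallel subtask. A plain-Scala sketch of that layout (the helper is hypothetical; only the 1-vs-many behaviour is from the thread):

```scala
// Hypothetical helper mimicking how Flink's writeAsText lays out output:
//   parallelism == 1 -> one file at `path`
//   parallelism  > 1 -> directory `path/` with one file per subtask
object WriteAsTextLayout {
  def outputPaths(path: String, parallelism: Int): Seq[String] =
    if (parallelism == 1) Seq(path)
    else (1 to parallelism).map(i => s"$path/$i")

  def main(args: Array[String]): Unit = {
    println(outputPaths("/tmp/out", 1)) // List(/tmp/out)
    println(outputPaths("/tmp/out", 4)) // Vector(/tmp/out/1, ..., /tmp/out/4)
  }
}
```

So "I want Flink to write its output in a folder" comes down to configuring a parallelism greater than 1, as Fabian points out above.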

Re: Sink - writeAsText problem

2016-05-03 Thread Punit Naik
; See > https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#file-systems > > Greetings, > Stephan > > > On Tue, May 3, 2016 at 9:26 AM, Punit Naik wrote: > >> Hello >> >> I executed my Flink code in eclipse and it properly generated th

Sink - writeAsText problem

2016-05-03 Thread Punit Naik
folder with files in it, it just created a single file (as specified in the string) (output was correct though). Why does this happen? I want Flink to write its output in folder. -- Thank You Regards Punit Naik

Re: Perform a groupBy on an already groupedDataset

2016-05-02 Thread Punit Naik
It solved my problem! On Mon, May 2, 2016 at 3:45 PM, Fabian Hueske wrote: > Grouping a grouped dataset is not supported. > You can group on multiple keys: dataSet.groupBy(1,2). > > Can you describe your use case if that does not solve the problem? > > > > 2016-05-02 10
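Fabian's fix, grouping on multiple keys instead of nesting groupings, can be mirrored with plain Scala collections; in Flink itself it is `dataSet.groupBy(1, 2)` on a tuple dataset. A sketch with made-up data:

```scala
object MultiKeyGroupBy {
  def main(args: Array[String]): Unit = {
    // Tuples of (field0, field1, field2); grouping on the composite
    // key (field1, field2) mirrors Flink's dataSet.groupBy(1, 2).
    val data = Seq((1, "a", "x"), (2, "a", "x"), (3, "b", "y"))
    val grouped = data.groupBy(t => (t._2, t._3))
    grouped.toSeq.sortBy(_._1).foreach { case (k, vs) =>
      println(s"$k -> ${vs.size} element(s)")
    }
  }
}
```

A single grouping on the composite key yields the same partitions a nested groupBy would, which is why Flink does not need to support grouping an already grouped dataset.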

Perform a groupBy on an already groupedDataset

2016-05-02 Thread Punit Naik
Hello I wanted to perform a groupBy on an already grouped dataset. How do I do this? -- Thank You Regards Punit Naik

Re: Problem with writeAsText

2016-05-02 Thread Punit Naik
Please ignore this question as I forgot to do a env.execute On Mon, May 2, 2016 at 11:45 AM, Punit Naik wrote: > I have a Dataset which contains only strings. But when I execute a > writeAsText and supply a folder inside the string, it finishes with the > following output but does not
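The gotcha here is worth spelling out for later readers: in the DataSet API, sinks such as `writeAsText` only declare the job plan; nothing runs until `env.execute()` is called. A minimal plain-Scala sketch of that deferred-execution pattern (the `Env` class and names are hypothetical stand-ins for Flink's `ExecutionEnvironment`):

```scala
object LazyPlanSketch {
  // Toy "environment" that, like Flink's ExecutionEnvironment,
  // only runs registered sinks when execute() is called.
  final class Env {
    private var sinks = List.empty[() => Unit]
    def addSink(action: => Unit): Unit = sinks = sinks :+ (() => action)
    def execute(): Unit = sinks.foreach(_.apply())
  }

  def main(args: Array[String]): Unit = {
    val env = new Env
    var written = false
    env.addSink { written = true }           // like writeAsText: registers work only
    println(s"before execute: written=$written") // before execute: written=false
    env.execute()                            // nothing happens without this call
    println(s"after execute: written=$written")  // after execute: written=true
  }
}
```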

Problem with writeAsText

2016-05-01 Thread Punit Naik
://path/to/output) - UTF-8) -- Thank You Regards Punit Naik

Re: Count on grouped keys

2016-04-29 Thread Punit Naik
Yeah no problem. Its not an optimised solution but I think it gives enough understanding of how reduceGroup works. On 29-Apr-2016 5:17 PM, "Stefano Baghino" wrote: > Thanks for sharing the solution, Punit. > > On Fri, Apr 29, 2016 at 1:40 PM, Punit Naik > wrote: > >&

Re: Count on grouped keys

2016-04-29 Thread Punit Naik
= 0; var k:Map[String,String]=Map() for (t <- in) { v+=1; k=t } out.collect((k,v)) On Fri, Apr 29, 2016 at 3:59 PM, Punit Naik wrote: > I have a dataset which has maps. I have performed a groupBy on a key and I > want to count all the e
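The truncated snippet above is Punit's `reduceGroup` counting body. Reconstructed as a runnable plain-Scala mirror (in Flink the function receives an `Iterator` per group plus a `Collector`; here an ordinary iterator and a return value stand in, and the leading `var v` lost to truncation is an assumption):

```scala
object GroupCountSketch {
  // Mirrors the reduceGroup body from the thread: walk the group once,
  // counting elements and keeping the last key map seen.
  def countGroup(in: Iterator[Map[String, String]]): (Map[String, String], Int) = {
    var v = 0
    var k: Map[String, String] = Map()
    for (t <- in) { v += 1; k = t }
    (k, v) // in Flink: out.collect((k, v))
  }

  def main(args: Array[String]): Unit = {
    val group = Iterator(Map("id" -> "a"), Map("id" -> "a"), Map("id" -> "a"))
    println(countGroup(group)) // (Map(id -> a),3)
  }
}
```

As Punit notes in the reply above, this single-pass count works but is not optimised; it does show how `reduceGroup` sees one whole group at a time.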

Count on grouped keys

2016-04-29 Thread Punit Naik
I have a dataset which has maps. I have performed a groupBy on a key and I want to count all the elements in a particular group. How do I do this? -- Thank You Regards Punit Naik