Well said, Will. I would add that you might want to investigate GraphChi, which claims to run a number of large-scale graph processing tasks on a workstation much faster than a very large Hadoop cluster. It would be interesting to know how widely applicable GraphChi's approach is and what implications it has for parallel and distributed computing. A rich seam to mine indeed.
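GraphChi's claim, and the COST post Will links to below, both turn on whether a careful single-machine program can match a cluster. As a minimal sketch (not GraphChi's or the COST post's code; the function name and example graph are hypothetical), here is a single-threaded union-find pass that labels the connected components of an edge list. It is the kind of simple baseline the COST post argues distributed graph systems should be measured against:

```python
from typing import Iterable, List, Tuple


def connected_components(num_nodes: int, edges: Iterable[Tuple[int, int]]) -> List[int]:
    """Label every node with the root of its component (single-threaded union-find).

    Hypothetical helper for illustration; nodes are assumed to be 0..num_nodes-1.
    """
    parent = list(range(num_nodes))
    size = [1] * num_nodes

    def find(x: int) -> int:
        # Path halving: point each visited node at its grandparent while walking up.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # already in the same component
        # Union by size: hang the smaller tree under the larger one.
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        parent[rv] = ru
        size[ru] += size[rv]

    return [find(x) for x in range(num_nodes)]


if __name__ == "__main__":
    labels = connected_components(6, [(0, 1), (1, 2), (3, 4)])
    print(len(set(labels)))  # 3 components: {0, 1, 2}, {3, 4}, {5}
```

Union-find makes a single pass over the edges with near-constant amortized work per edge, whereas synchronous label propagation needs roughly as many rounds as the graph's diameter; that asymmetry is one reason a single core can be competitive on this particular problem.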
Robin

> On 15 Jul 2015, at 14:48, William Temperley <willtemper...@gmail.com> wrote:
>
> There seems to be a bit of confusion here - the OP (doing the PhD) had the
> thread hijacked by someone with a similar name asking a mundane question.
>
> It would be a shame to send someone away so rudely, who may do valuable work
> on Spark.
>
> Shashidhar (not Shahid!), I'm personally interested in running graph algorithms
> for image segmentation using MLlib and Spark. I've got many questions, though,
> like whether it is even going to give me a speed-up
> (http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html).
>
> It's not obvious to me which classes of graph algorithms can be implemented
> correctly and efficiently in a highly parallel manner. There's tons of work
> to be done here, I'm sure. Also, look at parallel geospatial algorithms -
> there's a lot of work being done on this.
>
> Best, Will
>
>
> On 15 July 2015 at 09:01, Vineel Yalamarthy <vineelyalamar...@gmail.com> wrote:
>
> Hi Daniel
>
> Well said
>
> Regards
> Vineel
>
> On Tue, Jul 14, 2015, 6:11 AM Daniel Darabos <daniel.dara...@lynxanalytics.com> wrote:
>
> Hi Shahid,
> To be honest, I think this question is better suited for Stack Overflow than
> for a PhD thesis.
>
> On Tue, Jul 14, 2015 at 7:42 AM, shahid ashraf <sha...@trialx.com> wrote:
>
> Hi,
>
> I have a 10-node cluster. I loaded the data onto HDFS, and the number of
> partitions I get is 9. I am running a Spark application, and it gets stuck on
> one of the tasks; looking at the UI, it seems the application is not using all
> nodes for the calculations. Attached is the screenshot of the tasks: it seems
> tasks are put on each node more than once. Looking at the tasks, 8 complete in
> 7-8 minutes and one takes around 30 minutes, which delays the results.
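On the stuck task described above: with 9 partitions, each stage runs 9 tasks, so a stage can occupy at most 9 of the 10 nodes, and it finishes only when its slowest task does. A first check is whether the 30-minute task simply has far more records than the others. Assuming PySpark and an RDD named `rdd` (hypothetical), `rdd.glom().map(len).collect()` returns one record count per partition. The helper below is a hypothetical pure-Python sketch, with an arbitrary 2x-median threshold, that flags outlier partitions from those counts:

```python
from statistics import median
from typing import List, Sequence


def find_skewed_partitions(partition_sizes: Sequence[int], factor: float = 2.0) -> List[int]:
    """Return indices of partitions holding more than `factor` times the median record count.

    `partition_sizes` is one record count per partition, e.g. collected in PySpark
    with rdd.glom().map(len).collect(). The default threshold is an arbitrary choice.
    """
    if not partition_sizes:
        return []
    typical = median(partition_sizes)
    return [i for i, n in enumerate(partition_sizes) if n > factor * typical]


if __name__ == "__main__":
    # Hypothetical counts for 9 partitions: eight similar, one about 4x larger.
    sizes = [100_000] * 8 + [400_000]
    print(find_skewed_partitions(sizes))  # [8]
```

If one partition is much larger, `rdd.repartition(n)` spreads records more evenly, although a later key-based shuffle such as `groupByKey` will still route every record of a single hot key to one task. If the counts are similar, the slow task more likely stems from something else, such as a slow node or uneven per-record cost.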
>
> On Tue, Jul 14, 2015 at 10:48 AM, Shashidhar Rao <raoshashidhar...@gmail.com> wrote:
>
> Hi,
>
> I am doing my PhD thesis on large-scale machine learning, e.g. online
> learning, batch and mini-batch learning.
>
> Could somebody help me with ideas, especially in the context of Spark and
> the above learning methods?
>
> For example: improvements to existing algorithms, new features (especially
> for the above learning methods), and algorithms that have not been
> implemented yet.
>
> If somebody could help me with some ideas, it would really accelerate my work.
>
> Also, a few ideas for research papers regarding Spark or Mahout would help.
>
> Thanks in advance.
>
> Regards
>
>
> --
> with Regards
> Shahid Ashraf
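Shashidhar's three regimes (batch, mini-batch and online learning) differ mainly in how many examples feed each parameter update. As a minimal sketch, assuming plain least-squares linear regression on one feature (the function, learning rate and toy data are hypothetical, not from the thread), a single `batch_size` parameter covers all three: the full dataset gives batch gradient descent, 1 gives online (stochastic) updates, and anything in between is mini-batch:

```python
import random
from typing import Sequence, Tuple


def sgd_linear_regression(
    data: Sequence[Tuple[float, float]],
    batch_size: int,
    epochs: int = 200,
    learning_rate: float = 0.1,
    seed: int = 0,
) -> Tuple[float, float]:
    """Fit y ~ w*x + b by gradient descent on the mean squared error.

    batch_size == len(data) -> batch gradient descent
    batch_size == 1         -> online / stochastic updates
    in between              -> mini-batch
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a fresh order each epoch
        for start in range(0, len(indices), batch_size):
            batch = [data[i] for i in indices[start:start + batch_size]]
            # Gradient of the mean squared error over this batch only.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    # Toy, noise-free data on the line y = 3x + 1 with x in [-1, 1].
    data = [(i / 10, 3 * (i / 10) + 1) for i in range(-10, 11)]
    for name, bs in [("batch", len(data)), ("mini-batch", 5), ("online", 1)]:
        w, b = sgd_linear_regression(data, batch_size=bs)
        print(f"{name:>10}: w={w:.3f}, b={b:.3f}")  # each should be close to w=3, b=1
```

Spark's RDD-based MLlib SGD trainers expose the same knob as a fraction of the data rather than a count, via the `miniBatchFraction` argument of, for example, `LinearRegressionWithSGD.train`.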