Re: Research ideas using spark

Robin East Wed, 15 Jul 2015 09:14:24 -0700

Well said, Will. I would add that you might want to investigate GraphChi, which 
claims to be able to run a number of large-scale graph processing tasks on a 
workstation much quicker than a very large Hadoop cluster. It would be 
interesting to know how widely applicable GraphChi's approach is and what 
implications it has for parallel/distributed computing approaches. A rich seam 
to mine indeed.


Robin
> On 15 Jul 2015, at 14:48, William Temperley <willtemper...@gmail.com> wrote:
> 
> There seems to be a bit of confusion here - the OP (doing the PhD) had the 
> thread hijacked by someone with a similar name asking a mundane question.
> 
> It would be a shame to send someone away so rudely, who may do valuable work 
> on Spark.
> 
> Shashidhar (not Shahid!), I'm personally interested in running graph algorithms 
> for image segmentation using MLlib and Spark.  I've got many questions though 
> - like is it even going to give me a speed-up?  
> (http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html)
> 
> It's not obvious to me which classes of graph algorithms can be implemented 
> correctly and efficiently in a highly parallel manner.  There's tons of work 
> to be done here, I'm sure. Also, look at parallel geospatial algorithms - 
> there's a lot of work being done on this.
> 
> Best, Will
> 
> 
> 
> On 15 July 2015 at 09:01, Vineel Yalamarthy <vineelyalamar...@gmail.com> wrote:
> Hi Daniel
> 
> Well said
> 
> Regards 
> Vineel
> 
> 
> On Tue, Jul 14, 2015, 6:11 AM Daniel Darabos 
> <daniel.dara...@lynxanalytics.com> wrote:
> Hi Shahid,
> To be honest I think this question is better suited for Stack Overflow than 
> for a PhD thesis.
> 
> On Tue, Jul 14, 2015 at 7:42 AM, shahid ashraf <sha...@trialx.com> wrote:
> hi 
> 
> I have a 10-node cluster. I loaded the data onto HDFS, and the number of 
> partitions I get is 9. I am running a Spark application and it gets stuck on 
> one of the tasks. Looking at the UI, it seems the application is not using all 
> nodes to do the calculations. Attached is a screenshot of the tasks; it seems 
> tasks are put on each node more than once. Looking at the tasks, 8 complete in 
> 7-8 minutes and one takes around 30 minutes, which delays the 
> results. 
> 
> 
> On Tue, Jul 14, 2015 at 10:48 AM, Shashidhar Rao <raoshashidhar...@gmail.com> wrote:
> Hi,
> 
> I am doing my PhD thesis on large-scale machine learning, e.g. online 
> learning, batch and mini-batch learning.
> 
> Could somebody help me with ideas, especially in the context of Spark and 
> the above learning methods? 
> 
> Some ideas like improvements to existing algorithms, implementing new features 
> (especially for the above learning methods), and algorithms that have not been 
> implemented, etc.
> 
> If somebody could help me with some ideas it would really accelerate my work.
> 
> Plus a few ideas on research papers regarding Spark or Mahout.
> 
> Thanks in advance.
> 
> Regards 
> 
> 
> 
> -- 
> with Regards
> Shahid Ashraf
> 
> 
