date:20161230

Re: [ML] [GraphFrames] : Bayesian Network framework

2016-12-30 Thread Felix Cheung

GraphFrames has a Belief Propagation example. Have you checked it out? graphframes.github.io/api/scala/index.html#org.graphframes.examples.BeliefPropagation$

Re: ml word2vec findSynonyms return type

2016-12-30 Thread Felix Cheung

Could you link to the JIRA here? What you suggest makes sense to me, though we might want to maintain compatibility and add a new method instead of changing the return type of the existing one. From: Asher Krim > Sent:

Re: RDD Location

2016-12-30 Thread Fei Hu

I would appreciate more details about why the runJob function cannot be called in getPreferredLocations(). The NewHadoopRDD and HadoopRDD classes get the location information from the inputSplit. But there may be an issue in NewHadoopRDD, because it generates

Re: RDD Location

2016-12-30 Thread Sun Rui

You can’t call runJob inside getPreferredLocations(). You can take a look at the source code of HadoopRDD to help you implement getPreferredLocations() appropriately.

> On Dec 31, 2016, at 09:48, Fei Hu wrote:
>
> That is a good idea.
>
> I tried add the following code to

context.runJob() was suspended in getPreferredLocations() function

2016-12-30 Thread Fei Hu

Dear all, I tried to customize my own RDD. In the getPreferredLocations() function, I used the following code to query another RDD, which was used as an input to initialize this customized RDD: val results: Array[Array[DataChunkPartition]] = context.runJob(partitionsRDD,

Re: RDD Location

2016-12-30 Thread Fei Hu

That is a good idea. I tried adding the following code to the getPreferredLocations() function: val results: Array[Array[DataChunkPartition]] = context.runJob(partitionsRDD, (context: TaskContext, partIter: Iterator[DataChunkPartition]) => partIter.toArray, dd, allowLocal = true) But it

[ML] [GraphFrames] : Bayesian Network framework

2016-12-30 Thread Brian Cajes

Hi, I'm interested in using (or contributing to an implementation of) a Bayesian Network framework within Spark, similar to https://github.com/jmschrei/pomegranate/blob/master/examples/bayesnet_monty_hall_train.ipynb . I've found a related library for Spark:

Re: repeated unioning of dataframes take worse than O(N^2) time

2016-12-30 Thread Liang-Chi Hsieh

Actually, since you use Dataset's union API, which, unlike RDD's union API, breaks up the nested structure, that should not be the issue. The additional time incurred as the number of DataFrames grows is spent in the analysis stage. I think that, as the Union has a long children list, the

8 matches
