indices. The indices are in [0, numLabels), ordered by label frequencies."
>
> Xinh
>
> On Tue, Jun 28, 2016 at 12:29 AM, Jaonary Rabarisoa
> wrote:
>
>> Dear all,
>>
>> I'm trying to find a way to transform a DataFrame into a form that is
Dear all,
I'm trying to find a way to transform a DataFrame into a form that is
more suitable for a third-party classification algorithm. The DataFrame has
two columns: "feature", represented by a vector, and "label", represented by
a string. I want the "label" to be a number in [0, number of
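A minimal sketch of what the reply above describes, assuming Spark's ml StringIndexer (available from 1.4) and a DataFrame df with "feature" and "label" columns:

import org.apache.spark.ml.feature.StringIndexer

// Map the string "label" column to a numeric index in [0, numLabels),
// ordered by label frequency (the most frequent label gets index 0.0).
val indexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("labelIndex")
val indexed = indexer.fit(df).transform(df)

The "labelIndex" column can then be handed to the third-party algorithm.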
Hi all,
Is it possible to learn a Gaussian mixture model with a diagonal covariance
matrix using the GMM algorithm implemented in MLlib? It seems to be possible,
but I can't figure out how to do that.
Cheers,
Jao
Hi there,
The Pipeline in the ml package is really a great feature and we use it in our
everyday tasks. But we have some use cases where we need a Pipeline of
Transformers only, and the problem is that there's no training phase in that
case. For example, we have a pipeline of image analytics with the foll
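A minimal sketch of a Transformer-only pipeline, assuming decoder and featurizer are existing Transformers (the names are illustrative). With no Estimator stages, fit() has nothing to train and simply wraps the stages into a PipelineModel:

import org.apache.spark.ml.Pipeline

// All stages are Transformers, so fit() does no training; it only
// assembles them into a PipelineModel that can be applied repeatedly.
val pipeline = new Pipeline().setStages(Array(decoder, featurizer))
val model = pipeline.fit(imagesDF)      // effectively a no-op "train" phase
val transformed = model.transform(imagesDF)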
Hi there,
The current API of ml.Transformer only takes a DataFrame as input. I have a use
case where I need to transform a single element, for example an element coming
from Spark Streaming. Is there any reason for this, or will ml.Transformer
support transforming a single element later?
Che
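One workaround today (a sketch, not an official API; Record, element, myTransformer and sqlContext are assumed from the surrounding context) is to wrap the single element in a one-row DataFrame:

// Hypothetical record type wrapping the element to transform.
case class Record(feature: org.apache.spark.mllib.linalg.Vector)

// Build a one-row DataFrame, run the transformer, read the single result back.
val oneRow = sqlContext.createDataFrame(Seq(Record(element)))
val result = myTransformer.transform(oneRow).first()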
--
>>
>> If you don't want to compute all N^2 similarities, you need to implement
>> some kind of blocking first. For example, LSH (locality-sensitive hashing).
>> A quick search gave this link to a Spark implementation:
>>
>>
>> http:
Dear all,
I'm trying to find an efficient way to build a k-NN graph for a large
dataset. Precisely, I have a large set of high-dimensional vectors (say
d >> 1) and I want to build a graph where those high-dimensional points
are the vertices and each one is linked to its k nearest neighbors base
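A sketch of the LSH route suggested above, using BucketedRandomProjectionLSH, which only exists in much newer Spark releases (2.1+); points is assumed to be a DataFrame with a Vector column "features", and the parameter values are illustrative:

import org.apache.spark.ml.feature.BucketedRandomProjectionLSH

val lsh = new BucketedRandomProjectionLSH()
  .setInputCol("features")
  .setOutputCol("hashes")
  .setBucketLength(2.0)
  .setNumHashTables(3)
val model = lsh.fit(points)

// Candidate k-NN edges: approximate self-join under a Euclidean distance
// threshold, or query the k nearest neighbours of a single vector.
val edges = model.approxSimilarityJoin(points, points, 1.5, "distance")
val knn   = model.approxNearestNeighbors(points, queryVector, 10)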
take a look at this
https://github.com/derrickburns/generalized-kmeans-clustering
Best,
Jao
On Mon, May 11, 2015 at 3:55 PM, Driesprong, Fokko
wrote:
> Hi Paul,
>
> I would say that it should be possible, but you'll need a different
> distance measure which conforms to your coordinate system.
in the assembly jar) at runtime. Make sure the
> full class name (with package name) is used. Btw, UDTs are not public
> yet, so please use it with caution. -Xiangrui
>
> On Fri, Apr 17, 2015 at 12:45 AM, Jaonary Rabarisoa
> wrote:
> > Dear all,
> >
> > Here is an e
In this example, everything works except saving to the parquet file.
On Mon, May 11, 2015 at 4:39 PM, Jaonary Rabarisoa
wrote:
> MyDenseVectorUDT does exist in the assembly jar, and in this example all the
> code is in a single file to make sure everything is included.
>
> On Tue, Apr 21,
Dear all,
Here is an example of code to reproduce the issue I mentioned in a previous
mail about saving a UserDefinedType into a parquet file. The problem here
is that the code works when I run it inside IntelliJ IDEA but fails when I
create the assembly jar and run it with spark-submit. I use th
" % "javacpp" % "0.11-SNAPSHOT",
"org.scalatest" % "scalatest_2.10" % "2.2.0" % "test")
On Thu, Apr 16, 2015 at 11:16 PM, Richard Marscher wrote:
> If it fails with sbt-assembly but not without it, then there's always the
> l
Any ideas ?
On Thu, Apr 16, 2015 at 5:04 PM, Jaonary Rabarisoa
wrote:
> Dear all,
>
> Here is an issue that gets me mad. I wrote a UserDefinedType in order to be
> able to store a custom type in a parquet file. In my code I just create a
> DataFrame with my custom data type and
Dear all,
Here is an issue that gets me mad. I wrote a UserDefinedType in order to be
able to store a custom type in a parquet file. In my code I just create a
DataFrame with my custom data type and write it into a parquet file. When I
run my code directly inside IDEA everything works like a charm
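For reference, a minimal sketch of what such a UDT can look like in Spark 1.x; the UDT API is not public and the details vary between releases, and the class names below simply mirror the ones mentioned in this thread:

import org.apache.spark.sql.types._

@SQLUserDefinedType(udt = classOf[MyDenseVectorUDT])
class MyDenseVector(val data: Array[Double]) extends Serializable

class MyDenseVectorUDT extends UserDefinedType[MyDenseVector] {
  // Stored in parquet as an array of doubles.
  override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)
  override def serialize(obj: Any): Any = obj match {
    case v: MyDenseVector => v.data.toSeq
  }
  override def deserialize(datum: Any): MyDenseVector = datum match {
    case values: Seq[_] => new MyDenseVector(values.asInstanceOf[Seq[Double]].toArray)
  }
  override def userClass: Class[MyDenseVector] = classOf[MyDenseVector]
}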
Hi all,
If you follow the example of schema merging in the spark documentation
http://spark.apache.org/docs/latest/sql-programming-guide.html#schema-merging
you obtain the following result when you load the resulting data:
single  triple  double
1       3       null
2       6       null
4       1
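For what it's worth, a sketch of reading the partitioned output with schema merging requested explicitly (this uses the read/option API from Spark 1.4+, and in later releases mergeSchema is off by default; the path is illustrative):

val merged = sqlContext.read.option("mergeSchema", "true").parquet("data/test_table")
merged.printSchema()
merged.show()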
I forgot to mention that the imageId field is a custom scala object. Do I
need to implement some special methods to make it work (equals, hashCode)?
On Tue, Apr 14, 2015 at 5:00 PM, Jaonary Rabarisoa
wrote:
> Dear all,
>
> In the latest version of spark there's a feature call
Dear all,
In the latest version of spark there's a feature called automatic
partition discovery and schema migration for parquet. As far as I know,
this gives the ability to split the DataFrame into several parquet files,
and by just loading the parent directory one can get the global schema of
chema()
> >
> > dataDF2.saveAsParquetFile("test3.parquet") // FAIL !!!
> > }
> > }
> >
> >
> > On Tue, Mar 31, 2015 at 11:18 PM, Xiangrui Meng
> wrote:
> >>
> >> I cannot reproduce this error on master, but I'm not awar
Hi all,
Is it possible to zip an existing DataFrame with an RDD[T] such that the
result is a new DataFrame with one more column than the first one, where the
additional column corresponds to the RDD[T]?
In other words, is it possible to zip 2 DataFrames?
Cheers,
Jaonary
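There is no direct DataFrame zip, but a sketch of doing it through the underlying RDDs, assuming extraRdd was derived from df so both sides have the same partitioning and element counts (the new column name and type are illustrative):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

val zippedRows = df.rdd.zip(extraRdd).map { case (row, x) => Row.fromSeq(row.toSeq :+ x) }
val newSchema  = StructType(df.schema.fields :+ StructField("extra", DoubleType, nullable = false))
val zippedDF   = sqlContext.createDataFrame(zippedRows, newSchema)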
quetFile("test3.parquet") //
FAIL !!! }}*
On Tue, Mar 31, 2015 at 11:18 PM, Xiangrui Meng wrote:
> I cannot reproduce this error on master, but I'm not aware of any
> recent bug fixes that are related. Could you build and try the current
> master? -Xiangrui
>
> On Tue,
Hi all,
A DataFrame with a user defined type (here mllib.Vector) created with
sqlContext.createDataFrame can't be saved to a parquet file and raises a
ClassCastException:
org.apache.spark.mllib.linalg.DenseVector cannot be cast to
org.apache.spark.sql.Row error.
Here is an example of code to reproduce t
Shivaram
>
> On Tue, Mar 31, 2015 at 12:50 AM, Jaonary Rabarisoa
> wrote:
>
>> Following your suggestion, I end up with the following implementation :
>>
>>
>>
>>
>>
>>
>>
>> *override def transform(dataSet: DataFrame, paramMap:
k with the
> JNI calls and then convert back to RDD.
>
> Thanks
> Shivaram
>
> On Mon, Mar 30, 2015 at 8:37 AM, Jaonary Rabarisoa
> wrote:
>
>> Dear all,
>>
>> I'm still struggling to make a pre-trained caffe model transformer for
>> dataframes work
a
> great way to do something equivalent to mapPartitions with UDFs right now.
>
> On Tue, Mar 3, 2015 at 4:36 AM, Jaonary Rabarisoa
> wrote:
>
>> Here is my current implementation with current master version of spark
>>
>>
>>
>>
>> *class De
hu, Mar 12, 2015 at 11:36 PM, Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:
>
> On Thu, Mar 12, 2015 at 3:05 PM, Jaonary Rabarisoa
> wrote:
>
>> In fact, by activating netlib with native libraries it goes faster.
>>
>> Glad you got it work ! Bet
s://github.com/fommil/netlib-java#machine-optimised-system-libraries
>
> Thanks
> Shivaram
>
> On Tue, Mar 10, 2015 at 9:57 AM, Jaonary Rabarisoa
> wrote:
>
>> I'm trying to play with the implementation of least square solver (Ax =
>> b) in mlmatrix.TSQR where A
ter/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/TSQR.scala
> [2]
> https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/NormalEquations.scala
>
> On Tue, Mar 3, 2015 at 9:01 AM, Jaonary Rabarisoa
> wrote:
>
>> Dear all,
>>
>
/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/NormalEquations.scala
>
> On Tue, Mar 3, 2015 at 9:01 AM, Jaonary Rabarisoa
> wrote:
>
>> Dear all,
>>
>> Is there a least square solver based on DistributedMatrix that we can use
>> out
l-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/NormalEquations.scala
>
> On Tue, Mar 3, 2015 at 9:01 AM, Jaonary Rabarisoa
> wrote:
>
>> Dear all,
>>
>> Is there a least square solver based on DistributedMatrix that we can use
>> out of
Hi Cesar,
Yes, you can define a UDT with the new DataFrame API, the same way that
SchemaRDD did.
Jaonary
On Fri, Mar 6, 2015 at 4:22 PM, Cesar Flores wrote:
>
> The SchemaRDD supports the storage of user defined classes. However, in
> order to do that, the user class needs to extend the UserDefi
Dear all,
Is there a least squares solver based on DistributedMatrix that we can use
out of the box in the current (or the master) version of spark?
It seems that the only least squares solver available in spark is private to
the recommender package.
Cheers,
Jao
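A sketch of a do-it-yourself normal-equations solve for a tall-and-skinny A (many rows, few columns), in the spirit of the ml-matrix solvers linked in the replies above; rows is assumed to be an RDD[(Array[Double], Double)] of (features, target) pairs:

import breeze.linalg.{DenseVector => BDV}

// Accumulate A^T A and A^T b across the cluster, then solve the small
// d x d system locally with Breeze.
val (ata, atb) = rows.map { case (a, b) =>
  val av = BDV(a)
  (av * av.t, av * b)
}.reduce { case ((m1, v1), (m2, v2)) => (m1 + m2, v1 + v2) }

val x = ata \ atb   // least-squares solution of A x = b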
r than stating it here)
> because it changes between Spark 1.2 and 1.3. In 1.3, the DSL is much
> improved and makes it easier to create a new column.
>
> Joseph
>
> On Sun, Mar 1, 2015 at 1:26 AM, Jaonary Rabarisoa
> wrote:
>
>> class DeepCNNFeature extends Tran
class DeepCNNFeature extends Transformer ... {
  override def transform(data: DataFrame, paramMap: ParamMap): DataFrame = {
    // How can I do a map partition on the underlying RDD and
    // then add the column ?
  }
}
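One way to fill in that body (a sketch, not the official API; loadCaffeModel, predict and the column names are hypothetical) is to drop down to the underlying RDD, instantiate the heavy model once per partition, and rebuild a DataFrame with the extra column:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{ArrayType, DoubleType, StructField, StructType}

override def transform(data: DataFrame, paramMap: ParamMap): DataFrame = {
  val withFeatures = data.rdd.mapPartitions { rows =>
    val model = loadCaffeModel()                     // one instantiation per partition
    rows.map { row =>
      val features: Seq[Double] = model.predict(row.getAs[Array[Byte]]("image"))
      Row.fromSeq(row.toSeq :+ features)
    }
  }
  val schema = StructType(data.schema.fields :+
    StructField("features", ArrayType(DoubleType, containsNull = false), nullable = false))
  data.sqlContext.createDataFrame(withFeatures, schema)
}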
On Sun, Mar 1, 2015 at 10:23 AM, Jaonary Rabarisoa
wrote
econd question, I would modify the above call as follows:
>
> myRDD.mapPartitions { myDataOnPartition =>
> val myModel = // instantiate neural network on this partition
> myDataOnPartition.map { myDatum => myModel.predict(myDatum) }
> }
>
> I hope this helps!
>
Dear all,
We mainly do large scale computer vision tasks (image classification,
retrieval, ...). The Pipeline is really great stuff for that. We're trying
to reproduce the tutorial given on that topic during the latest spark
summit (
http://ampcamp.berkeley.edu/5/exercises/image-classification-wit
or alias should be
> `df.select($"image.data".as("features"))`.
>
> On Tue, Feb 24, 2015 at 3:35 PM, Xiangrui Meng wrote:
> > If you make `Image` a case class, then select("image.data") should work.
> >
> > On Tue, Feb 24, 2015 at 3:06 PM,
Hi all,
I have a DataFrame that contains a user defined type. The type is an image
with the following attributes
*class Image(w: Int, h: Int, data: Vector)*
In my DataFrame, images are stored in a column named "image" that corresponds
to the following case class
*case class LabeledImage(label: Int
're using.
>
> Are there particular issues you're running into?
>
> Joseph
>
> On Mon, Jan 19, 2015 at 12:59 AM, Jaonary Rabarisoa
> wrote:
>
>> Hi all,
>>
>> I'm trying to implement a pipeline for computer vision based on the
>> l
Hi all,
I'm trying to run the master version of spark in order to test some alpha
components in the ml package.
I followed the spark build documentation and built it with:
$ mvn clean package
The build is successful but when I try to run spark-shell I get the
following error:
*Exception in thr
That's what I did.
On Mon, Feb 2, 2015 at 11:28 PM, Sean Owen wrote:
> Snapshot builds are not published. Unless you build and install snapshots
> locally (like with mvn install) they wont be found.
> On Feb 2, 2015 10:58 AM, "Jaonary Rabarisoa" wrote:
>
>> Hi
Hi all,
I'm trying to use the master version of spark. I built and installed it with
$ mvn clean install
I managed to use it with the following configuration in my build.sbt:
*libraryDependencies ++= Seq( "org.apache.spark" %% "spark-core" %
"1.3.0-SNAPSHOT" % "provided", "org.apache.s
Hi all,
I'm trying to implement a pipeline for computer vision based on the latest
ML package in spark. The first step of my pipeline is to decode images (jpeg
for instance) stored in a parquet file.
For this, I began by creating a UserDefinedType that represents a decoded
image stored in an array of
I've seen all
>>>> kinds of hacking to improvise it: REST api, HDFS, tachyon, etc.
>>>> Not sure if an 'official' benchmark & implementation will be released
>>>> soon
>>>>
>>>> On 9 January 2015 at 10:59, Ma
Hi all,
Deep learning algorithms are popular and achieve state-of-the-art
performance on several real-world machine learning problems. Currently
there is no DL implementation in spark and I wonder if there is ongoing
work on this topic.
We can do DL in spark with Sparkling Water and H2O but t
Hi,
There is ongoing work on model export:
https://www.github.com/apache/spark/pull/3062
For now, since the linear regression model is serializable you can save it as
an object file:
sc.parallelize(Seq(model)).saveAsObjectFile("path")
then
val model = sc.objectFile[LinearRegressionModel]("path").first
model.predict(...)
www.dbtsai.com
> > LinkedIn: https://www.linkedin.com/in/dbtsai
> >
> >
> > On Mon, Dec 8, 2014 at 7:53 AM, Jaonary Rabarisoa
> wrote:
> >> After some investigation, I learned that I can't compare kmeans in mllib
> >> with another kmeans impl
Dear all,
I'm trying to understand what the correct use case of columnSimilarities
implemented in RowMatrix is.
As far as I know, this function computes the similarities between the columns
of a given matrix. The DIMSUM paper says that it's efficient for large m (rows)
and small n (columns). In this case the
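For reference, a sketch of the call (vectorRdd is assumed to be an RDD[Vector] where each row is an observation and each column an entity to compare):

import org.apache.spark.mllib.linalg.distributed.RowMatrix

val mat = new RowMatrix(vectorRdd)
val exact  = mat.columnSimilarities()      // all-pairs cosine similarities between columns
val approx = mat.columnSimilarities(0.1)   // DIMSUM sampling with a similarity threshold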
+1 with 1.3-SNAPSHOT.
On Mon, Dec 1, 2014 at 5:49 PM, agg212
wrote:
> Thanks for your reply, but I'm still running into issues
> installing/configuring the native libraries for MLlib. Here are the steps
> I've taken, please let me know if anything is incorrect.
>
> - Download Spark source
> - u
application, I will have more than 248k data to cluster.
On Fri, Dec 5, 2014 at 6:03 PM, Davies Liu wrote:
> Could you post you script to reproduce the results (also how to
> generate the dataset)? That will help us to investigate it.
>
> On Fri, Dec 5, 2014 at 8:40 AM, Jaonary Rabar
ally use Spark when you
> have a problem large enough to warrant distributing, or, your data
> already lives in a distributed store like HDFS.
>
> But it's also possible you're not configuring the implementations the
> same way, yes. There's not enough info here really to
Hi all,
I'm trying to run clustering with the kmeans algorithm. The size of my data
set is about 240k vectors of dimension 384.
Solving the problem with the kmeans available in julia (kmeans++)
http://clusteringjl.readthedocs.org/en/latest/kmeans.html
takes about 8 minutes on a single core.
Solvin
e Hadoop
> InputFormat would make 52 splits for it. Data drives partitions, not
> processing resource. Really, 8 splits is the minimum parallelism you
> want. Several times your # of cores is better.
>
> On Fri, Dec 5, 2014 at 8:51 AM, Jaonary Rabarisoa
> wrote:
> > Hi all,
> >
Hi all,
I'm trying to run a spark job with spark-shell. What I want to do is
just count the number of lines in a file.
I start the spark-shell with the default arguments, i.e. just with
./bin/spark-shell.
I load the text file with sc.textFile("path") and then call count on my data.
When I do th
Dear all,
I have a job that crashes before it finishes because there is no space left on
the device, and I noticed that this job generates a lot of temporary data on
my disk.
To be precise, the job is a simple map job that takes a set of images,
extracts local features and saves these local features as a seque
Dear all,
How can one save a kmeans model after training?
Best,
Jao
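A simple sketch for MLlib versions that don't yet have a built-in save: the model is serializable, so it can be written out as an object file and read back (the path and someVector are illustrative):

import org.apache.spark.mllib.clustering.KMeansModel

sc.parallelize(Seq(model), 1).saveAsObjectFile("kmeans-model")
val restored = sc.objectFile[KMeansModel]("kmeans-model").first()
val cluster  = restored.predict(someVector)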
" % "2.2.0" % "test"
)
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
On Tue, Nov 4, 2014 at 11:00 AM, Sean Owen wrote:
> Generally this means you included some javax.servlet dependency in
> your project deps. You should exclude
Hi all,
I have a spark job that I build with sbt and can run without any problem
with sbt run. But when I run it inside IntelliJ IDEA I get the following
error:
*Exception encountered when invoking run on a nested suite - class
"javax.servlet.FilterRegistration"'s signer information does not m
e the type of the id param to Int it works for me but I don't
>> know why.
>>
>> case class PersonID(id: Int)
>>
>> Looks like a strange behavior to me. Have a try.
>>
>> Good luck,
>> Niklas
>>
>>
>> On 23.10.2014 21:52, Jaonary
Hi all,
I have the following case class that I want to use as a key in a key-value
rdd. I defined the equals and hashCode methods but it's not working. What
am I doing wrong?
case class PersonID(id: String) {
  override def hashCode = id.hashCode
  override def equals(other: Any) = o
lib/CosineSimilarity.scala>
> .
>
> This implements the DIMSUM sampling scheme, recently merged into master
> <https://github.com/apache/spark/pull/1778>.
>
> Best,
> Reza
>
> On Fri, Oct 17, 2014 at 3:43 AM, Jaonary Rabarisoa
> wrote:
>
>> Hi all,
>>
Hi all,
I need to compute a similarity between elements of two large sets of high
dimensional feature vectors.
Naively, I create all possible pairs of vectors with
features1.cartesian(features2) and then map the resulting paired rdd with
my similarity function.
The problem is that the cartesian
you are
> getting?
>
> Best Regards,
> Sonal
> Nube Technologies <http://www.nubetech.co>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
>
>
> On Fri, Oct 17, 2014 at 12:28 PM, Jaonary Rabarisoa
> wrote:
>
>> Dear all,
>>
>> Is it p
Dear all,
Is it possible to use any kind of object as a key in a PairRDD? When I use
a case class key, the groupByKey operation doesn't behave as I expected. I
want to use a case class to avoid using a large tuple, as it is easier to
manipulate.
Cheers,
Jaonary
And what about Hue http://gethue.com ?
On Sun, Oct 12, 2014 at 1:26 PM, andy petrella
wrote:
> Dear Sparkers,
>
> As promised, I've just updated the repo with a new name (for the sake of
> clarity), default branch but specially with a dedicated README containing:
>
> * explanations on how to lau
in fact with --driver-memory 2G I can get it working
On Thu, Oct 9, 2014 at 6:20 PM, Xiangrui Meng wrote:
> Please use --driver-memory 2g instead of --conf
> spark.driver.memory=2g. I'm not sure whether this is a bug. -Xiangrui
>
> On Thu, Oct 9, 2014 at 9:00 AM, Jaonary R
Dear all,
I have a spark job with the following configuration
val conf = new SparkConf()
  .setAppName("My Job")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "value.serializer.Registrator")
  .setMaster("local[4]")
Dear all,
I have a spark job that communicates with C++ code using pipe. Since the
data I need to send is rather complicated, I'm thinking about using protobuf to
serialize it. The problem is that the string form of my data output by
protobuf contains the "\n" character, so it is a bit complicated to
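One possible workaround (a sketch; records, toProto and the executable name are hypothetical, and java.util.Base64 requires Java 8): Base64-encode each serialized message so that every record becomes a single line with no embedded "\n":

import java.util.Base64

val lines   = records.map(r => Base64.getEncoder.encodeToString(toProto(r).toByteArray))
val results = lines.pipe("./my_cpp_tool")   // the C++ side decodes Base64 line by line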
chanism.html#Dependency_Scope
>
> Cheers
>
> On Fri, Sep 26, 2014 at 8:57 AM, Jaonary Rabarisoa
> wrote:
>
>> Thank Ted. Can you tell me how to adjust the scope ?
>>
>> On Fri, Sep 26, 2014 at 5:47 PM, Ted Yu wrote:
>>
>>> spark-c
> Adjusting the scope should solve the problem below.
>
> On Fri, Sep 26, 2014 at 8:42 AM, Jaonary Rabarisoa
> wrote:
>
>> Hi all,
>>
>> I'm using some functions from Breeze in a spark job but I get the
>> following build error :
>>
>> *Error:s
Hi all,
I'm using some functions from Breeze in a spark job but I get the following
build error:
Error:scalac: bad symbolic reference. A signature in RandBasis.class
refers to term math3
in package org.apache.commons which is not available.
It may be completely missing from the current clas
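A common fix for this particular error is to add commons-math3 explicitly to the sbt build, since Breeze references it but it is not always pulled in transitively (the version number is indicative):

libraryDependencies += "org.apache.commons" % "commons-math3" % "3.2"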
Hi all,
I'm trying to process a large image data set and need some way to optimize
my implementation since it's very slow for now. In my current
implementation I store my images in an object file with the following fields
case class Image(groupId: String, imageId: String, buffer: String)
Images
Dear all,
I'm facing the following problem and I can't figure out how to solve it.
I need to join 2 RDDs in order to find their intersection. The first RDD
represents an image encoded as a base64 string associated with an image id. The
second RDD represents a set of geometric primitives (rectangles) associa
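A sketch of the usual pattern (the field and type names are illustrative): key both RDDs by the image id and join them:

import org.apache.spark.rdd.RDD

val imagesById: RDD[(String, String)]    = images.map(img => (img.imageId, img.base64Data))
val rectsById:  RDD[(String, Rectangle)] = rects.map(r => (r.imageId, r))
val joined = imagesById.join(rectsById)   // (imageId, (base64Image, rectangle)) pairs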
Dear all,
When calling an external process with RDD.pipe I get the following error:
*Not interrupting system thread Thread[process reaper,10,system]*
*Not interrupting system thread Thread[process reaper,10,system]*
*Not interrupting system thread Thread[process reaper,10,system]*
*14/09/01 10
1.0.2
On Friday, August 29, 2014, Michael Armbrust wrote:
> What version are you using?
>
>
>
> On Fri, Aug 29, 2014 at 2:22 AM, Jaonary Rabarisoa > wrote:
>
>> Still not working for me. I got a compilation error : *value in is not a
>> member of Symbol.* An
[Expression]("a", "b", ...)
> table("src").where('key in (longList: _*))
>
> Also, note that I had to explicitly specify Expression as the type
> parameter of Seq to ensure that the compiler converts "a" and "b" into
> Spark SQ
le.where('name in ("foo", "bar"))
>
>
>
> On Thu, Aug 28, 2014 at 3:09 AM, Jaonary Rabarisoa
> wrote:
>
>> Hi all,
>>
>> What is the expression that I should use with spark sql DSL if I need to
>> retrieve
>> data with a fi
Hi all,
What is the expression that I should use with the spark sql DSL if I need to
retrieve
data with a field in a given set?
For example:
I have the following schema
case class Person(name: String, age: Int)
And I need to do something like:
personTable.where('name in Seq("foo", "bar")) ?
Ch
forgot the second point, I found the answer myself inside the source code
PipedRDD :)
On Wed, Aug 27, 2014 at 1:36 PM, Jaonary Rabarisoa
wrote:
> Thank you Matei.
>
> I found a solution using pipe and matlab engine (an executable that can
> call matlab behind the scene and us
the command line. Just watch out for any environment variables
> needed (you can pass them to pipe() as an optional argument if there are
> some).
>
> On August 25, 2014 at 12:41:29 AM, Jaonary Rabarisoa (jaon...@gmail.com)
> wrote:
>
> Hi all,
>
> Is there someone
Dear all,
I'm looking for an efficient way to manage external dependencies. I know
that one can add .jar or .py dependencies easily, but how can I handle other
types of dependencies? Specifically, I have some data processing algorithms
implemented in other languages (ruby, octave, matlab, c++) and
Hi all,
Has anyone tried to pipe an RDD into a matlab script? I'm trying to
do something similar, so any hints would be appreciated.
Best regards,
Jao
Dear all,
Is there any example of mapPartitions that forks an external process, or of
how to make RDD.pipe work on every element of a partition?
Cheers,
Jaonary
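For reference, a minimal sketch (the executable name is illustrative): pipe() already launches the command once per partition and streams every element of that partition to the process, one line per element on stdin, while each stdout line becomes an element of the result RDD:

val result = data.map(_.toString).pipe("./my_external_tool")
result.take(5).foreach(println)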
is your query? Did you use the Hive Parser (your query was
> submitted through hql(...)) or the basic SQL Parser (your query was
> submitted through sql(...)).
>
> Thanks,
>
> Yin
>
>
> On Tue, Jul 15, 2014 at 8:52 AM, Jaonary Rabarisoa
> wrote:
>
>> Hi all,
Hi all,
When running a join operation with Spark SQL I got the following error :
Exception in thread "main"
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Ambiguous
references to id: (id#303,List()),(id#0,List()), tree:
Filter ('videoId = 'id)
Join Inner, None
ParquetRelation
Hi all,
How should I store a one-to-many relationship using spark sql and the parquet
format? For example, the following case class
case class Person(key: String, name: String, friends: Array[String])
gives an error when I try to insert the data into a parquet file. It doesn't
like the Array[String]
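A sketch of a workaround with the current DataFrame API (Spark 1.3+, names illustrative): declare the one-to-many side as a Seq, which maps to an ArrayType column:

case class Person(key: String, name: String, friends: Seq[String])

val people = sc.parallelize(Seq(Person("1", "Alice", Seq("Bob", "Carol"))))
sqlContext.createDataFrame(people).saveAsParquetFile("people.parquet")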
single object in 1 rdd
> would perhaps not be super optimized.
> Regards
> Mayur
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
>
> On Wed, Jul 9, 2014 at 12:17 PM, Jaonary
Hi all,
I need to run a spark job that needs a set of images as input. I need
something that loads these images as an RDD, but I just don't know how to do
that. Does any of you have an idea?
Cheers,
Jao
Hi all,
I need to run a complex external process with a lot of dependencies from
spark. The "pipe" and "addFile" functions seem to be my friends, but there
are just some issues that I need to solve.
Precisely, the processes I want to run are C++ executables that may depend on
some libraries and addit
Is there an equivalent of wholeTextFiles for binary files, for example a set
of images?
Cheers,
Jaonary
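For reference, a sketch using sc.binaryFiles (available from Spark 1.2), the binary counterpart of wholeTextFiles; the path is illustrative:

import org.apache.spark.rdd.RDD

// (path, PortableDataStream) pairs; toArray() reads each whole file into memory.
val images: RDD[(String, Array[Byte])] =
  sc.binaryFiles("hdfs:///data/images/*.jpg").mapValues(_.toArray())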
Hi all,
I'm trying to use spark sql to store data in a parquet file. I create the
file and insert data into it with the following code:
val conf = new SparkConf().setAppName("MCT").setMaster("local[2]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
n talk to
>
>
> On Tue, Jun 24, 2014 at 3:12 AM, Jaonary Rabarisoa
> wrote:
>
>> Hi all,
>>
>> So far, I run my spark jobs with spark-shell or spark-submit command. I'd
>> like to go further and I wonder how to use spark as a backend of a web
>>
Hi all,
So far, I run my spark jobs with the spark-shell or spark-submit commands. I'd
like to go further and I wonder how to use spark as the backend of a web
application. Specifically, I want a frontend application (built with nodejs)
to communicate with spark on the backend, so that every query from
ltiple gpu machines.
>
> Sent from my iPhone
>
> > On Apr 11, 2014, at 8:38 AM, Jaonary Rabarisoa
> wrote:
> >
> > Hi all,
> >
> > I'm just wondering if hybrid GPU/CPU computation is something that is
> feasible with spark ? And what should be the best way to do it.
> >
> >
> > Cheers,
> >
> > Jaonary
>
Hi all,
I'm just wondering if hybrid GPU/CPU computation is something that is
feasible with spark ? And what should be the best way to do it.
Cheers,
Jaonary
es this?
>>
>> Matei
>>
>> On Mar 28, 2014, at 3:09 AM, Jaonary Rabarisoa wrote:
>>
>> I forgot to mention that I don't really use all of my data. Instead I use
>> a sample extracted with randomSample.
>>
>>
>> On Fri, Mar 28, 2014 at
Hi all;
Can someone give me some tips on computing the mean of an RDD by key, maybe
with combineByKey and StatCounter.
Cheers,
Jaonary
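A sketch of that approach, assuming pairs is an RDD[(String, Double)]: StatCounter accumulates count/mean/variance per key, and mapValues extracts the mean:

import org.apache.spark.util.StatCounter

val meanByKey = pairs.combineByKey(
  (v: Double) => StatCounter(v),
  (s: StatCounter, v: Double) => s.merge(v),
  (s1: StatCounter, s2: StatCounter) => s1.merge(s2)
).mapValues(_.mean)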
I forgot to mention that I don't really use all of my data. Instead I use a
sample extracted with randomSample.
On Fri, Mar 28, 2014 at 10:58 AM, Jaonary Rabarisoa wrote:
> Hi all,
>
> I notice that RDD.cartesian has a strange behavior with cached and
> uncached data. More pr
Hi all,
I noticed that RDD.cartesian has a strange behavior with cached and uncached
data. More precisely, I have a set of data that I load with objectFile
val data: RDD[(Int,String,Array[Double])] = sc.objectFile("data")
Then I split it into two sets depending on some criteria
val part1 = data
m, you need
> to be careful - ObjectInputStream uses root classloader to load classes and
> does not work with jars that are added to the TCCL. Apache commons has
> ClassLoaderObjectInputStream to workaround this.
>
>
> On Wed, Mar 26, 2014 at 1:38 PM, Jaonary Rabarisoa wrote:
wrote:
> > Have you looked through the logs fully? I have seen this (in my limited
> > experience) pop up as a result of previous exceptions/errors, also as a
> > result of being unable to serialize objects etc.
> > Ognen
> >
> >
> > On 3/26/14, 10:39