Adding your application jar to the SparkContext will resolve this issue.
E.g.:
sparkContext.addJar("./target/scala-2.10/myTestApp_2.10-1.0.jar")
Thanks
Best Regards
On Mon, Oct 13, 2014 at 8:42 AM, Tao Xiao xiaotao.cs@gmail.com wrote:
In the beginning I tried to read HBase and found that
A few things to keep in mind (a minimal sketch of these settings follows this list):
- I believe driver memory should not exceed executor memory
- Set spark.storage.memoryFraction (the default is 0.6)
- Set spark.rdd.compress (the default is false)
- Always specify the level of parallelism while doing a groupBy, reduceBy, join, sortBy etc.
- If you don't
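A minimal sketch of the settings above (assuming a standalone app; all values are only examples, not recommendations):
import org.apache.spark.SparkConf
val conf = new SparkConf()
  .setAppName("tuning-example")
  .set("spark.storage.memoryFraction", "0.6")   // fraction of the heap used for cached RDDs
  .set("spark.rdd.compress", "true")            // compress serialized RDD partitions
  .set("spark.default.parallelism", "100")      // fallback parallelism for shuffles
// The level of parallelism can also be given per operation, e.g. rdd.reduceByKey(_ + _, 100)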
Hi Akhil,
Thanks for the response..
Another query... do you know how to use the spark.executor.extraJavaOptions
option?
SparkConf.set("spark.executor.extraJavaOptions", "<what value should go in
here>")?
I am trying to find an example but cannot seem to find one..
On Mon, Oct 13, 2014 at 12:03 AM,
You can set it like this:
sparkConf.set("spark.executor.extraJavaOptions", "-XX:+UseCompressedOops
-XX:+UseConcMarkSweepGC -XX:+AggressiveOpts -XX:FreqInlineSize=300
-XX:MaxInlineSize=300")
Here's a benchmark example
Cool.. Thanks.. And one last question..
conf = SparkConf().set(...)
matrix = get_data(..)
rdd = sc.parallelize(matrix) # heap error here...
How and where do I set the storage level? It seems like conf is the wrong
place to set this up, as I get this error:
This is how the table was created:
transactions = parts.map(lambda p: Row(customer_id=long(p[0]),
chain=int(p[1]), dept=int(p[2]), category=int(p[3]), company=int(p[4]),
brand=int(p[5]), date=str(p[6]), productsize=float(p[7]),
productmeasure=str(p[8]), purchasequantity=int(p[9]),
Like this:
import org.apache.spark.storage.StorageLevel
val rdd = sc.parallelize(1 to
100).persist(StorageLevel.MEMORY_AND_DISK_SER)
Thanks
Best Regards
On Mon, Oct 13, 2014 at 12:50 PM, Chengi Liu chengi.liu...@gmail.com
wrote:
Cool.. Thanks.. And one last final question..
conf =
https://issues.apache.org/jira/browse/SPARK-3106
I'm having the same errors described in SPARK-3106 (no other types of
errors confirmed), running a bunch of SQL queries on Spark 1.2.0 built from
the latest master HEAD.
Any updates to this issue?
My main task is to join a huge fact table with a dozen
Thanks Michael, your patch works for me :)
Regards,
Kelvin Paul
On Fri, Oct 3, 2014 at 3:52 PM, Michael Armbrust mich...@databricks.com
wrote:
Are you running master? There was briefly a regression here that is
hopefully fixed by spark#2635 https://github.com/apache/spark/pull/2635.
On Fri,
Hmm... it failed again, just lasted a little bit longer.
Jianshi
On Mon, Oct 13, 2014 at 4:15 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
https://issues.apache.org/jira/browse/SPARK-3106
I'm having the same errors described in SPARK-3106 (no other types of
errors confirmed), running
Hi all, I tried to set the configurations
spark.sql.inMemoryColumnarStorage.compressed
and spark.sql.inMemoryColumnarStorage.batchSize in
spark.executor.extraJavaOptions,
but it does not work; my spark.executor.extraJavaOptions contains
-Dspark.sql.inMemoryColumnarStorage.compressed=true
All,
We are using Spark Streaming to receive data from the Twitter stream. This is
running behind a proxy. We have done the following configuration inside Spark
Streaming for twitter4j to work behind the proxy.
def main(args: Array[String]) {
val filters = Array("Modi")
Hello,
Given the following structure, is it possible to query, e.g., sessions[0].id ?
In general, is it possible to query Array Of Struct in json RDDs?
root
|-- createdAt: long (nullable = true)
|-- id: string (nullable = true)
|-- sessions: array (nullable = true)
|    |-- element:
Hi,
We have been using Spark for several months now and will be glad to join the
Spark family officially :)
*Company Name*: Supersonic
Supersonic is a mobile advertising company.
*URL*: http://www.supersonicads.com/
*Components and use-cases*: Using Spark core for big data crunching, MLlib
for
Hi all,
I set spark.storage.blockManagerSlaveTimeoutMs=10 in the
spark-defaults.conf file, but the setting does not show up under the
“Environment” tab of the http://driver:4040 page. Why? P.S.: I create a new SparkConf() in
code and use Spark standalone mode.
Thanks,
chepoo
Hi,
Thank you so much!
By the way, what is the DATEADD function in Scala/Spark? Or how do I implement
DATEADD(MONTH, 3, '2013-07-01') and DATEADD(YEAR, 1, '2014-01-01') in Spark
or Hive?
Regards
Arthur
On 12 Oct, 2014, at 12:03 pm, Ilya Ganelin ilgan...@gmail.com wrote:
Because of how
Currently Spark SQL doesn’t support reading SQL-specific configurations
via system properties. But for HiveContext, you can put them in
hive-site.xml.
On 10/13/14 4:28 PM, Kevin Paul wrote:
Hi all, I tried to set the configuration
spark.sql.inMemoryColumnarStorage.compressed, and
When I execute the following in yarn-client mode it works fine and gives
the result properly, but when I try to run in yarn-cluster mode I get an
error:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client
Is it planned for the near future?
Hi,
Thank you Liquan. I just missed some information in my previous post.
I just solved the problem.
Actually, I use the first line (schema header) of the CSV file to generate
StructType and StructField. However, the input file is UTF-8 Unicode (*with*
BOM), so the first char of the file is
Hello!
I'm new to Spark, and am writing a Java program with it.
But when I implement the Accumulator, something
confuses me; here is my implementation:
class WeightAccumulatorParam implements AccumulatorParam<Weight> {
    @Override
    public Weight zero(Weight initialValue) {
        return new
Question 1: Please check
http://spark.apache.org/docs/1.1.0/sql-programming-guide.html#hive-tables.
Question 2:
One workaround is to rewrite it. You can use LEFT SEMI JOIN to implement
the subquery with EXISTS and use LEFT OUTER JOIN + IS NULL to implement the
subquery with NOT EXISTS.
SELECT
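A hedged sketch of both rewrites, assuming a HiveContext named hc (LEFT SEMI JOIN is HiveQL syntax) and made-up tables t1/t2 with a join key id:
// EXISTS subquery  ->  LEFT SEMI JOIN
val existsRewrite = hc.sql(
  "SELECT t1.id, t1.a FROM t1 LEFT SEMI JOIN t2 ON t1.id = t2.id")
// NOT EXISTS subquery  ->  LEFT OUTER JOIN ... WHERE ... IS NULL
val notExistsRewrite = hc.sql(
  "SELECT t1.id, t1.a FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id WHERE t2.id IS NULL")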
Are you sure you aren't actually trying to extend AccumulableParam
instead of AccumulatorParam? The example you cite does the latter. I
do not get a compile error from this example.
You also didn't say what version of Spark.
(Although there are a few things about the example that could be
Hi Cheng,
I am using version 1.1.0.
Looks like that bug was fixed sometime after 1.1.0 was released. Interestingly,
I tried your code on 1.1.0 and it gives me a different (incorrect) result:
case class T(a:String, ts:java.sql.Timestamp)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
If you are using HiveContext, it should work in 1.1.
Thanks,
Yin
On Mon, Oct 13, 2014 at 5:08 AM, shahab shahab.mok...@gmail.com wrote:
Hello,
Given the following structure, is it possible to query, e.g. session[0].id
?
In general, is it possible to query Array Of Struct in json RDDs?
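A hedged sketch of that, assuming Spark 1.1 with a HiveContext, sc as the SparkContext, and a hypothetical events.json file matching the schema above:
import org.apache.spark.sql.hive.HiveContext
val hiveContext = new HiveContext(sc)
val events = hiveContext.jsonFile("events.json")   // path is made up
events.registerTempTable("events")
// Index into the array of structs with HiveQL-style syntax:
hiveContext.sql("SELECT sessions[0].id FROM events").collect()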
Hi Shahab,
Can you try to use HiveContext? It should work in 1.1. For SQLContext,
this issue was not fixed in 1.1 and you need to use the master branch at the
moment.
Thanks,
Yin
On Sun, Oct 12, 2014 at 5:20 PM, shahab shahab.mok...@gmail.com wrote:
Hi,
Apparently it is possible to query
Dear friends,
Is there any way to know what the “predicted label” is for each “input label”?
[CODE]
…
val model = SVMWithSGD.train(training, numIterations)
model.clearThreshold()
val scoreAndLabels = test.map { point =>
val score = model.predict(point.features)
It seems the wrong results you got were caused by the timezone.
The time in java.sql.Timestamp(long time) means milliseconds since January
1, 1970, 00:00:00 *GMT*. A negative number is the number of milliseconds
before January 1, 1970, 00:00:00 *GMT*.
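A quick illustration of the GMT point (the printed value depends on the JVM's default timezone):
val ts = new java.sql.Timestamp(0L)  // 0 milliseconds since 1970-01-01 00:00:00 GMT
println(ts)                          // e.g. "1969-12-31 16:00:00.0" when the JVM default timezone is PST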
However, in ts='1970-01-01 00:00:00',
Hi,
Can somebody please share the plans regarding Java version support for
Apache Spark 1.2.0 or near-future releases?
Will Java 8 become the version with all features supported in Apache Spark 1.2, or
will Java 1.7 suffice?
Thanks,
Aside from some syntax errors, this looks like exactly how you do it, right?
Except that you call clearThreshold(), which causes it to return the
margin, not a 0/1 prediction. Don't call that. It will default to the
behavior you want.
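A hedged sketch of that suggestion, assuming training and test are RDD[LabeledPoint]:
import org.apache.spark.mllib.classification.SVMWithSGD
val numIterations = 100
val model = SVMWithSGD.train(training, numIterations)
// Do NOT call model.clearThreshold(): with the default threshold in place,
// predict() returns a 0/1 class label instead of the raw margin.
val predictionAndLabel = test.map { point =>
  (model.predict(point.features), point.label)   // (predicted label, input label)
}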
On Mon, Oct 13, 2014 at 3:03 PM, Alfonso Muñoz Muñoz
I have not heard any plans to even drop support for Java 6. I imagine
it will remain that way for a while. Java 6 is sufficient.
On Mon, Oct 13, 2014 at 3:37 PM, twinkle sachdeva
twinkle.sachd...@gmail.com wrote:
Hi,
Can somebody please share the plans regarding java version's support for
Thanks Yin. I tried HiveQL and it solved that problem. But now I have a
second query requirement:
But since you are the main developer behind the JSON-Spark integration (I saw your
presentation on YouTube, "Easy JSON Data Manipulation in Spark"), is it
possible to perform aggregation-style queries,
for
I am sorry that I forgot to mention the version of Spark.
The version is spark-1.1.0.
I am pretty sure I am not trying to extend AccumulableParam;
you can see the code of the implementation of WeightAccumulatorParam
in my post, it implements the interface AccumulatorParam<Weight>.
And I am not
Good guess, but that is not the reason. Look at this code:
scala> val data = sc.parallelize(132554880L :: 133554880L :: Nil).map(i =>
T(i.toString, new java.sql.Timestamp(i)))
data: org.apache.spark.rdd.RDD[T] = MappedRDD[17] at map at <console>:17
scala> data.collect
res3: Array[T] =
Hi Shahab,
Do you mean queries with group by and aggregation functions? Once you
register the json dataset as a table, you can write queries like querying a
regular table. You can join it with other tables and do aggregations. Is that
what you were asking for? If not, can you give me a more
Furthermore, the second build passes, but an exception
is thrown while running:
The exception is NoClassDefFoundError.
The situation is the same as the sample online -- VectorAccumulatorParam.
So to me, if I don't implement the addAccumulator function, I cannot customize
the
Yeah, it is not related to timezone. I think you hit this issue
https://issues.apache.org/jira/browse/SPARK-3173 and it was fixed after
1.1 release.
On Mon, Oct 13, 2014 at 11:24 AM, Mohammed Guller moham...@glassbeam.com
wrote:
Good guess, but that is not the reason. Look at this code:
Hi,
I am using SparkSQL 1.1.0.
Actually, I have a table as follows:
root
|-- account_id: string (nullable = false)
|-- Birthday: string (nullable = true)
|-- preferstore: string (nullable = true)
|-- registstore: string (nullable = true)
|-- gender: string (nullable = true)
|--
How can we read all Parquet files in a directory in spark-sql? We are
following this example, which shows a way to read one file:
// Read in the parquet file created above. Parquet files are
// self-describing so the schema is preserved.
// The result of loading a Parquet file is also a SchemaRDD.
val
Hi all,
A quick question on Spark SQL SELECT syntax.
Does it support queries like:
SELECT t1.*, t2.d, t2.e FROM t1 LEFT JOIN t2 ON t1.a = t2.a
It always ends with the exception:
Exception in thread "main" java.lang.RuntimeException: [2.12] failure:
string literal expected
SELECT t1.*,
Right now I believe the only supported option is to pass a comma-delimited
list of paths.
I've opened SPARK-3928: Support wildcard matches on Parquet files
https://issues.apache.org/jira/browse/SPARK-3928 to request this feature.
Nick
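A hedged sketch of the comma-delimited form (the paths are hypothetical):
val parquetRDD = sqlContext.parquetFile(
  "hdfs://nn/warehouse/table/part-1.parquet,hdfs://nn/warehouse/table/part-2.parquet")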
On Mon, Oct 13, 2014 at 12:21 PM, Sadhan Sood
SPARK-3928: Support wildcard matches on Parquet files
https://issues.apache.org/jira/browse/SPARK-3928
On Wed, Sep 24, 2014 at 2:14 PM, Michael Armbrust mich...@databricks.com
wrote:
We could certainly do this. The comma separated support is something I
added.
On Wed, Sep 24, 2014 at 10:20
That explains it. Thanks!
Mohammed
From: Yin Huai [mailto:huaiyin@gmail.com]
Sent: Monday, October 13, 2014 8:47 AM
To: Mohammed Guller
Cc: Cheng, Hao; Cheng Lian; user@spark.apache.org
Subject: Re: Spark SQL parser bug?
Yeah, it is not related to timezone. I think you hit this
Hi
I am trying to access files/buckets in S3 and encountering a permissions
issue. The buckets are configured to authenticate using an IAMRole provider.
I have set the KeyId and Secret using environment variables (
AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID). However, I am still unable to
access
I got this error when trying to perform PCA on a sparse matrix. Each row
has a nominal length of 8000, and there are 36k rows; each row has on
average 3 non-zero elements.
I guess the total size is not that big.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at
Maybe I should put this another way. If spark has two jobs, A and B, both
of which consume the entire allocated memory pool, is it expected that
spark can launch B before the executor processes tied to A are completely
terminated?
On Thu, Oct 9, 2014 at 6:57 PM, Keith Simmons ke...@pulse.io
The Gramian is 8000 x 8000, dense, and full of 8-byte doubles. It's
symmetric, so you can get away with storing it in ~256MB. The catch is that
it's going to send around copies of this 256MB array. You may easily
be running your driver out of memory given all the overheads and
copies, or your
Is there any plan to support windowing queries? I know that Shark
supported it in its last release, and I expected it to be included already.
Someone from Red Hat is working on this. Unclear if it will make the 1.2
release.
It's not on the roadmap for 1.2. I'd suggest opening a JIRA.
On Mon, Oct 13, 2014 at 4:28 AM, Pierre B
pierre.borckm...@realimpactanalytics.com wrote:
Is it planned for the near future?
This conversion is done implicitly anytime you use a string column in an
operation with a numeric column. If you run explain on your query you
should see the cast that is inserted. This is intentional and based on the
type semantics of Apache Hive.
On Mon, Oct 13, 2014 at 9:03 AM, invkrh
If you are running version 1.1, you can create external Parquet tables.
I'd recommend setting spark.sql.hive.convertMetastoreParquet=true. Here's a
helper function to do it automatically:
/**
* Sugar for creating a Hive external table from a parquet path.
*/
def createParquetTable(name:
Hi guys,
I am trying to just parse out values from a CSV; everything is a numeric
(Double) value, and the input text CSV data is about 1.3 GB in size.
When I inspect the Java heap space used by SparkSubmit using JVisualVM,
I end up eating up 8GB of memory! Moreover, by inspecting the
It turned out it was caused by this issue:
https://issues.apache.org/jira/browse/SPARK-3923
Setting spark.akka.heartbeat.interval to 100 solved it.
Jianshi
On Mon, Oct 13, 2014 at 4:24 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hmm... it failed again, just lasted a little bit longer.
Jianshi
A CSV element like 3.2 takes 4 bytes as text on disk but, as a
Double, will always take 8 bytes. Is your input like this? That could
explain it.
You can map to Float in this case to halve the memory, if that works
for your use case. This is just kind of how Strings and floating-point
work in the
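A hedged sketch of the Float suggestion, assuming a purely numeric CSV and a made-up file name:
import org.apache.spark.storage.StorageLevel
val rows = sc.textFile("data.csv")
  .map(_.split(',').map(_.toFloat))        // Float uses 4 bytes per value instead of 8 for Double
  .persist(StorageLevel.MEMORY_ONLY_SER)   // serialized caching further shrinks the in-memory footprint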
Cross posting an interesting question on Stack Overflow
http://stackoverflow.com/questions/26321947/multipart-uploads-to-amazon-s3-from-apache-spark
.
Nick
One thing that made me very confused during debugging is the error message. The
important one,
WARN ReliableDeliverySupervisor: Association with remote system
[akka.tcp://sparkDriver@xxx:50278] has failed, address is now gated for
[5000] ms. Reason is: [Disassociated].
is of Log Level WARN.
Jianshi
Currently, printing (toString) gives a human-readable version of the tree,
but it is not a format which is easy to save and load. That sort of
serialization is in the works, but not available for trees right now.
(Note that the current master actually has toString (for a short summary
of the
Not directly related, but FWIW, EMR seems to back away from s3n usage:
Previously, Amazon EMR used the S3 Native FileSystem with the URI scheme,
s3n. While this still works, we recommend that you use the s3 URI scheme
for the best performance, security, and reliability.
Oh, that's a straight reversal from their position up until earlier this
year
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-read-a-multipart-s3-file-tp5463p5485.html
.
Was there an announcement explaining the change in recommendation?
Nick
On Mon, Oct 13, 2014 at 4:54 PM, Daniil
Hi guys,
Just starting out with Spark and following through a few tutorials, it seems
the easiest way to get one's source data into an RDD is using the
sc.parallelize function. Unfortunately, my local data is in multiple
instances of Map[K,V] types, and the parallelize function only works on
objects
Is there a way to specify a request header during the
sparkContext.textFile call?
- Ranga
On Mon, Oct 13, 2014 at 11:03 AM, Ranga sra...@gmail.com wrote:
Hi
I am trying to access files/buckets in S3 and encountering a permissions
issue. The buckets are configured to authenticate using an
The biggest scaling issue was supporting a large number of reduce tasks
efficiently, which the JIRAs in that post handle. In particular, our current
default shuffle (the hash-based one) has each map task open a separate file
output stream for each reduce task, which wastes a lot of memory
Thank you Sean. Moving over my data types from Double to Float was an
(obvious) big win, and I discovered one more good optimization from the
Tuning section --
I modified my original code to call .persist(MEMORY_ONLY_SER) from the
FIRST import of the data, and I pass in --conf
Hello,
After reading the following pages:
https://spark.apache.org/docs/latest/building-with-maven.html
http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html
I came up with this build command:
mvn -Dhadoop.version=1.0.3 -DskipTests -Pdeb -Phadoop-1.0.3 -Phive
Yes, there is no hadoop-1.0.3 profile. You can look at the parent
pom.xml to see the profiles that exist. It doesn't mean 1.0.3 doesn't
work, just that there is nothing specific to activate for this version
range. I don't think the docs suggest this profile exists. You don't
need any extra
For now, with the Spark SPARK-3462 parquet pushdown for unionAll PR, you
can do the following to unionAll SchemaRDDs:
val files = Array("hdfs://file1.parquet", "hdfs://file2.parquet",
"hdfs://file3.parquet")
val rdds = files.map(hc.parquetFile(_))
val unionedRDD = {
var temp = rdds(0)
for (i
Hi Sean,
Thanks for the quick response. I'll give that a try. I'm still a little
concerned that yarn support will be built into my target assembly. Are you
aware of something I can check after the build completes to be sure that
Spark doesn't look for Yarn during runtime?
Thanks,
-Chris
--
I am currently using Spark 1.1.0, which has been compiled against Hadoop 2.3. Our
cluster is CDH5.1.2, which runs Hive 0.12. I have two external Hive tables
that point to Parquet (compressed with Snappy), which were converted over from
Avro if that matters.
I am trying to perform a join with
Is the following what you are looking for?
scala> sc.parallelize(myMap.map { case (k, v) => (k, v) }.toSeq)
res2: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at
parallelize at <console>:21
2014-10-13 14:02 GMT-07:00 jon.g.massey jon.g.mas...@gmail.com:
Hi guys,
Just
Hi All, I am doing a POC and have written a job in Java, so the architecture has
Kafka and Spark. Now I want a process to notify me whenever system performance
is degrading or resources such as CPU or RAM are running short. I understand
org.apache.spark.streaming.scheduler.StreamingListener, but it has
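A hedged sketch of a custom listener (the class name and threshold are made up, and it only covers batch delay, not CPU/RAM):
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}
class LagListener(thresholdMs: Long) extends StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    // schedulingDelay is how long the batch waited before processing started
    val delay = batch.batchInfo.schedulingDelay.getOrElse(0L)
    if (delay > thresholdMs) {
      println(s"WARN: batch scheduling delay ${delay} ms exceeds ${thresholdMs} ms")
    }
  }
}
// ssc.addStreamingListener(new LagListener(5000))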
Map.toSeq already does that even. You can skip the map. You can put
together Maps with ++ too. You should have an RDD of pairs then, but
to get the special RDD functions you're looking for, remember to import
SparkContext._
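A hedged sketch with made-up maps, assuming sc is the SparkContext:
import org.apache.spark.SparkContext._   // brings in the pair-RDD functions
val map1 = Map("a" -> 1, "b" -> 2)
val map2 = Map("b" -> 3, "c" -> 4)
val pairs = sc.parallelize((map1 ++ map2).toSeq)   // RDD[(String, Int)]
val counts = pairs.reduceByKey(_ + _)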
On Mon, Oct 13, 2014 at 10:58 PM, Stephen Boesch java...@gmail.com wrote:
In Hive, the table was created with a custom SerDe, in the following way:
row format serde 'abc.ProtobufSerDe'
with serdeproperties ("serialization.class" =
"abc.protobuf.generated.LogA$log_a")
When I start the spark-sql shell, I always get the following exception, even
for a simple query:
select user from
(Copying the user list)
You should use the spark_ec2 script to configure the cluster. If you use the trunk
version, you can use the new --copy-aws-credentials option to configure the
S3 parameters automatically; otherwise either include them in your
SparkConf variable or add them to
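One hedged option, assuming key-based auth (the bucket path is made up; this sets the s3n keys on the Hadoop configuration rather than using the IAM role directly):
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))
val lines = sc.textFile("s3n://my-bucket/path/to/data")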
Thanks Nick, DB - that was helpful.
On Mon, Oct 13, 2014 at 5:44 PM, DB Tsai dbt...@dbtsai.com wrote:
For now, with SparkSPARK-3462 parquet pushdown for unionAll PR, you
can do the following for unionAll schemaRDD.
val files = Array(hdfs://file1.parquet, hdfs://file2.parquet,
Hi,
I am new to Scala/GraphX and am having problems converting a TSV file to a
graph. I have a flat tab-separated file like below:
n1 P1 n2
n3 P1 n4
n2 P2 n3
n3 P2 n1
n1 P3 n4
n3 P3 n2
where n1,n2,n3,n4 are the nodes of the graph and P1,P2,P3 are the
properties which should form the edges
Thank you for the details! Would you mind speaking to what tools proved
most useful as far as identifying bottlenecks or bugs? Thanks again.
On Oct 13, 2014 5:36 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
The biggest scaling issue was supporting a large number of reduce tasks
efficiently,
Hi Daniil
Could you provide some more details on how the cluster should be
launched/configured? The EC2 instance that I am dealing with uses the
concept of IAMRoles. I don't have any keyfile to specify to the spark-ec2
script.
Thanks for your help.
- Ranga
On Mon, Oct 13, 2014 at 3:04 PM,
Well done guys. MapReduce sort at that time was a good feat and Spark now
has raised the bar with the ability to sort a PB.
Like some of the folks on the list have asked, a summary of what worked (and didn't)
as well as the monitoring practices would be good.
Cheers
k/
P.S: What are you folks planning next ?
At 2014-10-13 18:22:44 -0400, Soumitra Siddharth Johri
soumitra.siddha...@gmail.com wrote:
I have a flat tab separated file like below:
[...]
where n1,n2,n3,n4 are the nodes of the graph and P1,P2,P3 are the
properties which should form the edges between the nodes.
How can I construct a
There are some known bugs with the Parquet SerDe and Spark 1.1.
You can try setting spark.sql.hive.convertMetastoreParquet=true to cause
Spark SQL to use the built-in Parquet support when the SerDe looks like Parquet.
On Mon, Oct 13, 2014 at 2:57 PM, Terry Siu terry@smartfocus.com wrote:
I am
Hi Guys,
Spark rookie here. I am getting a file not found exception on the --jars.
This is in yarn-cluster mode, and I am running the following command on
our recently upgraded Spark 1.1.1 environment.
./bin/spark-submit --verbose --master yarn --deploy-mode cluster --class
myEngine
Having the exact same error with the exact same jar. Do you work for
Altiscale? :)
J
Sent from my iPhone
On Oct 13, 2014, at 5:33 PM, Andy Srine andy.sr...@gmail.com wrote:
Hi Guys,
Spark rookie here. I am getting a file not found exception on the --jars.
This is on the yarn
BTW this has always worked for me before until we upgraded the cluster to
Spark 1.1.1...
J
Well, in the cluster, can you try copying the entire folder and then running it?
For example, my home folder, say helloWorld, consists of the src, target etc.
Can you copy the entire folder to the cluster? I suspect it is looking for
some dependencies and is missing them when it runs your jar file.
or if you
Or if it has something to do with the way you package your files, try
an alternative packaging method and see if it works.
On Monday, October 13, 2014, HARIPRIYA AYYALASOMAYAJULA
aharipriy...@gmail.com wrote:
Well in the cluster, can you try copying the entire folder and then run?
For example my
Thanks Akhil. Both ways work for me, but I'd like to know why that
exception was thrown. The class HBaseApp and related classes were all
contained in my application jar, so why was
com.xt.scala.HBaseApp$$anonfun$testHBase$1 not found?
2014-10-13 14:53 GMT+08:00 Akhil Das
Sean,
thanks, I didn't know about repartitionAndSortWithinPartitions, that seems
very helpful!
Tobias
Hello,
I'm trying to connect to Kafka in the Spark shell, but failed as below. Could you
take a look at what I missed?
scala> val kafkaStream = KafkaUtils.createStream(ssc,
"test-vip.snc1:2181", "test_spark", Map("user-test-1" -> 1))
error: bad symbolic reference. A signature in KafkaUtils.class refers to
term
Hi,
I posted a query yesterday and have tried out all the options given in
responses..
Basically, I am reading a very fat matrix (2000 by 50 dimension matrix)
and am trying to run kmeans on it.
I keep on getting heap error..
Now, I am even using persist(StorageLevel.DISK_ONLY_2) option..
At 2014-10-13 21:08:15 -0400, Soumitra Johri soumitra.siddha...@gmail.com
wrote:
There is no 'long' field in my file. So when I form the edge I get a type
mismatch error. Is it mandatory for GraphX that every vertex should have a
distinct id?
In my case n1,n2,n3,n4 are all strings.
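A hedged sketch of one way to handle string node names (GraphX vertex ids must be Longs; the file name is made up, and hashCode can collide on very large graphs):
import org.apache.spark.graphx.{Edge, Graph}
val lines = sc.textFile("edges.tsv").map(_.split("\t"))        // Array(srcName, property, dstName)
val edges = lines.map { case Array(src, prop, dst) =>
  Edge(src.hashCode.toLong, dst.hashCode.toLong, prop)         // property string becomes the edge attribute
}
val vertices = lines.flatMap(a => Seq(a(0), a(2))).distinct()
  .map(name => (name.hashCode.toLong, name))                   // (VertexId, original string name)
val graph = Graph(vertices, edges)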
(+user