May 2015 13:16, Shailesh Birari sbirar...@gmail.com wrote:
Hi,
I have a 4-node Spark 1.3.1 cluster. All four nodes have 4 cores and 64 GB
of RAM.
I have around 600,000+ JSON files on HDFS. Each file is small, around 1 KB
in size. The total data is around 16 GB. The Hadoop block size is 256 MB.
My application reads these files with sc.textFile() (or sc.jsonFile()
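With this many tiny files, sc.textFile() creates roughly one partition (and
one task) per file, which is usually the real cost. A hedged, untested sketch
against the Spark 1.3 RDD API (the path and partition count are made-up
examples, not from the thread):

```scala
// Untested sketch, Spark 1.3 API. ~600,000 x 1 KB files means ~600,000
// input partitions under sc.textFile; coalescing right after the read
// collapses them to a sane task count for a 4-node x 4-core cluster.
val raw = sc.textFile("hdfs:///data/json/*")   // hypothetical path
val compacted = raw.coalesce(16)               // ~1 partition per core
val parsed = sqlContext.jsonRDD(compacted)     // parse the JSON lines
```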
Hello,
I want to use Spark SQL to aggregate some columns of the data.
E.g., I have huge data with some columns as:
time, src, dst, val1, val2
I want to calculate sum(val1) and sum(val2) for all unique pairs of src and
dst.
I tried forming an SQL query:
SELECT a.time, a.src, a.dst,
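As an aside, only the grouped columns can appear un-aggregated in the select
list, so the core of the query would be along the lines of
SELECT src, dst, SUM(val1), SUM(val2) FROM t GROUP BY src, dst (table name
assumed). A small plain-Scala sketch of the same grouping semantics, with
made-up sample rows, just to illustrate what the GROUP BY computes:

```scala
// Plain Scala collections stand in for the table; the rows are invented.
case class Rec(time: Long, src: String, dst: String, val1: Double, val2: Double)

val rows = Seq(
  Rec(1L, "a", "b", 1.0, 10.0),
  Rec(2L, "a", "b", 2.0, 20.0),
  Rec(3L, "c", "d", 5.0, 50.0)
)

// Group by the (src, dst) pair and sum val1 and val2 within each group.
val sums: Map[(String, String), (Double, Double)] =
  rows.groupBy(r => (r.src, r.dst))
      .map { case (key, rs) => key -> (rs.map(_.val1).sum, rs.map(_.val2).sum) }

// ("a", "b") -> (3.0, 30.0); ("c", "d") -> (5.0, 50.0)
```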
Hi SM,
Apologies for the delayed response.
No, the issue is with Spark 1.2.0; there is a bug in Spark 1.2.0.
Spark recently made the 1.3.0 release, so it might be fixed in it.
I am not planning to test it soon, maybe after some time.
You can try it.
Regards,
Shailesh
On Sun, Jan 25, 2015 at 6:28 PM, Shailesh Birari sbirar...@gmail.com
wrote:
Can anyone please let me know?
I don't want to open all ports on the network, so I am interested in the
property by which I can configure this new port.
Shailesh
Hello,
Recently, I upgraded my setup to Spark 1.2 from Spark 1.1.
I have a 4-node Ubuntu Spark cluster.
With Spark 1.1, I used to write my Spark Scala program in Eclipse on my
Windows development host and submit the job to the Ubuntu cluster from
Eclipse (Windows machine).
As on my network not all
Hello,
I recently upgraded my setup from Spark 1.1 to Spark 1.2.
My existing applications work fine on the Ubuntu cluster.
But when I try to execute a Spark MLlib application from Eclipse (Windows
node), it gives a java.lang.NoClassDefFoundError:
com/google/common/base/Preconditions exception.
version from the past few years).
On Tue, Jan 20, 2015 at 6:16 PM, Shailesh Birari sbirar...@gmail.com
wrote:
Hi Frank,
It's a normal Eclipse project where I added the Scala and Spark libraries as
user libraries.
Though I am not attaching any Hadoop libraries, in my application code I
have
, but if you can pull up
your build's dependency tree, you will likely find com.google.guava being
brought in by one of your dependencies.
Regards,
Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466
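If the project is built with Maven rather than plain Eclipse user libraries,
the dependency tree Frank mentions can be pulled up with a standard Maven
command (a sketch; it assumes a pom.xml, which this Eclipse setup may not
have):

```shell
# Show which dependency drags in Guava (output shape varies by version)
mvn dependency:tree -Dincludes=com.google.guava
```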
On Jan 20, 2015, at 5:13 PM, Shailesh Birari sbirar
like you are mixing versions
of Spark then, with some that still refer to unshaded Guava. Make sure
you are not packaging Spark with your app and that you don't have
other versions lying around.
On Tue, Jan 20, 2015 at 11:55 PM, Shailesh Birari sbirar...@gmail.com
wrote:
Hello,
I recently
Yes, I am using Spark 1.1.0 and have used rdd.registerTempTable().
I tried adding sqlContext.cacheTable(), but it took 59 seconds (more than
before).
I also tried changing the schema to use the Long data type in some fields,
but the conversion seems to take more time.
Is there any way to specify an index?
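One likely explanation for the slowdown, offered as an assumption rather than
a diagnosis: sqlContext.cacheTable() is lazy, so the first query after it
pays the cost of materializing the in-memory columnar cache, and only
subsequent queries get faster. An untested sketch against the Spark 1.1 API
(the table name and query variable are hypothetical):

```scala
// Untested sketch, Spark 1.1 API.
rdd.registerTempTable("records")
sqlContext.cacheTable("records")     // lazy: nothing is cached yet
sqlContext.sql(query).collect()      // first run: pays the caching cost
sqlContext.sql(query).collect()      // later runs: read the columnar cache
```

As for an index: Spark SQL in 1.1 has no secondary-index support; caching and
partitioning are the available levers.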
Hello,
I have written a Spark SQL application which reads data from HDFS and
queries it.
The data size is around 2 GB (30 million records). The schema and the query
I am running are as below.
The query takes around 5+ seconds to execute.
I tried adding
Thanks. By setting the driver host to Windows and specifying some ports
(like driver, fileserver, broadcast, etc.) it worked perfectly. I need to
specify those ports as not all ports are open on my machine.
For the driver host name, I was assuming Spark should get it, as in the case
of Linux we are not
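For the record, Spark 1.x has a named configuration property for each of
these ports; a hedged spark-defaults-style fragment (the port numbers are
arbitrary examples, only the property names come from the Spark 1.x
configuration docs):

```properties
# Values are examples; pick ports that are open on the driver machine.
spark.driver.port         51800
spark.fileserver.port     51801
spark.broadcast.port      51802
spark.blockManager.port   51803
```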
Can anyone please help me here ?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Spark-job-on-Unix-cluster-from-dev-environment-Windows-tp16989p17552.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Yes, this is doable.
I am submitting the Spark job using:

    JavaSparkContext spark = new JavaSparkContext(sparkMaster,
        "app name", System.getenv("SPARK_HOME"),
        new String[] { "application.jar" });

To run this, you first have to create the application JAR and, in the above
API, specify its
Hello,
I am able to submit a job to the Spark cluster from my Windows desktop, but
the executors are not able to run.
When I check the Spark UI (which is on Windows, as the driver is there), it
shows me JAVA_HOME, CLASS_PATH, and other environment variables related to
Windows.
I tried setting
Some more updates.
Now I tried setting spark.driver.host to the Spark master node and
spark.driver.port to 51800 (an available open port), but it is failing with
a bind error. I was hoping that it would start the driver on the supplied
host:port, and as it is a Unix node there should not be any issue.
Can
Hello,
Spark 1.1.0, Hadoop 2.4.1.
I have written a Spark Streaming application, and I am getting a
FileAlreadyExistsException from rdd.saveAsTextFile(outputFolderPath).
Here is, in brief, what I am trying to do.
My application creates a text file stream using the Java streaming context.
The input file is
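Since saveAsTextFile() refuses to write into a directory that already exists,
the usual workaround is a unique sub-directory per batch, e.g. keyed by the
batch time. An untested sketch against the Spark Streaming Scala API (the
stream and path variable names are assumptions):

```scala
// Untested sketch. Each batch writes to its own directory, so
// FileAlreadyExistsException cannot occur on the second batch.
stream.foreachRDD { (rdd, time) =>
  if (rdd.count() > 0)   // skip empty batches (Spark 1.1 has no rdd.isEmpty)
    rdd.saveAsTextFile(s"$outputFolderPath/batch-${time.milliseconds}")
}
```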
Thanks, Sameer, for the quick reply.
I will try to implement it.
Shailesh
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-How-to-write-RDD-s-in-same-directory-tp16962p16970.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Hi Xianguri,
After setting the SVD parameter to a smaller value (200), it is working.
Thanks,
Shailesh
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-OutOfMemoryError-while-running-SVD-MLLib-example-tp14972p15179.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Note, the data is random numbers (double).
Any suggestions/pointers will be highly appreciated.
Thanks,
Shailesh
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-OutOfMemoryError-while-running-SVD-MLLib-example-tp14972p15083.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.