Hi Spark users and developers,
Has anyone encountered an issue where a Spark SQL job that produces a large
number of files (over 1 million) hangs on the refresh method? I'm using
Spark 1.5.1. Below is the stack trace. I can see the Parquet files are
produced, but the driver is doing something very intensive
If you execute the collect step (foreach in 1, possibly reduce in 2) in two
threads in the driver, then both of them will be executed in parallel.
Whichever gets submitted to Spark first gets executed first - you can use a
semaphore if you need to ensure the ordering of execution, though I would
ass
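The semaphore approach above can be sketched without Spark at all; below is a minimal plain-Java illustration in which the two runnables stand in for the two collect jobs. The class and method names are hypothetical, made up for this example:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Semaphore;

public class OrderedJobs {
    // Runs two "jobs" from separate driver threads; the semaphore
    // guarantees the first job completes before the second starts,
    // regardless of which thread the scheduler runs first.
    public static List<String> run() throws InterruptedException {
        List<String> order = Collections.synchronizedList(new ArrayList<>());
        Semaphore firstDone = new Semaphore(0);

        Thread job1 = new Thread(() -> {
            order.add("job1");        // stands in for e.g. rdd.foreach(...)
            firstDone.release();      // signal completion
        });
        Thread job2 = new Thread(() -> {
            try {
                firstDone.acquire();  // wait for job 1 before proceeding
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            order.add("job2");        // stands in for e.g. rdd.reduce(...)
        });

        job2.start();                 // deliberately started first
        job1.start();
        job1.join();
        job2.join();
        return order;                 // always [job1, job2]
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```

Even though job2's thread is started first, it blocks on the semaphore until job1 releases it, so the ordering is deterministic.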
I wanted to understand something about the internals of Spark Streaming
execution.
If I have a stream X, and in my program I send stream X to function A and
function B:
1. In function A, I do a few transform/filter operations etc. on X->Y->Z to
create stream Z. Now I do a forEach operation on Z
Hi All,
In SQL, say for example I have table1 (movieid) and table2 (movieid, moviename).
In SQL we write something like select moviename, movieid, count(1) from
table2 inner join table1 on table1.movieid=table2.movieid group by
, here in SQL table1 has only one column whereas table2 has tw
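What that query computes - an inner join on movieid followed by a count per movie - can be illustrated with plain Java collections. This is only a sketch of the logic, not Spark code, and all names and sample data here are made up for the example:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class JoinCount {
    // table1: a single column of movie ids (with duplicates).
    // table2: movieId -> movieName.
    // Returns count(1) per movieName, keeping only ids present in table2
    // (the inner-join semantics).
    public static Map<String, Long> countPerMovie(List<String> table1,
                                                  Map<String, String> table2) {
        return table1.stream()
                .filter(table2::containsKey)           // inner join on movieid
                .collect(Collectors.groupingBy(
                        table2::get,                   // group by moviename
                        Collectors.counting()));       // count(1)
    }

    public static void main(String[] args) {
        Map<String, String> table2 = Map.of("m1", "Heat", "m2", "Alien");
        List<String> table1 = List.of("m1", "m1", "m2", "m3");
        // "m3" has no match in table2, so it is dropped by the join.
        System.out.println(countPerMovie(table1, table2));
    }
}
```

Note that any column appearing in the select list outside an aggregate (here moviename and movieid) must also appear in the group by clause.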
Hi
I am using Spark Streaming in Java. One of the problems I have is that I
need to save Twitter statuses in JSON format as I receive them.
When I run the following code on my local machine, it works; however, all
the output files are created in the current directory of the driver program.
Clearly not a g
The code below was introduced by SPARK-7673 / PR #6225
See item #1 in the description of the PR.
Cheers
On Sat, Oct 24, 2015 at 12:59 AM, Koert Kuipers wrote:
> the code that seems to flatMap directories to all the files inside is in
> the private HadoopFsRelation.buildScan:
>
> // First a
Hi,
I have raised a JIRA ( https://issues.apache.org/jira/browse/SPARK-11045)
to track the discussion, but I am also mailing the user group.
This Kafka consumer has been around for a while on spark-packages (
http://spark-packages.org/package/dibbhatt/kafka-spark-consumer ) and I see
many have started using it , I a
A better wiki entry: https://wiki.apache.org/hadoop/BindException
On 24 Oct 2015, at 00:46, Lin Zhao <l...@exabeam.com> wrote:
I have Spark on YARN deployed using Cloudera Manager 5.4. The installation
went smoothly, but when I try to run spark-shell I get a long list of
exceptions saying "failed to bind to: /public_ip_of_host:0" and "Service
'spar
Hi All,
I am trying to write an RDD as a Sequence file into my Hadoop cluster but
keep getting connection timeouts. I can ping the Hadoop cluster, and the
directory does get created with the file name I specify, so I believe I am
missing some configuration. Kindly help me.
object WriteSequenceF
the code that seems to flatMap directories to all the files inside is in
the private HadoopFsRelation.buildScan:
// First assumes `input` is a directory path, and tries to get all
files contained in it.
fileStatusCache.leafDirToChildrenFiles.getOrElse(
path,
// Otherwise,
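The pattern the snippet describes - mapping each leaf directory to the files directly inside it - can be sketched in plain Java. This is only an illustration of the idea, not the actual HadoopFsRelation code; the class and method names below are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LeafFiles {
    // Walks the tree under root and maps each directory to the regular
    // files directly inside it - similar in spirit to the
    // leafDirToChildrenFiles cache consulted in buildScan.
    public static Map<Path, List<Path>> leafDirToChildren(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .collect(Collectors.groupingBy(Path::getParent));
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny tree: root/part=0/data.parquet
        Path root = Files.createTempDirectory("scan");
        Path sub = Files.createDirectory(root.resolve("part=0"));
        Files.createFile(sub.resolve("data.parquet"));

        // Each directory key holds only its immediate child files.
        Map<Path, List<Path>> index = leafDirToChildren(root);
        System.out.println(index.get(sub));
    }
}
```

Caching this directory-to-children index once, rather than re-listing the filesystem per path, is what keeps a scan over many partition directories from hammering the namenode.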