> ut in a test. This highly depends
> on the data and the analysis you want to do.
>
> > On 21. Feb 2018, at 21:54, Kane Kim <kane.ist...@gmail.com> wrote:
Hello,
Which format is better supported in spark, parquet or orc?
Will spark use internal sorting of parquet/orc files (and how to test that)?
Can spark save sorted parquet/orc files?
Thanks!
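On the sorting questions: Spark can write files whose rows are ordered within each output file by calling DataFrame.sortWithinPartitions before the write (e.g. df.sortWithinPartitions("k").write.parquet("out/")). One way to test it is to read the sort column back and check the ordering. A minimal sketch in plain Python; the commented PySpark read-back line is hypothetical, the checker itself is generic:

```python
from itertools import tee

def is_sorted(values):
    """True if the iterable is in non-decreasing order."""
    a, b = tee(values)
    next(b, None)  # offset the second copy by one element
    return all(x <= y for x, y in zip(a, b))

# Hypothetical read-back of the sort column through Spark:
# keys = [row.k for row in spark.read.parquet("out/").select("k").collect()]
# print(is_sorted(keys))
```

Whether the parquet/orc reader then exploits that ordering depends on the query; checking the data itself, as above, is the part that is easy to verify directly.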
it as:
telnet s3.amazonaws.com 80
GET / HTTP/1.0
Thanks
Best Regards
On Wed, Feb 11, 2015 at 6:43 AM, Kane Kim kane.ist...@gmail.com wrote:
I'm getting this warning when using s3 input:
15/02/11 00:58:37 WARN RestStorageService: Adjusted time offset in response
it is skewed.
cheers
On Fri, Feb 13, 2015 at 5:51 AM, Kane Kim kane.ist...@gmail.com wrote:
The thing is that my time is perfectly valid...
On Tue, Feb 10, 2015 at 10:50 PM, Akhil Das ak...@sigmoidanalytics.com wrote:
It's with the timezone actually, you can either use an NTP to maintain accurate
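RequestTimeTooSkewed means the local clock and S3's clock differ by more than S3 tolerates (about 15 minutes). Any HTTP response from S3 carries a Date header, so the skew can be measured directly. A stdlib-only sketch; the header string here is a canned example, in practice it would come from a live response:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def clock_skew_seconds(date_header, now=None):
    """Seconds the local clock is ahead of an HTTP Date header."""
    server_time = parsedate_to_datetime(date_header)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - server_time).total_seconds()

# Deterministic example with a canned header and a fixed "local" time:
skew = clock_skew_seconds(
    "Wed, 11 Feb 2015 00:58:37 GMT",
    now=datetime(2015, 2, 11, 1, 13, 37, tzinfo=timezone.utc),
)
# skew is 900.0 here: 15 minutes, right at the S3 rejection threshold
```

Note a timezone misconfiguration can make a clock that looks "perfectly valid" locally still be hours off in UTC, which is what S3 compares against.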
sometimes I'm getting this exception:

Traceback (most recent call last):
  File "/opt/spark-1.2.0-bin-hadoop2.4/python/pyspark/daemon.py", line 162, in manager
    code = worker(sock)
  File "/opt/spark-1.2.0-bin-hadoop2.4/python/pyspark/daemon.py", line 64, in worker
    outfile.flush()
IOError:
I'm getting this warning when using s3 input:

15/02/11 00:58:37 WARN RestStorageService: Adjusted time offset in response
to RequestTimeTooSkewed error. Local machine and S3 server disagree on the
time by approximately 0 seconds. Retrying connection.

After that there are tons of 403/forbidden errors.
Found it - used saveAsHadoopFile
On Mon, Feb 9, 2015 at 9:11 AM, Kane Kim kane.ist...@gmail.com wrote:
Hi, how do I compress output with gzip using the Python API?
Thanks!
I'm getting "SequenceFile doesn't work with GzipCodec without native-hadoop
code!" Where can I get those libs and where do I put them in Spark?
Also, can I save a plain text file (like saveAsTextFile) as gzip?
Thanks.
On Wed, Feb 4, 2015 at 11:10 PM, Kane Kim kane.ist...@gmail.com wrote:
How to save RDD with gzip compression?
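For the Python API specifically, saveAsTextFile accepts a Hadoop codec class name: rdd.saveAsTextFile("out", compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec"). The part files it writes are ordinary gzip streams, so they can be sanity-checked with the standard library. Sketch below; since no cluster is involved, it round-trips a part file it creates itself:

```python
import gzip
import os
import tempfile

# Hypothetical PySpark call (needs a cluster, not run here):
# rdd.saveAsTextFile("out", compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec")

def read_part(path):
    """Decode one gzip-compressed part file into a list of lines."""
    with gzip.open(path, "rt") as f:
        return f.read().splitlines()

# Stand-in for a part file Spark would have written:
part = os.path.join(tempfile.mkdtemp(), "part-00000.gz")
with gzip.open(part, "wt") as f:
    f.write("line1\nline2\n")
# read_part(part) -> ["line1", "line2"]
```

The native-hadoop warning in the thread applies to SequenceFiles; plain text output compressed with GzipCodec goes through Java's built-in zlib.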
cluster and got odd results
for stopping the workers (no workers found) but the start script... seemed
to work. My integration cluster was running and functioning after executing
both scripts, but I also didn't make any changes to spark-env either.
On Thu Feb 05 2015 at 9:49:49 PM Kane Kim
I submit spark jobs from a machine behind a firewall; I can't open any incoming
connections to that box. Does the driver absolutely need to accept incoming
connections? Is there any workaround for that case?
Thanks.
Hi,
I'm trying to change setting as described here:
http://spark.apache.org/docs/1.2.0/ec2-scripts.html
export SPARK_WORKER_CORES=6
Then I ran ~/spark-ec2/copy-dir /root/spark/conf to distribute to slaves,
but without any effect. Do I have to restart workers?
How to do that with spark-ec2?
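Changes to spark-env.sh are only read when a worker starts, so copying the conf out is not enough on its own. A sketch of the full sequence, assuming the default spark-ec2 layout for this Spark version (standalone scripts under /root/spark/sbin):

```shell
# conf/spark-env.sh on the master: read only when a worker starts up
export SPARK_WORKER_CORES=6

# push the edited conf to the slaves, then restart the standalone daemons
~/spark-ec2/copy-dir /root/spark/conf
/root/spark/sbin/stop-all.sh
/root/spark/sbin/start-all.sh
```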
How to save RDD with gzip compression?
Thanks.
I'm trying to process a large dataset; mapping/filtering works OK, but
as soon as I try to reduceByKey, I get out-of-memory errors:
http://pastebin.com/70M5d0Bn
Any ideas how I can fix that?
Thanks.
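Out-of-memory in reduceByKey usually means each reduce task holds too many distinct keys at once. reduceByKey takes a partition-count argument (rdd.reduceByKey(func, numPartitions)), so more, smaller tasks each hold fewer keys. The mechanics can be modeled in plain Python; this illustrates the combine/merge phases, it is not Spark code:

```python
def combine_partition(pairs, func):
    """Map-side combine: one accumulator per distinct key in this partition."""
    acc = {}
    for k, v in pairs:
        acc[k] = func(acc[k], v) if k in acc else v
    return acc

def reduce_by_key(partitions, func):
    """Merge the per-partition combiners, as the reduce side would."""
    merged = {}
    for part in partitions:
        for k, v in combine_partition(part, func).items():
            merged[k] = func(merged[k], v) if k in merged else v
    return merged

parts = [[("a", 1), ("b", 2), ("a", 3)], [("b", 4)]]
# reduce_by_key(parts, lambda x, y: x + y) -> {"a": 4, "b": 6}
```

The per-task memory scales with distinct keys per partition, not total data size, which is why raising the partition count often cures this exact OOM.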
I'm trying to process 5TB of data, not doing anything fancy, just
map/filter and reduceByKey. Spent whole day today trying to get it
processed, but never succeeded. I've tried to deploy to ec2 with the
script provided with spark on pretty beefy machines (100 r3.2xlarge
nodes). Really frustrated
Related question - is execution of different stages optimized? I.e. will a
map followed by a filter require 2 passes over the data, or will they be
combined into a single one?
On Tue, Jan 20, 2015 at 4:33 AM, Bob Tiernay btier...@hotmail.com wrote:
I found the following to be a good discussion of the same topic:
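On the stages question: map followed by filter are both narrow transformations, so Spark pipelines them into a single stage and each element flows through both functions in one pass, with no intermediate dataset materialized. The same fusion can be sketched with plain Python generators (an illustrative model, not Spark internals):

```python
def fused_map_filter(data):
    """Stream elements through map then filter without an intermediate list."""
    touched = [0]

    def mapped(xs):
        for x in xs:
            touched[0] += 1  # each element enters the pipeline exactly once
            yield x * 2

    def filtered(xs):
        for x in xs:
            if x > 2:
                yield x

    return list(filtered(mapped(data))), touched[0]

result, touched = fused_map_filter([1, 2, 3])
# result -> [4, 6]; touched -> 3 (one traversal, not one per transformation)
```

Only a shuffle boundary (such as reduceByKey) ends a stage and forces the data to be written out between steps.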
I want to add some java options when submitting application:
--conf spark.executor.extraJavaOptions=-XX:+UnlockCommercialFeatures
-XX:+FlightRecorder
But it looks like it doesn't get set. Where can I add it to make it work?
Thanks.
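A likely cause (an assumption about the exact command used): without quotes, the shell splits the value at the space before -XX:+FlightRecorder, so Spark only sees part of it. Quoting the whole key=value, or setting it once in conf/spark-defaults.conf, is the usual fix; your_app.py is a placeholder:

```shell
# quote the whole key=value so both JVM flags stay in one --conf value
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:+UnlockCommercialFeatures -XX:+FlightRecorder" \
  your_app.py

# or set it once in conf/spark-defaults.conf (whitespace-separated, no quotes):
# spark.executor.extraJavaOptions  -XX:+UnlockCommercialFeatures -XX:+FlightRecorder
```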