> …pressed to true. This property is already set to true by default in the
> master branch and in branch-1.2.
>
> On 11/13/14 7:16 AM, Sadhan Sood wrote:
>
> We noticed while caching data from our hive tables, which contain data in
> compressed sequence file format, that it gets uncompressed in memory when
> cached. Is there a way to turn this off and cache the compressed data as is?
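A minimal sketch of setting this flag explicitly before caching, assuming the truncated property above is spark.sql.inMemoryColumnarStorage.compressed, and assuming an existing SparkContext named sc and a Hive table named xyz (both names are illustrative, not from the thread):

    import org.apache.spark.sql.hive.HiveContext

    val sqlContext = new HiveContext(sc)  // assumes an existing SparkContext `sc`
    // Assumed property name; per the reply above it defaults to true in branch-1.2.
    sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "true")
    sqlContext.cacheTable("xyz")          // hypothetical table name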
…output location for shuffle 0
The data is an lzo compressed sequence file with compressed size ~26G. Is
there a way to understand why the shuffle keeps failing for one partition? I
believe we have enough memory to store the uncompressed data in memory.
On Wed, Nov 12, 2014 at 2:50 PM, Sadhan Sood wrote:
…lerBackend (Logging.scala:logError(75)) - Asked to remove non-existent executor 372
2014-11-12 19:11:21,655 INFO scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Executor lost: 372 (epoch 3)
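One way to check whether a single oversized partition is behind the repeated failures is to count records per partition before caching. A sketch, assuming an existing HiveContext named sqlContext; the table name and filter are hypothetical, not taken from the thread:

    // Hypothetical query; substitute the partition actually being cached.
    val rows = sqlContext.sql("SELECT * FROM some_table WHERE date_prefix = 20141029")
    // Count records per input partition to spot skew.
    val sizes = rows.mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size))).collect()
    sizes.sortBy(-_._2).take(10).foreach { case (idx, n) =>
      println(s"partition $idx: $n records")
    }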
On Wed, Nov 12, 2014 at 12:31 PM, Sadhan Sood wrote:
> We are running spark on yarn with combined memory > 1TB and when trying to
> cache a table partition (which is < 100G), we are seeing a lot of failed
> collect stages in the UI and this never succeeds. Because of the failed
> collect, it seems like the mapPartitions keep getting resubmitted. We have
> more than en…
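As a rough back-of-the-envelope check (the executor count and heap size below are made-up assumptions, not figures from the thread), the memory available for MEMORY_ONLY caching in Spark 1.x is roughly the number of executors times spark.executor.memory times spark.storage.memoryFraction, which defaults to 0.6:

    // Illustrative sizing sketch only; executor count and heap size are assumptions.
    val numExecutors    = 100     // assumed --num-executors
    val executorMemGb   = 12.0    // assumed spark.executor.memory, in GB
    val storageFraction = 0.6     // spark.storage.memoryFraction default in Spark 1.x
    val cacheCapacityGb = numExecutors * executorMemGb * storageFraction
    println(f"approximate cluster cache capacity: $cacheCapacityGb%.0f GB")

A ~26G lzo compressed sequence file can expand several-fold once deserialized into the in-memory cache, so per-executor headroom (and skew across partitions) matters more than the combined 1TB figure.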
While testing Spark SQL on top of our Hive metastore, we were trying to
cache the data for one partition of the table in memory like this:
CACHE TABLE xyz_20141029 AS SELECT * FROM xyz where date_prefix = 20141029
Table xyz is a hive table which is partitioned by date_prefix. The data is
date_pr…
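The same statement can also be issued programmatically. A sketch, assuming an existing SparkContext named sc; the table and partition names are the ones quoted above:

    import org.apache.spark.sql.hive.HiveContext

    val hc = new HiveContext(sc)  // assumes an existing SparkContext `sc`
    // CACHE TABLE ... AS SELECT registers xyz_20141029 as an in-memory table.
    hc.sql("CACHE TABLE xyz_20141029 AS SELECT * FROM xyz WHERE date_prefix = 20141029")
    println(hc.isCached("xyz_20141029"))  // reports whether the table is marked as cached
    hc.sql("SELECT count(*) FROM xyz_20141029").collect().foreach(println)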
> …port. I guess the Thrift server didn't start successfully because
> HiveServer2 occupied the port, and your Beeline session was probably
> connected to HiveServer2.
>
> Cheng
>
>
> On 11/11/14 8:29 AM, Sadhan Sood wrote:
>
> I was testing out the spark thrift jdbc server by running a simple query
> in the beeline client. Spark itself is running on a yarn cluster. However,
> when I run a query in beeline, I see no running jobs in the spark UI
> (completely empty) and the yarn UI seems to indicate that the submitted
> query…
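One way to see which server is actually answering on the port is to connect over plain JDBC and watch the Spark UI. A sketch, assuming the default port 10000 on localhost and the standard HiveServer2 JDBC driver; host, port, and credentials are assumptions, not taken from the thread:

    import java.sql.DriverManager

    // The Spark Thrift server speaks the HiveServer2 protocol, so the same driver applies.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
    val rs = conn.createStatement().executeQuery("SELECT 1")
    while (rs.next()) println(rs.getInt(1))
    conn.close()

If the query shows up as a job in the Spark UI it reached the Spark Thrift server; if the connection succeeds but nothing appears there, it is likely being answered by a plain HiveServer2 bound to the same port, which matches the diagnosis above.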
> > …Based on the Jenkins logs, I think that this pull request may have
> > broken things (although I'm not sure why):
> >
> > https://github.com/apache/spark/pull/3030#issuecomment-62436181
> >
> > On Mon, Nov 10, 2014 at 1:42 PM, Sadhan Sood wrote:
Getting an exception while trying to build spark in spark-core:
[ERROR]
while compiling: /Users/dev/tellapart_spark/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
during phase: typer
library version: version 2.10.4
compiler version: version 2.10.4
reconstruct…
We want to run multiple instances of the spark sql cli on our yarn cluster.
Each instance of the cli is to be used by a different user. This looks
non-optimal if each user brings up a different cli, given how spark works on
yarn by running executor processes (and hence consuming resources) on worker
nodes.