> …pressed to true. This property is already set to true by default in the
> master branch and in branch-1.2.
>
> On 11/13/14 7:16 AM, Sadhan Sood wrote:
>
> We noticed while caching data from our hive tables, which contain data in
> compressed sequence file format, that it gets uncompressed in memory when
> cached. Is there a way to turn this off and cache the compressed data as is?
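A minimal sketch of setting this flag explicitly before caching, assuming the truncated property above is spark.sql.inMemoryColumnarStorage.compressed, and assuming an existing SparkContext named sc and a Hive table named xyz (both names are illustrative, not from the thread):

    import org.apache.spark.sql.hive.HiveContext

    val sqlContext = new HiveContext(sc)  // assumes an existing SparkContext `sc`
    // Assumed property name; per the reply above it defaults to true in branch-1.2.
    sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "true")
    sqlContext.cacheTable("xyz")          // hypothetical table name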
…output location for shuffle 0
The data is an lzo compressed sequence file with compressed size ~26G. Is
there a way to understand why the shuffle keeps failing for one partition? I
believe we have enough memory to store the uncompressed data in memory.
On Wed, Nov 12, 2014 at 2:50 PM, Sadhan Sood wrote:
…lerBackend (Logging.scala:logError(75)) - Asked to remove non-existent executor 372
2014-11-12 19:11:21,655 INFO scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Executor lost: 372 (epoch 3)
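One way to check whether a single oversized partition is behind the repeated failures is to count records per partition before caching. A sketch, assuming an existing HiveContext named sqlContext; the table name and filter are hypothetical, not taken from the thread:

    // Hypothetical query; substitute the partition actually being cached.
    val rows = sqlContext.sql("SELECT * FROM some_table WHERE date_prefix = 20141029")
    // Count records per input partition to spot skew.
    val sizes = rows.mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size))).collect()
    sizes.sortBy(-_._2).take(10).foreach { case (idx, n) =>
      println(s"partition $idx: $n records")
    }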
On Wed, Nov 12, 2014 at 12:31 PM, Sadhan Sood wrote:
> We are running spark on yarn with combined memory > 1TB and when trying to
> cache a table partition (which is < 100G), we are seeing a lot of failed
> collect stages in the UI and this never succeeds. Because of the failed
> collect, it seems like the mapPartitions keep getting resubmitted. We have
> more than en…
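As a rough back-of-the-envelope check (the executor count and heap size below are made-up assumptions, not figures from the thread), the memory available for MEMORY_ONLY caching in Spark 1.x is roughly the number of executors times spark.executor.memory times spark.storage.memoryFraction, which defaults to 0.6:

    // Illustrative sizing sketch only; executor count and heap size are assumptions.
    val numExecutors    = 100     // assumed --num-executors
    val executorMemGb   = 12.0    // assumed spark.executor.memory, in GB
    val storageFraction = 0.6     // spark.storage.memoryFraction default in Spark 1.x
    val cacheCapacityGb = numExecutors * executorMemGb * storageFraction
    println(f"approximate cluster cache capacity: $cacheCapacityGb%.0f GB")

A ~26G lzo compressed sequence file can expand several-fold once deserialized into the in-memory cache, so per-executor headroom (and skew across partitions) matters more than the combined 1TB figure.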
While testing Spark SQL on top of our Hive metastore, we were trying to
cache the data for one partition of the table in memory like this:
CACHE TABLE xyz_20141029 AS SELECT * FROM xyz where date_prefix = 20141029
Table xyz is a hive table which is partitioned by date_prefix. The data is
date_pr…
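The same statement can also be issued programmatically. A sketch, assuming an existing SparkContext named sc; the table and partition names are the ones quoted above:

    import org.apache.spark.sql.hive.HiveContext

    val hc = new HiveContext(sc)  // assumes an existing SparkContext `sc`
    // CACHE TABLE ... AS SELECT registers xyz_20141029 as an in-memory table.
    hc.sql("CACHE TABLE xyz_20141029 AS SELECT * FROM xyz WHERE date_prefix = 20141029")
    println(hc.isCached("xyz_20141029"))  // reports whether the table is marked as cached
    hc.sql("SELECT count(*) FROM xyz_20141029").collect().foreach(println)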
> …port. I guess the Thrift server didn't start successfully because
> HiveServer2 occupied the port, and your Beeline session was probably
> connected to HiveServer2.
>
> Cheng
>
>
> On 11/11/14 8:29 AM, Sadhan Sood wrote:
>
> I was testing out the spark thrift jdbc server by running a simple query
> in the beeline client. Spark itself is running on a yarn cluster. However,
> when I run a query in beeline, I see no running jobs in the spark UI
> (completely empty) and the yarn UI seems to indicate that the submitted
> query…
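One way to see which server is actually answering on the port is to connect over plain JDBC and watch the Spark UI. A sketch, assuming the default port 10000 on localhost and the standard HiveServer2 JDBC driver; host, port, and credentials are assumptions, not taken from the thread:

    import java.sql.DriverManager

    // The Spark Thrift server speaks the HiveServer2 protocol, so the same driver applies.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
    val rs = conn.createStatement().executeQuery("SELECT 1")
    while (rs.next()) println(rs.getInt(1))
    conn.close()

If the query shows up as a job in the Spark UI it reached the Spark Thrift server; if the connection succeeds but nothing appears there, it is likely being answered by a plain HiveServer2 bound to the same port, which matches the diagnosis above.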
> > …Based on the Jenkins logs, I think that this pull request may have
> > broken things (although I'm not sure why):
> >
> > https://github.com/apache/spark/pull/3030#issuecomment-62436181
> >
> > On Mon, Nov 10, 2014 at 1:42 PM, Sadhan Sood wrote:
Getting an exception while trying to build spark in spark-core:
[ERROR]
while compiling: /Users/dev/tellapart_spark/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
during phase: typer
library version: version 2.10.4
compiler version: version 2.10.4
reconstruct…
We want to run multiple instances of the spark sql cli on our yarn cluster.
Each instance of the cli is to be used by a different user. This looks
non-optimal if each user brings up a different cli, given how spark works on
yarn by running executor processes (and hence consuming resources) on worker
nodes.