Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Chawla,Sumit
Tim, We will try to run the application in coarse grain mode, and share the findings with you. Regards Sumit Chawla On Mon, Dec 19, 2016 at 3:11 PM, Timothy Chen wrote: > Dynamic allocation works with Coarse grain mode only, we weren't aware > of a need for Fine grain mode

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Timothy Chen
Dynamic allocation works with Coarse grain mode only; we weren't aware of a need for Fine grain mode after we enabled dynamic allocation support on the coarse grain mode. What's the reason you're running fine grain mode instead of coarse grain + dynamic allocation? Tim On Mon, Dec 19, 2016 at 2:45

Loading a class from a dependency jar

2016-12-19 Thread viraj
Hi, I am currently using the kite library (https://github.com/kite-sdk/kite) to persist to HBase from my Spark job. All this happens in the driver. I am on Spark version 1.6.1. The problem I am facing is that a particular class in one of the dependency jars is not found by kite when it uses
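The archive does not record a resolution for this one. For what it's worth, a common first step for driver-side class-loading problems on Spark 1.6 is to put the jar on the driver classpath explicitly; the sketch below is an illustration under stated assumptions (placeholder paths, and the userClassPathFirst flag only matters if Spark bundles a conflicting copy of the class), not the thread's confirmed fix.

```python
from pyspark import SparkConf, SparkContext

# Illustrative only -- not a confirmed fix for this thread.
# "/path/to/dependency.jar" is a placeholder for the real kite dependency.
conf = (SparkConf()
        .setAppName("kite-hbase-job")
        # Ship the jar with the application and expose it to the driver:
        .set("spark.jars", "/path/to/dependency.jar")
        .set("spark.driver.extraClassPath", "/path/to/dependency.jar")
        # Prefer the user's jars over Spark's bundled copies on conflict:
        .set("spark.driver.userClassPathFirst", "true"))

sc = SparkContext(conf=conf)
```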

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Mehdi Meziane
We would be interested in the results if you give Dynamic allocation with Mesos a try! - Original Mail - From: "Michael Gummelt" To: "Sumit Chawla" Cc: u...@mesos.apache.org, d...@mesos.apache.org, "User" ,

Re: PySpark: [Errno 8] nodename nor servname provided, or not known

2016-12-19 Thread Jain, Nishit
Found it. Somehow my host mapping was messing it up. Changing it to point to localhost worked: /etc/hosts #127.0.0.1 XX.com 127.0.0.1 localhost From: "Jain, Nishit" > Date: Monday, December 19, 2016 at 2:54 PM To:

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Michael Gummelt
> Is this problem of idle executors sticking around solved in Dynamic Resource Allocation? Is there some timeout after which idle executors can just shut down and clean up their resources? Yes, that's exactly what dynamic allocation does. But again I have no idea what the state of dynamic

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Chawla,Sumit
Great. Makes much better sense now. What would be the reason to have spark.mesos.mesosExecutor.cores greater than 1, as this number doesn't include the number of cores for tasks? So in my case it seems like 30 CPUs are allocated to executors. And there are 48 tasks, so 48 + 30 = 78 CPUs. And I am

PySpark: [Errno 8] nodename nor servname provided, or not known

2016-12-19 Thread Jain, Nishit
Hi, I am using the pre-built 'spark-2.0.1-bin-hadoop2.7' and when I try to start pyspark, I get the following message. Any ideas what could be wrong? I tried using python3 and setting SPARK_LOCAL_IP to 127.0.0.1, but I get the same error. ~ -> cd /Applications/spark-2.0.1-bin-hadoop2.7/bin/

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Michael Gummelt
> I should presume that the number of executors should be less than the number of tasks. No. Each executor runs 0 or more tasks. Each executor consumes 1 CPU, and each task running on that executor consumes another CPU. You can customize this via spark.mesos.mesosExecutor.cores (
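To make that accounting concrete, here is a small sketch using the 30-executor/48-task figures quoted elsewhere in this thread; the config keys are real Spark settings, and the numbers are just the thread's example:

```python
from pyspark import SparkConf

# Fine-grained mode accounting as described above: each executor
# permanently holds spark.mesos.mesosExecutor.cores CPUs (default 1),
# and every running task holds one more CPU on top of that.
conf = (SparkConf()
        .set("spark.mesos.coarse", "false")            # fine-grained mode
        .set("spark.mesos.mesosExecutor.cores", "1"))  # CPUs per executor itself

executors, running_tasks, executor_cores = 30, 48, 1
total_cpus = executors * executor_cores + running_tasks
print(total_cpus)  # 78 -- the count observed in this thread
```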

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Chawla,Sumit
Ah, thanks. Looks like I skipped reading this: *"Neither will executors terminate when they’re idle."* So in my job scenario, I should presume that the number of executors should be less than the number of tasks. Ideally one executor should execute 1 or more tasks. But I am observing something strange

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Timothy Chen
Hi Chawla, One possible reason is that Mesos fine grain mode also takes up cores to run the executor on each host, so if you have 20 agents running fine-grained executors, they will take up 20 cores while they're still running. Tim On Fri, Dec 16, 2016 at 8:41 AM, Chawla,Sumit

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Michael Gummelt
Yea, the idea is to use dynamic allocation. I can't speak to how well it works with Mesos, though. On Mon, Dec 19, 2016 at 11:01 AM, Mehdi Meziane wrote: > I think that what you are looking for is Dynamic resource allocation: >

Re: [Spark SQL] Task failed while writing rows

2016-12-19 Thread Michael Stratton
I don't think the issue is an empty partition, but given the premature EOF exception it may not hurt to try a repartition prior to writing, just to rule it out. On Mon, Dec 19, 2016 at 1:53 PM, Joseph Naegele wrote: > Thanks Michael, hdfs dfsadmin -report tells me:
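A minimal sketch of that experiment; the partition count and output path are placeholders, and `df` stands in for the DataFrame whose write fails:

```python
# Repartition before writing to rule out empty/skewed partitions.
# 200 partitions and the output path are illustrative values only.
df = spark.range(0, 1000000)  # placeholder for the failing DataFrame
df.repartition(200).write.mode("overwrite").orc("hdfs:///tmp/orc_out")
```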

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Mehdi Meziane
I think that what you are looking for is Dynamic resource allocation: http://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation Spark provides a mechanism to dynamically adjust the resources your application occupies based on the workload. This means that your

Re: Adding Hive support to existing SparkSession (or starting PySpark with Hive support)

2016-12-19 Thread Sergey B.
I have asked a similar question here: http://stackoverflow.com/questions/40701518/spark-2-0-redefining-sparksession-params-through-getorcreate-and-not-seeing-cha Please see the answer; it basically states that it's impossible to change a session's config once it has been initialized. On Mon, Dec 19,
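The practical consequence for the Hive-support question in this thread: since getOrCreate() hands back the already-initialized session unchanged, the existing session has to be stopped before options like enableHiveSupport() can take effect. A sketch only, not guaranteed against every PySpark 2.0 shell quirk:

```python
from pyspark.sql import SparkSession

# Stop the session the pyspark shell pre-created for us...
spark.stop()

# ...then build a fresh one; only now does the new option apply.
spark = (SparkSession.builder
         .enableHiveSupport()
         .getOrCreate())
```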

RE: [Spark SQL] Task failed while writing rows

2016-12-19 Thread Joseph Naegele
Thanks Michael, hdfs dfsadmin -report tells me: Configured Capacity: 7999424823296 (7.28 TB) Present Capacity: 7997657774971 (7.27 TB) DFS Remaining: 7959091768187 (7.24 TB) DFS Used: 38566006784 (35.92 GB) DFS Used%: 0.48% Under replicated blocks: 0 Blocks with corrupt replicas: 0

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Chawla,Sumit
But coarse-grained mode does exactly what I am trying to avoid here: in exchange for lower startup overhead, it keeps the resources reserved for the entire duration of the job. Regards Sumit Chawla On Mon, Dec 19, 2016 at 10:06 AM, Michael Gummelt wrote: > Hi > > I

Pivot in Spark with Case and when

2016-12-19 Thread KhajaAsmath Mohammed
Hi, I am trying to convert a sample of Hive code into Spark SQL for better performance. Below is the part of the Hive query that needs to be converted to Spark SQL. All the data is grouped on a particular column (id), the max value (value column) is taken for that grouped column (id), and pivoted
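The query itself is cut off in the archive, so here is a hedged sketch of the pattern being described; the schema (id, name, value) and the pivoted literals are assumptions:

```python
# Assumed schema standing in for the truncated Hive query.
df = spark.createDataFrame(
    [(1, "a", 10), (1, "b", 20), (2, "a", 5)], ["id", "name", "value"])
df.createOrReplaceTempView("t")

# Hive-style pivot: group on id, take MAX(value) per pivoted column.
spark.sql("""
    SELECT id,
           MAX(CASE WHEN name = 'a' THEN value END) AS a,
           MAX(CASE WHEN name = 'b' THEN value END) AS b
    FROM t
    GROUP BY id
""").show()

# Same result with the DataFrame API's built-in pivot:
df.groupBy("id").pivot("name").max("value").show()
```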

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Michael Gummelt
Hi, I don't have a lot of experience with the fine-grained scheduler. It's deprecated and fairly old now. CPUs should be relinquished as tasks complete, so I'm not sure why you're seeing what you're seeing. There have been a few discussions on the Spark list regarding deprecating the

Re: Adding Hive support to existing SparkSession (or starting PySpark with Hive support)

2016-12-19 Thread Venkata Naidu
We can create a link in the Spark conf directory pointing to the hive.conf file of the Hive installation, I believe. Thanks, Venkat. On Mon, Dec 19, 2016, 10:58 AM apu wrote: > This is for Spark 2.0: > > If I wanted Hive support on a new SparkSession, I would build it with: > >

Adding Hive support to existing SparkSession (or starting PySpark with Hive support)

2016-12-19 Thread apu
This is for Spark 2.0: If I wanted Hive support on a new SparkSession, I would build it with: spark = SparkSession \ .builder \ .enableHiveSupport() \ .getOrCreate() However, PySpark already creates a SparkSession for me, which appears to lack Hive support. How can I either: (a) Add

Re: Reference External Variables in Map Function (Inner class)

2016-12-19 Thread mbayebabacar
Hello Marcelo, What was the solution in the end? I am facing the same problem. Thank you -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Reference-External-Variables-in-Map-Function-Inner-class-tp11990p28237.html Sent from the Apache Spark User List mailing list

Re: [Spark SQL] Task failed while writing rows

2016-12-19 Thread Michael Stratton
It seems like an issue w/ Hadoop. What do you get when you run hdfs dfsadmin -report? Anecdotally (and w/o specifics, as it has been a while), I've generally used Parquet instead of ORC, as I've gotten a bunch of random problems reading and writing ORC w/ Spark... but given ORC performs a lot better

Re: Spark SQL Syntax

2016-12-19 Thread A Shaikh
I use pyspark on Spark 2. I used Oracle and Postgres syntax, just to get back an "unhappy response". I do get some of it resolved after some searching, but that consumes a lot of my time; having a platform to test my SQL syntax and its results would be very helpful. On 19 December 2016 at 14:00,

stratified sampling scales poorly

2016-12-19 Thread Martin Le
Hi all, I perform sampling on a DStream by taking samples from RDDs in the DStream. I have used two sampling mechanisms: simple random sampling and stratified sampling. Simple random sampling: inputStream.transform(x => x.sample(false, fraction)). Stratified sampling: inputStream.transform(x =>
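For context, a self-contained sketch of the two mechanisms; the socket source, key extraction, and fractions are assumptions, not the poster's code:

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "sampling-sketch")
ssc = StreamingContext(sc, 1)  # 1-second micro-batches

# Assumed source: comma-separated records whose first field is the key.
inputStream = (ssc.socketTextStream("localhost", 9999)
                  .map(lambda line: (line.split(",")[0], line)))

# Simple random sampling, one cheap pass per micro-batch:
simple = inputStream.transform(lambda rdd: rdd.sample(False, 0.1))

# Stratified sampling with per-key fractions (keys assumed). Note the
# "exact" variant of stratified sampling makes additional passes and
# is considerably more expensive per batch.
fractions = {"keyA": 0.1, "keyB": 0.5}
stratified = inputStream.transform(
    lambda rdd: rdd.sampleByKey(False, fractions))
```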

Spark SQL Syntax

2016-12-19 Thread A Shaikh
Hi, I keep getting invalid Spark SQL syntax errors, especially for date/timestamp manipulation. What's the best way to test that SQL syntax for a Spark DataFrame is valid? Is there any online site to test or run a demo SQL? Thanks, Afzal
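One low-friction option, as a sketch: a local pyspark shell validates syntax immediately, and Spark's built-in date functions cover most of what the Oracle/Postgres idioms do (the function names below are standard Spark SQL):

```python
# Quick syntax check in a local pyspark shell. Note Spark SQL uses
# e.g. date_format rather than Oracle's to_char.
spark.sql("""
    SELECT current_date()                                 AS today,
           date_add(current_date(), 7)                    AS next_week,
           date_format(current_timestamp(), 'yyyy-MM-dd') AS formatted
""").show()
```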

Re: How to set NameSpace while storing from Spark to HBase using saveAsNewAPIHadoopDataSet

2016-12-19 Thread Rabin Banerjee
Thanks, it worked! On Mon, Dec 19, 2016 at 5:55 PM, Dhaval Modi wrote: > > Replace with ":" > > Regards, > Dhaval Modi > > On 19 December 2016 at 13:10, Rabin Banerjee > wrote: > >> HI All, >> >> I am trying to save data from Spark

Re: How to set NameSpace while storing from Spark to HBase using saveAsNewAPIHadoopDataSet

2016-12-19 Thread Dhaval Modi
Replace with ":" Regards, Dhaval Modi On 19 December 2016 at 13:10, Rabin Banerjee wrote: > HI All, > > I am trying to save data from Spark into HBase using saveHadoopDataSet > API . Please refer the below code . Code is working fine .But the table is > getting

How to set NameSpace while storing from Spark to HBase using saveAsNewAPIHadoopDataSet

2016-12-19 Thread Rabin Banerjee
Hi All, I am trying to save data from Spark into HBase using the saveAsNewAPIHadoopDataSet API. Please refer to the code below. The code is working fine, but the table is getting stored in the default namespace. How do I set the namespace in the code below? wordCounts.foreachRDD ( rdd => { val conf =
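Per the answer earlier in this thread, the fix is to prefix the table name with its namespace, separated by ":". The thread's own code is Scala and truncated, so here is a hedged PySpark sketch of where that string goes; the converter classes ship with the Spark examples, and everything except the "namespace:table" value is a placeholder:

```python
# Sketch only -- the essential part is the "my_namespace:wordcounts"
# value for the output table; all other values are placeholders.
conf = {
    "hbase.zookeeper.quorum": "zk1.example.com",
    "hbase.mapred.outputtable": "my_namespace:wordcounts",  # namespace:table
    "mapreduce.outputformat.class":
        "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
    "mapreduce.job.output.key.class":
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "mapreduce.job.output.value.class":
        "org.apache.hadoop.io.Writable",
}
keyConv = ("org.apache.spark.examples.pythonconverters."
           "StringToImmutableBytesWritableConverter")
valueConv = ("org.apache.spark.examples.pythonconverters."
             "StringListToPutConverter")

# rdd stands for each RDD handed to wordCounts.foreachRDD in the thread.
rdd.map(lambda kv: (kv[0], [kv[0], "cf", "count", str(kv[1])])) \
   .saveAsNewAPIHadoopDataset(conf=conf, keyConverter=keyConv,
                              valueConverter=valueConv)
```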

Re: What is the deployment model for Spark Streaming? A specific example.

2016-12-19 Thread Eike von Seggern
Hi, are you using Spark 2.0.*? Then it might be related to https://issues.apache.org/jira/browse/SPARK-18281 . Best Eike 2016-12-18 6:21 GMT+01:00 Russell Jurney : > Anyone? This is for a book, so I need to figure this out. > > On Fri, Dec 16, 2016 at 12:53 AM

Re: Reading xls and xlsx files

2016-12-19 Thread Jörn Franke
I am currently developing one: https://github.com/ZuInnoTe/hadoopoffice It contains working source code, but a release will likely come only at the beginning of the year (it will include a Spark data source, but the existing source code can be used without issues in a Spark application). > On 19 Dec 2016,

Reading xls and xlsx files

2016-12-19 Thread Selvam Raman
Hi, Is there a way to read xls and xlsx files using Spark? Is there any Hadoop InputFormat available to read xls and xlsx files which could be used in Spark? -- Selvam Raman "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"

Re: How to perform Join operation using JAVARDD

2016-12-19 Thread ayan guha
What's your desired output? On Sat., 17 Dec. 2016 at 9:50 pm, Sree Eedupuganti wrote: > I tried like this, > > *CrashData_1.csv:* > > *CRASH_KEY CRASH_NUMBER CRASH_DATE CRASH_MONTH* > *2016899114 2016899114 01/02/2016 12:00:00 > AM

Re: How to get recent value in spark dataframe

2016-12-19 Thread ayan guha
You have 2 parts to it: 1. Do a subquery where, for each primary key, you derive the latest value among flag=1 records. Ensure you get exactly 1 record per primary key value. Here you can use rank() over (partition by primary key order by year desc). 2. Join your original dataset with the above on primary
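A sketch of those two steps with the DataFrame API; the column names (pk, year, flag, value) are assumptions, and row_number() is used instead of rank() to guarantee exactly one row per key even when years tie:

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# df is the original dataset from the question (assumed columns).
w = Window.partitionBy("pk").orderBy(F.col("year").desc())

# Step 1: latest flag=1 value per primary key.
latest = (df.filter(F.col("flag") == 1)
            .withColumn("rn", F.row_number().over(w))
            .filter(F.col("rn") == 1)
            .select("pk", F.col("value").alias("latest_value")))

# Step 2: join back to the original dataset on the primary key.
result = df.join(latest, on="pk", how="left")
```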

Re: Question about Spark and filesystems

2016-12-19 Thread Calvin Jia
Hi, If you are concerned with the performance of the alternative filesystems (i.e. needing a caching client), you can use Alluxio on top of any of NFS, Ceph