Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-23 Thread Mich Talebzadeh
There are two distinct points here. Using Spark as a query engine: that is BAU and most forum members use it every day. You run Spark with either Standalone, Yarn or Mesos as the cluster manager. You start the master, which does the management of resources, and you start slaves to create workers
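
As a companion to the thread, a minimal sketch of the standalone-manager startup described above (a standard Spark layout is assumed; the master host is a placeholder):

    $SPARK_HOME/sbin/start-master.sh                          # starts the master (resource management)
    $SPARK_HOME/sbin/start-slaves.sh                          # starts a worker on each host listed in conf/slaves
    $SPARK_HOME/bin/spark-shell --master spark://master-host:7077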

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-22 Thread Timur Shenkao
production environment when several people make different queries simultaneously? It's impossible to restart Spark masters and workers several times a day and tune it constantly. On Mon, May 23, 2016 at 2:42 AM, Mich Talebzadeh wrote: > Hi, > > > > I have done a number of extensive tes

Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-22 Thread Mich Talebzadeh
Hi, I have done a number of extensive tests using Spark-shell with Hive DB and ORC tables. Now one issue that we typically face is, and I quote: Spark is fast as it uses memory and DAG. Great, but when we save data it is not fast enough. OK, but there is a solution now. If you use Spark with

Re: How to insert data for 100 partitions at a time using Spark SQL

2016-05-22 Thread Mich Talebzadeh
DECIMAL(20,2)) , CAST(REGEXP_REPLACE(vat,'[^\\d\\.]','') AS DECIMAL(20,2)) , CAST(REGEXP_REPLACE(total,'[^\\d\\.]','') AS DECIMAL(20,2)) FROM stg_t2 WHERE --INVOICENUMBER > 0 AND CAST(REGEXP_REPLACE(total,'[

Re: How to insert data for 100 partitions at a time using Spark SQL

2016-05-22 Thread swetha kasireddy
http://talebzadehmich.wordpress.com

Re: How to insert data for 100 partitions at a time using Spark SQL

2016-05-22 Thread Mich Talebzadeh
On 22 May 2016 at 20:14, Jörn Franke wrote: >>>>> 14000 partitions seem to be way too many to be performant (except for >>>>> large data sets). How much data does one partition contain?

Re: How to insert data for 100 partitions at a time using Spark SQL

2016-05-22 Thread swetha kasireddy
ch data does one partition contain? >>>> >>>> > On 22 May 2016, at 09:34, SRK wrote: >>>> > >>>> > Hi, >>>> > >>>> > In my Spark SQL query to insert data, I have around 14,000 partitions >>>

Re: How to insert data for 100 partitions at a time using Spark SQL

2016-05-22 Thread Mich Talebzadeh
g memory issues. How can I insert the >>> data for >>> > 100 partitions at a time to avoid any memory issues? >>> > >>> > >>> > >>> > -- >>> > View this message in context: >>> http://apache-spark-user-list.10015

Re: How to insert data for 100 partitions at a time using Spark SQL

2016-05-22 Thread swetha kasireddy
ues. How can I insert the data >> for >> > 100 partitions at a time to avoid any memory issues? >> > >> > >> > >> > -- >> > View this message in context: >

Re: How to insert data for 100 partitions at a time using Spark SQL

2016-05-22 Thread Mich Talebzadeh
; for > > 100 partitions at a time to avoid any memory issues? > > > > > > > > -- > > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/How-to-insert-data-for-100-partitions-at-a-time-using-Spark

Re: How to insert data for 100 partitions at a time using Spark SQL

2016-05-22 Thread Jörn Franke
be causing memory issues. How can I insert the data for > 100 partitions at a time to avoid any memory issues? > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/How-to-insert-data-for-100-partitions-at-a-time-using-Spark-

Re: How to insert data for 100 partitions at a time using Spark SQL

2016-05-22 Thread Sabarish Sasidharan
016 at 08:34, SRK wrote: >> >>> Hi, >>> >>> In my Spark SQL query to insert data, I have around 14,000 partitions of >>> data which seems to be causing memory issues. How can I insert the data >>> for >>> 100 partiti

Re: How to insert data for 100 partitions at a time using Spark SQL

2016-05-22 Thread swetha kasireddy
LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: How to insert data for 100 partitions at a time using Spark SQL

2016-05-22 Thread Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: How to insert data for 100 partitions at a time using Spark SQL

2016-05-22 Thread swetha kasireddy
wrote: >>> >>>> Hi, >>>> >>>> In my Spark SQL query to insert data, I have around 14,000 partitions of >>>> data which seems to be causing memory issues. How can I insert the data >>>> for >>>> 100 partitions at a time

Re: How to insert data for 100 partitions at a time using Spark SQL

2016-05-22 Thread Mich Talebzadeh
; In my Spark SQL query to insert data, I have around 14,000 partitions of >>> data which seems to be causing memory issues. How can I insert the data >>> for >>> 100 partitions at a time to avoid any memory issues? >>> >>> >>> >>> -

Re: How to insert data for 100 partitions at a time using Spark SQL

2016-05-22 Thread swetha kasireddy
be causing memory issues. How can I insert the data >> for >> 100 partitions at a time to avoid any memory issues? >> >> >> >> -- >> View this message in context: >> http:

Re: How to insert data for 100 partitions at a time using Spark SQL

2016-05-22 Thread Mich Talebzadeh
> -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/How-to-insert-data-for-100-partitions-at-a-time-using-Spark-SQL-tp26997.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > --

How to insert data for 100 partitions at a time using Spark SQL

2016-05-22 Thread SRK
/How-to-insert-data-for-100-partitions-at-a-time-using-Spark-SQL-tp26997.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional
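
A hedged sketch of one way to do the "N partitions at a time" idea discussed in this thread: collect the distinct partition keys and run one INSERT per batch of 100. The table and column names are hypothetical, not taken from the thread, and the Hive-backed sqlContext of spark-shell is assumed.

    // enable dynamic-partition inserts for the batched INSERTs
    sqlContext.sql("SET hive.exec.dynamic.partition=true")
    sqlContext.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // hypothetical staging table carrying a datePart column
    val parts = sqlContext.sql("SELECT DISTINCT datePart FROM staging")
      .collect()
      .map(_.getString(0))

    parts.grouped(100).foreach { batch =>
      val inList = batch.map(p => s"'$p'").mkString(", ")
      sqlContext.sql(
        s"""INSERT INTO TABLE records PARTITION (datePart)
           |SELECT id, record, datePart FROM staging
           |WHERE datePart IN ($inList)""".stripMargin)
    }

Each loop iteration only touches 100 partitions' worth of data, which is the batching the original poster asks about.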

Re: Memory issues when trying to insert data in the form of ORC using Spark SQL

2016-05-20 Thread swetha kasireddy
Hi, > > I see some memory issues when trying to insert the data in the form of ORC > using Spark SQL. Please find the query and exception below. Any idea as to > why this is happening? > > sqlContext.sql(" CREATE EXTERNAL TABLE IF NOT EXISTS records (id STRING, >

Memory issues when trying to insert data in the form of ORC using Spark SQL

2016-05-20 Thread SRK
Hi, I see some memory issues when trying to insert the data in the form of ORC using Spark SQL. Please find the query and exception below. Any idea as to why this is happening? sqlContext.sql(" CREATE EXTERNAL TABLE IF NOT EXISTS records (id STRING, record STRING) PARTITIONED BY (datePart
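
For context, a plausible shape of the statement being quoted, written out in full as a hedged sketch; the ORC storage clause, the location, and the staging-table insert are assumptions, not text from the post.

    // spark-shell with the Hive-backed sqlContext
    sqlContext.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS records (id STRING, record STRING)
        |PARTITIONED BY (datePart STRING)
        |STORED AS ORC
        |LOCATION '/user/hive/warehouse/records'""".stripMargin)

    // a single static-partition insert from a hypothetical staging table
    sqlContext.sql(
      """INSERT OVERWRITE TABLE records PARTITION (datePart = '2016-05-20')
        |SELECT id, record FROM staging WHERE datePart = '2016-05-20'""".stripMargin)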

Re: [Spark 2.0 state store] Streaming wordcount using spark state store

2016-05-18 Thread Michael Armbrust
ctRecord = new ProducerRecord(Configs.topic,new > Random().nextInt(10), "" , r.toString) > producer.send(productRecord) > } > }) > count+=1 > Thread.sleep(5000); > } > > > Complete code is available here( > https://gi

[Spark 2.0 state store] Streaming wordcount using spark state store

2016-05-18 Thread Shekhar Bansal
tring)          producer.send(productRecord)        }      })      count+=1      Thread.sleep(5000);    } Complete code is available here(https://github.com/zuxqoj/HelloWorld/tree/master/SparkStreamingStateStore/src/main/scala/spark/streaming/statestore/test) I am using spark on yarn in client

How to run hive queries in async mode using spark sql

2016-05-17 Thread Raju Bairishetti
I am using Spark SQL for running Hive queries as well. Is there any way to run Hive queries in async mode using Spark SQL? Does it return any Hive handle, and if yes, how do I get the results from the Hive handle using Spark SQL? -- Thanks, Raju Bairishetti, www.lazada.com

Issue with creation of EC2 cluster using spark scripts

2016-05-16 Thread Marco Mistroni
Hi all, I am experiencing issues when creating EC2 clusters using the scripts in the spark/ec2 directory. I launched the following command: ./spark-ec2 -k sparkkey -i sparkAccessKey.pem -r us-west2 -s 4 launch MM-Cluster. My output is stuck with the following (has been for the last 20 minutes). i am

Re: XML Processing using Spark SQL

2016-05-12 Thread Mail.com
es/92 > > This is about XmlInputFormat.scala and it seems a bit tricky to handle the > case so I left open until now. > > > Thanks! > > > 2016-05-13 5:03 GMT+09:00 Arunkumar Chandrasekar : >> Hello, >> >> Greetings. >> >> I'm trying to pr

Re: XML Processing using Spark SQL

2016-05-12 Thread Hyukjin Kwon
Chandrasekar : > Hello, > > Greetings. > > I'm trying to process a xml file exported from Health Kit application > using Spark SQL for learning purpose. The sample record data is like the > below: > > sourceVersion="9.3" device="<<HKDevice: 0x7

XML Processing using Spark SQL

2016-05-12 Thread Arunkumar Chandrasekar
Hello, Greetings. I'm trying to process an XML file exported from the Health Kit application using Spark SQL for learning purposes. The sample record data is like the below: . I want to have the column name of my table as the field value like type, sourceName, sourceVersion and the row en

Submitting Job to YARN-Cluster using Spark Job Server

2016-05-12 Thread ashesh_28
Hi guys, have any of you tried this mechanism before? I am able to run it locally and get the output, but how do I submit the job to the YARN cluster using Spark JobServer? Any documentation? Regards Ashesh -- View this message in context: http://apache-spark-user-list.1001560

Re: Error while running jar using spark-submit on another machine

2016-05-03 Thread nsalian
Thank you for the question. What is different on this machine as compared to the ones where the job succeeded? - Neelesh S. Salian Cloudera -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Error-while-running-jar-using-spark-submit-on-another-machine

Re: Save RDD to HDFS using Spark Python API

2016-04-26 Thread Prashant Sharma
What Davies said is correct; the second argument is Hadoop's output format. Hadoop supports many types of output formats and all of them have their own advantages. Apart from the one specified above, https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html

Re: Save RDD to HDFS using Spark Python API

2016-04-26 Thread Davies Liu
hdfs://192.168.10.130:9000/dev/output/test already exists, so you need to remove it first. On Tue, Apr 26, 2016 at 5:28 AM, Luke Adolph wrote: > Hi, all: > Below is my code: > > from pyspark import * > import re > > def getDateByLine(input_str): > str_pattern = '^\d{4}-\d{2}-\d{2}' > patt

Save RDD to HDFS using Spark Python API

2016-04-26 Thread Luke Adolph
Hi, all: Below is my code:

    from pyspark import *
    import re

    def getDateByLine(input_str):
        str_pattern = '^\d{4}-\d{2}-\d{2}'
        pattern = re.compile(str_pattern)
        match = pattern.match(input_str)
        if match:
            return match.group()
        else:
            return None

    file_url = "hdfs://192

Error starting httpd after the installation using spark-ec2 script

2016-04-13 Thread Mohed Alibrahim
Dear All, I installed spark 1.6.1 on Amazon EC2 using the spark-ec2 script. Everything was OK, but it failed to start httpd at the end of the installation. I followed the instructions exactly and repeated the process many times, but there is no luck. - [timing] rstudio setup: 00h

RE: Unable to Access files in Hadoop HA enabled from using Spark

2016-04-13 Thread ashesh_28
Data.filter(line => line.contains("spark")).count() println("Lines with Hadoop : %s, Lines with Spark: %s".format(numAs, numBs)) } } -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-Access-files-in-Hadoop-HA-enabled-from-us

RE: Unable to Access files in Hadoop HA enabled from using Spark

2016-04-12 Thread Amit Hora
ssage- From: "Amit Hora" Sent: 4/13/2016 11:41 AM To: "Jörn Franke" Cc: "user@spark.apache.org" Subject: RE: Unable to Access files in Hadoop HA enabled from using Spark There are DNS entries for both of my namenodes. Ambarimaster is standby and it resolves to the IP per

RE: Unable to Access files in Hadoop HA enabled from using Spark

2016-04-12 Thread Amit Hora
Sent: 4/13/2016 11:37 AM To: "Amit Singh Hora" Cc: "user@spark.apache.org" Subject: Re: Unable to Access files in Hadoop HA enabled from using Spark Is the host in /etc/hosts ? > On 13 Apr 2016, at 07:28, Amit Singh Hora wrote: > > I am trying to access di

Re: Unable to Access files in Hadoop HA enabled from using Spark

2016-04-12 Thread Jörn Franke
; hadoop and it works > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-Access-files-in-Hadoop-HA-enabled-from-using-Spark-tp26768.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. >

RE: Unable to Access files in Hadoop HA enabled from using Spark

2016-04-12 Thread Amit Singh Hora
This property already exists. -Original Message- From: "ashesh_28 [via Apache Spark User List]" Sent: 4/13/2016 11:02 AM To: "Amit Singh Hora" Subject: Re: Unable to Access files in Hadoop HA enabled from using Spark Try adding the following propert

Re: Unable to Access files in Hadoop HA enabled from using Spark

2016-04-12 Thread ashesh_28
-Hadoop-HA-enabled-from-using-Spark-tp26768p26769.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h

Unable to Access files in Hadoop HA enabled from using Spark

2016-04-12 Thread Amit Singh Hora
success. Any suggestion will be of great help. Note: Hadoop HA is working properly, as I have tried uploading a file to Hadoop and it works. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-Access-files-in-Hadoop-HA-enabled-from-using-Spark-tp26768.h

What is the best way to process streaming data from multiple channels simultaneously using Spark 2.0 API's?

2016-04-08 Thread imax
e any reference implementation that I could look at? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/What-is-the-best-way-to-process-streaming-data-from-multiple-channels-simultaneously-using-Spark-2-0-tp26720.html Sent from the Apache Spark User List mailing

Re: Unable to execute query on SAPHANA using SPARK

2016-03-29 Thread Mich Talebzadeh
specialized HW. As a result, both rely on massive parallelization. HANA is a true column store and does not end up duplicating the data as Oracle does. Now going back to using Spark as middleware accessing SAP HANA, it may make sense if the objective is to extract data from SAP HANA and save it into

Re: Unable to execute query on SAPHANA using SPARK

2016-03-29 Thread Gourav Sengupta
. Regards, Gourav On Tue, Mar 29, 2016 at 3:54 PM, reena upadhyay < reena.upadh...@impetus.co.in> wrote: > I am trying to execute query using spark sql on SAP HANA from spark > shell. I > am able to create the data frame object. On calling any action on the data > frame ob

Re: Unable to execute query on SAPHANA using SPARK

2016-03-29 Thread Mich Talebzadeh
y.Host is not serializable. > > Maybe post question on Sap Hana mailing list (if any) ? > > On Tue, Mar 29, 2016 at 7:54 AM, reena upadhyay < > reena.upadh...@impetus.co.in> wrote: > >> I am trying to execute query using spark sql on SAP HANA from spark >> shell.

Re: Unable to execute query on SAPHANA using SPARK

2016-03-29 Thread Ted Yu
As the error said, com.sap.db.jdbc.topology.Host is not serializable. Maybe post the question on the SAP HANA mailing list (if any)? On Tue, Mar 29, 2016 at 7:54 AM, reena upadhyay < reena.upadh...@impetus.co.in> wrote: > I am trying to execute query using spark sql on SAP HANA from spark

Unable to execute query on SAPHANA using SPARK

2016-03-29 Thread reena upadhyay
I am trying to execute a query using Spark SQL on SAP HANA from the Spark shell. I am able to create the data frame object. On calling any action on the data frame object, I am getting java.io.NotSerializableException. Steps I followed after adding the SAP HANA driver jar to the Spark classpath: 1. Start
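
For reference, the generic JDBC DataFrame read in Spark 1.x looks roughly like the sketch below; every connection detail is a placeholder, and this does not by itself address the serialization error discussed in the replies.

    val df = sqlContext.read.format("jdbc").options(Map(
      "url"      -> "jdbc:sap://hana-host:30015",   // placeholder host/port
      "driver"   -> "com.sap.db.jdbc.Driver",
      "dbtable"  -> "MYSCHEMA.MYTABLE",              // placeholder table
      "user"     -> "USER",
      "password" -> "PASSWORD"
    )).load()

    df.registerTempTable("hana_table")
    sqlContext.sql("SELECT count(*) FROM hana_table").show()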

Re: println not appearing in libraries when running job using spark-submit --master local

2016-03-28 Thread Kevin Peng
making the println in the SocialUtil object appear. >> >> Thanks, >> >> KP >> >> >> >> >> -- >> View this message in context: >> http://apache-spark-user-list.1001560.n3.nabble.com/println-not-appearing-in-libraries-when-running-job-usin

Re: println not appearing in libraries when running job using spark-submit --master local

2016-03-28 Thread Ted Yu
. > > Thanks, > > KP > > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/println-not-appearing-in-libraries-when-running-job-using-spark-submit-master-loc

println not appearing in libraries when running job using spark-submit --master local

2016-03-28 Thread kpeng1
ng on in this situation and how I can go about making the println in the SocialUtil object appear. Thanks, KP -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/println-not-appearing-in-libraries-when-running-job-using-spark-submit-master-local-tp26617.html

Using Spark to retrieve a HDFS file protected by Kerberos

2016-03-23 Thread Nkechi Achara
I am having issues setting up my spark environment to read from a kerberized HDFS file location. At the moment I have tried to do the following: def ugiDoAs[T](ugi: Option[UserGroupInformation])(code: => T) = ugi match { case None => code case Some(u) => u.doAs(new PrivilegedExceptionAc

Using Spark to retrieve a HDFS file protected by Kerberos

2016-03-22 Thread Nkechi Achara
I am having issues setting up my spark environment to read from a kerberized HDFS file location. At the moment I have tried to do the following: def ugiDoAs[T](ugi: Option[UserGroupInformation])(code: => T) = ugi match { case None => code case Some(u) => u.doAs(new PrivilegedExceptionAc
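
One hedged way the helper shown in this post could be completed and used; the principal, keytab path, and HDFS path are hypothetical, and the spark-shell sc is assumed.

    import java.security.PrivilegedExceptionAction
    import org.apache.hadoop.security.UserGroupInformation

    def ugiDoAs[T](ugi: Option[UserGroupInformation])(code: => T): T = ugi match {
      case None    => code
      case Some(u) => u.doAs(new PrivilegedExceptionAction[T] {
        override def run(): T = code
      })
    }

    // log in from a keytab and run the read inside doAs
    val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
      "user@EXAMPLE.COM", "/etc/security/keytabs/user.keytab")
    val firstLine = ugiDoAs(Some(ugi)) {
      sc.textFile("hdfs://namenode:8020/secure/data.txt").first()
    }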

Re: Best way to store Avro Objects as Parquet using SPARK

2016-03-22 Thread Manivannan Selvadurai
at 11:07 PM, Michael Armbrust wrote: > But when tried using Spark streaming I could not find a way to store the >> data with the avro schema information. The closest that I got was to create >> a Dataframe using the json RDDs and store them as parquet. Here the parquet >>

Re: Best way to store Avro Objects as Parquet using SPARK

2016-03-21 Thread Michael Armbrust
> > But when tried using Spark streaming I could not find a way to store the > data with the avro schema information. The closest that I got was to create > a Dataframe using the json RDDs and store them as parquet. Here the parquet > files had a spark specific schema in their foote

Re: Best way to store Avro Objects as Parquet using SPARK

2016-03-21 Thread Manivannan Selvadurai
quetWriter separately to create the Parquet >> Files. The parquet files along with the data also had the 'avro schema' >> stored on them as a part of their footer. >> >>But when tried using Spark streaming I could not find a way to >> store the da

Re: Best way to store Avro Objects as Parquet using SPARK

2016-03-20 Thread Sebastian Piu
> I was able to use AvroParquetWriter separately to create the Parquet > Files. The parquet files along with the data also had the 'avro schema' > stored on them as a part of their footer. > >But when tried using Spark streaming I could not find a way to &

Best way to store Avro Objects as Parquet using SPARK

2016-03-20 Thread Manivannan Selvadurai
t of their footer. But when tried using Spark streaming I could not find a way to store the data with the avro schema information. The closest that I got was to create a Dataframe using the json RDDs and store them as parquet. Here the parquet files had a spark specific schema in their footer.

Re: Get Offset when using Spark Streaming + Kafka

2016-03-06 Thread Cody Koeninger
Have you read the materials linked from https://github.com/koeninger/kafka-exactly-once On Sun, Mar 6, 2016 at 8:39 AM, Zhun Shen wrote: > Hi, > > I use KafkaUtils.createDirectStream to consumer data from Kafka, but I found > that Zookeeper-based Kafka monitoring tools could not show progress of

Get Offset when using Spark Streaming + Kafka

2016-03-06 Thread Zhun Shen
Hi, I use KafkaUtils.createDirectStream to consume data from Kafka, but I found that Zookeeper-based Kafka monitoring tools could not show progress of the streaming application because createDirectStream saves the offsets in checkpoints(http://spark.apache.org/docs/latest/streaming-kafka-integra
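
For anyone hitting the same gap, a hedged sketch of reading the offsets a direct stream actually covers, via HasOffsetRanges, so they can be reported to ZooKeeper or any external monitor; the broker and topic names are placeholders.

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

    val ssc = new StreamingContext(new SparkConf().setAppName("offset-report"), Seconds(5))
    val kafkaParams = Map("metadata.broker.list" -> "broker:9092")   // placeholder broker
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))                              // placeholder topic

    stream.foreachRDD { rdd =>
      // each batch's RDD carries the Kafka offset ranges it covers
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      ranges.foreach(r => println(s"${r.topic}-${r.partition}: ${r.fromOffset} -> ${r.untilOffset}"))
    }

    ssc.start()
    ssc.awaitTermination()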

Re: Using Spark SQL / Hive on AWS EMR

2016-03-03 Thread Suresh Thalamati
the error ('Invalid method name: alter_table_with_cascade') you are seeing may be related to a mismatch of Hive versions. The error looks similar to the one reported in https://issues.apache.org/jira/browse/SPARK-12496 > On Mar 3, 2016, at 7:43 AM, Gourav Sengupta wrote: > > Hi, > > Why are you

Re: Using Spark SQL / Hive on AWS EMR

2016-03-03 Thread Gourav Sengupta
Hi, Why are you trying to load data into HIVE and then access it via hiveContext? (by the way hiveContext tables are not visible in the sqlContext). Please read the data directly into a SPARK dataframe and then register it as a temp table to run queries on it. Regards, Gourav On Thu, Mar 3, 20
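
What "read the data directly into a Spark DataFrame and register it as a temp table" looks like in spark-shell, as a minimal sketch; the path and format are placeholders.

    // read the source file straight into a DataFrame instead of loading Hive first
    val df = sqlContext.read.json("s3://my-bucket/events/")   // placeholder path
    df.registerTempTable("events")
    sqlContext.sql("SELECT count(*) FROM events").show()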

Using Spark SQL / Hive on AWS EMR

2016-03-03 Thread Afshartous, Nick
Hi, On AWS EMR 4.2 / Spark 1.5.2, I tried the example here https://spark.apache.org/docs/1.5.0/sql-programming-guide.html#hive-tables to load data from a file into a Hive table. scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) scala> sqlContext.sql("CREATE TABLE

Pattern Matching over a Sequence of rows using Spark

2016-02-28 Thread Jerry Lam
Hi spark users and developers, Anyone has experience developing pattern matching over a sequence of rows using Spark? I'm talking about functionality similar to matchpath in Hive or match_recognize in Oracle DB. It is used for path analysis on clickstream data. If you know of any libraries

Re: Using Spark functional programming rather than SQL, Spark on Hive tables

2016-02-24 Thread Mich Talebzadeh
Well spotted Sab. You are correct, an oversight by me. They should both use "sales". The results are now comparable. The following statement: "On the other hand using SQL the query 1 takes 19 seconds compared to just under 4 minutes for functional programming. The second query using SQL ta

Re: Using Spark functional programming rather than SQL, Spark on Hive tables

2016-02-24 Thread Sabarish Sasidharan
Spark has its own efficient in memory columnar format. So it's not ORC. It's just that the data has to be serialized and deserialized over the network. And that is consuming time. Regards Sab On 24-Feb-2016 9:50 pm, "Mich Talebzadeh" < mich.talebza...@cloudtechnologypartners.co.uk> wrote: > > > *

Re: Using Spark functional programming rather than SQL, Spark on Hive tables

2016-02-24 Thread Sabarish Sasidharan
One more, you are referring to 2 different sales tables. That might account for the difference in numbers. Regards Sab On 24-Feb-2016 9:50 pm, "Mich Talebzadeh" < mich.talebza...@cloudtechnologypartners.co.uk> wrote: > > > *Hi,* > > *Tools* > > *Spark 1.5.2, Hadoop 2.6, Hive 2.0, Spark-Shell, Hiv

Re: Using Spark functional programming rather than SQL, Spark on Hive tables

2016-02-24 Thread Mich Talebzadeh
Hi. Tools: Spark 1.5.2, Hadoop 2.6, Hive 2.0, spark-shell, Hive database. Objectives: timing differences between running Spark using SQL and running Spark using functional programming (FP) (functional calls) on Hive tables. Underlying tables: three tables in Hive database using ORC format
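
To make the comparison concrete, a small sketch of the two styles being timed: the same aggregation expressed as SQL and as functional DataFrame calls. The column names are assumptions; only the "sales" table name comes from the thread.

    import org.apache.spark.sql.functions.sum

    // style 1: plain SQL against the Hive table
    val viaSql = sqlContext.sql(
      "SELECT prod_id, SUM(amount_sold) AS total FROM sales GROUP BY prod_id")

    // style 2: the same aggregation through functional DataFrame calls
    val viaFp = sqlContext.table("sales")
      .groupBy("prod_id")
      .agg(sum("amount_sold").alias("total"))

    viaSql.show(10)
    viaFp.show(10)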

Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Eduardo Costa Alfaia
Hi Gourav, I tried it as you said and for me it's working. I am using Spark in local mode, master and worker on the same machine. I ran the example in spark-shell --packages com.databricks:spark-csv_2.10:1.3.0 without errors. BR From: Gourav Sengupta Date: Monday, February 15, 2016 at 10:03

Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Gourav Sengupta
ter in local mode, kindly do not attempt to answer this question. My question is how to use packages like https://github.com/databricks/spark-csv when I am using a SPARK cluster in local mode. Regards, Gourav Sengupta <http://spark.apache.org/docs/latest/spark-standalone.html> On Mon, Feb 15, 201

Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Jorge Machado
Hi Gourav, I did not understand your problem… the --packages command should not make any difference whether you are running standalone or in YARN, for example. Give us an example of what packages you are trying to load and what error you are getting. If you want to use the libraries in spark-pack

Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Gourav Sengupta
Hi, I am grateful for everyone's response, but sadly no one here actually has read the question before responding. Has anyone yet tried starting a SPARK cluster as mentioned in the link in my email? :) Regards, Gourav On Mon, Feb 15, 2016 at 11:16 AM, Jorge Machado wrote: > $SPARK_HOME/bin/s

Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Jorge Machado
$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.3.0 It will download everything for you and register into your JVM. If you want to use it in your Prod just package it with maven. > On 15/02/2016, at 12:14, Gourav Sengupta wrote: > > Hi, > > How to we include the fol

Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Gourav Sengupta
Hi, How to we include the following package: https://github.com/databricks/spark-csv while starting a SPARK standalone cluster as mentioned here: http://spark.apache.org/docs/latest/spark-standalone.html Thanks and Regards, Gourav Sengupta On Mon, Feb 15, 2016 at 10:32 AM, Ramanathan R wrote:

Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Ramanathan R
Hi Gourav, If your question is how to distribute Python package dependencies across the Spark cluster programmatically, here is an example - $ export PYTHONPATH='path/to/thrift.zip:path/to/happybase.zip:path/to/your/py/application' And in code: sc.addPyFile('/path/to/thrift.

Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Gourav Sengupta
Hi, So far no one is able to get my question at all. I know what it takes to load packages via SPARK shell or SPARK submit. How do I load packages when starting a SPARK cluster, as mentioned here http://spark.apache.org/docs/latest/spark-standalone.html ? Regards, Gourav Sengupta On Mon, Fe

Re: Using SPARK packages in Spark Cluster

2016-02-13 Thread Gourav Sengupta
Hi, I was interested in knowing how to load the packages into SPARK cluster started locally. Can someone pass me on the links to set the conf file so that the packages can be loaded? Regards, Gourav On Fri, Feb 12, 2016 at 6:52 PM, Burak Yavuz wrote: > Hello Gourav, > > The packages need to be

Re: Using SPARK packages in Spark Cluster

2016-02-12 Thread Burak Yavuz
Hello Gourav, The packages need to be loaded BEFORE you start the JVM, therefore you won't be able to add packages dynamically in code. You should use the --packages with pyspark before you start your application. One option is to add a `conf` that will load some packages if you are constantly goi
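
A hedged sketch of the "conf" option mentioned here, assuming the spark.jars.packages property (the coordinate is the one from the thread); anything placed there is resolved before the application's JVM starts, which is the constraint Burak describes.

    # conf/spark-defaults.conf
    spark.jars.packages  com.databricks:spark-csv_2.10:1.3.0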

Using SPARK packages in Spark Cluster

2016-02-12 Thread Gourav Sengupta
Hi, I am creating sparkcontext in a SPARK standalone cluster as mentioned here: http://spark.apache.org/docs/latest/spark-standalone.html using the following code: -- sc.stop()

newbie how to access S3 cluster created using spark-ec2

2016-02-10 Thread Andy Davidson
I am using spark-1.6.0 and Java. I created a cluster using spark-ec2. I am having a heck of a time figuring out how to write from my streaming app to AWS S3. I should mention I have never used S3 before and am not sure it is set up correctly. org.apache.hadoop.fs.s3.S3Exception
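
A minimal sketch (in Scala for brevity) of one way a streaming app writes to S3, assuming the s3n filesystem and placeholder credentials and bucket; this is not the poster's code.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("s3-out"), Seconds(10))

    // placeholder credentials; an EC2 instance profile can make these unnecessary
    val hconf = ssc.sparkContext.hadoopConfiguration
    hconf.set("fs.s3n.awsAccessKeyId", "<access-key>")
    hconf.set("fs.s3n.awsSecretAccessKey", "<secret-key>")

    val lines = ssc.socketTextStream("localhost", 9999)        // any input source
    lines.saveAsTextFiles("s3n://my-bucket/streaming/out")     // one directory per batch
    ssc.start()
    ssc.awaitTermination()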

Advise on using spark shell for Hive table sql queries

2016-02-07 Thread Mich Talebzadeh
Hi, Pretty new to spark shell. So decided to write this piece of code to get the data from spark shell on Hive tables. The issue is that I don't really need to define the sqlContext here as I can do a simple command like sql("select count(1) from t") WITHOUT sqlContext. sql("select cou

Re: Re: clear cache using spark sql cli

2016-02-03 Thread fightf...@163.com
...@163.com From: Ted Yu Date: 2016-02-04 11:49 To: fightf...@163.com CC: user Subject: Re: Re: clear cache using spark sql cli In spark-shell, I can do: scala> sqlContext.clearCache() Is that not the case for you ? On Wed, Feb 3, 2016 at 7:35 PM, fightf...@163.com wrote: Hi, Ted Yes. I had s

Re: Re: clear cache using spark sql cli

2016-02-03 Thread Ted Yu
To: fightf...@163.com > CC: user > Subject: Re: clear cache using spark sql cli > Have you looked at > SPARK-5909 Add a clearCache command to Spark SQL's cache manager > > On Wed, Feb 3, 2016 at 7:16 PM, fightf...@163.com > wrote: > >> Hi, >> How

Re: Re: clear cache using spark sql cli

2016-02-03 Thread fightf...@163.com
From: Ted Yu Date: 2016-02-04 11:22 To: fightf...@163.com CC: user Subject: Re: clear cache using spark sql cli Have you looked at SPARK-5909 Add a clearCache command to Spark SQL's cache manager On Wed, Feb 3, 2016 at 7:16 PM, fightf...@163.com wrote: Hi, How could I clear cache (execute sql

Re: clear cache using spark sql cli

2016-02-03 Thread Ted Yu
Have you looked at SPARK-5909 Add a clearCache command to Spark SQL's cache manager On Wed, Feb 3, 2016 at 7:16 PM, fightf...@163.com wrote: > Hi, > How could I clear cache (execute sql query without any cache) using spark > sql cli ? > Is there any command availabl

clear cache using spark sql cli

2016-02-03 Thread fightf...@163.com
Hi, How could I clear cache (execute sql query without any cache) using spark sql cli ? Is there any command available ? Best, Sun. fightf...@163.com
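
For reference, a short sketch of the two routes the replies point at: the CLEAR CACHE statement added by SPARK-5909 and the programmatic clearCache() call ("t" is a placeholder table). The same CACHE TABLE / CLEAR CACHE statements can be typed directly in the bin/spark-sql CLI.

    sqlContext.sql("CACHE TABLE t")
    sqlContext.sql("SELECT count(*) FROM t").show()   // answered from the cached data
    sqlContext.sql("CLEAR CACHE")                     // drop everything cached
    // or, programmatically:
    sqlContext.clearCache()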

Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-02-01 Thread Jia Zou
Hi, Calvin, I am running 24GB data Spark KMeans in a c3.2xlarge AWS instance with 30GB physical memory. Spark will cache data off-heap to Tachyon, and the input data is also stored in Tachyon. Tachyon is configured to use 15GB memory and to use tiered store. Tachyon underFS is /tmp. The only configura

Re: deep learning with heterogeneous cloud computing using spark

2016-01-30 Thread Christopher Nguyen
Thanks Nick :) Abid, you may also want to check out http://conferences.oreilly.com/strata/big-data-conference-ny-2015/public/schedule/detail/43484, which describes our work on a combination of Spark and Tachyon for Deep Learning. We found significant gains in using Tachyon (with co-processing) fo

Re: deep learning with heterogeneous cloud computing using spark

2016-01-30 Thread Nick Pentreath
> http://apache-spark-user-list.1001560.n3.nabble.com/deep-learning-with-heterogeneous-cloud-computing-using-spark-tp26109.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > - > To

deep learning with heterogeneous cloud computing using spark

2016-01-30 Thread Abid Malik
Dear all; Is there any work in this area? Thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/deep-learning-with-heterogeneous-cloud-computing-using-spark-tp26109.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: how to correctly run scala script using spark-shell through stdin (spark v1.0.0)

2016-01-29 Thread Iulian Dragoș
s like a simple/naive question but really couldn’t find an answer. >> >> >> >> *From:* Fernandez, Andres >> *Sent:* Tuesday, January 26, 2016 2:53 PM >> *To:* 'Ewan Leith'; Iulian Dragoș >> *Cc:* user >> *Subject:* RE: how to correctly run scala script

Re: how to correctly run scala script using spark-shell through stdin (spark v1.0.0)

2016-01-29 Thread Iulian Dragoș
ith'; Iulian Dragoș > *Cc:* user > *Subject:* RE: how to correctly run scala script using spark-shell > through stdin (spark v1.0.0) > > > > True thank you. Is there a way of having the shell not closed (how to > avoid the :quit statement). Thank you both. >

Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-29 Thread cc
Hey, Jia Zou, I'm curious about this exception; the error log you showed suggests the exception is related to unlockBlock. Could you upload your full master.log and worker.log under the tachyon/logs directory? Best, Cheng On Friday, January 29, 2016 at 11:11:19 AM UTC+8, Calvin Jia wrote: > > Hi, > > Thanks for the detaile

Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-28 Thread Calvin Jia
Hi, Thanks for the detailed information. How large is the dataset you are running against? Also did you change any Tachyon configurations? Thanks, Calvin - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additiona

Re: Using Spark in mixed Java/Scala project

2016-01-27 Thread Zoran Jeremic
Hi Jakob, Thanks a lot for your help. I'll try this. Zoran On Wed, Jan 27, 2016 at 10:49 AM, Jakob Odersky wrote: > JavaSparkContext has a wrapper constructor for the "scala" > SparkContext. In this case all you need to do is declare a > SparkContext that is accessible both from the Java and S

Re: Using Spark in mixed Java/Scala project

2016-01-27 Thread Jakob Odersky
JavaSparkContext has a wrapper constructor for the "scala" SparkContext. In this case all you need to do is declare a SparkContext that is accessible both from the Java and Scala sides of your project and wrap the context with a JavaSparkContext. Search for java source compatibility with scala for
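
A tiny sketch of the wrapping Jakob describes, one underlying context shared between the Scala and Java sides (the app name and master are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.api.java.JavaSparkContext

    val conf = new SparkConf().setAppName("mixed-project").setMaster("local[*]")
    val sc   = new SparkContext(conf)       // used from the Scala side
    val jsc  = new JavaSparkContext(sc)     // handed to the Java side

    // both handles share the same SparkContext, so only one context exists
    assert(jsc.sc eq sc)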

Using Spark in mixed Java/Scala project

2016-01-27 Thread jeremycod
Hi, I have a mixed Java/Scala project. I have already been using Spark in Scala code in local mode. Now, some new team members should develop functionalities that should use Spark but in Java code, and they are not familiar with Scala. I know it's not possible to have two Spark contexts i

RE: how to correctly run scala script using spark-shell through stdin (spark v1.0.0)

2016-01-27 Thread Andres.Fernandez
To: 'Ewan Leith'; Iulian Dragoș Cc: user Subject: RE: how to correctly run scala script using spark-shell through stdin (spark v1.0.0) True thank you. Is there a way of having the shell not closed (how to avoid the :quit statement). Thank you both. Andres From: Ewan Leith [mail

Re: How to send a file to database using spark streaming

2016-01-27 Thread Akhil Das
This is a good start https://github.com/datastax/spark-cassandra-connector/blob/master/doc/8_streaming.md Thanks Best Regards On Sat, Jan 23, 2016 at 12:19 PM, Sree Eedupuganti wrote: > New to Spark Streaming. My question is i want to load the XML files to > database [cassandra] using

Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-27 Thread Jia Zou
ream.java:122) > > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) > > at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) > > at java.io.BufferedInputStream.read(BufferedInputStream.java:334) > > at > org.apache.thrift.tr

Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-27 Thread Jia Zou
java.io.BufferedInputStream.read(BufferedInputStream.java:334) at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) ... 15 more On Wed, Jan 27, 2016 at 5:02 AM, Jia Zou wrote: > Dears, I keep getting below exception when using Spark 1.6.0 on top of > Tachyon
