There are two distinct points here.
Using Spark as a query engine: that is business as usual, and most forum members
use it every day. You run Spark with either Standalone, YARN or Mesos as the
cluster manager. You start a master that does the management of resources, and
you start slaves to create the workers.
The second point is what happens in a production environment when several
people make different queries simultaneously. It's impossible to restart Spark
masters and workers several times a day and tune it constantly.
On Mon, May 23, 2016 at 2:42 AM, Mich Talebzadeh
wrote:
Hi,
I have done a number of extensive tests using Spark-shell with Hive DB and
ORC tables.
Now one issue that we typically face is, and I quote:
"Spark is fast as it uses memory and DAG. Great, but when we save data it is
not fast enough."
OK, but there is a solution now. If you use Spark with
DECIMAL(20,2))
, CAST(REGEXP_REPLACE(vat,'[^\\d\\.]','') AS DECIMAL(20,2))
, CAST(REGEXP_REPLACE(total,'[^\\d\\.]','') AS DECIMAL(20,2))
FROM
stg_t2
WHERE
--INVOICENUMBER > 0 AND
CAST(REGEXP_REPLACE(total,'[
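For anyone who wants to try something similar, here is a minimal sketch of that
kind of cleansing query run from spark-shell (where sqlContext is a
HiveContext). The table and column names (t2, stg_t2, invoicenumber, net) are
assumptions reconstructed around the fragment above, not the original query.

// Hedged sketch: cleanse string amounts into DECIMAL while inserting into a Hive table.
sqlContext.sql(
  """INSERT INTO TABLE t2
    |SELECT
    |    invoicenumber
    |  , CAST(REGEXP_REPLACE(net,  '[^\\d\\.]','') AS DECIMAL(20,2))
    |  , CAST(REGEXP_REPLACE(vat,  '[^\\d\\.]','') AS DECIMAL(20,2))
    |  , CAST(REGEXP_REPLACE(total,'[^\\d\\.]','') AS DECIMAL(20,2))
    |FROM stg_t2
    |WHERE CAST(REGEXP_REPLACE(total,'[^\\d\\.]','') AS DECIMAL(20,2)) > 0
  """.stripMargin)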
>>>> On 22 May 2016 at 20:14, Jörn Franke wrote:
>>>>
>>>>> 14000 partitions seem to be way too many to be performant (except for
>>>>> large data sets). How much data does one partition contain?
>>>> > On 22 May 2016, at 09:34, SRK wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> > In my Spark SQL query to insert data, I have around 14,000 partitions of
>>>> > data which seems to be causing memory issues. How can I insert the data
>>>> > for 100 partitions at a time to avoid any memory issues?
>>>> >
>>>> > --
>>>> > View this message in context:
>>>> > http://apache-spark-user-list.1001560.n3.nabble.com/How-to-insert-data-for-100-partitions-at-a-time-using-Spark-SQL-tp26997.html
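One way to approach what is being asked, sketched here rather than taken from
the thread, is to collect the distinct partition values and issue a filtered
INSERT per batch of 100. It assumes the sqlContext from spark-shell is a
HiveContext; the table and column names (target, source, date_part, col1, col2)
are hypothetical.

// Hedged sketch: insert into a partitioned table 100 partitions at a time.
// Dynamic partitioning may require: SET hive.exec.dynamic.partition.mode=nonstrict
val parts = sqlContext.sql("SELECT DISTINCT date_part FROM source")
  .collect()
  .map(_.getString(0))

parts.grouped(100).foreach { batch =>
  val inList = batch.map(p => s"'$p'").mkString(",")
  sqlContext.sql(
    s"""INSERT INTO TABLE target PARTITION (date_part)
       |SELECT col1, col2, date_part FROM source
       |WHERE date_part IN ($inList)""".stripMargin)
}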
Hi,
I see some memory issues when trying to insert the data in the form of ORC
using Spark SQL. Please find the query and exception below. Any idea as to
why this is happening?
sqlContext.sql(" CREATE EXTERNAL TABLE IF NOT EXISTS records (id STRING,
record STRING) PARTITIONED BY (datePart
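As a minimal sketch of the pattern in the message above (the original statement
is truncated in the archive): an external ORC table partitioned by date,
populated with dynamic partitioning. The partition column name (datePartition),
the location and the staging table are assumptions, and sqlContext is assumed
to be the HiveContext from spark-shell.

// Hedged sketch: external ORC table plus dynamic-partition insert.
sqlContext.sql(
  """CREATE EXTERNAL TABLE IF NOT EXISTS records (id STRING, record STRING)
    |PARTITIONED BY (datePartition STRING)
    |STORED AS ORC
    |LOCATION 'hdfs:///tmp/records'""".stripMargin)

sqlContext.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
sqlContext.sql(
  """INSERT OVERWRITE TABLE records PARTITION (datePartition)
    |SELECT id, record, datePartition FROM staging_records""".stripMargin)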
        val productRecord = new ProducerRecord(Configs.topic, new Random().nextInt(10), "", r.toString)
        producer.send(productRecord)
      }
    })
    count += 1
    Thread.sleep(5000)
  }

Complete code is available here:
https://github.com/zuxqoj/HelloWorld/tree/master/SparkStreamingStateStore/src/main/scala/spark/streaming/statestore/test
I am using Spark on YARN in client mode.
I am also using Spark SQL for running Hive queries. Is there any way to run
Hive queries in async mode using Spark SQL?
Does it return any Hive handle, and if yes, how do I get the results from the
Hive handle using Spark SQL?
--
Thanks,
Raju Bairishetti,
www.lazada.com
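As far as I know, Spark SQL does not hand back a Hive handle. A sketch of one
workaround (not an official async API) is to wrap the query in a plain Scala
Future so it runs in the background; sqlContext and the table name some_table
are assumptions here.

// Hedged sketch: run a Spark SQL query asynchronously and fetch the result later.
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import org.apache.spark.sql.Row

val resultF: Future[Array[Row]] =
  Future { sqlContext.sql("SELECT count(*) FROM some_table").collect() }

// ... do other work here ...

// Block only when the result is actually needed (or use onComplete instead).
val rows: Array[Row] = Await.result(resultF, 10.minutes)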
Hi all,
I am experiencing issues when creating EC2 clusters using the scripts in the
spark/ec2 directory.
I launched the following command:
./spark-ec2 -k sparkkey -i sparkAccessKey.pem -r us-west2 -s 4 launch
MM-Cluster
My output is stuck with the following (has been for the last 20 minutes):
I am
es/92
>
> This is about XmlInputFormat.scala and it seems a bit tricky to handle the
> case, so I left it open until now.
>
>
> Thanks!
>
>
> 2016-05-13 5:03 GMT+09:00 Arunkumar Chandrasekar :
Hello,
Greetings.
I'm trying to process an XML file exported from the Health Kit application
using Spark SQL for learning purposes. The sample record data is like the below:
.
I want to have the column names of my table be field values like type,
sourceName, sourceVersion, and the row en
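A sketch of one way to get there with the spark-xml package (assumed to have
been started with something like --packages com.databricks:spark-csv's sibling,
com.databricks:spark-xml_2.10, version as appropriate). The rowTag "Record",
the file name and the attribute names follow the Health Kit export described
above; sqlContext is the one from spark-shell.

// Hedged sketch: read the export so that XML attributes become columns.
val df = sqlContext.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "Record")
  .option("attributePrefix", "_")   // set explicitly so the column names below are predictable
  .load("export.xml")

// Attributes carry the configured prefix.
df.select("_type", "_sourceName", "_sourceVersion").show()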
Hi guys,
Have any of you tried this mechanism before?
I am able to run it locally and get the output. But how do I submit the job to
the YARN cluster using Spark JobServer?
Any documentation?
Regards
Ashesh
--
View this message in context:
http://apache-spark-user-list.1001560
Thank you for the question.
What is different on this machine as compared to the ones where the job
succeeded?
-
Neelesh S. Salian
Cloudera
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Error-while-running-jar-using-spark-submit-on-another-machine
What Davies said is correct; the second argument is Hadoop's output format.
Hadoop supports many types of output formats, and all of them have their own
advantages. Apart from the one specified above,
https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html
hdfs://192.168.10.130:9000/dev/output/test already exists, so you need
to remove it first.
On Tue, Apr 26, 2016 at 5:28 AM, Luke Adolph wrote:
Hi, all:
Below is my code:
from pyspark import *
import re

def getDateByLine(input_str):
    str_pattern = '^\d{4}-\d{2}-\d{2}'
    pattern = re.compile(str_pattern)
    match = pattern.match(input_str)
    if match:
        return match.group()
    else:
        return None
file_url = "hdfs://192
Dear All,
I installed Spark 1.6.1 on Amazon EC2 using the spark-ec2 script. Everything
was OK, but it failed to start httpd at the end of the installation. I followed
the instructions exactly and repeated the process many times, but there is no
luck.
-
[timing] rstudio setup: 00h
Data.filter(line => line.contains("spark")).count()
println("Lines with Hadoop : %s, Lines with Spark: %s".format(numAs,
numBs))
}
}
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-Access-files-in-Hadoop-HA-enabled-from-us
-Original Message-
From: "Amit Hora"
Sent: 4/13/2016 11:41 AM
To: "Jörn Franke"
Cc: "user@spark.apache.org"
Subject: RE: Unable to Access files in Hadoop HA enabled from using Spark
There are DNS entries for both of my namenodes.
Ambarimaster is standby and it resolves to the IP per
From: "Jörn Franke"
Sent: 4/13/2016 11:37 AM
To: "Amit Singh Hora"
Cc: "user@spark.apache.org"
Subject: Re: Unable to Access files in Hadoop HA enabled from using Spark
Is the host in /etc/hosts ?
> On 13 Apr 2016, at 07:28, Amit Singh Hora wrote:
>
> I am trying to access di
; hadoop and it works
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-Access-files-in-Hadoop-HA-enabled-from-using-Spark-tp26768.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
This property already exists.
-Original Message-
From: "ashesh_28 [via Apache Spark User List]"
Sent: 4/13/2016 11:02 AM
To: "Amit Singh Hora"
Subject: Re: Unable to Access files in Hadoop HA enabled from using Spark
Try adding the following propert
-Hadoop-HA-enabled-from-using-Spark-tp26768p26769.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
success.
Any suggestion will be of great help.
Note: Hadoop HA is working properly, as I have tried uploading a file to
Hadoop and it works.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-Access-files-in-Hadoop-HA-enabled-from-using-Spark-tp26768.h
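For reference, a minimal sketch of pointing a SparkContext at an HA-enabled
HDFS nameservice from the Spark side; the nameservice name and the host names
are hypothetical, and in practice these values normally come from the cluster's
hdfs-site.xml on the classpath rather than being set by hand.

// Hedged sketch: HDFS HA client settings on the SparkContext's Hadoop configuration.
val hc = sc.hadoopConfiguration
hc.set("fs.defaultFS", "hdfs://mycluster")
hc.set("dfs.nameservices", "mycluster")
hc.set("dfs.ha.namenodes.mycluster", "nn1,nn2")
hc.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1.example.com:8020")
hc.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2.example.com:8020")
hc.set("dfs.client.failover.proxy.provider.mycluster",
  "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")

// Read through the nameservice, not an individual namenode host.
val lines = sc.textFile("hdfs://mycluster/user/data/sample.txt")
println(lines.count())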
e any reference implementation that I could look at?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/What-is-the-best-way-to-process-streaming-data-from-multiple-channels-simultaneously-using-Spark-2-0-tp26720.html
Sent from the Apache Spark User List mailing
specialized HW. As a result, both rely on massive parallelization. HANA is a
true column store and does not end up duplicating the data as Oracle does.
Now, going back to using Spark as middleware accessing SAP HANA, it may make
sense if the objective is to extract data from SAP HANA and save it into
.
Regards,
Gourav
On Tue, Mar 29, 2016 at 3:54 PM, reena upadhyay <
reena.upadh...@impetus.co.in> wrote:
As the error said, com.sap.db.jdbc.topology.Host is not serializable.
Maybe post question on Sap Hana mailing list (if any) ?
On Tue, Mar 29, 2016 at 7:54 AM, reena upadhyay <
reena.upadh...@impetus.co.in> wrote:
I am trying to execute a query using Spark SQL on SAP HANA from spark-shell. I
am able to create the data frame object. On calling any action on the data
frame object, I am getting java.io.NotSerializableException.
Steps I followed after adding the SAP HANA driver jar to the Spark classpath:
1. Start
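As a sketch not tied to the steps above (and untested: the JDBC URL, the
credentials and the table name are placeholders), one alternative is to go
through the DataFrame JDBC source, which keeps the HANA JDBC objects out of
serialized closures.

// Hedged sketch: query SAP HANA via the Spark SQL JDBC data source.
val hanaDF = sqlContext.read.format("jdbc").options(Map(
  "url"      -> "jdbc:sap://hana-host:30015",
  "driver"   -> "com.sap.db.jdbc.Driver",
  "dbtable"  -> "MYSCHEMA.MYTABLE",
  "user"     -> "myuser",
  "password" -> "mypassword"
)).load()

hanaDF.registerTempTable("hana_table")
sqlContext.sql("SELECT count(*) FROM hana_table").show()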
going on in this situation and how I can
go about making the println in the SocialUtil object appear.
Thanks,
KP
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/println-not-appearing-in-libraries-when-running-job-using-spark-submit-master-local-tp26617.html
I am having issues setting up my spark environment to read from a
kerberized HDFS file location.
At the moment I have tried to do the following:
def ugiDoAs[T](ugi: Option[UserGroupInformation])(code: => T) = ugi match {
  case None => code
  case Some(u) => u.doAs(new PrivilegedExceptionAction[T] {
    override def run(): T = code
  })
}
at 11:07 PM, Michael Armbrust
wrote:
I was able to use AvroParquetWriter separately to create the Parquet files.
The Parquet files, along with the data, also had the 'avro schema' stored on
them as part of their footer.
But when I tried using Spark Streaming I could not find a way to store the
data with the Avro schema information. The closest that I got was to create a
DataFrame using the JSON RDDs and store them as Parquet. Here the Parquet
files had a Spark-specific schema in their footer.
Have you read the materials linked from
https://github.com/koeninger/kafka-exactly-once
On Sun, Mar 6, 2016 at 8:39 AM, Zhun Shen wrote:
Hi,
I use KafkaUtils.createDirectStream to consume data from Kafka, but I found
that Zookeeper-based Kafka monitoring tools could not show progress of the
streaming application because createDirectStream saves the offsets in
checkpoints (http://spark.apache.org/docs/latest/streaming-kafka-integra
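A sketch along the lines of the kafka-exactly-once material linked above: read
the offset ranges from each batch of the direct stream so they can be reported
to ZooKeeper (or any monitoring tool). The broker address and topic name are
placeholders, and the ZooKeeper write itself is left as a comment.

// Hedged sketch: expose direct-stream offsets for external monitoring.
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

val ssc = new StreamingContext(sc, Seconds(10))
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("my_topic"))

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  offsetRanges.foreach { o =>
    // write o.topic, o.partition, o.untilOffset to ZooKeeper here for the monitoring tool
    println(s"${o.topic} ${o.partition} ${o.fromOffset} -> ${o.untilOffset}")
  }
}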
The error ('Invalid method name: alter_table_with_cascade') you are seeing
may be related to a mismatch of Hive versions. The error looks similar to the
one reported in https://issues.apache.org/jira/browse/SPARK-12496
> On Mar 3, 2016, at 7:43 AM, Gourav Sengupta wrote:
>
> Hi,
>
> Why are you
Hi,
Why are you trying to load data into Hive and then access it via hiveContext?
(By the way, hiveContext tables are not visible in the sqlContext.)
Please read the data directly into a Spark DataFrame and then register it as a
temp table to run queries on it, as in the sketch below.
Regards,
Gourav
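A minimal sketch of that suggestion, assuming the sqlContext from spark-shell;
the file path and table name are placeholders, and the source format
(json here) would be whatever the file actually is.

// Hedged sketch: read straight into a DataFrame and query a temp table.
val df = sqlContext.read.json("hdfs:///data/records.json")

df.registerTempTable("records_tmp")
sqlContext.sql("SELECT count(*) FROM records_tmp").show()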
On Thu, Mar 3, 20
Hi,
On AWS EMR 4.2 / Spark 1.5.2, I tried the example here
https://spark.apache.org/docs/1.5.0/sql-programming-guide.html#hive-tables
to load data from a file into a Hive table.
scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> sqlContext.sql("CREATE TABLE
Hi spark users and developers,
Does anyone have experience developing pattern matching over a sequence of rows
using Spark? I'm talking about functionality similar to matchpath in Hive
or match_recognize in Oracle DB. It is used for path analysis on
clickstream data. If you know of any libraries
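Not a match_recognize equivalent, but a sketch of one common substitute: use a
window function to look at each user's previous page and filter for a
transition of interest. The table name (clickstream) and columns (user_id,
event_time, page) are hypothetical, and window functions assume a HiveContext.

// Hedged sketch: simple path/transition analysis with a window function.
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val clicks = sqlContext.table("clickstream")
val w = Window.partitionBy("user_id").orderBy("event_time")
val sequenced = clicks.withColumn("prev_page", lag(col("page"), 1).over(w))

// e.g. find "search -> product" transitions
sequenced.filter(col("prev_page") === "search" && col("page") === "product").show()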
Well spotted Sab. You are correct; an oversight by me. They should both
use "sales".
The results are now comparable.
The following statement:
"On the other hand, using SQL, query 1 takes 19 seconds compared to
just under 4 minutes for functional programming.
The second query using SQL ta
Spark has its own efficient in-memory columnar format, so it's not ORC.
It's just that the data has to be serialized and deserialized over the
network, and that is consuming time.
Regards
Sab
On 24-Feb-2016 9:50 pm, "Mich Talebzadeh" <
mich.talebza...@cloudtechnologypartners.co.uk> wrote:
One more, you are referring to 2 different sales tables. That might account
for the difference in numbers.
Regards
Sab
On 24-Feb-2016 9:50 pm, "Mich Talebzadeh" <
mich.talebza...@cloudtechnologypartners.co.uk> wrote:
Hi,
Tools: Spark 1.5.2, Hadoop 2.6, Hive 2.0, spark-shell, Hive database
Objectives: timing differences between running Spark using SQL and
running Spark using functional programming (FP) (functional calls) on
Hive tables
Underlying tables: three tables in Hive database using ORC format
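A sketch of the kind of comparison being made: the same aggregation over a Hive
ORC table expressed once in SQL and once with DataFrame (functional) calls. The
table and column names are placeholders, not the original test.

// Hedged sketch: SQL form vs functional (DataFrame) form of the same query.
import org.apache.spark.sql.functions._

val viaSql = sqlContext.sql(
  "SELECT prod_id, SUM(amount_sold) AS total FROM sales GROUP BY prod_id")

val viaFp = sqlContext.table("sales")
  .groupBy("prod_id")
  .agg(sum("amount_sold").as("total"))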
Hi Gourav,
I did a test as you said and for me it's working. I am using Spark in local
mode, master and worker on the same machine. I ran the example in spark-shell
with --packages com.databricks:spark-csv_2.10:1.3.0 without errors.
BR
From: Gourav Sengupta
Date: Monday, February 15, 2016 at 10:03
ter in local mode kindly do not attempt
in answering this question.
My question is how to use packages like
https://github.com/databricks/spark-csv when I using SPARK cluster in local
mode.
Regards,
Gourav Sengupta
<http://spark.apache.org/docs/latest/spark-standalone.html>
On Mon, Feb 15, 201
Hi Gourav,
I did not understand your problem… the --packages option should not make
any difference whether you are running standalone or on YARN, for example.
Give us an example of what packages you are trying to load and what error you
are getting… If you want to use the libraries in spark-pack
Hi,
I am grateful for everyone's response, but sadly no one here actually has
read the question before responding.
Has anyone yet tried starting a SPARK cluster as mentioned in the link in
my email?
:)
Regards,
Gourav
On Mon, Feb 15, 2016 at 11:16 AM, Jorge Machado wrote:
> $SPARK_HOME/bin/s
$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.3.0
It will download everything for you and register it in your JVM. If you want
to use it in production, just package it with Maven.
> On 15/02/2016, at 12:14, Gourav Sengupta wrote:
>
> Hi,
>
> How to we include the fol
Hi,
How to we include the following package:
https://github.com/databricks/spark-csv while starting a SPARK standalone
cluster as mentioned here:
http://spark.apache.org/docs/latest/spark-standalone.html
Thanks and Regards,
Gourav Sengupta
On Mon, Feb 15, 2016 at 10:32 AM, Ramanathan R
wrote:
Hi Gourav,
If your question is how to distribute python package dependencies across
the Spark cluster programmatically? ...here is an example -
$ export
PYTHONPATH='path/to/thrift.zip:path/to/happybase.zip:path/to/your/py/application'
And in code:
sc.addPyFile('/path/to/thrift.
Hi,
So far no one has understood my question at all. I know what it takes to
load packages via the SPARK shell or SPARK submit.
How do I load packages when starting a SPARK cluster, as mentioned here
http://spark.apache.org/docs/latest/spark-standalone.html ?
Regards,
Gourav Sengupta
On Mon, Fe
Hi,
I was interested in knowing how to load the packages into SPARK cluster
started locally. Can someone pass me on the links to set the conf file so
that the packages can be loaded?
Regards,
Gourav
On Fri, Feb 12, 2016 at 6:52 PM, Burak Yavuz wrote:
Hello Gourav,
The packages need to be loaded BEFORE you start the JVM, therefore you
won't be able to add packages dynamically in code. You should use the
--packages with pyspark before you start your application.
One option is to add a `conf` that will load some packages if you are
constantly goi
Hi,
I am creating sparkcontext in a SPARK standalone cluster as mentioned here:
http://spark.apache.org/docs/latest/spark-standalone.html using the
following code:
--
sc.stop()
I am using spark-1.6.0 and Java. I created a cluster using spark-ec2. I am
having a heck of a time figuring out how to write from my streaming app to AWS
S3. I should mention I have never used S3 before and am not sure it is set
up correctly.
org.apache.hadoop.fs.s3.S3Exception
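A sketch in Scala (the poster uses Java, but the calls are the same): set the
S3 credentials on the Hadoop configuration used by the streaming context, then
write each non-empty batch to an s3n:// path. The bucket name, keys and input
source are placeholders; depending on the Hadoop build, s3a:// with its
matching credential keys may be needed instead.

// Hedged sketch: write each streaming batch to S3.
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(30))
ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "<access-key>")
ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "<secret-key>")

val stream = ssc.socketTextStream("localhost", 9999)   // placeholder source

stream.foreachRDD { (rdd, time) =>
  if (!rdd.isEmpty()) {
    rdd.saveAsTextFile(s"s3n://my-bucket/output/batch-${time.milliseconds}")
  }
}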
Hi,
Pretty new to spark shell.
So I decided to write this piece of code to get the data from the Spark shell
on Hive tables. The issue is that I don't really need to define the sqlContext
here as I can do a simple command like sql("select count(1) from t") WITHOUT
sqlContext. sql("select cou
...@163.com
From: Ted Yu
Date: 2016-02-04 11:49
To: fightf...@163.com
CC: user
Subject: Re: Re: clear cache using spark sql cli
In spark-shell, I can do:
scala> sqlContext.clearCache()
Is that not the case for you ?
On Wed, Feb 3, 2016 at 7:35 PM, fightf...@163.com wrote:
Hi, Ted
Yes. I had s
From: Ted Yu
Date: 2016-02-04 11:22
To: fightf...@163.com
CC: user
Subject: Re: clear cache using spark sql cli
Have you looked at
SPARK-5909 Add a clearCache command to Spark SQL's cache manager
Hi,
How could I clear the cache (execute a SQL query without any cache) using the
Spark SQL CLI?
Is there any command available?
Best,
Sun.
fightf...@163.com
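To summarize the options mentioned in this thread, here is a sketch shown from
spark-shell; the corresponding SQL statements can also be issued from the
spark-sql CLI, and the table name is a placeholder.

// Hedged sketch: ways to drop cached data in Spark SQL.
sqlContext.clearCache()                    // programmatic: clears everything that is cached
sqlContext.sql("UNCACHE TABLE my_table")   // drops a single cached table
sqlContext.sql("CLEAR CACHE")              // SQL form added by SPARK-5909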
Hi Calvin, I am running 24GB data Spark KMeans in a c3.2xlarge AWS
instance with 30GB physical memory.
Spark will cache data off-heap to Tachyon, and the input data is also stored
in Tachyon.
Tachyon is configured to use 15GB memory, and to use the tiered store.
Tachyon underFS is /tmp.
The only configura
Thanks Nick :)
Abid, you may also want to check out
http://conferences.oreilly.com/strata/big-data-conference-ny-2015/public/schedule/detail/43484,
which describes our work on a combination of Spark and Tachyon for Deep
Learning. We found significant gains in using Tachyon (with co-processing)
fo
> http://apache-spark-user-list.1001560.n3.nabble.com/deep-learning-with-heterogeneous-cloud-computing-using-spark-tp26109.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
Dear all;
Is there any work in this area?
Thanks
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/deep-learning-with-heterogeneous-cloud-computing-using-spark-tp26109.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
s like a simple/naive question but really couldn’t find an answer.
>>
>>
>>
>> *From:* Fernandez, Andres
>> *Sent:* Tuesday, January 26, 2016 2:53 PM
>> *To:* 'Ewan Leith'; Iulian Dragoș
>> *Cc:* user
>> *Subject:* RE: how to correctly run scala script
ith'; Iulian Dragoș
> *Cc:* user
> *Subject:* RE: how to correctly run scala script using spark-shell
> through stdin (spark v1.0.0)
>
>
>
> True, thank you. Is there a way of having the shell not close (how to
> avoid the :quit statement)? Thank you both.
>
Hey, Jia Zou
I'm curious about this exception. The error log you showed indicates that the
exception is related to unlockBlock; could you upload your full master.log
and worker.log under the tachyon/logs directory?
Best,
Cheng
On Friday, January 29, 2016 at 11:11:19 AM UTC+8, Calvin Jia wrote:
>
> Hi,
>
> Thanks for the detaile
Hi,
Thanks for the detailed information. How large is the dataset you are
running against? Also did you change any Tachyon configurations?
Thanks,
Calvin
Hi Jakob,
Thanks a lot for your help. I'll try this.
Zoran
On Wed, Jan 27, 2016 at 10:49 AM, Jakob Odersky wrote:
JavaSparkContext has a wrapper constructor for the "scala"
SparkContext. In this case all you need to do is declare a
SparkContext that is accessible both from the Java and Scala sides of
your project and wrap the context with a JavaSparkContext.
Search for java source compatibility with scala for
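A minimal sketch of the wrapper approach described above: one underlying
SparkContext, exposed to the Java side of the project through
JavaSparkContext. The app name and master are placeholders.

// Hedged sketch: share one context between Scala and Java code.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.api.java.JavaSparkContext

val conf = new SparkConf().setAppName("mixed-java-scala").setMaster("local[*]")
val sc = new SparkContext(conf)       // used directly from the Scala code
val jsc = new JavaSparkContext(sc)    // hand this to the Java code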
Hi,
I have a mixed Java/Scala project. I have already been using Spark in Scala
code in local mode. Now, some new team members should develop
functionalities that should use Spark but in Java code, and they are not
familiar with Scala. I know it's not possible to have two Spark contexts i
To: 'Ewan Leith'; Iulian Dragoș
Cc: user
Subject: RE: how to correctly run scala script using spark-shell through stdin
(spark v1.0.0)
True, thank you. Is there a way of having the shell not close (how to avoid the
:quit statement)? Thank you both.
Andres
From: Ewan Leith [mail
This is a good start
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/8_streaming.md
Thanks
Best Regards
On Sat, Jan 23, 2016 at 12:19 PM, Sree Eedupuganti wrote:
> New to Spark Streaming. My question is I want to load the XML files to a
> database [Cassandra] using
ream.java:122)
>
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>
> at
> org.apache.thrift.tr
java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
... 15 more
On Wed, Jan 27, 2016 at 5:02 AM, Jia Zou wrote:
> Dears, I keep getting below exception when using Spark 1.6.0 on top of
> Tachyon