Hey guys,
In our Hive/Hadoop ecosystem we are using the Cloudera distribution CDH 5.2.x;
there are about 300+ Hive tables. The data is stored as text (moving slowly to
Parquet) on HDFS. I want to use Spark SQL, point it at the Hive metadata, and be
able to define JOINs etc. using a programmatic structure.
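A minimal sketch of that from the Scala shell, assuming the tables are already
registered in the Hive metastore (the table and column names here are made up
for illustration, not from the original post):

  import org.apache.spark.sql.hive.HiveContext

  val hiveContext = new HiveContext(sc)
  // Hive-metastore tables are visible by name, so joins can be expressed in SQL
  // (or through the programmatic DataFrame API in Spark 1.3+).
  val joined = hiveContext.sql(
    """SELECT o.order_id, c.name, o.amount
      |FROM orders o
      |JOIN customers c ON o.customer_id = c.id""".stripMargin)
  joined.collect().foreach(println)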
Any resolution to this? I am having the same problem.
Any resolution to this? I'm having the same problem.
Hi Ayan and Helena,
I've considered using Cassandra/HBase but ended up opting to save to the workers'
HDFS because I want to take advantage of data locality, since the data
will often be loaded into Spark for further processing. I was also under the
impression that saving to the filesystem (instead
nd delegate
> "update" part to them.
>
> On Fri, May 15, 2015 at 8:10 PM, Nisrina Luthfiyati
> <nisrina.luthfiy...@gmail.com> wrote:
>
> Hi all,
> I have a stream of data from Kafka that I want to process and store in hdfs
> using Spark Streaming.
>
r: An error occurred while calling
o30.partitions.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not
exist: hdfs://sdo-hdp-bd-master1.development.c4i:8020/user/hdfs/
/input/lprs/2015_05_14/file3.csv
Input path does not exist:
hdfs://sdo-hdp-bd-master1.development.c4i:8020/us
> Hi all,
> I have a stream of data from Kafka that I want to process and store in
> hdfs using Spark Streaming.
> Each data has a date/time dimension and I want to write data within the
> same time dimension to the same hdfs directory. The data stream might be
> unordered (by time
Hi all,
I have a stream of data from Kafka that I want to process and store in HDFS
using Spark Streaming.
Each record has a date/time dimension and I want to write data within the
same time dimension to the same HDFS directory. The data stream might be
unordered (by time dimension).
I'm wond
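One hedged sketch of the per-date-directory write being asked about, assuming
"lines" is the DStream[String] coming from Kafka; the date-extraction logic and
the output paths are placeholders, not the poster's code:

  lines.foreachRDD { (rdd, time) =>
    // Hypothetical: pull a "yyyy_MM_dd" key out of each record.
    def extractDate(line: String): String = line.split(",")(0)
    val dates = rdd.map(extractDate).distinct().collect()
    for (d <- dates) {
      // Late or out-of-order records still land in their own date's directory.
      rdd.filter(extractDate(_) == d)
         .saveAsTextFile(s"hdfs:///data/lprs/$d/batch-${time.milliseconds}")
    }
  }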
Hello,
I have Spark 1.3.1 running well on EC2 with ephemeral HDFS using the
spark-ec2 script, and I'm quite happy with it.
I want to switch to persistent-hdfs in order to be able to maintain data
between cluster stops/starts. Unfortunately, spark-ec2 stop/start causes Spark
to revert back from persistent
hiveContext as given below -
>>> scala> hiveContext.sql ("CREATE TEMPORARY FUNCTION sample_to_upper AS
>>> 'com.abc.api.udf.MyUpper' USING JAR
>>> 'hdfs:///users/ravindra/customUDF2.jar'")
>>>
>>> I
ng to create custom udfs with hiveContext as given below -
>> scala> hiveContext.sql ("CREATE TEMPORARY FUNCTION sample_to_upper AS
>> 'com.abc.api.udf.MyUpper' USING JAR
>> 'hdfs:///users/ravindra/customUDF2.jar'")
>>
>> I have put th
"CREATE TEMPORARY FUNCTION sample_to_upper AS
> 'com.abc.api.udf.MyUpper' USING JAR
> 'hdfs:///users/ravindra/customUDF2.jar'")
>
> I have put the udf jar in the hdfs at the path given above. The same
> command works well in the hive shell but failing here
Hi All,
I am trying to create custom udfs with hiveContext as given below -
scala> hiveContext.sql ("CREATE TEMPORARY FUNCTION sample_to_upper AS
'com.abc.api.udf.MyUpper' USING JAR
'hdfs:///users/ravindra/customUDF2.jar'")
I have put the udf jar in the hdfs
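For comparison, a hedged sketch (Spark 1.3) of registering an equivalent UDF
directly from the shell without shipping a jar - the function body and the table
name are illustrative, not the poster's MyUpper class:

  scala> hiveContext.udf.register("sample_to_upper", (s: String) => s.toUpperCase)
  scala> hiveContext.sql("SELECT sample_to_upper(name) FROM some_table").collect()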
o http://kafka.apache.org/081/documentation.html#kafkahadoopconsumerapi
>
>
> 2015-05-06 12:22 GMT+08:00 MrAsanjar . :
>
>> why not try https://github.com/linkedin/camus - camus is kafka to HDFS
>> pipeline
>>
>> On Tue, May 5, 2015 at 11:13 PM, Rendy Bambang Junior <
Also Kafka has a Hadoop consumer API for doing such things, please refer to
http://kafka.apache.org/081/documentation.html#kafkahadoopconsumerapi
2015-05-06 12:22 GMT+08:00 MrAsanjar . :
> why not try https://github.com/linkedin/camus - camus is kafka to HDFS
> pipeline
>
> On Tue,
-sparkcontext-textfile).
Thanks
> On May 5, 2015, at 5:59 AM, Oleg Ruchovets wrote:
>
> Hi
>We are using pyspark 1.3 and input is text files located on hdfs.
>
> file structure
>
> file1.txt
> file2.txt
>
>
Why not try https://github.com/linkedin/camus - Camus is a Kafka-to-HDFS
pipeline.
On Tue, May 5, 2015 at 11:13 PM, Rendy Bambang Junior <
rendy.b.jun...@gmail.com> wrote:
> Hi all,
>
> I am planning to load data from Kafka to HDFS. Is it normal to use spark
> streaming to load
Hi all,
I am planning to load data from Kafka to HDFS. Is it normal to use Spark
Streaming to load data from Kafka to HDFS? What are the concerns in doing this?
There is no processing to be done by Spark; it only stores data to HDFS
from Kafka, both for storage and for further Spark processing.
Rendy
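A minimal sketch of the straightforward approach, assuming the
spark-streaming-kafka artifact is on the classpath (the ZooKeeper quorum, group
id, topic name, and paths below are placeholders):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka.KafkaUtils

  val conf = new SparkConf().setAppName("KafkaToHdfs")
  val ssc = new StreamingContext(conf, Seconds(60))
  // One receiver on the "events" topic; the second tuple element is the payload.
  val messages = KafkaUtils.createStream(ssc, "zk1:2181", "hdfs-archiver", Map("events" -> 1))
  // Each 60-second batch is written to its own HDFS directory.
  messages.map(_._2).saveAsTextFiles("hdfs:///data/kafka/events", "txt")
  ssc.start()
  ssc.awaitTermination()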
data node or make minreplication to 0. Hdfs is trying
> to replicate at least one more copy and not able to find another DN to do
> that.
> On 6 May 2015 09:37, "Sudarshan Murty" wrote:
>
>> Another thing - could it be a permission problem ?
>> It creates al
Try to add one more data node, or set the minimum replication to 0. HDFS is trying
to replicate at least one more copy and is not able to find another DN to do
that.
On 6 May 2015 09:37, "Sudarshan Murty" wrote:
> Another thing - could it be a permission problem?
> It creates all the directo
eem to indicate that the system is aware that a datanode exists
> but is excluded from the operation. So, it looks like it is not partitioned
> and Ambari indicates that HDFS is in good health with one NN, one SN, one
> DN.
> I am unable to figure out what the issue is.
> thanks fo
- which seem to indicate that the system is aware that a datanode exists
but is excluded from the operation. So, it looks like it is not partitioned
and Ambari indicates that HDFS is in good health with one NN, one SN, one
DN.
I am unable to figure out what the issue is.
thanks for your help.
On Tue, May
What happens when you try to put files into your HDFS from the local filesystem?
It looks like it's an HDFS issue rather than a Spark thing.
On 6 May 2015 05:04, "Sudarshan" wrote:
>
> I have searched all replies to this question & not found an answer.
>
> I am running standalone Sp
I have searched all replies to this question & not found an answer. I am
running standalone Spark 1.3.1 and Hortonworks' HDP 2.2 VM, side by side, on
the same machine, and trying to write the output of the wordcount program into HDFS
(it works fine writing to a local file, /tmp/wordcount). Only line I
Hi,
We are using PySpark 1.3 and the input is text files located on HDFS.
File structure:
file1.txt
file2.txt
file1.txt
file2.txt
...
Question:
1) What is the way to provide as an input for PySpark job multiple
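A hedged sketch of the usual options, shown from the Scala shell (the same calls
exist in PySpark); paths are placeholders:

  // textFile accepts comma-separated paths and glob patterns.
  val twoFiles = sc.textFile("hdfs:///data/in/file1.txt,hdfs:///data/in/file2.txt")
  val allFiles = sc.textFile("hdfs:///data/in/*.txt")

  // When per-file grouping matters, wholeTextFiles keeps the file name as the key.
  val byFile = sc.wholeTextFiles("hdfs:///data/in/")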
Hi,
I am building a Mesos cluster for the purpose of using it to run
Spark workloads (in addition to other frameworks). I am under the
impression that it is preferable/recommended to run the HDFS datanode
process and the Spark slave on the same physical node
my system having a Hadoop cluster. I want to process data
stored in HDFS through Spark.
When I am running code in eclipse it is giving the
following warning repeatedly:
scheduler.TaskSchedulerImpl: Initial job has not
accepted any resources
.
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Apr 20, 2015 at 12:22 PM, madhvi wrote:
>>
>>> Hi All,
>>>
>>> I am new to spark and have installed spark cluster over my system having
>>> hadoop cluster.I want to process d
Not sure what would slow it down, as the repartition completes equally fast
on all nodes, implying that the data is available on all of them; then there are
a few computation steps, none of them local to the master.
On Mon, Apr 20, 2015 at 12:57 PM, Sean Owen wrote:
> What machines are HDFS data no
other 2
nodes
-Original Message-
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Monday, April 20, 2015 12:57 PM
To: jamborta
Cc: user@spark.apache.org
Subject: Re: writing to hdfs on master node much faster
What machines are HDFS data nodes -- just your master? that would explain it
What machines are HDFS data nodes -- just your master? that would
explain it. Otherwise, is it actually the write that's slow or is
something else you're doing much faster on the master for other
reasons maybe? like you're actually shipping data via the master first
in some local
Hi all,
I have a three-node cluster with identical hardware. I am trying a workflow
where it reads data from HDFS, repartitions it and runs a few map operations,
then writes the results back to HDFS.
It looks like all the computation, including the repartitioning and the
maps, completes within
On Monday 20 April 2015 03:18 PM, Archit Thakur wrote:
There are lot of similar problems shared and resolved by users on this
same portal. I have been part of those discussions before, Search
those, Please Try them and let us know, if you still face problems.
Thanks and Regards,
Archit Thakur.
There are a lot of similar problems shared and resolved by users on this same
portal. I have been part of those discussions before. Search those, please
try them, and let us know if you still face problems.
Thanks and Regards,
Archit Thakur.
On Mon, Apr 20, 2015 at 3:05 PM, madhvi wrote:
> On Mo
On Monday 20 April 2015 02:52 PM, SURAJ SHETH wrote:
Hi Madhvi,
I think the memory requested by your job, i.e. 2.0 GB is higher than
what is available.
Please request for 256 MB explicitly while creating Spark Context and
try again.
Thanks and Regards,
Suraj Sheth
Tried the same but still
Hi Madhvi,
I think the memory requested by your job, i.e. 2.0 GB, is higher than what
is available.
Please request 256 MB explicitly while creating the Spark context and try
again.
Thanks and Regards,
Suraj Sheth
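A minimal sketch of that suggestion in Scala (values are illustrative): request
less executor memory explicitly when building the context so the standalone
master can actually satisfy it:

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("JavaWordCount")
    .setMaster("spark://192.168.0.119:7077")
    .set("spark.executor.memory", "256m")  // ask for less than the worker offers
    .set("spark.cores.max", "2")           // optionally also cap the cores requested
  val sc = new SparkContext(conf)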
he master uri
>> as shown in the web UI's top left corner like: spark://someIPorHost:7077
>> and it should be fine.
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Apr 20, 2015 at 12:22 PM, madhvi wrote:
>>
>>> Hi All,
>>>
>>
SparkConf sparkConf = new SparkConf().setAppName("JavaWordCount");
sparkConf.setMaster("spark://192.168.0.119:7077");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://192.168.0.119:9000");
n Mon, Apr 20, 2015 at 12:22 PM, madhvi wrote:
>
>> Hi All,
>>
>> I am new to spark and have installed spark cluster over my system having
>> hadoop cluster.I want to process data stored in HDFS through spark.
>>
>> When I am running code in eclipse it is g
I am new to Spark and have installed a Spark cluster over my system,
which has a Hadoop cluster. I want to process data stored in HDFS
through Spark.
When I am running the code in Eclipse it gives the following
warning repeatedly:
scheduler.TaskSchedulerImpl: Initial job has no
nstalled spark cluster over my system having
> hadoop cluster.I want to process data stored in HDFS through spark.
>
> When I am running code in eclipse it is giving the following warning
> repeatedly:
> scheduler.TaskSchedulerImpl: Initial job has not accepted any resources;
> chec
Hi All,
I am new to Spark and have installed a Spark cluster over my system, which has
a Hadoop cluster. I want to process data stored in HDFS through Spark.
When I am running the code in Eclipse it gives the following warning
repeatedly:
scheduler.TaskSchedulerImpl: Initial job has not accepted any
tFile("tachyon://datanode8.bitauto.dmp:19998/apps/tachyon/adClick");
> Next,I just save this DataFrame onto HDFS with below code.It will generate
> 36 parquet files too,but the size of each file is about 265M
>
> tfs.repartition(36).saveAsParquetFile("/user/zhangxf/adClick
Thanks
From: Nick Pentreath [mailto:nick.pentre...@gmail.com]
Sent: Tuesday, April 07, 2015 5:52 PM
To: Puneet Kumar Ojha
Cc: user@spark.apache.org
Subject: Re: Difference between textFile Vs hadoopFile (textInoutFormat) on
HDFS data
There is no difference - textFile calls hadoopFile with a
ecutor, it
>> will lower the memory requirement, with running in a slower speed.
>>
>> Yong
>>
>> --
>> Date: Wed, 8 Apr 2015 04:57:22 +0800
>> Subject: Re: 'Java heap space' error occured when query 4G data file from
oncurrency of your executor, it
> will lower the memory requirement, with running in a slower speed.
>
> Yong
>
> --
> Date: Wed, 8 Apr 2015 04:57:22 +0800
> Subject: Re: 'Java heap space' error occured when query 4G data file from
&
ower the
cores for the executor by setting "-Dspark.deploy.defaultCores=". When you do not
have enough memory, reduce the concurrency of your executor; it will lower the
memory requirement, at the cost of running at a slower speed.
Yong
Date: Wed, 8 Apr 2015 04:57:22 +0800
Subject: Re: 'Java heap space' error
Any help, please?
Help me get the configuration right.
李铖 wrote on Tuesday, April 7, 2015:
> In my dev-test env I have 3 virtual machines; every machine has 12G
> memory and 8 CPU cores.
>
> Here is spark-defaults.conf and spark-env.sh. Maybe some config is not
> right.
>
> I run this command: *spark-submit --master yarn-
outFormat) when data is present in HDFS? Will there be any performance
> gain that can be observed?
> Puneet Kumar Ojha
> Data Architect | PubMatic<http://www.pubmatic.com/>
Hi,
Is there any difference between textFile vs hadoopFile
(TextInputFormat) when data is present in HDFS? Will there be any performance
gain that can be observed?
Puneet Kumar Ojha
Data Architect | PubMatic<http://www.pubmatic.com/>
In my dev-test env I have 3 virtual machines; every machine has 12G
memory and 8 CPU cores.
Here is spark-defaults.conf and spark-env.sh. Maybe some config is not right.
I run this command: *spark-submit --master yarn-client --driver-memory 7g
--executor-memory 6g /home/hadoop/spark/main.py*
exceptio
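A hedged sketch (not the poster's actual config) of the kind of settings the
advice above points to: keep executor memory within what the 12G VMs can give
and lower per-executor concurrency so each task gets a bigger share of the heap.
The same values can equally be passed as --conf flags to spark-submit:

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .setAppName("query-4g-file")
    .set("spark.executor.memory", "6g")      // stay well under the 12G per VM
    .set("spark.executor.cores", "2")        // fewer concurrent tasks per executor
    .set("spark.default.parallelism", "48")  // more, smaller partitions per stage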
In 1.3, you can use model.save(sc, "hdfs path"). You can check the
code examples here:
http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#examples.
-Xiangrui
On Fri, Apr 3, 2015 at 2:17 PM, Justin Yip wrote:
> Hello Zhou,
>
> You can look at the recomme
Hello Zhou,
You can look at the recommendation template
<http://templates.prediction.io/PredictionIO/template-scala-parallel-recommendation>
of PredictionIO. PredictionIO is built on the top of spark. And this
template illustrates how you can save the ALS model to HDFS and the reload
it
I am new to MLlib so I have a basic question: is it possible to save MLlib
models (particularly CF models) to HDFS and then reload them later? If yes, could
you share some sample code (I could not find it in the MLlib tutorial)? Thanks!
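A short sketch of the Spark 1.3 approach mentioned above for an ALS model; the
ratings file and model paths are placeholders:

  import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}

  val ratings = sc.textFile("hdfs:///data/ratings.csv").map { line =>
    val Array(user, product, rating) = line.split(',')
    Rating(user.toInt, product.toInt, rating.toDouble)
  }
  val model = ALS.train(ratings, 10, 10, 0.01)
  // Persist the factor matrices to HDFS...
  model.save(sc, "hdfs:///models/als")
  // ...and load them back later, e.g. from another application.
  val reloaded = MatrixFactorizationModel.load(sc, "hdfs:///models/als")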
/ byte[]. Review what
> you are writing since it is not BytesWritable / Text.
>
> On Thu, Apr 2, 2015 at 3:40 AM, Nick Travers
> wrote:
> > I'm actually running this in a separate environment to our HDFS cluster.
> >
> > I think I've been able to sort out th
't Spark-specific; you do not have a
SequenceFile of byte[] / String, but of byte[] / byte[]. Review what
you are writing since it is not BytesWritable / Text.
On Thu, Apr 2, 2015 at 3:40 AM, Nick Travers wrote:
> I'm actually running this in a separate environment to our HDFS cluster.
>
I'm actually running this in a separate environment to our HDFS cluster.
I think I've been able to sort out the issue by copying
/opt/cloudera/parcels/CDH/lib to the machine I'm running this on (I'm just
using a one-worker setup at present) and adding the following to
s
o spark-env.sh
> file, but still nothing.
>
> On Wed, Apr 1, 2015 at 7:19 PM, Xianjin YE <advance...@gmail.com> wrote:
> > Can you read snappy compressed file in hdfs? Looks like the libsnappy.so
> > is not in the hadoop native lib path.
Apr 1, 2015 at 7:19 PM, Xianjin YE wrote:
> Can you read snappy compressed file in hdfs? Looks like the libsnappy.so
> is not in the hadoop native lib path.
>
> On Thursday, April 2, 2015 at 10:13 AM, Nick Travers wrote:
>
> Has anyone else encountered the following error when
Can you read a snappy compressed file in HDFS? It looks like libsnappy.so is
not in the Hadoop native lib path.
On Thursday, April 2, 2015 at 10:13 AM, Nick Travers wrote:
> Has anyone else encountered the following error when trying to read a snappy
> compressed sequence file fro
Has anyone else encountered the following error when trying to read a snappy
compressed sequence file from HDFS?
*java.lang.UnsatisfiedLinkError:
org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z*
The following works for me when the file is uncompressed:
import org.apache.hadoop.io
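For reference, a hedged sketch of reading such a file once the native Snappy
library is on the path, following Sean's point elsewhere in the thread that the
pairs are byte[] / byte[] rather than byte[] / String (the path is a placeholder):

  import org.apache.hadoop.io.BytesWritable

  val seq = sc.sequenceFile("hdfs:///data/events.seq",
                            classOf[BytesWritable], classOf[BytesWritable])
  // Copy out of the reused Writable objects before collecting or caching.
  val first = seq.map { case (k, v) => (k.copyBytes(), v.copyBytes()) }.first()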
memory) and the rest
>> >for regular mesos tasks?
>>
>> >This means, on each slave node I would have tachyon worker (+ hdfs
>> >configuration to talk to s3 or the hdfs datanode) and the mesos slave
>> ?process. Is this correct?
>>
>>
>>
>
>
> --
> --Sean
>
>
--
Haoyuan Li
AMPLab, EECS, UC Berkeley
http://www.cs.berkeley.edu/~haoyuan/
>
> >This means, on each slave node I would have tachyon worker (+ hdfs
> >configuration to talk to s3 or the hdfs datanode) and the mesos slave
> ?process. Is this correct?
>
>
>
--
--Sean
total memory) and the rest
> for regular mesos tasks?
>
>
This depends on your machine spec and workload. The high-level idea is to
give Tachyon a memory size equal to the total memory size of the machine
minus the memory needs of the other processes.
> This means, on each sla
Hi Haoyuan,
So on each mesos slave node I should allocate/section off some amount
of memory for tachyon (let's say 50% of the total memory) and the rest
for regular mesos tasks?
This means, on each slave node I would have tachyon worker (+
s deployment. I can't seem to figure out the "best
> practices" around HDFS and Tachyon. The documentation about Spark's
> data-locality section seems to point that each of my mesos slave nodes
> should also run a hdfs datanode. This seems fine but I can't seem to
Hi,
I am fairly new to the spark ecosystem and I have been trying to setup
a spark on mesos deployment. I can't seem to figure out the "best
practices" around HDFS and Tachyon. The documentation about Spark's
data-locality section
Try running it like this:
sudo -u hdfs spark-submit --class org.apache.spark.examples.SparkPi
--deploy-mode cluster --master yarn
hdfs:///user/spark/spark-examples-1.2.0-cdh5.3.2-hadoop2.5.0-cdh5.3.2.jar 10
Caveats:
1) Make sure the permissions of /user/nick is 775 or 777.
2) No need for
Client mode would not support HDFS jar extraction.
I tried this:
sudo -u hdfs spark-submit --class org.apache.spark.examples.SparkPi
--deploy-mode cluster --master yarn
hdfs:///user/spark/spark-examples-1.2.0-cdh5.3.2-hadoop2.5.0-cdh5.3.2.jar 10
And it worked.
I think the jar file has to be local; a jar in HDFS is not supported yet in Spark.
See this answer:
http://stackoverflow.com/questions/28739729/spark-submit-not-working-when-application-jar-is-in-hdfs
> Date: Sun, 29 Mar 2015 22:34:46 -0700
> From: n.e.trav...@gmail.com
> To: user@spark.a
What happens when you do:
sc.textFile("hdfs://path/to/the_file.txt")
Thanks
Best Regards
On Mon, Mar 30, 2015 at 11:04 AM, Nick Travers
wrote:
> Hi List,
>
> I'm following this example here
> <
> https://github.com/databricks/learning-spark/tree/master/min
\
--class com.oreilly.learningsparkexamples.mini.scala.WordCount \
hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar
\
hdfs://host.domain.ex/user/nickt/linkage
hdfs://host.domain.ex/user/nickt/wordcounts
The jar is submitted fine and I can see it appear on the driver node (i.e.
connecting to and reading f
Made it work by using yarn-cluster as master instead of local.
Looking at SparkSubmit#addJarToClasspath():
uri.getScheme match {
case "file" | "local" =>
...
case _ =>
printWarning(s"Skip remote jar $uri.")
It seems hdfs scheme is not recognized.
FYI
On Thu, Feb 26, 2015 at 6:09 PM, dilm
Hi, did you resolve this issue or just work around it by keeping your
application jar local? I'm running into the same issue with 1.3.
From the standpoint of Spark SQL accessing the files - when it is hitting
Hive, it is in effect hitting HDFS as well. Hive provides a great
framework where the table structure is already well defined. But
underneath it, Hive is just accessing files from HDFS, so you are hitting
HDFS either
That's a Hadoop version incompatibility issue; you need to make sure
everything runs on the same version.
Thanks
Best Regards
On Sat, Mar 21, 2015 at 1:24 AM, morfious902002 wrote:
> Hi,
> I created a cluster using spark-ec2 script. But it installs HDFS version
> 1.0. I would li
Hi,
I created a cluster using the spark-ec2 script. But it installs HDFS version
1.0. I would like to use this cluster to connect to Hive installed on a
Cloudera CDH 5.3 cluster. But I am getting the following error:
org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot
communicate
I am trying to explain that these are not either/or decisions. You are
likely going to be storing the data on HDFS no matter what other choices
you make.
You can use parquet to store the data whether or not you are addressing
files directly on HDFS or using the Hive Metastore to locate the
Did you mean that parquet is faster than hive format, and hive format is
faster than hdfs, for Spark SQL?
: )
2015-03-18 1:23 GMT+08:00 Michael Armbrust :
> The performance has more to do with the particular format you are using,
> not where the metadata is coming from. Even hive tabl
The performance has more to do with the particular format you are using,
not where the metadata is coming from. Even Hive tables are usually read from
files on HDFS.
You probably should use HiveContext, as its query language is more powerful
than SQLContext's. Also, parquet is usually the faster
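A hedged sketch of that suggestion: query through HiveContext (metadata from the
metastore, data read from HDFS) and keep the data in Parquet. The table and path
names are made up for illustration:

  import org.apache.spark.sql.hive.HiveContext

  val hiveContext = new HiveContext(sc)
  val agg = hiveContext.sql(
    "SELECT user_id, COUNT(*) AS clicks FROM web_clicks GROUP BY user_id")
  // Persist as Parquet for faster subsequent scans...
  agg.saveAsParquetFile("hdfs:///warehouse/web_clicks_agg.parquet")
  // ...and read the Parquet files back directly later.
  val reread = hiveContext.parquetFile("hdfs:///warehouse/web_clicks_agg.parquet")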
This has been fixed by https://github.com/apache/spark/pull/5020
On 3/18/15 12:24 AM, Franz Graf wrote:
Hi all,
today we tested Spark 1.3.0.
Everything went pretty fine except that I seem to be unable to save an
RDD as parquet to HDFS.
A minimum example is:
import sqlContext.implicits
Hi all,
today we tested Spark 1.3.0.
Everything went pretty fine except that I seem to be unable to save an
RDD as parquet to HDFS.
A minimum example is:
import sqlContext.implicits._
// Reading works fine!
val foo: RDD[String] = spark.textFile("hdfs://")
// this work
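The failing line is cut off above; for reference, a hedged sketch of what writing
an RDD as Parquet looks like in Spark 1.3 (the case class and output path are
illustrative, not Franz's exact code):

  import sqlContext.implicits._

  case class Line(text: String)
  val df = foo.map(Line(_)).toDF()   // RDD[String] -> DataFrame
  df.saveAsParquetFile("hdfs:///tmp/foo.parquet")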
Hi everybody,
I am new to Spark. Now I want to do interactive SQL queries using Spark SQL.
Spark SQL can run on top of Hive or load files directly from HDFS.
Which is better or faster?
Thanks.
All,
Does anyone have any reference to a publication or other, informal sources
(blogs, notes) showing the
performance of Spark on HDFS vs. other shared (Lustre, etc.) or other file
systems (NFS)?
I need this for formal performance research.
We are currently doing research into this on a very
You can add resolvers on SBT using
resolvers +=
  "Sonatype OSS Snapshots" at
    "https://oss.sonatype.org/content/repositories/snapshots"
On Thu, Feb 26, 2015 at 4:09 PM, MEETHU MATHEW
wrote:
> Hi,
>
> I am not able to read from HDFS(Intel distribution hadoop,Ha
Hello,
I'm new to ec2. I've set up a spark cluster on ec2 and am using
persistent-hdfs with the data nodes mounting ebs. I launched my cluster
using spot-instances
./spark-ec2 -k mykeypair -i ~/aws/mykeypair.pem -t m3.xlarge -s 4 -z
us-east-1c --spark-version=1.2.0 --spot-price=.032
ng data to Spark from NFS v HDFS?
>
As I understand it, one performance advantage of using HDFS is that the
task will be computed at a cluster node that has data on its local disk
already, so the tasks go to where the data is. In the case of NFS, all data
must be downloaded from the file server(
Hello,
I understand Spark can be used with Hadoop or standalone. I have certain
questions related to the use of the correct filesystem for Spark data.
What is the efficiency trade-off in feeding data to Spark from NFS vs. HDFS?
If one is not using Hadoop, is it still usual to house data in HDFS for
Spark to
I'm trying to run a Spark application using bin/spark-submit. When I
reference my application jar inside my local filesystem, it works. However,
when I copy my application jar to a directory in HDFS, I get the following
exception:
Warning: Skip remote jar
hdfs://localhost:9000/user/hdfs
Hi,
I am not able to read from HDFS (Intel distribution Hadoop, Hadoop version
1.0.3) from spark-shell (Spark version 1.2.1). I built Spark using the
command mvn -Dhadoop.version=1.0.3 clean package, started spark-shell, and
read an HDFS file using sc.textFile(), and the exception is
WARN
There was already a thread around this; if I understood your question
correctly, you can go through it here:
https://mail-archives.apache.org/mod_mbox/spark-user/201502.mbox/%3ccannjawtrp0nd3odz-5-_ya351rin81q-9+f2u-qn+vruqy+...@mail.gmail.com%3E
Thanks
Best Regards
On Thu, Feb 19, 2015 at 8:16 PM, Chic
Hi all,
In Spark Streaming I want to use DStream.saveAsTextFiles with bulk writing,
because the normal saveAsTextFiles cannot finish within the configured batch
interval.
Maybe a common pool of writers, or another worker assigned for bulk writing?
Thanks!
B/R
Jichao
;> and then pass that as the final argument.
>>
>> On Wed, Feb 11, 2015 at 6:35 AM, Akhil Das
>> wrote:
>> > Did you try :
>> >
>> > temp.saveAsHadoopFiles("DailyCSV",".txt", String.class,
>> > String.class,(Class)
>>
Feb 11, 2015 at 6:35 AM, Akhil Das
> wrote:
> > Did you try :
> >
> > temp.saveAsHadoopFiles("DailyCSV",".txt", String.class,
> String.class,(Class)
> > TextOutputFormat.class);
> >
> > Thanks
> > Best Regards
> >
> > O
Looks like this is caused by issue SPARK-5008:
https://issues.apache.org/jira/browse/SPARK-5008
On 13 February 2015 at 19:04, Joe Wass wrote:
> I've updated to Spark 1.2.0 and the EC2 and the persistent-hdfs behaviour
> appears to have changed.
>
> My launch script is
>
wal/blinkdb) which
> seems to work only with Spark 0.9. However, if I want to access HDFS I need
> to compile Spark against Hadoop version which is running on my
> cluster(2.6.0). Hence, the versions problem ...
>
>
>
> On Friday, February 13, 2015 11:28 AM, Sean Owen wrote:
>
I am trying to run BlinkDB (https://github.com/sameeragarwal/blinkdb), which
seems to work only with Spark 0.9. However, if I want to access HDFS I need to
compile Spark against the Hadoop version which is running on my cluster (2.6.0).
Hence the versions problem ...
On Friday, February 13
015 at 7:13 PM, Grandl Robert
wrote:
> Hi guys,
>
> Probably a dummy question. Do you know how to compile Spark 0.9 to easily
> integrate with HDFS 2.6.0 ?
>
> I was trying
> sbt/sbt -Pyarn -Phadoop-2.6 assembly
> or
> mvn -Dhadoop.version=2.6.0 -DskipTests clean package
>
> but none of these approaches succeeded.
>
> Thanks,
> Robert