Did you know that the simple demo program for reading characters from a file didn't
work?
Who wrote that simple hello-world-type little program?
jane thorpe
janethor...@aol.com
-Original Message-
From: jane thorpe
To: somplasticllc ; user
Sent: Fri, 3 Apr 2020 2:44
Subject: Re: HDFS file hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt
Thanks darling
I tried this and it worked:

hdfs getconf -confKey fs.defaultFS
hdfs://localhost:9000

scala> :paste
// Entering paste mode (ctrl-D to finish)
val textFile = sc.textFile("hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt")
val counts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

textFile: org.apache.spark.rdd.RDD[String] = hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt MapPartitionsRDD[91] at textFile at <console>:27
counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[94] at reduceByKey at <console>:30

scala> :quit
jane thorpe
janethor...@aol.com
-Original Message-
From: Som Lima
CC: user
Sent: Tue, 31 Mar 2020
Hi Jane
Try this example
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala
Som
On Tue, 31 Mar 2020, 21:34 jane thorpe, wrote:
> hi,
>
> Are there setup instructions on the website for
>
hi,
Are there setup instructions on the website for
spark-3.0.0-preview2-bin-hadoop2.7? I can run the same program for HDFS format:

val textFile = sc.textFile("hdfs://...")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
Hi,
when I create a Dataset by reading a JSON file from HDFS, I found that the partition
count of the Dataset does not equal the number of file blocks,
so what determines the partition count of the Dataset when I read a file from HDFS?
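A quick way to see what is going on (a sketch only, assuming Spark 2.x and a SparkSession named spark; the HDFS path is hypothetical):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("partition-check").getOrCreate()
val ds = spark.read.json("hdfs:///data/events.json")
// partition count of the scan, usually not the HDFS block count
println(ds.rdd.getNumPartitions)
// for file-based sources the split size is driven mainly by
// spark.sql.files.maxPartitionBytes (128 MB by default) together with
// spark.sql.files.openCostInBytes and the default parallelism,
// so it only loosely follows the HDFS block layout
spark.conf.set("spark.sql.files.maxPartitionBytes", 64L * 1024 * 1024)
val ds2 = spark.read.json("hdfs:///data/events.json")   // re-read picks up the new split size
println(ds2.rdd.getNumPartitions)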
Hello Experts,
I am required to use a specific user id to save files on a remote HDFS
cluster. Remote in the sense that the Spark jobs run on EMR and write to a CDH
cluster, hence I cannot change the hdfs-site.xml etc. to point to the
destination cluster. As a result I am using webhdfs to save the files.
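Something like the following might work (a sketch only, assuming simple, non-Kerberos auth on the destination cluster and that webhdfs is enabled there; the host, port, path, user name, and the DataFrame df are hypothetical). The remote user is taken from the HADOOP_USER_NAME environment variable, which has to be visible to the driver and the executors:

// run with HADOOP_USER_NAME=etl_user exported, or
// --conf spark.executorEnv.HADOOP_USER_NAME=etl_user
df.write
  .mode("overwrite")
  .parquet("webhdfs://remote-namenode.example.com:50070/user/etl_user/output")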
Hi All,

object Step1 {
  def main(args: Array[String]) = {
    val sparkConf = new SparkConf().setAppName("my-app")
    val sc = new SparkContext(sparkConf)
    val hiveSqlContext: HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
Hi Ayan,
thanks for the explanation,
I am aware of compression codecs.
How is the locality level set?
Is it done by Spark or by YARN?
Please let me know,
Thanks,
Yesh
On Nov 22, 2016 5:13 PM, "ayan guha" wrote:
Hi
RACK_LOCAL = Task running on the same rack but not on
Hi
RACK_LOCAL = task running on the same rack but not on the same node where
the data is.
NODE_LOCAL = task and data are co-located. Probably you were looking for
this one?
GZIP - reads go through the GZIP codec, but because it is non-splittable, you
can have at most 1 task reading a gzip file. Now, the
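To make the non-splittable point concrete, a small sketch (the HDFS paths are hypothetical; the .gz file is a compressed copy of the same data):

val gz = sc.textFile("hdfs:///data/big.csv.gz")
println(gz.partitions.length)      // 1: gzip cannot be split, so a single task reads the whole file
val plain = sc.textFile("hdfs:///data/big.csv")
println(plain.partitions.length)   // roughly one partition per HDFS block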
Hi Ayan,
we have default rack topology.
-Yeshwanth
Can you Imagine what I would do if I could do all I can - Art of War
On Tue, Nov 22, 2016 at 6:37 AM, ayan guha wrote:
> Because snappy is not splittable, so single task makes sense.
>
> Are sure about rack topology?
Because snappy is not splittable, a single task makes sense.
Are you sure about the rack topology? I.e. is 225 in a different rack than 227 or
228? What does your topology file say?
On 22 Nov 2016 10:14, "yeshwanth kumar" wrote:
> Thanks for your reply,
>
> i can definitely
Thanks for your reply,
I can definitely change the underlying compression format,
but I am trying to understand the locality level:
why did the executor run on a different node, where the blocks are not present,
when the locality level is RACK_LOCAL?
Can you shed some light on this?
Thanks,
Yesh
Use a format such as ORC, Parquet or Avro, because they support any compression type
with parallel processing. Alternatively, split your file into several smaller
ones. Another alternative would be bzip2 (but slower in general) or LZO
(usually not included by default in many distributions).
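For example, a one-off conversion of the snappy CSV into Parquet restores parallelism for every later read (a sketch only, assuming Spark 2.x, that the Snappy codec is available to the Hadoop input format, and hypothetical paths and options):

// the initial read of the non-splittable .snappy file is still a single task
val raw = spark.read.option("header", "true").csv("hdfs:///warehouse/events.csv.snappy")

// rewrite as Parquet (snappy-compressed by default), which is splittable by row group
raw.write.mode("overwrite").parquet("hdfs:///warehouse/events_parquet")

// later reads run in parallel
val events = spark.read.parquet("hdfs:///warehouse/events_parquet")
println(events.rdd.getNumPartitions)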
Try changing compression to bzip2 or lzo. For reference -
http://comphadoop.weebly.com
Thanks,
Aniket
On Mon, Nov 21, 2016, 10:18 PM yeshwanth kumar
wrote:
> Hi,
>
> we are running Hive on Spark, we have an external table over snappy
> compressed csv file of size 917.4 M
Hi,
we are running Hive on Spark; we have an external table over a snappy-compressed
CSV file of size 917.4 MB.
The HDFS block size is set to 256 MB.
As per my understanding, if I run a query over that external table, it
should launch 4 tasks, one for each block,
but I am seeing one executor and one
Hi,
Being new here, I hope to get assistance from you guys. I wonder whether there is
an elegant way to get some directory under some path. For example, I have a
path on HDFS like /a/b/c/d/e/f, and I am given /a/b/c; is there any straightforward
way to get the path /a/b/c/d/e? I think I can do
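If the full path is already known, the Hadoop Path API gives the enclosing directory directly (a small sketch; the paths are the ones from the question):

import org.apache.hadoop.fs.Path

val full = new Path("/a/b/c/d/e/f")
val dir  = full.getParent          // Path("/a/b/c/d/e")

If only the prefix /a/b/c is known, you would first have to list or glob under it with FileSystem.listStatus / globStatus to discover the deeper entries.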
Hey, you can use repartition and set it to 1,
as in this example:

unionDStream.foreachRDD((rdd, time) => {
  val count = rdd.count()
  println("count" + count)
  if (count > 0) {
    print("rdd partition=" + rdd.partitions.length)
    val outputRDD = rdd.repartition(1)   // likely completion of the truncated line: collapse to one partition before writing
Hi All,
I am working with Kafka and Spark Streaming, and I want to write the
streaming output to a single file. dstream.saveAsTextFiles() is creating
files in different folders. Is there a way to write to a single folder? Or
else, if written to different folders, how do I merge them?
Thanks,
Padma
I am having issues setting up my spark environment to read from a
kerberized HDFS file location.
At the moment I have tried to do the following:

import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

def ugiDoAs[T](ugi: Option[UserGroupInformation])(code: => T) = ugi match {
  case None => code
  case Some(u) => u.doAs(new PrivilegedExceptionAction[T] {   // likely completion of the truncated snippet
    override def run(): T = code
  })
}
On the line preceding the one that the compiler is complaining about (which
doesn't actually have a problem in itself), you declare df as
"df"+fileName, making it a string. Then you try to assign a DataFrame to
df, but it's already a string. I don't quite understand your intent with
that previous
Hi,
I am trying to dynamically create DataFrames by reading subdirectories under a
parent directory.
My code looks like:
> import org.apache.spark._
> import org.apache.spark.sql._
> val hadoopConf = new org.apache.hadoop.conf.Configuration()
> val hdfsConn = org.apache.hadoop.fs.FileSystem.get(new
>
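A possible shape for this, following the earlier advice about not reusing a String variable for the DataFrame (a sketch only; the parent path and the parquet format are hypothetical placeholders, and a SQLContext named sqlContext is assumed):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val hadoopConf = new Configuration()
val fs = FileSystem.get(hadoopConf)
val subDirs = fs.listStatus(new Path("/data/parent")).filter(_.isDirectory).map(_.getPath)

// one DataFrame per subdirectory, keyed by directory name
val frames: Map[String, org.apache.spark.sql.DataFrame] =
  subDirs.map(p => p.getName -> sqlContext.read.parquet(p.toString)).toMap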
Hi all,
I have calculated a covariance; it's a Matrix type. Now I want to save
the result to HDFS, how can I do it?
thx
Matrix can be saved as a column of type MatrixUDT.
Hi Yanbo,
I'm using the Java language and the environment is Spark 1.4.1.
Can you tell me in more detail how to do it? The following is my code; how can
I save the cov to an HDFS file?
"
RowMatrix mat = new RowMatrix(rows.rdd());
Matrix cov = mat.computeCovariance();
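If converting to a DataFrame with MatrixUDT is more than you need, a simple alternative is to flatten the local Matrix into text lines and write them through an RDD (a sketch shown in Scala, matching the rest of this thread; the Java version is analogous, and the output path is hypothetical):

val cov: org.apache.spark.mllib.linalg.Matrix = mat.computeCovariance()
val lines = (0 until cov.numRows).map { i =>
  (0 until cov.numCols).map(j => cov(i, j)).mkString(",")   // one CSV line per matrix row
}
sc.parallelize(lines, 1).saveAsTextFile("hdfs:///tmp/covariance")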
Hi,
Is there a way to read a text file from inside a Spark executor? I need to
do this for a streaming application where we need to read a file (whose
contents would change) from a closure.
I cannot use the "sc.textFile" method since the SparkContext is not
serializable. I also cannot read a file
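One common workaround is to open the file with the Hadoop FileSystem API inside the task itself, so nothing non-serializable is captured by the closure (a sketch, assuming an RDD[String] named rdd; the lookup path is hypothetical):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.io.Source

rdd.mapPartitions { iter =>
  val path = new Path("hdfs:///config/lookup.txt")
  val fs = path.getFileSystem(new Configuration())   // resolved on the executor
  val in = fs.open(path)
  val lookup = Source.fromInputStream(in).getLines().toSet
  in.close()
  iter.filter(lookup.contains)
}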
You will need to use the HDFS API to do that.
Try something like:
val conf = sc.hadoopConfiguration
val fs = org.apache.hadoop.fs.FileSystem.get(conf)
fs.rename(new org.apache.hadoop.fs.Path("/path/on/hdfs/file.txt"), new
org.apache.hadoop.fs.Path("/path/on/hdfs/other/file.txt"))
Full API for
For some file on HDFS, it is necessary to copy/move it to another specific
HDFS directory, with the directory name kept unchanged. I just need to finish
it in a Spark program, not with HDFS commands. Is there any code for this? It
seems nothing turns up by searching the Spark docs...
Thanks in advance!
My guess is No, unless you are okay to read the data and write it back
again.
On Tue, Jan 5, 2016 at 2:07 PM, Zhiliang Zhu
wrote:
>
> For some file on hdfs, it is necessary to copy/move it to some another
> specific hdfs directory, and the directory name would keep
Hi!
I configured the log4j.properties file in the conf folder of Spark with the following
values...
log4j.appender.file.File=hdfs://
I expected all log files to log output to the file in HDFS.
Instead, the files are created locally.
Has anybody tried logging to HDFS by configuring log4j.properties?
Warm
This would require a special HDFS log4j appender. Alternatively try the flume
log4j appender
> On 08 Dec 2015, at 13:00, sunil m <260885smanik...@gmail.com> wrote:
>
> Hi!
> I configured log4j.properties file in conf folder of spark with following
> values...
>
>
directory after the program started running but never got any
output.
I even passed a non-existent directory as the input to the textFileStream,
but the application did not throw any error and ran just like it did when I
had the right directory.
I am able to access the same HDFS file system from non
Hi,
Is it possible to append to an existing (hdfs) file, through some Spark
action?
Should there be any reason not to use a hadoop append api within a Spark
job?
Thanks,
Matan
The reason is that RDDs are immutable, and so their input and output are naturally
immutable, not mutable.
On Wed, Jan 28, 2015 at 10:39 PM, Matan Safriel dev.ma...@gmail.com wrote:
Hi,
Is it possible to append to an existing (hdfs) file, through some Spark
action?
Should there be any reason not to use a hadoop
Hi, all
I wonder how to delete an HDFS file/directory using the Spark API?
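Spark itself does not expose a delete call; the usual route is the Hadoop FileSystem API through the SparkContext's configuration (a sketch; the path is hypothetical):

import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(sc.hadoopConfiguration)
fs.delete(new Path("/tmp/old-output"), true)   // true = recursive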
Hi Devies.
Thank you for the quick answer.
I have code like this:

sc = SparkContext(appName=TAD)
lines = sc.textFile(sys.argv[1], 1)
result = lines.map(doSplit).groupByKey().map(lambda (k, vc): traffic_process_model(k, vc))
result.saveAsTextFile(sys.argv[2])

Can you please give short
Hi,
I am trying to read an HDFS file from Spark scheduler code. I could find
how to write HDFS reads/writes in Java,
but I need to access HDFS from Spark using Scala. Can someone please help
me in this regard.
Like this?
val file = sc.textFile("hdfs://localhost:9000/sigmoid/input.txt")
Thanks
Best Regards
On Fri, Nov 14, 2014 at 9:02 PM, rapelly kartheek kartheek.m...@gmail.com
wrote:
Hi,
I am trying to read a HDFS file from Spark scheduler code. I could find
how to write hdfs read/writes in java
From: rapelly kartheek [mailto:kartheek.m...@gmail.com]
Sent: Friday, November 14, 2014 9:42 AM
To: Akhil Das; user@spark.apache.org
Subject: Re: Read a HDFS file from Spark using HDFS API

No. I am not accessing HDFS from either the shell or a Spark application. I
want to access it from Spark scheduler code.
I face an error when I use sc.textFile(), as the SparkContext wouldn't have
been created yet. So the error says: sc
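In scheduler-side code, where no SparkContext exists yet, the file can be read with the Hadoop FileSystem API alone (a sketch; it assumes core-site.xml/hdfs-site.xml are on the classpath, and reuses the example path from earlier in the thread):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.io.Source

val path = new Path("hdfs://localhost:9000/sigmoid/input.txt")
val fs = path.getFileSystem(new Configuration())
val in = fs.open(path)
val lines = Source.fromInputStream(in).getLines().toList
in.close()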
Hi,
I am running a pyspark job.
I need to serialize the final result to HDFS in binary files and have the ability
to give a name for the output files.
I found this post:
http://stackoverflow.com/questions/25293962/specifying-the-output-file-name-in-apache-spark
but it explains how to do it using Scala.
One option may be to call HDFS tools or a client to rename them after saveAsXXXFile().
On Thu, Nov 13, 2014 at 9:39 PM, Oleg Ruchovets oruchov...@gmail.com wrote:
Hi ,
I am running pyspark job.
I need serialize final result to hdfs in binary files and having ability to
give a name for output
Hi
I am trying to access a file in HDFS from spark source code. Basically, I
am tweaking the spark source code. I need to access a file in HDFS from the
source code of the spark. I am really not understanding how to go about
doing this.
Can someone please help me out in this regard.
Thank you!!
Instead of a file path, use an HDFS URI.
For example (in Python):
data = sc.textFile("hdfs://localhost/user/someuser/data")
On Wed, Nov 12, 2014 at 10:12 AM, rapelly kartheek kartheek.m...@gmail.com
wrote:
Hi
I am trying to access a file in HDFS from spark source code. Basically,
I am
Hi Sean,
I was following this link:
http://mund-consulting.com/Blog/Posts/file-operations-in-HDFS-using-java.aspx
But I was facing a FileSystem ambiguity error. I really don't have any idea
as to how to go about doing this.
Can you please help me with how to start off with this?
On Wed, Nov 12, 2014
Hi Simon,
I'm trying to do the same but I'm quite lost.
How did you do that? (Too direct? :)
Thanks and ciao,
r-
I don't quite get it...
mapPartitionsWithIndex takes a function that maps an integer index and an
iterator to another iterator. How does that help with retrieving the HDFS
file name?
I am obviously missing some context..
Thanks.
On May 30, 2014 1:28 AM, Aaron Davidson ilike...@gmail.com wrote
Hello,
A quick question about using spark to parse text-format CSV files stored on
hdfs.
I have something very simple:
sc.textFile("hdfs://test/path/*").map(line => line.split(",")).map(p =>
("XXX", p(0), p(2)))
Here, I want to replace XXX with a string, which is the current csv
filename for the line.
Currently there is not a way to do this using textFile(). However, you
could pretty straightforwardly define your own subclass of HadoopRDD [1] in
order to get access to this information (likely using
mapPartitionsWithIndex to look up the InputSplit for a particular
partition).
Note that
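A concrete version of that suggestion uses the HadoopRDD returned by sc.hadoopFile and its mapPartitionsWithInputSplit developer API to pull the file name out of the FileSplit (a sketch only; the glob path comes from the question and the column indices are illustrative):

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileSplit, TextInputFormat}
import org.apache.spark.rdd.HadoopRDD

val raw = sc.hadoopFile("hdfs://test/path/*",
  classOf[TextInputFormat], classOf[LongWritable], classOf[Text])

val withFileName = raw.asInstanceOf[HadoopRDD[LongWritable, Text]]
  .mapPartitionsWithInputSplit { (split, iter) =>
    val fileName = split.asInstanceOf[FileSplit].getPath.toString
    iter.map { case (_, line) =>
      val p = line.toString.split(",")
      (fileName, p(0), p(2))
    }
  }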
dependency to 0.18.0 in spark's
pom.xml.
Rebuilding the JAR with this configuration solves the issue.
-Anant
Hi Wisely,
Could you please post your pom.xml here.
Thanks
Egor, I encountered the same problem which you asked about in this thread:
http://mail-archives.apache.org/mod_mbox/spark-user/201402.mbox/%3CCAMrx5DwJVJS0g_FE7_2qwMu4Xf0y5VfV=tlyauv2kh5v4k6...@mail.gmail.com%3E
Have you fixed this problem?
I am using Shark to read a table which I have created on
Starting with Spark 0.9 the protobuf dependency we use is shaded and
cannot interfere with other protobuf libraries, including those in
Hadoop. Not sure what's going on in this case. Would someone who is
having this problem post exactly how they are building Spark?
- Patrick
On Fri, Mar 21, 2014
On Tue, Mar 18, 2014 at 12:56 PM, Ognen Duzlevski
og...@plainvanillagames.com wrote:
On 3/18/14, 4:49 AM, dmpou...@gmail.com wrote:
On Sunday, 2 March 2014 19:19:49 UTC+2, Aureliano Buendia wrote:
Is there a reason for spark using the older akka?
On Sun, Mar 2, 2014 at 1:53 PM, 1esha
Check this thread out:
http://apache-spark-user-list.1001560.n3.nabble.com/Error-reading-HDFS-file-using-spark-0-9-0-hadoop-2-2-0-incompatible-protobuf-2-5-and-2-4-1-tp2158p2807.html
-- you have to remove the conflicting akka and protobuf versions.
Thanks
Prasad.
Is the solution to exclude the 2.4.* dependency on protobuf, or will this
produce more
On 3/18/14, 4:49 AM, dmpou...@gmail.com wrote:
On Sunday, 2 March 2014 19:19:49 UTC+2, Aureliano Buendia wrote:
Is there a reason for spark using the older akka?
On Sun, Mar 2, 2014 at 1:53 PM, 1esha alexey.r...@gmail.com wrote:
The problem is in akka remote. It contains files compiled
error while reading HDFS file using Spark
0.9.0 -- I am running on Hadoop 2.2.0.
When I look through, I find that I have both 2.4.1 and 2.5, and some blogs
suggest that there are incompatibility issues between 2.4.1 and 2.5.
hduser@prasadHdp1:~/spark-0.9.0-incubating$ find ~/ -name
protobuf
, Ognen Duzlevski
og...@plainvanillagames.com wrote:
A stupid question, by the way: did you compile Spark with Hadoop 2.2.0
support?
Ognen
On 2/28/14, 10:51 AM, Prasad wrote:
Hi
I am getting the protobuf error while reading HDFS file using spark
0.9.0 -- i am running on hadoop 2.2.0