My Spark driver program reads multiple images from HDFS and searches for a particular image by name. If it finds the image, it converts the received byte array back to its original form, but the image I get after conversion is corrupted. I am using ImageSchema to read the images.
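For reference, a minimal decode sketch, assuming the images were loaded with ImageSchema (which stores pixel data in OpenCV's BGR order); the paths and file names below are placeholders:

import java.awt.image.BufferedImage
import java.io.File
import javax.imageio.ImageIO
import org.apache.spark.ml.image.ImageSchema
import org.apache.spark.sql.Row

val df  = ImageSchema.readImages("hdfs://namenode/images/")      // placeholder path
val img = df.filter(df("image.origin").contains("target.jpg"))   // search by name
            .select("image").head().getAs[Row](0)

val width  = ImageSchema.getWidth(img)
val height = ImageSchema.getHeight(img)
val bytes  = ImageSchema.getData(img)  // raw pixels, BGR, row-major

// TYPE_3BYTE_BGR matches ImageSchema's channel order; writing the bytes into
// an RGB-ordered buffer instead is a common cause of "corrupted" output.
val out = new BufferedImage(width, height, BufferedImage.TYPE_3BYTE_BGR)
out.getRaster.setDataElements(0, 0, width, height, bytes)
ImageIO.write(out, "png", new File("/tmp/restored.png"))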
From my experience with Spark, when working on HDFS data, Spark reads data in the form of records and computes on each record as soon as it reads it. My data on HDFS consists of multiple images, where each image is one record. I want Spark to read multiple records before doing any computation, along the lines of the sketch below.
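A minimal sketch of one way to get that behaviour, using mapPartitions to group records before processing; rdd, the batch size of 100, and processBatch are placeholders:

// hand the processing function a batch of records instead of one at a time
val results = rdd.mapPartitions { iter =>
  iter.grouped(100).map(batch => processBatch(batch))  // hypothetical helper
}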
Hi,
I am working with Apache Spark 2.3.2, implementing an image grep application in Scala 2.11. I am reading images from HDFS using the ImageSchema package. The steps I run are:
1. import org.apache.spark.ml.image.ImageSchema
2. val df = ImageSchema.readImages("hdfs://filepath/*") // all images under the path
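For reference, the DataFrame returned by readImages carries a single struct column; its fields (abridged printSchema output, annotations mine) are:

df.printSchema()
// root
//  |-- image: struct
//  |    |-- origin: string      (full HDFS path of the file)
//  |    |-- height: integer
//  |    |-- width: integer
//  |    |-- nChannels: integer
//  |    |-- mode: integer       (OpenCV pixel type tag)
//  |    |-- data: binary        (raw pixel bytes)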
I am running a grep application on Spark 2.3.4 with Scala 2.11. I have an input text file of 813 MB stored on a remote source (not part of the Spark infrastructure) and served over HDFS. My application just reads the text file line by line from the HDFS server and filters each line for a given keyword, roughly as sketched below.
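A minimal sketch of that flow; host, path, and keyword are placeholders:

val lines   = sc.textFile("hdfs://remote-host:9000/data/input.txt")
val matched = lines.filter(_.contains("keyword"))
println(s"${matched.count()} matching lines")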
It is possible that the Application Master is not getting started. Try increasing the memory limit for the Application Master in yarn-site.xml, or in capacity-scheduler.xml if you have the capacity scheduler configured.
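For example, the per-container memory ceiling in yarn-site.xml might be raised like this (the value is illustrative; in client mode, spark.yarn.am.memory on the Spark side sizes the AM itself):

<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value>
</property>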
Is Spark's MLlib compatible with Scala 2.12? Or can I change the Spark version from 3.0 to 2.3 or 2.4 on my local Spark master?
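For what it's worth, Spark 3.0 is built against Scala 2.12, while Spark 2.3 uses 2.11, so the artifact suffix in the build must match the cluster's Scala version. A build.sbt sketch with illustrative version numbers:

scalaVersion := "2.12.10"
// %% appends the Scala suffix, resolving to spark-mllib_2.12
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "3.0.0" % "provided"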
I am trying to build Spark using build/sbt package, after changing the Scala version to 2.11 in pom.xml, because my application's jar files use Scala 2.11. But building the Spark code gives an error in the sql module saying "A method with a varargs annotation produces a forwarder method with the same signature as an existing method."
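Rather than editing pom.xml by hand, the Spark 2.x source tree ships a helper script that rewrites the Scala version across all modules consistently, which avoids this class of mismatch:

./dev/change-scala-version.sh 2.11
./build/sbt package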
Is there a way to log the elapsed milliseconds, in addition to HH:mm:ss, in every Spark log timestamp?
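With the log4j 1.x setup that Spark 2.x ships, appending .SSS to the date pattern in conf/log4j.properties should do it; a sketch based on the default console appender:

log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %p %c{1}: %m%n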
Hi,
In the Spark source code, HadoopRDD.scala (in the rdd package) updates the total bytes read after every 1000 records. Printing the bytes read alongside the update function shows 65536. Even if I change the code to update the bytes read after every record, it still shows 65536 multiple times.
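For context, the update pattern in HadoopRDD looks roughly like this (paraphrased, not an exact quote of the source):

if (!finished) {
  inputMetrics.incRecordsRead(1)
}
// bytes read are polled from the Hadoop FileSystem statistics only once
// every UPDATE_INPUT_METRICS_INTERVAL_RECORDS (1000) records
if (inputMetrics.recordsRead % SparkHadoopUtil.UPDATE_INPUT_METRICS_INTERVAL_RECORDS == 0) {
  updateBytesRead()
}

Since the counter comes from the underlying stream's statistics rather than from Spark itself, it advances a buffer at a time; a constant 65536 would be consistent with the client fetching data in 64 KB chunks, however often updateBytesRead() is called.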
Hey, I am working with the Spark source code. I am printing logs within the code to understand how HadoopRDD works. I want to print a timestamp when an executor first reads the textFile RDD (the input source URL is on HDFS). I tried to print some logs in Executor.scala, but they do not display on the driver console.
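One thing to check: anything logged inside Executor.scala ends up in the executor's stderr, not on the driver console. A sketch, assuming the surrounding class mixes in Spark's Logging trait:

logInfo(s"first read of input at ${System.currentTimeMillis()} ms")

On YARN, that output can be pulled afterwards with yarn logs -applicationId <appId>, or browsed per executor under the Executors tab of the Spark UI.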
I was working with the custom SparkListener API. There, I am not able to figure out a way to break into the details of a task. I only have a listener callback that runs on task start, but I want to calculate the time my executor took to read input data from the remote data source for that task.
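A sketch of the closest hook I know of: per-task metrics are exposed only at task end, and there is no dedicated input-read-time metric, so bytes read plus run time is about as close as the listener API gets:

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

class InputMetricsListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val m = taskEnd.taskMetrics
    if (m != null) {
      println(s"task ${taskEnd.taskInfo.taskId}: " +
        s"${m.inputMetrics.bytesRead} bytes read, " +
        s"${m.executorRunTime} ms run time")
    }
  }
}
// register with: sc.addSparkListener(new InputMetricsListener)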
I am running a model where the workers should not have the data stored on them; they are only for execution purposes. The other cluster (it's just a single node) that I am receiving data from is just acting as a file server, for which I could have used any other means, such as NFS or FTP. So I went with HDFS.
Hi,
I am new to Spark. I am running an HDFS file system on a remote cluster, whereas my Spark workers are on another cluster. When my textFile RDD gets executed, do the Spark workers read the file according to HDFS partitions, task by task, or do they read it all up front when the BlockManager is set up?
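For reference, the partition count that drives this can be inspected directly; the path is a placeholder:

val rdd = sc.textFile("hdfs://remote-nn:9000/data/input.txt")
// textFile makes one partition per input split, normally one per HDFS block,
// and each task reads only its own split when it runs
println(rdd.getNumPartitions)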