Something like this:
>
> import org.apache.spark.TaskContext
>
> ds.map(r => {
>   val taskContext = TaskContext.get()
>   if (taskContext.partitionId == 1000) {
>     throw new RuntimeException
>   }
>   r
> })
>
> On Mon, Feb 11, 2019 at 8:41 AM Serega Sheypak
> wrote:
>
I need to crash a task which does the repartition.
On Mon, 11 Feb 2019 at 10:37, Gabor Somogyi:
> What blocks you to put if conditions inside the mentioned map function?
>
> On Mon, Feb 11, 2019 at 10:31 AM Serega Sheypak
> wrote:
>
>> Yeah, but I don't need to crash the entire job.
DS.map(_ / 0).writeStream.format("console").start()
>
> G
>
>
> On Sun, Feb 10, 2019 at 9:36 PM Serega Sheypak
> wrote:
>
>> Hi BR,
>> thanks for your reply. I want to mimic the issue and kill tasks at a
>> certain stage. Killing an executor is also an option.
> G
>
>
> On Sun, Feb 10, 2019 at 4:19 PM Jörn Franke wrote:
>
>> yarn application -kill <applicationId> ?
>>
>> > On 10 Feb 2019, at 13:30, Serega Sheypak wrote:
>> >
Hi there!
I have a weird issue that appears only when tasks fail at a specific stage. I
would like to imitate the failure on my own.
The plan is to run the problematic app and then kill the entire executor, or
some tasks, when execution reaches a certain stage.
Is it doable?
Hi, I have a spark job that produces duplicates when one or more tasks from the
repartition stage fail.
Here is the simplified code:
sparkContext.setCheckpointDir("hdfs://path-to-checkpoint-dir")
val inputRDDs: List[RDD[String]] = List.empty // an RDD per input dir
val updatedRDDs = inputRDDs.map{
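The snippet is cut off in the archive; a minimal sketch of the pattern it seems to describe (checkpointing each repartitioned RDD so that a retried downstream task replays materialized data instead of recomputing and re-emitting it) could look like this. The checkpoint path, local master, and sample data are placeholders of mine, not from the original mail:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("checkpoint-sketch").setMaster("local[2]"))
    sc.setCheckpointDir("hdfs://path-to-checkpoint-dir") // placeholder path

    val inputRDDs: List[RDD[String]] =
      List(sc.parallelize(Seq("a", "b"))) // one RDD per input dir (sample data)

    val updatedRDDs = inputRDDs.map { rdd =>
      val repartitioned = rdd.repartition(10)
      repartitioned.checkpoint() // a retried task replays from here, no duplicates
      repartitioned
    }
    updatedRDDs.foreach(_.count()) // actions force materialization
    sc.stop()
  }
}
```

Whether checkpointing alone removes the duplicates depends on where the job writes its output; this only pins down the recomputation side.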
>
> On Tue, Jan 22, 2019 at 6:09 AM Jörn Franke wrote:
>
>> You can try with Yarn node labels:
>>
>> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/NodeLabel.html
>>
>> Then you can whitelist nodes.
>>
>> A
Hi Apiros, thanks for your reply.
Is it this one: https://github.com/apache/spark/pull/23223 ?
Can I try to reach you through the Cloudera Support portal?
On Mon, 21 Jan 2019 at 20:06, attilapiros:
> Hello, I was working on this area last year (I have developed the
> YarnAllocatorBlacklistTracker)
> populate such a blacklist.
>
> If you can change yarn config, the equivalent is node label:
> https://hadoop.apache.org/docs/r2.7.4/hadoop-yarn/hadoop-yarn-site/NodeLabel.html
>
>
>
> --
> From: Li Gao
> Sent: Saturday, January 19, 2019 8:43 AM
Hi, is there any possibility to tell the scheduler to blacklist specific nodes
in advance?
Hi, I'm running spark on YARN. My code is very simple. I want to kill one
executor when "data.repartition(10)" is executed. How can I do it in an easy
way?
val data = sc.sequenceFile[NullWritable, BytesWritable](inputPath)
  .map { case (key, value) =>
    Data.fromBytes(value)
  }
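One way to do this (a sketch of mine, not something confirmed in the thread): check TaskContext inside a map that runs right after the repartition, and hard-kill the JVM of whichever executor picks up a chosen partition. Runtime.halt skips shutdown hooks, so YARN sees a sudden executor death rather than a clean exit:

```scala
import org.apache.spark.TaskContext

// Continuing from the `data` RDD above: crash the executor that
// happens to run partition 0 of the shuffled data.
val poisoned = data.repartition(10).map { d =>
  if (TaskContext.get().partitionId == 0) {
    Runtime.getRuntime.halt(-1) // hard JVM exit, no shutdown hooks
  }
  d
}
poisoned.count() // forces the shuffle, and with it the crash
```

If you only want a failed task (not a dead executor), throwing a RuntimeException instead of halting the JVM is enough.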
Ok, this one works:
.withColumn("hour", hour(from_unixtime(typedDataset.col("ts") / 1000)))
2018-03-20 22:43 GMT+01:00 Serega Sheypak <serega.shey...@gmail.com>:
Hi, any updates? Looks like some API inconsistency or bug..?
2018-03-17 13:09 GMT+01:00 Serega Sheypak <serega.shey...@gmail.com>:
> > Not sure why you are dividing by 1000. from_unixtime expects a long type
> It expects seconds, I have milliseconds.
>
2018-03-19 13:41 GMT+01:00 Jörn Franke <jornfra...@gmail.com>:
> Maybe you should better run it in yarn cluster mode. Yarn client would
> start the driver on the oozie server.
>
> On 19. Mar 2018, at 12:58, Serega Sheypak <serega.shey...@gmail.com>
> wrote:
>
> Jacek
>
> On 19 Mar 2018 00:20, "Serega Sheypak" <serega.shey...@gmail.com> wrote:
Hi, is it even possible to run Spark on YARN as a usual Java application?
I've built a jar using Maven with the spark-yarn dependency, and I manually
populate SparkConf with all the Hadoop properties.
SparkContext fails to start with an exception:
1. Caused by: java.lang.IllegalStateException: Library
>
>
> I guess it's managed by
>
> job.getConfiguration.set(DATASOURCE_WRITEJOBUUID, uniqueWriteJobId.toString)
>
>
> On 17 March 2018 at 20:46, Serega Sheypak <serega.shey...@gmail.com>
> wrote:
>
>> Hi Denis, great to see you here :)
>> It works, thanks!
> Hello Serega,
>
> https://spark.apache.org/docs/latest/sql-programming-guide.html
>
> Please try SaveMode.Append option. Does it work for you?
>
>
> сб, 17 мар. 2018 г., 15:19 Serega Sheypak <serega.shey...@gmail.com>:
>
Hi, I'm using spark-sql to process my data and store the result as parquet,
partitioned by several columns:
ds.write
  .partitionBy("year", "month", "day", "hour", "workflowId")
  .parquet("/here/is/my/dir")
I want to run more jobs that will produce new partitions or add more files
to existing ones.
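As Denis suggests further down the thread, SaveMode.Append is what lets later jobs add partition directories or files without clobbering what is already there; a sketch of the same write with the mode added:

```scala
import org.apache.spark.sql.SaveMode

ds.write
  .mode(SaveMode.Append) // add new partition dirs/files, keep existing ones
  .partitionBy("year", "month", "day", "hour", "workflowId")
  .parquet("/here/is/my/dir")
```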
> Not sure why you are dividing by 1000. from_unixtime expects a long type
It expects seconds, I have milliseconds.
2018-03-12 6:16 GMT+01:00 vermanurag :
> Not sure why you are dividing by 1000. from_unixtime expects a long type
> which is time in milliseconds
hi, desperately trying to extract the hour from unix seconds.
The year, month, and dayofmonth functions work as expected.
The hour function always returns 0.
val ds = dataset
  .withColumn("year", year(to_date(from_unixtime(dataset.col("ts") / 1000
  .withColumn("month",
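The working form posted later in the thread drops to_date for the hour column: to_date truncates the time of day, so hour(to_date(...)) is always 0, while hour(from_unixtime(...)) keeps it (the millisecond-to-second division stays). A sketch:

```scala
import org.apache.spark.sql.functions.{from_unixtime, hour}

val withHour = dataset
  .withColumn("hour", hour(from_unixtime(dataset.col("ts") / 1000))) // ms -> s
```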
Hi, did anyone try to implement a Spark SQL dataset reader from a SEQ file with
protobuf inside into a Dataset?
Imagine I have protobuf def
Person
- name: String
- lastName: String
- phones: List[String]
and generated scala case class:
case class Person(name:String, lastName: String, phones:
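No answer survives in the archive, but one common approach (a sketch under assumptions: PersonProto is the protobuf-generated Java class, and the SEQ file carries serialized messages in the BytesWritable value) is to parse each record and map it into the case class, letting Spark derive the product encoder:

```scala
import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.spark.sql.SparkSession
import scala.collection.JavaConverters._

case class Person(name: String, lastName: String, phones: List[String])

val spark = SparkSession.builder().appName("pb-reader").getOrCreate()
import spark.implicits._

val people = spark.sparkContext
  .sequenceFile[NullWritable, BytesWritable]("hdfs:///path/to/seq") // placeholder
  .map { case (_, bytes) =>
    // PersonProto.parseFrom is the generated protobuf parser -- an assumption here
    val pb = PersonProto.parseFrom(bytes.copyBytes())
    Person(pb.getName, pb.getLastName, pb.getPhonesList.asScala.toList)
  }
  .toDS() // Dataset[Person] via the implicit product encoder
```

Parsing on the executors keeps the protobuf classes out of the driver's hot path; only the case class needs an encoder.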
nappy-vs-lzf-vs-zlib-a-comparison-of
>
> performance of snappy and lzf were on-par to each other.
>
> Maybe lzf has lower memory requirement.
>
> On Wed, May 18, 2016 at 7:22 AM, Serega Sheypak <serega.shey...@gmail.com>
> wrote:
>
Switching from snappy to lzf helped me:
spark.io.compression.codec=lzf
Do you know why? :) I can't find an exact explanation...
2016-05-18 15:41 GMT+02:00 Ted Yu <yuzhih...@gmail.com>:
> Please increase the number of partitions.
>
> Cheers
>
> On Wed, May 18, 2016 at 4:
Hi, please have a look at the log snippet:
16/05/18 03:27:16 INFO spark.MapOutputTrackerWorker: Doing the fetch;
tracker endpoint =
NettyRpcEndpointRef(spark://mapoutputtrac...@xxx.xxx.xxx.xxx:38128)
16/05/18 03:27:16 INFO spark.MapOutputTrackerWorker: Got the output
locations
16/05/18 03:27:16 INFO
> --class <classname> \ // your main class name
> --master yarn \
> --deploy-mode cluster \
> /home/hadoop/SparkSampleProgram.jar // location of your jar file
>
> Thanks
> Raj
>
>
>
>
>
> On Tuesday,
spark-submit --conf "spark.driver.userClassPathFirst=true" --class
com.MyClass --master yarn --deploy-mode client --jars
hdfs:///my-lib.jar,hdfs:///my-second-lib.jar jar-with-com-MyClass.jar
job_params
2016-05-17 15:41 GMT+02:00 Serega Sheypak <serega.shey...@gmail.c
https://issues.apache.org/jira/browse/SPARK-10643
Looks like it's the reason...
2016-05-17 15:31 GMT+02:00 Serega Sheypak <serega.shey...@gmail.com>:
> No, and it looks like a problem.
>
> 2.2. --master yarn --deploy-mode client
> means:
> 1. submit spark as yarn
> On Tue, May 17, 2016 at 8:33 PM, Serega Sheypak <serega.shey...@gmail.com>
> wrote:
>
hi, I'm trying to:
1. upload my app jar files to HDFS
2. run spark-submit with:
2.1. --master yarn --deploy-mode cluster
or
2.2. --master yarn --deploy-mode client
specifying --jars hdfs:///my/home/commons.jar,hdfs:///my/home/super.jar
When the spark job is submitted, the SparkSubmit client outputs:
and this particular table is causing issues or are you trying to
figure out the right way to do a read).
What version of Spark and Cassandra-connector are you using?
Also, what do you get for select count(*) from foo -- is that just as
bad?
On Wed, Jun 17, 2015 at 4:37 AM, Serega Sheypak serega.shey
Hi, can somebody suggest a way to reduce the number of tasks?
2015-06-15 18:26 GMT+02:00 Serega Sheypak serega.shey...@gmail.com:
Hi, I'm running spark sql against a Cassandra table. I have 3 C* nodes, each
of them has a spark worker.
The problem is that spark runs 869 tasks to read 3 rows
Hi, spark-sql estimated the input for a Cassandra table with 3 rows as 8 TB.
Sometimes it's estimated as -167B.
I run it on a laptop; I don't have 8 TB of space for the data.
Hi, I'm running spark sql against a Cassandra table. I have 3 C* nodes, each
of them has a spark worker.
The problem is that spark runs 869 tasks to read 3 rows: select bar from
foo.
I've tried these properties:
# try to avoid 869 tasks per dummy "select bar from foo" query
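The 869 tasks come from how the spark-cassandra-connector splits the token range, so the split-size setting is the usual knob; a config sketch (property names changed between connector versions, so treat these as assumptions to check against your version's reference):

```
# older connector releases: approx. number of C* partitions per Spark partition
spark.cassandra.input.split.size=100000
# newer releases: target input split size in MB (default 64)
spark.cassandra.input.split.size_in_mb=64
```

With a 3-row table, raising the split size should collapse the read to a handful of tasks.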
The memory leak could be related to this
https://issues.apache.org/jira/browse/SPARK-5967 defect that was resolved
in Spark 1.2.2 and 1.3.0.
@Sean
Will it be backported to CDH? I didn't find that bug in the CDH 5.4 release
notes.
2015-04-29 14:51 GMT+02:00 Conor Fennell conor.fenn...@altocloud.com:
(), it doesn't know that those applications have been stopped.
Note that in spark 1.3, the history server can also display running
applications (including completed applications, but that it thinks are
still running), which improves things a little bit.
On Fri, Apr 17, 2015 at 10:13 AM, Serega
Here is related problem:
http://apache-spark-user-list.1001560.n3.nabble.com/Launching-history-server-problem-td12574.html
but no answer.
What I'm trying to do: wrap the spark-history server with an /etc/init.d script.
Problems I have: I can't make it read spark-defaults.conf.
I've put this file here:
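For what it's worth, the standard hook for pointing the history server at a non-default spark-defaults.conf is the SPARK_CONF_DIR environment variable; a sketch of such an init.d wrapper (all paths are placeholders of mine):

```shell
#!/bin/sh
# Minimal init.d-style wrapper for the Spark history server; paths are placeholders.
export SPARK_CONF_DIR=/etc/spark/conf   # must contain spark-defaults.conf

case "$1" in
  start) /opt/spark/sbin/start-history-server.sh ;;
  stop)  /opt/spark/sbin/stop-history-server.sh ;;
  *)     echo "Usage: $0 {start|stop}" >&2; exit 1 ;;
esac
```

start-history-server.sh sources the conf dir itself, so exporting SPARK_CONF_DIR before calling it is usually all the wrapper needs to do.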