…, 2017 at 2:07 PM, Pradeep Gollakota wrote:
> Usually this kind of thing can be done at a lower level in the InputFormat,
> by specifying the max split size. Have you looked into that possibility with
> your InputFormat?
>
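Not part of the original thread: a minimal sketch of the suggestion above, assuming a
live SparkContext `sc` (e.g. from spark-shell), a plain-text input, and an illustrative
path and size. Capping the split size makes the InputFormat emit more, smaller splits,
which gives the resulting RDD more partitions.

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Copy the SparkContext's Hadoop configuration and cap the split size at 64 MB
// (illustrative figure), so FileInputFormat produces more, smaller splits.
val hadoopConf = new org.apache.hadoop.conf.Configuration(sc.hadoopConfiguration)
hadoopConf.setLong("mapreduce.input.fileinputformat.split.maxsize", 64L * 1024 * 1024)

// hdfs:///path/to/input is a hypothetical path.
val moreParts = sc.newAPIHadoopFile(
  "hdfs:///path/to/input",
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text],
  hadoopConf)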
> On Sun, Jan 15, 2017 at 9:42 PM, Fei Hu wrote:
> … I don’t think the number of RDD partitions will be increased.
>
>
>
> Thanks,
>
> Jasbir
>
>
>
> *From:* Fei Hu [mailto:hufe...@gmail.com]
> *Sent:* Sunday, January 15, 2017 10:10 PM
> *To:* zouz...@cs.toronto.edu
> *Cc:* user @spark ; d...@spark.apache.org
>
> … you could take a look at the CoalescedRDD code to implement your
> requirement.
>
> Good luck!
> Cheers,
> Anastasios
>
>
> On Sun, Jan 15, 2017 at 5:39 PM, Fei Hu wrote:
>
>> Hi Anastasios,
>>
>> Thanks for your reply. If I just increase the numPartitions to be twice as
>> large …
> … writing to HDFS, but it might still be a narrow dependency (satisfying your
> requirements) if you increase the # of partitions.
>
> Best,
> Anastasios
>
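A minimal sketch, not from the thread, of the two built-in ways to change the
partition count (assuming an existing RDD `rdd`); neither satisfies the locality
requirement exactly, which is why the discussion turns to a custom RDD.

// repartition() doubles the number of partitions but introduces a shuffle (wide
// dependency), so children are not guaranteed to stay on the parent's node.
val doubled = rdd.repartition(rdd.partitions.length * 2)

// coalesce(..., shuffle = false) keeps a narrow dependency but can only merge
// partitions; it cannot split one partition into two.
val merged = rdd.coalesce(math.max(1, rdd.partitions.length / 2), shuffle = false)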
> On Sun, Jan 15, 2017 at 12:58 AM, Fei Hu wrote:
>
>> Dear all,
>>
>> I want to equally divide a RDD partition into two partitions … to keep the
>> data locality.
Thanks,
Fei
On Sun, Jan 15, 2017 at 2:33 AM, Rishi Yadav wrote:
> Can you provide some more details:
> 1. How many partitions does the RDD have?
> 2. How big is the cluster?
> On Sat, Jan 14, 2017 at 3:59 PM Fei Hu wrote:
>
>> Dear all,
>>
>> I want to equally divide a RDD partition into two partitions …
Dear all,
I want to equally divide a RDD partition into two partitions. That means,
the first half of elements in the partition will create a new partition,
and the second half of elements in the partition will generate another new
partition. But the two new partitions are required to be at the same node as
their parent partition, in order to keep the data locality.
… Job inside getPreferredLocations().
> You can take a look at the source code of HadoopRDD to help you implement
> getPreferredLocations() appropriately.
>
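Not the poster's code: a minimal sketch of a custom RDD along the lines suggested
above (all names are made up). Each parent partition is split into two children
through a NarrowDependency, and getPreferredLocations() simply delegates to the
parent partition's preferred locations, so both halves prefer the same node. For
simplicity the children take alternating elements rather than a literal
first/second half, and each child re-reads its parent's iterator.

import scala.reflect.ClassTag
import org.apache.spark.{NarrowDependency, Partition, TaskContext}
import org.apache.spark.rdd.RDD

// One half of a parent partition; keepEven decides which alternating elements it keeps.
class HalfPartition(val index: Int, val parent: Partition, val keepEven: Boolean)
  extends Partition

class SplitInTwoRDD[T: ClassTag](prev: RDD[T])
  extends RDD[T](prev.context, Seq(new NarrowDependency[T](prev) {
    // Child partition i depends only on parent partition i / 2: a narrow dependency.
    override def getParents(partitionId: Int): Seq[Int] = Seq(partitionId / 2)
  })) {

  override protected def getPartitions: Array[Partition] =
    prev.partitions.flatMap { p =>
      Seq[Partition](new HalfPartition(2 * p.index, p, keepEven = true),
                     new HalfPartition(2 * p.index + 1, p, keepEven = false))
    }

  override def compute(split: Partition, context: TaskContext): Iterator[T] = {
    val hp = split.asInstanceOf[HalfPartition]
    // Read the parent partition and keep every other element.
    prev.iterator(hp.parent, context).zipWithIndex.collect {
      case (v, i) if (i % 2 == 0) == hp.keepEven => v
    }
  }

  // Both children report the parent partition's preferred locations,
  // so they are scheduled on the same node as the original partition.
  override protected def getPreferredLocations(split: Partition): Seq[String] =
    prev.preferredLocations(split.asInstanceOf[HalfPartition].parent)
}

Used as new SplitInTwoRDD(parent), this doubles the partition count without a shuffle.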
> On Dec 31, 2016, at 09:48, Fei Hu wrote:
>
> That is a good idea.
>
> I tried adding the following code to get …
Dear all,
I tried to customize my own RDD. In the getPreferredLocations() function, I
used the following code to query another RDD, which was used as an input to
initialize this customized RDD:
val results: Array[Array[DataChunkPartition]] =
  context.runJob(partitionsRDD, (con…
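Not the poster's code: since getPreferredLocations() is called by the scheduler,
running another job from inside it (via context.runJob) can be problematic. A
sketch of one workaround, assuming partitionsRDD is the input RDD mentioned above,
is to resolve the locations once on the driver before constructing the custom RDD:

// Resolve each input partition's preferred locations up front, on the driver.
val parentLocations: Map[Int, Seq[String]] =
  partitionsRDD.partitions
    .map(p => p.index -> partitionsRDD.preferredLocations(p))
    .toMap
// The custom RDD can then capture parentLocations and return
// parentLocations(split.index) from its getPreferredLocations(split).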
> … override the getPreferredLocations() to implement the logic of dynamically
> changing the locations.
> > On Dec 30, 2016, at 12:06, Fei Hu wrote:
> >
> > Dear all,
> >
> > Is there any way to change the host location for a certain partition of
> > an RDD?
> >
> > "
Dear all,
Is there any way to change the host location for a certain partition of an RDD?
"protected def getPreferredLocations(split: Partition)" can be used to
initialize the location, but how can it be changed after the initialization?
Thanks,
Fei
Hi All,
I am running some Spark Scala code on Zeppelin on CDH 5.5.1 (Spark version
1.5.0). I customized the Spark interpreter to use
org.apache.spark.serializer.KryoSerializer as spark.serializer, and in the
dependencies I added Kryo 3.0.3 as follows:
com.esotericsoftware:kryo:3.0.3
When I wrote …
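For reference (not from the original message), a minimal sketch of the equivalent
programmatic configuration; MyKey and MyValue are hypothetical application classes,
and in Zeppelin the same spark.serializer property can also be set in the
interpreter settings:

import org.apache.spark.SparkConf

// Hypothetical classes, only to illustrate registration with Kryo.
case class MyKey(id: Long)
case class MyValue(payload: String)

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Registering classes avoids writing full class names into the serialized data.
  .registerKryoClasses(Array(classOf[MyKey], classOf[MyValue]))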
Dear all,
I have a question about how to measure the runtime of a Spark application.
Here is an example:
- On the Spark UI: the total duration is 2.0 minutes = 120 seconds, as shown
below:
[image: Screen Shot 2016-07-09 at 11.45.44 PM.png]
- However, when I check the jobs launched by …
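The application's total duration includes driver-side time between jobs as well as
the jobs themselves, so it is normally larger than the sum of the job durations
shown in the jobs table. A simple cross-check, not from the thread (it assumes an
existing RDD `rdd`), is to time the action directly:

// Wall-clock time of a single action, measured on the driver.
val start = System.nanoTime()
rdd.count()                                    // any action that triggers the jobs
val elapsedSeconds = (System.nanoTime() - start) / 1e9
println(f"action took $elapsedSeconds%.1f s")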
Thanks for your link. It will be helpful for me.
Best,
Fei
> On Mar 29, 2016, at 11:14 AM, Steve Loughran wrote:
>
>
>> On 29 Mar 2016, at 14:30, Fei Hu <hufe...@gmail.com> wrote:
>>
>> Hi Jeff,
>>
>> Thanks for your info! I …
> … SparkSubmit.scala
>
> <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala>
>
> What's your purpose for using "java -cp"? For local development, the IDE
> should be sufficient.
>
> On Tue, M…
Hi,
I am wondering how to run a Spark job with the plain java command, such as:
java -cp spark.jar mainclass. When running/debugging the Spark program in
IntelliJ IDEA, it uses the java command to run the Spark main class, so I think
it should be possible to run the Spark job with the java command besides the
spark-submit script.
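A sketch, not from the thread, of what makes a plain `java -cp` launch work: the
main class has to supply the settings that spark-submit would otherwise inject
(master, app name), and the Spark jars must be on the classpath. The class name
and the local[*] master are assumptions.

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical main class; launched with something like
//   java -cp "myapp.jar:<spark jars>" com.example.RunWithJavaCp
// where <spark jars> stands for the Spark distribution jar(s) of your version.
object RunWithJavaCp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("java-cp-demo")
      .setMaster("local[*]") // spark-submit would normally provide the master
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).sum())
    sc.stop()
  }
}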
Hi,
I am trying to migrate a program from Hadoop to Spark, but I ran into a problem
with serialization. In the Hadoop program, the key and value classes implement
org.apache.hadoop.io.WritableComparable, which handles their serialization. Now
in the Spark program, I used newAPIHadoopRDD to read …
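Not the poster's code: a sketch of the usual workaround, assuming the input is a
SequenceFile of LongWritable/Text (swap in your own Writable classes) and a live
SparkContext `sc`. Hadoop Writable objects are not java.io.Serializable and are
reused by the record reader, so copy them into plain values right after
newAPIHadoopRDD, before anything shuffles, caches or collects them.

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat

// hdfs:///path/to/input is a hypothetical path.
val hadoopConf = new org.apache.hadoop.conf.Configuration(sc.hadoopConfiguration)
hadoopConf.set("mapreduce.input.fileinputformat.inputdir", "hdfs:///path/to/input")

val raw = sc.newAPIHadoopRDD(
  hadoopConf,
  classOf[SequenceFileInputFormat[LongWritable, Text]],
  classOf[LongWritable],
  classOf[Text])

// Convert the reused, non-serializable Writables into plain Scala values immediately.
val usable = raw.map { case (k, v) => (k.get, v.toString) }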