I am able to use the blockJoin API and it does not throw a compilation error:

  val viEventsWithListings: RDD[(Long, (DetailInputRecord, VISummary, Long))] =
    lstgItem.blockJoin(viEvents, 1, 1).map {
      // ...
    }
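Since the question is about the two replication arguments, here is a minimal plain-Scala sketch (no Spark) of the fragment-and-replicate scheme a block join of this shape typically implements. blockJoin is not part of stock Spark 1.4, so the name, signature, and semantics below are assumptions for illustration, not the patched API's actual code:

```scala
import scala.util.Random

// Fragment-and-replicate join sketch (assumed semantics, for illustration).
// Each left row lands in one random left-block i but is copied to every
// right-block j (and symmetrically for right rows), so every matching
// (left, right) pair meets in exactly one composite bucket (k, i, j).
// Hot keys on a side get scattered across that side's blocks instead of
// all landing on one reducer.
def blockJoin[K, A, B](
    left: Seq[(K, A)],
    right: Seq[(K, B)],
    leftReplication: Int,
    rightReplication: Int,
    seed: Long = 42L): Seq[(K, (A, B))] = {
  val rnd = new Random(seed)
  val leftTagged = left.flatMap { case (k, a) =>
    val i = rnd.nextInt(leftReplication)          // scatter left rows
    (0 until rightReplication).map(j => ((k, i, j), a))  // copy across right blocks
  }
  val rightTagged = right.flatMap { case (k, b) =>
    val j = rnd.nextInt(rightReplication)         // scatter right rows
    (0 until leftReplication).map(i => ((k, i, j), b))   // copy across left blocks
  }
  val byBucket = rightTagged.groupBy(_._1)
  leftTagged.flatMap { case (bucket, a) =>
    byBucket.getOrElse(bucket, Seq.empty).map { case (_, b) => (bucket._1, (a, b)) }
  }
}
```

With (1, 1) this degenerates to a plain inner join. Under these (assumed) semantics, the factor to raise is the one on the skewed side: if viEvents is the skewed operand, increasing its replication spreads its hot keys over more buckets, at the cost of duplicating every row of the other side by that factor.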
Here viEvents is highly skewed and both RDDs are on HDFS. What should the optimal values of the replication arguments be? I passed (1, 1).

On Sun, Jun 28, 2015 at 1:47 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:

> I incremented the version of Spark from 1.4.0 to 1.4.0.1 and ran
>
>   ./make-distribution.sh --tgz -Phadoop-2.4 -Pyarn -Phive -Phive-thriftserver
>
> The build was successful but the script failed. Is there a way to pass the
> incremented version?
>
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 09:56 min
> [INFO] Finished at: 2015-06-28T13:45:29-07:00
> [INFO] Final Memory: 84M/902M
> [INFO] ------------------------------------------------------------------------
> + rm -rf /Users/dvasthimal/ebay/projects/ep/spark-1.4.0/dist
> + mkdir -p /Users/dvasthimal/ebay/projects/ep/spark-1.4.0/dist/lib
> + echo 'Spark 1.4.0.1 built for Hadoop 2.4.0'
> + echo 'Build flags: -Phadoop-2.4' -Pyarn -Phive -Phive-thriftserver
> + cp /Users/dvasthimal/ebay/projects/ep/spark-1.4.0/assembly/target/scala-2.10/spark-assembly-1.4.0.1-hadoop2.4.0.jar /Users/dvasthimal/ebay/projects/ep/spark-1.4.0/dist/lib/
> + cp /Users/dvasthimal/ebay/projects/ep/spark-1.4.0/examples/target/scala-2.10/spark-examples-1.4.0.1-hadoop2.4.0.jar /Users/dvasthimal/ebay/projects/ep/spark-1.4.0/dist/lib/
> + cp /Users/dvasthimal/ebay/projects/ep/spark-1.4.0/network/yarn/target/scala-2.10/spark-1.4.0.1-yarn-shuffle.jar /Users/dvasthimal/ebay/projects/ep/spark-1.4.0/dist/lib/
> + mkdir -p /Users/dvasthimal/ebay/projects/ep/spark-1.4.0/dist/examples/src/main
> + cp -r /Users/dvasthimal/ebay/projects/ep/spark-1.4.0/examples/src/main /Users/dvasthimal/ebay/projects/ep/spark-1.4.0/dist/examples/src/
> + '[' 1 == 1 ']'
> + cp '/Users/dvasthimal/ebay/projects/ep/spark-1.4.0/lib_managed/jars/datanucleus*.jar' /Users/dvasthimal/ebay/projects/ep/spark-1.4.0/dist/lib/
> cp: /Users/dvasthimal/ebay/projects/ep/spark-1.4.0/lib_managed/jars/datanucleus*.jar: No such file or directory
>
> LM-SJL-00877532:spark-1.4.0 dvasthimal$ ./make-distribution.sh --tgz -Phadoop-2.4 -Pyarn -Phive -Phive-thriftserver
>
> On Sun, Jun 28, 2015 at 1:41 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> You need 1) to publish to an in-house Maven repository, so your application
>> can depend on your version, and 2) to use the Spark distribution you
>> compiled to launch your job (assuming you run on YARN, so you can launch
>> multiple versions of Spark on the same cluster).
>>
>> On Sun, Jun 28, 2015 at 4:33 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>
>>> How can I import this pre-built Spark into my application via Maven, as I
>>> want to use the block join API?
>>>
>>> On Sun, Jun 28, 2015 at 1:31 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>
>>>> I ran this without the Maven options:
>>>>
>>>>   ./make-distribution.sh --tgz -Phadoop-2.4 -Pyarn -Phive -Phive-thriftserver
>>>>
>>>> I got spark-1.4.0-bin-2.4.0.tgz in the same working directory.
>>>>
>>>> I hope this was built against Hadoop 2.4.x, as I did specify -P.
>>>>
>>>> On Sun, Jun 28, 2015 at 1:10 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>>
>>>>>   ./make-distribution.sh --tgz --mvn "-Phadoop-2.4 -Pyarn -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package"
>>>>>
>>>>> or
>>>>>
>>>>>   ./make-distribution.sh --tgz --mvn -Phadoop-2.4 -Pyarn -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package"
>>>>>
>>>>> Both fail with:
>>>>>
>>>>>   + echo -e 'Specify the Maven command with the --mvn flag'
>>>>>   Specify the Maven command with the --mvn flag
>>>>>   + exit -1

-- 
Deepak
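A hedged sketch of the workflow Koert describes (publish a patched version to an in-house repository, then build the distribution), assuming a Spark 1.4.0 source tree and Maven on the PATH; the repository id and URL are placeholders, not real endpoints:

```shell
# 1. Bump the version across all modules (instead of hand-editing each pom):
mvn versions:set -DnewVersion=1.4.0.1 -DgenerateBackupPoms=false

# 2. Publish the patched artifacts so an application pom can depend on the
#    1.4.0.1 version ("inhouse" repo id and URL below are placeholders):
mvn -Phadoop-2.4 -Pyarn -Phive -Phive-thriftserver -DskipTests deploy \
    -DaltDeploymentRepository=inhouse::default::http://repo.example.com/releases

# 3. Build the runnable distribution. Note: --mvn expects the path to a Maven
#    binary, not build flags; profiles go to make-distribution.sh directly:
./make-distribution.sh --tgz -Phadoop-2.4 -Pyarn -Phive -Phive-thriftserver
```

The "Specify the Maven command with the --mvn flag" failure quoted above is consistent with this reading of --mvn: the quoted profile string was taken as the Maven binary to execute, which could not be located.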