Why did you use RDD#saveAsTextFile instead of DataFrame#save, which can write Parquet, ORC, ...?
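For illustration, a minimal sketch of what the question above is suggesting, assuming Spark 1.6 and that both inputs can be loaded as DataFrames. The object name, the HDFS paths, and the Parquet input format are hypothetical placeholders, not details from this thread. The idea is to write the joined DataFrame through the DataFrame writer rather than converting it to an RDD first.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical driver program; the name and all paths are placeholders.
object CartesianToParquet {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CartesianToParquet")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Assumption: A (about 30 MB) and B (about 7 MB) are stored as Parquet.
    val a = sqlContext.read.parquet("hdfs:///path/to/A")
    val b = sqlContext.read.parquet("hdfs:///path/to/B")

    // A join with no join condition is a Cartesian join of the two DataFrames.
    // Assumption: column names in A and B do not clash; rename them first if they do.
    val joined = a.join(b)

    // Write the result directly in a columnar format, without going through
    // joined.rdd and saveAsTextFile.
    joined.write.parquet("hdfs:///path/to/output")

    sc.stop()
  }
}
```

Whether this is faster in practice depends on the data and the cluster; the Cartesian product itself still has to be materialized in full.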

// maropu

On Wed, May 25, 2016 at 7:10 PM, Priya Ch <learnings.chitt...@gmail.com> wrote:

> Hi,
>
> Yes, I have joined using DataFrame join. Now, to save this into HDFS, I
> am converting the joined DataFrame to an RDD (dataframe.rdd) and trying
> to save it with saveAsTextFile. However, this is also taking too much
> time.
>
> Thanks,
> Padma Ch
>
> On Wed, May 25, 2016 at 1:32 PM, Takeshi Yamamuro <linguin....@gmail.com>
> wrote:
>
>> Hi,
>>
>> Seems you'd be better off using DataFrame#join instead of RDD.cartesian
>> because it always needs shuffle operations, which have a lot of overhead
>> such as reflection, serialization, ...
>> In your case, since the smaller table is 7 MB, DataFrame#join uses a
>> broadcast strategy.
>> This is a little more efficient than RDD.cartesian.
>>
>> // maropu
>>
>> On Wed, May 25, 2016 at 4:20 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> It is basically a Cartesian join, as in an RDBMS.
>>>
>>> Example:
>>>
>>> SELECT * FROM FinancialCodes, FinancialData
>>>
>>> The result of this query matches every row in the FinancialCodes table
>>> with every row in the FinancialData table. Each row consists of all
>>> columns from the FinancialCodes table followed by all columns from the
>>> FinancialData table.
>>>
>>> Not very useful.
>>>
>>> Dr Mich Talebzadeh
>>>
>>> LinkedIn:
>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>> On 25 May 2016 at 08:05, Priya Ch <learnings.chitt...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I have two RDDs, A and B, where A is of size 30 MB and B is of size
>>>> 7 MB. A.cartesian(B) is taking too much time. Is there any bottleneck
>>>> in the cartesian operation?
>>>>
>>>> I am using Spark version 1.6.0.
>>>>
>>>> Regards,
>>>> Padma Ch
>>
>> --
>> ---
>> Takeshi Yamamuro

--
---
Takeshi Yamamuro