Well, DataFrames make it easier to work on only some columns of the data 
and to store the results in new columns. That removes the need to zip 
everything back together, and thus the need to preserve order.


On 2017-09-05 14:04 CEST, mehmet.su...@gmail.com wrote:

Hi Johan,
 DataFrames are built on top of RDDs, so I am not sure whether the ordering 
issues are any different there. Maybe you could create a small but 
sufficiently large simulated data set and a sample series of transformations 
to experiment with.
Best,
-m

Mehmet Süzen, MSc, PhD
<su...@acm.org>



On 15 September 2017 at 09:44,  <johan.grande....@orange.com> wrote:
> Thanks all for your answers. After reading the provided links I am still 
> uncertain of the details of what I'd need to do to get my calculations right 
> with RDDs. However I discovered DataFrames and Pipelines on the "ML" side of 
> the libs and I think they'll be better suited to my needs.
>
> Best,
> Johan Grande

