DD further
> transformations and actions are performed. And, as Spark says, child RDDs
> get partitions from parent RDDs.
>
> Therefore, is there any way to decide the partitioning strategy after filter
> operations?
>
> Regards,
> Jasbir Singh
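For what it's worth, Spark does let you change the partitioning of a filtered RDD explicitly, e.g. with `repartition` or, for key-value RDDs, `partitionBy`. What such a key-based partitioner does can be sketched in plain Python (no Spark; all names here are illustrative, and `crc32` stands in for Spark's hash partitioner):

```python
import zlib

def assign_partition(key, num_partitions):
    """Deterministic stand-in for a hash partitioner: map a key to a partition index."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def partition_records(records, num_partitions, key_fn):
    """Distribute records into num_partitions buckets by their key."""
    partitions = [[] for _ in range(num_partitions)]
    for rec in records:
        partitions[assign_partition(key_fn(rec), num_partitions)].append(rec)
    return partitions

# Filter first, then explicitly repartition the surviving records by user id,
# so all records of one user end up in the same partition.
records = [("u1", "2016-11-01", 10.0), ("u2", "2016-11-01", 5.0),
           ("u1", "2016-11-02", 7.5)]
filtered = [r for r in records if r[2] > 6.0]  # keeps the two "u1" rows
parts = partition_records(filtered, 4, key_fn=lambda r: r[0])
```

Because the partition index is a pure function of the key, records sharing a key always land together, regardless of which filter produced them.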
Yeah, it works for me.
Thanks
On Fri, Nov 18, 2016 at 3:08 AM, ayan guha wrote:
> Hi
>
> I think you can use the map-reduce paradigm here. Create a key using user ID
> and date, with the record as the value. Then you can express your operation
> (the "do something" part) as a function. If the function meets certain
> criteria, such as being associative and commutative (say, add or
> multiplication), you can use
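The suggestion above, keying on (user ID, date) and reducing with an associative, commutative function, can be sketched in plain Python (no Spark; `reduce_by_key` is an illustrative stand-in for Spark's `reduceByKey`, and the records are made up):

```python
def reduce_by_key(pairs, fn):
    """Plain-Python stand-in for reduceByKey: fold all values sharing a key with fn."""
    acc = {}
    for k, v in pairs:
        acc[k] = fn(acc[k], v) if k in acc else v
    return acc

records = [("u1", "2016-11-01", 10.0), ("u1", "2016-11-01", 2.5),
           ("u2", "2016-11-01", 4.0)]
# Key on (userid, date); the value is the transaction amount.
pairs = [((uid, date), amount) for uid, date, amount in records]
totals = reduce_by_key(pairs, lambda a, b: a + b)
# totals[("u1", "2016-11-01")] == 12.5
```

Associativity and commutativity are what let Spark apply such a function partially on each partition and then merge, which is why the criteria matter.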
That would help, but again, within a particular partition I would still need
to iterate over the customers whose user ids start with those first n letters.
I want to get rid of the nested iterations.
Thanks
On Thu, Nov 17, 2016 at 10:28 PM, Xiaomeng Wan wrote:
> You can partition on the first n letters of userid
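Partitioning on a user-id prefix keeps all of one user's records in the same bucket while bounding the number of buckets. A plain-Python sketch of the idea (illustrative only, not the Spark API):

```python
from collections import defaultdict

def prefix_partitions(records, n):
    """Bucket records by the first n characters of the user id.
    All records of a given user land in the same bucket."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec[0][:n]].append(rec)
    return buckets

records = [("alice", "2016-11-01", 3.0), ("alan", "2016-11-01", 1.0),
           ("bob", "2016-11-02", 2.0)]
buckets = prefix_partitions(records, 2)
# "al" -> alice and alan; "bo" -> bob
```

The trade-off, as the reply above notes, is that a bucket still holds several users sharing the prefix, so per-user work inside a bucket remains an inner loop.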
On 17 November 2016 at 08:25, titli batali wrote:
> Hi,
>
> I have a use case where we have 1000 CSV files with a column user_Id,
> covering 8 million unique users. The data contains userid, date, transaction,
> over which we run some queries.
>
> We have a case where we need to iterate over each transaction on a
> particular date for each user. There are three nesting