Re: Fastest way to drop useless columns

2018-05-31 Thread julio . cesare
I believe this only works when we need to drop duplicate ROWS. Here I want to drop cols which contain only one unique value. On 2018-05-31 11:16, Divya Gehlot wrote: you can try the dropDuplicates function
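For reference, dropDuplicates works at the row level, which is why it does not answer the column question; a minimal illustration (df stands for any DataFrame):

    // Row-level deduplication only; no columns are removed.
    df.dropDuplicates()       // drop rows duplicated across all columns
    df.dropDuplicates("id")   // drop rows duplicated on the given columns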

Fastest way to drop useless columns

2018-05-31 Thread julio . cesare
Hi there! I have a potentially large dataset (regarding number of rows and cols), and I want to find the fastest way to drop some cols that are useless to me, i.e. cols containing only a single unique value. I want to know what you think I could do to achieve this as fast as possible using Spark.
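One way to attack this, sketched below: compute a distinct count for every column in a single aggregation pass, then drop the columns whose count is at most one. This is a sketch assuming Spark 2.x; the helper name dropConstantColumns is invented here, and approx_count_distinct can be swapped for the exact countDistinct at a performance cost.

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.approx_count_distinct

    // Single pass over the data: one approximate distinct count per column.
    def dropConstantColumns(df: DataFrame): DataFrame = {
      val counts = df
        .select(df.columns.map(c => approx_count_distinct(c).alias(c)): _*)
        .first()
      // A column with at most one distinct value carries no information.
      val useless = df.columns.filter(c => counts.getAs[Long](c) <= 1L)
      df.drop(useless: _*)
    }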

Feature generation / aggregate functions / timeseries

2017-12-14 Thread julio . cesare
Hi dear Spark community! I want to create a lib which generates features for potentially very large datasets, so I believe Spark could be a nice tool for that. Let me explain what I need to do: each file 'F' of my dataset is composed of at least: - an id (string or int) - a timestamp (…
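For concreteness, the record layout described above can be written down as an explicit schema; a minimal sketch (the field names are assumptions, since the post only says each file contains "at least" these fields):

    import org.apache.spark.sql.types._

    // One row per event: an id, a timestamp, and a value.
    val eventSchema = StructType(Seq(
      StructField("id", StringType, nullable = false),
      StructField("timestamp", LongType, nullable = false),
      StructField("value", StringType, nullable = true)
    ))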

Re: Feature Generation for Large datasets composed of many time series

2017-07-24 Thread julio . cesare
Ok, thanks! That's exactly the kind of thing I was imagining with Apache Beam. I still have a few questions. Regarding performance: will this be efficient, even with large "windows" / many ids / values / timestamps? My goal after all this is to store it in Cassandra and/or use the…
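On the Cassandra side, a computed feature DataFrame is usually written out through the DataStax spark-cassandra-connector; a hedged sketch (the keyspace and table names are invented, and the connector has to be on the classpath):

    // features: the DataFrame produced by the aggregation step.
    features.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "feature_store", "table" -> "features"))
      .mode("append")
      .save()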

Union large number of DataFrames

2017-07-24 Thread julio . cesare
Hi there! Let's imagine I have a large number of very small DataFrames with the same schema (a list of DataFrames: allDFs) and I want to create one large dataset from them. I have been trying this: -> allDFs.reduce((a, b) => a.union(b)) And after this one: -> allDFs.reduce((a, b) =>…
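A left fold of union calls builds a logical plan whose depth grows linearly with the number of inputs, and analyzing that plan gets slow for large lists. One common workaround is to union pairwise so the plan depth stays logarithmic; a sketch, assuming all DataFrames share the same schema and the list is non-empty:

    import org.apache.spark.sql.DataFrame

    // Tree-shaped union: plan depth is O(log n) instead of the O(n)
    // produced by reduce/foldLeft over union.
    def unionAll(dfs: Seq[DataFrame]): DataFrame = dfs match {
      case Seq(single) => single
      case _ =>
        val (left, right) = dfs.splitAt(dfs.length / 2)
        unionAll(left).union(unionAll(right))
    }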

Feature Generation for Large datasets composed of many time series

2017-07-19 Thread julio . cesare
Hello, I want to create a lib which generates features for potentially very large datasets. Each file 'F' of my dataset is composed of at least: - an id (string or int) - a timestamp (or a long value) - a value (int or string) I want my tool to: - compute aggregate functions for many…
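In Spark terms, per-id aggregates over time can be expressed by grouping on the id and a fixed time window; a sketch under stated assumptions (the column names, a numeric value column, an epoch-seconds timestamp, and the one-hour window are all invented for illustration):

    import org.apache.spark.sql.functions.{window, avg, count, col}

    // events: DataFrame with columns id, timestamp (epoch seconds), value.
    val features = events
      .groupBy(
        col("id"),
        // Casting a long to timestamp interprets it as epoch seconds.
        window(col("timestamp").cast("timestamp"), "1 hour"))
      .agg(
        avg(col("value")).as("avg_value"),
        count(col("id")).as("n_events"))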