I believe this only works when we need to drop duplicate ROWS.
Here I want to drop cols which contain only one unique value.
On 2018-05-31 11:16, Divya Gehlot wrote:
you can try the dropDuplicates function
Hi there!
I have a potentially large dataset (in terms of both rows and cols),
and I want to find the fastest way to drop the cols that are useless to me,
i.e. cols containing only a single unique value!
What do you think I could do to achieve this as fast as possible using Spark?
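A minimal sketch of one way to do this in a single pass, assuming a
DataFrame df is already loaded (approx_count_distinct is approximate;
swap in countDistinct if you need exact counts, at a higher cost):

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.approx_count_distinct

    // Compute one (approximate) distinct count per column in a single
    // aggregation pass, then drop every column with at most one value.
    def dropConstantColumns(df: DataFrame): DataFrame = {
      val counts = df.agg(
        approx_count_distinct(df.columns.head).as(df.columns.head),
        df.columns.tail.map(c => approx_count_distinct(c).as(c)): _*
      ).head()
      val constantCols = df.columns.filter(c => counts.getAs[Long](c) <= 1L)
      df.drop(constantCols: _*)
    }

Note that approx_count_distinct ignores nulls, so a column holding one
value plus nulls would also be dropped by this sketch.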
Hi dear Spark community!
I want to create a lib which generates features for potentially very
large datasets, so I believe Spark could be a nice tool for that.
Let me explain what I need to do:
Each file 'F' of my dataset is composed of at least:
- an id (string or int)
- a timestamp (
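Assuming each file 'F' lands as a CSV with fields like those listed, a
minimal loading sketch (the path, the CSV format and the value type are
assumptions, not from the original message):

    import org.apache.spark.sql.SparkSession

    // Hypothetical record layout matching the description above.
    case class Record(id: String, timestamp: Long, value: String)

    val spark = SparkSession.builder().appName("feature-gen").getOrCreate()
    import spark.implicits._

    val ds = spark.read
      .option("header", "true")
      .csv("/path/to/F.csv")
      .select(
        $"id".cast("string").as("id"),
        $"timestamp".cast("long").as("timestamp"),
        $"value".cast("string").as("value"))
      .as[Record]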
Ok, thanks!
That's exactly the kind of thing I was imagining with Apache Beam.
I still have a few questions:
- regarding performance, will this be efficient? Even with large
"windows" / many ids / values / timestamps...?
- my goal after all this is to store it in Cassandra and/or use the
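For the Cassandra part, a minimal write sketch using the DataStax
spark-cassandra-connector (featuresDF stands for whatever DataFrame the
pipeline produces; keyspace and table names are placeholders, and the
target table must already exist with a matching schema):

    import org.apache.spark.sql.SaveMode

    featuresDF.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "features_ks", "table" -> "features"))
      .mode(SaveMode.Append)
      .save()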
Hi there!
Let's imagine I have a large number of very small DataFrames with the
same schema (a list of DataFrames: allDFs),
and I want to build one large Dataset out of them.
I have been trying this:
-> allDFs.reduce( (a, b) => a.union(b) )
And after this one:
-> allDFs.reduce( (a,b) =>
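The second snippet is cut off, but assuming allDFs is a Seq[DataFrame]
whose elements all share one schema, here is the pairwise version next
to a variant that avoids its main pitfall:

    import org.apache.spark.sql.DataFrame

    // Pairwise union, as in the first snippet: correct, but the logical
    // plan grows one level per input, which can make query planning very
    // slow when there are thousands of small DataFrames.
    val merged: DataFrame = allDFs.reduce((a, b) => a.union(b))

    // One way to keep the plan flat: union at the RDD level, then
    // rebuild a single DataFrame with the shared schema.
    val spark = allDFs.head.sparkSession
    val flat: DataFrame = spark.createDataFrame(
      spark.sparkContext.union(allDFs.map(_.rdd)),
      allDFs.head.schema)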
Hello,
I want to create a lib which generates features for potentially very
large datasets.
Each file 'F' of my dataset is composed of at least:
- an id (string or int)
- a timestamp (or a long value)
- a value (int or string)
I want my tool to:
- compute aggregate functions for many
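The list of aggregates is cut off above, so as an illustration only,
assuming a DataFrame events with the three columns just described:

    import org.apache.spark.sql.functions._

    // A per-id feature pass; the chosen aggregates are placeholders
    // standing in for the truncated list in the original message.
    val features = events.groupBy(col("id")).agg(
      count(lit(1)).as("n_events"),
      min(col("timestamp")).as("first_ts"),
      max(col("timestamp")).as("last_ts"),
      approx_count_distinct(col("value")).as("n_distinct_values"))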