Re: [scikit-learn] Imputers and DataFrame objects

2020-08-19 Thread Ram Rachum
I'll check it out. Thank you.

On Wed, Aug 19, 2020 at 9:46 AM Sole Galli via scikit-learn <scikit-learn@python.org> wrote:
> Did you have a look at the package feature-engine? It has its own imputers and encoders that allow you to select the columns to transform and return a dataframe. It a

Re: [scikit-learn] Imputers and DataFrame objects

2020-08-18 Thread Sole Galli via scikit-learn
Did you have a look at the package feature-engine? It has its own imputers and encoders that allow you to select the columns to transform and return a dataframe. It also has a sklearn wrapper that wraps sklearn transformers so that they return a dataframe instead of a numpy array.

Cheers.
Sole
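
The idea described above (impute only selected columns and get a dataframe back) can be sketched with plain pandas and scikit-learn. This is a minimal illustration, not feature-engine's API: the helper name `impute_columns` and the sample data are hypothetical, and only `SimpleImputer` comes from scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def impute_columns(df, columns, strategy="most_frequent"):
    """Hypothetical helper: impute only `columns` and return a DataFrame."""
    out = df.copy()  # leave the caller's DataFrame untouched
    imputer = SimpleImputer(strategy=strategy)
    # fit_transform returns a numpy array; assigning it back to the
    # selected columns keeps the index and the column labels.
    out[columns] = imputer.fit_transform(out[columns])
    return out


# Made-up example data with missing values in two numeric columns.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0],
        "b": [np.nan, 5.0, 5.0],
        "c": ["x", "y", "z"],
    }
)

result = impute_columns(df, ["a", "b"], strategy="mean")
print(result)
```

Column `c` is passed through unchanged because it is not in the list of columns to impute.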

Re: [scikit-learn] Imputers and DataFrame objects

2020-08-18 Thread Ram Rachum
On Tue, Aug 18, 2020 at 6:53 PM Kevin Markham wrote:
> Hi Ram,
>
> > For a column with numbers written like "one", "two" and missing values "?", I had to do two things: Change them to numbers (1, 2), and then, instead of the missing values, add the most common element, or mean or whatever.

Re: [scikit-learn] Imputers and DataFrame objects

2020-08-18 Thread Kevin Markham
Hi Ram,

> For a column with numbers written like "one", "two" and missing values "?", I had to do two things: Change them to numbers (1, 2), and then, instead of the missing values, add the most common element, or mean or whatever. When I tried to use LabelEncoder to do the first part, it complain

Re: [scikit-learn] Imputers and DataFrame objects

2020-08-18 Thread Ram Rachum
On Mon, Aug 17, 2020 at 8:55 PM Kevin Markham wrote:
> Hi Ram,
>
> These are great questions!

Thank you for the detailed answers.

> > The task was to remove these irregularities. So for the "?" items, replace them with mean, and for the "one", "two" etc. replace with a numerical value.

Re: [scikit-learn] Imputers and DataFrame objects

2020-08-17 Thread Kevin Markham
Hi Ram,

These are great questions!

> The task was to remove these irregularities. So for the "?" items, replace them with mean, and for the "one", "two" etc. replace with a numerical value.

If your primary task is "data cleaning", then pandas is usually the optimal tool. If "preprocessing your d
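
As a rough sketch of the pandas-only cleaning route for the task quoted above, the steps could look like the following. The column names, the sample values, and the word-to-number mapping are assumptions made up for this example; they are not taken from the original assignment.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one column spells numbers as words, and "?" marks
# missing values in both columns.
df = pd.DataFrame(
    {
        "doors": ["two", "four", "?", "two"],
        "price": ["100", "?", "300", "200"],
    }
)

# Step 1: treat "?" as a missing value everywhere.
df = df.replace("?", np.nan)

# Step 2: map spelled-out numbers to numeric values (assumed mapping).
word_to_number = {"one": 1, "two": 2, "three": 3, "four": 4}
df["doors"] = df["doors"].map(word_to_number)

# Step 3: parse the remaining string column as numbers.
df["price"] = pd.to_numeric(df["price"])

# Step 4: fill the gaps, using the most common value for "doors"
# and the mean for "price".
df["doors"] = df["doors"].fillna(df["doors"].mode().iloc[0])
df["price"] = df["price"].fillna(df["price"].mean())

print(df)
```

Doing the word-to-number mapping before the fill matters here: the mean can only be computed once the column is numeric.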

[scikit-learn] Imputers and DataFrame objects

2020-08-17 Thread Ram Rachum
Hey guys,

This is a bit of a complicated question. I was helping my friend do a task with Pandas/sklearn for her data science class. I figured it'd be a breeze, since I'm a fancy-pants Python programmer. Oh wow, it was so not. I was trying to do things that felt simple to me, but there were so ma