Forgot to include this reply to the list (as others may want to comment).

---------- Forwarded message ----------
From: Paul Barry <paul.james.ba...@gmail.com>
Date: 24 June 2017 at 12:21
Subject: Re: Unable to convert pandas object to string
To: Bhaskar Dhariyal <dhariyalbhas...@gmail.com>

Note that .info(), according to its docs, gives you a "Concise summary of a DataFrame". Everything is an object in Python, including strings, and pandas stores string columns with the "object" dtype, so the output from .info() is technically correct (but maybe not very helpful in your case).

As I've shown, we can work out that the data you want to work with is in fact a string, so I've added some code to my notebook to show you how to tokenize the first row of data. This should get you started on doing this to the rest of your data.

Note, too, that some of the data in these specific columns contains something other than a string, so you'll need to clean that up first (see the end of the updated notebook, attached, for how I worked out that this was indeed the case).

I hope this all helps.

Paul.

On 24 June 2017 at 11:31, Bhaskar Dhariyal <dhariyalbhas...@gmail.com> wrote:

> The data type showing there is object. In[4] in the first page. I wanted
> to tokenize the name & desc column and clean it
>
>
> On Sat, Jun 24, 2017 at 3:54 PM, Paul Barry <paul.james.ba...@gmail.com>
> wrote:
>
>> Hi Bhaskar.
>>
>> Please see attached PDF of a small Jupyter notebook. As you'll see, the
>> data in the fields you mentioned are *already* strings. What is it you are
>> trying to do here?
>>
>> Paul.
>>
>> On 24 June 2017 at 10:51, Bhaskar Dhariyal <dhariyalbhas...@gmail.com>
>> wrote:
>>
>>>
>>> train.csv
>>> <https://drive.google.com/file/d/0B1D4AyluMGU0enoxbElGTV94Q0E/view?usp=drive_web>
>>> here it is thanks for quick reply
>>>
>>> On Sat, Jun 24, 2017 at 3:14 PM, Paul Barry <paul.james.ba...@gmail.com>
>>> wrote:
>>>
>>>> Any chance you could post one line of data so we can see what we have
>>>> to work with?
>>>>
>>>> Also - have you taken a look at Jake VanderPlas's notebooks? There's
>>>> a lot of help with pandas to be found there:
>>>> https://github.com/jakevdp/PythonDataScienceHandbook
>>>>
>>>> Paul.
>>>>
>>>> On 24 June 2017 at 10:32, Bhaskar Dhariyal <dhariyalbhas...@gmail.com>
>>>> wrote:
>>>>
>>>>> <class 'pandas.core.frame.DataFrame'>
>>>>> Int64Index: 171594 entries, 0 to 63464
>>>>> Data columns (total 7 columns):
>>>>> project_id          171594 non-null object
>>>>> desc                171594 non-null object
>>>>> goal                171594 non-null float64
>>>>> keywords            171594 non-null object
>>>>> diff_creat_laun     171594 non-null int64
>>>>> diff_laun_status    171594 non-null int64
>>>>> diff_status_dead    171594 non-null int64
>>>>> dtypes: float64(1), int64(3), object(3)
>>>>>
>>>>> not able to convert desc and keywords to string for preprocessing.
>>>>> Tried astype(str). Please help
>>>>> --
>>>>> https://mail.python.org/mailman/listinfo/python-list
>>>>>
>>>>
>>>> --
>>>> Paul Barry, t: @barrypj <https://twitter.com/barrypj> - w:
>>>> http://paulbarry.itcarlow.ie - e: paul.ba...@itcarlow.ie
>>>> Lecturer, Computer Networking: Institute of Technology, Carlow, Ireland.
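
For readers without the attached notebook, the checks described at the top of this message (confirming what the "object" columns really hold, finding the values that are not strings, and tokenizing a row) can be sketched roughly as follows. This is not the notebook's code. The sample rows, the empty-string clean-up, and the plain whitespace tokenizer are all assumptions made for illustration; only the column names "desc" and "keywords" come from the thread.

```python
import pandas as pd

# Hypothetical stand-in for train.csv: the column names follow the thread,
# but the values are invented. dtype=object mirrors what .info() reported.
df = pd.DataFrame(
    {
        "desc": ["A short film about bees", float("nan"), "Handmade wooden toys"],
        "keywords": ["short-film-bees", "untitled", "handmade-wooden-toys"],
    },
    dtype=object,
)

# "object" only says the column holds Python objects; it does not
# guarantee that every cell is a str.
print(df.dtypes)

# Inspect the actual Python type of each cell in the column.
print(df["desc"].map(type).value_counts())

# Boolean mask: True where the cell really is a string.
is_text = df["desc"].map(lambda value: isinstance(value, str))

# Rows that need cleaning before any text processing.
bad_rows = df[~is_text]

# One possible clean-up (an assumption): swap non-strings for "".
df["desc"] = df["desc"].where(is_text, "")

# Naive whitespace tokenization of the first row only...
first_tokens = df["desc"].iloc[0].lower().split()

# ...and the same idea applied to the whole column.
df["desc_tokens"] = df["desc"].str.lower().str.split()
```

Looking at bad_rows first shows what the non-string values actually are, which should decide whether to blank them, drop them, or convert them. A real tokenizer (from NLTK, for example) could replace the whitespace split once the column is clean.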