The first thing I do once I import new data (as a pandas dataframe) is to 
.head() it, .describe() it, and then kick around a few specific stats according 
to what I see.

But I'm not satisfied with .describe(). Among other things, non-numerical 
columns are ignored by default, and the same off-the-shelf stats are computed 
for every numerical column.
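
A minimal illustration of that default, using made-up toy data (the column 
names are placeholders, not from any real dataset). Passing include="all" 
does pull in the string column, but only with generic count/unique/top/freq 
stats:

```python
import pandas as pd

# Made-up toy data: one numeric column, one string column.
data = pd.DataFrame({
    "price": [1.5, 2.0, 3.25],
    "color": ["red", "blue", "red"],
})

default_summary = data.describe()            # numeric columns only
full_summary = data.describe(include="all")  # also summarizes the string column

print(list(default_summary.columns))  # ['price']
print(list(full_summary.columns))     # ['price', 'color']
```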

I've been shopping around for a "data peeping" function that would:

(1) Have a hands-off mode where I simply type
       diagnose_this(data)
and the function figures things out on its own, notifying me when in doubt. 
For example, it would assume that any string column with not too many unique 
values should be treated as categorical, and compute appropriate statistics.

(2) Perform standard diagnoses and print them out. For example, (a) missing 
values? (b) heterogeneously formatted data? (c) columns with only one unique 
value? etc.

(3) Be parametrizable, if I so choose.
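
To make this concrete, here is a rough sketch of the kind of thing I mean. 
Everything beyond the name diagnose_this is made up for illustration: the 
max_categories parameter, the report columns, and the crude "more than one 
Python type" check for heterogeneous data. It does not cover the "notify me 
when in doubt" part.

```python
import pandas as pd


def diagnose_this(data, max_categories=10):
    """Return one row of diagnostics per column of `data`.

    `max_categories` is a hypothetical knob: non-numeric columns with at
    most this many distinct values are treated as categorical.
    """
    rows = []
    for name in data.columns:
        col = data[name]
        values = col.dropna()
        n_unique = values.nunique()
        # Crude heterogeneity check: more than one Python type in the column.
        n_types = values.map(type).nunique()
        if pd.api.types.is_numeric_dtype(col):
            kind = "numeric"
        elif n_unique <= max_categories:
            kind = "categorical"
        else:
            kind = "text"
        rows.append({
            "column": name,
            "kind": kind,
            "n_missing": int(col.isna().sum()),  # (a) missing values
            "n_unique": int(n_unique),
            "mixed_types": n_types > 1,          # (b) heterogeneous data
            "constant": n_unique <= 1,           # (c) only one unique value
        })
    return pd.DataFrame(rows).set_index("column")


# Made-up example data, one column per kind of problem.
data = pd.DataFrame({
    "price": [1.5, None, 3.25, 4.0],
    "color": ["red", "blue", "red", None],
    "source": ["web", "web", "web", "web"],
    "code": ["A1", 7, "B2", "C3"],
})
report = diagnose_this(data)
print(report)
```

Point (3) is then just a matter of exposing more keyword arguments like 
max_categories.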

Does anyone know of such a function?
-- 
https://mail.python.org/mailman/listinfo/python-list
