Re: [Offtopic] Line fitting [was Re: Numpy outlier removal]

2013-01-11 Thread Alan Spence
On 09 Jan 2013, at 00:02:11 Steven D'Aprano wrote: > The point I keep making, that everybody seems to be ignoring, is that > eyeballing a line of best fit is subjective, unreliable and impossible to > verify. How could I check that the line you say is the "best fit" > actually *is* the *best

Re: [Offtopic] Line fitting [was Re: Numpy outlier removal]

2013-01-08 Thread Steven D'Aprano
On Wed, 09 Jan 2013 07:14:51 +1100, Chris Angelico wrote: > Three types of lies. Oh, surely more than that. White lies. Regular or garden variety lies. Malicious lies. Accidental or innocent lies. FUD -- "fear, uncertainty, doubt". Half-truths. Lying by omission. Exaggeration and underst

Re: [Offtopic] Line fitting [was Re: Numpy outlier removal]

2013-01-08 Thread Jason Friedman
> Statistical analysis is a huge science. So is lying. And I'm not sure > most people can pick one from the other. Chris, your sentence causes me to think of Mr. Twain's sentence, or at least the one he popularized: http://www.twainquotes.com/Statistics.html. -- http://mail.python.org/mailman/lis

Re: [Offtopic] Line fitting [was Re: Numpy outlier removal]

2013-01-08 Thread Steven D'Aprano
On Tue, 08 Jan 2013 04:07:08 -0500, Terry Reedy wrote: >> But that is not fitting a line by eye, which is what I am talking >> about. > > With the line constrained to go through 0,0 a line eyeballed with a > clear ruler could easily be better than either regression line, as a > human will tend t

Re: [Offtopic] Line fitting [was Re: Numpy outlier removal]

2013-01-08 Thread Robert Kern
On 08/01/2013 20:14, Chris Angelico wrote: On Wed, Jan 9, 2013 at 2:55 AM, Robert Kern wrote: On 08/01/2013 06:35, Chris Angelico wrote: ... it looks quite significant to show a line going from the bottom of the graph to the top, but sounds a lot less noteworthy when you see it as a half-degre

Re: [Offtopic] Line fitting [was Re: Numpy outlier removal]

2013-01-08 Thread Chris Angelico
On Wed, Jan 9, 2013 at 2:55 AM, Robert Kern wrote: > On 08/01/2013 06:35, Chris Angelico wrote: >> ... it looks >> quite significant to show a line going from the bottom of the graph to >> the top, but sounds a lot less noteworthy when you see it as a >> half-degree increase on about (I think?) 30

Re: [Offtopic] Line fitting [was Re: Numpy outlier removal]

2013-01-08 Thread Maarten
On Tuesday, January 8, 2013 10:07:08 AM UTC+1, Terry Reedy wrote: > With the line constrained to go through 0,0, a line eyeballed with a > clear ruler could easily be better than either regression line, as a > human will tend to minimize the deviations *perpendicular to the line*, > which is t
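[The "deviations perpendicular to the line" that an eyeballed fit tends to minimize correspond to orthogonal (total) least squares rather than ordinary regression. A minimal sketch, with made-up data, using the SVD of the centered point cloud to get the perpendicular-distance-minimizing line:]

```python
import numpy as np

# Illustrative data, roughly y = 2x + 1 with noise in both variables.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.9, 3.1, 4.9, 7.2, 8.9])

# Center the data; the total-least-squares line direction is the
# leading right-singular vector of the centered point cloud.
pts = np.column_stack([x - x.mean(), y - y.mean()])
_, _, vt = np.linalg.svd(pts)
direction = vt[0]                     # unit vector along the TLS line
slope = direction[1] / direction[0]   # invariant under sign flip of vt[0]
intercept = y.mean() - slope * x.mean()
```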

Re: [Offtopic] Line fitting [was Re: Numpy outlier removal]

2013-01-08 Thread Robert Kern
On 08/01/2013 06:35, Chris Angelico wrote: On Tue, Jan 8, 2013 at 1:06 PM, Steven D'Aprano wrote: given that weather patterns have been known to follow cycles at least that long. That is not a given. "Weather patterns" don't last for thirty years. Perhaps you are talking about climate pattern

Re: [Offtopic] Line fitting [was Re: Numpy outlier removal]

2013-01-08 Thread Oscar Benjamin
On 8 January 2013 01:23, Steven D'Aprano wrote: > On Mon, 07 Jan 2013 22:32:54 +, Oscar Benjamin wrote: > > [...] >> I also think it would >> be highly foolish to go so far with refusing to eyeball data that you >> would accept the output of some regression algorithm even when it >> clearly lo

Re: [Offtopic] Line fitting [was Re: Numpy outlier removal]

2013-01-08 Thread Terry Reedy
On 1/7/2013 8:23 PM, Steven D'Aprano wrote: On Mon, 07 Jan 2013 22:32:54 +, Oscar Benjamin wrote: An example: Earlier today I was looking at some experimental data. A simple model of the process underlying the experiment suggests that two variables x and y will vary in direct proportion to
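[For the case described above, where a simple model says y varies in direct proportion to x, the least-squares line constrained through the origin has a closed form. A sketch with hypothetical values:]

```python
import numpy as np

# Two variables expected to vary in direct proportion: y = k * x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Minimizing sum((y - k*x)**2) over k gives k = sum(x*y) / sum(x*x),
# i.e. the regression line forced through (0, 0).
k = np.dot(x, y) / np.dot(x, x)
```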

Re: [Offtopic] Line fitting [was Re: Numpy outlier removal]

2013-01-07 Thread Chris Angelico
On Tue, Jan 8, 2013 at 1:06 PM, Steven D'Aprano wrote: >> given that weather patterns have been known to follow cycles at least >> that long. > > That is not a given. "Weather patterns" don't last for thirty years. > Perhaps you are talking about climate patterns? Yes, that's what I meant. In any

Re: [Offtopic] Line fitting [was Re: Numpy outlier removal]

2013-01-07 Thread Steven D'Aprano
On Tue, 08 Jan 2013 06:43:46 +1100, Chris Angelico wrote: > On Tue, Jan 8, 2013 at 4:58 AM, Steven D'Aprano > wrote: >> Anyone can fool themselves into placing a line through a subset of non- >> linear data. Or, sadly more often, *deliberately* cherry picking fake >> clusters in order to fool oth

Re: [Offtopic] Line fitting [was Re: Numpy outlier removal]

2013-01-07 Thread Steven D'Aprano
On Mon, 07 Jan 2013 22:32:54 +, Oscar Benjamin wrote: > An example: Earlier today I was looking at some experimental data. A > simple model of the process underlying the experiment suggests that two > variables x and y will vary in direct proportion to one another and the > data broadly reflec

Re: [Offtopic] Line fitting [was Re: Numpy outlier removal]

2013-01-07 Thread Oscar Benjamin
On 7 January 2013 17:58, Steven D'Aprano wrote: > On Mon, 07 Jan 2013 15:20:57 +, Oscar Benjamin wrote: > >> There are sometimes good reasons to get a line of best fit by eye. In >> particular if your data contains clusters that are hard to separate, >> sometimes it's useful to just pick out r

Re: [Offtopic] Line fitting [was Re: Numpy outlier removal]

2013-01-07 Thread Chris Angelico
On Tue, Jan 8, 2013 at 4:58 AM, Steven D'Aprano wrote: > Anyone can fool themselves into placing a line through a subset of non- > linear data. Or, sadly more often, *deliberately* cherry picking fake > clusters in order to fool others. Here is a real world example of what > happens when people pi

[Offtopic] Line fitting [was Re: Numpy outlier removal]

2013-01-07 Thread Steven D'Aprano
On Mon, 07 Jan 2013 15:20:57 +, Oscar Benjamin wrote: > There are sometimes good reasons to get a line of best fit by eye. In > particular if your data contains clusters that are hard to separate, > sometimes it's useful to just pick out roughly where you think a line > through a subset of the
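[Unlike an eyeballed line, an ordinary least-squares fit is reproducible: anyone can rerun it and verify the result. A minimal sketch with illustrative data:]

```python
import numpy as np

# Synthetic data scattered around y = 2x + 1 (illustrative only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# polyfit with degree 1 minimizes the sum of squared vertical
# residuals, giving an objective, checkable "best fit" line.
slope, intercept = np.polyfit(x, y, 1)
```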

Re: Numpy outlier removal

2013-01-07 Thread Robert Kern
On 07/01/2013 15:20, Oscar Benjamin wrote: On 7 January 2013 05:11, Steven D'Aprano wrote: On Mon, 07 Jan 2013 02:29:27 +, Oscar Benjamin wrote: On 7 January 2013 01:46, Steven D'Aprano wrote: On Sun, 06 Jan 2013 19:44:08 +, Joseph L. Casale wrote: I'm not sure that this approach i

Re: Numpy outlier removal

2013-01-07 Thread Oscar Benjamin
On 7 January 2013 05:11, Steven D'Aprano wrote: > On Mon, 07 Jan 2013 02:29:27 +, Oscar Benjamin wrote: > >> On 7 January 2013 01:46, Steven D'Aprano >> wrote: >>> On Sun, 06 Jan 2013 19:44:08 +, Joseph L. Casale wrote: >>> >>> I'm not sure that this approach is statistically robust. No,

RE: Numpy outlier removal

2013-01-07 Thread Joseph L. Casale
> In other words: this approach for detecting outliers is nothing more than  > a very rough, and very bad, heuristic, and should be avoided. Heh, very true but the results will only be used for conversational purposes. I am making an assumption that the data is normally distributed and I do expec

Re: Numpy outlier removal

2013-01-06 Thread Steven D'Aprano
On Mon, 07 Jan 2013 02:29:27 +, Oscar Benjamin wrote: > On 7 January 2013 01:46, Steven D'Aprano > wrote: >> On Sun, 06 Jan 2013 19:44:08 +, Joseph L. Casale wrote: >> >>> I have a dataset that consists of a dict with text descriptions and >>> values that are integers. If required, I coll

Re: Numpy outlier removal

2013-01-06 Thread Oscar Benjamin
On 7 January 2013 01:46, Steven D'Aprano wrote: > On Sun, 06 Jan 2013 19:44:08 +, Joseph L. Casale wrote: > >> I have a dataset that consists of a dict with text descriptions and >> values that are integers. If required, I collect the values into a list >> and create a numpy array running it t

Re: Numpy outlier removal

2013-01-06 Thread Paul Simon
"Steven D'Aprano" wrote in message news:50ea28e7$0$30003$c3e8da3$54964...@news.astraweb.com... > On Sun, 06 Jan 2013 19:44:08 +, Joseph L. Casale wrote: > >> I have a dataset that consists of a dict with text descriptions and >> values that are integers. If required, I collect the values int

Re: Numpy outlier removal

2013-01-06 Thread Steven D'Aprano
On Sun, 06 Jan 2013 19:44:08 +, Joseph L. Casale wrote: > I have a dataset that consists of a dict with text descriptions and > values that are integers. If required, I collect the values into a list > and create a numpy array running it through a simple routine:  > > data[abs(data - mean(d

Re: Numpy outlier removal

2013-01-06 Thread MRAB
On 2013-01-06 22:33, Hans Mulder wrote: On 6/01/13 20:44:08, Joseph L. Casale wrote: I have a dataset that consists of a dict with text descriptions and values that are integers. If required, I collect the values into a list and create a numpy array running it through a simple routine: data[ab

RE: Numpy outlier removal

2013-01-06 Thread Joseph L. Casale
>Assuming your data and the dictionary are keyed by a common set of keys:  > >for key in descriptions: >    if abs(data[key] - mean(data)) >= m * std(data): >        del data[key] >        del descriptions[key] Heh, yeah sometimes the obvious is too simple to see. I used a dict comp to rebuild
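[The dict-comprehension rebuild mentioned above avoids deleting keys from a dict while iterating over it, which raises RuntimeError in Python 3. A sketch with hypothetical keys and values:]

```python
import numpy as np

# Hypothetical dataset: text descriptions keyed to integer values.
data = {'a': 10, 'b': 12, 'c': 11, 'd': 13, 'e': 95}

values = np.array(list(data.values()))
mu, sigma = values.mean(), values.std()
m = 1  # cutoff in standard deviations (illustrative)

# Rebuild the dict rather than calling del inside the loop,
# so descriptions and values stay aligned by key.
cleaned = {k: v for k, v in data.items() if abs(v - mu) < m * sigma}
```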

Re: Numpy outlier removal

2013-01-06 Thread Hans Mulder
On 6/01/13 20:44:08, Joseph L. Casale wrote: > I have a dataset that consists of a dict with text descriptions and values > that are integers. If > required, I collect the values into a list and create a numpy array running > it through a simple > routine: data[abs(data - mean(data)) < m * std(da

Numpy outlier removal

2013-01-06 Thread Joseph L. Casale
I have a dataset that consists of a dict with text descriptions and values that are integers. If required, I collect the values into a list and create a numpy array running it through a simple routine: data[abs(data - mean(data)) < m * std(data)] where m is the number of std deviations to includ
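[The routine quoted above runs as-is once the values are in a NumPy array; a minimal sketch with made-up values, where `m` is the number of standard deviations to keep, as in the post:]

```python
import numpy as np

# Values collected from the dict, per the original post; 95 is an
# obvious outlier in this made-up sample.
data = np.array([10, 12, 11, 13, 12, 11, 95])

m = 2  # number of standard deviations to include
# Boolean-mask indexing keeps only points within m std devs of the mean.
filtered = data[np.abs(data - np.mean(data)) < m * np.std(data)]
```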