On 09 Jan 2013, at 00:02:11 Steven D'Aprano wrote:
> The point I keep making, that everybody seems to be ignoring, is that
> eyeballing a line of best fit is subjective, unreliable and impossible to
> verify. How could I check that the line you say is the "best fit"
> actually *is* the *best* fit?
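A claimed "best fit" can be scored objectively by comparing sums of squared residuals. A minimal sketch of that check, with made-up data and a made-up hand-drawn line (numpy's polyfit gives the least-squares line; any eyeballed line can be scored the same way):

    import numpy as np

    rng = np.random.default_rng(42)
    x = np.linspace(0, 10, 30)
    y = 3.0 * x + 1.0 + rng.normal(0, 2.0, x.size)    # made-up data

    def rss(slope, intercept):
        # Sum of squared vertical residuals for a candidate line.
        return np.sum((y - (slope * x + intercept)) ** 2)

    ls_slope, ls_intercept = np.polyfit(x, y, 1)      # least-squares line
    eyeballed = (2.8, 1.5)                            # hypothetical hand-drawn line

    print(rss(*eyeballed), rss(ls_slope, ls_intercept))

Whatever line someone draws by eye, its residual sum of squares can never beat the least-squares line on this criterion, which is what makes the comparison verifiable.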
On Wed, 09 Jan 2013 07:14:51 +1100, Chris Angelico wrote:
> Three types of lies.
Oh, surely more than that.
White lies.
Regular or garden variety lies.
Malicious lies.
Accidental or innocent lies.
FUD -- "fear, uncertainty, doubt".
Half-truths.
Lying by omission.
Exaggeration and understatement.
> Statistical analysis is a huge science. So is lying. And I'm not sure
> most people can pick one from the other.
Chris, your sentence causes me to think of Mr. Twain's sentence, or at
least the one he popularized:
http://www.twainquotes.com/Statistics.html.
On Tue, 08 Jan 2013 04:07:08 -0500, Terry Reedy wrote:
>> But that is not fitting a line by eye, which is what I am talking
>> about.
>
> With the line constrained to go through 0,0 a line eyeballed with a
> clear ruler could easily be better than either regression line, as a
> human will tend t
On 08/01/2013 20:14, Chris Angelico wrote:
On Wed, Jan 9, 2013 at 2:55 AM, Robert Kern wrote:
On 08/01/2013 06:35, Chris Angelico wrote:
... it looks
quite significant to show a line going from the bottom of the graph to
the top, but sounds a lot less noteworthy when you see it as a
half-degree increase on about (I think?) 30
On Wed, Jan 9, 2013 at 2:55 AM, Robert Kern wrote:
> On 08/01/2013 06:35, Chris Angelico wrote:
>> ... it looks
>> quite significant to show a line going from the bottom of the graph to
>> the top, but sounds a lot less noteworthy when you see it as a
>> half-degree increase on about (I think?) 30
On Tuesday, January 8, 2013 10:07:08 AM UTC+1, Terry Reedy wrote:
> With the line constrained to go through 0,0, a line eyeballed with a
> clear ruler could easily be better than either regression line, as a
> human will tend to minimize the deviations *perpendicular to the line*,
> which is t
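For a line constrained through the origin, the two criteria (vertical residuals versus perpendicular distances) generally give different slopes. A rough sketch of both on made-up data, not code from the thread:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 50)
    y = 2.5 * x + rng.normal(0, 1.0, x.size)      # made-up data, true slope 2.5

    # Ordinary least squares through the origin: minimise *vertical* residuals.
    k_vertical = np.sum(x * y) / np.sum(x * x)

    # Orthogonal ("total") least squares through the origin: minimise
    # *perpendicular* distances; the best direction is the dominant
    # eigenvector of the uncentred second-moment matrix.
    moment = np.array([[np.sum(x * x), np.sum(x * y)],
                       [np.sum(x * y), np.sum(y * y)]])
    eigvals, eigvecs = np.linalg.eigh(moment)
    vx, vy = eigvecs[:, np.argmax(eigvals)]
    k_perpendicular = vy / vx

    print(k_vertical, k_perpendicular)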
On 08/01/2013 06:35, Chris Angelico wrote:
On Tue, Jan 8, 2013 at 1:06 PM, Steven D'Aprano wrote:
given that weather patterns have been known to follow cycles at least
that long.
That is not a given. "Weather patterns" don't last for thirty years.
Perhaps you are talking about climate patterns?
On 8 January 2013 01:23, Steven D'Aprano wrote:
> On Mon, 07 Jan 2013 22:32:54, Oscar Benjamin wrote:
>
> [...]
>> I also think it would
>> be highly foolish to go so far with refusing to eyeball data that you
>> would accept the output of some regression algorithm even when it
>> clearly lo
On 1/7/2013 8:23 PM, Steven D'Aprano wrote:
On Mon, 07 Jan 2013 22:32:54, Oscar Benjamin wrote:
An example: Earlier today I was looking at some experimental data. A
simple model of the process underlying the experiment suggests that two
variables x and y will vary in direct proportion to
On Tue, Jan 8, 2013 at 1:06 PM, Steven D'Aprano wrote:
>> given that weather patterns have been known to follow cycles at least
>> that long.
>
> That is not a given. "Weather patterns" don't last for thirty years.
> Perhaps you are talking about climate patterns?
Yes, that's what I meant. In any
On Tue, 08 Jan 2013 06:43:46 +1100, Chris Angelico wrote:
> On Tue, Jan 8, 2013 at 4:58 AM, Steven D'Aprano wrote:
>> Anyone can fool themselves into placing a line through a subset of non-
>> linear data. Or, sadly more often, *deliberately* cherry picking fake
> clusters in order to fool others.
On Mon, 07 Jan 2013 22:32:54, Oscar Benjamin wrote:
> An example: Earlier today I was looking at some experimental data. A
> simple model of the process underlying the experiment suggests that two
> variables x and y will vary in direct proportion to one another and the
> data broadly reflec
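One quick numerical cross-check for a supposedly proportional relationship (made-up numbers here, not the experimental data being described) is to fit both an unconstrained line and a line forced through the origin and compare:

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(1, 20, 40)
    y = 0.8 * x + rng.normal(0, 0.5, x.size)      # made-up "proportional" data

    a, b = np.polyfit(x, y, 1)                    # unconstrained: y = a*x + b

    # Constrained through the origin: y = k*x, by linear least squares.
    k, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)

    print(a, b, k[0])                             # b near 0 supports proportionality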
On 7 January 2013 17:58, Steven D'Aprano wrote:
> On Mon, 07 Jan 2013 15:20:57, Oscar Benjamin wrote:
>
>> There are sometimes good reasons to get a line of best fit by eye. In
>> particular if your data contains clusters that are hard to separate,
>> sometimes it's useful to just pick out r
On Tue, Jan 8, 2013 at 4:58 AM, Steven D'Aprano wrote:
> Anyone can fool themselves into placing a line through a subset of non-
> linear data. Or, sadly more often, *deliberately* cherry picking fake
> clusters in order to fool others. Here is a real world example of what
> happens when people pi
On Mon, 07 Jan 2013 15:20:57, Oscar Benjamin wrote:
> There are sometimes good reasons to get a line of best fit by eye. In
> particular if your data contains clusters that are hard to separate,
> sometimes it's useful to just pick out roughly where you think a line
> through a subset of the
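Picking out a cluster by eye and then fitting only that subset is easy to make explicit in code, which at least records which points were excluded. A minimal sketch with entirely made-up clusters and a hand-chosen cut-off:

    import numpy as np

    rng = np.random.default_rng(2)
    x_main = np.linspace(0, 5, 25)
    x_other = np.linspace(6, 9, 10)
    x = np.concatenate([x_main, x_other])
    y = np.concatenate([1.2 * x_main + rng.normal(0, 0.2, x_main.size),   # linear cluster
                        rng.normal(12.0, 0.5, x_other.size)])             # off-trend cluster

    keep = x < 5.5                      # hand-chosen cut separating the clusters
    slope, intercept = np.polyfit(x[keep], y[keep], 1)
    print(slope, intercept)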
On 07/01/2013 15:20, Oscar Benjamin wrote:
On 7 January 2013 05:11, Steven D'Aprano wrote:
On Mon, 07 Jan 2013 02:29:27, Oscar Benjamin wrote:
On 7 January 2013 01:46, Steven D'Aprano wrote:
On Sun, 06 Jan 2013 19:44:08, Joseph L. Casale wrote:
I'm not sure that this approach is statistically robust.
On 7 January 2013 05:11, Steven D'Aprano wrote:
> On Mon, 07 Jan 2013 02:29:27, Oscar Benjamin wrote:
>
>> On 7 January 2013 01:46, Steven D'Aprano wrote:
>>> On Sun, 06 Jan 2013 19:44:08, Joseph L. Casale wrote:
>>>
>>> I'm not sure that this approach is statistically robust. No,
> In other words: this approach for detecting outliers is nothing more than
> a very rough, and very bad, heuristic, and should be avoided.
Heh, very true, but the results will only be used for conversational purposes.
I am making an assumption that the data is normally distributed and I do expec
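If the normality assumption holds, the fraction of points expected to survive an m-sigma cut is fixed by the normal distribution itself (ordinary normal-distribution arithmetic, nothing specific to this dataset):

    from math import erf, sqrt

    # Fraction of a normal distribution within m standard deviations of the
    # mean, i.e. what an m-sigma cut keeps when the data really are normal.
    for m in (1, 2, 3):
        print(m, erf(m / sqrt(2)))      # ~0.683, ~0.954, ~0.997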
On Mon, 07 Jan 2013 02:29:27, Oscar Benjamin wrote:
> On 7 January 2013 01:46, Steven D'Aprano wrote:
>> On Sun, 06 Jan 2013 19:44:08, Joseph L. Casale wrote:
>>
>>> I have a dataset that consists of a dict with text descriptions and
>>> values that are integers. If required, I coll
On 7 January 2013 01:46, Steven D'Aprano wrote:
> On Sun, 06 Jan 2013 19:44:08, Joseph L. Casale wrote:
>
>> I have a dataset that consists of a dict with text descriptions and
>> values that are integers. If required, I collect the values into a list
>> and create a numpy array running it t
"Steven D'Aprano" wrote in message
news:50ea28e7$0$30003$c3e8da3$54964...@news.astraweb.com...
> On Sun, 06 Jan 2013 19:44:08, Joseph L. Casale wrote:
>
>> I have a dataset that consists of a dict with text descriptions and
>> values that are integers. If required, I collect the values int
On Sun, 06 Jan 2013 19:44:08, Joseph L. Casale wrote:
> I have a dataset that consists of a dict with text descriptions and
> values that are integers. If required, I collect the values into a list
> and create a numpy array running it through a simple routine:
>
> data[abs(data - mean(data)) < m * std(data)]
On 2013-01-06 22:33, Hans Mulder wrote:
On 6/01/13 20:44:08, Joseph L. Casale wrote:
I have a dataset that consists of a dict with text descriptions and values that
are integers. If required, I collect the values into a list and create a numpy
array, running it through a simple routine:
data[abs(data - mean(data)) < m * std(data)]
> Assuming your data and the dictionary are keyed by a common set of keys:
>
> for key in descriptions:
>     if abs(data[key] - mean(data)) >= m * std(data):
>         del data[key]
>         del descriptions[key]
Heh, yeah sometimes the obvious is too simple to see. I used a dict comp to
rebuild
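As written, the quoted loop above deletes keys from the dicts while iterating over one of them, which raises a RuntimeError, and it appears to pass the whole dict to mean() and std(). Rebuilding with comprehensions, as described, sidesteps both. A minimal sketch with made-up keys and values:

    import numpy as np

    values = {"a": 10, "b": 12, "c": 11, "d": 95}          # "d" is the outlier
    descriptions = {k: "item " + k for k in values}

    m = 1                       # aggressive cut, just for this tiny sample
    arr = np.array(list(values.values()), dtype=float)
    mu, sigma = arr.mean(), arr.std()

    # Rebuild both dicts instead of deleting entries mid-iteration.
    values = {k: v for k, v in values.items() if abs(v - mu) < m * sigma}
    descriptions = {k: descriptions[k] for k in values}

    print(values)               # "d" lies outside one sigma and is dropped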
On 6/01/13 20:44:08, Joseph L. Casale wrote:
> I have a dataset that consists of a dict with text descriptions and values
> that are integers. If required, I collect the values into a list and create
> a numpy array, running it through a simple routine:
> data[abs(data - mean(data)) < m * std(data)]
I have a dataset that consists of a dict with text descriptions and values that
are integers. If required, I collect the values into a list and create a numpy
array, running it through a simple routine:
data[abs(data - mean(data)) < m * std(data)], where m is the number of std
deviations to include.
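For completeness, the routine described above as a runnable fragment; the array and the value of m are placeholders, not the poster's data:

    import numpy as np

    data = np.array([10, 12, 11, 13, 12, 95], dtype=float)   # placeholder data
    m = 2                              # number of standard deviations to keep

    filtered = data[abs(data - np.mean(data)) < m * np.std(data)]
    print(filtered)                    # the 95 lies outside 2 sigma and is dropped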