sentative stats collection
>
>
> > -Original Message-
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of John
> > Kanagaraj
> > Sent: Tuesday, May 28, 2002 1:49 PM
> > To: Multiple recipients of list ORACLE-L
> > Subject: RE: St
One problem I have with histograms is having to calculate the proper number of
buckets. Thanks to Steve Adams I have a program to do just that. I keep such
information in the "dbms_stats_table" so it can be used by dbms_stats when the table
is reanalyzed.
The proper number of buckets may
> I remember that Steve Orr of this list used Histograms and enjoyed huge
> increase in performance.
Hi John, that over a year ago, your memory is too good. :-)
Haven't worked with histograms lately but here are some before and after
stats from that former histogram implementation:
- Rows scan
We have a Y/N (yes/no) "processing flag" column on a very large table.
As an alternative to a bitmapped index, we changed the code to manipulate
Y/null values because < 0.1% of values are 'Y' and the others are null.
The resulting index is quite small but we do rebuild it periodically.
This has
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of John
> Kanagaraj
> Sent: Tuesday, May 28, 2002 1:49 PM
> To: Multiple recipients of list ORACLE-L
> Subject: RE: Statistical sampling and repre
Have used them on a for all indexed columns basis and they make a massive
difference on heavily skewed data - particularly a sort of waiting to be
processed flag which only has about 5 distinct values but the ones we want
to pick will make up only about 0.01%.
Haven't used them on all columns a
Sorry, I did not use anything that sophisticated (nor accurate?):
Uniform distribution = 1/(number of unique values)
Max distribution = (the highest occurrence of a value) / (number of records in
the table)
Tom
-Original Message-
Sent: Tuesday, May 28, 2002 4:34 PM
To: Multiple recipie
>
> Based on the scarcity of previous responses to emails on this list,
> it seems that histograms are not that widely used throughout the
> industry. I'm not sure why.
I've used them in the past, but only in very specific instances and
certainly not for all tables/columns. Sometimes just 1 or 2
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of John
> Kanagaraj
> Sent: Tuesday, May 28, 2002 1:49 PM
> To: Multiple recipients of list ORACLE-L
> Subject: RE: Statistical sampling and representative stats collection
>
>
>
Cherie,
> In my experience, histograms seem a bit hit or miss but in the
> cases where they've worked, the performance improvement
> has been good or even fantastic. In the cases where they haven't
> helped, I've simply removed them.
As I observed before, histograms help only when literal pred
tom,
This is interesting. How did you determine max
distribution and uniform distribution? Did you use
standard deviation and variance?
regards,
jack silvey
--- "Terrian, Tom" <[EMAIL PROTECTED]> wrote:
> John,
>
> I know in a previous job, we determined that
> histograms where not worth it.
I have had some really good experiences with using histograms.
They didn't always produce the improvements that I expected
but in many cases, I saw 10 times, 100 times, or even 1,000 faster
execution times after adding histograms.
I don't have the specifics, but these were cases where the data
w
John,
I know in a previous job, we determined that histograms where not worth it. The
following is from a test that we performed:
***
Table- F_tab Uniform DistributionMax Distribution
Field- P_code 0.6
Ian,
> John are you saying to create histograms on all indexed
> columns, or just the ones with distributions which are skewed
> and also for ones which although symmetric in distribution
> have some values much more prevalent than others?
To keep this simplistic, I wouldn't use Histograms
Hi Larry,
I sure wasn't disappointed to receive this note from a CBO/SQL guru such as
yourself. Thanks for the ideas.
> My question here is would this database and your findings be
> applicable to
> other databases and the nature of their data? I wouldn't
The nature of the data and the varian
> Hi Jack,
>
> > One question - you mention that an index analyze
> > provides beter data distribution. Could you discuss
> > what you found in more detail?
>
> What I meant was that the Histograms that are created during an
> ANALYZE/COMPUTE on Indexes will provide an almost perfect picture of th
John are you saying to create histograms on all indexed columns, or just the ones with
distributions which are skewed and also for ones which although symmetric in
distribution have some values much more prevalent than others?
Ian MacGregor
Stanford Linear Accelerator Center
[EMAIL PROTECTED
John,
Great work. I do have some comments and questions, but don't take it as
being critical. Your work on this is greatly appreciated. My
comments/question are in-line and only asked so that the list can bounce
around some more ideas and thoughts.
> --- Begin Quote ---
> MYTH: "COMPUTE IS BETT
Hi Jack,
> One question - you mention that an index analyze
> provides beter data distribution. Could you discuss
> what you found in more detail?
What I meant was that the Histograms that are created during an
ANALYZE/COMPUTE on Indexes will provide an almost perfect picture of the
data distrib
John,
Thanks for a very good explanation between COMPUTE and ESTIMATE. I am
changing the scripts to use DBMS_STATS instead of ANALYZE on out 9012
database, but on 8i I'll keep this in mind.
Thanks again
Raj
__
Rajendra Jamadagni MI
Fantastic research, John!
- Original Message -
To: "Multiple recipients of list ORACLE-L" <[EMAIL PROTECTED]>
Sent: Thursday, May 23, 2002 7:13 PM
> Jack,
>
> I conducted some tests of ANALYZE ESTIMATE vs COMPUTE for my IOUG paper
and
> arrived at the following: (cut-and-paste of releva
John,
Thanks for the great input. I think at this time we
are working our way towards a invalidate/load/rebuild
compute index strategy, so estimate is probably not in
the picture for our indexes. Since the data is ordered
in the index and does not have to be sorted during an
analyze, a compute sh
Jack,
I conducted some tests of ANALYZE ESTIMATE vs COMPUTE for my IOUG paper and
arrived at the following: (cut-and-paste of relevant parts of the paper)
--- Begin Quote ---
MYTH: "COMPUTE IS BETTER THAN ESTIMATE"
This one generates an endless debate actually, so we will not take a firm
stand
M
Please respond to ORACLE-L
To: Multiple recipients of list ORACLE-L <[EMAIL PROTECTED]>
cc:
Subject: RE: Statistical sampling and representative stats collection
I remember, in 96/97, I was working on a Oracle 7 database and none of the
this sample()
We use dbms_stats.gather_schema_stats and do not pass it sys as a schema owner. Under
8i this method did not work for "gather empty", but did fine with "gather stale".
This has been fixed in 9i. However, it still abends if one of the schema owns an
iot_overflow table.
Ian
-Origina
Raj,
Would the temp have gotten the same result by counting the frequencies of
the letters in the game of scrabble?
Chaim
"Jamadagni, Rajendra" <[EMAIL PROTECTED]>@fatcity.com on
05/22/2002 11:53:22 AM
Please respond to [EMAIL PROTECTED]
Sent by:[EMAIL PROTECTED]
To:Multiple reci
Jack
You are right keeping in mind your application and variation of data in
different tables. I just mentioned 30% as most agreed percentage for
estimate. I think Dennis gave you a more pracitcial answer yesterday.
Regards
Rafiq
Reply-To: [EMAIL PROTECTED]
To: Multiple recipients of list
That is a very good question. Anyone else know the
answer?
My approach to skewness right now is to compare the
mean and the median of the entire dataset, since if
you have two measures of centrality that are not the
same, you have skewness.
/jack
--- [EMAIL PROTECTED] wrote:
>
> Jack,
>
> Is
I remember, in 96/97, I was working on a Oracle 7 database and none of the
this sample() was available. My task was to select x people out of z number
of people randomly. We hired a temp, who took Dallas/Fort Worth dictionary
(I was working in that area) and counted (yes counted) number of people
Cherie,
Had some problems with DBMS_STATS, but we are making
our way towards it. Just have to be careful, since it
can do quirky things like analyze the dictionary. We
are using dba_tab_modifications to monitor our tables.
Thanks for the input.
Jack
--- [EMAIL PROTECTED] wrote:
>
> Jack,
>
Jack, Raj
I agree. The main point I feel is that if you follow statistics
theory, which a good part of our modern technology relies on, you will test
a fixed number of samples, rather than a percentage of the table rows.
For a small table, you may have to sample the entire table t
Jack,
Nielsen Ratings (the TV Rating company) monitors about 5000 people (and
their TV watching habits) to supply ratings for all the shows on most of the
networks for the whole United States. So, as long as you have a working and
proven statistical model, and a good sample, it works. How do I kn
Jack,
Isn't the problem with this concept that it doesn't take into consideration
how skewed the data is? Statistically significance would be relevant to
perfectly distributed data but wouldn't you need a higher percentage of
data for significance in more highly skewed data?
Just something to
Jack,
What version are you on? Are you able to utilize the gather_stale option.
That way you would not only be optimizing the amount you are estimating but
the interval between analyzing.
Of course, that option of DBMS_STATS is not available on older versions.
Cherie Machler
Oracle DBA
Gelco
Hi Rafiq,
We have been using 35 percent on our warehouse, even
on our fact partitions. Now that I have thought about
it for a while, that seems like a lot given the volume
of data. If a representative sample can be gathered
with 10,000 or 50,000 or 100,000 rows, and our fact
partitions have milli
Jack - I believe that you are on the right track. If you think about it,
political pollsters randomly assess national populations with only a few
thousand samples, and it works pretty good as long as it isn't biased by bad
questions, lying respondents, etc. And you don't have those issues here.
The most of the list memeber agrees on estimate with 30%
Regards
Rafiq
Reply-To: [EMAIL PROTECTED]
To: Multiple recipients of list ORACLE-L <[EMAIL PROTECTED]>
Date: Tue, 21 May 2002 13:43:33 -0800
Hi all,
Did some investigation about statistical sampling this
weekend since we are going
37 matches
Mail list logo