RE: Statistical sampling and representative stats collection

2002-05-29 Thread John Kanagaraj
sentative stats collection > > > > -Original Message- > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of John > > Kanagaraj > > Sent: Tuesday, May 28, 2002 1:49 PM > > To: Multiple recipients of list ORACLE-L > > Subject: RE: St

RE: Statistical sampling and representative stats collection

2002-05-29 Thread MacGregor, Ian A.
One problem I have with histograms is having to calculate the proper number of buckets. Thanks to Steve Adams I have a program to do just that. I keep such information in the "dbms_stats_table" so it can be used by dbms_stats when the table is reanalyzed. The proper number of buckets may

RE: Statistical sampling and representative stats collection

2002-05-29 Thread Orr, Steve
> I remember that Steve Orr of this list used Histograms and enjoyed huge > increase in performance. Hi John, that over a year ago, your memory is too good. :-) Haven't worked with histograms lately but here are some before and after stats from that former histogram implementation: - Rows scan

RE: Statistical sampling and representative stats collection

2002-05-29 Thread Orr, Steve
We have a Y/N (yes/no) "processing flag" column on a very large table. As an alternative to a bitmapped index, we changed the code to manipulate Y/null values because < 0.1% of values are 'Y' and the others are null. The resulting index is quite small but we do rebuild it periodically. This has

RE: Statistical sampling and representative stats collection

2002-05-29 Thread Cherie_Machler
> -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of John > Kanagaraj > Sent: Tuesday, May 28, 2002 1:49 PM > To: Multiple recipients of list ORACLE-L > Subject: RE: Statistical sampling and repre

RE: Statistical sampling and representative stats collection

2002-05-29 Thread Nicoll, Iain (Calanais)
Have used them on a for all indexed columns basis and they make a massive difference on heavily skewed data - particularly a sort of waiting to be processed flag which only has about 5 distinct values but the ones we want to pick will make up only about 0.01%. Haven't used them on all columns a

RE: Statistical sampling and representative stats collection

2002-05-29 Thread Terrian, Tom
Sorry, I did not use anything that sophisticated (nor accurate?): Uniform distribution = 1/(number of unique values) Max distribution = (the highest occurrence of a value) / (number of records in the table) Tom -Original Message- Sent: Tuesday, May 28, 2002 4:34 PM To: Multiple recipie

RE: Statistical sampling and representative stats collection

2002-05-29 Thread Larry Elkins
> > Based on the scarcity of previous responses to emails on this list, > it seems that histograms are not that widely used throughout the > industry. I'm not sure why. I've used them in the past, but only in very specific instances and certainly not for all tables/columns. Sometimes just 1 or 2

RE: Statistical sampling and representative stats collection

2002-05-29 Thread Larry Elkins
> -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of John > Kanagaraj > Sent: Tuesday, May 28, 2002 1:49 PM > To: Multiple recipients of list ORACLE-L > Subject: RE: Statistical sampling and representative stats collection > > >

RE: Statistical sampling and representative stats collection

2002-05-28 Thread John Kanagaraj
Cherie, > In my experience, histograms seem a bit hit or miss but in the > cases where they've worked, the performance improvement > has been good or even fantastic. In the cases where they haven't > helped, I've simply removed them. As I observed before, histograms help only when literal pred

RE: Statistical sampling and representative stats collection

2002-05-28 Thread Jack Silvey
tom, This is interesting. How did you determine max distribution and uniform distribution? Did you use standard deviation and variance? regards, jack silvey --- "Terrian, Tom" <[EMAIL PROTECTED]> wrote: > John, > > I know in a previous job, we determined that > histograms where not worth it.

RE: Statistical sampling and representative stats collection

2002-05-28 Thread Cherie_Machler
I have had some really good experiences with using histograms. They didn't always produce the improvements that I expected but in many cases, I saw 10 times, 100 times, or even 1,000 faster execution times after adding histograms. I don't have the specifics, but these were cases where the data w

RE: Statistical sampling and representative stats collection

2002-05-28 Thread Terrian, Tom
John, I know in a previous job, we determined that histograms where not worth it. The following is from a test that we performed: *** Table- F_tab Uniform DistributionMax Distribution Field- P_code 0.6

RE: Statistical sampling and representative stats collection

2002-05-28 Thread John Kanagaraj
Ian, > John are you saying to create histograms on all indexed > columns, or just the ones with distributions which are skewed > and also for ones which although symmetric in distribution > have some values much more prevalent than others? To keep this simplistic, I wouldn't use Histograms

RE: Statistical sampling and representative stats collection

2002-05-28 Thread John Kanagaraj
Hi Larry, I sure wasn't disappointed to receive this note from a CBO/SQL guru such as yourself. Thanks for the ideas. > My question here is would this database and your findings be > applicable to > other databases and the nature of their data? I wouldn't The nature of the data and the varian

RE: Statistical sampling and representative stats collection

2002-05-24 Thread Larry Elkins
> Hi Jack, > > > One question - you mention that an index analyze > > provides beter data distribution. Could you discuss > > what you found in more detail? > > What I meant was that the Histograms that are created during an > ANALYZE/COMPUTE on Indexes will provide an almost perfect picture of th

RE: Statistical sampling and representative stats collection

2002-05-24 Thread MacGregor, Ian A.
John are you saying to create histograms on all indexed columns, or just the ones with distributions which are skewed and also for ones which although symmetric in distribution have some values much more prevalent than others? Ian MacGregor Stanford Linear Accelerator Center [EMAIL PROTECTED

RE: Statistical sampling and representative stats collection

2002-05-24 Thread Larry Elkins
John, Great work. I do have some comments and questions, but don't take it as being critical. Your work on this is greatly appreciated. My comments/question are in-line and only asked so that the list can bounce around some more ideas and thoughts. > --- Begin Quote --- > MYTH: "COMPUTE IS BETT

RE: Statistical sampling and representative stats collection

2002-05-24 Thread John Kanagaraj
Hi Jack, > One question - you mention that an index analyze > provides beter data distribution. Could you discuss > what you found in more detail? What I meant was that the Histograms that are created during an ANALYZE/COMPUTE on Indexes will provide an almost perfect picture of the data distrib

RE: Statistical sampling and representative stats collection

2002-05-24 Thread Jamadagni, Rajendra
John, Thanks for a very good explanation between COMPUTE and ESTIMATE. I am changing the scripts to use DBMS_STATS instead of ANALYZE on out 9012 database, but on 8i I'll keep this in mind. Thanks again Raj __ Rajendra Jamadagni MI

Re: Statistical sampling and representative stats collection

2002-05-24 Thread Tim Gorman
Fantastic research, John! - Original Message - To: "Multiple recipients of list ORACLE-L" <[EMAIL PROTECTED]> Sent: Thursday, May 23, 2002 7:13 PM > Jack, > > I conducted some tests of ANALYZE ESTIMATE vs COMPUTE for my IOUG paper and > arrived at the following: (cut-and-paste of releva

RE: Statistical sampling and representative stats collection

2002-05-23 Thread Jack Silvey
John, Thanks for the great input. I think at this time we are working our way towards a invalidate/load/rebuild compute index strategy, so estimate is probably not in the picture for our indexes. Since the data is ordered in the index and does not have to be sorted during an analyze, a compute sh

RE: Statistical sampling and representative stats collection

2002-05-23 Thread John Kanagaraj
Jack, I conducted some tests of ANALYZE ESTIMATE vs COMPUTE for my IOUG paper and arrived at the following: (cut-and-paste of relevant parts of the paper) --- Begin Quote --- MYTH: "COMPUTE IS BETTER THAN ESTIMATE" This one generates an endless debate actually, so we will not take a firm stand

RE: Statistical sampling and representative stats collection

2002-05-23 Thread Jared . Still
M Please respond to ORACLE-L To: Multiple recipients of list ORACLE-L <[EMAIL PROTECTED]> cc: Subject: RE: Statistical sampling and representative stats collection I remember, in 96/97, I was working on a Oracle 7 database and none of the this sample()

RE: Statistical sampling and representative stats collection

2002-05-22 Thread MacGregor, Ian A.
We use dbms_stats.gather_schema_stats and do not pass it sys as a schema owner. Under 8i this method did not work for "gather empty", but did fine with "gather stale". This has been fixed in 9i. However, it still abends if one of the schema owns an iot_overflow table. Ian -Origina

RE: Statistical sampling and representative stats collection

2002-05-22 Thread Chaim . Katz
Raj, Would the temp have gotten the same result by counting the frequencies of the letters in the game of scrabble? Chaim "Jamadagni, Rajendra" <[EMAIL PROTECTED]>@fatcity.com on 05/22/2002 11:53:22 AM Please respond to [EMAIL PROTECTED] Sent by:[EMAIL PROTECTED] To:Multiple reci

Re: Statistical sampling and representative stats collection

2002-05-22 Thread Mohammad Rafiq
Jack You are right keeping in mind your application and variation of data in different tables. I just mentioned 30% as most agreed percentage for estimate. I think Dennis gave you a more pracitcial answer yesterday. Regards Rafiq Reply-To: [EMAIL PROTECTED] To: Multiple recipients of list

Re: Statistical sampling and representative stats collection

2002-05-22 Thread Jack Silvey
That is a very good question. Anyone else know the answer? My approach to skewness right now is to compare the mean and the median of the entire dataset, since if you have two measures of centrality that are not the same, you have skewness. /jack --- [EMAIL PROTECTED] wrote: > > Jack, > > Is

RE: Statistical sampling and representative stats collection

2002-05-22 Thread Jamadagni, Rajendra
I remember, in 96/97, I was working on a Oracle 7 database and none of the this sample() was available. My task was to select x people out of z number of people randomly. We hired a temp, who took Dallas/Fort Worth dictionary (I was working in that area) and counted (yes counted) number of people

Re: Statistical sampling and representative stats collection

2002-05-22 Thread Jack Silvey
Cherie, Had some problems with DBMS_STATS, but we are making our way towards it. Just have to be careful, since it can do quirky things like analyze the dictionary. We are using dba_tab_modifications to monitor our tables. Thanks for the input. Jack --- [EMAIL PROTECTED] wrote: > > Jack, >

RE: Statistical sampling and representative stats collection

2002-05-22 Thread DENNIS WILLIAMS
Jack, Raj I agree. The main point I feel is that if you follow statistics theory, which a good part of our modern technology relies on, you will test a fixed number of samples, rather than a percentage of the table rows. For a small table, you may have to sample the entire table t

RE: Statistical sampling and representative stats collection

2002-05-22 Thread Jamadagni, Rajendra
Jack, Nielsen Ratings (the TV Rating company) monitors about 5000 people (and their TV watching habits) to supply ratings for all the shows on most of the networks for the whole United States. So, as long as you have a working and proven statistical model, and a good sample, it works. How do I kn

Re: Statistical sampling and representative stats collection

2002-05-22 Thread Cherie_Machler
Jack, Isn't the problem with this concept that it doesn't take into consideration how skewed the data is? Statistically significance would be relevant to perfectly distributed data but wouldn't you need a higher percentage of data for significance in more highly skewed data? Just something to

Re: Statistical sampling and representative stats collection

2002-05-22 Thread Cherie_Machler
Jack, What version are you on? Are you able to utilize the gather_stale option. That way you would not only be optimizing the amount you are estimating but the interval between analyzing. Of course, that option of DBMS_STATS is not available on older versions. Cherie Machler Oracle DBA Gelco

Re: Statistical sampling and representative stats collection

2002-05-21 Thread Jack Silvey
Hi Rafiq, We have been using 35 percent on our warehouse, even on our fact partitions. Now that I have thought about it for a while, that seems like a lot given the volume of data. If a representative sample can be gathered with 10,000 or 50,000 or 100,000 rows, and our fact partitions have milli

RE: Statistical sampling and representative stats collection

2002-05-21 Thread DENNIS WILLIAMS
Jack - I believe that you are on the right track. If you think about it, political pollsters randomly assess national populations with only a few thousand samples, and it works pretty good as long as it isn't biased by bad questions, lying respondents, etc. And you don't have those issues here.

Re: Statistical sampling and representative stats collection

2002-05-21 Thread Mohammad Rafiq
The most of the list memeber agrees on estimate with 30% Regards Rafiq Reply-To: [EMAIL PROTECTED] To: Multiple recipients of list ORACLE-L <[EMAIL PROTECTED]> Date: Tue, 21 May 2002 13:43:33 -0800 Hi all, Did some investigation about statistical sampling this weekend since we are going