I need to correct myself here before someone else does. I didn't
actually reverse the probabilities as promised for the failing case. It
was late last night and I was starting to get a little cloudy.
Pf(D|H) = 0.2 (We *guess* a 20% chance that a value casts to Int
purely by chance, i.e. when the column is not actually an Int column.)
This can be read
On 2007-05-20, John Machin [EMAIL PROTECTED] wrote:
On 19/05/2007 3:14 PM, Paddy wrote:
On May 19, 12:07 am, py_genetic [EMAIL PROTECTED] wrote:
Hello,
I'm importing large text files of data using csv. I would like to add
some more auto-sensing abilities. I'm considering sampling the data
This is excellent advice, thank you, gentlemen.
Paddy:
We can't really, in this arena, make assumptions about the data source.
I fully agree with your point, but if we had the luxury of really
knowing the source we wouldn't be having this conversation. Files we
can deal with could be consumer data
py_genetic wrote:
Using a Bayesian method was my initial thought as well. The key to
this method, I feel, is getting a solid random sample of the entire
file without having to load the whole beast into memory.
If you feel only the first 1000 rows are representative, then you can
take a random
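The idea above, a solid random sample without loading the whole file into memory, can be sketched with classic reservoir sampling. The function name and parameters below are illustrative assumptions, not anything from the thread:

```python
import random

def reservoir_sample(rows, k, seed=None):
    """Keep a uniform random sample of k items from an iterable
    without materialising it (classic reservoir sampling).
    Each item seen so far has an equal chance of being kept."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)          # fill the reservoir first
        else:
            j = rng.randrange(i + 1)    # replace with decreasing probability
            if j < k:
                sample[j] = row
    return sample

# Usage with csv (the path is an assumption):
#   import csv
#   with open("data.csv", newline="") as f:
#       reader = csv.reader(f)
#       header = next(reader)
#       sample = reservoir_sample(reader, 1000)
```

Because the reader is consumed lazily, only k rows are ever held in memory, regardless of file size.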
John Machin wrote:
Against that background, please explain to me how I can use
results from previous tables as priors.
Cheers,
John
It depends on how you want to model your probabilities, but, as an
example, you might find the following frequencies of columns in all
tables you have parsed
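One way to read "frequencies of columns in all tables you have parsed" is to normalise historical type counts into priors for the next table. A tiny sketch with invented counts (the numbers and type names are illustrative only):

```python
from collections import Counter

# Invented history: how often each column type turned up in
# previously parsed tables (illustrative numbers only).
history = Counter({"int": 35, "float": 20, "date": 15, "text": 30})

total = sum(history.values())
priors = {ctype: count / total for ctype, count in history.items()}
# priors["int"] could then seed P(H) when testing the next
# table's columns, instead of starting from an indifferent 0.5.
```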
James Stroud wrote:
Now with one test positive for Int, you are getting pretty certain you
have an Int column. Now we take a second cell randomly from the same
column and find that it too casts to Int.
P_2(H) = 0.9607843 -- confidence it's an Int column from round 1
P(D|H) = 0.98
John Machin wrote:
So, all in all, Bayesian inference doesn't seem much use in this scenario.
This is equivalent to saying that any statistical analysis doesn't seem
much use in this scenario--but you go ahead and use statistics anyway?
--
http://mail.python.org/mailman/listinfo/python-list
John Machin wrote:
The model would have to be a lot more complicated than that. There is a
base number of required columns. The kind suppliers of the data randomly
add extra columns, randomly permute the order in which the columns
appear, and, for date columns
I'm going to ignore this
John Machin wrote:
The approach that I've adopted is to test the values in a column for all
types, and choose the non-text type that has the highest success rate
(provided the rate is greater than some threshold e.g. 90%, otherwise
it's text).
For large files, taking a 1/N sample can
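The "test every type, keep the highest success rate over a threshold" rule above translates almost directly into code. A sketch under assumptions: only int and float casters are shown, the names are mine, and ties go to the earlier-listed (more specific) type:

```python
def guess_type(values, threshold=0.9):
    """Try each non-text caster on every value in a column and pick
    the type with the highest success rate, falling back to "text"
    when no rate clears the threshold (the 90% mentioned above)."""
    # Listed most-specific first: an all-int column also casts as
    # float, and the strict '>' below lets the earlier entry win ties.
    casters = {"int": int, "float": float}
    best_type, best_rate = "text", 0.0
    for name, cast in casters.items():
        ok = 0
        for v in values:
            try:
                cast(v)
                ok += 1
            except (TypeError, ValueError):
                pass
        rate = ok / len(values) if values else 0.0
        if rate > best_rate:
            best_type, best_rate = name, rate
    return best_type if best_rate >= threshold else "text"
```

A column of ["1", "2", "3"] scores 100% under both casters but is reported as "int"; a column with a few bad cells still wins as long as its rate stays above the threshold.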
Hello,
I'm importing large text files of data using csv. I would like to add
some more auto-sensing abilities. I'm considering sampling the data
file and doing some fuzzy-logic scoring on the attributes (columns in
a database/CSV file, e.g. height, weight, income, etc.) to determine
the most