Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-21 Thread James Stroud
I need to correct myself here before someone else does. I didn't actually reverse the probabilities as promised for the failing case. It was late last night and I was starting to get a little cloudy. Pf(D|H) = 0.2 (We *guess* a 20% chance that any random column is Int.) This can be read

Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-21 Thread Neil Cerutti
On 2007-05-20, John Machin wrote: On 19/05/2007 3:14 PM, Paddy wrote: On May 19, 12:07 am, py_genetic wrote: Hello, I'm importing large text files of data using csv. I would like to add some more auto-sensing abilities. I'm considering sampling the data

Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-21 Thread py_genetic
This is excellent advice, thank you gentlemen. Paddy: We can't really, in this arena, make assumptions about the data source. I fully agree with your point, but if we had the luxury of really knowing the source we wouldn't be having this conversation. Files we can deal with could be consumer data

Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-21 Thread James Stroud
py_genetic wrote: Using a Bayesian method was my initial thought as well. The key to this method, I feel, is getting a solid random sample of the entire file without having to load the whole beast into memory. If you feel only the first 1000 rows are representative, then you can take a random
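
One way to get a uniform random sample of rows without loading the whole file is reservoir sampling (Algorithm R). This is a swapped-in technique, not necessarily what anyone in the thread used; the function name `reservoir_sample` and the sample size `k` are made up for illustration.

```python
import csv
import io
import random


def reservoir_sample(rows, k, rng=random):
    """Return k rows chosen uniformly at random from an iterable of unknown length.

    Only k rows are held in memory at any time (Algorithm R).
    """
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive at both ends
            if j < k:
                sample[j] = row
    return sample


# Stand-in for a large CSV file; a real file object from open() works the same way.
fake_file = io.StringIO("\n".join(f"{n},{n * 1.5},name{n}" for n in range(10000)))
rows = reservoir_sample(csv.reader(fake_file), k=5, rng=random.Random(0))
print(len(rows))
```

Because the reader is consumed lazily, memory use depends on `k`, not on the file size. If the file has fewer than `k` rows, all of them are returned.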

Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-20 Thread James Stroud
John Machin wrote: Against that background, please explain to me how I can use results from previous tables as priors. Cheers, John It depends on how you want to model your probabilities, but, as an example, you might find the following frequencies of columns in all tables you have parsed

Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-20 Thread James Stroud
James Stroud wrote: Now with one test positive for Int, you are getting pretty certain you have an Int column. Now we take a second cell randomly from the same column and find that it too casts to Int. P_2(H) = 0.9607843 (confidence it's an Int column, from round 1); P(D|H) = 0.98
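
The repeated update described here can be sketched as a small Bayes' rule helper. Only P(D|H) = 0.98 and the round-1 result 0.9607843 come from the message; the prior of 0.2 and the false-positive rate of 0.01 are assumptions, picked because together they yield that round-1 figure. The original message may have used different inputs.

```python
def bayes_update(prior, p_d_given_h, p_d_given_not_h):
    """Posterior P(H|D) after one positive test, by Bayes' rule."""
    evidence = p_d_given_h * prior + p_d_given_not_h * (1.0 - prior)
    return p_d_given_h * prior / evidence


P_D_GIVEN_H = 0.98      # from the message: an Int column's cell casts to Int
P_D_GIVEN_NOT_H = 0.01  # ASSUMPTION: a non-Int column's cell casts to Int
PRIOR = 0.2             # ASSUMPTION: guessed share of Int columns

# Round 1: one randomly chosen cell casts to Int.
p1 = bayes_update(PRIOR, P_D_GIVEN_H, P_D_GIVEN_NOT_H)
# Round 2: the round-1 posterior becomes the new prior.
p2 = bayes_update(p1, P_D_GIVEN_H, P_D_GIVEN_NOT_H)

print(round(p1, 7))  # 0.9607843 under the assumed inputs
print(p2)
```

Each positive test feeds its posterior back in as the next prior, so confidence climbs quickly as long as the false-positive rate is small.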

Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-20 Thread John Machin
On 20/05/2007 5:47 PM, James Stroud wrote: John Machin wrote: Against that background, please explain to me how I can use results from previous tables as priors. Cheers, John It depends on how you want to model your probabilities, but, as an example, you might find the following

Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-20 Thread James Stroud
John Machin wrote: So, all in all, Bayesian inference doesn't seem much use in this scenario. This is equivalent to saying that any statistical analysis doesn't seem much use in this scenario--but you go ahead and use statistics anyway?

Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-20 Thread Paddy
On May 20, 2:16 am, John Machin wrote: On 19/05/2007 3:14 PM, Paddy wrote: On May 19, 12:07 am, py_genetic wrote: Hello, I'm importing large text files of data using csv. I would like to add some more auto-sensing abilities. I'm considering sampling

Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-20 Thread James Stroud
John Machin wrote: The model would have to be a lot more complicated than that. There is a base number of required columns. The kind suppliers of the data randomly add extra columns, randomly permute the order in which the columns appear, and, for date columns I'm going to ignore this

Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-20 Thread John Machin
On 20/05/2007 8:52 PM, Paddy wrote: On May 20, 2:16 am, John Machin wrote: On 19/05/2007 3:14 PM, Paddy wrote: On May 19, 12:07 am, py_genetic wrote: Hello, I'm importing large text files of data using csv. I would like to add some more auto-sensing

Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-20 Thread Paddy
On May 20, 1:12 pm, John Machin wrote: On 20/05/2007 8:52 PM, Paddy wrote: On May 20, 2:16 am, John Machin wrote: On 19/05/2007 3:14 PM, Paddy wrote: On May 19, 12:07 am, py_genetic wrote: Hello, I'm importing large text files of

Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-20 Thread John Machin
On May 21, 2:04 am, Paddy wrote: On May 20, 1:12 pm, John Machin wrote: On 20/05/2007 8:52 PM, Paddy wrote: On May 20, 2:16 am, John Machin wrote: On 19/05/2007 3:14 PM, Paddy wrote: On May 19, 12:07 am, py_genetic

Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-20 Thread George Sakkis
On May 18, 7:07 pm, py_genetic wrote: Hello, I'm importing large text files of data using csv. I would like to add some more auto-sensing abilities. I'm considering sampling the data file and doing some fuzzy logic scoring on the attributes (columns in a database/CSV file,

Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-19 Thread James Stroud
John Machin wrote: The approach that I've adopted is to test the values in a column for all types, and choose the non-text type that has the highest success rate (provided the rate is greater than some threshold e.g. 90%, otherwise it's text). For large files, taking a 1/N sample can
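
A minimal sketch of the success-rate approach described above, assuming three candidate types (int, float, ISO-format date) and a 90% threshold. The names `CASTS`, `success_rate`, and `guess_column_type`, the date format, and the every-Nth-value sampling are illustrative assumptions, not John Machin's actual code.

```python
from datetime import datetime

# Candidate non-text types, most specific first. Order matters: every int
# string also parses as a float, and ties go to the earlier entry.
CASTS = {
    "int": int,
    "float": float,
    "date": lambda s: datetime.strptime(s, "%Y-%m-%d"),  # assumed format
}


def success_rate(values, cast):
    """Fraction of values that cast() accepts without raising ValueError."""
    if not values:
        return 0.0
    ok = 0
    for v in values:
        try:
            cast(v)
            ok += 1
        except ValueError:
            pass
    return ok / len(values)


def guess_column_type(values, threshold=0.9, n=1):
    """Pick the non-text type with the highest success rate, else 'text'.

    n > 1 tests only every n-th value, a simple 1/N sample for large columns.
    """
    sample = values[::n]
    rates = {name: success_rate(sample, cast) for name, cast in CASTS.items()}
    best = max(rates, key=rates.get)  # first maximum wins on ties
    return best if rates[best] > threshold else "text"


column = ["17"] * 19 + ["n/a"]    # 95% of cells are integers
print(guess_column_type(column))  # int
```

The threshold tolerates a small share of junk cells such as "n/a", at the cost of having to decide later what to do with the cells that fail to convert.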

Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-19 Thread John Machin
On 19/05/2007 9:17 PM, James Stroud wrote: John Machin wrote: The approach that I've adopted is to test the values in a column for all types, and choose the non-text type that has the highest success rate (provided the rate is greater than some threshold e.g. 90%, otherwise it's text).

Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-19 Thread John Machin
On 19/05/2007 3:14 PM, Paddy wrote: On May 19, 12:07 am, py_genetic wrote: Hello, I'm importing large text files of data using csv. I would like to add some more auto-sensing abilities. I'm considering sampling the data file and doing some fuzzy logic scoring on the

converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-18 Thread py_genetic
Hello, I'm importing large text files of data using csv. I would like to add some more auto-sensing abilities. I'm considering sampling the data file and doing some fuzzy logic scoring on the attributes (columns in a database/CSV file, e.g. height, weight, income, etc.) to determine the most
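
For the per-value conversion named in the subject line, a common Python idiom is to try the narrowest type first and fall back on failure. The following is only a sketch; the function name `convert` is hypothetical and this is not necessarily what the poster ended up using.

```python
def convert(value):
    """Return value as an int if possible, else a float, else unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue  # try the next, wider type
    return value


print([convert(s) for s in ["1", "A", "1.2"]])  # [1, 'A', 1.2]
```

Note that `float` also accepts strings such as "nan" and "inf", and both casts ignore surrounding whitespace, so data containing those tokens may need an extra check before conversion.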

Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-18 Thread Dustan
On May 18, 6:07 pm, py_genetic wrote: Hello, I'm importing large text files of data using csv. I would like to add some more auto-sensing abilities. I'm considering sampling the data file and doing some fuzzy logic scoring on the attributes (columns in a database/CSV file,

Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-18 Thread James Stroud
py_genetic wrote: Hello, I'm importing large text files of data using csv. I would like to add some more auto-sensing abilities. I'm considering sampling the data file and doing some fuzzy logic scoring on the attributes (columns in a database/CSV file, e.g. height, weight, income, etc.) to

Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-18 Thread John Machin
On 19/05/2007 10:04 AM, James Stroud wrote: py_genetic wrote: Hello, I'm importing large text files of data using csv. I would like to add some more auto-sensing abilities. I'm considering sampling the data file and doing some fuzzy logic scoring on the attributes (columns in a database/CSV

Re: converting strings to most their efficient types '1' -- 1, 'A' --- 'A', '1.2'--- 1.2

2007-05-18 Thread Paddy
On May 19, 12:07 am, py_genetic wrote: Hello, I'm importing large text files of data using csv. I would like to add some more auto-sensing abilities. I'm considering sampling the data file and doing some fuzzy logic scoring on the attributes (columns in a database/CSV file,