Re: Catogorising strings into random versus non-random

2015-12-21 Thread Christian Gollwitzer
On 21.12.15 at 09:24, Peter Otten wrote: Steven D'Aprano wrote: I have a large number of strings (originally file names) which tend to fall into two groups. Some are human-meaningful, but not necessarily dictionary words e.g.: baby lions at play saturday_morning12 Fukushima ImpossibleFork

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Christian Gollwitzer
On 21.12.15 at 11:36, Steven D'Aprano wrote: On Mon, 21 Dec 2015 08:56 pm, Christian Gollwitzer wrote: Apfelkiste:Tests chris$ python score_my.py -8.74 baby lions at play -7.63 saturday_morning12 -6.38 Fukushima -5.72 ImpossibleFork -10.6 xy39mGWbosjY -12.9 9sjz7s8198ghwt -12.1

Re: Categorising strings on meaningful–meaningless spectrum (was: Catogorising strings into random versus non-random)

2015-12-21 Thread Steven D'Aprano
On Monday 21 December 2015 14:45, Ben Finney wrote: > Steven D'Aprano writes: > >> Let's call the second group "random" and the first "non-random", >> without getting bogged down into arguments about whether they are >> really random or not. > > I think we should discuss

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Vlastimil Brom
2015-12-21 4:01 GMT+01:00 Steven D'Aprano: > I have a large number of strings (originally file names) which tend to fall > into two groups. Some are human-meaningful, but not necessarily dictionary > words e.g.: > > > baby lions at play > saturday_morning12 > Fukushima >

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Vincent Davis
On Mon, Dec 21, 2015 at 7:25 AM, Vlastimil Brom wrote: > > baby lions at play > > saturday_morning12 > > Fukushima > > ImpossibleFork > > > > > > (note that some use underscores, others spaces, and some CamelCase) while > > others are completely meaningless (or mostly

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Rick Johnson
On Sunday, December 20, 2015 at 10:22:57 PM UTC-6, Chris Angelico wrote: > DuckDuckGo doesn't give a result count, so I skipped it. Yahoo search yielded: So why bother to mention it then? Is this another one of your "pikeish" propaganda campaigns? --

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Peter Otten
Steven D'Aprano wrote: > I have a large number of strings (originally file names) which tend to > fall into two groups. Some are human-meaningful, but not necessarily > dictionary words e.g.: > > > baby lions at play > saturday_morning12 > Fukushima > ImpossibleFork > > > (note that some use

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Steven D'Aprano
On Monday 21 December 2015 15:22, Chris Angelico wrote: > On Mon, Dec 21, 2015 at 2:01 PM, Steven D'Aprano > wrote: >> I have a large number of strings (originally file names) which tend to >> fall into two groups. Some are human-meaningful, but not necessarily >> dictionary

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Steven D'Aprano
On Mon, 21 Dec 2015 08:56 pm, Christian Gollwitzer wrote: > Apfelkiste:Tests chris$ python score_my.py > -8.74 baby lions at play > -7.63 saturday_morning12 > -6.38 Fukushima > -5.72 ImpossibleFork > -10.6 xy39mGWbosjY > -12.9 9sjz7s8198ghwt > -12.1 rz4sdko-28dbRW00u > Apfelkiste:Tests

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Christian Gollwitzer
On 21.12.15 at 11:53, Christian Gollwitzer wrote: So for the spaces, either use proper training material (a long corpus from Wikipedia or similar) with punctuation removed, so that it catches the correct probabilities at word boundaries, or preprocess by removing the spaces. Christian
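
The two options for handling spaces in the message above can be sketched roughly as follows. This is only an illustration: the helper names `clean_corpus` and `strip_spaces` are hypothetical and do not come from the thread, and the choice to keep only ASCII letters and spaces is an assumption.

```python
import string

# Assumption: the model only cares about lower-case ASCII letters and spaces.
KEEP = set(string.ascii_lowercase + " ")


def clean_corpus(text):
    """Option 1: prepare training text with punctuation removed.

    Spaces are kept so that a model trained on the result can learn
    which letters tend to start and end words.
    """
    cleaned = "".join(ch if ch in KEEP else " " for ch in text.lower())
    return " ".join(cleaned.split())  # collapse runs of whitespace


def strip_spaces(name):
    """Option 2: remove all whitespace from a string before scoring it."""
    return "".join(name.split())
```

With the first option the scorer sees word boundaries in both the training data and the input; with the second it never sees a space at all, so the training text would presumably need the same treatment.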

Re: Catogorising strings into random versus non-random

2015-12-21 Thread duncan smith
On 21/12/15 03:01, Steven D'Aprano wrote: > I have a large number of strings (originally file names) which tend to fall > into two groups. Some are human-meaningful, but not necessarily dictionary > words e.g.: > > > baby lions at play > saturday_morning12 > Fukushima > ImpossibleFork > > >

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Ian Kelly
On Mon, Dec 21, 2015 at 9:40 AM, duncan smith wrote: > Finite state machine / transition matrix. Learn from some English text > source. Then process your strings by lower casing, replacing underscores > with spaces, removing trailing numeric characters etc. Base your score
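
A minimal sketch of the transition-matrix idea quoted above, under stated assumptions. The preprocessing steps (lower-casing, underscores to spaces, dropping trailing digits) are from the quoted message; everything else is an illustrative guess, since the quoted text is cut off before it says what the score is based on. In particular, the average log-probability score, the add-one smoothing, the `#` placeholder for other characters, and the names `normalise`, `train` and `score` are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "
VOCAB_SIZE = len(ALPHABET) + 1  # +1 for "#", our stand-in for any other character


def normalise(name):
    """Lower-case, turn underscores into spaces, drop trailing digits."""
    name = name.lower().replace("_", " ")
    name = re.sub(r"\d+$", "", name)
    # Assumption: every remaining non-letter collapses to a single symbol.
    return "".join(ch if ch in ALPHABET else "#" for ch in name)


def train(text):
    """Count character-to-character transitions in some English text."""
    counts = defaultdict(Counter)
    text = normalise(text)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts


def score(name, counts):
    """Average log-probability per transition (higher = more English-like)."""
    name = normalise(name)
    pairs = list(zip(name, name[1:]))
    if not pairs:
        return float("-inf")
    total = 0.0
    for a, b in pairs:
        row = counts.get(a, {})
        seen = sum(row.values())
        # Add-one smoothing so unseen transitions get a small non-zero probability.
        total += math.log((row.get(b, 0) + 1) / (seen + VOCAB_SIZE))
    return total / len(pairs)
```

Dividing by the number of transitions keeps long names from being penalised simply for their length. A real run would need a far larger training text than a single sentence, and a threshold chosen by inspecting scores on known examples.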

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Mark Lawrence
On 21/12/2015 16:49, Ian Kelly wrote: On Mon, Dec 21, 2015 at 9:40 AM, duncan smith wrote: Finite state machine / transition matrix. Learn from some English text source. Then process your strings by lower casing, replacing underscores with spaces, removing trailing

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Paul Rubin
Steven D'Aprano writes: > Does anyone have any suggestions for how to do this? Preferably something > already existing. I have some thoughts and/or questions: I think I'd just look at the set of digraphs or trigraphs in each name and see if there are a lot that aren't found
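
The digraph/trigraph check suggested above could look something like this. It is a sketch, not the poster's code: the quoted text is cut off before it says where the n-grams should be looked up, so comparing against n-grams collected from some reference text is an assumption, and `ngrams` and `unseen_fraction` are hypothetical names.

```python
def ngrams(text, n=3):
    """Return the set of lower-cased character n-grams in text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def unseen_fraction(name, known, n=3):
    """Fraction of the name's distinct n-grams that are absent from `known`."""
    grams = ngrams(name, n)
    if not grams:
        return 0.0  # too short to judge
    return len(grams - known) / len(grams)
```

For example, `known = ngrams(reference_text)` builds the lookup set once; names whose `unseen_fraction` is close to 1.0 would then be candidates for the "random" group. Passing `n=2` gives the digraph variant.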

Re: Catogorising strings into random versus non-random

2015-12-21 Thread duncan smith
On 21/12/15 16:49, Ian Kelly wrote: > On Mon, Dec 21, 2015 at 9:40 AM, duncan smith wrote: >> Finite state machine / transition matrix. Learn from some English text >> source. Then process your strings by lower casing, replacing underscores >> with spaces, removing

Categorising strings on meaningful–meaningless spectrum (was: Catogorising strings into random versus non-random)

2015-12-20 Thread Ben Finney
Steven D'Aprano writes: > Let's call the second group "random" and the first "non-random", > without getting bogged down into arguments about whether they are > really random or not. I think we should discuss it, even at risk of getting bogged down. As you know better than

Re: Catogorising strings into random versus non-random

2015-12-20 Thread Chris Angelico
On Mon, Dec 21, 2015 at 2:01 PM, Steven D'Aprano wrote: > I have a large number of strings (originally file names) which tend to fall > into two groups. Some are human-meaningful, but not necessarily dictionary > words e.g.: > > > baby lions at play > saturday_morning12 >

Catogorising strings into random versus non-random

2015-12-20 Thread Steven D'Aprano
I have a large number of strings (originally file names) which tend to fall into two groups. Some are human-meaningful, but not necessarily dictionary words e.g.: baby lions at play saturday_morning12 Fukushima ImpossibleFork (note that some use underscores, others spaces, and some CamelCase)