
Re: Catogorising strings into random versus non-random

2015-12-21 Thread Christian Gollwitzer

On 21.12.15 at 09:24, Peter Otten wrote:
Steven D'Aprano wrote:
I have a large number of strings (originally file names) which tend to fall into two groups. Some are human-meaningful, but not necessarily dictionary words e.g.:
    baby lions at play
    saturday_morning12
    Fukushima
    ImpossibleFork

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Christian Gollwitzer

On 21.12.15 at 11:36, Steven D'Aprano wrote:
On Mon, 21 Dec 2015 08:56 pm, Christian Gollwitzer wrote:
Apfelkiste:Tests chris$ python score_my.py
-8.74 baby lions at play
-7.63 saturday_morning12
-6.38 Fukushima
-5.72 ImpossibleFork
-10.6 xy39mGWbosjY
-12.9 9sjz7s8198ghwt
-12.1

Re: Categorising strings on meaningful–meaningless spectrum (was: Catogorising strings into random versus non-random)

2015-12-21 Thread Steven D'Aprano

On Monday 21 December 2015 14:45, Ben Finney wrote: > Steven D'Aprano writes: > >> Let's call the second group "random" and the first "non-random", >> without getting bogged down into arguments about whether they are >> really random or not. > > I think we should discuss

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Vlastimil Brom

2015-12-21 4:01 GMT+01:00 Steven D'Aprano : > I have a large number of strings (originally file names) which tend to fall > into two groups. Some are human-meaningful, but not necessarily dictionary > words e.g.: > > > baby lions at play > saturday_morning12 > Fukushima >

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Vincent Davis

On Mon, Dec 21, 2015 at 7:25 AM, Vlastimil Brom wrote: > > baby lions at play > > saturday_morning12 > > Fukushima > > ImpossibleFork > > > > > > (note that some use underscores, others spaces, and some CamelCase) while > > others are completely meaningless (or mostly

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Rick Johnson

On Sunday, December 20, 2015 at 10:22:57 PM UTC-6, Chris Angelico wrote: > DuckDuckGo doesn't give a result count, so I skipped it. Yahoo search yielded: So why bother to mention it then? Is this another one of your "pikeish" propaganda campaigns? --

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Peter Otten

Steven D'Aprano wrote: > I have a large number of strings (originally file names) which tend to > fall into two groups. Some are human-meaningful, but not necessarily > dictionary words e.g.: > > > baby lions at play > saturday_morning12 > Fukushima > ImpossibleFork > > > (note that some use

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Steven D'Aprano

On Monday 21 December 2015 15:22, Chris Angelico wrote: > On Mon, Dec 21, 2015 at 2:01 PM, Steven D'Aprano > wrote: >> I have a large number of strings (originally file names) which tend to >> fall into two groups. Some are human-meaningful, but not necessarily >> dictionary

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Steven D'Aprano

On Mon, 21 Dec 2015 08:56 pm, Christian Gollwitzer wrote:
> Apfelkiste:Tests chris$ python score_my.py
> -8.74 baby lions at play
> -7.63 saturday_morning12
> -6.38 Fukushima
> -5.72 ImpossibleFork
> -10.6 xy39mGWbosjY
> -12.9 9sjz7s8198ghwt
> -12.1 rz4sdko-28dbRW00u
> Apfelkiste:Tests

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Christian Gollwitzer

On 21.12.15 at 11:53, Christian Gollwitzer wrote: So for the spaces, either use proper training material (some long corpus from Wikipedia or similar) with punctuation removed, so that it captures the correct probabilities at word boundaries, or preprocess by removing the spaces. Christian

Re: Catogorising strings into random versus non-random

2015-12-21 Thread duncan smith

On 21/12/15 03:01, Steven D'Aprano wrote: > I have a large number of strings (originally file names) which tend to fall > into two groups. Some are human-meaningful, but not necessarily dictionary > words e.g.: > > > baby lions at play > saturday_morning12 > Fukushima > ImpossibleFork > > >

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Ian Kelly

On Mon, Dec 21, 2015 at 9:40 AM, duncan smith wrote: > Finite state machine / transition matrix. Learn from some English text > source. Then process your strings by lower casing, replacing underscores > with spaces, removing trailing numeric characters etc. Base your score
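The transition-matrix idea quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the names `normalise`, `train` and `score`, the 27-symbol alphabet, the add-one smoothing and the per-transition averaging are all choices made here, and the training text is a tiny placeholder where a real English corpus would be needed.

```python
import math
import re
from collections import Counter

# Assumed alphabet: lower-case ASCII letters plus space.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def normalise(name):
    """Lower-case, turn underscores into spaces, strip trailing digits,
    then drop every character that is not in ALPHABET."""
    name = name.lower().replace("_", " ")
    name = re.sub(r"\d+$", "", name)
    return "".join(ch for ch in name if ch in ALPHABET)


def train(text):
    """Count character-to-character transitions in the training text."""
    text = normalise(text)
    pair_counts = Counter(zip(text, text[1:]))
    first_counts = Counter(text[:-1])  # characters that have a successor
    return pair_counts, first_counts


def score(name, model):
    """Mean log-probability per transition, using add-one smoothing.
    Higher (closer to zero) means more like the training text."""
    pair_counts, first_counts = model
    name = normalise(name)
    if len(name) < 2:
        return float("-inf")
    total = 0.0
    for a, b in zip(name, name[1:]):
        p = (pair_counts[(a, b)] + 1) / (first_counts[a] + len(ALPHABET))
        total += math.log(p)
    return total / (len(name) - 1)


if __name__ == "__main__":
    # Placeholder corpus; substitute a long English text in practice.
    sample = "the theory then there is the other thing that they thought"
    model = train(sample)
    for candidate in ["saturday_morning12", "xy39mGWbosjY"]:
        print(round(score(candidate, model), 2), candidate)
```

Averaging over the number of transitions keeps long and short names comparable. Where to put the cut-off between the two groups is left open here, as it depends on the corpus.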

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Mark Lawrence

On 21/12/2015 16:49, Ian Kelly wrote: On Mon, Dec 21, 2015 at 9:40 AM, duncan smith wrote: Finite state machine / transition matrix. Learn from some English text source. Then process your strings by lower casing, replacing underscores with spaces, removing trailing

Re: Catogorising strings into random versus non-random

2015-12-21 Thread Paul Rubin

Steven D'Aprano writes: > Does anyone have any suggestions for how to do this? Preferably something > already existing. I have some thoughts and/or questions: I think I'd just look at the set of digraphs or trigraphs in each name and see if there are a lot that aren't found
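One possible reading of the digraph/trigraph suggestion, as a rough sketch only: collect the trigraphs seen in some reference text, then measure what fraction of a name's trigraphs are missing from that set. The function names `trigraphs` and `unseen_fraction`, the restriction to ASCII letters and the reference text are assumptions made for this illustration; the snippet above is truncated, so the details of the original proposal may differ.

```python
import re


def trigraphs(text):
    """Return the set of 3-letter substrings found inside each run of
    ASCII letters in the lower-cased text."""
    found = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        found.update(word[i:i + 3] for i in range(len(word) - 2))
    return found


def unseen_fraction(name, known):
    """Fraction of the name's trigraphs that do not occur in `known`.
    Returns 0.0 when the name is too short to contain any trigraph."""
    found = trigraphs(name)
    if not found:
        return 0.0
    return len(found - known) / len(found)


if __name__ == "__main__":
    # Placeholder reference text; a real word list or corpus is assumed.
    known = trigraphs("impossible fork baby lions at play saturday morning")
    for candidate in ["baby lions at play", "9sjz7s8198ghwt"]:
        print(candidate, unseen_fraction(candidate, known))
```

A name with a high unseen fraction would be a candidate for the "random" group. Digraphs work the same way with a window of two instead of three.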

Re: Catogorising strings into random versus non-random

2015-12-21 Thread duncan smith

On 21/12/15 16:49, Ian Kelly wrote: > On Mon, Dec 21, 2015 at 9:40 AM, duncan smith wrote: >> Finite state machine / transition matrix. Learn from some English text >> source. Then process your strings by lower casing, replacing underscores >> with spaces, removing

Categorising strings on meaningful–meaningless spectrum (was: Catogorising strings into random versus non-random)

2015-12-20 Thread Ben Finney

Steven D'Aprano writes: > Let's call the second group "random" and the first "non-random", > without getting bogged down into arguments about whether they are > really random or not. I think we should discuss it, even at risk of getting bogged down. As you know better than

Re: Catogorising strings into random versus non-random

2015-12-20 Thread Chris Angelico

On Mon, Dec 21, 2015 at 2:01 PM, Steven D'Aprano wrote: > I have a large number of strings (originally file names) which tend to fall > into two groups. Some are human-meaningful, but not necessarily dictionary > words e.g.: > > > baby lions at play > saturday_morning12 >

Catogorising strings into random versus non-random

2015-12-20 Thread Steven D'Aprano

I have a large number of strings (originally file names) which tend to fall into two groups. Some are human-meaningful, but not necessarily dictionary words e.g.:
    baby lions at play
    saturday_morning12
    Fukushima
    ImpossibleFork
(note that some use underscores, others spaces, and some CamelCase)
