On 21.12.15 at 09:24, Peter Otten wrote:
Steven D'Aprano wrote:
I have a large number of strings (originally file names) which tend to
fall into two groups. Some are human-meaningful, but not necessarily
dictionary words e.g.:
baby lions at play
saturday_morning12
Fukushima
ImpossibleFork
On 21.12.15 at 11:36, Steven D'Aprano wrote:
On Mon, 21 Dec 2015 08:56 pm, Christian Gollwitzer wrote:
Apfelkiste:Tests chris$ python score_my.py
-8.74 baby lions at play
-7.63 saturday_morning12
-6.38 Fukushima
-5.72 ImpossibleFork
-10.6 xy39mGWbosjY
-12.9 9sjz7s8198ghwt
-12.1
On Monday 21 December 2015 14:45, Ben Finney wrote:
> Steven D'Aprano writes:
>
>> Let's call the second group "random" and the first "non-random",
>> without getting bogged down into arguments about whether they are
>> really random or not.
>
> I think we should discuss
2015-12-21 4:01 GMT+01:00 Steven D'Aprano:
> I have a large number of strings (originally file names) which tend to fall
> into two groups. Some are human-meaningful, but not necessarily dictionary
> words e.g.:
>
>
> baby lions at play
> saturday_morning12
> Fukushima
>
On Mon, Dec 21, 2015 at 7:25 AM, Vlastimil Brom
wrote:
> > baby lions at play
> > saturday_morning12
> > Fukushima
> > ImpossibleFork
> >
> >
> > (note that some use underscores, others spaces, and some CamelCase) while
> > others are completely meaningless (or mostly
On Sunday, December 20, 2015 at 10:22:57 PM UTC-6, Chris Angelico wrote:
> DuckDuckGo doesn't give a result count, so I skipped it. Yahoo search yielded:
So why bother to mention it then? Is this another one of your "pikeish"
propaganda campaigns?
--
Steven D'Aprano wrote:
> I have a large number of strings (originally file names) which tend to
> fall into two groups. Some are human-meaningful, but not necessarily
> dictionary words e.g.:
>
>
> baby lions at play
> saturday_morning12
> Fukushima
> ImpossibleFork
>
>
> (note that some use
On Monday 21 December 2015 15:22, Chris Angelico wrote:
> On Mon, Dec 21, 2015 at 2:01 PM, Steven D'Aprano
> wrote:
>> I have a large number of strings (originally file names) which tend to
>> fall into two groups. Some are human-meaningful, but not necessarily
>> dictionary
On Mon, 21 Dec 2015 08:56 pm, Christian Gollwitzer wrote:
> Apfelkiste:Tests chris$ python score_my.py
> -8.74 baby lions at play
> -7.63 saturday_morning12
> -6.38 Fukushima
> -5.72 ImpossibleFork
> -10.6 xy39mGWbosjY
> -12.9 9sjz7s8198ghwt
> -12.1 rz4sdko-28dbRW00u
> Apfelkiste:Tests
On 21.12.15 at 11:53, Christian Gollwitzer wrote:
So for the spaces, either use proper training material (some long
corpus from Wikipedia or such), with punctuation removed. Then it will
catch the correct probabilities at word boundaries. Or preprocess by
removing the spaces.
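A minimal sketch of the kind of model Christian describes, assuming a character-bigram (first-order Markov) model with add-one smoothing. This is not the score_my.py whose output appears above (its code isn't shown here), and the names clean, train_bigrams and score are hypothetical. It follows the second option: spaces, along with digits and punctuation, are stripped from both the training text and the names before counting.

```python
import math
import re
from collections import Counter

def clean(text):
    """Lower-case and keep only the letters a-z (drops spaces, digits, punctuation)."""
    return re.sub(r"[^a-z]", "", text.lower())

def train_bigrams(corpus):
    """Count adjacent letter pairs, and how often each letter starts a pair."""
    text = clean(corpus)
    pairs = Counter(zip(text, text[1:]))
    starts = Counter(text[:-1])
    return pairs, starts

def score(s, model, alphabet_size=26):
    """Mean log-probability per letter transition, with add-one smoothing."""
    pairs, starts = model
    text = clean(s)
    if len(text) < 2:
        return float("-inf")  # nothing to score
    total = 0.0
    for a, b in zip(text, text[1:]):
        # P(b | a) = (count(a, b) + 1) / (count(a, *) + alphabet_size)
        total += math.log((pairs[(a, b)] + 1) / (starts[a] + alphabet_size))
    return total / (len(text) - 1)
```

You would call train_bigrams() once on a long English text (e.g. a Wikipedia dump) and then rank names by score(); with a large enough corpus, human-meaningful names should tend to score higher. Averaging per transition keeps long and short names comparable. Dropping digits is a choice you might revisit, since digit runs may themselves hint at randomness.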
Christian
On 21/12/15 03:01, Steven D'Aprano wrote:
> I have a large number of strings (originally file names) which tend to fall
> into two groups. Some are human-meaningful, but not necessarily dictionary
> words e.g.:
>
>
> baby lions at play
> saturday_morning12
> Fukushima
> ImpossibleFork
>
>
>
On Mon, Dec 21, 2015 at 9:40 AM, duncan smith wrote:
> Finite state machine / transition matrix. Learn from some English text
> source. Then process your strings by lower casing, replacing underscores
> with spaces, removing trailing numeric characters etc. Base your score
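The preprocessing steps duncan smith lists could look like this in Python. This is a sketch only: the function name preprocess is hypothetical, and only the three steps he names (lower casing, underscores to spaces, stripping trailing digits) are implemented; his "etc." is left open.

```python
import re

def preprocess(name):
    """Normalise a file-name-like string before scoring it."""
    s = name.lower()            # lower casing
    s = s.replace("_", " ")     # underscores -> spaces
    s = re.sub(r"\d+$", "", s)  # remove trailing numeric characters
    return s.strip()
```

For example, preprocess("saturday_morning12") gives "saturday morning", which can then be scored against a transition matrix learned from English text.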
On 21/12/2015 16:49, Ian Kelly wrote:
On Mon, Dec 21, 2015 at 9:40 AM, duncan smith wrote:
Finite state machine / transition matrix. Learn from some English text
source. Then process your strings by lower casing, replacing underscores
with spaces, removing trailing
Steven D'Aprano writes:
> Does anyone have any suggestions for how to do this? Preferably something
> already existing. I have some thoughts and/or questions:
I think I'd just look at the set of digraphs or trigraphs in each name
and see if there are a lot that aren't found
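One way to sketch the trigraph check in Python. The snippet is cut off before saying where the trigraphs should be looked up, so this assumes a reference set built from some list of known English words. The word list and the names ngrams, build_reference and unseen_fraction are hypothetical.

```python
def ngrams(s, n=3):
    """Set of overlapping n-character substrings (trigraphs when n=3)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def build_reference(words, n=3):
    """Collect every n-gram that occurs in a (hypothetical) list of known words."""
    known = set()
    for w in words:
        known |= ngrams(w.lower(), n)
    return known

def unseen_fraction(name, known, n=3):
    """Fraction of the name's n-grams that never occur in the reference set."""
    grams = ngrams(name.lower(), n)
    if not grams:
        return 0.0  # too short to judge
    return len(grams - known) / len(grams)
```

A high fraction suggests a "random" name. Where to put the threshold is left open here. In practice you would probably normalise the name first (spaces, underscores, CamelCase), so that separators don't create spurious trigraphs.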
On 21/12/15 16:49, Ian Kelly wrote:
> On Mon, Dec 21, 2015 at 9:40 AM, duncan smith wrote:
>> Finite state machine / transition matrix. Learn from some English text
>> source. Then process your strings by lower casing, replacing underscores
>> with spaces, removing
Steven D'Aprano writes:
> Let's call the second group "random" and the first "non-random",
> without getting bogged down into arguments about whether they are
> really random or not.
I think we should discuss it, even at risk of getting bogged down. As
you know better than
On Mon, Dec 21, 2015 at 2:01 PM, Steven D'Aprano wrote:
> I have a large number of strings (originally file names) which tend to fall
> into two groups. Some are human-meaningful, but not necessarily dictionary
> words e.g.:
>
>
> baby lions at play
> saturday_morning12
>
I have a large number of strings (originally file names) which tend to fall
into two groups. Some are human-meaningful, but not necessarily dictionary
words e.g.:
baby lions at play
saturday_morning12
Fukushima
ImpossibleFork
(note that some use underscores, others spaces, and some CamelCase)
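Because the names mix underscores, spaces and CamelCase, one possible first step is to split them into lower-case words before any scoring. This is a minimal sketch using regular expressions. The name split_words is hypothetical, and the CamelCase rule (a capital directly after a lower-case letter starts a new word) is an assumption that won't cover every case; acronyms such as "HTTPServer" stay in one piece.

```python
import re

def split_words(name):
    """Split a file-name-like string into lower-case alphabetic words."""
    # Break CamelCase: insert a space between a lower-case letter and a following capital.
    s = re.sub(r"([a-z])([A-Z])", r"\1 \2", name)
    # Treat spaces, underscores, digits and any other non-letters as separators.
    return [w.lower() for w in re.split(r"[^A-Za-z]+", s) if w]
```

For example, split_words("ImpossibleFork") gives ["impossible", "fork"], and split_words("saturday_morning12") gives ["saturday", "morning"].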