Sorry for the delay; I'm catching up.
On Thu Jan 22, 2004, wren argetlahm [EMAIL PROTECTED] wrote:
I'm working on a linguistic module
Great! I'm an avid would-be linguist myself.
and I'm trying to
find a good way to split a string up into segments.
I can't assume single character strings and
Wren Argetlahm wrote:
I'm working on a linguistic module and I'm trying to
find a good way to split a string up into segments.
I can't assume single character strings and want to
assume maximal segments. As an example, the word
church would be rendered as the list ('ch', 'u',
'r', 'ch') and
wren,
You need to get a book on regexes. Perl.com has the best available;
Mastering Regular Expressions is what you want.
Sounds like a formidable task though. For some additional help with
your regex you can play with a tool posted on the perlhelp.com web
site. Go to Resources and look for
Well, both the problem and the project are way over my head, but it
looks to me like something that will only be solved with brute force.
I think if your string is split into words and the segments are sorted
longest to shortest, alphabetically, then you could sort words by the
first letter
On Thu, Jan 22, 2004 at 09:28:30PM -0800, wren argetlahm wrote:
--- Bill Stephenson [EMAIL PROTECTED] wrote:
You need to get a book on regexes.
I know the solution lies in regexes
I don't. I expect the code would be a lot clearer and considerably
quicker if you pull your strings apart using
On Thu, 22 Jan 2004, wren argetlahm wrote:
snip
Maybe Parse::RecDescent? Maybe I'm over-thinking this...
This is what I thought of immediately, an old but excellent article
maybe a good place to start:
http://search.cpan.org/src/DCONWAY/Parse-RecDescent-1.94/tutorial/tutorial.html
On Thu, 22 Jan 2004, wren argetlahm wrote:
--- Rick Measham [EMAIL PROTECTED] wrote:
Wren, when you say 'segments' it appears you
mean phonemes or phonetics.
Yeah, I do mean phonemes (or something like it). The
module is language independent, but I'll check those
modules out.
That's
--- Chris Devers [EMAIL PROTECTED] wrote:
Do you need to handle ambiguities? For example,
-ough can famously be pronounced several ways:
The way it's set up now can't deal with them, but I'm
about to rewrite the thing to handle more than one
segment having the same orthographic representation.
I'm working on a linguistic module and I'm trying to
find a good way to split a string up into segments.
I can't assume single character strings and want to
assume maximal segments. As an example, the word
church would be rendered as the list ('ch', 'u',
'r', 'ch') and wouldn't break the ch up
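
A minimal sketch of the "maximal segments" idea quoted above, written in Python purely for illustration (the thread itself is about Perl and shows no implementation). The function name `segment_longest` and the sample inventory are hypothetical. At each position the longest candidate is tried first, so 'ch' is preferred over 'c' followed by 'h'.

```python
def segment_longest(word, inventory):
    """Split word into segments, always taking the longest inventory
    entry that matches at the current position (greedy, no backtracking)."""
    max_len = max(len(s) for s in inventory)
    segments = []
    pos = 0
    while pos < len(word):
        # Try the longest possible candidate first, then shrink.
        for length in range(min(max_len, len(word) - pos), 0, -1):
            candidate = word[pos:pos + length]
            if candidate in inventory:
                segments.append(candidate)
                pos += length
                break
        else:
            # No inventory entry matches here; report it rather than guess.
            raise ValueError(f"no segment matches at position {pos} of {word!r}")
    return segments


# Hypothetical inventory, for illustration only.
inventory = {"ch", "c", "h", "u", "r"}
print(segment_longest("church", inventory))  # ['ch', 'u', 'r', 'ch']
```

Because the scan is greedy, it never revisits an earlier choice; an inventory where the longest match leads to a dead end would need backtracking, which this sketch does not attempt.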
On 23 Jan 2004, at 01:21 pm, wren argetlahm wrote:
I'm working on a linguistic module and I'm trying to
find a good way to split a string up into segments.
I can't assume single character strings and want to
assume maximal segments. As an example, the word
church would be rendered as the list
On Thu, 22 Jan 2004, wren argetlahm wrote:
I'm working on a linguistic module and I'm trying to
find a good way to split a string up into segments.
Your definition of segment here is vague; is it safe to ignore that and
just accept that a canonical list of each language's 'segments' is a
--- Bill Stephenson [EMAIL PROTECTED] wrote:
You need to get a book on regexes.
I know the solution lies in regexes; the problem is
that I can't quite figure out a generic enough way of
doing it. The problem is for a module and so the list
of valid segments is user defined. I guess I could do
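
One generic way to handle a user-defined segment list with a regex is to build the pattern at run time: escape each segment, sort longest first, and join with alternation. The sketch below is in Python for illustration (the same idea carries over to Perl, whose alternation is also leftmost-first); `build_segmenter` is a hypothetical name, not something from the thread.

```python
import re


def build_segmenter(segments):
    """Return a function that splits a word using the given segment list."""
    # Drop empty strings, then sort longest first so that the alternation
    # prefers 'ch' over 'c' (alternatives are tried left to right).
    ordered = sorted({s for s in segments if s}, key=lambda s: (-len(s), s))
    if not ordered:
        raise ValueError("segment list is empty")
    pattern = re.compile("|".join(re.escape(s) for s in ordered))

    def segmenter(word):
        out, pos = [], 0
        while pos < len(word):
            m = pattern.match(word, pos)  # anchored at pos
            if m is None:
                raise ValueError(f"no segment matches at position {pos} of {word!r}")
            out.append(m.group())
            pos = m.end()
        return out

    return segmenter


# Hypothetical user-defined list, for illustration only.
split_word = build_segmenter(["c", "h", "ch", "u", "r"])
print(split_word("church"))  # ['ch', 'u', 'r', 'ch']
```

Escaping each segment keeps regex metacharacters in user input from changing the pattern's meaning.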