Re: tricky parsing question

2004-02-12 Thread Paul Hoffman
Sorry for the delay; I'm catching up.
On Thu Jan 22, 2004, wren argetlahm [EMAIL PROTECTED] wrote:
> I'm working on a linguistic module
Great! I'm an avid would-be linguist myself.
> and I'm trying to find a good way to split a string up into segments. I can't assume single-character strings and

Re: tricky parsing question

2004-02-12 Thread Matt Diephouse
Wren Argetlahm wrote:
> I'm working on a linguistic module and I'm trying to find a good way to split a string up into segments. I can't assume single-character strings and want to assume maximal segments. As an example, the word church would be rendered as the list ('ch', 'u', 'r', 'ch') and

Re: tricky parsing question

2004-01-23 Thread Bill Stephenson
wren,
You need to get a book on regexes. Perl.com has the best available; Mastering Regular Expressions is what you want. Sounds like a formidable task, though. For some additional help with your regex, you can play with a tool posted on the perlhelp.com web site. Go to Resources and look for

Re: tricky parsing question

2004-01-23 Thread Bill Stephenson
Well, both the problem and the project are way over my head, but it looks to me like something that will only be solved with brute force. I think if your string is split into words and the segments are sorted longest to shortest, alphabetically, then you could sort words by the first letter
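Here is a minimal Python sketch of the "sort longest to shortest" part of this idea. The thread is about Perl, but the logic carries over. At each position, the function takes the first segment from the length-sorted inventory that matches. The name `segment_greedy` and the sample inventory are assumptions for illustration. So is the choice to emit unknown characters one at a time.

```python
def segment_greedy(word, segments):
    """Split word into maximal segments by greedy longest match.

    segments: user-defined inventory of segment strings (hypothetical).
    Characters not covered by any segment fall back to single characters
    (an assumption, not something specified in the thread).
    """
    # Longest first, so 'ch' is tried before 'c'; empty strings are dropped
    ordered = sorted({s for s in segments if s}, key=len, reverse=True)
    result = []
    i = 0
    while i < len(word):
        for seg in ordered:
            if word.startswith(seg, i):
                result.append(seg)
                i += len(seg)
                break
        else:
            # No inventory segment matches at position i: take one character
            result.append(word[i])
            i += 1
    return result


print(segment_greedy("church", ["ch", "c", "h", "u", "r"]))  # ['ch', 'u', 'r', 'ch']
```

Greedy longest match gives the maximal segments the original question asks for, but it never backtracks. With the inventory `ab`, `bc`, `a`, the string `abc` comes out as `ab`, `c`, even though `a`, `bc` would cover it completely.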

Re: tricky parsing question

2004-01-23 Thread David Cantrell
On Thu, Jan 22, 2004 at 09:28:30PM -0800, wren argetlahm wrote:
> --- Bill Stephenson [EMAIL PROTECTED] wrote:
>> You need to get a book on regexes.
> I know the solution lies in regexes
I don't. I expect the code would be a lot clearer and considerably quicker if you pull your strings apart using
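David's message is cut off before he says what to use instead. This is therefore only one hypothetical non-regex option, not necessarily his. The idea is to store the inventory in a set and, at each position, try substrings from the longest inventory length down to one character. `segment_by_lookup` is an invented name, and unknown characters again fall back to single-character segments as an assumption.

```python
def segment_by_lookup(word, segments):
    """Greedy longest-match segmentation using set lookups instead of a regex.

    At each position, try substrings from the longest inventory length
    down to 1; characters not in the inventory become single segments.
    """
    inventory = {s for s in segments if s}  # drop empty strings
    max_len = max((len(s) for s in inventory), default=1)
    result = []
    i = 0
    while i < len(word):
        # Never look past the end of the word
        for n in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + n]
            if piece in inventory:
                break
        else:
            # Nothing matched; fall back to a single character
            n = 1
            piece = word[i]
        result.append(piece)
        i += n
    return result


print(segment_by_lookup("church", ["ch", "c", "h", "u", "r"]))  # ['ch', 'u', 'r', 'ch']
```

The work per position is bounded by the longest segment length rather than by the size of the inventory. That may matter if users define large inventories, but whether it actually beats a compiled regex would need measuring.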

Re: tricky parsing question

2004-01-23 Thread Wiggins d Anconia
On Thu, 22 Jan 2004, wren argetlahm wrote:
> [snip]
> Maybe Parse::RecDescent? Maybe I'm over-thinking this...
This is what I thought of immediately; an old but excellent article may be a good place to start: http://search.cpan.org/src/DCONWAY/Parse-RecDescent-1.94/tutorial/tutorial.html

Re: tricky parsing question

2004-01-23 Thread Chris Devers
On Thu, 22 Jan 2004, wren argetlahm wrote:
> --- Rick Measham [EMAIL PROTECTED] wrote:
>> Wren, when you say 'segments' it appears you mean phonemes or phonetics.
> Yeah, I do mean phonemes (or something like it). The module is language-independent, but I'll check those modules out.
That's

Re: tricky parsing question

2004-01-23 Thread wren argetlahm
--- Chris Devers [EMAIL PROTECTED] wrote:
> Do you need to handle ambiguities? For example, -ough can famously be pronounced several ways:
The way it's set up now can't deal with them, but I'm about to rewrite the thing to handle more than one segment having the same orthographic representation.
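If more than one reading has to be kept, one option is to enumerate every way of covering the string with inventory segments rather than committing greedily. This Python sketch assumes the inventory is a plain set of strings. A real module would presumably map each orthographic string to one or more segments, and that mapping is left out here. `all_segmentations` is a hypothetical name.

```python
from functools import lru_cache


def all_segmentations(word, segments):
    """Return every way to split word into strings from the inventory.

    Unlike greedy longest match, this keeps all readings, so ambiguous
    spellings yield several lists. Returns [] if no full cover exists.
    At each position, longer segments are tried first, so the greedy
    reading (when it covers the whole word) comes out first.
    """
    inventory = sorted({s for s in segments if s}, key=len, reverse=True)

    @lru_cache(maxsize=None)
    def from_pos(i):
        # All segmentations of word[i:], as a tuple of tuples
        if i == len(word):
            return ((),)
        results = []
        for seg in inventory:
            if word.startswith(seg, i):
                for rest in from_pos(i + len(seg)):
                    results.append((seg,) + rest)
        return tuple(results)

    return [list(t) for t in from_pos(0)]


for reading in all_segmentations("church", ["ch", "c", "h", "u", "r"]):
    print(reading)
```

The cache means each suffix is segmented only once. The number of readings itself can still grow exponentially with word length, so a real module might want to cap or rank them.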

tricky parsing question

2004-01-22 Thread wren argetlahm
I'm working on a linguistic module and I'm trying to find a good way to split a string up into segments. I can't assume single-character strings and want to assume maximal segments. As an example, the word church would be rendered as the list ('ch', 'u', 'r', 'ch') and wouldn't break the ch up

Re: tricky parsing question

2004-01-22 Thread Rick Measham
On 23 Jan 2004, at 01:21 pm, wren argetlahm wrote:
> I'm working on a linguistic module and I'm trying to find a good way to split a string up into segments. I can't assume single-character strings and want to assume maximal segments. As an example, the word church would be rendered as the list

Re: tricky parsing question

2004-01-22 Thread Chris Devers
On Thu, 22 Jan 2004, wren argetlahm wrote:
> I'm working on a linguistic module and I'm trying to find a good way to split a string up into segments.
Your definition of segment here is vague; is it safe to ignore that and just accept that a canonical list of each language's 'segments' is a

Re: tricky parsing question

2004-01-22 Thread wren argetlahm
--- Bill Stephenson [EMAIL PROTECTED] wrote:
> You need to get a book on regexes.
I know the solution lies in regexes; the problem is that I can't quite figure out a generic enough way of doing it. The problem is for a module, and so the list of valid segments is user-defined. I guess I could do
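One generic way to handle a user-defined inventory with a regex is to build the pattern at run time. Escape each segment, sort the segments longest first so the alternation prefers maximal matches, and join them with `|`. This is a Python sketch of that idea; in Perl the analogous building blocks would be `quotemeta`, `sort` and `join`. The function name `build_segmenter` is hypothetical, and so is the trailing `.` fallback that matches any character not covered by the inventory.

```python
import re


def build_segmenter(segments):
    """Compile a regex that splits strings into maximal inventory segments.

    Python's re tries alternatives left to right and keeps the first one
    that matches, so sorting longest first makes 'ch' win over 'c'.
    Any character not covered by the inventory is matched singly by '.'.
    """
    inventory = sorted({s for s in segments if s}, key=len, reverse=True)
    # Escape so user segments containing regex metacharacters match literally
    alternatives = [re.escape(s) for s in inventory] + ["."]
    pattern = re.compile("|".join(alternatives), re.DOTALL)
    return pattern.findall


split_church = build_segmenter(["ch", "c", "h", "u", "r"])
print(split_church("church"))  # ['ch', 'u', 'r', 'ch']
```

Perl's regex alternation is also ordered left to right, so the same longest-first trick should carry over to a pattern built with `join '|', map { quotemeta } ...`.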