[EMAIL PROTECTED] wrote:
> I'm trying to sort out why the last line in the routine tokenizer()
> fails to strip off the trailing double quote for all but the last
> token. 
> 
> AFAICT, none of the tokens have any newlines - so I'm at a loss as to
> why it does not do what is expected - (this function is originally
> from the DICT server Jiten which I'm trying to use under Win32 - this
> problem seems to stop the server from finding the proper results
> since it misinterprets the request from the client. I've also
> reworked the initial tokenizer regex and it now seems to do what is
> needed for the strings I have seen up till now.)
> 
> TIA
> Arnold
> ===================================================
> # Test program to test the tokenizer for jiten
> 
> use strict;
> use warnings;
> sub tokenize;
> 
> my $t;
> my @tokens = ();
> @tokens = tokenize '"define" "*" "admin" "admin" "admin"';
> foreach $t (@tokens) {
>     print "$t\n";
> }
> #####################################################################
> # Other
> sub tokenize {
>    my $line = shift;
>    my @tokens;
>    # this _should_ decompose a line into its individual 'tokens', but
>    # fails for a line such as 'DEFINE "*" "admin"'
>    # it lumps the last two tokens into one
>    # original line
>    #  while($line=~s/(\".+\"|\S+)\s*/push @tokens,$1;'';/e) { ; }
>    # my attempts at fixes - seems to work so far
>    while($line=~s/ (          # handle plain words - no quotes
>                    (\S+\s+)   #   one or more 'word' char followed
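I think you are forgetting that \S+ matches " along with the letters,
etc. The first alternative, (\S+\s+), therefore grabs each quoted token
together with its trailing whitespace, so s/\"$// finds no quote at the
end of the string for any token but the last.
I switched the first and second alternatives around and got the following output: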
[C:/Common] aapl004w
*1*: "define" "*" "admin" "admin" "admin"  <-- My print of what is coming in
<"define";"*";"admin";"admin";"admin">   <-- What is the @tokens w/in the sub
define  <---- Your printed output after the sub call.
*
admin
admin
admin

Wags ;)

>                               #   by one or more non-word char
>                       |       # or handle quoted strings,
>                    (\"\S+\")  #   quote + one or more of any 'word'
>                               #   char followed by a trailing quote
>                               # this will NOT handle any quoted
>                               #   strings containing spaces or tabs
>                    )
>                    \s*        # followed by 0 or more
>                               #   whitespace chars = [ \t\n\r\f]
>                  /push @tokens,$1;'';
>                  /ex)    # evaluate => execute the push....
>     { ; }     # do nothing loop
> 
>    # this is supposed to strip off the leading and trailing " from each
>    # token but fails for all tokens with quotes except the last one;
>    # it strips the leading quote but leaves the trailing quote for all
>    # but the last one
>    return map{s/^\"//;s/\"$//;$_;} @tokens;
> }
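
For what it is worth, here is a minimal sketch of the same idea with the
quoted-string branch tried first. It is not the Jiten code: it swaps the
s///e loop for a list-context m//g match, and it uses "[^"]*" for the
quoted branch, which (as an assumption on my part) also lets a quoted
token contain spaces. The whitespace between tokens stays outside the
capture, so the trailing quote really is the last character of each token.

```perl
use strict;
use warnings;

sub tokenize {
    my ($line) = @_;

    # Quoted-string branch first, bare-word branch second; the separating
    # whitespace is never part of the capture.
    my @tokens = $line =~ / ( "[^"]*" | \S+ ) /gx;

    # Strip one leading and one trailing double quote from a copy of each
    # token, so the original list is not modified through the $_ alias.
    return map { my $t = $_; $t =~ s/^"//; $t =~ s/"$//; $t } @tokens;
}

print "$_\n" for tokenize('"define" "*" "admin" "admin" "admin"');
```

With the test line from the original program this should print define,
*, admin, admin, admin, one per line, the same as the output shown above.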
> 
> _______________________________________________
> Perl-Win32-Users mailing list
> Perl-Win32-Users@listserv.ActiveState.com
> To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs