It is unclear what you are saying. Is 'a__' (where _ is a space) valid, or can three spaces be valid? And are you working with only letters and spaces, or with letters, numbers, and spaces?
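
If the answer is that every character counts, spaces included, then a minimal sketch for reading the text from a file and counting could look like the one below. The helper name count_ngrams and passing the file name on the command line are made up for illustration, and collapsing each run of whitespace (tabs, newlines) into a single space is an assumption; under that assumption 'ab ' is counted but three spaces in a row can never occur. It slurps the whole file into memory, so a really huge file would need a line-by-line variant.

```perl
use strict;
use warnings;

# Count every n-character window in $text, spaces included.
# count_ngrams is a hypothetical helper name, not something from this thread.
sub count_ngrams {
    my ($text, $n) = @_;
    my %count;
    # Assumption: treat any run of whitespace (tabs, newlines) as one space.
    $text =~ s/\s+/ /g;
    my $last = length($text) - $n;    # last valid start offset
    for my $i (0 .. $last) {          # empty range if the text is shorter than $n
        ++$count{ substr $text, $i, $n };
    }
    return \%count;
}

# Assumption: the input file name is given as the first command-line argument.
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $text = do { local $/; <$fh> };    # slurp the whole file
    close $fh;

    my $tri = count_ngrams($text, 3);
    # Walk the hash from most to least frequent, ties in alphabetical order.
    for my $gram (sort { $tri->{$b} <=> $tri->{$a} || $a cmp $b } keys %$tri) {
        printf "'%s' => %d\n", $gram, $tri->{$gram};
    }
}
```

An individual count is then just a hash lookup, for example $tri->{'the'}, and changing the 3 to 2 gives the bigram counts without any 2-D arrays.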
 
Wags ;)
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]On Behalf Of amit hetawal
Sent: Wednesday, February 22, 2006 12:09
To: $Bill Luebkert
Cc: perl-win32-users@listserv.ActiveState.com
Subject: Re: Re: Tri-grams?




On Wed, 22 Feb 2006 $Bill Luebkert wrote :
>amit hetawal wrote:
>
> >
> > Hello All,
> >  I need some logic for a text-analysis program. I have
> > to calculate the frequency of each of the tri-grams present in a
> > given piece of text,
> > i.e. I need a way to get the number of occurrences of 't' followed by 'he'
> > for the word 'the' in the whole of the text.
> >
> > for text : ' hello how are you '
> > I need to get occurrences of 'h' followed by 'el', then 'e' followed by 'll'...
> > and so on for each of the words present. If a space is present, then it is
> > considered just another type of character...
> > Can you please help me with this...
> > I was able to do it for bigrams, where I had only 2 characters and used
> > 2-D arrays... but for 3 characters I am still lost...
> > Can anybody help me with this...
>
>I would just make a hash and walk the string char by char using substr.
>You could make it more generic if 3 isn't the only possible length - I
>just checked each char to make sure it wasn't a space rather than adding
>a second length loop.
>
>use strict;
>use warnings;
>use Data::Dumper; $Data::Dumper::Indent=1; $Data::Dumper::Sortkeys=1;
>
>$_ = ' hello how are you hello ';
>
>my %hash;
>my $len = length $_;
>for (my $ii = 0; $ii < $len; ++$ii) {
>      my $char = substr $_, $ii, 1;
>      next if $char eq ' ';
>      my $char2 = substr $_, $ii+1, 1;
>      next if $char2 eq ' ';
>      my $char3 = substr $_, $ii+2, 1;
>      next if $char3 eq ' ';
>      ++$hash{$char.$char2.$char3};
>}
>print Data::Dumper->Dump([\%hash], [qw(\%hash)]);
>
>__END__
>
>Result:
>
>$\%hash = {
>  'are' => 1,
>  'ell' => 2,
>  'hel' => 2,
>  'how' => 1,
>  'llo' => 2,
>  'you' => 1
>};
>

Hello Bill
Thanks for your help...
I got the initial part running, but now I have to read the text from a large text file containing all the sentences and special characters, and get the trigrams working on that rather than only on the sample text above. Can you please suggest how I should format the file and how to access the hash values that I store for the whole text?
A trigram such as 'ab_' is also valid, since I have to treat the space as a character too.
please help :(
thanks..





