Hi Gregory,

Is the filesize variable properly reset to 0 at the beginning of each new year's parsing?
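
A minimal Python sketch of the point behind this question. The stack itself is Revolution script, so this only illustrates the logic. It assumes, hypothetically, that each yearly file is read in fixed-size blocks with a running byte offset (presumably the role of the "filesize" variable). That offset has to start again at 0 for every file. Otherwise a later year is entered part-way through and its first stories are never seen. The names `read_in_blocks` and `bytes_read_per_year` are illustrative only.

```python
import io
from typing import BinaryIO, Dict, Iterator


def read_in_blocks(f: BinaryIO, block_size: int) -> Iterator[bytes]:
    """Yield consecutive blocks of f, starting from byte 0."""
    offset = 0  # reset for every file; a stale offset from the previous year would skip data
    while True:
        f.seek(offset)
        block = f.read(block_size)
        if not block:
            break
        offset += len(block)
        yield block


def bytes_read_per_year(files: Dict[str, BinaryIO], block_size: int = 4096) -> Dict[str, int]:
    """Return how many bytes were actually read from each year's file."""
    return {year: sum(len(b) for b in read_in_blocks(f, block_size))
            for year, f in files.items()}


# Hypothetical stand-ins for two yearly files; real code would use open(path, "rb").
files = {"2002": io.BytesIO(b"x" * 10_000), "2003": io.BytesIO(b"y" * 25_000)}
print(bytes_read_per_year(files))  # {'2002': 10000, '2003': 25000}
```

Comparing each total with the file's size on disk (for example with `os.path.getsize`) is a quick way to confirm that no year's file is being read from somewhere other than its start.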

Best,
--
Pierre Sahores
mobile : 06 03 95 77 70
www.sahores-conseil.com


On 16 Jul 09, at 23:32, Gregory Lypny wrote:

Hello everyone,

Sorry for the long message. I'm scratching my head over this one, and I'd be interested to know what you think. I'm doing some research that involves processing stories released by Canada NewsWire from 1999 through 2003. I have one text file of stories for each year, five in total. I created a Revolution stack that reads these flat files, identifies where each story begins and ends (see the Sample Story at the bottom of this message), and grabs the headlines and some other information.

I expected the number of stories to grow year by year with the growing popularity of news on the Internet. That is true, except for the last year, 2003, which has the fewest stories of all (see the Stats on the Stories table). What doesn't make sense is that 2003 is also the biggest file, at 144 MB. So I figure something in my script must be causing it to skip stories in 2003, but I can't find it. I identify the start of each story by a run of five consecutive lines like these from the sample:

        cnnw000020011206dxc600795
        592 Words
        06 December 2001
        16:57 GMT
        Canada NewsWire
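
A minimal Python sketch of the five-line test described above (Python stands in for the Revolution script). It assumes line endings are already normalized. The regular expressions are hypothetical, inferred from the single sample header, so they may need loosening if another year differs. Besides full matches, it reports "partial" headers: accession-number lines whose next four lines do not all match. A large partial count for one year would point to a format change that is easy to miss when browsing the file by eye.

```python
import re
from typing import List, Tuple

# Hypothetical patterns inferred from the one sample header; adjust if a year differs.
HEADER_PATTERNS = [
    re.compile(r"^cnnw[0-9a-z]+$"),              # accession number
    re.compile(r"^[\d,]+ Words$"),               # word count, e.g. "592 Words"
    re.compile(r"^\d{1,2} [A-Z][a-z]+ \d{4}$"),  # date, e.g. "06 December 2001"
    re.compile(r"^\d{2}:\d{2} GMT$"),            # time, e.g. "16:57 GMT"
    re.compile(r"^Canada NewsWire$"),            # source
]


def find_headers(text: str) -> Tuple[List[int], List[int]]:
    """Return (line indices of full headers, line indices of partial headers).

    A partial header is an accession-number line whose next four lines do not
    all match the expected patterns.
    """
    lines = [line.strip() for line in text.split("\n")]
    full: List[int] = []
    partial: List[int] = []
    for i, line in enumerate(lines):
        if not HEADER_PATTERNS[0].match(line):
            continue
        window = lines[i:i + len(HEADER_PATTERNS)]
        if len(window) == len(HEADER_PATTERNS) and all(
            pattern.match(candidate) for pattern, candidate in zip(HEADER_PATTERNS, window)
        ):
            full.append(i)
        else:
            partial.append(i)
    return full, partial


sample = "\n".join([
    "Yahoo! Canada en francais launches Shopping Guide",
    "cnnw000020011206dxc600795",
    "592 Words",
    "06 December 2001",
    "16:57 GMT",
    "Canada NewsWire",
    "English",
])
print(find_headers(sample))  # ([1], [])
```

Reporting near-misses rather than silently skipping them is the design choice that matters here: a story whose header fails the test simply disappears from the count, which is exactly the symptom in the 2003 numbers.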

I've browsed through the 2003 file, and the format does not appear to have changed. I also normalize the line endings in every block of text I read in, to make sure they aren't tripping me up:

        replace crlf with return in it
        replace numToChar(13) with return in it
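
A minimal Python sketch of line-ending normalization for text read in blocks, assuming the blocks are strings read with newline translation turned off (for example `open(path, newline="")`). Python stands in for the Revolution script, and the function name `lines_from_blocks` is hypothetical. If each block is cleaned and searched on its own, two things can go wrong at block edges. A CR LF pair split between blocks becomes two line breaks, inserting a blank line that would break a five-consecutive-line test. A line, or a whole header, cut at the edge is also seen as two fragments. Carrying the unfinished last line into the next block avoids both. Whether either case explains the missing 2003 stories is not known; this only shows the pattern.

```python
from typing import Iterable, Iterator


def lines_from_blocks(blocks: Iterable[str]) -> Iterator[str]:
    """Yield complete lines, with CR LF and lone CR turned into LF,
    from text that was read in arbitrary-sized blocks."""
    tail = ""  # unfinished last line carried over from the previous block
    for block in blocks:
        text = tail + block
        # Hold back a trailing CR: its LF partner may start the next block.
        held_cr = ""
        if text.endswith("\r"):
            text, held_cr = text[:-1], "\r"
        # Same order as the Revolution script: CR LF first, then lone CR.
        text = text.replace("\r\n", "\n").replace("\r", "\n")
        *complete, tail = text.split("\n")
        tail += held_cr
        yield from complete
    if tail:
        yield tail.rstrip("\r")  # a held-back CR at end of file just ends the line


# A CR LF pair and the line "l3" both straddle block boundaries here.
print(list(lines_from_blocks(["l1\r", "\nl2\r\nl", "3\r"])))  # ['l1', 'l2', 'l3']
```

Cleaning the same three blocks independently would instead produce `l1`, an empty line, `l2`, and `l3` (once the fragments `l` and `3` are rejoined), so a header spanning that edge would no longer be five consecutive lines.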

The average number of words per story has remained roughly the same across all five years, so how can the 2003 file be nearly three times the size of the 1999 file yet contain about 4,000 fewer stories? What am I missing here?

Regards,

Gregory


STATS ON THE STORIES

Year   Number of stories   Number of words   File size (MB)
1999   17,653              7,950,395         53.8
2000   25,887              13,714,615        92.4
2001   32,764              17,996,931        121.3
2002   37,403              20,160,555        137
2003   13,668              8,341,830         144.2
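
The table already supports a quick sanity check. Assuming file size should scale with the number of words actually in the stories, dividing each year's size by the number of words the script counted should give a roughly constant figure. A short Python sketch of that arithmetic (Python is used only for illustration; the figures are copied from the table):

```python
# Figures copied from the STATS ON THE STORIES table.
STATS = {
    # year: (number of stories, number of words, file size in MB)
    1999: (17_653, 7_950_395, 53.8),
    2000: (25_887, 13_714_615, 92.4),
    2001: (32_764, 17_996_931, 121.3),
    2002: (37_403, 20_160_555, 137.0),
    2003: (13_668, 8_341_830, 144.2),
}


def ratios(stats):
    """Return {year: (words per story, MB of file per million counted words)}."""
    result = {}
    for year, (stories, words, size_mb) in stats.items():
        result[year] = (words / stories, size_mb / (words / 1_000_000))
    return result


for year, (words_per_story, mb_per_million_words) in ratios(STATS).items():
    print(f"{year}: {words_per_story:4.0f} words/story, "
          f"{mb_per_million_words:5.2f} MB per million counted words")
```

On these figures, 1999 through 2002 all come out at about 6.7 to 6.8 MB per million counted words, while 2003 comes out at about 17.3. If the ratio ought to be stable, that suggests roughly 60% of the 2003 file is text the script never counts as part of a story. Words per story run from about 450 (1999) to about 610 (2003).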


SAMPLE STORY

Factiva (R) Dow Jones & Reuters
---------------------------------------------
Yahoo! Canada en francais launches Shopping Guide
cnnw000020011206dxc600795
592 Words
06 December 2001
16:57 GMT
Canada NewsWire
English
(Copyright Canada NewsWire 2001)

Search in French, Connect in French and now buy in French on Yahoo!

Canada en francais

Yahoo! Canada en francais - always open

TORONTO, Dec. 6 /CNW/ - Yahoo! Canada en francais today announced the launch of a new shopping guide for French speaking Canadian consumers. Yahoo! Canada en francais Shopping is an ideal solution for francophones who want the convenience of shopping from home, plus a variety of shopping options. Shoppers can get started right away by going to francais.yahoo.ca Shop now and check out great Canadian stores like Compaq Canada, Sony Style, and Camelot, a division owned by Archambault.
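
A minimal Python sketch of pulling the fields out of a story header laid out like the sample above (Python stands in for the Revolution script). It assumes, from this one sample only, that the headline is the line just before the accession number and that the date and time use English month names and a literal "GMT". The function name `parse_header` is hypothetical.

```python
import re
from datetime import datetime


def parse_header(lines, i):
    """Parse the story whose accession-number line is lines[i].

    Assumes the sample layout: headline on the line before the accession
    number, then word count, date, time and source on the next four lines.
    """
    if i < 1 or i + 4 >= len(lines):
        raise ValueError(f"incomplete header around line {i}")
    m = re.match(r"^([\d,]+) Words$", lines[i + 1].strip())
    if m is None:
        raise ValueError(f"no word count after line {i}")
    published = datetime.strptime(
        f"{lines[i + 2].strip()} {lines[i + 3].strip()}", "%d %B %Y %H:%M GMT"
    )
    return {
        "headline": lines[i - 1].strip(),
        "id": lines[i].strip(),
        "words": int(m.group(1).replace(",", "")),
        "published": published,
        "source": lines[i + 4].strip(),
    }


sample_lines = [
    "Factiva (R) Dow Jones & Reuters",
    "---------------------------------------------",
    "Yahoo! Canada en francais launches Shopping Guide",
    "cnnw000020011206dxc600795",
    "592 Words",
    "06 December 2001",
    "16:57 GMT",
    "Canada NewsWire",
    "English",
]
print(parse_header(sample_lines, 3))
```

Raising an error on a malformed header, instead of quietly moving on, makes any header that does not fit the expected layout visible rather than silently lost from the story count.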
_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution


