Hi Gregory,
Is the filesize variable properly reset to 0 at the beginning of each
new year's story parsing?
Best,
--
Pierre Sahores
mobile : 06 03 95 77 70
www.sahores-conseil.com
On 16 Jul 09, at 23:32, Gregory Lypny wrote:
Hello everyone,
Sorry for the long message. I'm scratching my head on this one,
and I'd be interested to know what you think. I'm doing some
research processing stories released by Canada NewsWire from 1999
through 2003. I've got one text file of stories for each year, five
in total. I created a Revolution stack to read these flat files,
identify where each story begins and ends (see the Sample Story at
the bottom of this message), and grab the headlines and some other
information.
What I expected to find is that the number of stories would grow year
by year with the growing popularity of news on the Internet. And
that is true, except for the last year, 2003, where the number of
stories is the lowest (see table Stats on the Stories). What
doesn't make sense is that 2003 is the biggest file at 144 MB. So,
I figure there must be something in my script that is causing me to
skip stories in 2003, but I can't find it. I identify the start of
each story by five lines like these from the sample:
cnnw000020011206dxc600795
592 Words
06 December 2001
16:57 GMT
Canada NewsWire
I've browsed through the 2003 file and the format does not appear to
have changed. I also replaced line endings for every block of text
I read in to make sure that isn't messing me up.
replace crlf with return in it
replace numToChar(13) with return in it
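As an independent cross-check outside the Revolution stack, the same header test can be sketched in Python. This is only a sketch under assumptions: the pattern is inferred from the single sample story in this message, the names HEADER, normalize_newlines and count_stories are made up for illustration, and a real file may contain header variants this pattern misses.

```python
import re

# Assumed header shape, inferred from the one sample story in this message:
# story ID, word count, date, time, source, on five consecutive lines.
HEADER = re.compile(
    r"^cnnw\w+\n"
    r"[\d,]+ Words\n"
    r"\d{1,2} \w+ \d{4}\n"
    r"\d{1,2}:\d{2} GMT\n"
    r"Canada NewsWire$",
    re.MULTILINE,
)


def normalize_newlines(text):
    """Turn CRLF and bare CR into LF, mirroring the two replace lines above."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def count_stories(text):
    """Count five-line story headers in a whole file's text."""
    return len(HEADER.findall(normalize_newlines(text)))
```

Running count_stories on the full text of each year's file, read in one piece rather than block by block, gives a count that does not depend on where block boundaries fall. If that count for 2003 is much higher than 13,668, the stack is skipping stories. If it agrees, the extra bytes in the 2003 file are something other than stories with this header.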
The average number of words per story has remained roughly the same
for all five years, so how is it that the 2003 file can be roughly
three times bigger than the 1999 file yet have 4,000 fewer stories?
What am I missing here?
Regards,
Gregory
STATS ON THE STORIES
Year   Number of stories   Number of words   File size (MB)
1999              17,653         7,950,395             53.8
2000              25,887        13,714,615             92.4
2001              32,764        17,996,931            121.3
2002              37,403        20,160,555            137
2003              13,668         8,341,830            144.2
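
One quick sanity check on the table is bytes of file per counted word. The sketch below uses only the figures in the table. It assumes 1 MB = 1,000,000 bytes, and STATS, BYTES_PER_MB and bytes_per_word are illustrative names, not anything from the stack.

```python
# Figures copied from the table: year -> (stories, words, file size in MB).
STATS = {
    1999: (17653, 7950395, 53.8),
    2000: (25887, 13714615, 92.4),
    2001: (32764, 17996931, 121.3),
    2002: (37403, 20160555, 137.0),
    2003: (13668, 8341830, 144.2),
}

BYTES_PER_MB = 1_000_000  # assumption; 2**20 would scale every year equally


def bytes_per_word(stats):
    """Return file bytes per counted word for each year."""
    return {
        year: mb * BYTES_PER_MB / words
        for year, (_stories, words, mb) in stats.items()
    }


for year, ratio in sorted(bytes_per_word(STATS).items()):
    print(year, round(ratio, 2))
```

For 1999 through 2002 the ratio sits between about 6.7 and 6.8 bytes per word, while for 2003 it is about 17.3. At the earlier ratio a 144.2 MB file would hold roughly 21 million words, so either well over half of the 2003 text is not being counted or the file contains a lot of material that is not story text.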
SAMPLE STORY
Factiva (R) Dow Jones & Reuters
---------------------------------------------
Yahoo! Canada en francais launches Shopping Guide
cnnw000020011206dxc600795
592 Words
06 December 2001
16:57 GMT
Canada NewsWire
English
(Copyright Canada NewsWire 2001)
Search in French, Connect in French and now buy in French on Yahoo!
Canada en francais
Yahoo! Canada en francais - always open
TORONTO, Dec. 6 /CNW/ - Yahoo! Canada en francais today announced
the launch of a new shopping guide for French speaking Canadian
consumers. Yahoo! Canada en francais Shopping is an ideal solution
for francophones who want the convenience of shopping from home,
plus a variety of shopping options. Shoppers can get started right
away by going to francais.yahoo.ca Shop now and check out great
Canadian stores like Compaq Canada, Sony Style, and Camelot, a
division owned by Archambault.
_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your
subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution