Re: [Tutor] Is the difference in outputs with different size input lists due to limits on memory with PYTHON?

Art Kendall Fri, 07 May 2010 04:37:41 -0700


On 5/6/2010 8:52 PM, Dave Angel wrote:

I got my own copy of the papers, at http://thomas.loc.gov/home/histdox/fedpaper.txt
I copied your code and added logic to initialize termlist from the actual file. It does complete the output file at 83 lines, approx 17000 columns per line (because most counts are one digit). It takes quite a while, and perhaps you weren't waiting for it to complete. I'd suggest adding a print to the loop that shows the count, and/or adding a line that prints "done" after the loop terminates normally.
I watched memory usage, and as expected, it didn't get very high. There are things you need to redesign, however. One is that all the punctuation and digits and such need to be converted to spaces.
DaveA


Thank you for going the extra mile.

I obtained my copy before I retired in 2001, and there are some differences. In the current copy from the LOC, papers 7, 63, and 81 start with "FEDERALIST." (an extra period). That explains why you have 83. There are also some comments, such as the attributed author. After the weekend, I'll do a file compare and look at the differences in more detail.
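One way to make the split robust to the extra period is a header pattern that treats it as optional. This is a minimal sketch in Python 3, assuming each paper opens with a line like "FEDERALIST No. 7"; the exact header text, the HEADER pattern, and the split_papers name are assumptions for illustration, not taken from the original code.

```python
import re

# Hypothetical header pattern: assumes each paper opens with a line such as
# "FEDERALIST No. 7". The optional "\.?" also accepts "FEDERALIST. No. 7".
HEADER = re.compile(r"^FEDERALIST\.?\s+No\.\s*(\d+)", re.MULTILINE)

def split_papers(corpus):
    """Return a list of (paper_number, text) pairs, one per header found."""
    matches = list(HEADER.finditer(corpus))
    papers = []
    for i, match in enumerate(matches):
        # Each paper runs from the end of its header to the next header
        # (or to the end of the file for the last paper).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(corpus)
        papers.append((int(match.group(1)), corpus[match.end():end]))
    return papers
```

Printing len(split_papers(corpus)) is a quick way to confirm the expected number of papers was found.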

Please email me your version of the code. I'll try it as is. Then I'll put in a counter, have it print the count and paper number, and add a 'done' message.
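A counting loop with that kind of progress output might look like the following Python 3 sketch. It assumes the papers are already available as (paper_number, text) pairs and that termlist holds lowercase words; count_terms and those data shapes are hypothetical, not the code being exchanged on the list.

```python
from collections import Counter

def count_terms(papers, termlist):
    """Return one row of term counts per paper, reporting progress.

    `papers` is assumed to be a list of (paper_number, text) pairs and
    `termlist` a list of lowercase words; both are hypothetical inputs.
    """
    rows = []
    for count, (paper_number, text) in enumerate(papers, start=1):
        # Count every word in the paper once, then look up each term;
        # a Counter returns 0 for terms that never occur.
        words = Counter(text.lower().split())
        rows.append([words[term] for term in termlist])
        print("count", count, "paper", paper_number)
    print("done")  # only reached if the loop ends normally
    return rows
```

Counting each paper's words once and then looking terms up avoids rescanning the text for every term, which may help with the long run time.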

As a check after reading in the counts, I'll include the counts from NoteTab and see whether these counts sum to the NoteTab totals.
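A check of that kind could be sketched in Python 3 as below. It assumes the per-paper counts are rows of numbers in the same order as termlist, and that the NoteTab totals have been loaded into a dict mapping term to total; compare_totals and that dict layout are hypothetical.

```python
def compare_totals(rows, termlist, reference):
    """Return {term: (our_total, reference_total)} for every disagreement.

    `rows` is one list of counts per paper, in the same order as `termlist`;
    `reference` is a hypothetical dict of term -> total from another tool.
    """
    mismatches = {}
    for column, term in enumerate(termlist):
        ours = sum(row[column] for row in rows)
        theirs = reference.get(term, 0)  # missing terms count as 0
        if ours != theirs:
            mismatches[term] = (ours, theirs)
    return mismatches
```

An empty result means every term's column sums to the reference total.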

I'll use SPSS to create a version of the .txt file with punctuation and numerals changed to spaces and try using that as the corpus. Then I'll try to create a similar file with Python.
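In Python 3, one way to do this is a translation table built with str.maketrans and applied with str.translate. This is a sketch under the assumption that ASCII punctuation and digits are all that need replacing; clean_text is a hypothetical name.

```python
import string

# Map every ASCII punctuation character and digit to a space.
# string.punctuation is ASCII only; curly quotes or other non-ASCII
# symbols in the file would have to be added by hand.
REMOVE = string.punctuation + string.digits
TO_SPACE = str.maketrans(REMOVE, " " * len(REMOVE))

def clean_text(text):
    """Return `text` with punctuation and numerals replaced by spaces."""
    return text.translate(TO_SPACE)
```

Note that apostrophes become spaces too, so "it's" splits into "it" and "s"; whether that is acceptable depends on the terms being counted.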


Art
_______________________________________________
Tutor maillist  -  [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
