I went to the URL you posted, and it looks like that error page is the content the server is actually sending. Try clearing your browser cache; your browser may be showing you a cached copy of the page.
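If you want to check this from Python rather than the browser, here is a rough sketch. It assumes Python 3's urllib.request (your snippet uses the Python 2 urllib module), and the names build_request, fetch, and BROWSER_UA are made up for illustration. It reports the HTTP status along with the body, and it sends an explicit User-Agent header so you can test your theory that the server dislikes something in the request. I don't know whether citeseer actually filters on User-Agent, so treat that part as a guess.

```python
import urllib.error
import urllib.request

# Hypothetical browser-like User-Agent; the value is only an illustration.
BROWSER_UA = "Mozilla/5.0 (compatible; example-crawler)"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=BROWSER_UA, timeout=10):
    """Return (status, lines) for url, including for HTTP error responses."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
            body = response.read()
    except urllib.error.HTTPError as err:
        # Python 3 raises on 4xx/5xx; the error object still holds the body.
        status = err.code
        body = err.read()
    # Decode leniently so an unexpected encoding does not hide the result.
    return status, body.decode("utf-8", errors="replace").split("\n")
```

Calling fetch("http://citeseer.ist.psu.edu") gives you the status code next to the page lines. If the status is in the 500 range, the server itself is failing and the "Server error!" page is its real answer, not something your code did wrong.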
Charles

yookyung wrote:
> I am trying to crawl webpages in citeseer domain (a collection of research
> papers mostly in computer science).
>
> I have used the following code snippet.
>
> #####
> import urllib
>
> sock = urllib.urlopen("http://citeseer.ist.psu.edu")
> webcontent = sock.read().split('\n')
> sock.close()
> print webcontent
> ########
>
> Then I get the following error message.
>
>
> ['<!--#set var="TITLE" value="Server error!"', '--><!--#include
> virtual="include/top.html" -->', '', ' <!--#if
> expr="$REDIRECT_ERROR_NOTES" -->', '', ' The server encountered an
> internal error and was ', ' unable to complete your request.', '', '
> <!--#include virtual="include/spacer.html" -->', '', ' Error message:', '
> <br /><!--#echo encoding="none" var="REDIRECT_ERROR_NOTES" -->', '', '
> <!--#else -->', '', ' The server encountered an internal error and was ',
> ' unable to complete your request. Either the server is', ' overloaded
> or there was an error in a CGI script.', '', ' <!--#endif -->', '',
> '<!--#include virtual="include/bottom.html" -->', '']
>
> However, the url is valid and it works fine if I open the url in my web
> browser.
> Or, if I use a different url (http://www.google.com instead of
> http://citeseer.ist.psu.edu),
> then it works.
>
> What is wrong?
> Could it be that the citeseer webserver checks the http request, and it sees
> something
> that it doesn't like and reject the request?
> What should I do?
>
> Thank you.
>
> Best regards,
> Yookyung
--
http://mail.python.org/mailman/listinfo/python-list