Hello Isaac,

This second posting of yours provides more information about what you are trying to accomplish and how. (It was also readable, where the first one looked as though it had been mangled by your mail user agent; it is best to post only plain-text messages to this sort of mailing list.)
I suspect that we can help you a bit more now. If we knew even more about what you are looking to do, we might be able to help you further (with all of the usual remarks about how we won't do your homework for you, but all of us volunteers will gladly help you understand the tools, the systems, the world of Python and anything else we can suggest in the realm of computers, computer science and problem solving).

I will credit the person who assigned this task to you, as it is not dissimilar from the sort of problem one often faces with a new practical computing problem. Often (and in your case) there is opaque structure, and there are hidden assumptions in the question, which need to be understood. See further below....

These were your four lines of code:

  > with urllib.request.urlopen("https://www.sdstate.edu/electrical-engineering-and-computer-science") as cs:
  >     cs_page = cs.read()
  >     soup = BeautifulSoup(cs_page, "html.parser")
  >     print(len(soup.body.find_all(string = ["Engineering","engineering"])))

The fourth line is an impressive attempt at compressing all of the searching, finding, counting and reporting steps into a single line. Your task, I think, is more complicated than that single line can express, so it will need to be expanded into a few more lines of code.

You may have heard these aphorisms before:

  * brevity is the soul of wit
  * fewer lines of code are better
  * prefer a short, elegant solution

But when complexity intrudes into brevity, the human mind struggles. As a practitioner, I will say that I spend more of my time reading and understanding code than writing it, so writing simple, self-contained and understandable units of code leads to intelligibility for humans and composability for systems. Try this at a Python console [1]:

  import this

  > i used control + f on the link in the code and i get 11 for
  > ctrl + f and 3 for the code

Applause! Look at the raw data! Study the raw data!
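As an aside on why those two counts differ: find_all with a list of strings matches only text nodes that are exactly equal to one of the listed strings, while control + f counts every substring occurrence, anywhere in the text. Here is a toy sketch of the difference, using invented strings (not the real page's text):

```python
import re

# Invented text nodes, standing in for the text strings in a parsed page.
text_nodes = [
    "Engineering",                       # exactly equal to a search string
    "Department of Engineering",         # contains the word, but is not equal to it
    "engineering and computer science",  # contains the word, but is not equal to it
]

# What find_all(string=["Engineering", "engineering"]) effectively does:
# keep only the nodes that EQUAL one of the given strings.
exact = [t for t in text_nodes if t in ("Engineering", "engineering")]

# What control + f does: count every case-insensitive substring occurrence.
substr = [m for t in text_nodes for m in re.findall("engineering", t, re.I)]

print(len(exact))   # 1
print(len(substr))  # 3
```

That mismatch between exact-equality matching and substring searching is the heart of the 3-versus-11 discrepancy you observed.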
That is an excellent way to start to try to understand the raw data. You must always go back to the raw input data and then consider whether your tooling, or the data model in your program, matches what you are trying to extract/compute/transform. The answer I get (for the number of occurrences of the word 'engineering', case-insensitive) is close to your answer when searching with control + f, but is a bit larger than 11.

Anyway, here are my thoughts. I will start with some tips that are relevant to your 4-line pasted program:

  * BeautifulSoup is wonderfully convenient, but remember that it is another high-level tool; it is often forgiving where other tools are more rigorous. Still, it is excellent for learning, and (I hope you see below) it is a great tool for the problem you are trying to solve.

  * In your code, soup.body is a handle that points to the <body> tag of the HTML document you have fetched; so why can't you simply find_all of the strings "Engineering" and "engineering" in the text and count them?
      - find_all is a method that returns all of the matching elements in the structured document below (in this case) soup.body
      - your intent is not to count text nodes that equal the string 'engineering'; rather, you are looking for that string anywhere in the text (I think)

  * It is almost always a mistake to try to process HTML with regular expressions. However, it seems that you are trying to find all matches of the (case-insensitive) word 'engineering' in the text of this document, and that is something tailor-made for regular expressions once the text has been extracted; so there's the Python regular expression library, too: 'import re'.

  * On a minor note, since you are using urllib.request.urlopen() in a with statement (using contexts this way is wonderful), you could collect the data from the network socket, then drop out of the 'with' block to allow the context to close. If your block worked as you wanted, you could adjust it as follows:

      with urllib.request.urlopen(uri) as cs:
          cs_page = cs.read()
      soup = BeautifulSoup(cs_page, "html.parser")
      print(len(soup.body.find_all(string = ["Engineering","engineering"])))

  * On a much more minor point, I'll mention that urllib / urllib2 ship with the main Python releases, but there are other libraries for fetching; I often recommend the third-party requests [0] library, as it is very Pythonic, reasonably high-level and frightfully flexible.

So, connecting the Zen of Python [1] to your problem, I would suggest making shorter, simpler lines and separating the logic. Here are some code suggestions.
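Before applying a regular expression to the real page, a tiny warm-up on a made-up sample string (my invention, not text from the actual page) shows what the case-insensitive pattern matches, including the difference between counting substrings and counting only whole words:

```python
import re

# Made-up sample text, standing in for the page's body text.
sample = "Engineering is fun. We love ENGINEERING. 'engineerings' is not a word."

# Case-insensitive substring count (what control + f reports).
pattern = re.compile("engineering", re.I)
print(len(pattern.findall(sample)))        # 3: also matches inside "engineerings"

# Whole words only, using \b word boundaries.
word_pattern = re.compile(r"\bengineering\b", re.I)
print(len(word_pattern.findall(sample)))   # 2
```

Which of the two counts you want depends on what the assignment actually asks for; it is worth deciding that before comparing your program's answer to control + f.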
  * Collect the relevant data. Once you have fetched the page, get just the part that you know you want to process as pure text, for example:

      soup = BeautifulSoup(r.text, "html.parser")
      bodytext = soup.body.text

  * Walk/process/compute the data. Search that text to find the subset of data you wish to operate on, or which is the answer:

      pattern = re.compile('engineering', re.I)
      matches = re.findall(pattern, bodytext)

  * Report to the end user. Finally, print it out:

      print('Found "engineering" (case-insensitive) %d times.' % (len(matches),))

Good luck and enjoy Python,

-Martin

 [0] http://docs.python-requests.org/en/master/

      url = "https://www.sdstate.edu/electrical-engineering-and-computer-science"
      r = requests.get(url)
      if not r.ok:
          pass  # -- die/return/handle-error here
      soup = BeautifulSoup(r.text, "html.parser")

 [1] You do use the Python console to explore Python, your data and your code, don't you?

      $ python3
      Python 3.4.5 (default, Jul 03 2016, 13:55:08) [GCC] on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import this
      The Zen of Python, by Tim Peters

      Beautiful is better than ugly.
      Explicit is better than implicit.
      Simple is better than complex.
      Complex is better than complicated.
      Flat is better than nested.
      Sparse is better than dense.
      Readability counts.
      Special cases aren't special enough to break the rules.
      Although practicality beats purity.
      Errors should never pass silently.
      Unless explicitly silenced.
      In the face of ambiguity, refuse the temptation to guess.
      There should be one-- and preferably only one --obvious way to do it.
      Although that way may not be obvious at first unless you're Dutch.
      Now is better than never.
      Although never is often better than *right* now.
      If the implementation is hard to explain, it's a bad idea.
      If the implementation is easy to explain, it may be a good idea.
      Namespaces are one honking great idea -- let's do more of those!

-- 
Martin A.
Brown
http://linux-ip.net/

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor