Intercodes wrote:
> Hello everyone,
>
> I am new to this mailing list as well as to Python (uptime: 3 weeks). Today
> I learnt about REs from http://www.amk.ca/python/howto/regex/. This one was
> really helpful. I started working out a few examples on my own. The first
> one was to collect all the HTML tags used in an HTML file.
>
> I get the output, but with tags repeated. I want to display all the tags
> used in a file, but with no repetitions. Say the output for one of the HTML
> files I got was: "<html><link> <a><br><a><br>"
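If you want to stay with the regex approach from the question, the repetitions go away once the matches are collected into a set (or checked against a set of names already seen). Below is a minimal sketch written for Python 3, assuming only the tag names matter. The names TAG_NAME and unique_tags are made up for illustration. A pattern like this also matches tag-like text inside comments, scripts, or attribute values, which is one reason an HTML parser is more robust for real pages.

```python
import re

# Matches an opening tag and captures its name, e.g. "<a href='x'>" -> "a".
# Closing tags such as "</a>" are not matched because "/" is not a letter.
# Assumption: a rough pattern like this is good enough for a quick exercise;
# it does not understand comments, <script> contents, or quoted attributes.
TAG_NAME = re.compile(r"<\s*([a-zA-Z][a-zA-Z0-9]*)")


def unique_tags(html):
    """Return tag names in order of first appearance, without repetitions."""
    seen = set()
    ordered = []
    for name in TAG_NAME.findall(html):
        name = name.lower()  # HTML tag names are case-insensitive
        if name not in seen:
            seen.add(name)
            ordered.append(name)
    return ordered


sample = "<html><link> <a><br><a><br>"
print(unique_tags(sample))                     # ['html', 'link', 'a', 'br']
print(sorted(set(TAG_NAME.findall(sample))))   # ['a', 'br', 'html', 'link']
```

The ordered list keeps the tags in the order they first appear in the file; if order does not matter, a plain set plus sorted() is enough.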
You might consider Beautiful Soup or another HTML parser to collect the tags. Then use a set to find the unique tags. For example (Python 2.4 version):

>>> import urllib
>>> from BeautifulSoup import BeautifulSoup as BS
>>> data = urllib.urlopen('http://www.python.org').read()
>>> bs = BS(data)
>>> help(bs.fetch)
Help on method fetch in module BeautifulSoup:

fetch(self, name=None, attrs={}, recursive=True, text=None, limit=None) method of BeautifulSoup.BeautifulSoup instance
    Extracts a list of Tag objects that match the given criteria.
    You can specify the name of the Tag and any attributes you want
    the Tag to have.

>>> tags = set(tag.name for tag in bs.fetch())
>>> sorted(tags)
['a', 'b', 'body', 'br', 'center', 'div', 'font', 'form', 'h4', 'head', 'html', 'i', 'img', 'input', 'li', 'link', 'meta', 'p', 'small', 'table', 'td', 'title', 'tr', 'ul']

http://www.crummy.com/software/BeautifulSoup/index.html

Kent

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor