Re: Mining strings from a HTML document.

2006-01-26 Thread Derick van Niekerk
Thanks Guys! I wrote several functions yesterday to import from different types of raw data, including HTML and different text formats. In the end I never used the extract function or the parser module, but your advice put me on the right track. All these functions are now in a single object a

Re: Mining strings from a HTML document.

2006-01-26 Thread Runsun Pan
>def extract(text,s1,s2):
>    ''' Extract strings wrapped between s1 and s2.
>
>    >>> t="""this is a test for extract()
>    that does multiple extract """
>    >>> extract(t,'','')
>    ['test', 'extract()', 'does multiple extract']
>
>    '''
>    beg = [1,0

Re: Mining strings from a HTML document.

2006-01-26 Thread Magnus Lycka
Derick van Niekerk wrote: > Could you/anyone explain the 4 lines of code to me though? A crash > course in Python shorthand? What does it mean when you use two sets of > brackets as in : beg = [1,0][text.startswith(s1)] ? It's not as strange as it looks. [1,0] is a list. If you put [] after a list
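For anyone reading along, here is a small sketch of what that double-bracket expression does. The values of `text` and `s1` are made up for illustration; only the `[1,0][text.startswith(s1)]` idiom comes from the thread. It relies on the fact that a bool can be used as a list index, with False acting as 0 and True as 1.

```python
# Sample values, invented for this example
text = "<b>bold</b> text"
s1 = "<b>"

starts = text.startswith(s1)   # a bool: True here
beg = [1, 0][starts]           # True indexes position 1, False position 0

# The same choice spelled out as a conditional expression
beg_explicit = 0 if text.startswith(s1) else 1

print(starts, beg, beg_explicit)
```

So `beg` ends up as 0 when the text starts with `s1` and 1 otherwise. The conditional form is longer but usually easier to read for newcomers.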

Re: Mining strings from a HTML document.

2006-01-26 Thread Cameron Laird
In article <[EMAIL PROTECTED]>, Derick van Niekerk <[EMAIL PROTECTED]> wrote: . . . >I suppose very few books on Python start off with HTML processing >instead of 'hello world' :p .

Re: Mining strings from a HTML document.

2006-01-26 Thread Derick van Niekerk
Runsun Pan helped me out with the following:

You can also try the following very primitive solution that I sometimes use to extract simple information in a quick and dirty way:

def extract(text,s1,s2):
    ''' Extract strings wrapped between s1 and s2.

    >>> t="""this is a
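Since the function body is cut off in this listing, here is a minimal sketch of a function with the behaviour the docstring describes. It is not Runsun Pan's original code: the name `extract_between`, the use of the `re` module, and the `<b>`/`</b>` delimiters in the sample are all assumptions made for illustration.

```python
import re

def extract_between(text, s1, s2):
    """Return every substring of text that sits between s1 and s2."""
    # re.escape lets the delimiters contain regex metacharacters safely;
    # the non-greedy group stops at the first closing delimiter.
    pattern = re.escape(s1) + r"(.*?)" + re.escape(s2)
    # DOTALL lets a match span line breaks
    return re.findall(pattern, text, flags=re.DOTALL)

# Hypothetical sample text with made-up delimiters
sample = "this is a <b>test</b> for <b>extract()</b>\nthat <b>does multiple extract</b>"
print(extract_between(sample, "<b>", "</b>"))
```

Like the original, this is a quick and dirty approach: it knows nothing about HTML, so nested or malformed tags will give odd results.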

Re: Mining strings from a HTML document.

2006-01-26 Thread Derick van Niekerk
I'm battling to understand this. I am switching to python while in a production environment so I am tossed into the deep end. Python seems easier to learn than other languages, but some of the conventions still trip me up. Thanks for the link - I'll have to go through all the previous chapters to u

Re: Mining strings from a HTML document.

2006-01-25 Thread Derick van Niekerk
Thanks, Jay! I'll try this out today. Trying to write my own parser is such a pain. This BeautifulSoup script is very nice! I'll give it a try. If you can help me out with an example of how to do what I explained, I would appreciate it. I actually finished doing an import last night, but there is

Re: Mining strings from a HTML document.

2006-01-25 Thread Chris Lasher
I think Jay's advice is solid: you shouldn't rule out HTML parsing. It's not too scary and it's probably not overboard. Using a common HTML parsing library saves you from having to write and debug your own parser. Try looking at Dive Into Python's chapter on it, first. http://www.diveintopython.org
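To give an idea of how little code a parser-based approach needs, here is a rough sketch using the standard library's `html.parser` module (swapped in here so the example runs without installing anything; it is neither BeautifulSoup nor the approach from the Dive Into Python chapter). The class name `TagTextCollector`, the helper `collect`, and the sample page are invented for illustration.

```python
from html.parser import HTMLParser

class TagTextCollector(HTMLParser):
    """Collect the text found inside every occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag        # tag name to watch for, lower case
        self.depth = 0        # how many watched tags are currently open
        self.current = []     # text pieces of the element being read
        self.results = []     # finished strings, one per element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.current))
                self.current = []

    def handle_data(self, data):
        # Only keep text while inside the watched tag
        if self.depth > 0:
            self.current.append(data)

def collect(html, tag):
    parser = TagTextCollector(tag)
    parser.feed(html)
    parser.close()
    return parser.results

# Hypothetical page, made up for this example
page = ("<html><body><h1>Price list</h1>"
        "<p>Apples: <b>3.50</b></p><p>Pears: <b>4.25</b></p></body></html>")
print(collect(page, "b"))
```

The parser does the tokenising, so the code only has to say what to do when a tag opens, closes, or contains text.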

Re: Mining strings from a HTML document.

2006-01-25 Thread jay graves
Derick van Niekerk wrote: > What are the string functions I would use and how would I use them? I > saw something about html parsing in python, but that might be overkill. > Babysteps. Despite your reluctance, I would still recommend an HTML parsing module. I like BeautifulSoup. http://www.crummy.

Mining strings from a HTML document.

2006-01-25 Thread Derick van Niekerk
Hi, I am new to Python and have been doing most of my work with PHP until now. I find Python to be *much* nicer for the development of local apps (running on my machine) but I am very new to the Python way of thinking and I don't really know where to start other than just by doing it...so far I'm j