Re: parse html rendered by js

2011-02-12 Thread Miki Tebeka
> There seems to be no Rhino for Linux.
Rhino is written in Java. "java -jar js.jar" works fine on my Linux machine.

Re: parse html rendered by js

2011-02-12 Thread yanghq
There seems to be no Rhino for Linux. SpiderMonkey doesn't provide document, window, and the other browser objects in JS, so it won't help me much.
On Sat, 2011-02-12 at 05:57 -0800, john wrote:
> Even though I've never tried it, you may want to look into running the html
> thru a separate javascript engine, lik

Re: parse html rendered by js

2011-02-12 Thread john
Even though I've never tried it, you may want to look into running the HTML through a separate JavaScript engine, like SpiderMonkey or Rhino, and then parsing the results of that.
On Friday, February 11, 2011 2:20:32 AM UTC-6, yanghq wrote:
> hi,
> I wanna get attribute value like href,src... in

Re: parse html rendered by js

2011-02-12 Thread Javier Collado
Hello,
2011/2/11 yanghq:
> but for some pages rendered by js, like:
You could use Selenium or Windmill to reproduce the contents of the web page in a browser, so you can get the data from the DOM tree once the page has been rendered instead of parsing the JS.
Best regards, Jav
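
A rough sketch of the browser-driven approach suggested above, assuming the third-party selenium package and a local Firefox install. The helper names (AttrCollector, rendered_attributes, scrape) are made up for illustration and are not part of Selenium. The idea is to let the browser run the page's scripts, ask it for the resulting DOM as HTML, and then pull href/src out with the standard library parser.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect (tag, attribute, value) for the attributes we care about."""

    def __init__(self, wanted=("href", "src")):
        super().__init__()
        self.wanted = wanted
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.wanted and value is not None:
                self.found.append((tag, name, value))


def rendered_attributes(driver, url):
    """Load url in the given WebDriver and read attributes from the live DOM."""
    driver.get(url)
    # Serialise the DOM as it looks after the page's scripts have run.
    html = driver.execute_script("return document.documentElement.outerHTML")
    collector = AttrCollector()
    collector.feed(html)
    return collector.found


def scrape(url):
    """Hypothetical entry point: start Firefox, scrape one page, shut down."""
    from selenium import webdriver  # third-party; imported only when needed

    driver = webdriver.Firefox()
    try:
        return rendered_attributes(driver, url)
    finally:
        driver.quit()
```

Calling scrape() with a page URL should return a list of (tag, attribute, value) tuples. Pages that build their content after load (timers, AJAX) would probably need an explicit wait before reading the DOM, which this sketch leaves out.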

Re: parse html rendered by js

2011-02-11 Thread yanghq
Thank you for your reply. Yes, my end goal is something like screen scraping a web site. Duplicating the JavaScript behaviour in my Python code would be a huge burden, and I'm afraid I can't afford the time. Some people say that WebKit, PAMIE, and other browser engines can render JS to HTML, but pamie is only wo

parse html rendered by js

2011-02-11 Thread yanghq
Hi, I want to get attribute values such as href and src from HTML. For a simple HTML page, libxml2dom can parse it into a DOM and give me what I want. But some pages are rendered by JS, like:
document.write( ''+ ''+ ''+ ''+ ''+ ''+ ''+ ''+ ''+ ''+ ''+ '' )
How can I get the attribute v
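
For the narrow case shown in the question, where document.write() is called with nothing but concatenated string literals, a standard-library-only sketch like the one below might be enough. It does not execute any JavaScript: it only glues the literals back together and parses the result. The function names and the sample script are invented for illustration, since the markup in the original post did not survive the archive.

```python
import re
from html.parser import HTMLParser

# One quoted JS string literal, optionally followed by a "+" concatenation.
_LITERAL = re.compile(
    r"""\s*(?:'((?:[^'\\]|\\.)*)'|"((?:[^"\\]|\\.)*)")\s*\+?""", re.S
)


def _unescape(text):
    """Undo simple backslash escapes (\\', \\", \\\\, \\n, \\t)."""
    table = {"n": "\n", "t": "\t"}
    return re.sub(
        r"\\(.)", lambda m: table.get(m.group(1), m.group(1)), text, flags=re.S
    )


def written_html(js_source):
    """Join the string literals passed to each document.write(...) call."""
    chunks = []
    for call in re.finditer(r"document\.write\s*\(", js_source):
        pos = call.end()
        while True:
            m = _LITERAL.match(js_source, pos)
            if m is None:  # reached ")" or something that is not a literal
                break
            literal = m.group(1) if m.group(1) is not None else m.group(2)
            chunks.append(_unescape(literal))
            pos = m.end()
    return "".join(chunks)


class AttrCollector(HTMLParser):
    """Collect (tag, attribute, value) for href and src."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value is not None:
                self.found.append((tag, name, value))


def extract_attributes(html):
    collector = AttrCollector()
    collector.feed(html)
    return collector.found


# Hypothetical input, not the markup from the original post.
SAMPLE_SCRIPT = """
document.write(
  '<a href="/news/1.html">' +
  '<img src="/img/1.png">' +
  '</a>'
);
"""

print(extract_attributes(written_html(SAMPLE_SCRIPT)))
# prints [('a', 'href', '/news/1.html'), ('img', 'src', '/img/1.png')]
```

As soon as the script uses variables, function calls, or DOM methods instead of plain literals, this shortcut stops working and a real JavaScript engine or a browser is needed.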