Hi, 

I haven't written a servlet for this, but I have written a JSP tag that does it. 

If you don't want to copy the images from the other server (Google's) to yours, 
as a proxy would, then you must parse the HTML and rewrite all the URLs for CSS, 
images, hrefs, JavaScript and so on, so that they are fully qualified, such as 
http://google.com/images/logo.gif, rather than relative, such as /images/logo.gif. 
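A rough sketch of that rewriting step, for illustration only. It uses the JDK's 
java.util.regex and java.net.URI rather than jakarta-oro, and the class name 
UrlRewriter, the method absolutize and the simplified pattern are all made up for 
this example. It only handles quoted src/href attributes, so URLs inside scripts 
or stylesheets would need extra handling.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class UrlRewriter {
    // Simplified pattern: src="..." or href="..." in double or single quotes.
    private static final Pattern ATTR = Pattern.compile(
            "(?i)\\b(src|href)\\s*=\\s*([\"'])(.*?)\\2");

    // Rewrites relative src/href values so they are resolved against baseUrl.
    static String absolutize(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        Matcher m = ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String resolved;
            try {
                resolved = base.resolve(m.group(3).trim()).toString();
            } catch (IllegalArgumentException e) {
                // Leave values that are not valid URIs untouched.
                resolved = m.group(3);
            }
            String replacement = m.group(1) + "=" + m.group(2) + resolved + m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Called as UrlRewriter.absolutize(html, "http://google.com/"), this should turn 
src="/images/logo.gif" into src="http://google.com/images/logo.gif" and leave 
URLs that are already absolute as they are.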

This is usually not very complicated, but it can be a little tricky, especially with 
JavaScript. 
I used regular expressions to do this, more specifically the jakarta-oro package. I 
still recommend some server-side caching of the parsed pages, as the parsing can be 
quite processing-intensive. 
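
One possible shape for such a cache, as a minimal sketch and not what my tag 
actually does. The names PageCache and CachedPage, the size limit and the 
time-to-live are all assumptions for this example; the loader function stands in 
for "fetch the page and rewrite its URLs".

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical in-memory cache of parsed pages, keyed by URL.
final class PageCache {
    private static final class CachedPage {
        final String html;
        final long storedAtMillis;

        CachedPage(String html, long storedAtMillis) {
            this.html = html;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final long ttlMillis;
    private final Map<String, CachedPage> pages;

    PageCache(final int maxEntries, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // Access-ordered map: the least recently used page is evicted first.
        this.pages = new LinkedHashMap<String, CachedPage>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, CachedPage> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached page, or calls loader when it is missing or expired.
    // The lock is held while loading, which keeps the sketch simple but
    // serializes concurrent loads.
    synchronized String get(String url, Function<String, String> loader) {
        long now = System.currentTimeMillis();
        CachedPage page = pages.get(url);
        if (page == null || now - page.storedAtMillis > ttlMillis) {
            page = new CachedPage(loader.apply(url), now);
            pages.put(url, page);
        }
        return page.html;
    }
}
```

The time-to-live matters because the remote site can change; how long is 
reasonable depends on the site being proxied.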

If you find a library that does this, please tell us about it.

There are some libraries that might help with the HTTP requests, so check this one 
out; it's HTTPClient:
http://www.innovation.ch/java/HTTPClient/
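
For comparison, fetching a page with nothing but the JDK's java.net.URLConnection 
could look roughly like the sketch below. This is not HTTPClient's API. The class 
name PageFetcher, the 5-second timeouts and the UTF-8 decoding are assumptions for 
this example; a real proxy would read the charset from the Content-Type header.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class PageFetcher {
    // Downloads the resource at url and returns its body as a string.
    static String fetch(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(5000); // assumed timeout, in milliseconds
        conn.setReadTimeout(5000);    // assumed timeout, in milliseconds
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            // Assumes UTF-8; the server may declare a different charset.
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```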

Hope it helps, 
-reynir

> -----Original Message-----
> From: Jason Novotny [mailto:jdnovotny@;lbl.gov] 
> Sent: 9 November 2002 22:44
> To: Tomcat Users List; Jetspeed Developers List
> Subject: retrieving remote web content
> 
> 
> 
>     Hi,
> 
>     I'm trying to develop a servlet that can act as a proxy for
> another web site. Let's say I'm trying to provide the content of
> www.google.com. It seems I can retrieve and cache the HTML using a
> URLConnection, but what about the resources used by the HTML, like
> GIFs and JPGs? Somehow I need to parse the HTML and get those
> separately? Is there a library out there for doing what I describe?
> Maybe I'm missing something really simple...
> 
> 
>     Thanks, Jason

--
To unsubscribe, e-mail:   <mailto:tomcat-user-unsubscribe@;jakarta.apache.org>
For additional commands, e-mail: <mailto:tomcat-user-help@;jakarta.apache.org>
