[webkit-dev] Save Page - Ideas

2008-10-30 Thread zaheer ahmad
Hi,

I am working on implementing save page functionality. Looks like it's not
already supported in the core. Following are some high-level ideas, and I am
not sure if some or all of these are the right approaches to this problem:

- write the page data to the file system as and when it is received - but this
is not optimal since it incurs constant overhead on page load
- APIs to retrieve the source (html, js, css) and image/object data
(original form) from the document. I think the parsers/loaders incrementally
handle the data and throw away the parsed text - please validate my
understanding here.
- parse and convert all the html absolute/relative URIs to relative URIs on
the file system
- any other optimized storage methods - e.g. storing the entire page as a
single file using multipart content
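
The URI conversion idea in the list above can be sketched roughly as follows. This is a Python illustration only, not WebKit code: the class name `ResourceMap`, the directory name `page_files`, and the collision-suffix scheme are all assumptions made for the example. It resolves each reference against the page URL and hands back a relative path on the file system, reusing the same path when the same resource is referenced twice.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


class ResourceMap:
    """Hypothetical helper: maps resource URLs to relative local paths."""

    def __init__(self, page_url, resource_dir="page_files"):
        self.page_url = page_url
        self.resource_dir = resource_dir  # assumed directory name
        self._by_url = {}                 # absolute URL -> local path
        self._used = set()                # file names already handed out

    def local_path(self, ref):
        # Resolve relative references against the page URL first, so that
        # "../img/a.png" and its absolute form map to the same file.
        absolute = urljoin(self.page_url, ref)
        if absolute in self._by_url:
            return self._by_url[absolute]

        name = posixpath.basename(urlsplit(absolute).path) or "index"
        stem, ext = posixpath.splitext(name)

        # Different URLs can share a file name; add a numeric suffix.
        candidate, n = name, 1
        while candidate in self._used:
            candidate = f"{stem}_{n}{ext}"
            n += 1

        self._used.add(candidate)
        path = posixpath.join(self.resource_dir, candidate)
        self._by_url[absolute] = path
        return path
```

A real implementation would also have to sanitize file names and handle query strings; the sketch ignores both.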

please advise.

thanks,
Zaheer
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


Re: [webkit-dev] Save Page - Ideas

2008-10-30 Thread David Kilzer
On Thu, 10/30/08, zaheer ahmad <[EMAIL PROTECTED]> wrote:

> I am working on implementing save page functionality. Looks
> like it's not
> already supported in the core.

Apple's Mac port saves ".webarchive" files.  The format is specific to the 
CoreFoundation framework, but there is platform-specific code that does this 
nevertheless.

> Following are some high-level
> ideas, and I am
> not sure if some or all of these are the right approaches
> to this problem:
> 
> - write the page data to the file system as and when it is
> received - but this
> is not optimal since it incurs constant overhead on page
> load

Don't do this.

> - APIs to retrieve the source (html, js, css) and
> image/object data
> (original form) from the document. I think the
> parsers/loaders incrementally
> handle the data and throw away the parsed text - please
> validate my
> understanding here.

There should be API to do this already.  Look at how content for .webarchive 
files is retrieved.

> - parse and convert all the html absolute/relative URIs to
> relative URIs on
> the file system

Bug 7211: Support save as "Web page, complete" in Firefox format
https://bugs.webkit.org/show_bug.cgi?id=7211

> - any other optimized storage methods - e.g. storing the
> entire page as a
> single file using multipart content

Bug 7169: Support exporting of MHTML web archives
https://bugs.webkit.org/show_bug.cgi?id=7169

I would strongly encourage you to reuse an existing format rather than 
inventing your own.  (In my opinion the Firefox format is preferred because 
it's readable by all web browsers.)
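
For comparison, the single-file multipart idea (the MHTML format of bug 7169) can be sketched with Python's standard `email` package. This is an illustration under assumptions, not the patch attached to that bug: the function name `build_mhtml` and the shape of the `resources` argument are invented for the example, and no claim is made that every browser will open the result.

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(page_url, html, resources):
    """Pack a page and its subresources into one multipart/related message.

    resources is an iterable of (url, content_type, data_bytes) tuples
    (a hypothetical shape chosen for this sketch).
    """
    root = MIMEMultipart("related", type="text/html")
    root["Subject"] = "Saved page"

    # The root HTML document comes first.
    page = MIMEText(html, "html", "utf-8")
    page["Content-Location"] = page_url
    root.attach(page)

    # Each subresource is stored under its original URL, so references
    # inside the HTML do not need to be rewritten.
    for url, content_type, data in resources:
        maintype, subtype = content_type.split("/", 1)
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)
        part["Content-Location"] = url
        root.attach(part)

    return root.as_bytes()
```

Because every part carries a `Content-Location` header with the original URL, this approach avoids the URI rewriting step at the cost of a format that not all browsers read.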

Dave




Re: [webkit-dev] Save Page - Ideas

2008-10-30 Thread Darin Fisher
On Thu, Oct 30, 2008 at 9:33 AM, David Kilzer <[EMAIL PROTECTED]> wrote:

> On Thu, 10/30/08, zaheer ahmad <[EMAIL PROTECTED]> wrote:
>
> > I am working on implementing save page functionality. Looks
> > like it's not
> > already supported in the core.
>
> Apple's Mac port saves ".webarchive" files.  The format is specific to the
> CoreFoundation framework, but there is platform-specific code that does this
> nevertheless.
>
> > Following are some high-level
> > ideas, and I am
> > not sure if some or all of these are the right approaches
> > to this problem:
> >
> > - write the page data to the file system as and when it is
> > received - but this
> > is not optimal since it incurs constant overhead on page
> > load
>
> Don't do this.
>
> > - APIs to retrieve the source (html, js, css) and
> > image/object data
> > (original form) from the document. I think the
> > parsers/loaders incrementally
> > handle the data and throw away the parsed text - please
> > validate my
> > understanding here.
>
> There should be API to do this already.  Look at how content for
> .webarchive files is retrieved.
>
> > - parse and convert all the html absolute/relative URIs to
> > relative URIs on
> > the file system
>
> Bug 7211: Support save as "Web page, complete" in Firefox format
> https://bugs.webkit.org/show_bug.cgi?id=7211


We have code to support this feature in the Chromium code base.  You can
find it here:
http://src.chromium.org/viewvc/chrome/trunk/src/webkit/glue/dom_serializer.h?view=markup
http://src.chromium.org/viewvc/chrome/trunk/src/webkit/glue/dom_serializer.cc?view=markup

It is something we would love to one day see as part of WebKit.

-Darin



>
> > - any other optimized storage methods - e.g. storing the
> > entire page as a
> > single file using multipart content
>
> Bug 7169: Support exporting of MHTML web archives
> https://bugs.webkit.org/show_bug.cgi?id=7169
>
> I would strongly encourage you to reuse an existing format rather than
> inventing your own.  (In my opinion the Firefox format is preferred because
> it's readable by all web browsers.)
>
> Dave


Re: [webkit-dev] Save Page - Ideas

2008-11-05 Thread zaheer ahmad
Hi Darin, Dave,
thanks for your inputs.

I have a few comments on the patches:

1 - 7211 - the patch seems to serialize only the HTML. What about external
resources? Chrome seems to rely on the cache for these resources, but this
may not work if you want to use save page as a feature to store/view
content offline.

2 - 7168 - this patch uses the archive resource, which seems to cache all
loaded resources, including external references, in memory. Though I haven't
measured the extra memory, it could be an issue for resource-constrained
devices. So storing to the file system may be an option, at the cost of a
slightly slower load time?
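
One possible middle ground between the two options in point 2 is to keep small resources in memory and spill only large ones to the file system. The Python sketch below illustrates that idea with the standard `tempfile.SpooledTemporaryFile`; it is not how the archive resource code works, and the class name `ResourceSpool` and the 64 KB threshold are assumptions for the example.

```python
import tempfile


class ResourceSpool:
    """Hypothetical per-URL buffer that moves to disk past a size limit."""

    def __init__(self, max_in_memory=64 * 1024):
        self._max = max_in_memory  # assumed threshold, in bytes
        self._files = {}           # url -> SpooledTemporaryFile

    def append(self, url, chunk):
        # Called as data arrives; small resources never touch the disk.
        f = self._files.get(url)
        if f is None:
            f = tempfile.SpooledTemporaryFile(max_size=self._max)
            self._files[url] = f
        f.write(chunk)

    def read(self, url):
        # Return everything received so far, then move back to the end
        # so that later append() calls keep extending the data.
        f = self._files[url]
        f.seek(0)
        data = f.read()
        f.seek(0, 2)
        return data

    def close(self):
        for f in self._files.values():
            f.close()
        self._files.clear()
```

Whether the extra disk writes are acceptable on a constrained device would still need to be measured.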

thanks,
Zaheer





Re: [webkit-dev] Save Page - Ideas

2008-11-05 Thread Maciej Stachowiak


On Oct 30, 2008, at 9:47 AM, Darin Fisher wrote:



> We have code to support this feature in the Chromium code base.  You
> can find it here:
>
> http://src.chromium.org/viewvc/chrome/trunk/src/webkit/glue/dom_serializer.h?view=markup
> http://src.chromium.org/viewvc/chrome/trunk/src/webkit/glue/dom_serializer.cc?view=markup
>
> It is something we would love to one day see as part of WebKit.


WebKit already includes code to serialize the DOM, in
WebCore/editing/markup.cpp. This is used by innerHTML, XMLSerializer, the
clipboard code, Web archives, and other things. I think a better approach to
a "save as Web page, complete" style feature would be to use the existing
DOM serialization code (fixing bugs, if necessary), instead of adding
completely separate DOM serialization code. The only tricky part is fixing
up URL references in the markup to point to the right place for saved
subresources.
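
The URL fix-up step can be illustrated in isolation. The Python sketch below is a stand-in, not markup.cpp or dom_serializer.cc: it re-emits markup through the standard `html.parser` module and swaps URL-bearing attributes using a caller-supplied mapping. The names `UrlRewriter`, `rewrite_urls`, and `URL_ATTRS` are invented for the example, and the attribute list is deliberately incomplete.

```python
from html import escape
from html.parser import HTMLParser

# Assumed, incomplete set of (tag, attribute) pairs that carry URLs.
URL_ATTRS = {("img", "src"), ("script", "src"), ("link", "href")}


class UrlRewriter(HTMLParser):
    """Re-emits markup, replacing mapped URLs in URL_ATTRS attributes."""

    def __init__(self, mapping):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.mapping = mapping
        self.out = []

    def _emit_start(self, tag, attrs, closing):
        parts = [tag]
        for name, value in attrs:
            if value is None:          # attribute without a value
                parts.append(name)
                continue
            if (tag, name) in URL_ATTRS:
                value = self.mapping.get(value, value)
            parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + closing)

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def rewrite_urls(markup, mapping):
    parser = UrlRewriter(mapping)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

The mapping keys must match the attribute values exactly as they appear in the markup, so a real implementation would resolve each value against the document's base URL before the lookup, and would also cover URLs inside stylesheets.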


Regards,
Maciej



Re: [webkit-dev] Save Page - Ideas

2008-11-05 Thread Darin Fisher
On Wed, Nov 5, 2008 at 7:04 AM, Maciej Stachowiak <[EMAIL PROTECTED]> wrote:

>
> On Oct 30, 2008, at 9:47 AM, Darin Fisher wrote:
>
>
> We have code to support this feature in the Chromium code base.  You can
> find it here:
>
> http://src.chromium.org/viewvc/chrome/trunk/src/webkit/glue/dom_serializer.h?view=markup
>
> http://src.chromium.org/viewvc/chrome/trunk/src/webkit/glue/dom_serializer.cc?view=markup
>
> It is something we would love to one day see as part of WebKit.
>
>
> WebKit already includes code to serialize the DOM, in
> WebCore/editing/markup.cpp. This is used by innerHTML, XMLSerializer, the
> clipboard code, Web archives, and other things. I think a better approach to
> a "save as Web page, complete" style feature would be to use the existing
> DOM serialization code (fixing bugs, if necessary), instead of adding
> completely separate DOM serialization code. The only tricky part is fixing
> up URL references in the markup to point to the right place for saved
> subresources.
>
> Regards,
> Maciej
>
>

I agree.  I would rather see that happen too.  Our code was created because
we didn't want to fork WebCore, but we are happy to see it die in favor of
enhancements to markup.cpp.  There is more than just fixing up URLs.  We
also need to deal with charset encoding issues, base tags, and MOTW (Mark
of the Web, for Windows).  There may be a few other subtle details to get
right.
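
A rough Python sketch of those three details follows. It is not taken from dom_serializer.cc; the function names are invented, the regular expressions are crude, and it assumes an ASCII page URL and that subresource URLs were already rewritten to local relative paths (which is why base tags are simply dropped here). The Mark of the Web is an HTML comment of the form `<!-- saved from url=(NNNN)URL -->`, where NNNN is the URL length padded to four digits.

```python
import re


def motw_comment(url):
    """Build a Mark of the Web comment for an (assumed ASCII) URL."""
    return f"<!-- saved from url=({len(url):04d}){url} -->"


def prepare_saved_markup(markup, page_url, charset="utf-8"):
    """Hypothetical post-processing of serialized markup before saving."""
    # Drop <base> tags so they cannot redirect already-localized URLs.
    markup = re.sub(r"<base\b[^>]*>", "", markup, flags=re.IGNORECASE)

    # Drop existing charset declarations; the file is re-encoded below.
    markup = re.sub(r"<meta\b[^>]*charset[^>]*>", "", markup,
                    flags=re.IGNORECASE)

    # Keep a leading doctype first; the mark goes right after it.
    doctype = re.match(r"\s*<!doctype[^>]*>\s*", markup, flags=re.IGNORECASE)
    pos = doctype.end() if doctype else 0
    prefix, body = markup[:pos], markup[pos:]

    # Declare the charset that matches the bytes actually written.
    meta = f'<meta charset="{charset}">'
    body, count = re.subn(r"(<head\b[^>]*>)", lambda m: m.group(1) + meta,
                          body, count=1, flags=re.IGNORECASE)
    if count == 0:
        body = meta + body

    return (prefix + motw_comment(page_url) + "\r\n" + body).encode(charset)
```

Doing this with regular expressions on a string is only for illustration; inside a DOM serializer the same steps would operate on the tree.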

-Darin


Re: [webkit-dev] Save Page - Ideas

2008-11-05 Thread Maciej Stachowiak


On Nov 5, 2008, at 10:22 AM, Darin Fisher wrote:

> On Wed, Nov 5, 2008 at 7:04 AM, Maciej Stachowiak <[EMAIL PROTECTED]> wrote:
>
>> On Oct 30, 2008, at 9:47 AM, Darin Fisher wrote:
>>
>>> We have code to support this feature in the Chromium code base.
>>> You can find it here:
>>>
>>> http://src.chromium.org/viewvc/chrome/trunk/src/webkit/glue/dom_serializer.h?view=markup
>>> http://src.chromium.org/viewvc/chrome/trunk/src/webkit/glue/dom_serializer.cc?view=markup
>>>
>>> It is something we would love to one day see as part of WebKit.
>>
>> WebKit already includes code to serialize the DOM, in
>> WebCore/editing/markup.cpp. This is used by innerHTML, XMLSerializer, the
>> clipboard code, Web archives, and other things. I think a better
>> approach to a "save as Web page, complete" style feature would be to
>> use the existing DOM serialization code (fixing bugs, if necessary),
>> instead of adding completely separate DOM serialization code. The
>> only tricky part is fixing up URL references in the markup to point
>> to the right place for saved subresources.
>>
>> Regards,
>> Maciej
>
> I agree.  I would rather see that happen too.  Our code was created
> because we didn't want to fork WebCore, but we are happy to see it
> die in favor of enhancements to markup.cpp.  There is more than
> just fixing up URLs.  We also need to deal with charset encoding
> issues, base tags, and motw (for windows).  There may be a few other
> subtle details to get right.


Your code also has some obvious bugs that are not in the WebCore code.  
For instance it serializes the following incorrectly:



bold line

Regards,
Maciej



Re: [webkit-dev] Save Page - Ideas

2008-11-05 Thread Darin Fisher
On Wed, Nov 5, 2008 at 11:06 AM, Maciej Stachowiak <[EMAIL PROTECTED]> wrote:

>
> On Nov 5, 2008, at 10:22 AM, Darin Fisher wrote:
>
> On Wed, Nov 5, 2008 at 7:04 AM, Maciej Stachowiak <[EMAIL PROTECTED]> wrote:
>
>>
>> On Oct 30, 2008, at 9:47 AM, Darin Fisher wrote:
>>
>>
>> We have code to support this feature in the Chromium code base.  You can
>> find it here:
>>
>> http://src.chromium.org/viewvc/chrome/trunk/src/webkit/glue/dom_serializer.h?view=markup
>>
>> http://src.chromium.org/viewvc/chrome/trunk/src/webkit/glue/dom_serializer.cc?view=markup
>>
>> It is something we would love to one day see as part of WebKit.
>>
>>
>> WebKit already includes code to serialize the DOM, in
>> WebCore/editing/markup.cpp. This is used by innerHTML, XMLSerializer, the
>> clipboard code, Web archives, and other things. I think a better approach to
>> a "save as Web page, complete" style feature would be to use the existing
>> DOM serialization code (fixing bugs, if necessary), instead of adding
>> completely separate DOM serialization code. The only tricky part is fixing
>> up URL references in the markup to point to the right place for saved
>> subresources.
>>
>> Regards,
>> Maciej
>>
>>
>
> I agree.  I would rather see that happen too.  Our code was created because
> we didn't want to fork WebCore, but we are happy to see it die in favor of
> enhancements to markup.cpp.  There is more than just fixing up URLs.  We
> also need to deal with charset encoding issues, base tags, and motw (for
> windows).  There may be a few other subtle details to get right.
>
>
> Your code also has some obvious bugs that are not in the WebCore code. For
> instance it serializes the following incorrectly:
>
> 
> bold line
>
> Regards,
> Maciej
>
>

Thanks for pointing that out.

-Darin