Sure, I can push a caja.jar that splits html-sanitizer and the
JavaScript it depends on out of domita-minified.  I'm adopting the
following names:

* domita-minified.js (domita+caja without html sanitizer)
* html-sanitizer-minified.js (html4-defs + css-defs + html-sanitizer)

With this, I think there's minimal breakage if we bump the version of
caja in shindig/pom.xml in the same change that adds
html-sanitizer-minified to shindig/features/core/feature.txt.

Some features of html-sanitizer to be aware of: it expects and
outputs a balanced set of tags, so it will ignore extraneous close tags
and insert closing tags where necessary.  I can't find any documentation
on what sanitizeHtml is supposed to output other than that it is safe
to assign to innerHTML.  If the behaviour of html-sanitizer is
acceptable, it should probably be added to the documentation
somewhere.
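
To make the balancing behaviour concrete, here is a rough sketch of what
"drop extraneous close tags, insert missing ones" could look like.  This is
not the html-sanitizer code: balanceTags and VOID_TAGS are hypothetical names
made up for illustration, the tag matching is deliberately simplistic, and it
does no sanitizing at all, so the real output may differ in detail.

```javascript
// Hypothetical helper, NOT part of Caja: balances simple tags in a string.
// Elements that never take a close tag (a small, incomplete sample).
var VOID_TAGS = { br: true, hr: true, img: true, input: true };

function balanceTags(html) {
  var open = [];   // stack of tag names that are currently open
  var out = '';
  var last = 0;    // index just past the previous tag
  var tagRe = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/g;
  var m;
  while ((m = tagRe.exec(html)) !== null) {
    out += html.substring(last, m.index);  // copy the text before this tag
    last = tagRe.lastIndex;
    var name = m[2].toLowerCase();
    if (m[1] === '') {
      // Open tag: emit it and remember it unless it is a void element.
      out += m[0];
      if (!VOID_TAGS.hasOwnProperty(name)) {
        open.push(name);
      }
    } else {
      // Close tag: drop it if nothing matching is open.
      var idx = open.lastIndexOf(name);
      if (idx === -1) {
        continue;
      }
      // Otherwise close everything still open inside it, then the tag itself.
      while (open.length > idx) {
        out += '</' + open.pop() + '>';
      }
    }
  }
  out += html.substring(last);
  // Insert close tags for anything left open at the end of the input.
  while (open.length > 0) {
    out += '</' + open.pop() + '>';
  }
  return out;
}
```

With this sketch, an unmatched close tag disappears and an unclosed element
gets its close tag appended, which is the property callers of sanitizeHtml
would be relying on.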

All URLs are passed to the url_policy function, which is an excellent
place to check for javascript: URLs.
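
As a sketch of the kind of check that could live there (an illustration
only): the function below whitelists schemes rather than blacklisting
javascript:, in keeping with the whitelist approach discussed in the quoted
thread.  The names urlPolicy and ALLOWED_SCHEMES and the particular scheme
list are my assumptions, as is the convention of returning null to reject a
URL; the scheme pattern is the first part of the RFC 3986 Appendix B regular
expression.  It also assumes the URL has already been entity-decoded by the
time the policy sees it.

```javascript
// Hypothetical policy function; the name, the scheme list, and the
// "return null to reject" convention are assumptions for illustration.
var ALLOWED_SCHEMES = { http: true, https: true, mailto: true };

function urlPolicy(url) {
  // Scheme part of the RFC 3986 Appendix B pattern: everything before the
  // first ':' provided no '/', '?' or '#' comes earlier.  Always matches.
  var match = /^(?:([^:\/?#]+):)?/.exec(String(url));
  var scheme = match[1];
  if (!scheme) {
    return url;  // no scheme, so this is a relative reference
  }
  // Whitelist check.  Odd spellings such as "JaVaScRiPt" or "java\tscript"
  // are rejected because they are not exactly one of the allowed schemes.
  return ALLOWED_SCHEMES.hasOwnProperty(scheme.toLowerCase()) ? url : null;
}
```

A whitelist keeps the check short: anything that is not plainly http, https
or mailto is rejected without having to enumerate the dangerous schemes.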

On Wed, Aug 13, 2008 at 4:18 PM, Brian Eaton <[EMAIL PROTECTED]> wrote:
> [+jasvir]
>
> Jas, can you have a look at the patch attached to SHINDIG-346?
> https://issues.apache.org/jira/browse/SHINDIG-346
>
> The problem we have is that there is no way to pull in just
> html_sanitize from Caja, if we do then we get the Caja object
> annotation that is Not Cool for gadgets that aren't ready for it.
>
> Can we modify the caja jar so that it exports something like
> html-sanitize-minified.js separately from domita-minified.js?
> domita-minified.js would need to assume that somebody else had already
> brought in html_sanitize.
>
> I think this is how it would work:
>
> features/core/feature.txt:
>
> <gadget>
> ...
>  <script src="res://com/google/caja/.../html-sanitize-minified.js">
>  <script src="util.js">
> ....
>
>
> util.js:
>  gadgets.util.sanitizeHtml = function() {
>      use html_sanitize from html-sanitize-minified.js
>
>
> features/caja/feature.xml
>  <script src="res://com/google/caja/.../domita-minified-without-html-sanitize.js">
>
>
> Because the core gadgets APIs are present in every gadget, you're
> guaranteed that html_sanitize will be present when domita is brought
> in.
>
> Other Caja consumers would need a similar thing, pulling in
> html_sanitize from Caja first, then domita-without-html_sanitize next.
>
> Cheers,
> Brian
>
> On Wed, Aug 13, 2008 at 12:22 AM, Reema Sardana <[EMAIL PROTECTED]> wrote:
>> Thanks for the reference. I took a look at his implementation. Has been
>> implemented very neatly. I guess I can steal most of his implementation
>> then.
>>>
>>>
>>> The other function is for validating URLs.  He suggested that we
>>> implement that by using the regular expression from RFC 3986 Appendix
>>> B to parse the URLs, doing whatever checks we need, and then
>>> reassembling them with encodeURIComponent.
>>
>>
>> Pardon for my ignorance here. The purpose of html sanitizer is to return
>> something that can be safely assigned to innerHTML. Why do we need to
>> validate URL's?  Do we bother if a URL is not valid? In other words, can it
>> be unsafe in any ways?
>>
>> - Reema
>>
>>
>>> On Fri, Aug 8, 2008 at 12:23 PM, Ropu <[EMAIL PROTECTED]> wrote:
>>> > nor
>>> >
>>> > <iframe src="javascript:..." />
>>> >
>>> > On Fri, Aug 8, 2008 at 6:08 PM, Brian Eaton <[EMAIL PROTECTED]> wrote:
>>> >
>>> >> Hi Reema -
>>> >>
>>> >> Thanks for looking at this.  You can probably build your
>>> >> implementation on top of the html_sanitize function in
>>> >> features/caja/html-sanitizer.js.
>>> >>
>>> >> Questions answered inline:
>>> >>
>>> >> On Thu, Aug 7, 2008 at 11:58 AM, Reema Sardana <[EMAIL PROTECTED]> wrote:
>>> >> > The reference at
>>> >> > http://opensocial-resources.googlecode.com/svn/spec/0.8/gadgets/util.js
>>> >> > does not give any details on how the HTML is to be sanitized. Whether it
>>> >> > should use a blacklist or a whitelist depends on how much flexibility we
>>> >> > want to give to the gadget.
>>> >>
>>> >> Whitelist, definitely a whitelist.
>>> >>
>>> >> > I was looking at implementing this but I am not sure if I am
>>> >> > considering everything that needs to be taken care of.
>>> >> >
>>> >> > 1. Strip all script tags of the form <script
>>> >>
>>> >> Yes.
>>> >>
>>> >> > 2. Strip tags of the form <a onclick="javascript:alert('foo')">bar</a>
>>> >>
>>> >> Yes.
>>> >>
>>> >> > 3. Applets ?
>>> >>
>>> >> Not allowed, likewise no flash/activex/anything similar.
>>> >>
>>> >> > 4. <div style="width: expression(alert(1))">hello</div>
>>> >>
>>> >> Also not allowed.
>>> >>
>>> >> Another case to be sure to block: <a href='javascript:something()'>
>>> >>
>>> >> Cheers,
>>> >> Brian
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > .-. --- .--. ..-
>>> > R  o  p  u
>>> >
>>>
>>
>
