Sure, I can push a caja.jar that splits the html-sanitizer JavaScript and its dependencies out of domita-minified. I'm adopting the following names:
* domita-minified.js (domita+caja without html sanitizer)
* html-sanitizer-minified.js (html4-defs + css-defs + html-sanitizer)

With this, I think there will be minimal breakage if we bump the version of caja in shindig/pom.xml in the same change that adds html-sanitizer-minified to shindig/features/core/feature.txt.

Some features of html-sanitizer to be aware of: it expects and outputs a balanced set of tags, so it will ignore extraneous close tags and insert closing tags where necessary. I can't find any documentation on what sanitizeHtml is supposed to output, other than that the result is safe to assign to innerHTML. If the behaviour of html-sanitizer is acceptable, it should probably be added to the documentation somewhere.

All URLs are passed to the url_policy function; that's an excellent place to check for javascript: URLs.

On Wed, Aug 13, 2008 at 4:18 PM, Brian Eaton <[EMAIL PROTECTED]> wrote:
> [+jasvir]
>
> Jas, can you have a look at the patch attached to SHINDIG-346?
> https://issues.apache.org/jira/browse/SHINDIG-346
>
> The problem we have is that there is no way to pull in just
> html_sanitize from Caja, if we do then we get the Caja object
> annotation that is Not Cool for gadgets that aren't ready for it.
>
> Can we modify the caja jar so that it exports something like
> html-sanitize-minified.js separately from domita-minified.js?
> domita-minified.js would need to assume that somebody else had already
> brought in html_sanitize.
>
> I think this is how it would work:
>
> features/core/feature.txt:
>
> <gadget>
> ...
> <script src="res://com/google/caja/.../html-sanitize-minified.js">
> <script src="util.js">
> ....
>
>
> util.js:
> gadgets.util.sanitizeHtml = function() {
> use html_sanitize from html-sanitize-minified.js
>
>
> features/caja/feature.xml
> <script
> src="res://com/google/caja/.../domita-minified-without-html-sanitize.js">
>
>
> Because the core gadgets APIs are present in every gadget, you're
> guaranteed that html_sanitize will be present when domita is brought
> in.
>
> Other Caja consumers would need a similar thing, pulling in
> html_sanitize from Caja first, then domita-without-html_sanitize next.
>
> Cheers,
> Brian
>
> On Wed, Aug 13, 2008 at 12:22 AM, Reema Sardana <[EMAIL PROTECTED]> wrote:
>> Thanks for the reference. I took a look at his implementation. Has been
>> implemented very neatly. I guess I can steal most of his implementation
>> then.
>>>
>>>
>>> The other function is for validating URLs. He suggested that we
>>> implement that by using the regular expression from RFC 3986 Appendix
>>> B to parse the URLs, doing whatever checks we need, and then
>>> reassembling them with encodeURIComponent.
>>
>>
>> Pardon for my ignorance here. The purpose of html sanitizer is to return
>> something that can be safely assigned to innerHTML. Why do we need to
>> validate URL's? Do we bother if a URL is not valid? In other words, can it
>> be unsafe in any ways?
>>
>> - Reema
>>
>>
>>> On Fri, Aug 8, 2008 at 12:23 PM, Ropu <[EMAIL PROTECTED]> wrote:
>>> > nor
>>> >
>>> > <iframe src="javascript:..." />
>>> >
>>> > On Fri, Aug 8, 2008 at 6:08 PM, Brian Eaton <[EMAIL PROTECTED]> wrote:
>>> >
>>> >> Hi Reema -
>>> >>
>>> >> Thanks for looking at this. You can probably build your
>>> >> implementation on top of the html_sanitize function in
>>> >> features/caja/html-sanitizer.js.
>>> >>
>>> >> Questions answered inline:
>>> >>
>>> >> On Thu, Aug 7, 2008 at 11:58 AM, Reema Sardana <[EMAIL PROTECTED]> wrote:
>>> >> > The reference at
>>> >> > http://opensocial-resources.googlecode.com/svn/spec/0.8/gadgets/util.js
>>> >> > does not give any details on how the HTML is to be sanitized. Whether it
>>> >> > should use a blacklist or a whitelist depends on how much flexibility we
>>> >> > want to give to the gadget.
>>> >>
>>> >> Whitelist, definitely a whitelist.
>>> >>
>>> >> > I was looking at implementing this but I am not sure If I am considering
>>> >> > everything that needs to be taken care of.
>>> >> >
>>> >> > 1. Strip all script tags of the form <script
>>> >>
>>> >> Yes.
>>> >>
>>> >> > 2. Strip tags of the form <a onclick="javascript:alert('foo')">bar</a>
>>> >>
>>> >> Yes.
>>> >>
>>> >> > 3. Applets ?
>>> >>
>>> >> Not allowed, likewise no flash/activex/anything similar.
>>> >>
>>> >> > 4. <div style="width: expression(alert(1))">hello</div>
>>> >>
>>> >> Also not allowed.
>>> >>
>>> >> Another case to be sure to block: <a href='javascript:something()'>
>>> >>
>>> >> Cheers,
>>> >> Brian
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > .-. --- .--. ..-
>>> > R o p u
>>> >
>>>
>>
>
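
For reference, the URL check discussed in this thread (parse with the regular expression from RFC 3986 Appendix B, then whitelist the scheme) might look roughly like the sketch below. This is only an illustration, not Caja's or Shindig's actual code: the names parseUri, urlPolicy and ALLOWED_SCHEMES are hypothetical, the scheme list is an arbitrary example, and the callback shape (take a URL string, return the URL to keep it or null to drop it) is an assumption about what a url_policy function looks like. It also skips the reassembly step with encodeURIComponent mentioned above.

```javascript
// RFC 3986, Appendix B: splits a URI reference into its components.
// Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
var URI_RE = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical whitelist of schemes; adjust to what the container allows.
var ALLOWED_SCHEMES = { http: true, https: true, mailto: true };

// Hypothetical helper: returns the components as an object.
// Components that are absent from the input come back as undefined.
function parseUri(url) {
  var m = URI_RE.exec(url);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9]
  };
}

// Assumed policy shape: return the URL to keep it, or null to reject it.
function urlPolicy(url) {
  // Drop leading and trailing whitespace before looking at the scheme.
  var trimmed = String(url).replace(/^\s+|\s+$/g, '');

  // Reject embedded control characters (tab, newline, NUL, ...), which
  // some browsers skip over when reading the scheme.
  if (/[\u0000-\u001f\u007f]/.test(trimmed)) {
    return null;
  }

  var parts = parseUri(trimmed);

  // No scheme means a relative reference; it inherits the page's scheme.
  if (parts.scheme === undefined) {
    return trimmed;
  }

  // Whitelist check, case-insensitive, so "JavaScript:" is rejected too.
  var scheme = parts.scheme.toLowerCase();
  return Object.prototype.hasOwnProperty.call(ALLOWED_SCHEMES, scheme)
      ? trimmed
      : null;
}
```

With a whitelist like this, javascript: URLs are rejected because the scheme is not on the list, rather than because of any javascript-specific pattern, which matches the "whitelist, definitely a whitelist" advice earlier in the thread.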

