I have tracked down the source of the U+0000 characters in the Stripes code
base. They are inserted into the output generated by the LayoutWriter
class whenever the "silent" status of the writer is changed. I am not quite
sure what that status affects, but when toggling the silent status, the
setSilent(boolean, PageContext) method includes the following statement:
    pageContext.getOut().write(TOGGLE);
The value of TOGGLE is 0, which the writer inserts into the UTF-8 document
as \u0000.
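As a quick sanity check that this is the same 00 byte seen in a hex editor, the standalone Java sketch below encodes a string containing a zero character as UTF-8 and prints the bytes. It is not Stripes code. The class name ToggleBytes is made up, and the sketch only assumes, as described above, that TOGGLE is the char value 0.

```java
import java.nio.charset.StandardCharsets;

public class ToggleBytes {
    // Assumption: mirrors the value described above (TOGGLE == 0)
    static final char TOGGLE = 0;

    // Encode a string as UTF-8 and return the raw bytes
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = encode("<p>" + TOGGLE + "</p>");
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x ", b));
        }
        // prints: 3c 70 3e 00 3c 2f 70 3e
        System.out.println(hex.toString().trim());
    }
}
```

Because UTF-8 encodes U+0000 as the single byte 00, a TOGGLE that reaches the page shows up exactly as reported.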
The documentation indicates that TOGGLE is the control character that, when
encountered in the output stream, toggles the silent state.
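Based only on that documented description, the mechanism can be sketched as follows. SilentToggleWriter is a hypothetical class written for illustration; it is not the actual Stripes LayoutWriter, whose real behaviour may differ. Each TOGGLE flips the silent state and is swallowed, and characters written while silent are discarded. The second half of main shows what happens when the same text reaches a plain writer that does not interpret TOGGLE: the NUL characters pass straight through.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

/**
 * Hypothetical sketch (not Stripes' LayoutWriter): a Writer that treats
 * TOGGLE as an in-band control character. Each TOGGLE flips the silent
 * state and is never forwarded; text written while silent is dropped.
 */
public class SilentToggleWriter extends Writer {
    public static final char TOGGLE = 0;

    private final Writer out;
    private boolean silent = false;

    public SilentToggleWriter(Writer out) {
        this.out = out;
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            char c = cbuf[i];
            if (c == TOGGLE) {
                silent = !silent;   // consume the marker and flip the state
            } else if (!silent) {
                out.write(c);       // forward only while not silent
            }
        }
    }

    @Override
    public void flush() throws IOException { out.flush(); }

    @Override
    public void close() throws IOException { out.close(); }

    public static void main(String[] args) throws IOException {
        String text = "keep" + TOGGLE + "drop" + TOGGLE + "keep";

        // A writer that understands TOGGLE strips it and the silenced text
        StringWriter interpreted = new StringWriter();
        Writer w = new SilentToggleWriter(interpreted);
        w.write(text);
        w.flush();
        System.out.println(interpreted);                       // keepkeep

        // A plain writer passes the NUL characters straight through
        StringWriter plain = new StringWriter();
        plain.write(text);
        System.out.println(plain.toString().indexOf(TOGGLE));  // 4
    }
}
```

Under this reading, the stray characters would be TOGGLE markers that reached the response without passing through a writer that interprets them. That is an inference from the documentation, not something confirmed from the Stripes source.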
If I comment out that statement, the problematic \u0000 characters no longer
appear in my HTML output, and software such as the Facebook crawler is then
able to understand my page content. I could change TOGGLE to some other
character, because U+0000 is a problem in HTML5 pages
(http://html5.validator.nu/ flags it as an error), but I am not sure which
character could be used that would never occur in a page for other reasons
and so signal a "toggle" where none was intended.
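The concern about choosing a replacement marker can be made concrete with a small sketch. The helper applyToggles below is hypothetical; it simply applies the toggle-on-marker rule described above to a string. With a printable marker such as '|', ordinary page content that happens to contain that character is silently mangled.

```java
public class MarkerCollision {
    // Hypothetical helper: drop each marker and everything between pairs of markers
    static String applyToggles(String text, char marker) {
        StringBuilder out = new StringBuilder();
        boolean silent = false;
        for (char c : text.toCharArray()) {
            if (c == marker) {
                silent = !silent;
            } else if (!silent) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Page content that legitimately contains '|' twice
        String page = "<td>a|b|c</td>";
        // With '|' as the marker, the "b" between the bars is lost
        System.out.println(applyToggles(page, '|'));       // <td>ac</td>
        // With NUL as the marker, this content is left untouched
        System.out.println(applyToggles(page, (char) 0));  // <td>a|b|c</td>
    }
}
```

This is presumably why a control character such as NUL was chosen, since it should never occur in real page content; the trade-off is that any TOGGLE that escapes to the client is reported as an error by HTML5 validators.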
Removing this character insertion does not appear to cause any other
untoward side effects on my site, but I am wary of making this kind of
change when I still do not really understand what the silent and
non-silent states imply.
Some guidance on what the TOGGLE character is doing for me would be really
helpful at this stage.
Thanks for any thoughts on this issue.
Geoff Shuetrim
Background information:
I have been doing my testing using the source code for Stripes release
1.3.7 and Freemarker 2.3.19 on a Tomcat 6 server. This combination of
Stripes 1.3.7 and Freemarker 2.3.19 also seems to raise another problem
with dynamic attributes on the Stripes layout-render tag, but I will deal
with that in a separate email.
On 29 May 2012 12:38, Geoff Shuetrim <ge...@galexy.net> wrote:
> I have only recently noticed that when I use Stripes layout tags, I am
> getting unexpected U+0000 characters added to the pages where the stripes
> layout tags were. In a hex editor, these characters are 00.
>
> You can see these characters (or rather, you need an editor like Emacs to
> see them at all) if you look at the page source for
> http://www.gaiaguide.info/do/Hierarchy
>
> That page is running on Stripes 1.5.6 but in a test environment I can
> replicate the generation of these characters with the latest versions of
> Stripes and Freemarker.
>
> These characters are causing me problems with page scrapers like the one
> used by Facebook, which encounter the characters and then decide that they
> cannot see the page (and so links shared from the site get no useful
> information added to them by Facebook).
>
> You can see this by feeding the example URL to:
>
> http://developers.facebook.com/tools/debug
>
> and to:
>
> http://html5.validator.nu/
>
> This HTML5 validator picks out the U+0000 characters clearly.
>
> A working page on the same server is at http://www.gaiaguide.info/Test.jsp
>
> Any pointers to the parts of the Stripes source where I can start to look
> for how these are being generated would be much appreciated.
>
> Thanks
>
> Geoff Shuetrim
>
>
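For checking other pages without Emacs or a hex editor, a standalone Java sketch such as the following lists the byte offsets of every 00 byte in a saved copy of a page's source. The class name FindNuls is made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class FindNuls {
    // Return the offset of every 0x00 byte in the given data
    static List<Integer> nulOffsets(byte[] data) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 0) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FindNuls saved-page.html
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        List<Integer> offsets = nulOffsets(data);
        System.out.println(offsets.size() + " NUL byte(s) at offsets " + offsets);
    }
}
```

Scanning raw bytes is sufficient for UTF-8 pages, because no multi-byte UTF-8 sequence contains a 00 byte; any 00 found is a U+0000 character.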
_______________________________________________
Stripes-users mailing list
Stripes-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/stripes-users