The spec preview had a section on UTF-8 decoding and the handling of invalid byte sequences: http://dev.w3.org/html5/spec-preview/infrastructure.html#utf-8 . However, I have noticed that this section has been removed from the current version. So what algorithm is used for handling invalid UTF-8 byte sequences? Or is this no longer part of the HTML 5 specification?
My testing in Firefox and Chrome seems to indicate that they replace the first byte of an invalid sequence with the replacement character <http://en.wikipedia.org/wiki/Replacement_character> "�" (U+FFFD) and then continue parsing at the next byte.
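
To make the behaviour I am describing concrete, here is a minimal Python sketch of it. This is only my reading of what the browsers appear to do, not an algorithm taken from any specification. The function name `decode_replacing_first_byte` is made up for illustration.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_replacing_first_byte(data: bytes) -> str:
    """Decode UTF-8; on an invalid sequence, emit U+FFFD for its first
    byte only and resume decoding at the very next byte."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        # Expected sequence length, judged from the lead byte alone.
        if b < 0x80:
            length = 1
        elif 0xC2 <= b <= 0xDF:
            length = 2
        elif 0xE0 <= b <= 0xEF:
            length = 3
        elif 0xF0 <= b <= 0xF4:
            length = 4
        else:
            length = 0  # stray continuation byte or byte that can never lead

        chunk = data[i:i + length]
        decoded = None
        if length and len(chunk) == length:
            try:
                # Strict decoding also rejects overlong forms and surrogates.
                decoded = chunk.decode("utf-8")
            except UnicodeDecodeError:
                decoded = None

        if decoded is None:
            out.append(REPLACEMENT)
            i += 1  # skip only the first byte of the bad sequence
        else:
            out.append(decoded)
            i += length
    return "".join(out)


if __name__ == "__main__":
    # 0xE2 0x82 is a truncated three-byte sequence followed by ASCII "b".
    print(ascii(decode_replacing_first_byte(b"a\xe2\x82b")))
```

Under this reading, every byte of a broken sequence that cannot start a valid sequence on its own ends up as its own U+FFFD, so the truncated sequence in the example produces two replacement characters rather than one.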