On Tue, Nov 10, 2020 at 5:50 PM Gergo Tisza <gti...@wikimedia.org> wrote:

> On Tue, Nov 3, 2020 at 1:59 AM Daniel Kinzler <dkinz...@wikimedia.org>
> wrote:
>
>> TemplateData already uses JSON serialization, but then compresses the
>> JSON output, to make the data fit into the page_props table. This results
>> in binary data in ParserOutput, which we can't directly put into JSON.
>
>
> I'm not sure I understand the problem. Binary data can be trivially
> represented as JSON, by treating it as a string. Is it an issue of storage
> size? JSON escaping of the control characters is (assuming binary data with
> a somewhat random distribution of bytes) an ~50% size increase, UTF-8
> encoding the top half of bytes is another 50%, so it will approximately
> double the length - certainly worse than the ~33% increase for base64, but
> not tragic. (And if size increase matters that much, you probably shouldn't
> be using base64 either.)
>

The binary aspect here refers to the gzip output buffer. While PHP
represents that buffer as a string, the string is not valid UTF-8, so it
cannot be encoded as JSON as-is. Attempting to do so makes json_encode()
fail with a JSON error and return boolean false.

Condensed example: https://3v4l.org/cJttU
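
For readers without access to the link, here is a minimal sketch of the
failure described above. It is not a copy of the linked snippet: the array
contents and the 'templatedata' key are made-up placeholders, and the base64
step is only one possible workaround, not necessarily what TemplateData or
ParserOutput ends up doing. It assumes PHP with the zlib extension loaded.

```php
<?php
// Hypothetical stand-in for a TemplateData JSON blob (not real data).
$json = json_encode([
    'description' => 'Example template',
    'params' => ['1' => ['type' => 'string']],
]);

// gzencode() returns a PHP string of raw bytes. The gzip header starts
// with 0x1f 0x8b, and a lone 0x8b byte is never valid UTF-8.
$compressed = gzencode($json);

// json_encode() only accepts valid UTF-8 strings, so with default flags
// it returns false here and records JSON_ERROR_UTF8.
$direct = json_encode(['templatedata' => $compressed]);
$errorCode = json_last_error();

// One possible workaround: make the bytes ASCII-safe before embedding.
$wrapped = json_encode(['templatedata' => base64_encode($compressed)]);

// Round trip: decode JSON, undo base64, decompress.
$decoded = json_decode($wrapped, true);
$roundTrip = gzdecode(base64_decode($decoded['templatedata']));

var_dump($direct);                          // the failed direct attempt
var_dump($errorCode === JSON_ERROR_UTF8);   // why it failed
var_dump($roundTrip === $json);             // base64 variant survives
```

The direct attempt fails because of the invalid UTF-8 in the compressed
bytes, while the base64-wrapped variant round-trips at the cost of the size
increase mentioned earlier in the thread.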
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
