Zibi,

I'm trying to parse and then serialize back the following entity with your
parser and serializer:

<nSpinnerSeconds[@cldr.plural(n)] {
  zero: "zero seconds",
  one: "one second",
  two: "{{ n }} seconds",
  few: "{{ n }} seconds",
  many: "{{ n }} seconds",
  other: "{{ n }} seconds"
}>

The parser gives me this:

{
    '$v': {
        'many': [{
            't': 'id',
            'v': 'n'
        }, ' seconds'],
        'two': [{
            't': 'id',
            'v': 'n'
        }, ' seconds'],
        'one': 'one second',
        'few': [{
            't': 'id',
            'v': 'n'
        }, ' seconds'],
        'zero': 'zero seconds',
        'other': [{
            't': 'id',
            'v': 'n'
        }, ' seconds']
    },
    '$x': [{
        'a': [{
            't': 'id',
            'v': 'n'
        }],
        't': 'call',
        'v': {
            't': 'glob',
            'v': 'cldr.plural'
        }
    }],
    '$i': 'nSpinnerSeconds'
}

Is this expected? Instead, I was hoping for something more l10n-tool
friendly:

{
    "$v": {
        "zero": "zero seconds",
        "one": "one second",
        "two": "{{ n }} seconds",
        "few": "{{ n }} seconds",
        "many": "{{ n }} seconds",
        "other": "{{ n }} seconds"
    },
    "$i": "nSpinnerSeconds"
}
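
To make the request concrete, here is a rough sketch of the conversion I have in mind, written against the AST shape shown above (string values are either a plain string or a list mixing strings and {'t': 'id', 'v': ...} nodes). The names flatten_value and flatten_entity are hypothetical and not part of either parser, and any node type other than 'id' is simply rejected:

```python
def flatten_value(value):
    """Turn one parsed string value back into L20n-like source text."""
    if isinstance(value, str):
        # Simple strings are already in the form a tool wants.
        return value
    parts = []
    for node in value:
        if isinstance(node, str):
            parts.append(node)
        elif node.get('t') == 'id':
            # Re-create the placeable the parser expanded into a node.
            parts.append('{{ ' + node['v'] + ' }}')
        else:
            # Assumption: only identifier placeables are handled here.
            raise ValueError('unsupported node type: %r' % node.get('t'))
    return ''.join(parts)


def flatten_entity(entity):
    """Return a tool-friendly view of an entity: id plus flat strings."""
    value = entity['$v']
    if isinstance(value, dict):
        flat = {key: flatten_value(item) for key, item in value.items()}
    else:
        flat = flatten_value(value)
    # The '$x' index is deliberately dropped, so this is a read-only view
    # and cannot be serialized back to the original entity.
    return {'$i': entity['$i'], '$v': flat}
```

This only covers the cases in my example; because it throws away the index, it could not replace the real AST for round-tripping.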

-Matjaž

On Fri, May 1, 2015 at 2:35 AM, Zibi Braniecki <[email protected]> wrote:

> Next update!
>
> I got both python [0] and js [1] serializers to work! I can't say they are
> complete, and I don't have tests yet, but from my hand testing they seem
> usable.
>
> I also added ./tools/serialize.js|py to both repositories.
>
> So now I have:
>  - two parsers that produce the same JSON AST
>  - serializers that can take that AST and reproduce L20n
>
> Which means that we should be able to freely interact between js and
> python and also read/write L20n for tools purposes.
> Axel, I also removed unescape dependency from JS Parser, so you should be
> able to use it in Aisle.
>
> Working on that brought up three topics that I have so far left unresolved:
>
> 1) Source notation. Currently both parsers don't store any information on
> syntax nodes positioning in the source. I believe it would be worth
> figuring out how we want to handle that. First idea that comes to mind is
> that we could just add a key-value pair on the node object, like
> 'source': {'start': 49, 'end': 102, 'string': '...'}, for an editor to use.
>
> 2) String notations. When a string is used it may be surrounded by ", ' or
> (in the future) """ or '''. Once we parse it, we don't store this
> information so on serialization we cannot reuse it.
>
> We could guess (for example: multiline strings use triple quotes;
> single-line strings use " unless the string contains " and no ', in which
> case they use '), but we could also somehow store it with the string.
>
> 3) Unescaping.
>
> Right now we do something very naive: we unescape unicode and remove the
> backslash in front of any other character, treating the following char as
> non-semantic.
>
> It works well enough, you can do: <foo "hey \" ho"> or <foo "hey \{{ var
> }} ho"> and it will all be stored as a simple string.
>
> But with serialization, problems arise.
>
> First, unicode \uXXXX will be turned into a unicode char by parser so the
> serializer will have no way to figure out what form of unicode has been
> used and will serialize it as a unicode char.
>
> Second, sometimes there is no way to know which escape form was used.
> For example:
>
> <foo "hey \{{ var }}"> and <foo "hey {\{ var }}"> will produce the same
> AST. During serialization we can tell that, since the AST node is a
> simple string "hey {{ var }}" and not a complex string, we should escape
> the {{ to remove its syntactic meaning, but we have no way to know which
> char was originally escaped.
>
> Third, all other chars are simply unescaped, so <foo "hey \n"> will be
> turned into "hey n" and <foo "hey \l"> will be turned into "hey l".
>
> That means that when serializing we will just write it back without a
> backslash.
>
> We can limit the backslash use, and raise errors in parser if \ precedes
> an unknown char, and then have rules in the serializer, to backslash a
> backslash, backslash {{ and backslash string closing mark, but for chars
> like "\n" we will hit the same problem as with unicode:
>
> <foo "hey
>  ho"> and <foo "hey \n ho"> will produce the same AST. What should we
> serialize it into?
>
> Would love to get your feedback!
> zb.
>
> [0]
> https://github.com/l20n/python-l20n/blob/master/lib/l20n/format/serializer.py
> [1]
> https://github.com/zbraniecki/l20n.js/blob/v3-features/src/lib/format/l20n/serializer.js
> _______________________________________________
> tools-l10n mailing list
> [email protected]
> https://lists.mozilla.org/listinfo/tools-l10n
>