Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-08-01 Thread Pavel Tupitsyn
> Configuration)
> -- per-field level (BinaryTypeConfiguration)
>
> 2017-07-28 14:15 GMT+03:00 Vladimir Ozerov [via Apache Ignite Developers] <ml+s2346864n20159...@n4.nabble.com>:
>
> > As Pavel mentioned, Marshaller should not be tied to cache, BinaryObject
> > should be self-explanatory, i.e. containing all information necessary for
> > unmarshalling. This is an absolute requirement.
> >
> > We will have one extra byte in serialized form, meaning that advantage
> > of custom encoding will become evident for all strings with length >= 1,
> > which is perfectly fine. I do not quite understand what are we arguing
> > about.
> >
> > As far as configuration, we can do it as follows:
> >
> > 1) Add global encoding, UTF8 by default.
> > 2) Add per-cache encoding.
> > 3) Add encoding to JDBC and ODBC driver properties.
> >
> > This should be enough.
>
> --
> Best regards,
> Andrey Kuznetsov.
>
> --
> View this message in context: http://apache-ignite-developers.2346864.n4.nabble.com/Non-UTF-8-string-encoding-support-in-BinaryMarshaller-IGNITE-5655-tp20024p20161.html
> Sent from the Apache Ignite Developers mailing list archive at Nabble.com.

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-08-01 Thread Vladimir Ozerov
59...@n4.nabble.com>:
> As Pavel mentioned, Marshaller should not be tied to cache, BinaryObject
> should be self-explanatory, i.e. containing all information necessary for
> u

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-28 Thread Artem Schitow
> We will have one extra byte in serialized form, meaning that advantage
> of custom encoding will become evident for all strings with length >= 1,
> which is perfectly fine. I do not quite understand what ar

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-28 Thread Andrey Kuznetsov
Currently, the marshaller determines the type of a field (BYTE, INT, STRING, etc.) only by the Class of the data being serialized. It seems rather non-trivial to manage marshalling parameters at the cache creation point. Alternatively, there is a simple and flexible way: just introduce a new Java type, say, Str

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-28 Thread Vladimir Ozerov
> not quite understand what are we arguing about.
>
> As far as configuration, we can do it as follows:
>
> 1) Add global encoding, UTF8 by default.
> 2) Add per-cache encoding.

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-28 Thread Pavel Tupitsyn
> F8 by default.
> 2) Add per-cache encoding.
> 3) Add encoding to JDBC and ODBC driver properties.
>
> This should be enough.

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-28 Thread Vladimir Ozerov

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-28 Thread Andrey Kuznetsov

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-28 Thread Vladimir Ozerov
As Pavel mentioned, Marshaller should not be tied to cache, BinaryObject should be self-explanatory, i.e. containing all information necessary for unmarshalling. This is an absolute requirement. We will have one extra byte in serialized form, meaning that advantage of custom encoding will beco
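The size trade-off described in this message can be checked with a little arithmetic. The following is only a sketch under stated assumptions: the custom encoding is taken to be a single-byte charset (ISO-8859-1 stands in here, since the thread does not name one), and only the payload bytes plus the one extra encoding byte are compared. The class and method names are hypothetical and are not part of Ignite.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingOverhead {
    /** Payload bytes when the string is written as plain UTF-8 (no extra byte). */
    static int utf8Size(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Payload bytes in the given charset plus the one extra encoding byte. */
    static int encodedSize(String s, Charset cs) {
        return 1 + s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file independent of its own encoding.
        String[] samples = {"abc", "\u00e9", "\u00e9\u00e9", "cr\u00e8me br\u00fbl\u00e9e"};

        for (String s : samples) {
            System.out.println(s.length() + " chars: utf8=" + utf8Size(s)
                + ", encoded=" + encodedSize(s, StandardCharsets.ISO_8859_1));
        }
    }
}
```

For characters that UTF-8 stores in two bytes, a one-character string breaks even and every further character saves one byte; an ASCII-only string pays the extra byte without any gain.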

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-28 Thread Pavel Tupitsyn
Val, of course other options should be available, such as BinaryTypeConfiguration, and maybe field-level and class-level annotations.

On Thu, Jul 27, 2017 at 9:07 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:

> Pavel,
>
> This forces user to implement Binarylizable for whole typ

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-27 Thread Valentin Kulichenko
Pavel,

This forces the user to implement Binarylizable for the whole type in case they want to change the encoding for one or two fields, right? I really don't like it. Why not add a default encoding to BinaryTypeConfiguration?

-Val

On Thu, Jul 27, 2017 at 7:54 AM, Pavel Tupitsyn wrote:

> > 1 byte for every fi

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-27 Thread Pavel Tupitsyn
> 1 byte for every field just for this

GridBinaryMarshaller.STRING data type remains untouched. We add GridBinaryMarshaller.STRING_ENCODED, which has additional byte for encoding type. This means no overhead for existing code. I think the most common use case is English, which uses 1 byte per char
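The idea of keeping STRING untouched and adding a separate STRING_ENCODED type can be illustrated roughly as follows. This is a sketch, not Ignite code: the flag values, the encoding code, the 4-byte little-endian length and all helper names are assumptions made for the example, and ISO-8859-1 is used only because it is always available in the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedStringSketch {
    // Illustrative type flags only; the real constants live in GridBinaryMarshaller.
    static final byte STRING = 9;
    static final byte STRING_ENCODED = 99; // hypothetical value

    // Hypothetical one-byte encoding code.
    static final byte ENC_ISO_8859_1 = 1;

    /** Existing layout (assumed here): flag, 4-byte length, UTF-8 bytes. */
    static byte[] writeString(String s) {
        byte[] data = s.getBytes(StandardCharsets.UTF_8);

        return ByteBuffer.allocate(1 + 4 + data.length).order(ByteOrder.LITTLE_ENDIAN)
            .put(STRING).putInt(data.length).put(data).array();
    }

    /** Sketched new layout: flag, encoding byte, 4-byte length, encoded bytes. */
    static byte[] writeEncodedString(String s, byte code) {
        byte[] data = s.getBytes(charsetFor(code));

        return ByteBuffer.allocate(1 + 1 + 4 + data.length).order(ByteOrder.LITTLE_ENDIAN)
            .put(STRING_ENCODED).put(code).putInt(data.length).put(data).array();
    }

    /** Reads either layout; the type flag alone tells the reader which one it is. */
    static String read(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        byte flag = buf.get();
        Charset cs;

        if (flag == STRING)
            cs = StandardCharsets.UTF_8;
        else if (flag == STRING_ENCODED)
            cs = charsetFor(buf.get());
        else
            throw new IllegalArgumentException("Unexpected type flag: " + flag);

        byte[] data = new byte[buf.getInt()];
        buf.get(data);

        return new String(data, cs);
    }

    /** Maps a hypothetical encoding code to a charset. */
    static Charset charsetFor(byte code) {
        if (code == ENC_ISO_8859_1)
            return StandardCharsets.ISO_8859_1;

        throw new IllegalArgumentException("Unknown encoding code: " + code);
    }
}
```

Because the encoding byte travels with the value, the reader needs no cache or type configuration to decode it, and plain STRING values keep exactly the bytes they had before.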

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-27 Thread Dmitriy Setrakyan
Pavel, what would be the size overhead? Are we adding 1 byte for every field just for this? If you would like to have this info in the binary object directly, can we in this case have some bitmap of field-to-encoding?

D.

On Thu, Jul 27, 2017 at 9:22 AM, Pavel Tupitsyn wrote:

> I'm not sure I ud

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-27 Thread Pavel Tupitsyn
I'm not sure I understand how this "per field" configuration is supposed to be implemented.

* Marshaller is not tied to a cache. It serializes all kinds of things, like compute job parameters and results.
* Raw mode does not involve field names.

Also it seems like a complicated and expensive soluti

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-27 Thread Igor Sapego
> Igor, it seems like you are advocating the per-cell configuration, not
> per-field one.

True, some terms mismatch here.

> I see your point about C++ and .NET integrations however. Can't we provide
> this info at node-join time or table-creation time? This way all nodes will
> receive it and you

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-27 Thread Dmitriy Setrakyan
On Thu, Jul 27, 2017 at 9:04 AM, Igor Sapego wrote:

> Just a note from the platforms guy:
>
> Solution with table-level configuration is going to be significantly
> harder to implement for platforms and ODBC than field-level one.

Igor, it seems like you are advocating the per-cell configuratio

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-27 Thread Igor Sapego
Just a note from the platforms guy:

Solution with table-level configuration is going to be significantly harder to implement for platforms and ODBC than field-level one.

Also, what about binary objects, which are not stored in cache, but being marshalled?

Best Regards,
Igor

On Wed, Jul 26, 201

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-26 Thread Dmitriy Setrakyan
On Wed, Jul 26, 2017 at 3:40 AM, Vyacheslav Daradur wrote:

> > > Encoding must be set on per field basis. This will give us the most
> > > flexible solution at the cost of 1-byte overhead.
>
> > Vova, I agree that the encoding should be set on per-field basis, but at
> > the table level, not at a

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-26 Thread Andrey Kuznetsov
> <https://issues.apache.org/jira/browse/IGNITE-5655>. Is it really good
> idea to introduce new flag (ENCODED_STRING) for existing String datatype?
> It's possible to use

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-26 Thread Vyacheslav Daradur
> ss future changes related to IGNITE-5655
> <https://issues.apache.org/jira/browse/IGNITE-5655>. Is it really good
> idea to introduce new flag (ENCODED_STRING) for existing String

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-25 Thread Dmitriy Setrakyan
> <https://issues.apache.org/jira/browse/IGNITE-5655>. Is it really good
> idea to introduce new flag (ENCODED_STRING) for existing String datatype?
> It's possible to use existing STRING flag at negligible performance cost.

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-25 Thread Andrey Kuznetsov
> nNegativeIntStrLen bytes
> This format can be backward compatibly extended to
> byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes
> Next, I suggest to add new BinaryConfiguration property for encoding to use
> instead of using global proper
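The extended layout quoted here, `byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes`, relies on the fact that a string length is never negative, so a negative first integer can only be a charset code. A minimal sketch of that trick follows; the flag value, the charset code, the 4-byte little-endian integers and all names are assumptions chosen for illustration, not Ignite's actual implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class NegativeCharsetCodeSketch {
    static final byte STRING = 9;          // illustrative type flag
    static final int CODE_ISO_8859_1 = -1; // hypothetical negative charset code

    /** UTF-8 strings keep the old layout; other charsets get a negative code first. */
    static byte[] write(String s, Charset cs) {
        boolean utf8 = StandardCharsets.UTF_8.equals(cs);
        byte[] data = s.getBytes(cs);

        ByteBuffer buf = ByteBuffer.allocate(1 + (utf8 ? 0 : 4) + 4 + data.length)
            .order(ByteOrder.LITTLE_ENDIAN);

        buf.put(STRING);

        if (!utf8)
            buf.putInt(codeFor(cs));

        buf.putInt(data.length).put(data);

        return buf.array();
    }

    /** A negative first int is a charset code; a non-negative one is the length. */
    static String read(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);

        if (buf.get() != STRING)
            throw new IllegalArgumentException("Not a string value");

        int first = buf.getInt();
        Charset cs = StandardCharsets.UTF_8;
        int len = first;

        if (first < 0) {
            cs = charsetFor(first);
            len = buf.getInt();
        }

        byte[] data = new byte[len];
        buf.get(data);

        return new String(data, cs);
    }

    static int codeFor(Charset cs) {
        if (StandardCharsets.ISO_8859_1.equals(cs))
            return CODE_ISO_8859_1;

        throw new IllegalArgumentException("No code for charset: " + cs);
    }

    static Charset charsetFor(int code) {
        if (code == CODE_ISO_8859_1)
            return StandardCharsets.ISO_8859_1;

        throw new IllegalArgumentException("Unknown charset code: " + code);
    }
}
```

In this sketch, data written in the old layout is still read correctly by the new reader, which is the sense in which the extension is backward compatible; the cost is one sign check per string on read, plus a full integer for the charset code on non-UTF-8 strings.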

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-25 Thread Vladimir Ozerov
> _STRING) for existing String datatype?
> It's possible to use existing STRING flag at negligible performance cost.
> Currently, utf-8-encoded string looks like
> byteFlag nonNegativeIntStrLen bytes
> This forma

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-25 Thread Dmitriy Setrakyan
> Len bytes
> Next, I suggest to add new BinaryConfiguration property for encoding to use
> instead of using global property. It seems to be more convenient for
> user. I'll appreciate your feedback.
>
> -
> Best regards,
> Andrey Kuznetsov.
> --
> View this message in context: http://apache-ignite-developers.2346864.n4.nabble.com/Non-UTF-8-string-encoding-support-in-BinaryMarshaller-IGNITE-5655-tp20024.html
> Sent from the Apache Ignite Developers mailing list archive at Nabble.com.

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-25 Thread Vyacheslav Daradur
> This format can be backward compatibly extended to
> byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes
> Next, I suggest to add new BinaryConfiguration property for encoding to use
> instead of using global property. It seems to be

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-25 Thread Vladimir Ozerov
> byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes
> Next, I suggest to add new BinaryConfiguration property for encoding to use
> instead of using global property. It seems to be more convenient for
> user. I'll appreciate your feedback.

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-25 Thread Andrey Kuznetsov
> user. I'll appreciate your feedback.

Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-25 Thread Andrey Kuznetsov
bal property. It seems to be more convenient for user. I'll appreciate your feedback.

-
Best regards,
Andrey Kuznetsov.
--
View this message in context: http://apache-ignite-developers.2346864.n4.nabble.com/Non-UTF-8-string-encoding-support-in-BinaryMarshaller-IGNITE-5655-tp20024.html