Maximilian Roos created AIRFLOW-3921:
----------------------------------------

Summary: Logging bytes fails in Python 2
Key: AIRFLOW-3921
URL: https://issues.apache.org/jira/browse/AIRFLOW-3921
Project: Apache Airflow
Issue Type: Bug
Components: utils
Affects Versions: 1.10.2
Reporter: Maximilian Roos

We just upgraded to 1.10.2. Thanks for the cadence of releases.

We've hit one small but critical issue though: when we log a Python 2 string (i.e. bytes) that contains non-ASCII characters, Airflow raises an error.

This is because Airflow compares the message against a `\n` literal that is a unicode string here: [https://github.com/apache/airflow/blob/master/airflow/utils/log/logging_mixin.py#L102], because `from __future__ import unicode_literals` is placed here: [https://github.com/apache/airflow/blob/master/airflow/utils/log/logging_mixin.py#L23]. Python 2 then tries to decode the byte string as ASCII before comparing, and that decode fails on the non-ASCII bytes.

(I think this is why, and the repro below supports that, but I'm frequently hitting unicode issues, so please correct me if I'm mistaken)

You can see the issue reproduced:

{code:java}
# non-ascii character
In [16]: print(u"\u00E9")
é

# non-ascii encoded into bytes
In [11]: u"\u00E9aoeu".encode('utf-8')
Out[11]: '\xc3\xa9aoeu'

# works fine when compared with `b"\n"`
In [18]: u"\u00E9aoeu".encode('utf-8').endswith(b"\n")
Out[18]: False

# fails when compared with `u"\n"`
In [15]: '\xc3\xa9aoeu'.endswith(u"\n")
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-15-93bd1ca7fa67> in <module>()
----> 1 '\xc3\xa9aoeu'.endswith(u"\n")

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
{code}

I'm not sure there's any workaround short of something as drastic as removing the `from __future__ import unicode_literals`, or changing all our logging to emit unicode (which would break lots of other processes in Python 2).

Is there any temporary workaround?

Thanks
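
For what it's worth, here is a minimal sketch of the kind of check that would avoid the implicit ASCII decode, assuming the `endswith` comparison linked above is the only place that fails. The helper name `ends_with_newline` is hypothetical and not part of Airflow; it only illustrates matching the newline literal to the type of the message.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def ends_with_newline(message):
    """Return True if `message` ends with a newline.

    Hypothetical helper, not Airflow code. The newline literal is chosen
    to match the type of `message`, so a Python 2 byte string is never
    implicitly decoded as ASCII during the comparison.
    """
    # In Python 2, `bytes` is an alias for `str`, so this branch covers
    # plain byte strings there and `bytes` objects in Python 3.
    newline = b"\n" if isinstance(message, bytes) else "\n"
    return message.endswith(newline)


encoded = "\u00e9aoeu".encode("utf-8")       # non-ASCII bytes, as in the repro
print(ends_with_newline(encoded))            # compares bytes with bytes
print(ends_with_newline(encoded + b"\n"))
print(ends_with_newline("\u00e9aoeu\n"))     # compares unicode with unicode
```

This leaves `unicode_literals` in place and does not require callers to change what they log; whether the rest of the logging path handles non-ASCII bytes correctly afterwards is untested here.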