Prinn, Craig wrote:
I am trying to convert an EBCDIC file to ASCII. When the records are fixed
length I can convert it fine, but I have some files that are coming in as
variable length records. Is there a way to convert the file in Python? I
tried using no length, but then it just reads into a fixed buffer size and
I can't seem to break the records properly.


I'm afraid that I have no idea what you mean here. What are you actually doing? What does "tried using no length" mean?

Converting from one encoding to another should have nothing to do with whether they are fixed-length records, variable-length records, or free-form text. First you read the file as bytes, then use the encoding to convert to text, then process the file however you like.
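To make those three steps concrete, here is a minimal sketch that uses an in-memory byte string instead of a real file. The sample text and the choice of cp500 are only assumptions for illustration; your data may need a different EBCDIC code page.

```python
# Step 0 (demo only): build some EBCDIC bytes so no input file is needed.
ebcdic_bytes = "Hello, World!".encode("cp500")

# Step 1: you start with raw bytes (normally from open(path, 'rb').read()).
# Step 2: decode the bytes to text using the EBCDIC code page.
text = ebcdic_bytes.decode("cp500")

# Step 3: process the text however you like, e.g. re-encode it as ASCII.
ascii_bytes = text.encode("ascii")

print(text)
print(ascii_bytes)
```

Note that the last step raises UnicodeEncodeError if the text contains characters that ASCII cannot represent.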

Using Python 3, I prepared an EBCDIC file. If you open it in binary mode, you get the raw bytes, which are a mess:

py> raw = open('/home/steve/ebcidic.text', 'rb').read()
py> print(raw)
b'\xe3\x88\x89\xa2@\x89\xa2@\\\xa2\x96\x94\x85\\@\xe3 ...

For brevity, I truncated the output.

But if you open it in text mode and set the encoding correctly, Python automatically converts the bytes into text according to the rules of EBCDIC:


py> text = open('/home/steve/ebcidic.text', 'r', encoding='cp500').read()
py> print(text)
This is *some* Text containing "punctuation" & other things(!) which
may{?} NOT be the +++same+++ when encoded into ASCII|EBCIDIC.


This is especially useful if you need to process the file line by line. Simply open the file with the right encoding, then loop over the file as normal.


with open('/home/steve/ebcidic.text', 'r', encoding='cp500') as f:
    for line in f:
        # each line keeps its trailing newline, so don't let print add another
        print(line, end='')
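Since the original goal was an ASCII copy of the file, the same kind of loop can write each line back out with a different encoding. This is only a sketch: the function name convert_to_ascii, the temporary file names, and the decision to replace unencodable characters with "?" are all my assumptions, not anything the codecs require.

```python
import os
import tempfile


def convert_to_ascii(src_path, dst_path, src_encoding="cp500"):
    """Copy src_path to dst_path line by line, re-encoding as ASCII.

    Hypothetical helper for illustration only.
    """
    # errors="replace" writes "?" for characters ASCII cannot represent;
    # newline="\n" stops Python translating line endings on output.
    with open(src_path, "r", encoding=src_encoding) as src, \
         open(dst_path, "w", encoding="ascii", errors="replace",
              newline="\n") as dst:
        for line in src:
            dst.write(line)


# Demo: create a small EBCDIC file in a temporary directory and convert it.
tmpdir = tempfile.mkdtemp()
src_path = os.path.join(tmpdir, "sample.ebcdic")
dst_path = os.path.join(tmpdir, "sample.ascii")

with open(src_path, "wb") as f:
    f.write("first line\nsecond line\n".encode("cp500"))

convert_to_ascii(src_path, dst_path)

with open(dst_path, "rb") as f:
    converted = f.read()
print(converted)
```

If you would rather have the conversion fail loudly on characters that have no ASCII equivalent, drop the errors="replace" argument.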


In this case, I used IBM's standard EBCDIC encoding for Western Europe. Python knows about some others; see the documentation for the codecs module for the list.

http://docs.python.org/library/codecs.html
http://docs.python.org/py3k/library/codecs.html
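Picking the right code page matters, because the EBCDIC variants agree on letters but not on all punctuation. A small sketch, assuming cp500 and cp037 are the two candidates (the sample string is made up):

```python
sample = "Hello!"

as_cp500 = sample.encode("cp500")
as_cp037 = sample.encode("cp037")

# The letters use the same byte values in both code pages...
letters_match = as_cp500[:5] == as_cp037[:5]
# ...but the exclamation mark does not.
punctuation_differs = as_cp500[5:] != as_cp037[5:]

# Decoding with the wrong code page silently gives the wrong character.
wrong = as_cp037.decode("cp500")

print(letters_match, punctuation_differs)
print(repr(wrong))
```

So if the letters look right after decoding but the punctuation is off, trying another code page from the list is a reasonable next step.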

Once you have the text, you can then treat it as fixed width, variable width, or whatever else you might have.
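For example, here is a rough sketch of both cases on decoded text. The record width of 10, the newline delimiter, and the helper name fixed_width_records are assumptions for the demo; substitute whatever your real layout uses.

```python
def fixed_width_records(text, width):
    """Split decoded text into consecutive records of `width` characters.

    Hypothetical helper for illustration only.
    """
    return [text[i:i + width] for i in range(0, len(text), width)]


# Fixed width: three 10-character records padded with spaces.
raw_fixed = "ALICE     BOB       CAROL     ".encode("cp500")
text_fixed = raw_fixed.decode("cp500")
fixed = [rec.rstrip() for rec in fixed_width_records(text_fixed, 10)]

# Variable width: records separated by newlines.
raw_variable = "ALICE\nBOB\nCAROL\n".encode("cp500")
text_variable = raw_variable.decode("cp500")
variable = text_variable.splitlines()

print(fixed)
print(variable)
```

Either way, the splitting happens on the decoded text, not on the raw bytes.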


Python 2 is a little trickier. You can manually decode the bytes:

# not tested
text = open('/home/steve/ebcidic.text', 'rb').read().decode('cp500')

or you can use the codecs module to get very close to the same functionality as Python 3:

# also untested
import codecs
f = codecs.open('/home/steve/ebcidic.text', 'r', encoding='cp500')
for line in f:
    print line



--
Steven

_______________________________________________
Tutor maillist  -  Tutor@python.org