Re: [Tutor] beginner question

Steven D'Aprano Tue, 01 Nov 2011 09:40:31 -0700

Mayo Adams wrote:

> When writing a simple for loop like so:
>
>      for x in f:
>
> where f is the name of a file object, how does Python "know" to interpret
> the variable x as a line of text, rather than, say, an individual character
> in the file? Does it automatically treat text files as sequences of lines?

Nice question! But the answer is a little bit complicated. The short answer is:

File objects themselves are programmed to iterate line by line rather than character by character. That is a design choice made by the developers of Python, and it could have been different, but this choice was made because it is the most useful.
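To see the difference in practice, here is a small sketch (Python 3, using a throwaway file from the tempfile module rather than any particular path). Looping over the file gives lines; getting characters means asking for them explicitly. Calling read(1) through the two-argument form of iter() is just one way of doing that, chosen here for illustration:

```python
import os
import tempfile

# Write a small sample file; tempfile picks the name, so nothing here
# depends on a particular path.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as out:
    out.write('fee\nfi\nfo\nfum\n')

# Default behaviour: each pass of the loop gets one line, newline included.
with open(path) as f:
    lines = [x for x in f]

# Characters have to be requested explicitly: call f.read(1) repeatedly
# until it returns '' (end of file).
with open(path) as f:
    chars = [c for c in iter(lambda: f.read(1), '')]

os.remove(path)

print(lines)       # ['fee\n', 'fi\n', 'fo\n', 'fum\n']
print(len(chars))  # 14
```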


The long answer requires explaining how for-loops work. When you say

    for x in THINGY: ...

Python first asks THINGY to convert itself into an iterator. It does that by calling the special method THINGY.__iter__(), which is expected to return an iterator object (which may or may not be THINGY itself). If there is no __iter__ method, then Python falls back on an older sequence protocol (calling THINGY.__getitem__ with 0, 1, 2, ... until IndexError is raised), which isn't relevant to files. If that too fails, then Python raises a TypeError.
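The fallback is easy to see with a class that has __getitem__ but no __iter__. Squares below is a hypothetical example written only to illustrate the older sequence protocol; the second half shows the error you get when neither protocol is supported:

```python
class Squares(object):
    """Hypothetical example: defines __getitem__ but no __iter__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        # Old sequence protocol: Python calls this with 0, 1, 2, ...
        # and stops the loop when IndexError is raised.
        if index >= self.n:
            raise IndexError(index)
        return index * index


result = [x for x in Squares(5)]
print(result)  # [0, 1, 4, 9, 16]

# An int supports neither protocol, so the for loop raises TypeError.
failed = False
try:
    for x in 42:
        pass
except TypeError as err:
    failed = True
    print(err)  # 'int' object is not iterable
```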

So what's an iterator object? An iterator object must have a method called "next" (in Python 2), or "__next__" (in Python 3), which returns "the next item". The object is responsible for knowing what value to return each time next() is called. Python doesn't need to know anything about the internal details of the iterator; all it cares about is that when it calls the iterator's next() or __next__() method, the next item will be returned. All the "intelligence" is inside the object, not in Python.
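As a minimal sketch, here is a hypothetical Countdown iterator. The object remembers where it is up to and decides for itself what the next item should be; the for loop (or list()) just keeps asking:

```python
class Countdown(object):
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself, so it can be used in a for loop too.
        return self

    def __next__(self):
        # All the "intelligence" lives here.
        if self.current <= 0:
            raise StopIteration  # nothing left to return
        value = self.current
        self.current -= 1
        return value

    next = __next__  # Python 2 spells the method next()


print(list(Countdown(3)))  # [3, 2, 1]
```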

When there are no more items left to return, next() should raise StopIteration, which the for loop detects and treats as "loop is now finished" rather than as an error.
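Putting the pieces together, a for loop behaves roughly like the following while loop. This is a simplified sketch (for_loop is a hypothetical helper, and it ignores details such as the loop's else clause), but it shows where __iter__, __next__ and StopIteration fit in:

```python
def for_loop(thingy, body):
    """Roughly what 'for x in thingy: body(x)' does (a simplified sketch)."""
    iterator = iter(thingy)      # calls thingy.__iter__() (or the fallback)
    while True:
        try:
            x = next(iterator)   # calls iterator.__next__()
        except StopIteration:
            break                # "loop is now finished", not an error
        body(x)


collected = []
for_loop(['fee', 'fi', 'fo', 'fum'], collected.append)
print(collected)  # ['fee', 'fi', 'fo', 'fum']
```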

So, the end result of all this is that Python doesn't care what THINGY is, so long as it obeys the protocol. That means anyone can create new kinds of data that can be iterated over. In the case of files, somebody has already done that for you: files are built into Python.
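A home-made iterable doesn't have to be its own iterator. In this sketch the hypothetical Deck class hands back a fresh iterator over its list each time __iter__ is called, so it can be looped over as often as you like:

```python
class Deck(object):
    """Hypothetical container: iterable, but not its own iterator."""

    def __init__(self, cards):
        self.cards = list(cards)

    def __iter__(self):
        # Return a brand-new iterator over the stored list each time,
        # so every for loop starts again from the first card.
        return iter(self.cards)


deck = Deck(['fee', 'fi', 'fo', 'fum'])
first = [card for card in deck]
second = [card for card in deck]
print(first == second)  # True
```

Files make the other choice: because a file object is its own iterator, a second for loop over the same open file finds nothing left to read unless you seek back to the start.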

Built-in file objects, like the one you get from f = open("some file", "r"), obey the iterator protocol. We can step through one by hand, doing exactly what Python does in a for-loop, only less conveniently.

Suppose we have a file containing "fee fi fo fum" split over four lines. Now let's iterate over it by hand. File objects are already iterators, so in Python 3 they have their own __next__ method and there's no need to call __iter__ first:


>>> f = open('temp.txt', 'r')
>>> f.__next__()
'fee\n'
>>> f.__next__()
'fi\n'
>>> f.__next__()
'fo\n'
>>> f.__next__()
'fum\n'
>>> f.__next__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

So the file object itself keeps track of how much of the file has been read, and the Python interpreter doesn't need to know anything about files. It just needs to know that the file object is iterable. I already know this, so I took a short-cut, calling f.__next__() directly. But Python doesn't know that, so it performs one extra step: it calls f.__iter__() to get an iterator object:


>>> f.__iter__()
<_io.TextIOWrapper name='temp.txt' encoding='UTF-8'>

In this case, that iterator object is f itself, and now the Python interpreter goes on to call __next__() repeatedly.
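In everyday code you would normally use the built-in functions iter() and next() rather than calling the special methods yourself. This sketch uses io.StringIO, an in-memory text stream, as a stand-in for temp.txt (an assumption made so no file on disk is needed); it follows the same iterator protocol as a file opened in text mode:

```python
import io

# In-memory stand-in for a text file containing four lines.
f = io.StringIO('fee\nfi\nfo\nfum\n')

same = iter(f) is f   # the file object is its own iterator
first = next(f)       # same as f.__next__()
rest = list(f)        # a loop carries on from where next() left off

print(same)         # True
print(repr(first))  # 'fee\n'
print(rest)         # ['fi\n', 'fo\n', 'fum\n']
```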

File objects are actually written in C for speed, but if they were written in pure Python, they might look something vaguely like this:



class File(object):
    def __init__(self, name, mode='r'):
        self.name = name
        if mode == 'r':
            # Open the file in Read mode. A real implementation would talk
            # to the operating system here; so that this sketch actually
            # runs, it borrows the built-in open() to do that part.
            self._raw = open(name, 'r')
        elif mode == 'w':
            self._raw = open(name, 'w')  # open in Write mode
        else:
            # actually there are other modes too
            raise ValueError('bad mode')

    def __iter__(self):
        return self  # I am my own iterator.

    def read(self, n=1):
        # Read up to n characters; returns '' at end of file.
        # All the hard work is in here (delegated in this sketch).
        return self._raw.read(n)

    def readline(self):
        # Read a line, up to and including linefeed.
        buffer = []
        c = self.read()
        buffer.append(c)
        while c != '' and c != '\n':
            c = self.read()  # Read one more character.
            buffer.append(c)
        return ''.join(buffer)

    def __next__(self):
        line = self.readline()
        if line == '':
            # End of File
            raise StopIteration
        else:
            return line

    next = __next__  # Python 2 spelling of the same method
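The same read/readline/__next__ layering can be exercised without touching the disk by backing it with a string. StringFile is a hypothetical class written for this illustration, not part of Python; it keeps its own position counter, which is the "how much has been read" bookkeeping a real file object does:

```python
class StringFile(object):
    """Hypothetical stand-in for a file object, backed by a string."""

    def __init__(self, text):
        self._text = text
        self._pos = 0  # how much has been "read" so far

    def __iter__(self):
        return self  # I am my own iterator.

    def read(self, n=1):
        # Read up to n characters; returns '' once the text is used up.
        chunk = self._text[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk

    def readline(self):
        # Read a line, up to and including the linefeed (if there is one).
        buffer = []
        c = self.read()
        buffer.append(c)
        while c != '' and c != '\n':
            c = self.read()
            buffer.append(c)
        return ''.join(buffer)

    def __next__(self):
        line = self.readline()
        if line == '':
            raise StopIteration  # End of "file"
        return line

    next = __next__  # Python 2 spelling


lines = [x for x in StringFile('fee\nfi\nfo\nfum')]
print(lines)  # ['fee\n', 'fi\n', 'fo\n', 'fum']
```

Because the text doesn't end with a newline, the last line comes back without one, which is also how real file objects behave.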


--
Steven

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
