Mayo Adams wrote:
When writing a simple for loop like so:

     for x in f

where f is the name of a file object, how does Python "know" to interpret
the variable x as a line of text, rather than, say, an individual character
in the file? Does it automatically treat text files as sequences of lines?

Nice question! But the answer is a little bit complicated. The short answer is:

File objects themselves are programmed to iterate line by line rather than character by character. That is a design choice made by the developers of Python, and it could have been different, but this choice was made because it is the most useful.

The long answer requires explaining how for-loops work. When you say

    for x in THINGY: ...

Python first asks THINGY to convert itself into an iterator. It does that by calling the special method THINGY.__iter__(), which is expected to return an iterator object (which may or may not be THINGY itself). If there is no __iter__ method, then Python falls back on an older sequence protocol which isn't relevant to files. If that too fails, then Python raises an error.
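
To make the fallback and the error concrete, here is a small sketch. The class Squares and the helper try_iterate are names made up for illustration. Squares has no __iter__ method, only __getitem__, so Python has to fall back on the older sequence protocol, which calls __getitem__ with 0, 1, 2, ... until IndexError is raised. An int supports neither protocol, so trying to loop over one raises TypeError.

```python
class Squares:
    """Hypothetical class with no __iter__ method, only __getitem__."""

    def __init__(self, count):
        self.count = count

    def __getitem__(self, index):
        # The old sequence protocol calls this with 0, 1, 2, ...
        # and stops when IndexError is raised.
        if index >= self.count:
            raise IndexError(index)
        return index * index


def try_iterate(thingy):
    """Return the items of thingy as a list, or None if it is not iterable."""
    try:
        return [x for x in thingy]
    except TypeError:
        return None


squares = try_iterate(Squares(4))   # works via the __getitem__ fallback
number = try_iterate(42)            # an int supports neither protocol
```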

So what's an iterator object? An iterator object must have a method called "next" (in Python 2) or "__next__" (in Python 3), which returns "the next item". The object is responsible for knowing what value to return each time that method is called. Python doesn't need to know anything about the internal details of the iterator; all it cares about is that when it calls the iterator's next() or __next__() method, the next item will be returned. All the "intelligence" is inside the object, not in Python.

When there are no more items left to return, next() should raise StopIteration, which the for loop detects and treats as "loop is now finished" rather than as an error.
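
Put together, a for-loop behaves roughly like the while-loop sketched here. The function name manual_for is made up for illustration, and the built-in functions iter() and next() are used as the usual way to call __iter__ and __next__. The real loop is implemented inside the interpreter, so treat this as an approximation rather than the actual machinery.

```python
def manual_for(thingy, body):
    """Do roughly what `for x in thingy: body(x)` does."""
    iterator = iter(thingy)        # calls thingy.__iter__()
    while True:
        try:
            x = next(iterator)     # calls iterator.__next__()
        except StopIteration:
            break                  # the loop is finished; not an error
        body(x)


collected = []
manual_for(["fee", "fi", "fo", "fum"], collected.append)
```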

The end result of all this is that Python doesn't care what THINGY is, so long as it obeys the protocol. That means anyone can create new kinds of data that can be iterated over. In the case of files, somebody has already done that for you: file objects are built into Python.
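
As an illustration of creating your own kind of iterable data, here is a minimal sketch. The class Countdown is invented for this example and has nothing to do with files; it simply obeys the protocol, and that is enough for a for-loop or a list comprehension to accept it.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self                # I am my own iterator.

    def __next__(self):
        if self.current <= 0:
            raise StopIteration    # nothing left to return
        value = self.current
        self.current -= 1
        return value

    next = __next__                # Python 2 spelling of the same method


counted = [n for n in Countdown(3)]
```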

Built-in file objects, like the one you get from f = open("some file", "r"), obey the iterator protocol. We can iterate over one by hand, doing exactly what Python does in a for-loop, only less conveniently.

Suppose we have a file containing "fee fi fo fum" split over four lines. Now let's iterate over it by hand. File objects are already iterators, so in Python 3 they have their own __next__ method and there's no need to call __iter__ first:

>>> f = open('temp.txt', 'r')
>>> f.__next__()
'fee\n'
>>> f.__next__()
'fi\n'
>>> f.__next__()
'fo\n'
>>> f.__next__()
'fum\n'
>>> f.__next__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration



So the file object itself keeps track of how much of the file has been read, and the Python interpreter doesn't need to know anything about files. It just needs to know that the file object is iterable. I already know that file objects are their own iterators, so I took a short-cut and called f.__next__() directly. Python doesn't know that, so it performs one extra step first: it calls f.__iter__() to get an iterator object:

>>> f.__iter__()
<_io.TextIOWrapper name='temp.txt' encoding='UTF-8'>

In this case, that iterator object is f itself, and now the Python interpreter goes on to call __next__() repeatedly.
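
The same check can be written with the built-in functions iter() and next(), which are the usual way to call __iter__ and __next__. This sketch assumes Python 3 and writes the four-line example to a temporary file, so the file name is whatever tempfile picks rather than temp.txt.

```python
import os
import tempfile

# Write the four-line example file to a temporary location.
fd, path = tempfile.mkstemp(suffix=".txt", text=True)
with os.fdopen(fd, "w") as out:
    out.write("fee\nfi\nfo\nfum\n")

with open(path, "r") as f:
    same_object = iter(f) is f    # iter(f) calls f.__iter__()
    first = next(f)               # next(f) calls f.__next__()
    rest = list(f)                # consumes the remaining lines

os.remove(path)
```

Because the file object remembers its position, list(f) picks up after the line that next(f) already consumed.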

File objects are actually written in C for speed, but if they were written in pure Python, they might look something vaguely like this:


class File(object):
    def __init__(self, name, mode='r'):
        self.name = name
        if mode == 'r':
            ... # open the file in Read mode
        elif mode == 'w':
            ... # open in Write mode
        else:
            # actually there are other modes too
            raise ValueError('bad mode')

    def __iter__(self):
        return self  # I am my own iterator.

    def read(self, n=1):
        # Read n characters. All the hard work is in here.
        ...

    def readline(self):
        # Read a line, up to and including linefeed.
        buffer = []
        c = self.read()
        buffer.append(c)
        while c != '' and c != '\n':
            c = self.read()  # Read one more character.
            buffer.append(c)
        return ''.join(buffer)

    def __next__(self):
        line = self.readline()
        if line == '':
            # End of File
            raise StopIteration
        else:
            return line
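
If you want something you can actually run, here is a cut-down variant of the same idea. It is only a sketch: the class StringFile is invented for this example and reads from a string in memory instead of a real file, so the hard work in read() shrinks to a slice.

```python
class StringFile:
    """Hypothetical stand-in for a file: reads from a string in memory."""

    def __init__(self, text):
        self.text = text
        self.pos = 0  # how much of the "file" has been read so far

    def __iter__(self):
        return self  # I am my own iterator.

    def read(self, n=1):
        # Return up to n characters, or '' once the end is reached.
        chunk = self.text[self.pos:self.pos + n]
        self.pos += len(chunk)
        return chunk

    def readline(self):
        # Read a line, up to and including the linefeed.
        buffer = []
        c = self.read()
        while c != '':
            buffer.append(c)
            if c == '\n':
                break
            c = self.read()
        return ''.join(buffer)

    def __next__(self):
        line = self.readline()
        if line == '':
            raise StopIteration  # End of File
        return line


lines = list(StringFile("fee\nfi\nfo\nfum\n"))
last = list(StringFile("no newline at end"))
```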


--
Steven
