Re: tail

2022-05-19 Thread Cameron Simpson
On 19May2022 19:50, Marco Sulla wrote: >On Wed, 18 May 2022 at 23:32, Cameron Simpson wrote: >> You're measuring different things. timeit() tries hard to measure >> just >> the code snippet you provide. It doesn't measure the startup cost of the >> whole python interpreter. Try: >> >> time p
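Cameron's distinction can be checked directly. The sketch below times a snippet with `timeit.timeit()`, which runs only the statement inside an already-running interpreter. It also times launching and exiting a fresh interpreter with `subprocess.run()`, which is roughly the extra cost a shell `time` measurement includes. Both helper names are hypothetical.

```python
import subprocess
import sys
import time
import timeit


def snippet_seconds(stmt: str = "sum(range(1000))", number: int = 10) -> float:
    """Time only `stmt`, executed `number` times inside this interpreter."""
    return timeit.timeit(stmt, number=number)


def interpreter_startup_seconds() -> float:
    """Wall-clock time to start (and immediately exit) a new Python process."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", "pass"], check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"snippet x10:       {snippet_seconds():.6f} s")
    print(f"interpreter start: {interpreter_startup_seconds():.6f} s")
```

The two numbers answer different questions, so comparing a `timeit` result with a `time python script.py` result mixes them.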

Re: tail

2022-05-19 Thread Marco Sulla
On Wed, 18 May 2022 at 23:32, Cameron Simpson wrote: > > On 17May2022 22:45, Marco Sulla wrote: > >Well, I've done a benchmark. > timeit.timeit("tail('/home/marco/small.txt')", globals={"tail":tail}, > number=10) > >1.5963431186974049 > timeit.timeit("tail('/home/marco/lorem.t

Re: tail

2022-05-18 Thread Cameron Simpson
On 17May2022 22:45, Marco Sulla wrote: >Well, I've done a benchmark. timeit.timeit("tail('/home/marco/small.txt')", globals={"tail":tail}, number=10) >1.5963431186974049 timeit.timeit("tail('/home/marco/lorem.txt')", globals={"tail":tail}, number=10) >2.52406043745577

Re: tail

2022-05-18 Thread Marco Sulla
Well, I've done a benchmark. >>> timeit.timeit("tail('/home/marco/small.txt')", globals={"tail":tail}, >>> number=10) 1.5963431186974049 >>> timeit.timeit("tail('/home/marco/lorem.txt')", globals={"tail":tail}, >>> number=10) 2.5240604374557734 >>> timeit.timeit("tail('/home/marco/lorem.

Re: tail

2022-05-16 Thread Marco Sulla
On Fri, 13 May 2022 at 12:49, <2qdxy4rzwzuui...@potatochowder.com> wrote: > > On 2022-05-13 at 12:16:57 +0200, > Marco Sulla wrote: > > > On Fri, 13 May 2022 at 00:31, Cameron Simpson wrote: > > [...] > > > > This is nearly the worst "specification" I have ever seen. > > > You're lucky. I've seen

Re: tail

2022-05-13 Thread 2QdxY4RzWzUUiLuE
On 2022-05-13 at 12:16:57 +0200, Marco Sulla wrote: > On Fri, 13 May 2022 at 00:31, Cameron Simpson wrote: [...] > > This is nearly the worst "specification" I have ever seen. > You're lucky. I've seen much worse (or none). At least with *no* documentation, the source code stands for itsel

Re: tail

2022-05-13 Thread Marco Sulla
On Fri, 13 May 2022 at 00:31, Cameron Simpson wrote: > On 12May2022 19:48, Marco Sulla wrote: > >On Thu, 12 May 2022 at 00:50, Stefan Ram wrote: > >> There's no spec/doc, so one can't even test it. > > > >Excuse me, you're very right. > > > >""" > >A function that "tails" the file. If you don

Re: tail

2022-05-12 Thread Cameron Simpson
On 12May2022 19:48, Marco Sulla wrote: >On Thu, 12 May 2022 at 00:50, Stefan Ram wrote: >> There's no spec/doc, so one can't even test it. > >Excuse me, you're very right. > >""" >A function that "tails" the file. If you don't know what that means, >google "man tail" > >filepath: the file path

Re: tail

2022-05-12 Thread Dennis Lee Bieber
On Thu, 12 May 2022 22:45:42 +0200, Marco Sulla declaimed the following: > >Maybe. Maybe not. What if the file ends with no newline? https://github.com/coreutils/coreutils/blob/master/src/tail.c Lines 567-569 (also lines 550-557 for "bytes_read" determination) -- Wulfraed

Re: tail

2022-05-12 Thread Marco Sulla
Thank you very much. This helped me to improve the function: import os _lf = b"\n" _err_n = "Parameter n must be a positive integer number" _err_chunk_size = "Parameter chunk_size must be a positive integer number" def tail(filepath, n=10, chunk_size=100): if (n <= 0): raise ValueErr

Re: tail

2022-05-12 Thread Marco Sulla
On Thu, 12 May 2022 at 00:50, Stefan Ram wrote: > > Marco Sulla writes: > >def tail(filepath, n=10, chunk_size=100): > >if (n <= 0): > >raise ValueError(_err_n) > ... > > There's no spec/doc, so one can't even test it. Excuse me, you're very right. """ A function that "tails" the

Re: tail

2022-05-11 Thread Avi Gross via Python-list
needed but for smaller files, KISS. -Original Message- From: Dennis Lee Bieber To: python-list@python.org Sent: Wed, May 11, 2022 6:15 pm Subject: Re: tail On Thu, 12 May 2022 06:07:18 +1000, Chris Angelico declaimed the following: >I don't understand why this wants to b

Re: tail

2022-05-11 Thread Avi Gross via Python-list
than how to stepwise make changes in a pipeline so reading from the beginning to end was not an issue. -Original Message- From: Marco Sulla To: Chris Angelico Cc: python-list@python.org Sent: Wed, May 11, 2022 5:27 pm Subject: Re: tail On Wed, 11 May 2022 at 22:09, Chris Angelico wrote

Re: tail

2022-05-11 Thread Dennis Lee Bieber
On Thu, 12 May 2022 06:07:18 +1000, Chris Angelico declaimed the following: >I don't understand why this wants to be in the standard library. > Especially as any Linux distribution probably includes the compiled "tail" command, so this would only be of use on Windows. Under recen

Re: tail

2022-05-11 Thread Chris Angelico
On Thu, 12 May 2022 at 07:27, Marco Sulla wrote: > > On Wed, 11 May 2022 at 22:09, Chris Angelico wrote: > > > > Have you actually checked those three, or do you merely suppose them to be > > true? > > I only suppose, as I said. I should do some benchmark and some other > tests, and, frankly, I

Re: tail

2022-05-11 Thread Marco Sulla
On Wed, 11 May 2022 at 22:09, Chris Angelico wrote: > > Have you actually checked those three, or do you merely suppose them to be > true? I only suppose, as I said. I should do some benchmark and some other tests, and, frankly, I don't want to. I don't want to because I'm quite sure the impleme

Re: tail

2022-05-11 Thread Chris Angelico
On Thu, 12 May 2022 at 06:03, Marco Sulla wrote: > I suppose this function is fast. It reads the bytes from the file in chunks > and stores them in a bytearray, prepending them to it. The final result is > read from the bytearray and converted to bytes (to be consistent with the > read method). >

Re: tail

2022-05-11 Thread Marco Sulla
On Mon, 9 May 2022 at 23:15, Dennis Lee Bieber wrote: > > On Mon, 9 May 2022 21:11:23 +0200, Marco Sulla > declaimed the following: > > >Nevertheless, tail is a fundamental tool in *nix. It's fast and > >reliable. Also the tail command can't handle different encodings? > > Based upon > ht

Re: tail

2022-05-09 Thread Alan Bawden
Marco Sulla writes: On Mon, 9 May 2022 at 19:53, Chris Angelico wrote: ... Nevertheless, tail is a fundamental tool in *nix. It's fast and reliable. Also the tail command can't handle different encodings? It definitely can't. It works for UTF-8, and all the ASCII compatible single

Re: tail

2022-05-09 Thread Dennis Lee Bieber
On Mon, 9 May 2022 21:11:23 +0200, Marco Sulla declaimed the following: >Nevertheless, tail is a fundamental tool in *nix. It's fast and >reliable. Also the tail command can't handle different encodings? Based upon https://github.com/coreutils/coreutils/blob/master/src/tail.c the ONLY th

Re: tail

2022-05-09 Thread Chris Angelico
On Tue, 10 May 2022 at 07:07, Barry wrote: > POSIX tail just prints the bytes to the output that it finds between \n bytes. > At no time does it need to care about encodings as that is a problem solved > by the terminal software. I would not expect utf-16 to work with tail on > linux systems. UTF

Re: tail

2022-05-09 Thread Barry
> On 9 May 2022, at 20:14, Marco Sulla wrote: > > On Mon, 9 May 2022 at 19:53, Chris Angelico wrote: >> >>> On Tue, 10 May 2022 at 03:47, Marco Sulla >>> wrote: >>> >>> On Mon, 9 May 2022 at 07:56, Cameron Simpson wrote: The point here is that text is a very different thing. B

Re: tail

2022-05-09 Thread Barry
> On 9 May 2022, at 17:41, r...@zedat.fu-berlin.de wrote: > > Barry Scott writes: >> Why use tiny chunks? You can read 4KiB as fast as 100 bytes > > When optimizing code, it helps to be aware of the orders of > magnitude That is true and well known to me; now show how what I said is wrong.

Re: tail

2022-05-09 Thread Chris Angelico
On Tue, 10 May 2022 at 05:12, Marco Sulla wrote: > > On Mon, 9 May 2022 at 19:53, Chris Angelico wrote: > > > > On Tue, 10 May 2022 at 03:47, Marco Sulla > > wrote: > > > > > > On Mon, 9 May 2022 at 07:56, Cameron Simpson wrote: > > > > > > > > The point here is that text is a very different t

Re: tail

2022-05-09 Thread Marco Sulla
On Mon, 9 May 2022 at 19:53, Chris Angelico wrote: > > On Tue, 10 May 2022 at 03:47, Marco Sulla > wrote: > > > > On Mon, 9 May 2022 at 07:56, Cameron Simpson wrote: > > > > > > The point here is that text is a very different thing. Because you > > > cannot seek to an absolute number of charact

Re: tail

2022-05-09 Thread 2QdxY4RzWzUUiLuE
On 2022-05-08 at 18:52:42 +, Stefan Ram wrote: > Remember how recently people here talked about how you cannot copy > text from a video? Then, how did I do it? Turns out, for my > operating system, there's a screen OCR program! So I did this OCR > and then manually corrected a few wro

Re: tail

2022-05-09 Thread Chris Angelico
On Tue, 10 May 2022 at 03:47, Marco Sulla wrote: > > On Mon, 9 May 2022 at 07:56, Cameron Simpson wrote: > > > > The point here is that text is a very different thing. Because you > > cannot seek to an absolute number of characters in an encoding with > > variable sized characters. _If_ you did a

Re: tail

2022-05-09 Thread Marco Sulla
On Mon, 9 May 2022 at 07:56, Cameron Simpson wrote: > > The point here is that text is a very different thing. Because you > cannot seek to an absolute number of characters in an encoding with > variable sized characters. _If_ you did a seek to an arbitrary number > you can end up in the middle of

Re: tail

2022-05-09 Thread Dennis Lee Bieber
On Sun, 8 May 2022 22:48:32 +0200, Marco Sulla declaimed the following: > >Emh. I re-quote > >seek(offset, whence=SEEK_SET) >Change the stream position to the given byte offset. > >And so on. No mention of differences between text and binary mode. You ignore that, underneath, Python is j

Re: tail

2022-05-09 Thread Greg Ewing
On 9/05/22 7:47 am, Marco Sulla wrote: It will fail if the contents is not ASCII. Why? For some encodings, if you seek to an arbitrary byte position and then read, it may *appear* to succeed but give you complete gibberish. Your method might work for a certain subset of encodings (those that

Re: tail

2022-05-08 Thread Cameron Simpson
On 08May2022 22:48, Marco Sulla wrote: >On Sun, 8 May 2022 at 22:34, Barry wrote: >> >> In text mode you can only seek to a value return from f.tell() >> >> otherwise the behaviour is undefined. >> > >> > Why? I don't see any recommendation about it in the docs: >> > https://docs.python.org/3/li
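A minimal sketch of the distinction Cameron and Barry draw, assuming a UTF-8 file. Seeking a text stream back to a value returned by `tell()` is the documented safe pattern. Seeking to an arbitrary number can land inside a multi-byte character; with UTF-8 the read then fails rather than silently returning gibberish. Both helper names are hypothetical.

```python
import os
import tempfile


def reread_from_cookie(path: str, encoding: str = "utf-8") -> bool:
    """Seek back to a position obtained from tell() and check the reread matches."""
    with open(path, encoding=encoding) as f:
        f.readline()
        cookie = f.tell()  # opaque value; only meaningful to seek() on this stream
        rest = f.read()
        f.seek(cookie)
        return f.read() == rest


def read_from_byte_offset(path: str, offset: int, encoding: str = "utf-8") -> str:
    """Seek a text stream to an arbitrary number (not a tell() cookie) and read."""
    with open(path, encoding=encoding) as f:
        f.seek(offset)
        return f.read()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write("é\nabc\n".encode("utf-8"))  # "é" is the two bytes C3 A9
    try:
        print(reread_from_cookie(path))               # True
        print(repr(read_from_byte_offset(path, 3)))   # 'abc\n'
        read_from_byte_offset(path, 1)                # offset 1 is inside "é"
    except UnicodeDecodeError as exc:
        print("UnicodeDecodeError:", exc.reason)
    finally:
        os.remove(path)
```

Offset 3 only works here because it happens to fall on a character boundary of a stateless encoding; nothing in the text-mode API guarantees that in general.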

Re: tail

2022-05-08 Thread Marco Sulla
On Sun, 8 May 2022 at 22:34, Barry wrote: > > > On 8 May 2022, at 20:48, Marco Sulla wrote: > > > > On Sun, 8 May 2022 at 20:31, Barry Scott wrote: > >> > On 8 May 2022, at 17:05, Marco Sulla > wrote: > >>> > >>> def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100):

Re: tail

2022-05-08 Thread Barry
> On 8 May 2022, at 20:48, Marco Sulla wrote: > > On Sun, 8 May 2022 at 20:31, Barry Scott wrote: >> On 8 May 2022, at 17:05, Marco Sulla wrote: >>> >>> def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100): >>> n_chunk_size = n * chunk_size >> >> Why use tiny chunk

Re: tail

2022-05-08 Thread Marco Sulla
On Sun, 8 May 2022 at 22:02, Chris Angelico wrote: > > Absolutely not. As has been stated multiple times in this thread, a > fully general approach is extremely complicated, horrifically > unreliable, and hopelessly inefficient. Well, my implementation is quite general now. It's not complicated a

Re: tail

2022-05-08 Thread Chris Angelico
On Mon, 9 May 2022 at 05:49, Marco Sulla wrote: > Anyway, apart from my implementation, I'm curious if you think a tail > method is worth it to be a method of the builtin file objects in > CPython. Absolutely not. As has been stated multiple times in this thread, a fully general approach is extre

Re: tail

2022-05-08 Thread Marco Sulla
On Sun, 8 May 2022 at 20:31, Barry Scott wrote: > > > On 8 May 2022, at 17:05, Marco Sulla wrote: > > > > def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100): > >n_chunk_size = n * chunk_size > > Why use tiny chunks? You can read 4KiB as fast as 100 bytes as it's typically >

Re: tail

2022-05-08 Thread MRAB
On 2022-05-08 19:15, Barry Scott wrote: On 7 May 2022, at 22:31, Chris Angelico wrote: On Sun, 8 May 2022 at 07:19, Stefan Ram wrote: MRAB writes: On 2022-05-07 19:47, Stefan Ram wrote: ... def encoding( name ): path = pathlib.Path( name ) for encoding in( "utf_8", "latin_1", "cp1

Re: tail

2022-05-08 Thread Barry Scott
> On 8 May 2022, at 17:05, Marco Sulla wrote: > > I think I've _almost_ found a simpler, general way: > > import os > > _lf = "\n" > _cr = "\r" > > def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100): >n_chunk_size = n * chunk_size Why use tiny chunks? You can read 4K
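A sketch of the binary-mode approach Barry suggests, assuming an encoding in which byte 0x0A only ever means a newline (ASCII, Latin-1, UTF-8; not UTF-16 or UTF-32). `tail_bytes` and its 4 KiB default are hypothetical, not the function posted in the thread. It reads fixed-size chunks backwards from EOF, stops once it has seen enough newlines, and leaves decoding to the caller.

```python
import os


def tail_bytes(path: str, n: int = 10, chunk_size: int = 4096) -> list[bytes]:
    """Return the last n lines of a file as bytes, without line terminators.

    Assumes the newline byte 0x0A is the line delimiter. A final line that
    lacks a trailing newline still counts as a line.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    chunks: list[bytes] = []
    newlines = 0
    with open(path, "rb") as f:
        pos = os.fstat(f.fileno()).st_size
        # n + 1 newlines guarantee the last n lines are complete, even when
        # the file ends with a newline; otherwise stop at offset 0.
        while pos > 0 and newlines <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            chunk = f.read(step)
            chunks.append(chunk)
            newlines += chunk.count(b"\n")
    data = b"".join(reversed(chunks))
    lines = data.split(b"\n")
    if lines[-1] == b"":  # a trailing newline ends the last line
        lines.pop()
    return lines[-n:]
```

Collecting chunks in a list and joining once avoids re-copying the accumulated data on every iteration.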

Re: tail

2022-05-08 Thread Chris Angelico
On Mon, 9 May 2022 at 04:15, Barry Scott wrote: > > > > > On 7 May 2022, at 22:31, Chris Angelico wrote: > > > > On Sun, 8 May 2022 at 07:19, Stefan Ram wrote: > >> > >> MRAB writes: > >>> On 2022-05-07 19:47, Stefan Ram wrote: > >> ... > def encoding( name ): > path = pathlib.Path(

Re: tail

2022-05-08 Thread Barry Scott
> On 7 May 2022, at 22:31, Chris Angelico wrote: > > On Sun, 8 May 2022 at 07:19, Stefan Ram wrote: >> >> MRAB writes: >>> On 2022-05-07 19:47, Stefan Ram wrote: >> ... def encoding( name ): path = pathlib.Path( name ) for encoding in( "utf_8", "latin_1", "cp1252" ):

Re: tail

2022-05-08 Thread Barry Scott
> On 7 May 2022, at 14:40, Stefan Ram wrote: > > Marco Sulla writes: >> So there's no way to reliably read lines in reverse in text mode using >> seek and read, but the only option is readlines? > > I think, CPython is based on C. I don't know whether > Python's seek function directly call

Re: tail

2022-05-08 Thread Marco Sulla
I think I've _almost_ found a simpler, general way: import os _lf = "\n" _cr = "\r" def tail(filepath, n=10, newline=None, encoding=None, chunk_size=100): n_chunk_size = n * chunk_size pos = os.stat(filepath).st_size chunk_line_pos = -1 lines_not_found = n with open(filepath

Re: tail

2022-05-08 Thread Barry
> On 7 May 2022, at 17:29, Marco Sulla wrote: > > On Sat, 7 May 2022 at 16:08, Barry wrote: >> You need to handle the file in bin mode and do the handling of line endings >> and encodings yourself. It’s not that hard for the cases you wanted. > "\n".encode("utf-16") > b'\xff\xfe\n\x00'

Re: tail

2022-05-07 Thread Chris Angelico
On Sun, 8 May 2022 at 07:19, Stefan Ram wrote: > > MRAB writes: > >On 2022-05-07 19:47, Stefan Ram wrote: > ... > >>def encoding( name ): > >>path = pathlib.Path( name ) > >>for encoding in( "utf_8", "latin_1", "cp1252" ): > >>try: > >>with path.open( encoding=encoding
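One property of a try-each-encoding loop is worth checking, whatever the thread concluded. Latin-1 maps every byte 0x00 to 0xFF to a character, so decoding with it never fails, and any candidate after it (cp1252 here) is unreachable. The helper below is a hypothetical bytes-based stand-in, not Stefan's function.

```python
def first_decodable(data: bytes, candidates=("utf_8", "latin_1", "cp1252")):
    """Return the first candidate encoding that decodes `data` without error."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None  # no candidate could decode the data
```

For example, `first_decodable(bytes(range(256)))` returns `"latin_1"`. Success with Latin-1 therefore says nothing about whether the text really is Latin-1.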

Re: tail

2022-05-07 Thread Chris Angelico
On Sun, 8 May 2022 at 04:37, Marco Sulla wrote: > > On Sat, 7 May 2022 at 19:02, MRAB wrote: > > > > On 2022-05-07 17:28, Marco Sulla wrote: > > > On Sat, 7 May 2022 at 16:08, Barry wrote: > > >> You need to handle the file in bin mode and do the handling of line > > >> endings and encodings yo

Re: tail

2022-05-07 Thread MRAB
On 2022-05-07 19:47, Stefan Ram wrote: Marco Sulla writes: Well, ok, but I need a generic method to get LF and CR for any encoding an user can input. "LF" and "CR" come from US-ASCII. It is theoretically possible that there might be some encodings out there (not for Unicode) that are

Re: tail

2022-05-07 Thread MRAB
On 2022-05-07 19:35, Marco Sulla wrote: On Sat, 7 May 2022 at 19:02, MRAB wrote: > > On 2022-05-07 17:28, Marco Sulla wrote: > > On Sat, 7 May 2022 at 16:08, Barry wrote: > >> You need to handle the file in bin mode and do the handling of line endings and encodings yourself. It’s not that hard

Re: tail

2022-05-07 Thread Dennis Lee Bieber
On Sat, 7 May 2022 20:35:34 +0200, Marco Sulla declaimed the following: >Well, ok, but I need a generic method to get LF and CR for any >encoding an user can input. Other than EBCDIC, <lf> and <cr> AS BYTES should appear as x0A and x0D in any of the 8-bit encodings (ASCII, ISO-8859-x, CP, UT
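One hedged answer to Marco's question is to ask the codec for the bytes of LF or CR and strip any BOM it prepends. The BOM is the same prefix that `"".encode("utf-16")` produces. `encoded_control` is a hypothetical helper.

```python
def encoded_control(char: str, encoding: str) -> bytes:
    """Bytes for `char` in `encoding`, without any BOM the codec may prepend."""
    bom = "".encode(encoding)  # b"" for most codecs; a BOM for utf-16, utf-32, utf-8-sig
    encoded = char.encode(encoding)
    return encoded[len(bom):] if encoded.startswith(bom) else encoded
```

This gives `b"\n\x00"` for `"utf-16-le"` and a non-0x0A byte for EBCDIC code pages such as `"cp037"`. Knowing the bytes is not the whole story, though: for UTF-16 and UTF-32, a plain byte search can also match at misaligned offsets.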

Re: tail

2022-05-07 Thread Marco Sulla
On Sat, 7 May 2022 at 19:02, MRAB wrote: > > On 2022-05-07 17:28, Marco Sulla wrote: > > On Sat, 7 May 2022 at 16:08, Barry wrote: > >> You need to handle the file in bin mode and do the handling of line > >> endings and encodings yourself. It’s not that hard for the cases you > >> wanted. > >

Re: tail

2022-05-07 Thread MRAB
On 2022-05-07 17:28, Marco Sulla wrote: On Sat, 7 May 2022 at 16:08, Barry wrote: You need to handle the file in bin mode and do the handling of line endings and encodings yourself. It’s not that hard for the cases you wanted. "\n".encode("utf-16") b'\xff\xfe\n\x00' "".encode("utf-16") b

Re: tail

2022-05-07 Thread Dan Stromberg
I believe I'd do something like: #!/usr/local/cpython-3.10/bin/python3 """ Output the last 10 lines of a potentially-huge file. O(n). But technically so is scanning backward from the EOF. It'd be faster to use a dict, but this has the advantage of working for huge num_lines. """ import d
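Dan's code is cut off here, so it is not reproduced. The sketch below shows the same single-pass idea under that caveat. A `collections.deque` with `maxlen=n` keeps only the most recent n lines while the file is read forward. Decoding and newline handling stay in Python's text layer, so any encoding works. The cost is reading the whole file. `tail_forward` is a hypothetical name.

```python
from collections import deque


def tail_forward(path: str, n: int = 10, encoding: str = "utf-8") -> list[str]:
    """Return the last n lines (with their newlines) by reading the file forward."""
    with open(path, encoding=encoding) as f:
        # The deque discards older lines as new ones arrive, so memory is O(n).
        return list(deque(f, maxlen=n))
```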

Re: tail

2022-05-07 Thread Marco Sulla
On Sat, 7 May 2022 at 16:08, Barry wrote: > You need to handle the file in bin mode and do the handling of line endings > and encodings yourself. It’s not that hard for the cases you wanted. >>> "\n".encode("utf-16") b'\xff\xfe\n\x00' >>> "".encode("utf-16") b'\xff\xfe' >>> "a\nb".encode("utf-16

Re: tail

2022-05-07 Thread Barry
> On 7 May 2022, at 14:24, Marco Sulla wrote: > > On Sat, 7 May 2022 at 01:03, Dennis Lee Bieber wrote: >> >>Windows also uses <cr><lf> for the EOL marker, but Python's I/O system >> condenses that to just <lf> internally (for TEXT mode) -- so using the >> length of a string so read to compute a

Re: tail

2022-05-07 Thread Avi Gross via Python-list
general purpose tool, internationalization from ASCII has created a challenge for lots of such tools. -Original Message- From: Marco Sulla To: Dennis Lee Bieber Cc: python-list@python.org Sent: Sat, May 7, 2022 9:21 am Subject: Re: tail On Sat, 7 May 2022 at 01:03, Dennis Lee Bieber wrote: > >

Re: tail

2022-05-07 Thread Marco Sulla
On Sat, 7 May 2022 at 01:03, Dennis Lee Bieber wrote: > > Windows also uses <cr><lf> for the EOL marker, but Python's I/O system > condenses that to just <lf> internally (for TEXT mode) -- so using the > length of a string so read to compute a file position may be off-by-one for > each EOL in the stri
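Dennis's point can be reproduced on any platform. In the default universal-newlines mode (`newline=None`), each `\r\n` read from a file becomes a single `\n`. Character counts then stop matching byte offsets. A minimal sketch with a hypothetical helper:

```python
import os
import tempfile


def text_and_byte_lengths(data: bytes, encoding: str = "ascii") -> tuple[int, int]:
    """Write `data` to a temp file, then return (len of text read back, file size)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open(path, encoding=encoding) as f:  # newline=None: \r\n -> \n
            text = f.read()
        return len(text), os.path.getsize(path)
    finally:
        os.remove(path)


print(text_and_byte_lengths(b"one\r\ntwo\r\n"))  # (8, 10)
```

Passing `newline=""` to `open()` disables the translation, at the price of handling `\r` yourself.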

Re: tail

2022-05-06 Thread Dennis Lee Bieber
On Fri, 6 May 2022 21:19:48 +0100, MRAB declaimed the following: >Is the file UTF-8? That's a variable-width encoding, so are any of the >characters > U+007F? > >Which OS? On Windows, it's common/normal for UTF-8 files to start with a >BOM/signature, which is 3 bytes/1 codepoint. Windo

Re: tail

2022-05-06 Thread MRAB
On 2022-05-06 20:21, Marco Sulla wrote: I have a little problem. I tried to extend the tail function, so it can read lines from the bottom of a file object opened in text mode. The problem is it does not work. It gets a starting position that is lower than the expected by 3 characters. So the f
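If MRAB's guess about a UTF-8 signature is right, the arithmetic fits the symptom. The signature takes 3 bytes, the `utf-8-sig` codec drops it entirely, and plain `utf-8` keeps it as one `"\ufeff"` character. Either way, offsets computed from the decoded text drift from byte offsets. A small check (helper name hypothetical):

```python
import codecs


def utf8_bom_length(data: bytes) -> int:
    """Number of leading bytes taken by a UTF-8 BOM/signature (0 or 3)."""
    return len(codecs.BOM_UTF8) if data.startswith(codecs.BOM_UTF8) else 0


data = "abc\n".encode("utf-8-sig")    # b'\xef\xbb\xbfabc\n', 7 bytes
print(utf8_bom_length(data))           # 3
print(len(data.decode("utf-8-sig")))   # 4: the signature is stripped
print(len(data.decode("utf-8")))       # 5: the signature survives as "\ufeff"
```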

Re: tail

2022-05-06 Thread Marco Sulla
I have a little problem. I tried to extend the tail function, so it can read lines from the bottom of a file object opened in text mode. The problem is it does not work. It gets a starting position that is lower than the expected by 3 characters. So the first line is read only for 2 chars, and th

Re: tail

2022-05-02 Thread Marco Sulla
On Mon, 2 May 2022 at 00:20, Cameron Simpson wrote: > > On 01May2022 18:55, Marco Sulla wrote: > >Something like this is OK? > [...] > >def tail(f): > >chunk_size = 100 > >size = os.stat(f.fileno()).st_size > > I think you want os.fstat(). It's the same from py 3.3 > >chunk_line_pos

Re: tail

2022-05-02 Thread Marco Sulla
Ok, I suppose \n and \r are enough: readline(size=- 1, /) Read and return one line from the stream. If size is specified, at most size bytes will be read. The line terminator is always b'\n' for binary files; for text files, the newline argument to open() can be used to select the line

Re: tail

2022-05-02 Thread Chris Angelico
On Tue, 3 May 2022 at 04:38, Marco Sulla wrote: > > On Mon, 2 May 2022 at 18:31, Stefan Ram wrote: > > > > |The Unicode standard defines a number of characters that > > |conforming applications should recognize as line terminators:[7] > > | > > |LF:Line Feed, U+000A > > |VT:Vertical Tab,

Re: tail

2022-05-02 Thread Marco Sulla
On Mon, 2 May 2022 at 18:31, Stefan Ram wrote: > > |The Unicode standard defines a number of characters that > |conforming applications should recognize as line terminators:[7] > | > |LF:Line Feed, U+000A > |VT:Vertical Tab, U+000B > |FF:Form Feed, U+000C > |CR:Carriage Return, U+0
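Python already distinguishes the two notions of "line" in this exchange. `str.splitlines()` treats VT, FF, NEL, LS, PS and a few other characters as boundaries. A text stream's `readline()` splits only according to its `newline` setting; for files opened with the default `newline=None`, that means `\n`, `\r` and `\r\n`. A short illustration:

```python
import io

text = "a\x0bb\x0cc\x85d\u2028e\u2029f\n"

# str.splitlines() uses the wider Unicode set of line boundaries.
print(text.splitlines())                    # ['a', 'b', 'c', 'd', 'e', 'f']

# io.StringIO defaults to newline="\n", so readlines() sees a single line.
print(len(io.StringIO(text).readlines()))   # 1
```

A backwards `tail` therefore has to choose which definition it mirrors.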

Re: tail

2022-05-01 Thread Chris Angelico
On Mon, 2 May 2022 at 11:54, Cameron Simpson wrote: > > On 01May2022 23:30, Stefan Ram wrote: > >Dan Stromberg writes: > >>But what about Unicode? Are all 10 bytes newlines in Unicode encodings? > > It seems in UTF-8, when a value is above U+007F, it will be > > encoded with bytes that always

Re: tail

2022-05-01 Thread Cameron Simpson
On 01May2022 23:30, Stefan Ram wrote: >Dan Stromberg writes: >>But what about Unicode? Are all 10 bytes newlines in Unicode encodings? > It seems in UTF-8, when a value is above U+007F, it will be > encoded with bytes that always have their high bit set. Aye. Design feature enabling easy resy
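The UTF-8 property Stefan and Cameron describe can be verified exhaustively. Every code point above U+007F encodes only to bytes 0x80 to 0xFF, so a 0x0A byte in UTF-8 data is always a real line feed. The brute-force check below skips the surrogate range, which UTF-8 cannot encode. It takes a second or so.

```python
def non_ascii_utf8_bytes_all_high() -> bool:
    """Check every encodable code point above U+007F for bytes below 0x80."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:  # lone surrogates raise UnicodeEncodeError
            continue
        if min(chr(cp).encode("utf-8")) < 0x80:
            return False
    return True
```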

Re: tail

2022-05-01 Thread Chris Angelico
On Mon, 2 May 2022 at 09:19, Dan Stromberg wrote: > > On Sun, May 1, 2022 at 3:19 PM Cameron Simpson wrote: > > > On 01May2022 18:55, Marco Sulla wrote: > > >Something like this is OK? > > > > Scanning backward for a byte == 10 in ASCII or ISO-8859 seems fine. > > But what about Unicode? Are al

Re: tail

2022-05-01 Thread Dan Stromberg
On Sun, May 1, 2022 at 3:19 PM Cameron Simpson wrote: > On 01May2022 18:55, Marco Sulla wrote: > >Something like this is OK? > Scanning backward for a byte == 10 in ASCII or ISO-8859 seems fine. But what about Unicode? Are all 10 bytes newlines in Unicode encodings? If not, and you have a hu
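For Dan's question, UTF-16 gives a concrete "no". The character U+010A (LATIN CAPITAL LETTER C WITH DOT ABOVE) contains no line break, yet its UTF-16-LE encoding starts with byte 0x0A. A byte-level scan for `b"\n"` would split it. The helper name is hypothetical.

```python
def has_lf_byte_without_newline(text: str, encoding: str) -> bool:
    """True if `text` has no newline character yet its encoding contains 0x0A."""
    return "\n" not in text and b"\n" in text.encode(encoding)


ch = "\u010a"
print(ch.encode("utf-8"))      # b'\xc4\x8a'
print(ch.encode("utf-16-le"))  # b'\n\x01'
```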

Re: tail

2022-05-01 Thread Cameron Simpson
On 01May2022 18:55, Marco Sulla wrote: >Something like this is OK? [...] >def tail(f): >chunk_size = 100 >size = os.stat(f.fileno()).st_size I think you want os.fstat(). >positions = iter(range(size, -1, -chunk_size)) >next(positions) I was wondering about the iter, but this mak

Re: tail

2022-05-01 Thread Marco Sulla
Something like this is OK? import os def tail(f): chunk_size = 100 size = os.stat(f.fileno()).st_size positions = iter(range(size, -1, -chunk_size)) next(positions) chunk_line_pos = -1 pos = 0 for pos in positions: f.seek(pos) chars = f.read(chunk_si

Re: tail

2022-04-25 Thread dn
On 26/04/2022 10.54, Cameron Simpson wrote: > On 25Apr2022 08:08, DL Neil wrote: >> Thus, the observation that the OP may find that a serial, >> read-the-entire-file approach is faster in some situations (relatively >> short files). Conversely, with longer files, some sort of 'last chunk' >> appro

Re: tail

2022-04-25 Thread Cameron Simpson
On 25Apr2022 08:08, DL Neil wrote: >Thus, the observation that the OP may find that a serial, >read-the-entire-file approach is faster in some situations (relatively >short files). Conversely, with longer files, some sort of 'last chunk' >approach would be superior. If you make the chunk big enou

Re: tail

2022-04-24 Thread dn
On 25/04/2022 04.21, pjfarl...@earthlink.net wrote: >> -Original Message- >> From: dn >> Sent: Saturday, April 23, 2022 6:05 PM >> To: python-list@python.org >> Subject: Re: tail >> > >> NB quite a few of IBM's (extensively researched) a

Re: tail

2022-04-24 Thread Dennis Lee Bieber
On Sun, 24 Apr 2022 12:21:36 -0400, declaimed the following: > >WRT the mentioned IBM utility program[me]s, the non-Posix part of the IBM >mainframe file system has always provided record-managed storage since the >late 1960's (as opposed to the byte-managed storage of *ix systems) so >searchi

RE: tail

2022-04-24 Thread pjfarley3
> -Original Message- > From: dn > Sent: Saturday, April 23, 2022 6:05 PM > To: python-list@python.org > Subject: Re: tail > > NB quite a few of IBM's (extensively researched) algorithms which formed > utility > program[me]s on mainframes, made similar

Re: tail

2022-04-24 Thread Marco Sulla
On Sun, 24 Apr 2022 at 11:21, Roel Schroeven wrote: > dn wrote on 24/04/2022 at 0:04: > > Disagreeing with @Chris in the sense that I use tail very frequently, > > and usually in the context of server logs - but I'm talking about the > > Linux implementation, not Python code! > If I understand

Re: tail

2022-04-24 Thread Chris Angelico
On Mon, 25 Apr 2022 at 01:47, Marco Sulla wrote: > > > > On Sat, 23 Apr 2022 at 23:18, Chris Angelico wrote: >> >> Ah. Well, then, THAT is why it's inefficient: you're seeking back one >> single byte at a time, then reading forwards. That is NOT going to >> play nicely with file systems or buffer

Re: tail

2022-04-24 Thread Marco Sulla
On Sun, 24 Apr 2022 at 00:19, Cameron Simpson wrote: > An approach I think you both may have missed: mmap the file and use > mmap.rfind(b'\n') to locate line delimiters. > https://docs.python.org/3/library/mmap.html#mmap.mmap.rfind > Ah, I played very little with mmap, I didn't know about this.
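A sketch of Cameron's mmap suggestion, under the same assumption as any byte-level scan: an ASCII-compatible encoding such as UTF-8. `tail_mmap` is a hypothetical name. `mmap.mmap.rfind(sub, start, end)` is the standard-library method linked in the message.

```python
import mmap
import os


def tail_mmap(path: str, n: int = 10) -> list[bytes]:
    """Return the last n lines of a file as bytes, located with mmap.rfind()."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return []  # an empty file cannot be memory-mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            end = size
            if mm[end - 1] == 0x0A:  # indexing an mmap yields an int
                end -= 1  # a trailing newline ends the last line
            cut = end
            for _ in range(n):
                cut = mm.rfind(b"\n", 0, cut)
                if cut == -1:
                    break  # fewer than n lines: start from the beginning
            return mm[cut + 1:end].split(b"\n")
```

The OS pages in only the parts of the file that `rfind` touches, so the cost is close to that of reading the final chunk.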

Re: tail

2022-04-24 Thread Marco Sulla
On Sat, 23 Apr 2022 at 23:18, Chris Angelico wrote: > Ah. Well, then, THAT is why it's inefficient: you're seeking back one > single byte at a time, then reading forwards. That is NOT going to > play nicely with file systems or buffers. > > Compare reading line by line over the file with readline

Re: tail

2022-04-24 Thread Avi Gross via Python-list
thon-list@python.org Sent: Sun, Apr 24, 2022 5:19 am Subject: Re: tail dn wrote on 24/04/2022 at 0:04: > Disagreeing with @Chris in the sense that I use tail very frequently, > and usually in the context of server logs - but I'm talking about the > Linux implementation, not Python co

Re: tail

2022-04-24 Thread Chris Angelico
On Sun, 24 Apr 2022 at 21:11, Antoon Pardon wrote: > > > > On 23/04/2022 at 20:57, Chris Angelico wrote: > > On Sun, 24 Apr 2022 at 04:37, Marco Sulla > > wrote: > >> What about introducing a method for text streams that reads the lines > >> from the bottom? Java has also a ReversedLinesFileRea

Re: tail

2022-04-24 Thread Antoon Pardon
On 23/04/2022 at 20:57, Chris Angelico wrote: On Sun, 24 Apr 2022 at 04:37, Marco Sulla wrote: What about introducing a method for text streams that reads the lines from the bottom? Java has also a ReversedLinesFileReader with Apache Commons IO. 1) Read the entire file and decode bytes to

Re: tail

2022-04-24 Thread Roel Schroeven
dn wrote on 24/04/2022 at 0:04: Disagreeing with @Chris in the sense that I use tail very frequently, and usually in the context of server logs - but I'm talking about the Linux implementation, not Python code! If I understand Marco correctly, what he wants is to read the lines from bottom to t

Re: tail

2022-04-23 Thread Chris Angelico
On Sun, 24 Apr 2022 at 10:04, Cameron Simpson wrote: > > On 24Apr2022 08:21, Chris Angelico wrote: > >On Sun, 24 Apr 2022 at 08:18, Cameron Simpson wrote: > >> An approach I think you both may have missed: mmap the file and use > >> mmap.rfind(b'\n') to locate line delimiters. > >> https://docs.

Re: tail

2022-04-23 Thread Cameron Simpson
On 24Apr2022 08:21, Chris Angelico wrote: >On Sun, 24 Apr 2022 at 08:18, Cameron Simpson wrote: >> An approach I think you both may have missed: mmap the file and use >> mmap.rfind(b'\n') to locate line delimiters. >> https://docs.python.org/3/library/mmap.html#mmap.mmap.rfind > >Yeah, I made a v

Re: tail

2022-04-23 Thread Chris Angelico
On Sun, 24 Apr 2022 at 08:18, Cameron Simpson wrote: > > On 24Apr2022 07:15, Chris Angelico wrote: > >On Sun, 24 Apr 2022 at 07:13, Marco Sulla > >wrote: > >> Emh, why chunks? My function simply reads byte per byte and compares > >> it to b"\n". When it finds it, it stops and does a readline(): >

Re: tail

2022-04-23 Thread Chris Angelico
On Sun, 24 Apr 2022 at 08:06, dn wrote: > > On 24/04/2022 09.15, Chris Angelico wrote: > > On Sun, 24 Apr 2022 at 07:13, Marco Sulla > > wrote: > >> > >> On Sat, 23 Apr 2022 at 23:00, Chris Angelico wrote: > > This is quite inefficient in general. > > Why inefficient? I think that

Re: tail

2022-04-23 Thread Cameron Simpson
On 24Apr2022 07:15, Chris Angelico wrote: >On Sun, 24 Apr 2022 at 07:13, Marco Sulla wrote: >> Emh, why chunks? My function simply reads byte per byte and compares >> it to b"\n". When it finds it, it stops and does a readline(): [...] >> This is only for one line and in utf8, but it can be general

Re: tail

2022-04-23 Thread Chris Angelico
On Sun, 24 Apr 2022 at 08:03, Peter J. Holzer wrote: > > On 2022-04-24 04:57:20 +1000, Chris Angelico wrote: > > On Sun, 24 Apr 2022 at 04:37, Marco Sulla > > wrote: > > > What about introducing a method for text streams that reads the lines > > > from the bottom? Java has also a ReversedLinesFi

Re: tail

2022-04-23 Thread dn
On 24/04/2022 09.15, Chris Angelico wrote: > On Sun, 24 Apr 2022 at 07:13, Marco Sulla > wrote: >> >> On Sat, 23 Apr 2022 at 23:00, Chris Angelico wrote: > This is quite inefficient in general. Why inefficient? I think that readlines() will be much slower, not only more time c

Re: tail

2022-04-23 Thread Peter J. Holzer
On 2022-04-24 04:57:20 +1000, Chris Angelico wrote: > On Sun, 24 Apr 2022 at 04:37, Marco Sulla > wrote: > > What about introducing a method for text streams that reads the lines > > from the bottom? Java has also a ReversedLinesFileReader with Apache > > Commons IO. > > It's fundamentally diffi

Re: tail

2022-04-23 Thread Chris Angelico
On Sun, 24 Apr 2022 at 07:13, Marco Sulla wrote: > > On Sat, 23 Apr 2022 at 23:00, Chris Angelico wrote: > > > > This is quite inefficient in general. > > > > > > Why inefficient? I think that readlines() will be much slower, not > > > only more time consuming. > > > > It depends on which is more

Re: tail

2022-04-23 Thread Marco Sulla
On Sat, 23 Apr 2022 at 23:00, Chris Angelico wrote: > > > This is quite inefficient in general. > > > > Why inefficient? I think that readlines() will be much slower, not > > only more time consuming. > > It depends on which is more costly: reading the whole file (cost > depends on size of file) o

Re: tail

2022-04-23 Thread Chris Angelico
On Sun, 24 Apr 2022 at 06:41, Marco Sulla wrote: > > On Sat, 23 Apr 2022 at 20:59, Chris Angelico wrote: > > > > On Sun, 24 Apr 2022 at 04:37, Marco Sulla > > wrote: > > > > > > What about introducing a method for text streams that reads the lines > > > from the bottom? Java has also a Reversed

Re: tail

2022-04-23 Thread Marco Sulla
On Sat, 23 Apr 2022 at 20:59, Chris Angelico wrote: > > On Sun, 24 Apr 2022 at 04:37, Marco Sulla > wrote: > > > > What about introducing a method for text streams that reads the lines > > from the bottom? Java has also a ReversedLinesFileReader with Apache > > Commons IO. > > It's fundamentally

Re: tail

2022-04-23 Thread Chris Angelico
On Sun, 24 Apr 2022 at 04:37, Marco Sulla wrote: > > What about introducing a method for text streams that reads the lines > from the bottom? Java has also a ReversedLinesFileReader with Apache > Commons IO. It's fundamentally difficult to get precise. In general, there are three steps to reading

Re: Tail recursion to while iteration in 2 easy steps

2013-10-09 Thread Charles Hixson
On 10/08/2013 02:22 AM, Steven D'Aprano wrote: On Mon, 07 Oct 2013 20:27:13 -0700, Mark Janssen wrote: But even putting that aside, even if somebody wrote such a description, it would be reductionism gone mad. What possible light on the problem would be shined by a long, long list of machine co

Re: Tail recursion to while iteration in 2 easy steps

2013-10-08 Thread Jussi Piitulainen
Alain Ketterlin writes: > Antoon Pardon writes: > > > On 07-10-13 19:15, Alain Ketterlin wrote: > > [...] > >> That's fine. My point was: you can't at the same time have full > >> dynamicity *and* procedural optimizations (like tail call opt). > >> Everybody should be clear about the trade-off.

Re: Tail recursion to while iteration in 2 easy steps

2013-10-08 Thread Antoon Pardon
On 07-10-13 23:27, random...@fastmail.us wrote: > On Sat, Oct 5, 2013, at 3:39, Antoon Pardon wrote: >> What does this mean? >> >> Does it mean that a naive implementation would arbitrarily mess up >> stack traces and he wasn't interested in investigating more >> sophisticated implementations? >>

Re: Tail recursion to while iteration in 2 easy steps

2013-10-08 Thread Steven D'Aprano
On Mon, 07 Oct 2013 20:27:13 -0700, Mark Janssen wrote: But even putting that aside, even if somebody wrote such a description, it would be reductionism gone mad. What possible light on the problem would be shined by a long, long list of machine code operations, even if written

Re: Tail recursion to while iteration in 2 easy steps

2013-10-08 Thread Antoon Pardon
On 08-10-13 01:50, Steven D'Aprano wrote: > On Mon, 07 Oct 2013 15:47:26 -0700, Mark Janssen wrote: > >> I challenge you to get >> down to the machine code in scheme and formally describe how it's doing >> both. > > For which machine? > > Or are you assuming that there's only one machine code

Re: Tail recursion to while iteration in 2 easy steps

2013-10-08 Thread Alain Ketterlin
Antoon Pardon writes: > On 07-10-13 19:15, Alain Ketterlin wrote: [...] >> That's fine. My point was: you can't at the same time have full >> dynamicity *and* procedural optimizations (like tail call opt). >> Everybody should be clear about the trade-off. > > You're wrong. Full dynamics is not i
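The trade-off Alain describes fits in a few lines of Python. The sketch first converts an accumulator-style tail-recursive factorial into a `while` loop by rebinding the parameters instead of recursing. It then shows why an implementation cannot do that rewrite blindly: the "recursive" call is a global name lookup at call time, and rebinding the name changes what the call does. This is a generic illustration with made-up function names, not the two steps from the original post.

```python
def fact_rec(n: int, acc: int = 1) -> int:
    """Tail-recursive factorial: the recursive call is the last thing done."""
    if n <= 1:
        return acc
    return fact_rec(n - 1, acc * n)


def fact_loop(n: int, acc: int = 1) -> int:
    """Same computation, with the tail call replaced by parameter rebinding."""
    while n > 1:
        n, acc = n - 1, acc * n
    return acc


def call_after_rebinding() -> int:
    """Rebind the global name fact_rec, call the original function, then restore."""
    global fact_rec
    original = fact_rec
    fact_rec = lambda n, acc=1: acc  # hypothetical replacement
    try:
        return original(5)  # its "self" call now reaches the lambda
    finally:
        fact_rec = original


print(fact_rec(5), fact_loop(5), call_after_rebinding())  # 120 120 5
```

The loop version also sidesteps CPython's recursion limit, which the recursive version hits for large `n`.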
