program to generate data helpful in finding duplicate large files

2014-09-18 Thread David Alban
greetings, i'm a long time perl programmer who is learning python. i'd be interested in any comments you might have on my code below. feel free to respond privately if you prefer. i'd like to know if i'm on the right track. the program works, and does what i want it to do. is there a different…
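David's full program isn't reproduced in the archive, but the thread describes its shape: walk files and emit one record per file (hostname, md5, device, inode, link count, size, path) for later duplicate detection. A minimal Python 3 sketch of that idea — the function name `file_record` and the 64 KiB chunk size are illustrative choices, not David's code:

```python
import hashlib
import os
import socket
import stat

def file_record(path):
    """Return one (host, md5, dev, ino, nlink, size, path) tuple, or None."""
    st = os.lstat(path)
    if not stat.S_ISREG(st.st_mode):   # skip directories, symlinks, devices
        return None
    h = hashlib.md5()
    with open(path, "rb") as f:        # binary mode: hash the exact on-disk bytes
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return (socket.gethostname(), h.hexdigest(),
            st.st_dev, st.st_ino, st.st_nlink, st.st_size, path)
```

Sorting such records by (md5, size) then makes duplicate large files easy to spot.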

Re: program to generate data helpful in finding duplicate large files

2014-09-18 Thread Chris Kaynor
On Thu, Sep 18, 2014 at 11:11 AM, David Alban wrote: > *#!/usr/bin/python* > > *import argparse* > *import hashlib* > *import os* > *import re* > *import socket* > *import sys* > > *from stat import ** > Generally, from import * imports are discouraged as they tend to populate your namespace and…
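Chris Kaynor's point — prefer importing the module over `from stat import *` — looks like this in practice (a small illustration, not code from the thread):

```python
import os
import stat   # import the module itself rather than "from stat import *"

# qualified names (stat.S_ISDIR) keep the namespace clean and make it
# obvious where each helper comes from
mode = os.stat(os.getcwd()).st_mode
print(stat.S_ISDIR(mode))   # the working directory is a directory
print(stat.S_ISREG(mode))   # ...and not a regular file
```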

Re: program to generate data helpful in finding duplicate large files

2014-09-18 Thread Chris Angelico
On Fri, Sep 19, 2014 at 4:11 AM, David Alban wrote: > i'm a long time perl programmer who is learning python. i'd be interested > in any comments you might have on my code below. feel free to respond > privately if you prefer. i'd like to know if i'm on the right track. Sure! Happy to help out

Re: program to generate data helpful in finding duplicate large files

2014-09-18 Thread Chris Angelico
On Fri, Sep 19, 2014 at 4:45 AM, Chris Kaynor wrote: >> from stat import * > > > Generally, from import * imports are discouraged as they tend to populate > your namespace and have issues with accidentally overriding imported > functions/variables. Generally, its more Pythonic to use the other imp…

Re: program to generate data helpful in finding duplicate large files

2014-09-18 Thread Peter Otten
David Alban wrote: > *sep = ascii_nul* > > *print "%s%c%s%c%d%c%d%c%d%c%d%c%s" % ( thishost, sep, md5sum, sep, > dev, sep, ino, sep, nlink, sep, size, sep, file_path )* file_path may contain newlines, therefore you should probably use "\0" to separate the records. The other fields may n…
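Peter Otten's point is that a file name may itself contain a newline, so newline-delimited records can be split in the wrong place, while a NUL byte can never appear in a Unix path. A small Python 3 demonstration with hypothetical field values:

```python
# hypothetical field values; note the newline embedded in the file name
thishost = "examplehost"
md5sum = "d41d8cd98f00b204e9800998ecf8427e"
size = 1234
file_path = "odd\nname.txt"

# a "\n" separator would split this record inside the file name;
# "\0" is safe because a NUL byte cannot occur in a Unix path
record = "\0".join([thishost, md5sum, str(size), file_path])
fields = record.split("\0")
print(fields[-1])   # the path, embedded newline and all, survives as one field
```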

Re: program to generate data helpful in finding duplicate large files

2014-09-18 Thread Gregory Ewing
Chris Angelico wrote: On Fri, Sep 19, 2014 at 4:45 AM, Chris Kaynor wrote: from stat import * I was going to say the same thing, except that this module specifically is documented as recommending that. I still don't like "import *", but either this is a special case, or the docs need to be c…

Re: program to generate data helpful in finding duplicate large files

2014-09-18 Thread Steven D'Aprano
David Alban wrote: > *#!/usr/bin/python* > > *import argparse* > *import hashlib* > *import os* > *import re* > *import socket* > *import sys* Um, how did you end up with leading and trailing asterisks? That's going to stop your code from running. > *from stat import ** "import *" is slightly…

Re: program to generate data helpful in finding duplicate large files

2014-09-18 Thread Chris Angelico
On Fri, Sep 19, 2014 at 3:45 PM, Steven D'Aprano wrote: > David Alban wrote: >> *import sys* > > Um, how did you end up with leading and trailing asterisks? That's going to > stop your code from running. They're not part of the code, they're part of the mangling of the formatting. So this isn't a…

Re: program to generate data helpful in finding duplicate large files

2014-09-19 Thread Steven D'Aprano
Chris Angelico wrote: > On Fri, Sep 19, 2014 at 3:45 PM, Steven D'Aprano > wrote: >> s = '\0'.join([thishost, md5sum, dev, ino, nlink, size, file_path]) >> print s > > That won't work on its own; several of the values are integers. Ah, so they are! > So > either they need to be str()…
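Chris's objection — `str.join` only accepts strings, so the integer fields must be converted first — can be shown in a few lines (the field values here are placeholders, not David's data):

```python
# hypothetical record: host, md5 placeholder, dev, ino, nlink, size, path
fields = ["examplehost", "md5placeholder", 2049, 131075, 1, 4096, "/tmp/demo"]

try:
    "\0".join(fields)                 # fails: join() refuses non-string items
except TypeError as exc:
    print("join refused:", exc)

record = "\0".join(str(f) for f in fields)   # str() each field first
print(record.count("\0"))                    # 6 separators for 7 fields
```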

Re: program to generate data helpful in finding duplicate large files

2014-09-19 Thread Chris Angelico
On Fri, Sep 19, 2014 at 9:04 PM, Steven D'Aprano wrote: >> Hmm, you sure exit won't work? > > In the interactive interpreter, exit is bound to a special helper object: > > py> exit > Use exit() or Ctrl-D (i.e. EOF) to exit > > Otherwise, you'll get NameError. It's not the interactive interpreter…

Fwd: program to generate data helpful in finding duplicate large files

2014-09-19 Thread David Alban
here is my reworked code in a plain text email. -- Forwarded message -- From: Date: Thu, Sep 18, 2014 at 3:58 PM Subject: Re: program to generate data helpful in finding duplicate large files To: python-list@python.org thanks for the responses. i'm having quite a good time learning python…

Re: program to generate data helpful in finding duplicate large files

2014-09-19 Thread Chris Angelico
On Fri, Sep 19, 2014 at 11:32 PM, David Alban wrote: > thanks for the responses. i'm having quite a good time learning python. Awesome! But while you're at it, you may want to consider learning English on the side; capitalization does make your prose more readable. Also, it makes you look careless…

Re: program to generate data helpful in finding duplicate large files

2014-09-19 Thread Steven D'Aprano
Chris Angelico wrote: > On Fri, Sep 19, 2014 at 11:32 PM, David Alban wrote: >> thanks for the responses. i'm having quite a good time learning python. > > Awesome! But while you're at it, you may want to consider learning > English on the side; capitalization does make your prose more > readable…

Re: program to generate data helpful in finding duplicate large files

2014-09-19 Thread Chris Angelico
On Sat, Sep 20, 2014 at 2:22 AM, Steven D'Aprano wrote: > I heard one of them mention that even though he sees the words are > misspelled, he deliberately doesn't bother fixing them because its not > important. I guess he just liked the look of his text having highlighted > words scattered throughout…

Re: program to generate data helpful in finding duplicate large files

2014-09-19 Thread Ian Kelly
On Fri, Sep 19, 2014 at 12:45 AM, Chris Angelico wrote: > On Fri, Sep 19, 2014 at 3:45 PM, Steven D'Aprano >> s = '\0'.join([thishost, md5sum, dev, ino, nlink, size, file_path]) >> print s > > That won't work on its own; several of the values are integers. So > either they need to be str()…

Re: program to generate data helpful in finding duplicate large files

2014-09-19 Thread Chris Angelico
On Sat, Sep 20, 2014 at 3:20 AM, Ian Kelly wrote: > On Fri, Sep 19, 2014 at 12:45 AM, Chris Angelico wrote: >> On Fri, Sep 19, 2014 at 3:45 PM, Steven D'Aprano >>> s = '\0'.join([thishost, md5sum, dev, ino, nlink, size, file_path]) >>> print s >> >> That won't work on its own; several of the values are integers…

Re: program to generate data helpful in finding duplicate large files

2014-09-19 Thread Steven D'Aprano
Chris Angelico wrote: > On Fri, Sep 19, 2014 at 9:04 PM, Steven D'Aprano > wrote: >>> Hmm, you sure exit won't work? >> >> In the interactive interpreter, exit is bound to a special helper object: >> >> py> exit >> Use exit() or Ctrl-D (i.e. EOF) to exit >> >> Otherwise, you'll get NameError. …
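Steven's point is that the bare name `exit` is a convenience object installed by site.py for interactive sessions and is not guaranteed in scripts (for example under `python -S`), whereas `sys.exit()` is always defined and simply raises `SystemExit`. A small sketch showing that behaviour without actually leaving the process:

```python
import sys

# sys.exit() raises SystemExit, which the interpreter turns into the
# process exit status; catching it here lets us inspect the code
try:
    sys.exit(2)
except SystemExit as exc:
    print("script would have exited with status", exc.code)
```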

Re: program to generate data helpful in finding duplicate large files

2014-09-19 Thread Cameron Simpson
On 19Sep2014 23:59, Chris Angelico wrote: On Fri, Sep 19, 2014 at 11:32 PM, David Alban wrote: if you omit the exit statement in this example, and $report_mode is not set, your shell program will give a non-zero return code and appear to have terminated with an error. in shell the last exp…

Re: program to generate data helpful in finding duplicate large files

2014-09-19 Thread Cameron Simpson
On 20Sep2014 02:22, Steven D'Aprano wrote: [...] I used to work with programmers whose spelling is awful. [...] nevertheless their commit messages and documentation was full of things like "make teh function reqire a posative index". [...] I heard one of them mention that even though he sees the…

Re: program to generate data helpful in finding duplicate large files

2014-09-19 Thread Chris Angelico
On Sat, Sep 20, 2014 at 9:33 AM, Steven D'Aprano wrote: > It's a bad idea to rely on features added to site.py, since they aren't > necessarily going to be available at all sites or in all implementations: > > steve@orac:/home/steve$ ipy > IronPython 2.6 Beta 2 DEBUG (2.6.0.20) on .NET 2.0.50727.1…

Re: program to generate data helpful in finding duplicate large files

2014-09-19 Thread Chris Angelico
On Sat, Sep 20, 2014 at 10:27 AM, Cameron Simpson wrote: > IMO, it is good that the shell is like that. It isn't Python. > > A great many small shell scripts are one liner wrappers, and this serves > them well. A great many more are a lot of prep work followed by a major (and > final) command. The…

Re: program to generate data helpful in finding duplicate large files

2014-09-19 Thread Ben Finney
Steven D'Aprano writes: > I heard one [programmer] mention that even though he sees the words > are misspelled, he deliberately doesn't bother fixing them because its > not important. I guess he just liked the look of his text having > highlighted words scattered throughout the editor. If it's w…

Re: program to generate data helpful in finding duplicate large files

2014-09-21 Thread bizcor
thanks for the responses. i'm having quite a good time learning python. On Thu, Sep 18, 2014 at 11:45 AM, Chris Kaynor wrote: > > Additionally, you may want to specify binary mode by using open(file_path, > 'rb') to ensure platform-independence ('r' uses Universal newlines, which > means on Windows…

Re: program to generate data helpful in finding duplicate large files

2014-09-22 Thread random832
On Thu, Sep 18, 2014, at 14:45, Chris Kaynor wrote: > Additionally, you may want to specify binary mode by using > open(file_path, > 'rb') to ensure platform-independence ('r' uses Universal newlines, which > means on Windows, Python will convert "\r\n" to "\n" while reading the > file). Additional…

Re: program to generate data helpful in finding duplicate large files

2014-09-22 Thread Chris Kaynor
I went and looked up the PEPs regarding universal new-lines, and it seems it would be platform-independent - all of "\r\n", "\r", and "\n" will always be converted to "\n" in Python, unless explicitly modified on the file object (or Universal newlines are disabled). It still stands that for platform…
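The practical consequence for a file-hashing program is easy to demonstrate under Python 3, where text mode applies universal-newline translation on every platform: reading the same file in `'r'` and `'rb'` mode can yield different bytes, and therefore different digests. A sketch using a temporary file:

```python
import hashlib
import os
import tempfile

data = b"line one\r\nline two\r\n"
fd, path = tempfile.mkstemp()
os.write(fd, data)
os.close(fd)

with open(path, "rb") as f:    # binary mode: bytes come back exactly as written
    raw = f.read()
with open(path, "r") as f:     # text mode: universal newlines turn "\r\n" into "\n"
    translated = f.read()

print(raw == data)                                       # True
print(translated == "line one\nline two\n")              # True
print(hashlib.md5(raw).hexdigest()
      == hashlib.md5(translated.encode()).hexdigest())   # False: digests differ
os.unlink(path)
```

This is why `'rb'` is the right mode when the goal is to compare files by content.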

Re: program to generate data helpful in finding duplicate large files

2014-09-22 Thread Terry Reedy
On 9/22/2014 3:34 PM, random...@fastmail.us wrote: On Thu, Sep 18, 2014, at 14:45, Chris Kaynor wrote: Additionally, you may want to specify binary mode by using open(file_path, 'rb') to ensure platform-independence ('r' uses Universal newlines, which means on Windows, Python will convert "\r\n" to "\n"…