On Fri, Nov 11, 2011 at 1:59 PM, Cranky Frankie <cranky.fran...@gmail.com> wrote:
> Thank you for your help on this. Now for "the rest of the story."
>
> I'm trying to build a script to parse IBM AIX DB2 DDL to line up the
> data types (it drives me crazy when the column data types are not
> lined up). For example, typical create table DDL might be hundreds of
> lines long but will look like this:
>
> --
> -- table create DDL
> --
> CREATE TABLE FRANK.TEST (
> COLUMN1 DECIMAL(8),
> COLUMN2 CHAR(20),
> COLUMN3 TIMESTAMP,
> COLUMN4 INTEGER,
> COLUMN5 DATE NOT NULL WITH DEFAULT,
> -- repeat for hundreds of columns
> );
> COMMENT ON TABLE FRANK.TEST IS 'TEST TABLE';
>
> This is just a small sample, there are many other possible lines, but
> I'm only concerned about the column lines like COLUMN1 through COLUMN5
> above.
>
> I have a script on Windows that reads in the DDL file and writes out
> each line to a new file. What I'm doing is using the split() function
> to test for the presence of any DB2 standard data type, like CHAR,
> INTEGER, SMALLINT, etc. If I find one I want to use positional output
> to make each line look like:
>
> COLUMN1                                DECIMAL(8),
> COLUMN2                                CHAR(20),
> COLUMN3                                TIMESTAMP,
> COLUMN4                                INTEGER,
> COLUMN5                                DATE NOT NULL WITH DEFAULT,
>
> where all the COLUMNs would be in column 1 of the output file, the
> data types would be in column 40, and the comma would be next.
>
> The problem is handling lines that have NOT NULL WITH DEFAULT at the
> end. The thing is there could be other valid DDL in that position, and
> there may or may not be a comma after the data type. What I want to do
> is just take anything after the datatype, which would be element(1) in
> the split() output, and just write it out. I thought I could use
> rsplit() to do this, but you can't put the output of split() in
> rsplit() - I tried.
>
> I need to do something like, after verifying that element(1) is a
> valid DB2 datatype, just take everything else on the line after it,
> which may be a single comma, or NOT NULL WITH DEFAULT, or something
> else, and place it on the output line to be written out.
>
> So, to reiterate: I'm trying to build a script to line up the data
> types in a create table DDL file. Splitting each line into individual
> space separated elements, then checking for a valid data type, then
> rebuilding the line positionally seems to be the way to go. If there's
> an easier way to do it I'm all ears.
>
> I don't have my script available at the moment but I could send it if
> it would be helpful.
>

I would suggest using str.split(None, 1) to limit the number of splits
to one, which gives you a list like ['COLUMN', 'DATA TYPE ETC']. Then
use str.startswith to check the second entry for data types, something
like this (note that startswith accepts a tuple of strings to look
for):

    # data_types: a tuple of type names, e.g. ("DECIMAL", "CHAR", "INTEGER")
    splitted_line = line.split(None, 1)
    if len(splitted_line) == 2 and splitted_line[1].startswith(data_types):
        print("%s %s" % (splitted_line[0].ljust(39), splitted_line[1]))

HTH,
Hugo
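
For reference, the split-once-then-startswith idea can be wrapped in a small
function and applied to every line of the DDL. This is only a sketch under
some assumptions: the DATA_TYPES tuple is a hypothetical, incomplete list,
the names align_line and NAME_WIDTH are made up for illustration, comment
lines starting with "--" are assumed to be passed through untouched, and
the code is written to run on Python 3.

```python
# Hypothetical, incomplete tuple of DB2 type names; extend it for real DDL.
DATA_TYPES = ("DECIMAL", "CHAR", "VARCHAR", "TIMESTAMP", "INTEGER",
              "SMALLINT", "DATE")

NAME_WIDTH = 39  # same padding width as ljust(39) in the reply


def align_line(line):
    """Return the line with its data type aligned, or the line unchanged
    (minus its newline) if it does not look like a column definition."""
    line = line.rstrip("\n")
    # Assumption: SQL comment lines are copied through as they are.
    if line.lstrip().startswith("--"):
        return line
    # Split once on whitespace: [column name, everything else on the line].
    parts = line.split(None, 1)
    if len(parts) == 2 and parts[1].startswith(DATA_TYPES):
        # Pad the name, then write the type and whatever follows it verbatim.
        return "%s %s" % (parts[0].ljust(NAME_WIDTH), parts[1].rstrip())
    return line


if __name__ == "__main__":
    sample = [
        "CREATE TABLE FRANK.TEST (",
        "COLUMN1 DECIMAL(8),",
        "COLUMN5 DATE NOT NULL WITH DEFAULT,",
        ");",
    ]
    for ddl_line in sample:
        print(align_line(ddl_line))
```

Because everything after the column name is kept as one string, a trailing
comma, NOT NULL WITH DEFAULT, or any other clause is carried over without
being parsed. Note that startswith only checks a prefix, so the match is
loose: a type tuple containing "CHAR" also accepts "CHARACTER".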