jarod_v6--- via Tutor wrote:

> Dear All,
> sorry for my not good presentation of the code.
>
> I read a txt file and I prepare a dictionary
>
> files = os.listdir(".")
> tutto={}
> annotatemerge = {}
> for i in files:

By the way, `i` is one of the worst choices to denote a filename, only to
be beaten by `this_is_not_a_filename` ;)

>     with open(i,"r") as f:
>         for it in f:
>             lines = it.rstrip("\n").split("\t")
>
>             if len(lines) >2 and lines[0] != '#CHROM':
>
>                 conte = [lines[0],lines[1],lines[3],lines[4]]
>
>                 tutto.setdefault(i+"::"+"-".join(conte)+"::"+str(lines),[]).append(1)
>                 annotatemerge.setdefault("-".join(conte),set()).add(i)
>
>
> I create two dictionary one
>
> annotatemerge with use as key some coordinate ( chr3-195710967-C-CG) and
> connect with a set container with the name of file names
> 'chr3-195710967-C-CG': {'M8.vcf'},
> 'chr17-29550645-T-C': {'M8.vcf'},
> 'chr7-140434541-G-A': {'M8.vcf'},
> 'chr14-62211578-CGTGT-C': {'M8.vcf', 'R76.vcf'},
> 'chr3-197346770-GA-G': {'M8.vcf', 'R76.vcf'},
> 'chr17-29683975-C-T': {'M8.vcf'},
> 'chr13-48955585-T-A': {'R76.vcf'},
>
> the other dictionary report more information with as key a list of
> separated using this symbol "::"
>
> {["M8.vcf::chr17-29665680-A-G::['chr17', '29665680', '.', 'A', 'G',
> {['70.00',
> 'PASS', 'DP=647;TI=NM_001042492,NM_000267;GI=NF1,NF1;FC=Silent,Silent',
> 'GT:GQ:AD:VF:NL:SB:GQX', '0/1:70:623,24:0.0371:20:-38.2744:70']": [1],...}
>
>
> What I want to obtain is a list with this format:
>
> coordinate\tM8.vcf\tR76.vcf\n
> chr3-195710967-C-CG\t1\t0\n
> chr17-29550645-T-C\t1\t0\n
> chr3-197346770-GA-G\t1\t1\n
> chr13-48955585-T-A\t0\t1\n
>
>
> When I have that file I want to transpose that table so have the coordinate
> on columns and names of samples on rows

(1) Here's a generic way to create a pivot table:

def add(x, y):
    return x + y

def pivot(
        data,
        get_column, get_row, get_value=lambda item: 1,
        accu=add,
        default=0, empty="-/-"):
    # rows maps each row key to a dict {column key: accumulated value}.
    rows = {}
    columnkeys = set()
    for item in data:
        rowkey = get_row(item)
        columnkey = get_column(item)
        value = get_value(item)
        column = rows.setdefault(rowkey, {})
        column[columnkey] = accu(column.get(columnkey, default), value)
        columnkeys.add(columnkey)

    # Header row first, then one row per row key; cells that never
    # received a value are filled with `empty`.
    columnkeys = sorted(columnkeys)
    result = [
        [""] + columnkeys
    ]
    for rowkey in sorted(rows):
        row = rows[rowkey]
        result.append([rowkey] + [row.get(ck, empty) for ck in columnkeys])
    return result

if __name__ == "__main__":
    import csv
    import sys
    from operator import itemgetter

    data = [
        ("alpha", "one"),
        ("beta", "two"),
        ("gamma", "three"),
        ("alpha", "one"),
        ("gamma", "one"),
    ]
    csv.writer(sys.stdout, delimiter="\t").writerows(
        pivot(
            data,
            itemgetter(0),
            itemgetter(1)))
    print("")
    csv.writer(sys.stdout, delimiter="\t").writerows(
        pivot(
            data,
            itemgetter(1),
            itemgetter(0)))

As you can see when you run the above code, transposing the table is done by
swapping the get_column() and get_row() arguments. Instead of the sample data
you can feed it something like

# Untested. This is basically a copy of the code you posted wrapped into a
# generator. I used csv.reader() instead of splitting the lines manually.
import csv
import os

def gen_data():
    for filename in os.listdir():
        with open(filename, "r", newline="") as f:
            for fields in csv.reader(f, delimiter="\t"):
                # fields[4] is used below, so require at least five columns.
                if len(fields) > 4 and fields[0] != '#CHROM':
                    conte = "-".join(
                        [fields[0], fields[1], fields[3], fields[4]])
                    yield conte, filename

(2) What you want to do with the other dict is still unclear to me.
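
If the goal is exactly the 0/1 table from the question, something along the
following lines might be enough. This is only a sketch: presence_table() and
transpose() are names made up for illustration, and the in-memory sample list
stands in for the (coordinate, filename) pairs that gen_data() would yield.
The coordinates are copied from the example in the question.

```python
import csv
import sys


def presence_table(pairs):
    """Return a list of rows: one per coordinate, one 0/1 column per file.

    `pairs` is any iterable of (coordinate, filename) tuples.
    Hypothetical helper, written for this example only.
    """
    files_by_coord = {}
    for coord, filename in pairs:
        files_by_coord.setdefault(coord, set()).add(filename)

    # Every filename seen anywhere becomes a column, in sorted order.
    filenames = sorted(set().union(*files_by_coord.values()))

    table = [["coordinate"] + filenames]
    for coord in sorted(files_by_coord):
        present = files_by_coord[coord]
        table.append([coord] + [int(name in present) for name in filenames])
    return table


def transpose(table):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(column) for column in zip(*table)]


# Sample pairs; in real use this would be gen_data().
sample = [
    ("chr3-195710967-C-CG", "M8.vcf"),
    ("chr17-29550645-T-C", "M8.vcf"),
    ("chr3-197346770-GA-G", "M8.vcf"),
    ("chr3-197346770-GA-G", "R76.vcf"),
    ("chr13-48955585-T-A", "R76.vcf"),
]

table = presence_table(sample)
writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
writer.writerows(table)
print("")
writer.writerows(transpose(table))
```

Here the transposition is done after the fact with zip(*table) rather than by
swapping key functions as in pivot(); for a table that fits in memory either
approach should give the same layout.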