On 03.06.2014 18:24, jarod...@libero.it wrote:
Hi there!
I have a file like this:
file.txt
programs        sample  gene
program1        sample1 TP53
program1        sample1 TP53
program1        sample2 PRNP
program1        sample2 ATF3
program2        sample1 TP53
program2        sample1 PRNP
program2        sample2 TRIM32
program2        sample2 TLK1
program2        sample2 KIT


with open("prova.csv") as p:
    for i in p:
        lines = i.rstrip("\n").split("\t")
        print(lines)

['programs ', 'sample', 'gene', 'values']
['program1', 'sample1', 'TP53', '2']
['program1', 'sample1', 'TP53', '3']
['program1', 'sample2', 'PRNP', '4']
['program1', 'sample2', 'ATF3', '3']
['program2', 'sample1', 'TP53', '2']
['program2', 'sample1', 'PRNP', '5']
['program2', 'sample2', 'TRIM32', '4']
['program2', 'sample2', 'TLK1', '4']


Be exact! Do not provide approximate information if you are looking for adequate answers.

Your file did not look like the one you showed; there was an additional 'values' column in it.
What do you want to do with it?


I want to create a dictionary with set data with the names of the genes:

example:
dic = {}


dic['program1-sample1] = set(TP53)
dic['program1-sample2] = set(TP53,PRNP,ATF3)


Again, this is not something you actually tried in a Python shell, since it would raise errors for several reasons. Just try it yourself!
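
To spell out the "several reasons", here is a small sketch that feeds three attempts to the interpreter and records which exception each one raises. The first string is your line as posted; the other two are my own variants with the quote fixed and then with the gene names quoted. The names attempts and errors are only for illustration.

```python
# Each string is one attempted statement.
attempts = [
    "dic['program1-sample1] = set(TP53)",                     # missing closing quote
    "dic['program1-sample1'] = set(TP53)",                    # TP53 is an undefined name
    "dic['program1-sample2'] = set('TP53', 'PRNP', 'ATF3')",  # set() takes at most one argument
]

errors = []
for source in attempts:
    try:
        # compile() catches the syntax problem, exec() the runtime ones
        exec(compile(source, "<attempt>", "exec"), {"dic": {}})
    except Exception as exc:
        errors.append(type(exc).__name__)
        print(source, "->", type(exc).__name__)
```

Note also that set('TP53') would not fail, but it builds a set of the single characters 'T', 'P', '5' and '3', which is not what you want either. Use {'TP53'} or set(['TP53']) instead.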

I would not build dictionary keys by concatenating the 'programs' and 'sample' strings. Rather, use a tuple of the two (any hashable object, such as a tuple of strings, works as a dict key), e.g.:

dic[('program1', 'sample1')] = {'TP53'}
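
To illustrate why I prefer the tuple, here is a small sketch (the entries are taken from your data, the variable names are arbitrary). With a tuple key you get the program and the sample back by simple unpacking, whereas a concatenated string would have to be split again, which breaks as soon as a name contains the separator.

```python
dic = {}
dic[('program1', 'sample1')] = {'TP53'}
dic[('program1', 'sample2')] = {'PRNP', 'ATF3'}

# Tuple keys unpack directly into their two parts.
for (program, sample), genes in sorted(dic.items()):
    print(program, sample, sorted(genes))

# Collecting all samples of one program needs no string parsing.
samples_of_program1 = sorted(s for (p, s) in dic if p == 'program1')
print(samples_of_program1)
```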

Essentially, what you need to do is:

- instead of printing each individual list you've parsed from the input file, use the first two elements as a tuple for the dict key, then add the third element (the gene) to the set stored under that key (use set.add() for that purpose).

- the tricky part is what to do with keys that are encountered for the first time and, thus, don't have a set associated with them yet. Here, dict.setdefault() will help you (https://docs.python.org/2.7/library/stdtypes.html?highlight=setdefault#dict.setdefault). Hint: your_dict.setdefault(your_key, set()).add(the_gene) will work whether or not the key has been encountered before.
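
The two steps above can be sketched roughly as follows. This assumes the tab-separated layout from your printed output (programs, sample, gene, values) and is written to run under Python 3. The data is inlined through io.StringIO so the snippet runs on its own; with your real file you would loop over open("prova.csv") instead. The names sample_data and genes_by_key are made up for this example.

```python
import io

# Stand-in for the real file; columns are separated by tabs.
sample_data = io.StringIO(
    "programs\tsample\tgene\tvalues\n"
    "program1\tsample1\tTP53\t2\n"
    "program1\tsample1\tTP53\t3\n"
    "program1\tsample2\tPRNP\t4\n"
    "program1\tsample2\tATF3\t3\n"
    "program2\tsample1\tTP53\t2\n"
    "program2\tsample1\tPRNP\t5\n"
)

genes_by_key = {}
next(sample_data)  # skip the header line
for line in sample_data:
    fields = line.rstrip("\n").split("\t")
    key = (fields[0], fields[1])  # (program, sample) tuple as the dict key
    # setdefault returns the existing set, or stores and returns a new empty one
    genes_by_key.setdefault(key, set()).add(fields[2])

for key in sorted(genes_by_key):
    print(key, sorted(genes_by_key[key]))
```

Because the values are sets, the duplicate TP53 line for program1/sample1 ends up as a single entry.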

So if I have a dictionary like that, I can compare two sets. I will compare the
capacity of the programs as a function of the genes shown.

I have no idea what you are trying to do, so I can't tell you whether the data structure will be good for it.

Wolfgang
