Hello! I am wondering about the best way to handle sorting some data from some of my results.
I have a file in the form shown at the end. (Please forgive any wraparounds due to the width of the screen here: each line starting with ENSE ends with its e-value, e.g. 1e-12, on the same line.)

What I would like is to generate an output file of any ENSE000...e-4 (or whatever the e-value is) lines that appear under more than one query and, for each of those, the queries they appear under. So if the first line, ENSE00001098330.2|ENSG00000013573.6|ENST00000350437.2 assembly=N... etc., appears as a result in any other query, I would like it and the queries it appears as a result for (including the score if possible).

The data set the sample below is taken from is over 2.4 GB, so speed and memory considerations come into play. Are sets more effective than lists for this? To save space in the new file I really only need the name of the result up to the | and the score at the end for each. To simplify things, the score could be dropped, and I could look it up as needed later.

As always, all feedback is very appreciated.

Thanks,
Scott

FILE:

This is the number 1 query tested.
Results for scoring against Query= hg17_chainMm5_chr17 range=chr1:2040-3330 5'pad=0 3'pad=0 are:
ENSE00001098330.2|ENSG00000013573.6|ENST00000350437.2 assembly=N... 72 1e-12
ENSE00001160046.1|ENSG00000013573.6|ENST00000251758.3 assembly=N... 72 1e-12
ENSE00001404464.1|ENSG00000013573.6|ENST00000228264.4 assembly=N... 72 1e-12
ENSE00001160046.1|ENSG00000013573.6|ENST00000290818.3 assembly=N... 72 1e-12
ENSE00001343865.2|ENSG00000013573.6|ENST00000350437.2 assembly=N... 46 8e-05
ENSE00001160049.1|ENSG00000013573.6|ENST00000251758.3 assembly=N... 46 8e-05
ENSE00001343865.2|ENSG00000013573.6|ENST00000228264.4 assembly=N... 46 8e-05
ENSE00001160049.1|ENSG00000013573.6|ENST00000290818.3 assembly=N... 46 8e-05

This is the number 2 query tested.
Results for scoring against Query= hg17_chainMm5_chr1 range=chr1:82719-95929 5'pad=0 3'pad=0 are:
ENSE00001373792.1|ENSG00000175182.4|ENST00000310585.3 assembly=N... 80 6e-14
ENSE00001134144.2|ENSG00000160013.2|ENST00000307155.2 assembly=N... 78 2e-13
ENSE00001433065.1|ENSG00000185480.2|ENST00000358383.1 assembly=N... 78 2e-13
ENSE00001422761.1|ENSG00000183160.2|ENST00000360503.1 assembly=N... 74 4e-12
ENSE00001431410.1|ENSG00000139631.6|ENST00000308926.3 assembly=N... 74 4e-12
ENSE00001433065.1|ENSG00000185480.2|ENST00000358383.1 assembly=N... 72 1e-11
ENSE00001411753.1|ENSG00000126882.4|ENST00000358329.1 assembly=N... 72 1e-11
ENSE00001428167.1|ENSG00000110497.4|ENST00000314823.4 assembly=N... 72 1e-11
ENSE00001401130.1|ENSG00000160828.5|ENST00000359898.1 assembly=N... 72 1e-11
ENSE00001414900.1|ENSG00000176920.4|ENST00000356650.1 assembly=N... 72 1e-11
ENSE00001428167.1|ENSG00000110497.4|ENST00000314823.4 assembly=N... 72 1e-11
ENSE00001400942.1|ENSG00000138670.5|ENST00000356373.1 assembly=N... 72 1e-11
ENSE00001400116.1|ENSG00000120907.6|ENST00000356368.1 assembly=N... 70 6e-11
ENSE00001413546.1|ENSG00000184209.6|ENST00000344033.2 assembly=N... 70 6e-11
ENSE00001433572.1|ENSG00000124243.5|ENST00000355583.1 assembly=N... 70 6e-11
ENSE00001423154.1|ENSG00000125875.4|ENST00000354200.1 assembly=N... 70 6e-11
ENSE00001400109.1|ENSG00000183785.3|ENST00000339190.2 assembly=N... 70 6e-11
ENSE00001268950.4|ENSG00000084112.4|ENST00000303438.2 assembly=N... 68 2e-10
ENSE00001057279.1|ENSG00000161270.6|ENST00000292886.2 assembly=N... 68 2e-10
ENSE00001434317.1|ENSG00000171453.2|ENST00000304004.2 assembly=N... 68 2e-10

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
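
For what it's worth, the task described above could be sketched roughly as follows. This is only a minimal sketch under several assumptions, not a tested solution for the real 2.4 GB file: the function name find_shared_hits is made up for illustration, a query is identified by the first two tokens after "Query=" (name plus range), the result name is taken to be the text before the first |, and the score kept is the last whitespace-separated field on the line. On the sets-versus-lists question, a dict (or set) gives average constant-time lookup by key, whereas testing membership in a list scans it item by item, so a dict keyed on the result name is used here.

```python
from collections import defaultdict


def find_shared_hits(lines):
    """Return {result_name: {query: e_value}} for names seen under 2+ queries.

    `lines` can be any iterable of strings, e.g. an open file object.
    """
    hits = defaultdict(dict)  # result name -> {query label: e-value}
    query = None
    for line in lines:
        line = line.strip()
        if "Query=" in line:
            # Assumption: name and range together identify a query.
            tokens = line.split("Query=", 1)[1].split()
            query = " ".join(tokens[:2])
        elif line.startswith("ENSE") and query is not None:
            name = line.split("|", 1)[0]   # text before the first |
            e_value = line.split()[-1]     # last field on the line
            # If a name repeats within one query, keep the first score seen.
            hits[name].setdefault(query, e_value)
    # Keep only the names that turned up under more than one query.
    return {name: qs for name, qs in hits.items() if len(qs) > 1}
```

Passing an open file object (for example `with open(path) as f: shared = find_shared_hits(f)`, where `path` is wherever the results file lives) reads the input one line at a time rather than loading it all at once. The dict still holds one entry per distinct result name, so whether it fits in memory depends on how many distinct names the full data set contains.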