Hi,

I'm a beginner at Python and would really appreciate some help with 
extracting information from a VCF file. 

The attached file contains information on mutations. This sample has just 
2 rows and 10 columns (the real file has many more rows). 

I want to extract the mRNA ID only if the mutation is missense. The two rows 
(mutations) that I have attached happen to be missense, but how do I exclude 
the mutations that are not missense (they might be, e.g., synonymous)? Also, 
how do I skip a line that starts with a # symbol (sometimes the chr line 
starts with a hash)?
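
One possible starting point for these two checks, as a minimal sketch. It assumes the file is tab-separated, that lines to skip begin with '#', and that the annotations sit in column 7 (counting from 0) separated by semicolons, as in the sample rows. The function name is_missense_record is made up for illustration.

```python
def is_missense_record(line):
    """Return True for a data line whose column 7 marks the variant as missense."""
    # Skip blank lines and lines that start with a hash.
    if line.startswith("#") or not line.strip():
        return False
    columns = line.rstrip("\n").split("\t")
    # A usable data line needs at least 8 tab-separated columns (0..7).
    if len(columns) < 8:
        return False
    # Compare whole annotations rather than searching the raw string.
    return "refseq.functionalClass=missense" in columns[7].split(";")
```

Comparing against the split annotations avoids accidental matches inside some other, longer annotation value.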

vcf file: 2 rows, 10 columns. 
   
col 0   chromosome
col 1   position
col 2   .
col 3   Reference
col 4   ALT
col 5   position
col 6   .
col 7   some statistics and the IDs
col 8   not important
col 9   not important

The important column is column 7, which holds the functional class and the 
ID. For example, refseq.functionalClass=missense means it's a missense 
mutation, so I want to extract refseq.name=NM_003137492, or only the ID, 
which in this case is NM_003137492. 
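
Pulling the ID out of column 7 could look roughly like this. It is a sketch that assumes the column is a semicolon-separated list of key=value pairs, with the occasional bare flag such as DS, as in the sample rows. The helper name parse_info is made up, and the string below is a shortened copy of the first sample row.

```python
def parse_info(info):
    """Turn 'a=1;b=2;FLAG' into {'a': '1', 'b': '2', 'FLAG': True}."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        # Entries without '=' (for example DS) are stored as True.
        fields[key] = value if sep else True
    return fields


# Shortened copy of column 7 from the first sample row.
info = ("AC=1;DS;refseq.functionalClass=missense;"
        "refseq.name=NM_003137492;refseq.name2=PQ3D2")
fields = parse_info(info)
if fields.get("refseq.functionalClass") == "missense":
    print(fields["refseq.name"])  # prints NM_003137492
```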

Then I want to do exactly the same thing for all the other mutations, but only 
for the missense mutations, not the other ones. How do I accomplish that? Where 
do I start? 
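
As a rough idea of how the whole task might be tied together, here is a sketch that loops over every line and collects the IDs of the missense rows. It assumes the same tab-separated layout as the sample rows. The function name missense_mrna_ids and the file name "mutations.vcf" are placeholders, not anything taken from the real data.

```python
def missense_mrna_ids(lines):
    """Yield the refseq.name value of every missense row in an iterable of lines."""
    for line in lines:
        # Skip lines that start with a hash.
        if line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:
            continue
        # partition("=") gives (key, "=", value); [::2] keeps (key, value).
        info = dict(item.partition("=")[::2] for item in columns[7].split(";"))
        if info.get("refseq.functionalClass") == "missense" and "refseq.name" in info:
            yield info["refseq.name"]


# Usage with a real file ("mutations.vcf" is a placeholder name):
# with open("mutations.vcf") as handle:
#     for mrna_id in missense_mrna_ids(handle):
#         print(mrna_id)
```

Because the function accepts any iterable of lines, it can be tried on a short list of strings before pointing it at the full file.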

Best,
Anna

                                                                                
  
4	69345	.	C	T	32	.	1kg.AD=0;1kg.AF=0.8865;1kg.AN=345;AC=1;AF=0.50;AN=2;BaseQRankSum=-6.432;DP=327;DS;Dels=0.00;FS=6.435;HRun=0;HaplotypeScore=5.0380;MQ=54.34;MQ0=13;MQRankSum=-5.3457;QD=2.65;ReadPosRankSum=-0.321;SB=-321.04;dbsnp.ID=rs43032118;dbsnp.dbSNPBuildID=112;hgid.AF=0.6754;refseq.changesAA=true;refseq.codingCoordStr=c.345C>T;refseq.codonCoord=234;refseq.functionalClass=missense;refseq.inCodingRegion=true;refseq.mrnaCoord=321;refseq.name=NM_003137492;refseq.name2=PQ3D2;refseq.positionType=CFD;refseq.proteinCoordStr=p.C132T;refseq.referenceAA=BRT;refseq.referenceCodon=DGF;refseq.spliceDist=321;refseq.transcriptStrand=+;refseq.variantAA=Ala;refseq.variantCodon=GCA	GT:AD:DP:GQ:PL	1/1:134,32:765:99:56,1,4576
4	87342	.	A	G	57.7	.	1kg.AD=0;1kg.AF=1.0;1kg.AN=345;AC=2;AF=0.00;AN=4;DP=2;Dels=0.00;FS=0.000;HRun=2;HaplotypeScore=0.0000;MQ=29.00;MQ0=0;QD=54.97;SB=-52.65;dbsnp.ID=rs6732096;dbsnp.dbSNPBuildID=136;hgid.AF=0.9567;refseq.changesAA=true;refseq.codingCoordStr=c.3207A>G;refseq.codonCoord=349;refseq.functionalClass=missense;refseq.inCodingRegion=true;refseq.mrnaCoord=4509;refseq.name=NM_132768;refseq.name2=PQR321;refseq.positionType=CDS;refseq.proteinCoordStr=p.O125R;refseq.referenceAA=Trp;refseq.referenceCodon=ATG;refseq.spliceDist=-45;refseq.transcriptStrand=+;refseq.variantAA=Arg;refseq.variantCodon=CTT	GT:AD:DP:GQ:PL	0/1:0,5:5:2.02:67,9,1