Dear Elena, Dear JunJun,
Many thanks to both of you for taking the time to address my
request a month ago.
I agree that using the viewer with the limit set to 10 to illustrate my
issue was not very bright! Apologies.
I must say that things have changed substantially since then and look
much better today. The output file generated after retrieving
"EntrezGene.ID" is a lot more consistent than it was a month ago. There
are fewer duplicates (in fact, there are no longer any duplicate rows in
the output table, as there used to be) and far fewer "NA" entries for
Entrez (although I checked that these null entries do have an associated
gene name). I guess these are issues with the Entrez database. Perhaps
most importantly, all input Ensembl.Gene.IDs are present in the output
table.
My concern now is with the retrieval of Ensembl.Transcript.ID, one of
the default Biomart attributes:
I am currently working with a list of 22308 Ensembl gene IDs mapped to
the Affymetrix Mouse Gene 1.0 ST microarray.
I uploaded this list to the Biomart.org website to filter my query.
The database is Ensembl build 63 and the dataset is NCBIM37.
I export the output as a TSV file ("export all results to", with
"unique results only" left unchecked).
I then check this output file in R.
The output table has 75966 rows, of which 59458 are distinct. In other
words, 42950 rows are unique and 16508 are duplicated. There may well be
an explanation for why some rows are duplicated and others are not.
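For anyone checking the same counts outside R, here is a short Python sketch. It assumes the export is a tab-separated file whose first line is a header. The function names are hypothetical. In R, `nrow(unique(df))` gives the distinct count, and `sum(duplicated(df))` counts the surplus copies (total minus distinct).

```python
import csv
from collections import Counter


def read_tsv_rows(path):
    """Read a tab-separated export, skipping its header line (assumed present)."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        next(reader, None)  # drop the header line
        return [tuple(row) for row in reader]


def row_stats(rows):
    """Summarise duplication in a list of row tuples."""
    counts = Counter(rows)
    total = sum(counts.values())
    distinct = len(counts)
    once = sum(1 for c in counts.values() if c == 1)
    return {
        "total": total,
        "distinct": distinct,
        "seen_once": once,                   # distinct rows occurring exactly once
        "seen_repeatedly": distinct - once,  # distinct rows occurring 2+ times
        "surplus_copies": total - distinct,  # what R's sum(duplicated(df)) reports
    }


if __name__ == "__main__":
    # Toy data: one (gene, transcript) pair appears twice.
    toy = [("G1", "T1"), ("G1", "T1"), ("G1", "T2"), ("G2", "T3")]
    print(row_stats(toy))
```

`seen_repeatedly` and `surplus_copies` coincide only when every duplicated row appears exactly twice. It is worth checking which of the two a reported "duplicated" count refers to.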
My main concern is that 6467 input Ensembl.Gene.IDs are not retrieved
and are missing from the output table. These are bona fide genes with
regular associated gene names. If I upload the list of these missing
genes on its own, I do get the corresponding transcripts: all of them
are retrieved, and there are no duplicate rows!
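The missing genes can be listed, and then re-submitted as their own list, by taking a set difference between the input IDs and the gene IDs present in the export. This is a minimal Python sketch under several assumptions:

- The input list has one ID per line.
- The export is tab-separated with a header row.
- The gene column is headed `Ensembl Gene ID`. This header name is hypothetical; adjust it to the actual export.

```python
import csv


def read_ids(path):
    """Read one Ensembl gene ID per line, ignoring blank lines."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def output_gene_ids(path, column="Ensembl Gene ID"):
    """Collect gene IDs from a tab-separated export.

    The default column header is an assumption, not a documented BioMart name.
    """
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return {row[column] for row in reader}


def missing_ids(input_ids, output_ids):
    """Input IDs absent from the output, sorted for stable reporting."""
    return sorted(set(input_ids) - set(output_ids))


if __name__ == "__main__":
    # Hypothetical IDs standing in for the real input list and export.
    inputs = {"G1", "G2", "G3"}
    found = {"G1", "G2"}
    print(missing_ids(inputs, found))
```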
Thank you very much in advance for your valuable help and comments.
Best regards
Henri-Jean
On 24/06/2011 15:24, Elena Rivkin wrote:
Dr. Henri-Jean Garchon,
The reason you see EntrezGene IDs for only a subset is that some
transcripts do not have an EntrezGene ID associated with them. If you
select Ensembl Transcript ID as an attribute, you will see which
transcripts correspond to which EntrezGene ID.
For example:
ENSMUSG00000026073 (Illr2) - only one of its transcripts
(ENSMUST00000027243) has an EntrezGene ID
And
ENSMUSG00000035208 (Slfn8) - has two different EntrezGene IDs, although
it has only one transcript (ENSMUST00000038141).
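The two patterns above can be found across a whole export by grouping rows by gene and transcript. This is a minimal Python sketch under the following assumptions:

- Each row is a (gene ID, transcript ID, EntrezGene ID) triple.
- A missing Entrez ID appears as an empty string or "NA".
- The function name and the toy IDs are hypothetical.

```python
from collections import defaultdict


def entrez_by_transcript(rows):
    """Group (gene, transcript, entrez) rows by gene, then by transcript.

    Returns {gene: {transcript: set of Entrez IDs}}. A transcript with no
    Entrez ID still appears, mapped to an empty set.
    """
    result = defaultdict(lambda: defaultdict(set))
    for gene, transcript, entrez in rows:
        ids = result[gene][transcript]  # records the transcript even without an ID
        if entrez and entrez != "NA":
            ids.add(entrez)
    return {gene: dict(transcripts) for gene, transcripts in result.items()}


if __name__ == "__main__":
    rows = [
        ("G1", "T1", "100"),  # only one of G1's transcripts has an Entrez ID
        ("G1", "T2", ""),
        ("G2", "T3", "200"),  # one transcript, two Entrez IDs
        ("G2", "T3", "201"),
    ]
    print(entrez_by_transcript(rows))
```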
I hope it helps.
Elena
From: Henri-Jean GARCHON <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Fri, 24 Jun 2011 05:00:03 -0400
To: "[email protected]" <[email protected]>
Subject: [BioMart Users] Fwd: Returned mail: see transcript for details
-------- Original Message --------
Subject: Returned mail: see transcript for details
Date: Fri, 24 Jun 2011 09:48:20 +0100
From: Mail Delivery Subsystem <[email protected]>
To: <[email protected]>
The original message was received at Fri, 24 Jun 2011 09:48:20 +0100
from mx1.ebi.ac.uk [193.62.197.214]
----- The following addresses had permanent fatal errors -----
[email protected]
(reason: 550 Host unknown)
(expanded from:<[email protected]>)
----- Transcript of session follows -----
550 [email protected]... Host unknown (Name server:
biomart.org.redirect: host not found)
_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users