[galaxy-user] Which Input FASTQ quality scores type should I choose when running FASTQ Groomer?

2013-08-30 Thread Du, Jianguang
Hi All, I downloaded some RNA-seq datasets from NCBI. The datasets were generated on an Illumina HiSeq 2000. I am not sure which Input FASTQ quality scores type I should choose when running FASTQ Groomer. Below are the quality scores of two reads from one dataset; I renamed them read 1 and read 2. 1)
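One way to approach the question above is to look at the range of ASCII codes in the quality lines. The sketch below is only a heuristic under stated assumptions: the function name and the returned labels are hypothetical (they are not the exact FASTQ Groomer option names), and the cut-offs follow the commonly documented ranges (Phred+33 starts at ASCII 33, Solexa+64 at 59, Phred+64 at 64). A handful of reads can be ambiguous, so it is safer to feed it many quality strings.

```python
def guess_quality_encoding(quality_strings):
    """Guess a FASTQ quality encoding from the ASCII range of the scores.

    Heuristic only: the labels are illustrative, not Galaxy option names.
    """
    codes = [ord(ch) for q in quality_strings for ch in q]
    if not codes:
        raise ValueError("no quality characters supplied")
    lowest, highest = min(codes), max(codes)
    if lowest < 33 or highest > 126:
        raise ValueError("characters outside the printable FASTQ range")
    if lowest < 59:
        # Only Phred+33 (Sanger / Illumina 1.8+) uses characters below ';'.
        return "Phred+33 (Sanger / Illumina 1.8+)"
    if highest <= 74:
        # Everything sits in the overlap of the +33 and +64 ranges.
        return "ambiguous (inspect more reads)"
    if lowest < 64:
        # Characters from ';' to '?' occur only in the old Solexa scale.
        return "Solexa+64"
    return "Phred+64 (Illumina 1.3-1.7)"


if __name__ == "__main__":
    # Hypothetical quality strings, not taken from the dataset in the post.
    print(guess_quality_encoding(["IIII#5?", "HHHGG@@"]))
```

If the lowest character is below ';' (ASCII 59), the data can only be Phred+33, which is the usual case for reads downloaded from NCBI SRA; treat that last point as an assumption to verify on the actual files.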

[galaxy-user] View details of Tophat alignment

2013-05-30 Thread Du, Jianguang
Hi All, After I finished the Tophat alignment for RNA-seq, I took a look at the parameter details by clicking the View details icon, and I got the information shown below: Input Parameter Value Note for rerun RNA-Seq FASTQ file 73: Filtered Groomed data1_rep2 Use a built in reference

[galaxy-user] Which Library Type should I use for single-end reads

2013-04-15 Thread Du, Jianguang
Hi All, I have a very basic question about parameters for running TopHat. I have datasets of single-end reads. These datasets were generated with Illumina Genome Analyzer IIx. Which Library Type should I choose to run Tophat? Thanks. Best, Jianguang

Re: [galaxy-user] Are reads of 36nt in length long enough to accurately map on splicing junctions?

2013-04-11 Thread Du, Jianguang
? Best, Jianguang From: Jeremy Goecks [jeremy.goe...@emory.edu] Sent: Wednesday, April 10, 2013 3:16 PM To: Du, Jianguang Cc: galaxy-user@lists.bx.psu.edu Subject: Re: [galaxy-user] Are reads of 36nt in length long enough to accurately map on splicing junctions

Re: [galaxy-user] Parameters for merging BAM files

2013-04-10 Thread Du, Jianguang
Hi Jen, Thanks for the information. I used this setting and the merged BAM files (.accepted hits) worked quite well for the downstream analysis. Best, Jianguang From: Jennifer Jackson [j...@bx.psu.edu] Sent: Tuesday, April 09, 2013 4:10 PM To: Du, Jianguang Cc

Re: [galaxy-user] Are reads of 36nt in length long enough to accurately map on splicing junctions?

2013-04-10 Thread Du, Jianguang
be 33 nucleotides). So my understanding is that setting the Anchor length at 3 does not increase the inaccuracy of the alignment. Am I correct? Best, Jianguang From: Jeremy Goecks [jeremy.goe...@emory.edu] Sent: Tuesday, April 09, 2013 1:57 PM To: Du, Jianguang

Re: [galaxy-user] Are reads of 36nt in length long enough to accurately map on splicing junctions?

2013-04-09 Thread Du, Jianguang
in the .splicing junctions output. Is my understanding correct? Do the regions mean the number of mapped splicing junctions? Thanks. Best, Jianguang From: Jeremy Goecks [jeremy.goe...@emory.edu] Sent: Tuesday, April 09, 2013 9:03 AM To: Du, Jianguang Cc

[galaxy-user] Are reads of 36nt in length long enough to accurately map on splicing junctions?

2013-04-08 Thread Du, Jianguang
Hi All, I have a very basic question. I have RNA-seq datasets of several cell types and want to compare the alternative splicing events between cell types. The reads are 36nt in length. Are these reads long enough to map onto the splicing junctions accurately when I run Tophat with stringent

[galaxy-user] Parameters for merging BAM files

2013-04-05 Thread Du, Jianguang
Hi All, I want to merge the Tophat output (Accepted Hits) of several datasets. I want the merged BAM file to have exactly the same format as the individual input BAM files. Should I check Merge all component bam file headers into the merged bam file? Thanks. Have a nice weekend. Jianguang

[galaxy-user] Is there a size limit on datasets for running Tophat?

2013-03-27 Thread Du, Jianguang
Hi All, Is there a size limit on datasets for running Tophat on Galaxy? If there is, what is the maximum number of reads? Thanks. Jianguang ___ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the

[galaxy-user] Do I need to specify the file format when I upload datasets using FTP method?

2013-03-21 Thread Du, Jianguang
Hi Everyone, When I upload my datasets to my history via the FTP method (using FileZilla), do I need to specify the file format under File Format of Upload File from your computer? I noticed that the screencast on how to upload datasets via FTP just leaves the File Format as Auto-detect.

[galaxy-user] please restore my account

2012-10-08 Thread Du, Jianguang
Dear Sir or Madam, I had opened multiple accounts at Galaxy Main; I did not know that this is against policy. I noticed the policy when I found that all the accounts were blocked. Would you please restore the account with email address jia...@iupui.edu? If you are not

[galaxy-user] How much FPKM can be taken into consideration when comparing gene expression

2012-09-19 Thread Du, Jianguang
Dear All, I am comparing the gene expression between two cell types by examining the Cufflinks output file -- gene differential expression testing. The file lists the FPKM of genes in the two cell types and the log2 of the fold change. I want to look for genes that have more than 2-fold of
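The relationship between the FPKM columns and the log2 fold change mentioned above can be sketched as follows. This is a minimal illustration, not Cuffdiff's own procedure: the function names are hypothetical, and the `min_fpkm` floor of 1.0 is an arbitrary assumption used only to show how very low-abundance genes might be set aside before looking at fold changes.

```python
import math


def log2_fold_change(fpkm_a, fpkm_b):
    """Return log2(fpkm_b / fpkm_a), or None when either value is not positive."""
    if fpkm_a <= 0 or fpkm_b <= 0:
        return None  # the ratio is undefined when a gene is absent in one sample
    return math.log2(fpkm_b / fpkm_a)


def is_two_fold(fpkm_a, fpkm_b, min_fpkm=1.0):
    """True when the change is at least 2-fold and expression clears the floor.

    min_fpkm is a hypothetical cut-off, not a recommended value.
    """
    if max(fpkm_a, fpkm_b) < min_fpkm:
        return False  # both values are too low to trust the ratio
    lfc = log2_fold_change(fpkm_a, fpkm_b)
    return lfc is not None and abs(lfc) >= 1.0  # |log2 FC| >= 1 means >= 2-fold


if __name__ == "__main__":
    print(log2_fold_change(5.0, 20.0))  # a 4-fold increase
    print(is_two_fold(0.1, 0.4))        # 4-fold, but below the assumed floor
```

A fold change of 2 corresponds to a log2 value of 1 (or -1 for a 2-fold decrease), which is why the check is written against an absolute value of 1.0.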

[galaxy-user] Does Tophat output *.accepted hits file contain headers?

2012-09-13 Thread Du, Jianguang
Dear All, I want to use the Tophat output files with .accepted hits to do analysis outside Galaxy. However, the program I am using requires the Tophat output to be indexed, sorted BAM files that contain headers. Do the Tophat outputs with .accepted hits produced at Galaxy contain headers? Will

[galaxy-user] Tophat settings

2012-09-06 Thread Du, Jianguang
Dear All, I am not so sure about two Tophat settings. Please help. 1) Number of mismatches allowed in the initial read mapping Based on the documentation, my understanding is: the reads are re-aligned to the transcriptome/genome if the mismatches in the initial alignment are more than the set

Re: [galaxy-user] Please help to understand the square root of Jensen-Shannon divergence

2012-09-06 Thread Du, Jianguang
, and then compare between conditions. Thanks in advance, Jianguang From: Jennifer Jackson [j...@bx.psu.edu] Sent: Thursday, September 06, 2012 12:38 PM To: Du, Jianguang Cc: galaxy-user@lists.bx.psu.edu; closetic...@galaxyproject.org Subject: Re: [galaxy

[galaxy-user] Number of mismatches allowed in the initial read mapping

2012-09-06 Thread Du, Jianguang
Dear All, I tested how to set the Number of mismatches allowed in the initial read mapping as follows. First, I ran FASTQ Groomer on a dataset to get the total number of reads, which is 17510227. Then I ran Tophat after setting Number of mismatches allowed in the

[galaxy-user] Please help to understand the square root of Jensen-Shannon divergence

2012-09-04 Thread Du, Jianguang
Dear All, I am looking for the differential splicing events between cell types. However, Cuffdiff gives output using the square root of the Jensen-Shannon divergence to measure the difference. Although I tried my best to understand the definition of the square root of Jensen-Shannon
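For readers unfamiliar with the measure, the square root of the Jensen-Shannon divergence between two sets of isoform proportions can be sketched as below. This is an illustration of the general definition, not Cuffdiff's implementation: the function name is hypothetical, the inputs are assumed to be isoform abundances for the same isoforms in the same order, and the base-2 logarithm is an assumption (a different base rescales the value by a constant factor).

```python
import math


def js_distance(abundances_a, abundances_b):
    """Square root of the Jensen-Shannon divergence (base-2 logarithm)."""
    if len(abundances_a) != len(abundances_b):
        raise ValueError("both conditions need the same isoforms")
    total_a, total_b = sum(abundances_a), sum(abundances_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each condition needs some expression")
    # Convert abundances to relative proportions that sum to 1.
    p = [x / total_a for x in abundances_a]
    q = [x / total_b for x in abundances_b]
    # The mixture distribution is the average of the two.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; zero-probability terms contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))


if __name__ == "__main__":
    # Hypothetical abundances of two isoforms in two cell types.
    print(js_distance([70.0, 30.0], [30.0, 70.0]))
```

With this scaling the value is 0 when the isoform proportions are identical in both conditions and 1 when the conditions express completely different isoforms, so it measures a shift in relative isoform usage rather than a change in overall expression.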

[galaxy-user] Should I use raw junction and Only look for supplied junctions

2012-08-28 Thread Du, Jianguang
Dear All, I have two more questions about settings for Tophat. My aim is to look for the differential splicing events between cell types. After I checked Use Own Junctions, three more options came out: 1) Use Gene Annotation Model 2) Use raw Junctions 3) Only look for supplied junctions

[galaxy-user] Please help me check the quality of the Tophat mapping to reference genome

2012-08-27 Thread Du, Jianguang
Dear All, I ran Flagstat under NGS: SAM Tools to check the quality of the Tophat output (the file of accepted hits). I got the diagnostic results as follows: 9471730 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 duplicates 9471730 + 0 mapped (100.00%:-nan%) 0 + 0 paired in sequencing 0

[galaxy-user] Please help with the settings for Cufflinks, Cuffmerge and Cuffdiff

2012-08-27 Thread Du, Jianguang
Dear All, I am looking for the differential splicing events between cell types. Although I got a lot of help from Jen and from protocols found online, I am still not sure about some settings for Cufflinks, Cuffmerge and Cuffdiff. 1) For Cufflinks: There is a setting for Bias Correction. I made

[galaxy-user] How to decide if the difference is significant

2012-08-27 Thread Du, Jianguang
Dear All, I am looking for the differential splicing events between cell types. I have run Cuffdiff and I am going through the output file splicing differential expression testing. I have read the documentation and protocols about how Cuffdiff tests for differential expression and

[galaxy-user] How much can I trim my reads

2012-08-23 Thread Du, Jianguang
Dear All, I am analysing RNA-seq datasets for the differential splicing events between cell types. My reads are 36bp long. In order to increase the quality of reads, I need to trim some nucleotides from ends. How many nucleotides can I trim? I am afraid that if I trim too much, the reliability

[galaxy-user] What minimum Quality should I set for Filter FASTQ?

2012-08-23 Thread Du, Jianguang
Dear All, I am analysing RNA-seq datasets for differential splicing events between cell types. Some of my reads contain bad nucleotides. Should I run Filter FASTQ to remove these lower-quality reads? If I do need to, what Minimum Quality should I set for the filter? Thanks. Jianguang

[galaxy-user] Should I use iGenomes version of a reference GTF for Tophat?

2012-08-23 Thread Du, Jianguang
Dear All, I am analysing RNA-seq datasets for differential splicing events between cell types. These are mouse cells. Jen suggested that I use the iGenomes version of the reference GTF to take full advantage of the options in CuffDiff. My question is: should I use this iGenomes version of the reference GTF

Re: [galaxy-user] Should I use iGenomes version of a reference GTF for Tophat?

2012-08-23 Thread Du, Jianguang
From: Jennifer Jackson [j...@bx.psu.edu] Sent: Thursday, August 23, 2012 11:46 AM To: Du, Jianguang Cc: galaxy-user@lists.bx.psu.edu Subject: Re: [galaxy-user] Should I use iGenomes version of a reference GTF for Tophat? Hello Jianguang, When in the analysis

Re: [galaxy-user] Should I use iGenomes version of a reference GTF for Tophat?

2012-08-23 Thread Du, Jianguang
Use a built-in index. How can I solve this problem? Thanks in advance. Jianguang From: galaxy-user-boun...@lists.bx.psu.edu [galaxy-user-boun...@lists.bx.psu.edu] on behalf of Du, Jianguang [jia...@iupui.edu] Sent: Thursday, August 23, 2012 4:01 PM

[galaxy-user] How to find the alternatively spliced segment of genes in Cuffdiff output

2012-08-21 Thread Du, Jianguang
Dear All, I have run the programs from Tophat to Cuffdiff in Galaxy to look for differences in alternative splicing events between cell types. However, I do not know how to find the detailed information (such as the sequence and the genomic coordinates) of the alternatively spliced part of a

[galaxy-user] Minimum length of read segments

2012-08-16 Thread Du, Jianguang
Dear All, I am going to run Tophat on an RNA-seq dataset to observe alternative splicing events. There is a parameter for Tophat: Minimum length of read segment. According to the implemented Tophat options, the description for Minimum length of read segment is Each read is cut up into segments,

[galaxy-user] run Bowtie to estimate Mean Inner Distance between Mate Pairs

2012-08-16 Thread Du, Jianguang
Dear All, In order to figure out the Mean Inner Distance between Mate Pairs of my paired-end RNA-seq datasets, I ran Bowtie (Map with Bowtie for Illumina) with both forward and reverse datasets and mouse mm9 as reference genome. Below I list the Bowtie output for only one pair of reads (I put

[galaxy-user] How to decide Mean Inner Distance between Mate Pairs?

2012-08-15 Thread Du, Jianguang
Dear All, I am analyzing the downloaded RNA-seq datasets. However, I am not sure what the Mean Inner Distance between Mate Pairs is for these paired-end datasets. Taking one paired-end RNA-seq dataset as an example, there is a description of this dataset in the SRA database of NCBI: Layout: PAIRED,
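As a rough guide to the question above, the mean inner distance is usually taken as the mean fragment length minus the combined length of the two mates. The sketch below assumes the fragment length excludes adapter sequence and that both mates have the same length; the function name and the numbers in the usage example are hypothetical and are not taken from the SRA record quoted in the post.

```python
def mean_inner_distance(fragment_length, read_length):
    """Inner distance between mates: fragment length minus both read lengths.

    Assumes fragment_length excludes adapters and both mates share read_length.
    A negative result means the two mates overlap.
    """
    if fragment_length <= 0 or read_length <= 0:
        raise ValueError("lengths must be positive")
    return fragment_length - 2 * read_length


if __name__ == "__main__":
    # Hypothetical library: 300 bp fragments sequenced with 50 bp mates.
    print(mean_inner_distance(300, 50))
```

When the library description reports an insert size that includes adapters, the adapter length would need to be subtracted first; whether that applies depends on how the submitter defined the value.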

[galaxy-user] Do I need to allow indel search?

2012-08-15 Thread Du, Jianguang
Dear All, I want to compare the pre-mRNA alternative splicing events between RNA-seq datasets. Do I need to allow indel search when I run Tophat? What is the indel search for? I could not find detailed information about indel search in the Tophat documentation. Thanks. Jianguang Du

[galaxy-user] Use Own Junctions or not

2012-08-15 Thread Du, Jianguang
Dear All, I want to compare the pre-mRNA alternative splicing events between RNA-seq datasets. Should I use own junctions when I run Tophat? What does Own Junctions mean? Thanks. Jianguang DU

[galaxy-user] Which settings should be used to upload the mouse reference genome?

2012-08-14 Thread Du, Jianguang
Dear All, I am going to search for alternative splicing events between datasets. I am not sure about the settings for the mouse reference genome (mm9) when I upload it from UCSC Main. Would you please tell me the settings for 1) group: 2) Track: 3) Table: 4) Output format: Thanks.

[galaxy-user] FASTQ splitter produced empty dataset, please help

2012-08-10 Thread Du, Jianguang
I have a problem splitting a paired-end FASTQ dataset into two separate datasets. In order to explain the problem clearly, I list the details of what I did with my dataset: Step 1) My aim is to compare datasets for differential alternative splicing. I downloaded paired-end datasets at FASTQ

[galaxy-user] (no subject)

2012-08-10 Thread Du, Jianguang
dataset into two datasets, how should I choose the settings when I run Manipulate FASTQ? Thanks. Jianguang / On 8/10/12 7:21 AM, Du, Jianguang wrote: I have a problem splitting a paired-end FASTQ dataset into two separate datasets. In order to explain the problem clearly

[galaxy-user] need help to split paired-end dataset

2012-08-10 Thread Du, Jianguang
dataset into two datasets, how should I choose the settings when I run Manipulate FASTQ? Thanks. Jianguang / On 8/10/12 7:21 AM, Du, Jianguang wrote: I have a problem splitting a paired-end FASTQ dataset into two separate datasets. In order to explain the problem clearly