[R] R eat my data

2010-05-25 Thread Changbin Du
Hi, dear R community,
My original file has 1932 lines, but when I read it into R, it has only 1068 rows. How come?
c...@nuuk:~/operon$ wc -l id_name_gh5.txt
1932 id_name_gh5.txt
gene_name <- read.table("/home/cdu/operon/id_name_gh5.txt", sep="\t", skip=0, header=F, fill=T)
dim(gene_name)
[1] 1068

Re: [R] R eat my data

2010-05-25 Thread Mohamed Lajnef
Hi Changbin,
Try this code in R to count the lines of your file without opening it:
length(count.fields("id_name_gh5.txt"))
Regards,
Mohamed
Changbin Du wrote:
Hi, dear R community, my original file has 1932 lines, but when I read it into R, it has only 1068 rows. How come?

Re: [R] R eat my data

2010-05-25 Thread David Winsemius
On May 25, 2010, at 11:42 AM, Changbin Du wrote:
Hi, dear R community, my original file has 1932 lines, but when I read it into R, it has only 1068 rows. How come?
We are being asked to investigate this question, but how? Have you looked at the last line to see if it looks like gene_name?

Re: [R] R eat my data

2010-05-25 Thread Changbin Du
length(count.fields("/home/cdu/operon/id_name_gh5.txt"))
[1] 1932
It is 1932 lines when counted in R.
On Tue, May 25, 2010 at 8:52 AM, Mohamed Lajnef mohamed.laj...@inserm.fr wrote:
Hi Changbin, Try this code in R to count the lines of your file without opening it

Re: [R] R eat my data

2010-05-25 Thread Barry Rowlingson
On Tue, May 25, 2010 at 4:42 PM, Changbin Du changb...@gmail.com wrote:
Hi, dear R community, my original file has 1932 lines, but when I read it into R, it has only 1068 rows. How come?
c...@nuuk:~/operon$ wc -l id_name_gh5.txt
1932 id_name_gh5.txt

Re: [R] R eat my data

2010-05-25 Thread Mohamed Lajnef
Check the comments sent by David!
M
Changbin Du wrote:
length(count.fields("/home/cdu/operon/id_name_gh5.txt"))
[1] 1932
It is 1932 lines when counted in R.
On Tue, May 25, 2010 at 8:52 AM, Mohamed Lajnef mohamed.laj...@inserm.fr wrote:
Hi Changbin,

Re: [R] R eat my data

2010-05-25 Thread Changbin Du
644727344	ABC-2 type transporter	ABC-2 type transporter
644727345	conserved hypothetical protein	conserved hypothetical protein
Here are the last two lines of the file id_name_gh5.txt.
On Tue, May 25, 2010 at 8:57 AM, David Winsemius dwinsem...@comcast.net wrote:
On May 25, 2010,

Re: [R] R eat my data

2010-05-25 Thread Changbin Du
c...@nuuk:~/operon$ grep '^#' id_name_gh5.txt
c...@nuuk:~/operon$
No lines start with #.
On Tue, May 25, 2010 at 9:11 AM, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote:
On Tue, May 25, 2010 at 4:42 PM, Changbin Du changb...@gmail.com wrote:
Hi, dear R community, my original file

Re: [R] R eat my data

2010-05-25 Thread Sarah Goslee
Without the actual file to look at, this is like playing 20 questions, only not so much fun. However, this kind of problem is most often caused by the presence in your file of something that R interprets as a special character, usually # or ' or ". Can you open the file in a spreadsheet? Can you
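
To illustrate the # case Sarah mentions, here is a minimal sketch. The two-line sample is made up and is not the poster's file; it only assumes that read.table() is left at its default comment.char = "#".

```r
# Made-up two-line, tab-separated sample in which a name contains '#'.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1\tprotein #7\tkinase",
             "2\tABC-2 type transporter\ttransporter"), tmp)

# Default comment.char = "#": everything after '#' on line 1 is discarded.
default_read <- read.table(tmp, sep = "\t", header = FALSE, fill = TRUE,
                           quote = "")

# comment.char = "" turns comment handling off, so the field is kept whole.
literal_read <- read.table(tmp, sep = "\t", header = FALSE, fill = TRUE,
                           quote = "", comment.char = "")

print(default_read)
print(literal_read)
```

Note that fill=TRUE hides the damage in the first call: the shortened line is padded rather than reported as an error.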

Re: [R] R eat my data

2010-05-25 Thread Joris Meys
Without any clue about your data file this is definitely unsolvable. But some things to consider: Where is the dataset coming from? Did you check for special characters? Is there an apostrophe somewhere in a string? (That messed up things for me once.) Is the delimiter placed correctly

Re: [R] R eat my data

2010-05-25 Thread Kevin E. Thorpe
When I encounter problems like this, I make sure each row has the expected number of columns. Something like the following awk code is useful.
awk -F'\t' '{print NF}' id_name_gh5.txt | sort | uniq -c
Note: I'm not sure if the \t will work with the -F switch as above.
Kevin
Changbin Du wrote:
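
The same per-line column check can probably be done inside R with count.fields(). The sketch below is an illustration only: field_counts is a hypothetical helper name and the sample file is made up. Quote and comment handling are switched off so that ', " and # count as ordinary characters.

```r
# Hypothetical helper, not from the thread: tabulate how many
# tab-separated fields each line of a file has.
field_counts <- function(path) {
  n <- count.fields(path, sep = "\t", quote = "", comment.char = "")
  table(n, useNA = "ifany")
}

# Made-up sample: two lines with 3 fields, one line with 2 fields.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1\ta\tb", "2\tc", "3\td'e\tf"), tmp)
counts <- field_counts(tmp)
print(counts)
```

If every line is well formed, the table has a single entry; any other entry points at lines with missing or extra delimiters.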

Re: [R] R eat my data

2010-05-25 Thread David Winsemius
Have you compared them to tail(gene_name, 2)? Come on, man, show some initiative.
On May 25, 2010, at 12:12 PM, Changbin Du wrote:
644727344	ABC-2 type transporter	ABC-2 type transporter
644727345	conserved hypothetical protein	conserved hypothetical protein
Here are the last

Re: [R] R eat my data

2010-05-25 Thread Changbin Du
Thank you all for the contributions! I will send the data back to the computer guys who collected the data yesterday. Actually, the data can be opened in Excel and a text editor. After replacing some ;,
gene_name <- read.table("/home/cdu/operon/id_name_gh5.txt", sep="\t", skip=0, header=FALSE, fill=TRUE)

Re: [R] R eat my data

2010-05-25 Thread Shi, Tao
: [R] R eat my data
644727344	ABC-2 type transporter	ABC-2 type transporter
644727345	conserved hypothetical protein	conserved hypothetical protein
Here are the last two lines of the file id_name_gh5.txt.
On Tue, May 25, 2010 at 8:57 AM, David Winsemius dwinsem

Re: [R] R eat my data

2010-05-25 Thread Changbin Du
To: David Winsemius dwinsem...@comcast.net
Cc: r-help@r-project.org
Sent: Tue, May 25, 2010 9:12:58 AM
Subject: Re: [R] R eat my data
644727344	ABC-2 type transporter	ABC-2 type transporter
644727345	conserved hypothetical protein	conserved hypothetical protein
Here are the last

Re: [R] R eat my data

2010-05-25 Thread Changbin Du
: Re: [R] R eat my data
644727344	ABC-2 type transporter	ABC-2 type transporter
644727345	conserved hypothetical protein	conserved hypothetical protein
Here are the last two lines of the file id_name_gh5.txt.
On Tue, May 25, 2010 at 8:57 AM, David Winsemius

Re: [R] R eat my data

2010-05-25 Thread Joris Meys
How do the last entries in the data frame look?
On Tue, May 25, 2010 at 6:12 PM, Changbin Du changb...@gmail.com wrote:
644727344	ABC-2 type transporter	ABC-2 type transporter
644727345	conserved hypothetical protein	conserved hypothetical protein
Here are the last two lines

Re: [R] R eat my data

2010-05-25 Thread Chris Stubben
Gene names often have single quotes, like
5'-methylthioadenosine phosphorylase
ATP synthase B' chain
ppGpp 3'-pyrophosphohydrolase
so maybe try adding quote="" to the read.table options.
Chris Stubben
-- View this message in context:
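
A small reproduction of the effect Chris describes may help. This is a sketch on a made-up three-line sample, not the poster's file: with the default quote setting, an apostrophe opens a quoted string that runs until the next apostrophe, even across line breaks, so physical lines are merged into one row.

```r
# Made-up three-line, tab-separated sample; names 1 and 3 contain apostrophes.
lines <- c(
  "1\t5'-methylthioadenosine phosphorylase\tenzyme",
  "2\tABC-2 type transporter\ttransporter",
  "3\tATP synthase B' chain\tsynthase"
)
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Default quoting: the apostrophe on line 1 starts a quoted string that
# only ends at the apostrophe on line 3, so lines are swallowed.
with_quotes <- read.table(tmp, sep = "\t", header = FALSE, fill = TRUE)

# quote = "" disables quote handling, so every line becomes one row.
no_quotes <- read.table(tmp, sep = "\t", header = FALSE, fill = TRUE,
                        quote = "")

nrow(with_quotes)  # fewer rows than the file has lines
nrow(no_quotes)    # one row per line
```

Because fill=TRUE pads short rows silently, the merged lines produce no error, which matches the symptom reported at the start of the thread.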

Re: [R] R eat my data

2010-05-25 Thread Changbin Du
id_gname <- read.table("/home/cdu/operon/id_name_gh5.txt", sep="\t", quote="", skip=0, header=F, fill=T)
dim(id_gname)
[1] 1932    3
Yes, it works after adding quote="" to the read.table options. Thanks, Chris!
On Tue, May 25, 2010 at 9:34 AM, Chris Stubben stub...@lanl.gov wrote:
Gene names often
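
The comparison that settled this thread (lines in the file versus rows in the data frame) could be wrapped in a small check. In this sketch, check_rows is a hypothetical helper name and the sample data are made up; blank lines, which read.table() skips by default, would also make the two numbers differ.

```r
# Hypothetical helper, not from the thread: compare the number of physical
# lines in a file with the number of rows read.table() returns.
check_rows <- function(path, ...) {
  n_lines <- length(readLines(path))
  n_rows <- nrow(read.table(path, ...))
  c(lines = n_lines, rows = n_rows)
}

# Made-up sample with apostrophes in two of the names.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1\tB' chain\tx", "2\tplain\ty", "3\t3'-end\tz"), tmp)
res <- check_rows(tmp, sep = "\t", header = FALSE, fill = TRUE, quote = "")
print(res)
```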