Re: [R] comparing two strings from data

2017-10-13 Thread Jeff Newmiller
data_2 <- read.csv("excel_data.csv", stringsAsFactors=FALSE)
column_1 <- data_2$data1
column_2 <- data_2$data2
result <- match( column_1, column_2 )

Please read the Posting Guide mentioned at the bottom of this and every posting, in particular about posting plain text so that what we see will be

Re: [R] comparing two strings from data

2017-10-12 Thread Eric Berger
One additional comment. If you want 0 instead of NA when there is no match, then the match statement should read:

match_list <- match( data_2$data1, data_2$data2, nomatch=0)

On Fri, Oct 13, 2017 at 7:39 AM, Eric Berger wrote:
> Combining and completing the advice from Greg and Boris the comple
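For anyone skimming the thread, here is a minimal sketch of what nomatch=0 changes. The two vectors are made-up stand-ins for data_2$data1 and data_2$data2 (the original file is not shown in the thread), so treat the values as assumptions.

```r
# Hypothetical stand-ins for data_2$data1 and data_2$data2
data1 <- c("A12", "B7", "C3")
data2 <- c("C3", "Z9", "A12", "Q1")

# Default behaviour: NA where an entry of data1 is not found in data2
with_na <- match(data1, data2)
print(with_na)

# nomatch = 0: the same positions, but 0 instead of NA for "no match"
with_zero <- match(data1, data2, nomatch = 0)
print(with_zero)
```

One thing to keep in mind: if the result is later used as an index, a 0 silently drops that element, whereas an NA yields NA, so which one is preferable depends on what happens next.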

Re: [R] comparing two strings from data

2017-10-12 Thread Eric Berger
Combining and completing the advice from Greg and Boris, the complete solution is two lines:

data_2 <- read.csv("excel_data.csv", stringsAsFactors = FALSE)
match_list <- match( data_2$data1, data_2$data2 )

The vector match_list will have the matching position when it exists and NA otherwise. Its
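A small sketch of how match() behaves in the two-line solution, without needing the CSV. The vectors below are hypothetical and only imitate the "numbers along with letters" entries described in the original question.

```r
# Hypothetical stand-ins for the two columns; lengths differ on purpose
first_col  <- c("A12", "B7", "C3")
second_col <- c("C3", "Z9", "A12", "Q1")

# For each entry of first_col, the position of its first occurrence in second_col
match_list <- match(first_col, second_col)
print(match_list)

# Entries of first_col that were found somewhere in second_col
found <- first_col[!is.na(match_list)]
print(found)
```

The result always has the same length as the first argument, which is why the differing column lengths are not a problem here.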

Re: [R] comparing two strings from data

2017-10-12 Thread Boris Steipe
It's generally a very good idea to examine the structure of data after you have read it in. str(data_2) would have shown you that read.csv() turned your strings into factors, and that's why the == operator no longer does what you think it does. Use ... data_2 <- read.csv("excel_data.csv", strin

Re: [R] comparing two strings from data

2017-10-12 Thread Greg Snow
The error occurs because the read.csv function converted both columns to factors. The simplest thing to do is to set stringsAsFactors=FALSE in the call to read.csv so that they are compared as strings. You could also call as.character on each of the columns if you don't want to read the data in again
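A rough sketch of the as.character route, assuming the data is already in memory as factors. The data frame here is invented for illustration; stringsAsFactors = TRUE is set explicitly to reproduce the old read.csv() default (since R 4.0.0 the default is FALSE).

```r
# Hypothetical data; stringsAsFactors = TRUE mimics the pre-4.0 read.csv() default
df <- data.frame(
  data1 = c("A12", "B7", "C3"),
  data2 = c("C3", "Z9", "A12"),
  stringsAsFactors = TRUE
)

# Both columns are factors with different level sets, so == raises an error;
# tryCatch() captures the error message instead of stopping the script
cmp <- tryCatch(df$data1 == df$data2, error = function(e) conditionMessage(e))
print(cmp)

# Convert in place instead of reading the file again
df$data1 <- as.character(df$data1)
df$data2 <- as.character(df$data2)

# Element-wise string comparison now works
same_row <- df$data1 == df$data2
print(same_row)

# Position of each data1 entry within data2
positions <- match(df$data1, df$data2)
print(positions)
```

Checking is.factor() or str() on the columns before comparing is a cheap way to catch this situation early.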

[R] comparing two strings from data

2017-10-12 Thread Yasin Gocgun
Hi, I have two columns that contain numbers along with letters (as shown below) and have different lengths. Each entry in the first column is likely to be found in the second column at most once. For each entry of the first column, if that entry is found in the second column, I would like to get