Hi all,

I am working on a Twitter analysis using the tm package. Below is the code I am using:
1- Here I am creating a data frame from the data collected from Twitter:

chennai=as.data.frame(cbind(tweet=jallitext,date=jallidate,lat=jallilat,lon=jallilon,
  isretweet=isretweet,retweeted=retweeted, retweetcount=retweetcount,
  favorite=favoritesCount, favorited=favorited))

2- Then I build the corpus:

corpus<- Corpus(VectorSource(chennai$tweet))

The output gives me:

Metadata: corpus specific: 0, document level (indexed): 0
Content: documents: 6000

However, when I convert the text to lower case using the tm package, I get this error:

Error in FUN(content(x), ...) :
  invalid input 'RT @Aariactor: Officially #jallikattu protest is over yesterday we won í ½í²ªí ¼í¿» thx to government í ½í¹ í ¼í¿»' in 'utf8towcs'

After a lot of research, I am now using this code:

tryTolower = function(x)
{
  # create missing value
  # this is where the returned value will be
  y = NA
  # tryCatch error
  try_error = tryCatch(tolower(x), error = function(e) e)
  # if not an error
  if (!inherits(try_error, "error"))
    y = tolower(x)
  return(y)
}
corpus<- sapply(corpus, function(x) tryTolower(x))

This converts the tweets to lower case, but when I create a document-term matrix I get this error:

Jalli<- DocumentTermMatrix(corpus)
Error in UseMethod("TermDocumentMatrix", x) :
  no applicable method for 'TermDocumentMatrix' applied to an object of class "character"

Could you please help me with this error? Thank you.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
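The 'utf8towcs' error means tolower() was given a string that is not valid UTF-8. The mojibake in the quoted tweet appears to be emoji stored as UTF-16 surrogate pairs, which are invalid UTF-8 bytes. A minimal sketch, assuming it is acceptable to drop emoji and all other non-ASCII characters, is to clean the text with base R's iconv() before building the corpus. The helper name clean_text and the sample tweet are hypothetical. Note that converting to ASCII also strips non-Latin scripts such as Tamil, which may matter for this data set.

```r
# Hypothetical helper: drop every byte that is invalid UTF-8 or not ASCII
# (this removes emoji), so that tolower() no longer fails.
clean_text <- function(x) {
  # sub = "" replaces each non-convertible byte with nothing
  iconv(x, from = "UTF-8", to = "ASCII", sub = "")
}

# Hypothetical tweet containing an emoji encoded as a UTF-16 surrogate
# pair, which is not valid UTF-8 (similar to the tweet in the error)
tweet <- "Officially #jallikattu protest is over \xed\xa0\xbd\xed\xb2\xaa thx"

cleaned <- clean_text(tweet)
lowered <- tolower(cleaned)  # works now that the invalid bytes are gone
print(lowered)
```

In the original workflow this would correspond to running chennai$tweet <- clean_text(chennai$tweet) before Corpus(VectorSource(chennai$tweet)), so that no tweet has to be replaced by NA.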
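The second error occurs because sapply() over a corpus returns a plain character vector, and DocumentTermMatrix() has no method for class "character". A minimal sketch, assuming a recent tm version: apply the function through tm_map() wrapped in content_transformer() so the result stays a corpus. The sample tweets are hypothetical stand-ins for chennai$tweet, and tryTolower() is a condensed version of the function in the post.

```r
library(tm)

# Condensed tryTolower(): return NA instead of failing on bad input
tryTolower <- function(x) {
  tryCatch(tolower(x), error = function(e) NA_character_)
}

# Hypothetical sample tweets standing in for chennai$tweet
tweets <- c("RT @Aariactor: Officially #jallikattu protest is over",
            "Thanks to the Government for listening")

corpus <- VCorpus(VectorSource(tweets))

# tm_map() + content_transformer() changes each document's text
# but keeps the object a corpus, unlike sapply()
corpus <- tm_map(corpus, content_transformer(tryTolower))

dtm <- DocumentTermMatrix(corpus)
inspect(dtm)
```

Reassigning the result of sapply() to corpus discards the corpus class, which is what the "no applicable method" message reports. Keeping every transformation inside tm_map() avoids this.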