Re: [R] How to pre-process fwf or csv files to remove unexpected characters in R?

2016-11-06 Thread Lucas Ferreira Mation
Thank you Bert, Jeff and David for great answers. Let me provide more context to clarify the question: - I am running this on a large server (512GB), so the data still fits into memory (and I also know how to process in chunks if necessary) - I agree that DBMS and other software would be better

Re: [R] How to pre-process fwf or csv files to remove unexpected characters in R?

2016-11-06 Thread Jim Lemon
Hi Lucas, This is a rough outline of something I programmed in C years ago for data cleaning. The basic idea is to read the file line by line and check for a problem (in the initial application this was a discrepancy between two lines that were supposed to be identical).
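
The line-by-line idea described above could be sketched in R roughly as follows. This is not the original C program: the helper name `scan_lines` and the digits-only check are assumptions made for illustration, and the caller supplies whatever test defines "a problem" for the data at hand.

```r
# Hypothetical helper: read a file one line at a time and return the
# numbers of the lines for which the caller-supplied check fails.
scan_lines <- function(path, is_ok) {
  con <- file(path, open = "r")
  on.exit(close(con))
  bad <- integer(0)
  line_no <- 0L
  repeat {
    line <- readLines(con, n = 1L, warn = FALSE)
    if (length(line) == 0L) break      # end of file reached
    line_no <- line_no + 1L
    if (!is_ok(line)) bad <- c(bad, line_no)
  }
  bad
}

# Small demonstration on a temporary file (made-up data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("0012345", "00A2345", "0098765"), tmp)
bad_lines <- scan_lines(tmp, function(x) grepl("^[0-9]+$", x))
print(bad_lines)
```

Reading one line per call keeps memory use flat but is slow in R for very large files; reading a few thousand lines per call is usually a better trade-off.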

Re: [R] How to pre-process fwf or csv files to remove unexpected characters in R?

2016-11-06 Thread David Winsemius
> On Nov 6, 2016, at 5:36 AM, Lucas Ferreira Mation wrote:
>
> I have some large .txt files about ~100GB containing a dataset in fixed
> width file. This contains some errors:
> - character characters in column that are supposed to be numeric,
> - invalid characters
>

Re: [R] How to pre-process fwf or csv files to remove unexpected characters in R?

2016-11-06 Thread Jeff Newmiller
?readLines ... given the large size of the file, you may need to process it in chunks by specifying a file connection rather than a character string file name and using the "n" argument. ?grepl ?Extract ?tools::showNonASCII There are many ways for data to be corrupted... in particular when invalid
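
A minimal sketch of the chunked approach suggested above, assuming "unexpected" means any byte outside printable ASCII. The helper name `find_suspect_lines`, the chunk size, and the pattern are illustrative choices, not part of the original message. For interactive inspection of a small chunk, `tools::showNonASCII()` can be used in place of the `grepl()` call.

```r
# Hypothetical helper: read `path` through a connection in chunks of
# `chunk_size` lines and return the line numbers that contain bytes
# outside the printable ASCII range (0x20 to 0x7E).
find_suspect_lines <- function(path, chunk_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  suspect <- numeric(0)
  offset <- 0            # double, so very long files do not overflow
  repeat {
    chunk <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(chunk) == 0L) break     # end of file reached
    # useBytes = TRUE avoids errors on strings that are invalid in
    # the current locale.
    hit <- grepl("[^\\x20-\\x7E]", chunk, perl = TRUE, useBytes = TRUE)
    suspect <- c(suspect, offset + which(hit))
    offset <- offset + length(chunk)
  }
  suspect
}

# Small demonstration on a temporary file (made-up data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("abc 123", "ab\u00e9 456", "xyz 789"), tmp, useBytes = TRUE)
print(find_suspect_lines(tmp, chunk_size = 2L))
```

Because the connection stays open between calls, each `readLines()` call continues where the previous one stopped, so only one chunk is held in memory at a time.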

[R] How to pre-process fwf or csv files to remove unexpected characters in R?

2016-11-06 Thread Lucas Ferreira Mation
I have some large .txt files of about 100GB containing a dataset in fixed-width format. These contain some errors: - non-numeric characters in columns that are supposed to be numeric, - invalid characters - rows with too many characters, possibly due to invalid characters or some missing end of line
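
The first and third kinds of error listed above can be flagged with base R string functions before any parsing is attempted. The sketch below assumes a made-up layout (10-character records with a numeric field in columns 1 to 6) and a hypothetical helper name `check_fwf_lines`; the real widths and column positions would come from the file's own layout description. It also assumes the lines are plain ASCII, since `substr()` can fail on strings that are invalid in the current locale.

```r
# Hypothetical helper: flag lines whose byte length differs from the
# expected record width, and lines whose numeric field contains
# anything other than optional leading spaces followed by digits.
check_fwf_lines <- function(lines, width, num_start, num_stop) {
  n_bytes <- nchar(lines, type = "bytes")
  field <- substr(lines, num_start, num_stop)
  data.frame(
    line = seq_along(lines),
    wrong_width = n_bytes != width,
    non_numeric = !grepl("^ *[0-9]+$", field),
    stringsAsFactors = FALSE
  )
}

# Made-up records: one clean, one with a letter in the numeric field,
# one that is too long.
lines <- c("000123ABCD", "00X123ABCD", "000123ABCDEF")
report <- check_fwf_lines(lines, width = 10L, num_start = 1L, num_stop = 6L)
print(report[report$wrong_width | report$non_numeric, ])
```

The same function can be applied to each chunk returned by `readLines()` on a connection, so the whole file never has to be held in memory at once.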