Perhaps someone here can point me in the direction of an answer. I loaded a 123.5M-row table onto Hadoop using a SAS DATA step. After completion, the reported row count on Hadoop is 212M. Some investigation showed that the additional 89M rows come from embedded ASCII 13 (carriage return) characters in the data. If the table is first cleaned of "off-keyboard" characters (ASCII < 32 or > 126), the DATA step loads it and the correct row count is reported.
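To illustrate the mechanism (a sketch with made-up data, not your actual table): a line-oriented text reader, such as the one behind a Hadoop text table, treats a carriage return inside a field value as a record terminator, so one logical row is counted as two. Stripping control characters below ASCII 32 (while keeping the field and record delimiters) restores the expected count.

```python
# One logical record with an embedded carriage return (ASCII 13) inside a
# field value. Fields are tab-delimited; the record ends with a newline.
record = "1001\tJohn\rDoe\t42\n"

# A line-oriented reader splits on \r as well as \n, so this single record
# is seen as two rows -- the same inflation observed in the row counts.
rows_as_stored = record.splitlines()
print(len(rows_as_stored))  # 2

# Removing control characters below ASCII 32, except the tab delimiter and
# the newline terminator, leaves one row as intended.
cleaned = "".join(ch for ch in record if ord(ch) >= 32 or ch in "\t\n")
print(len(cleaned.splitlines()))  # 1
```

This is why cleaning the data fixes the count: the reader on the Hadoop side has no way to distinguish a carriage return that terminates a record from one embedded in a value.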
We cannot clean hundreds of terabytes of data. Is there a system parameter on Hadoop that could help? Thanks so much, Steve