Re: Trouble with split/tokenize on linux

Christian Müller Fri, 02 Nov 2012 15:09:18 -0700

The following should work for you:

from("file://src/test/data?fileName=crm.sample.csv&noop=true&charset=ISO-8859-1")
  .split(body().tokenize("\n"))


It's important to set the charset to "ISO-8859-1" so that it matches the
encoding of the file.


I will open a JIRA to improve the current behavior in Camel. If you use the
wrong charset (e.g. "UTF-8" in this case), the Scanner stops reading and
stores the exception, which we can access by calling
"scanner.ioException()". We have to check this and possibly rethrow the
exception. The following tests show this behavior:

Succeeds:
    @Test
    public void scannerTest() throws Exception {
        Scanner scanner = new Scanner(new File("src/test/data/crm.sample.csv"), "ISO-8859-1");
        scanner.useDelimiter("\n");

        int counter = 0;
        while (scanner.hasNext()) {
            scanner.next();
            ++counter;
        }

        assertEquals(289, counter);
    }

Fails:
    @Test
    public void scannerTest() throws Exception {
        Scanner scanner = new Scanner(new File("src/test/data/crm.sample.csv"), "UTF-8");
        scanner.useDelimiter("\n");

        int counter = 0;
        while (scanner.hasNext()) {
            scanner.next();
            ++counter;
        }

        assertEquals(289, counter);
    }
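
A minimal sketch of the kind of check described above, assuming plain
java.util.Scanner usage outside of Camel. The class name ScannerCharsetCheck
and the method countLines are made up for illustration and are not part of
the Camel API; the actual fix in Camel may look different.

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

public class ScannerCharsetCheck {

    /**
     * Counts "\n"-delimited tokens in the file. After the loop it asks the
     * Scanner for any IOException it swallowed while reading (for example a
     * decoding error caused by the wrong charset) and rethrows it, so the
     * caller does not silently get a truncated result.
     * Hypothetical helper, for illustration only.
     */
    static int countLines(File file, String charsetName) throws IOException {
        try (Scanner scanner = new Scanner(file, charsetName)) {
            scanner.useDelimiter("\n");

            int counter = 0;
            while (scanner.hasNext()) {
                scanner.next();
                ++counter;
            }

            // hasNext() just returns false when the underlying read failed;
            // the cause is only visible through ioException().
            IOException swallowed = scanner.ioException();
            if (swallowed != null) {
                throw swallowed;
            }
            return counter;
        }
    }
}
```

With a check like this, reading an ISO-8859-1 file as "UTF-8" should surface
as an exception instead of a silently shortened token count.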


Hope this helps.

Best,
Christian


On Wed, Oct 31, 2012 at 11:04 PM, Denis S <dsoukhoros...@yahoo.com> wrote:

> crm.sample.csv
> <http://camel.465427.n5.nabble.com/file/n5721918/crm.sample.csv>
>
> This is a small portion of the file; see around lines 260..280. I'm not
> sure if the file will help you reproduce my issue: it is now in Windows
> format.
>
> Thanks, Denis.


