Many years ago (2004 ?) I posted code to do something like his using split/combine to differentiate between 'inside' and 'outside' field delimiters. It was very fast - but pretty hard to follow, and I don't remember now which obscure cases it handled (we haven't even mentioned doubled characters and backslash escaped field delimiters yet :-)

The code Paul posted is very clear and easy to follow - but it will suffer pretty severe performance issues on large data sets. If you need to worry about that, you could get pretty good speed up just by replacing the "repeat with ..." by "repeat for each ..." - see below. The time to run this will grow (approx) linearly with the number of lines in the input data, whereas the previous version was a bit worse than N-squared growth ....

function csvToArray pData, pRecordDelimiter, pFieldDelimiter, pEncapsulationDelimiter
   local tReservedRecordDelimiter, tReservedFieldDelimiter, tArray

# Initialize the temporary record and field delimiters. Change these if your CSV file may contain them. put charToNum(1) into tReservedRecordDelimiter; put charToNum(2) into tReservedFieldDelimiter;

# Step 1: Replace any Record or Field delimiters that are encapsulated with temporary characters
   set itemdel to pEncapsulationDelimiter
   put false into tIsEven
   repeat for each item itm in pData
      if tIsEven then
         replace pFieldDelimiter with tReservedFieldDelimiter in itm
         replace pRecordDelimiter with tReservedRecordDelimiter in itm
      end if
      put itm & itemDelimiter after tData
      put not tIsEven into tIsEven
   end repeat
   delete the last char of tData
   put tData into pData
   # Step 2: Replace all occurances of the encapsulation delimiter
   replace pEncapsulationDelimiter with empty in pData
   -- put pData into field "F"

# Step 3: Parse records and fields into the array, replace any occurances of the reserved record and field delimiters for each element
   set itemdel to pFieldDelimiter
   set lineDel to pRecordDelimiter
   put 0 into i
   repeat for each line L in pData
      add 1 to i
      put 0 into j
      repeat for each item itm in L
         add 1 to j
         replace tReservedRecordDelimiter with pRecordDelimiter in itm
         replace tReservedFieldDelimiter with pFieldDelimiter in itm
         put itm into tArray[i][j]
      end repeat
   end repeat

   # Step 4: return the array
   return tArray
end csvToArray



On 17/02/2011 20:01, Paul Dupuis wrote:
First, thanks to everyone who replied, but especially to Nosanity. Your code reminded me that you can effectively tell when you are inside an encapsulated bit of data by an odd/even count of the encapsulation character. So, for anyone who wants it, here is a generalized function that I just wrote to parse a CSV file, regardless of the field or record delimiters (commas, tabs or whatever) and to deal with encapsulation appropriately.

This assumes you read the entire CSV file into a variable you pass into pData, so a call would look like:

put csvToArray(myEntireCSVData,return,comma,quote) into myDataAsArray

I have tested it a bit in the last 30 minutes and it working in the cases I tried, but did not test exhaustively and have not checked performance on large datasets. If any one uses this and run into an issue, please let me know.

function csvToArray pData, pRecordDelimiter, pFieldDelimiter, pEncapsulationDelimiter
  local tReservedRecordDelimiter, tReservedFieldDelimiter, tArray

# Initialize the temporary record and field delimiters. Change these if your CSV file may contain them. put charToNum(1) into tReservedRecordDelimiter; put charToNum(2) into tReservedFieldDelimiter;

# Step 1: Replace any Record or Field delimiters that are encapsulated with temporary characters
  set itemdel to pEncapsulationDelimiter
  repeat with i = 1 to the number of items in pData
    if trunc(i/2) = (i/2) then
replace pFieldDelimiter with tReservedFieldDelimiter in item i of pData replace pRecordDelimiter with tReservedRecordDelimiter in item i of pData
    end if
  end repeat

  # Step 2: Replace all occurances of the encapsulation delimiter
  replace pEncapsulationDelimiter with empty in pData

# Step 3: Parse records and fields into the array, replace any occurances of the reserved record and field delimiters for each element
  set itemdel to pFieldDelimiter
  set lineDel to pRecordDelimiter
  repeat with i = 1 to the number of lines in pData
      repeat with j = 1 to the number of items in line i of pData
         get item j of line i of pData
         replace tReservedRecordDelimiter with pRecordDelimiter in it
         replace tReservedFieldDelimiter with pFieldDelimiter in it
         put it into tArray[i][j]
      end repeat
   end repeat

   # Step 4: return the array
   return tArray
end csvToArray




_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Reply via email to