On 29/10/2015 14:41, Mike Kerner wrote:
Belay that.  Let's do this on the list.

Sure ...
On Thu, Oct 29, 2015 at 10:22 AM, Mike Kerner <m...@mikekerner.com> wrote:

    1) In v3, why did you remove the <HT> substitution?  That just bit me.


Short answer : A bug.
Long answer : 2 bugs, but on the same line of code - so kind of just one bug really :-)
Very Long Answer :
I had a version (say, 2.9) which I tested properly. Then I added some more parameterization, and while doing that I thought "This line is wrong, it shouldn't be doing "replace TAB with ...", it should be using one of these new parameters". This was just plain wrong, so that's bug number 1.

Then I later realized that there was no case where I would need to do the "replace" as written - so I commented out the line (also, wrong - that's bug number 2).


Solution:
I enclose below a new version, csvToTab4. Only change (in the card script) is that line 37 changed from
    -- replace pOldItemDelim with pNewTAB in theInsideStringSoFar
to
    replace TAB with pNewTAB in theInsideStringSoFar

And with that change it does (AFAIK) properly produce <GS> (or whatever you pass in as pNewTAB) for any embedded TAB chars.

2) I'm not sure we should bore everyone else with the details on the list, but I'd like to pick your brain about some of the details of what you're thinking in various parts of this as I intend to do some tweaking and commenting for future reference.

Yeah, it would be great to improve the comments, and hopefully explain what it's doing.

On 29/10/2015 15:01, Mike Kerner wrote:
So beyond the embedded <HT>, I found another issue.  Let's say the string is
"test<CR>"""


The <CR> is not handled.
Hmmm - in my testing it is. I give it the following (the last line is the same as the example you give):

INPUT

a,"b
c"
"c<TAB>d"
"e<CR>"""

and get OUTPUT
a<TAB>b<VT>c
c<GS>d
e<VT>"

which I think is correct. Do you have a more complex test case, or do you get different results? Can you send me the case where you see the problem (off-list)? Thanks.
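
A small test handler along these lines should reproduce that case. It is only a sketch: the handler name testQuotedCells is made up for illustration, it assumes CSVToTab4 (listed further down) is in the message path, and it assumes the input ends with a trailing CR after the last line.

```livecode
on testQuotedCells
   local tInput, tOutput
   -- line 1: a,"b<CR>c"
   put "a," & quote & "b" & CR & "c" & quote & CR after tInput
   -- line 2: "c<TAB>d"
   put quote & "c" & TAB & "d" & quote & CR after tInput
   -- line 3: "e<CR>"""
   put quote & "e" & CR & quote & quote & quote & CR after tInput

   put CSVToTab4(tInput) into tOutput

   -- make the control characters visible before displaying the result
   replace numToChar(11) with "<VT>" in tOutput
   replace numToChar(29) with "<GS>" in tOutput
   replace TAB with "<TAB>" in tOutput
   put tOutput into msg
end testQuotedCells
```

Running it from the message box and comparing the result with the OUTPUT listed above is a quick way to confirm whether a given copy of the script behaves the same way.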

Should you perhaps do your substitutions on the "inside", instead of on the
"passedQuote"?

Hmmm - tempting, but no.

Firstly, it would need to do the replace in the current item both for status = 'inside' and 'passedquote' because if you have input like
   "one<TAB> two""three""four<TAB>five"
the status goes from 'inside' to 'passedquote' to 'inside' to 'passedquote' to etc. and for the latter TAB character it is 'passedquote'.

More generally, I want to do these substitutions in as few places as possible (i.e. so that I am passing the longest possible string to the engine to do a speedy 'replace'), so the best time to do that is after 'passedquote'.
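
To watch the status change for a given input, a stripped-down tracer can help. This is a sketch only, not part of the converter: the function name traceStatus is hypothetical, and it mirrors just the state transitions of the loop in the script below (it does no substitutions and builds no output).

```livecode
function traceStatus pData
   local tStatus, tTrace
   put "outside" into tStatus
   set the itemDel to quote
   repeat for each item k in pData
      -- record the status in force when this quote-delimited item is seen
      put tStatus && "[" & k & "]" & CR after tTrace
      switch tStatus
         case "outside"
            put "inside" into tStatus
            break
         case "inside"
            put "passedquote" into tStatus
            break
         case "passedquote"
            -- empty item: a doubled quote, so we go back inside;
            -- non-empty item: text after a closing quote, which the real
            -- script handles as "outside" and then also moves to "inside"
            put "inside" into tStatus
            break
      end switch
   end repeat
   return tTrace
end traceStatus
```

Calling it with something like `put traceStatus(quote & "one" & TAB & " two" & quote & quote & "three" & quote)` in the message box lists one status per item, which makes the alternation between 'inside' and 'passedquote' easy to see.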

New version
function CSVToTab4 pData, pOldLineDelim, pOldItemDelim, pNewCR, pNewTAB
   -- fill in defaults
   if pOldLineDelim is empty then put CR into pOldLineDelim
   if pOldItemDelim is empty then put COMMA into pOldItemDelim
   if pNewCR is empty then put numtochar(11) into pNewCR -- Use <VT> for quoted CRs
   if pNewTAB is empty then put numtochar(29) into pNewTAB -- Use <GS> (group separator) for quoted TABs

   local tNuData                         -- contains tabbed copy of data

   local tStatus, theInsideStringSoFar

   -- Normalize line endings: REMOVED
   -- Will normally be correct already, only binfile: or similar could make this necessary
   -- and that exceptional case should be the caller's responsibility

   put "outside" into tStatus
   set the itemdel to quote
   repeat for each item k in pData
      -- put tStatus && k & CR after msg
      switch tStatus

         case "inside"
            put k after theInsideStringSoFar
            put "passedquote" into tStatus
            next repeat

         case "passedquote"
            -- decide if it was a duplicated escapedQuote or a closing quote
            if k is empty then   -- it's a duplicated quote
               put quote after theInsideStringSoFar
               put "inside" into tStatus
               next repeat
            end if
            -- not empty - so we remain inside the cell, though we have left the quoted section
            -- NB this allows for quoted sub-strings within the cell content !!
            replace pOldLineDelim with pNewCR in theInsideStringSoFar
            replace TAB with pNewTAB in theInsideStringSoFar
            put theInsideStringSoFar after tNuData
            -- no 'break' here: fall through to "outside" so that k is processed too

         case "outside"
            replace pOldItemDelim with TAB in k
            -- and deal with the "empty trailing item" issue in Livecode
            replace (pNewTAB & pOldLineDelim) with pNewTAB & pNewTAB & CR in k
            put k after tNuData
            put "inside" into tStatus
            put empty into theInsideStringSoFar
            next repeat
         default
            put "defaulted"
            break
      end switch
   end repeat

   -- and finally deal with the trailing item issue in input data
   -- i.e. the very last char is a quote, so there is no trigger to flush the
   --      last item
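   if the last char of pData = quote then
      -- apply the same substitutions as in the "passedquote" case before flushing
      replace pOldLineDelim with pNewCR in theInsideStringSoFar
      replace TAB with pNewTAB in theInsideStringSoFar
      put theInsideStringSoFar after tNuData
   end if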

   return tNuData
end CSVToTab4
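
For reference, a call with non-default parameters might look like the following. This is a sketch under assumptions: the handler name convertSemicolonData and the sample data are invented for illustration, and the replacement strings "\n" and "\t" are just literal two-character strings chosen so that embedded CRs and TABs stay visible (LiveCode does not treat the backslash as an escape).

```livecode
on convertSemicolonData
   local tCSV, tTabbed
   -- two lines of semicolon-delimited data; the last cell is quoted
   -- because it contains the delimiter itself
   put "name;note" & CR into tCSV
   put "Ann;" & quote & "likes; semicolons" & quote & CR after tCSV

   -- pOldLineDelim is passed as empty, so it defaults to CR;
   -- ";" is the item delimiter of the incoming data
   put CSVToTab4(tCSV, empty, ";", "\n", "\t") into tTabbed
   put tTabbed into msg
end convertSemicolonData
```

Leaving the last two parameters out as well gives the <VT> and <GS> defaults described in the comments at the top of the function.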

-- Alex.
_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
