Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-16 Thread Glenn Knickerbocker
On 3/16/2023 6:38 PM, Rob van der Heij wrote: > KEEP to include the line end characters in the output stream? Would you > also need a BEFORE and AFTER, or is it good enough to just have them at the > end? You could use STRIP TRAILING ANYOF with the same set. I picture wanting them at the end of th

Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-16 Thread Rob van der Heij
On Thu, 16 Mar 2023 at 23:29, Glenn Knickerbocker wrote: > On 3/16/2023 4:46 PM, Rob van der Heij wrote: > > alternatives (with preference for the first one) > > - line end is any unique sequence of the specified characters, so if you > > specify the CR and LF as candidate, then CR, LF, CR LF, an

Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-16 Thread Glenn Knickerbocker
On 3/16/2023 4:46 PM, Rob van der Heij wrote: > alternatives (with preference for the first one) > - line end is any unique sequence of the specified characters, so if you > specify the CR and LF as candidate, then CR, LF, CR LF, and LF CR are all > one single end of line, but CR CR would imply a n

Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-16 Thread Rob van der Heij
On Thu, 16 Mar 2023 at 22:33, Paul Gilmartin wrote: > > Otherwise, the format of the record separator might be an optional > parameter to your program. > We already have the ability to specify the line-end character or string. The discussion was about when you don't know in advance what convent

Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-16 Thread Paul Gilmartin
On 3/16/23 14:46:04, Rob van der Heij wrote: ... Yes, I think we all realized the ambiguity. I was considering these alternatives (with preference for the first one) - line end is any unique sequence of the specified characters, so if you specify the CR and LF as candidate, then CR, LF, CR L

Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-16 Thread Rob van der Heij
On Thu, 16 Mar 2023 at 21:29, Glenn Knickerbocker wrote: > Figuring out the reasonable assumptions to make to make that decision is > the biggest part of what I meant by "getting it right." > Yes, I think we all realized the ambiguity. I was considering these alternatives (with preference for th
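A minimal Python sketch of the first alternative Rob describes (a line end is any run of distinct candidate characters, so a repeated character begins a new line end). This is an illustration only, not CMS Pipelines code; the function name `deblock_any` and its parameters are hypothetical. It reads the input chunk by chunk, so the whole file is never buffered.

```python
def deblock_any(chunks, linends=b"\r\n"):
    """Yield records from an iterable of byte chunks.

    Assumed rule: a run of *distinct* line-end bytes is one terminator;
    a repeated line-end byte starts a new terminator (a null record).
    """
    record = bytearray()
    seen = set()  # line-end bytes in the terminator being read; empty otherwise
    for chunk in chunks:
        for byte in chunk:
            if byte in linends:
                if not seen:
                    # First terminator byte closes the current record.
                    yield bytes(record)
                    record.clear()
                    seen.add(byte)
                elif byte in seen:
                    # Same byte again: previous terminator ended, null record.
                    yield b""
                    seen = {byte}
                else:
                    # A different candidate byte extends the same terminator.
                    seen.add(byte)
            else:
                seen.clear()
                record.append(byte)
    if record:
        # Data after the last terminator is passed as a final record.
        yield bytes(record)


records = list(deblock_any([b"Foo\r", b"\nBar\n\rBaz\r\rQux"]))
```

Under this rule CR LF and LF CR each count as one line end even when split across chunks, while CR CR produces a null record between the two.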

Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-16 Thread Glenn Knickerbocker
On 3/13/2023 7:41 PM, Paul Gilmartin wrote: > The problem is not well-posed.  Consider >     Foo >     >     >     Bar > Is that two records, or three with a null record between the and > the . Figuring out the reasonable assumptions to make to make that decision is the biggest part of what I m

Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-16 Thread Glenn Knickerbocker
On 3/15/2023 1:28 AM, Rob van der Heij wrote: > Splitting a record with no words will pass the record (see example in usage > note :-) Right, and in that case it was a real null line in the input ...0a0a..., so it hasn't added any *extra* null records. ¬R

Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-14 Thread Rob van der Heij
Splitting a record with no words will pass the record (see example in usage note :-) On Wed, 15 Mar 2023 at 01:13, Glenn Knickerbocker wrote: > On Tue, 14 Mar 2023 08:07:15 +0100, Rob wrote: > >On Tue, 14 Mar 2023 at 00:44, Donald Russell > wrote: > >> —> ... | deblock linend 0a | split 0d |

Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-14 Thread Glenn Knickerbocker
On Tue, 14 Mar 2023 08:07:15 +0100, Rob wrote: >On Tue, 14 Mar 2023 at 00:44, Donald Russell wrote: >> —> ... | deblock linend 0a | split 0d | ... >> Could that cause extra lines? >Yes, it does. I didn't find a case where SPLIT created any extra null records. It will *lose* null lines if ther

Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-14 Thread Rob van der Heij
On Tue, 14 Mar 2023 at 16:59, John P. Hartmann wrote: > On 3/14/23 16:05, Rob van der Heij wrote: > > The repeating "range" doesn't really do that, but I just checked that > > strip x0d 1 will take one from either or both sides of the record. > > You're right. It gives an error because it su

Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-14 Thread John P. Hartmann
On 3/14/23 16:05, Rob van der Heij wrote: The repeating "range" doesn't really do that, but I just checked that strip x0d 1 will take one from either or both sides of the record. You're right. It gives an error because it supports only ranges relative to the beginning of the record and fro

Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-14 Thread Rob van der Heij
On Tue, 14 Mar 2023 at 15:56, John P. Hartmann wrote: > A more reliable way to remove just one cr from either end might be > something like > > change (1 -1) x0d // > The repeating "range" doesn't really do that, but I just checked that strip x0d 1 will take one from either or both sides of
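As a rough Python analogue of the behaviour described for `strip x0d 1` (at most one CR removed from either or both ends of a record), assuming that description is accurate. The helper name `strip_one_cr` is hypothetical.

```python
def strip_one_cr(record: bytes) -> bytes:
    """Remove at most one leading and at most one trailing CR."""
    if record.startswith(b"\r"):
        record = record[1:]
    if record.endswith(b"\r"):
        record = record[:-1]
    return record


stripped = strip_one_cr(b"\r\rab\r\r")
```

Unlike an unbounded strip, any further CRs next to the removed ones are left in the record.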

Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-14 Thread John P. Hartmann
On 3/14/23 00:44, Donald Russell wrote: If you know the data has crlf or lfcr or just lf but never just cr then … deblock 0a | strip both 0d | … A more reliable way to remove just one cr from either end might be something like change (1 -1) x0d //

Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-14 Thread Rob van der Heij
On Tue, 14 Mar 2023 at 00:44, Donald Russell wrote: > —> ... | deblock linend 0a | split 0d | ... > > Could that cause extra lines? > Yes, it does. I agree removing a trailing x0d would have done. But the challenge we see is that a file with just CR will buffer the input before splitting. I've

Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-13 Thread Donald Russell
—> ... | deblock linend 0a | split 0d | ... Could that cause extra lines? If you know the data has crlf or lfcr or just lf but never just cr then … deblock 0a | strip both 0d | … I suspect strip is more efficient than split because split has to scan entire record, whereas strip starts at each
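The deblock-then-strip idea can be sketched in Python as follows. This is only an analogue of the pipeline, under the stated assumption that the data never uses a bare CR as a line end; the name `deblock_lf_strip_cr` is hypothetical.

```python
def deblock_lf_strip_cr(chunks):
    """Split a stream of byte chunks on LF, then strip CRs from both ends."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        # Everything before the last LF is complete; the rest waits for more input.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            # With LF CR input the CR lands at the start of the next record,
            # which is why both ends are stripped.
            yield line.strip(b"\r")
    if pending:
        # Trailing data without a final LF is passed as a last record.
        yield pending.strip(b"\r")


records = list(deblock_lf_strip_cr([b"a\r\nb\n", b"\rc\n\nd"]))
```

Null lines survive because nothing is split on CR, but a file that contains only CRs would accumulate in `pending` as a single record.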

Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-13 Thread Paul Gilmartin
On 3/13/23 16:41:47, Glenn Knickerbocker wrote: Anyone have an idiom for deblocking and translating a file in ASCII that may have either or both of CR and LF (*), ... The problem is not well-posed. Consider Foo Bar Is that two records, or three with a null record between the

[CMS-PIPELINES] deblocking with various possible linends

2023-03-13 Thread Glenn Knickerbocker
(Copied here from an IBM internal discussion because I should have come here first anyway:) Anyone have an idiom for deblocking and translating a file in ASCII that may have either or both of CR and LF (*), and may be split into records, without unnecessarily buffering the whole file? The possibi