Re: Ideas for stripping tags from document

2021-01-19 Thread Eric Hawicz
On 1/17/2021 6:21 PM, Todd Gruhn wrote:
> HEY Johnny, that thing with tr -d did not work. When I read the manpage I got an idea: character classes (in this case [:cntrl:]). It turns out that one can do s/[[:cntrl:]]/\n/g using PERL. That fixed the prob with \x{d}. I still need to fix \x{92} , \x

Re: Ideas for stripping tags from document

2021-01-19 Thread Todd Gruhn
It's OK -- I have it figured out now.

On Tue, Jan 19, 2021 at 7:09 PM Eric Hawicz wrote:
>
> On 1/17/2021 6:21 PM, Todd Gruhn wrote:
> > HEY Johnny, that thing with tr -d did not work. When I read the
> > manpage I got an idea:
> > character classes (in this case [:cntrl:]). It turns out that on

Re: Ideas for stripping tags from document

2021-01-17 Thread Johnny Billquist
On 2021-01-18 00:21, Todd Gruhn wrote:
> HEY Johnny, that thing with tr -d did not work. When I read the manpage I got an idea:
[...]

That's weird. tr -d should definitely work. But...

> character classes (in this case [:cntrl:]). It turns out that one can do s/[[:cntrl:]]/\n/g using PERL. Th

Re: Ideas for stripping tags from document

2021-01-17 Thread Todd Gruhn
HEY Johnny, that thing with tr -d did not work. When I read the manpage I got an idea: character classes (in this case [:cntrl:]). It turns out that one can do s/[[:cntrl:]]/\n/g using PERL. That fixed the prob with \x{d}. I still need to fix \x{92}, \x{93}, etc. It would be nice to do: system(

Re: Ideas for stripping tags from document

2021-01-17 Thread Johnny Billquist
On 2021-01-17 10:57, Ignatios Souvatzis (GSG) wrote:
> On 17 January 2021 00:01:23 CET, Johnny Billquist wrote:
>> On 2021-01-16 19:45, Todd Gruhn wrote:
>>> I have a large document (18,000L). It is full of tags such as <93> ,<94> , <95> . If I view the doc in a PERL editor I see \x{93} , \x{94} ,

Re: Ideas for stripping tags from document

2021-01-17 Thread Ignatios Souvatzis (GSG)
On 17 January 2021 00:01:23 CET, Johnny Billquist wrote:
>On 2021-01-16 19:45, Todd Gruhn wrote:
>> I have a large document (18,000L). It is full of tags such as <93>
>> ,<94> , <95> .
>>
>> If I view the doc in a PERL editor I see \x{93} , \x{94} , \x{95} ...
>>
>> Is there a pkg or command

Re: Ideas for stripping tags from document

2021-01-16 Thread Johnny Billquist
On 2021-01-16 19:45, Todd Gruhn wrote:
> I have a large document (18,000L). It is full of tags such as <93> ,<94> , <95> .
>
> If I view the doc in a PERL editor I see \x{93} , \x{94} , \x{95} ...
>
> Is there a pkg or command to strip these tags and leave the text ?

tr -d "\223\224\225" < infile > outf
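For illustration, the tr suggestion above can be tried on a small sample before running it on the full document. This is a sketch, not a tested fix for the original file: the function name is hypothetical, and forcing LC_ALL=C is my own assumption so that tr treats the input as raw bytes rather than multibyte characters. Octal 223, 224 and 225 are hex 93, 94 and 95.

```shell
#!/bin/sh
# Delete the raw bytes 0x93, 0x94 and 0x95 (octal 223, 224, 225) from stdin.
# LC_ALL=C is an assumption: it makes tr operate on single bytes.
strip_bytes() {
    LC_ALL=C tr -d '\223\224\225'
}

# Small sample with the three bytes embedded between ordinary characters.
printf 'a\223b\224c\225d\n' | strip_bytes
```

If the bytes survive this, they are probably not stored as the single bytes 0x93 to 0x95, and a byte-level dump of the file is the next thing to check.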

Re: Ideas for stripping tags from document

2021-01-16 Thread Todd Gruhn
Thanks for the idea, Ignatios. I will try this.

On Sat, Jan 16, 2021 at 3:00 PM wrote:
>
> Hi,
>
> On Sat, Jan 16, 2021 at 01:45:45PM -0500, Todd Gruhn wrote:
> > I have a large document (18,000L). It is full of tags such as <93>
> > ,<94> , <95> .
> >
> > If I view the doc in a PERL editor I se

Re: Ideas for stripping tags from document

2021-01-16 Thread ignatios
Hi,

On Sat, Jan 16, 2021 at 01:45:45PM -0500, Todd Gruhn wrote:
> I have a large document (18,000L). It is full of tags such as <93>
> ,<94> , <95> .
>
> If I view the doc in a PERL editor I see \x{93} , \x{94} , \x{95} ...

Ahem - are you sure (have you looked at a few of them with hexdump -C)?
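A rough sketch of the hexdump check suggested above. The function names and the od fallback are my own assumptions, not part of the thread. It dumps the first few lines that contain byte 0x93, so the actual stored bytes can be seen. If the dump shows a lone `93`, the file holds single bytes. If it shows `c2 93`, the character is stored as a two-byte UTF-8 sequence, and deleting single bytes would not remove it cleanly.

```shell
#!/bin/sh
# Dump stdin as hex. Falls back to POSIX od if hexdump is not installed
# (the fallback is an assumption for portability).
dump_bytes() {
    if command -v hexdump >/dev/null 2>&1; then
        hexdump -C
    else
        od -A x -t x1
    fi
}

# Show the raw bytes of the first three lines of file $1 containing 0x93.
# grep -a treats the file as text even if it looks binary.
show_suspect_lines() {
    LC_ALL=C grep -a "$(printf '\223')" "$1" | head -n 3 | dump_bytes
}
```

A call such as `show_suspect_lines infile` (infile being a placeholder name) prints one hex row per 16 bytes.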

Ideas for stripping tags from document

2021-01-16 Thread Todd Gruhn
I have a large document (18,000 lines). It is full of tags such as <93>, <94>, <95>. If I view the doc in a PERL editor I see \x{93}, \x{94}, \x{95} ... Is there a pkg or command to strip these tags and leave the text?