Re: [Jprogramming] Apply at start/lengths pairs

2017-06-11 Thread Danil Osipchuk
I didn't try to make the examples match closely, and I also simplified the code, leaving only the general idea as an illustration. The real code asserts that the positions of opening and closing tags alternate, and a later normalization step checks whether the tag is indeed inside a container record
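
A minimal sketch of the kind of alternation check mentioned here, not the poster's actual code. The verb name alternating and the sample positions are assumptions made for illustration; x holds the opening-tag positions and y the closing-tag positions.

```j
NB. Hypothetical check: do opening (x) and closing (y) positions alternate?
alternating =: 4 : 0
 NB. mark opens as 1 and closes as 0, then order the marks by position
 seq =. ((1 #~ # x) , 0 #~ # y) /: x , y
 NB. well-formed input reads open, close, open, close, ...
 seq -: (# seq) $ 1 0
)

3 25 alternating 10 32    NB. 1: open, close, open, close
3 10 alternating 25 32    NB. 0: two opens in a row
```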

Re: [Jprogramming] Apply at start/lengths pairs

2017-06-11 Thread bill lam
Your regex is not an exact duplicate of your J. In your J, opn and cls are searched separately, which allows malformed tags, whereas your regex searches for the opn and cls pair together. That said, I would try XML utilities or a library if the XML data is large. On 11 Jun, 2017 11:17 pm, "Danil Osipch
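
For illustration, searching for the opening and closing tags separately might look like the sketch below. The sample string, the tag name and the variable names starts, lens and sl are assumptions, and the code presumes well-formed, non-nested tags, which is exactly the weakness noted above.

```j
xml =: '<r><ab>one</ab><cd>x</cd><ab>two</ab></r>'

opn =: I. '<ab>' E. xml      NB. index of every opening tag: 3 25
cls =: I. '</ab>' E. xml     NB. index of every closing tag: 10 32

starts =: opn + # '<ab>'     NB. content begins right after the opening tag
lens   =: cls - starts       NB. content runs up to the closing tag
sl     =: starts ,. lens     NB. one start,length pair per row

(,. {. sl) ];.0 xml          NB. first slice: 'one'
```

Nothing here pairs a given opening tag with its own closing tag; the two position lists are simply assumed to line up.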

Re: [Jprogramming] Apply at start/lengths pairs

2017-06-11 Thread Henry Rich
There has been a good bit written about this, and some of it has made it into System/Interpreter/Requests. If you are asking for more than is asked for there, please add to that page. Henry Rich On 6/11/2017 10:17 AM, 'Pascal Jasmin' via Programming wrote: emit empty word is strongly needed

Re: [Jprogramming] Empty lists?

2017-06-11 Thread Raul Miller
I imagine that "itemless" is a word for 'itemless'. That said, applications which both (a) use empty items, and (b) test for their non-existence are beyond rare. Contrived might be a more accurate description. So, for most practical purposes, using "empty" as shorthand for "itemless" seems reason

Re: [Jprogramming] Empty lists?

2017-06-11 Thread Henry Rich
Empty means no atoms. No question about it. Perhaps we need a word for 'itemless'. Most of the time in my code I test for items rather than atoms. Henry Rich On 6/10/2017 12:38 AM, Ian Clark wrote: It's a nice point: is a noun "empty" because it has no atoms – or because it has no items?
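
The distinction can be seen in a short sketch. The verb names isEmpty and isItemless are hypothetical, chosen only for this illustration.

```j
isEmpty    =: 0 e. $     NB. no atoms: some axis has length 0
isItemless =: 0 = #      NB. no items: the leading axis has length 0

isEmpty i. 3 0           NB. 1: three items, but each item has no atoms
isItemless i. 3 0        NB. 0: there are still three items
isEmpty i. 0 3           NB. 1
isItemless i. 0 3        NB. 1
```

An array of shape 3 0 is therefore empty without being itemless, while every itemless array is also empty.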

Re: [Jprogramming] Apply at start/lengths pairs

2017-06-11 Thread Danil Osipchuk
Frankly, I didn't even consider using regexp for parsing production data (gigabytes/millions of records). Regexp must have plenty of overhead passing back and forth between the library and the J engine, not to mention the whole complex logic supporting its complex semantics. This is all compared

Re: [Jprogramming] Apply at start/lengths pairs

2017-06-11 Thread 'Pascal Jasmin' via Programming
emit empty word is strongly needed IMO. I will probably write a mini-regex implementation for ;: (with *?+ support) suitable for tag extraction that should be faster than regex. But the basics of that are to start word after opening string, use ev (no start) at start of closing string, but then

Re: [Jprogramming] Apply at start/lengths pairs

2017-06-11 Thread Raul Miller
I believe readability is the reason ;: was not made more general. Though efficiency might also have been a part of that. -- Raul On Sun, Jun 11, 2017 at 9:59 AM, Danil Osipchuk wrote: > I could not find a cutP definition with a quick look, but from your example > it seems like you mean a charac

Re: [Jprogramming] Apply at start/lengths pairs

2017-06-11 Thread Raul Miller
But that is not more general than arbitrary indices. -- Raul On Sun, Jun 11, 2017 at 9:16 AM, 'Pascal Jasmin' via Programming wrote: > odd positions means where indexes are an odd number. > From: Raul Miller > To: Programming forum > Sent: Sunday,

Re: [Jprogramming] Apply at start/lengths pairs

2017-06-11 Thread Danil Osipchuk
I could not find a cutP definition with a quick look, but from your example it seems like you mean a character separator by token. It is not general enough. Also, imagine a fluffy xml file, with millions of records, where only a minority of fields of different types in records are interesting, some

Re: [Jprogramming] Apply at start/lengths pairs

2017-06-11 Thread 'Pascal Jasmin' via Programming
odd positions means where indexes are an odd number. From: Raul Miller To: Programming forum Sent: Sunday, June 11, 2017 8:54 AM Subject: Re: [Jprogramming] Apply at start/lengths pairs Personally, I do not understand what you mean by "odd positions". My

Re: [Jprogramming] Apply at start/lengths pairs

2017-06-11 Thread Raul Miller
Personally, I do not understand what you mean by "odd positions". My first cut at understanding would be odd as in even numbers vs. odd numbers. But that's not what I would think of as more general. I took a look at your code, but there was quite a lot of it, and a quick glance did not suggest wh

Re: [Jprogramming] Apply at start/lengths pairs

2017-06-11 Thread 'Pascal Jasmin' via Programming
A more general procedure than your request is to cut your data such that your start/end segments are in odd positions. In jpp, https://github.com/Pascal-J/jpp, cutP is a process for cutting on start and end tokens, though there are faster methods in the included fsm.ijs file. And that process could

Re: [Jprogramming] Apply at start/lengths pairs

2017-06-11 Thread Raul Miller
Your doSL is roughly what I would expect to use for this kind of thing. (I think I would have phrased it doSL=: 1 :'(u;.0~ , $~ ,&1@$)~' but I don't know if that's any better, performance wise.) It's possible that the interpreter could be improved here... Good luck, -- Raul On Sun, Jun 11, 2

Re: [Jprogramming] Apply at start/lengths pairs

2017-06-11 Thread Danil Osipchuk
It seems unrealistic to expect more than I have - sorry, I will try to tune other parts. The first instance of doSL is already as close to the cut conjunction as it can get (I thought ravel items applied with rank"1 is costly - it is not) 6!:2 '(1e6 $ ''ab''xmlTagContentSL XML) #doSL XML' 0.87

[Jprogramming] Apply at start/lengths pairs

2017-06-11 Thread Danil Osipchuk
Hi all, I wonder if there is an idiomatic way to apply a verb using an array of start and length pairs. This is a recurring pattern when extracting data from files. I've tried 3 adverbs (the example at the end), and the first one is slightly better on big files, but I'm still looking for possible
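
A minimal sketch of the pattern being asked about, assuming each row of the left argument is a start,length pair into a list. The adverb name applySL and the sample data are hypothetical; this is not the doSL discussed in the replies, whose full definitions are not shown here.

```j
NB. Hypothetical adverb: apply u to each (start,length) slice of y.
NB. ,. turns one pair into the 2-by-1 table that ;.0 expects for a list.
applySL =: 1 : '(,.@[ u;.0 ])"1 _'

text =: 'abcdefgh'
sl   =: 3 2 $ 0 2  3 2  6 2    NB. rows are start,length pairs

sl < applySL text              NB. boxes the slices 'ab', 'de', 'gh'
sl # applySL text              NB. length of each slice: 2 2 2
```

The rank specification "1 _ feeds one pair at a time against the whole of the right argument, so u sees exactly one slice per call.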