I didn't try to make the examples match closely, and I've also simplified the code,
leaving only the general idea as an illustration. The real code asserts that the
positions of opening and closing tags alternate, and a later normalization step
checks whether the tag is indeed inside a container
record
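For illustration only, here is a minimal J sketch of the alternation check described above. It is not the real code from this thread. The verb name tagSL, the boxed opn;cls left argument and the n-by-2 start/length result are assumptions made for the example, and it assumes tags with the same name never nest.

```j
NB. Hypothetical helper (not from the thread): start/length pairs of the
NB. text between each opening tag and its closing tag.
NB. x: boxed pair opn;cls    y: the XML string
tagSL =: 4 : 0
  'opn cls' =. x
  o =. I. opn E. y            NB. positions of opening tags
  c =. I. cls E. y            NB. positions of closing tags
  assert. (#o) = #c           NB. as many opening as closing tags
  assert. *./ o < c           NB. each opening tag precedes its closing tag
  assert. *./ (}. o) > }: c   NB. the next opening tag follows the previous close
  s =. o + #opn               NB. content starts right after the opening tag
  s ,. c - s                  NB. n-by-2 table: start, length
)

XML =: '<r><a>12</a><b>x</b><a>345</a></r>'
('<a>';'</a>') tagSL XML      NB. result is 2 2 $ 6 2 23 3
```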
Your regex is not an exact equivalent of your J. In your J, opn and cls are
searched for separately, which allows malformed tags, whereas your regex searches
for the opn and cls pair together. That said, I would try the XML utilities or a
library if the XML data is large.
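A small J illustration of why separate searches need an explicit check (the fragment bad is made up for this example). Each E. search succeeds on its own, so a malformed fragment still yields positions for both tags. The mismatch only shows up when the counts or the order are compared.

```j
bad =: '<a>1<a>2</a>'     NB. two opening tags, one closing tag
o =: I. '<a>' E. bad      NB. opening-tag positions: 0 4
c =: I. '</a>' E. bad     NB. closing-tag positions: a one-element list, 8
(# o) , # c               NB. 2 1: the counts disagree
```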
On 11 Jun, 2017 11:17 pm, "Danil Osipch
There has been a good bit written about this, and some of it has made it
into System/Interpreter/Requests. If you are asking for more than is
asked for there, please add to that page.
Henry Rich
On 6/11/2017 10:17 AM, 'Pascal Jasmin' via Programming wrote:
emit empty word is strongly needed
I imagine that "itemless" is a word for 'itemless'.
That said, applications which both (a) use empty items, and (b) test
for their non-existence are beyond rare. Contrived might be a more
accurate description.
So, for most practical purposes, using "empty" as shorthand for
"itemless" seems reasonable.
Empty means no atoms. No question about it.
Perhaps we need a word for 'itemless'. Most of the time in my code I
test for items rather than atoms.
Henry Rich
On 6/10/2017 12:38 AM, Ian Clark wrote:
It's a nice point: is a noun "empty" because it has no atoms – or because
it has no items?
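In J terms, the two readings of "empty" can be told apart by shape. The names noatoms and noitems below are hypothetical, and these tests are just one way of writing them:

```j
noatoms =: 0 = */@$     NB. no atoms: the product of the shape is 0
noitems =: 0 = #        NB. no items: the tally is 0

p =: i. 0 3             NB. 0 items of length 3: no items, no atoms
q =: i. 3 0             NB. 3 items of length 0: has items, but no atoms
(noatoms p) , noitems p NB. 1 1
(noatoms q) , noitems q NB. 1 0
```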
Frankly, I didn't even consider using regexps to parse production data
(gigabytes, millions of records). Regexps must have plenty of overhead passing
data back and forth between the library and the J engine, not to mention all the
complex logic supporting their complex semantics. This is all compared
Emit empty word is strongly needed, IMO.
I will probably write a mini-regex implementation for ;: (with *?+ support),
suitable for tag extraction, that should be faster than regex.
But the basics are to start a word after the opening string and use ev (no
start) at the start of the closing string, but then
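To make the ;: discussion concrete, here is a minimal sequential-machine sketch. It assumes the single characters '[' and ']' stand in for the opening and closing tag strings so the tables stay small. It is not Pascal's planned mini-regex. A word starts on the first character after '[' and is emitted at ']'. With the output codes used here, the empty content in '[]' produces no word at all, which appears to be the case an "emit empty word" code would cover.

```j
NB. Character classes: 0 = '[', 1 = ']', 2 = anything else.
m =: 0 1 (a. i. '[]') } 256 $ 2

NB. Each entry is: new state, output code
NB. (0 = no output, 1 = j=.i, 3 = emit word j..i-1 then j=._1).
NB. state 0 (outside):      [ -> 1 0   ] -> 0 0   other -> 0 0
NB. state 1 (just after [): [ -> 1 0   ] -> 0 0 (empty, nothing emitted)
NB.                         other -> 2 1 (start the word here)
NB. state 2 (in a word):    [ -> 2 0   ] -> 0 3 (emit)   other -> 2 0
s =: 3 3 2 $ 1 0 0 0 0 0  1 0 0 0 2 1  2 0 0 3 2 0

(0;s;m) ;: 'x[ab]y[]z[cde]'   NB. boxes 'ab' and 'cde'; '[]' yields nothing
```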
I believe readability is the reason ;: was not made more general.
Though efficiency might also have been a part of that.
--
Raul
On Sun, Jun 11, 2017 at 9:59 AM, Danil Osipchuk wrote:
> I could not find a cutP definition with a quick look, but from your example
> it seems like you mean a charac
But that is not more general than arbitrary indices.
--
Raul
On Sun, Jun 11, 2017 at 9:16 AM, 'Pascal Jasmin' via Programming wrote:
> odd positions means where indexes are an odd number.
I could not find a cutP definition with a quick look, but from your example
it seems you mean a character separator by token. That is not general
enough.
Also, imagine a fluffy XML file with millions of records, where only a
minority of fields of different types in the records are interesting, some
odd positions means where indexes are an odd number.
From: Raul Miller
To: Programming forum
Sent: Sunday, June 11, 2017 8:54 AM
Subject: Re: [Jprogramming] Apply at start/lengths pairs
Personally, I do not understand what you mean by "odd positions".
My first cut at understanding would be odd as in even numbers vs. odd
numbers. But that's not what I would think of as more general.
I took a look at your code, but there was quite a lot of it, and a
quick glance did not suggest wh
A more general procedure than your request is to cut your data such that your
start/end segments are in odd positions.
In jpp, https://github.com/Pascal-J/jpp, cutP is a process for cutting on start
and end tokens, though there are faster methods in the included fsm.ijs file.
And that process could
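Here is one reading of "start/end segments in odd positions", sketched in J under stated assumptions. The verb oddCut is hypothetical and is not jpp's cutP. It cuts the data at every segment start and end, so the boxed pieces alternate between outside and inside text and the wanted segments land at odd indexes.

```j
NB. x: n-by-2 table of start,length. Assumes the segments are sorted,
NB. non-empty, separated by at least one character, and lie strictly
NB. inside y (none starts at 0 or ends at #y); otherwise pieces merge.
oddCut =: 4 : 0
  b =. 0 , , +/\"1 x           NB. 0, start0, end0, start1, end1, ...
  (1 b } 0 $~ #y) <;.1 y       NB. mark each boundary, cut before each mark
)

pieces =: (2 2 $ 3 2 12 3) oddCut '<a>12</a><a>345</a>'
(#~ 2 | i.@#) pieces           NB. odd-indexed pieces: '12' and '345'
```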
Your doSL is roughly what I would expect to use for this kind of
thing. (I think I would have phrased it as
doSL=: 1 :'(u;.0~ , $~ ,&1@$)~' but I don't know whether that's any better
performance-wise.)
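Raul's phrasing can be unpacked in a few lines. The start/length table SL below is made-up sample data. The fork (, $~ ,&1@$) reshapes an n-by-2 table into n 2-by-1 tables, each a start ,: length pair for the single axis of a list, and u;.0 is then applied to the data once per pair:

```j
doSL =: 1 : '(u;.0~ , $~ ,&1@$)~'   NB. Raul's phrasing

SL =: 2 2 $ 0 3 4 2                 NB. rows: start,length
$ (, $~ ,&1@$) SL                   NB. 2 2 1: each pair becomes a 2-by-1 table
SL < doSL 'abcdefgh'                NB. boxes 'abc' and 'ef'
SL # doSL 'abcdefgh'                NB. 3 2
```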
It's possible that the interpreter could be improved here...
Good luck,
--
Raul
On Sun, Jun 11, 2
It seems unrealistic to expect more than I have; sorry, I will try to
tune other parts instead.
The first instance of doSL is already as close to the cut conjunction as it
can get (I thought ravel items applied with rank "1 was costly; it is not):
6!:2 '(1e6 $ ''ab''xmlTagContentSL XML) #doSL XML'
0.87
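The first doSL itself is not shown in this excerpt. As a hypothetical sketch of the ravel-items form mentioned above, ,."1 can build the 2-by-1 cells that ;.0 expects. The adverb name applySL is an assumption, and this may differ from Danil's version.

```j
NB. x: n-by-2 table of start,length pairs; y: the data.
NB. ,."1 turns each pair into a 2-by-1 table (start ,: length),
NB. and u;.0 is applied to the subarray each of those tables selects.
applySL =: 1 : '(,."1@[) u;.0 ]'

SL =: 2 2 $ 0 3 4 2
SL < applySL 'abcdefgh'      NB. boxes 'abc' and 'ef'
```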
Hi all,
I wonder if there is an idiomatic way to apply a verb using an array of
start and length pairs. This is a recurring pattern when extracting data
from files.
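For comparison, here is a plainly different technique that the thread does not discuss: build an index list start + i. length for each pair and select with From ({). The verb pieceAt is hypothetical, and its speed relative to the ;.0 approach has not been measured here.

```j
NB. For each start,length row of x, box the atoms of y at start + i. length.
pieceAt =: (4 : '< ((0 { x) + i. 1 { x) { y')"1 _

(2 2 $ 0 3 4 2) pieceAt 'abcdefgh'   NB. boxes 'abc' and 'ef'
```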
I've tried 3 adverbs (the example at the end), and the first one is
slightly better on big files, but I'm still looking for possible