With zero tolerance, the O(n^2) problem does not arise.
Henry Rich
On 8/19/2014 10:14 PM, bill lam wrote:
2.
for a numeric matrix and zero tolerance, will ~. perform better than O(n^2)
if it computes a hash for each row? (or does it already do so?)
On Aug 20, 2014 7:43 AM, "Roger Hui" wrote:
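A minimal way to check this case directly (the matrix shape and the timer repetition count below are arbitrary choices, not taken from the thread):

   NB. rough check: tolerant vs zero-tolerance nub on a floating-point matrix
   timer =: 6!:2
   m =: 1e5 2 ?@$ 0         NB. 100000-by-2 matrix of random floats
   10 timer '~. m'
   10 timer '~.!.0 m'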
Conceptually speaking, at least two strings are unique in (almost) every
row. Others are duplicated, yes.
Meanwhile, a significant part of this code also needs to run on 32 bit J
(because j602 is apparently the only version of J which supports xml/sax).
It's actually grinding through this data one
I guess the benchmarks show that you are right. But the things to worry
about are the O(n^2) monsters, not mere factors of 2 or 3 or 10.
But regarding the benchmarks: the interpreter may have infelicities in
treating the types of empty things.
array =. (10 2 ?@$ 200 10) (<;.0~ ,.)~"1 _
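One quick way to see what empties a benchmark is actually producing (datatype and each are standard-library words; the particular empties shown are only examples):

   NB. illustrative: inspect the types of a few empty arrays
   datatype each '' ; (0$0) ; (0 0$a:)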
Is it possible to describe a significant fraction of the strings occupying memory
as coming from a "small" universe?
In other words, are symbols (of the s: variety) an option for you? If you can
describe your main table as a collection of symbols (in their integer form, i.e.
6 s: s:), numeric valu
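A small, hypothetical illustration of the idea (names and sizes are invented, and 5 s: is used on the assumption that it converts symbols back to boxed strings):

   NB. hypothetical: nub on symbols instead of on boxed strings
   words =: ;: 'alpha beta alpha gamma beta alpha'
   syms =: s: words          NB. boxed strings -> symbols
   ~. syms                   NB. unique symbols
   5 s: ~. syms              NB. back to boxed strings (assumed valence)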
My experiments show ~. faster than ~.!.0 for boxed character arrays of
rank 2 (disagreeing with Roger's point 1)
array =. (10 2 ?@$ 200 10) (<;.0~ ,.)~"1 _ a.
50 {."1 ": array
[boxed display of the random character strings; unprintable bytes not reproduced]
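A timing comparison along the same lines might look like this (array as constructed above; the repetition count is arbitrary):

   timer =: 6!:2
   10 timer '~. array'
   10 timer '~.!.0 array'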
Good to know.
Thank you,
--
Raul
On Tue, Aug 19, 2014 at 7:43 PM, Roger Hui
wrote:
0. For a floating point vector, ~.!.0 is faster than ~., but not to the
extent that Bill Lam implied.
   timer=: 6!:2
   x=: 1e6 ?@$ 0
   10 timer '~.x'
0.136236
   10 timer '~.!.0 x'
0.0814054
Same comments apply to other functions in the index-of family.
1. For non-numeric arguments (and boxe
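For the remark about the index-of family, the analogous check might be (sizes are arbitrary; this just repeats the same pattern with i.):

   timer =: 6!:2
   y =: 1e6 ?@$ 0
   x =: 1e5 ?@$ 0
   10 timer 'y i. x'
   10 timer 'y i.!.0 x'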
I should've mentioned that -.@-:"n also enjoys integrated rank support ("IRS").
-Dan
PS: A few more details on IRS are available in Roger's paper
http://www.jsoftware.com/papers/rank1.htm , and -.@-:"n is explicitly
enumerated among IRS-supported verbs in the general index to J's special code
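A small example of the row-wise matching being discussed, with made-up data (the -.@-: form simply negates the result of -:):

   t =: 3 4 $ 'abcdefghijkl'
   t -:"1 'efgh'             NB. match each row of t against 'efgh'
0 1 0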
I updated the code in the live session and it's working much better now.
Or at least, that part is.
I'm also getting interface errors from 2!:0 and am having to work around
that issue too. :/ (This issue, I think, represents kernel memory
fragmentation - I guess Linux is not tuned for processe
There is also integrated rank support (a specific category of special code) for
the dyad -:"n, especially when n=1 (i.e. matching rows of tables has been made
particularly efficient).
That said, it's probably worth doing a few performance tests on medium-sized
data sets to compare the performance of
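One possible shape for such a test (the sizes, and the boxed baseline used for comparison, are arbitrary choices):

   timer =: 6!:2
   a =: 1e5 20 ?@$ 100       NB. 100000-by-20 integer table
   b =: (?~ 1e5) { a         NB. the same rows, shuffled
   10 timer 'a -:"1 b'       NB. row-wise match via rank
   10 timer '(<"1 a) = <"1 b'    NB. boxed row comparison, for contrast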
I did some tests and I find NO improvement by switching to ~.!.0 for
nonnumeric rank > 1. In fact it is slower.
For numeric with rank > 1, ~.!.0 is a little faster.
Henry Rich
On 8/19/2014 6:38 PM, Raul Miller wrote:
I'd want to see some detailed reference on this issue (~.!.0 on non-numeric
arrays) before I'd want to blow another day or longer trying to reproduce
the problem with that change.
Alternatively, I'd want to get into the C implementation and find how this
could happen. That maybe should be done as
~.!.0 as I understand it uses a different algorithm from ~. even on
nonnumerics, and might be worth trying.
I am sure that ~.!.0 is much faster than ~. of floating-point arrays of
rank > 1. I think ~. is OK when the rank is 1.
Henry Rich
On 8/19/2014 2:11 PM, Raul Miller wrote:
Please include the current time in the sequence of timestamps. The code was
still running at the point in time where I posted my email.
That said, at this point, my attempt to interrupt succeeded, and I have
found the line of code which was stalled:
data=. ~.data
And, here is what it looks like
I have tried to look at the time differences between your logged timestamps
to see what you mean by "stalling". What, if anything, in the pattern
produced below translates to "stalling"? Is it the time intervals in excess
of 100?
Sorry, btw, that I have no suggestions for debugging, but the ones o
Bill's ideas are much more likely relevant than the following, but sticking
with the "24 hour" bit, and even though i. or ~. is the more likely suspect, some thoughts:
A 0.0001% branch would normally have a lot of variance even if the expected hit
rate is once every 24 hours. For a regular, near-24-hour-period crash
~. (or i.) on floating-point numbers is _very_ slow because of comparison tolerance ([]ct).
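For context, the current comparison tolerance can be queried with the foreign 9!:18, and exact comparison can be requested per use with the !.0 fit (illustrative lines, not from the thread):

   9!:18 ''                  NB. query comparison tolerance (default is 2^_44)
   ~.!.0 data                NB. nub with exact (zero-tolerance) comparison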
Perhaps you can make some timing measurements of ~. on a boxed string
array similar in size to the one in your production.
is ~. an O(n^2) operation?
--
regards,
GPG key 1024D/4434B
I'm using ~. on boxed strings.
But I have used ~. on floating point numbers in the past (years ago), where
comparison tolerance was a good heuristic. So I guess I am not sure what
you are trying to tell me.
Thanks,
--
Raul
On Tue, Aug 19, 2014 at 10:35 AM, bill lam wrote:
I hoped you were not doing ~. or i. on floating point numbers; the
comparison tolerance can beat you even on another, faster machine.
On Aug 19, 2014 10:26 PM, "Raul Miller" wrote:
I am doing intensive calculation.
Here is what top says about this j process:
  PID USER   PR NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
26894 ubuntu 20  0 9846m 9.1g 2276 R 100.0 30.9 597:01.69 ijconsole
And actually... J has not been running for 597 hours. The machine has only
b
1. I will add some more logging diagnostics. Thanks. This will
hopefully help me find the line where it fails.
2. It has been failing well after midnight. Here is the last logged timestamp:
2014 8 19 6 33 49.8605
That's UTC and the machine is in Virginia.
I expect some variation between logged t
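Regarding the extra logging in point 1, a minimal helper along these lines may be enough (the file name and message are placeholders; fappend comes from the standard 'files' script):

   require 'files'
   LOGFILE =: '/tmp/run.log'                 NB. placeholder path
   log =: 3 : '((": 6!:0 ''''), '' '', y, LF) fappend LOGFILE'
   log 'about to run: data=. ~.data'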
J has seldom stalled for me unless memory was full or it was doing intensive
computation. Your trace showed it running =, i., or ~., which may take a
long time to complete. You may try setting a much lower value for the
memory limit and the execution time limit to force it to break
sooner.
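Those limits can be set with foreigns, assuming 9!:21 (memory limit, in bytes) and 9!:33 (execution time limit, in seconds) are the relevant ones; the values below are arbitrary:

   9!:21 ] 2e9               NB. set a lower memory limit (bytes)
   9!:33 ] 3600              NB. set an execution time limit (seconds)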
Tue, 19 Aug 2014, Raul Miller wrote:
I am using require 'task' which I believe uses 15!:0.
I believe I have switched to using 2!:0 for all my external process needs.
And, of course, the machine is hooked to the internet, which is
"unsafe" and relies on people being well behaved or something
approximating that.
So, yes, I am "doing
Some ideas, sorry in advance if they are unhelpful:
1. Step 1 would be finding out whether it stalls deterministically from the code.
Perhaps liberally sprinkling console/logfile output can help.
2. "almost 24 hours" - are there time comparisons going on? UTC time?
something that might fail close to mid
Does it do anything 'unsafe'? Dynamic memory allocation like memw or 15!:0?
I've had crashes, not stalls. The only stalls I've had are on Windows due
to deadlocks. As far as I can remember, J is single-threaded, so a deadlock
seems unlikely unless you have logic that is waiting on some other reso
I have a J program that keeps stalling. My impression is that it has
been stalling in a random location, but I might be wrong about that.
(1) It takes almost 24 hours to get to the point where it stalls, so my
tests so far have been few.
(2) When it stalls, it ignores signals for attention, so de
The bugs in J64 COM are unlikely to be fixed soon (at least months away).
You'll need to use a workaround.
Perhaps the simplest would be to use 3!:4 (3 ic 4) to convert the 64-bit
ints to bytes, transfer those, and then do a cast on the C# side.
Another way would be to pass pointers but this wo