Re: [Jprogramming] Connecting the connections

2021-09-06 Thread Raul Miller
Hmm... poking at this with your original data, I do see an issue -- the groups aren't quite long enough. Also, I should note that my earlier suggestion of ndx {each ORPnoid does not work -- there, 'each' should be replaced by 'L:0'. Anyways, the result is still useful, but to get the grouping tha
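To see why 'each' has to become 'L:0' there, here is a small sketch with made-up data (ndx and vals are hypothetical stand-ins, not the real index lists). each (&.>) works atom by atom on both arguments, so it tries to pair every box of ndx with one atom of the right argument. L:0 instead recurses into the boxes of ndx and hands the whole unboxed right argument to { at each leaf.

```j
NB. Hypothetical toy data: ndx is a boxed list of index lists,
NB. vals stands in for an unboxed list such as ORPnoid.
ndx=: 1 2 ; 0 3
vals=: 'abcd'

NB. ndx {each vals would be a length error here: 2 boxes vs 4 atoms.
NB. {L:0 applies { at the leaves of ndx, each time to all of vals:
ndx {L:0 vals    NB. two boxes: 'bc' and 'ad'
```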

Re: [Jprogramming] Connecting the connections

2021-09-06 Thread Raul Miller
Execution time is going to depend on how big the groups are. That said, I *do* make mistakes at times, and it's good practice to test and manually verify against a small sample. Anyways, good luck, -- Raul

Re: [Jprogramming] Connecting the connections

2021-09-06 Thread Pablo Landherr
This appears to work! I'm extremely impressed. Truly awesome. Execution time is unknown at this point as it's been running for a while without completing yet. :-) After all, there are literally billions of comparisons being done by i. (Should have used a subset first to verify, but I was so keen

Re: [Jprogramming] Connecting the connections

2021-09-06 Thread Raul Miller
I should also note that Gmail decided that part of the code was not code but part of the previous reply. Here's a version which hopefully works around that issue:
   cnGroupIndices=:{{
     NB. x: ORPnoid
     NB. y: ORPioid
     assert. 1=>./x#/.y
     valid=. y e. x
     cns=. (1+x i.valid#y) (1+I.valid)} i.1+
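For anyone reading the assertion in cnGroupIndices: x #/. y counts the items of y under each distinct key of x, so 1=>./x#/.y only holds when ORPnoid has no repeated values (the contents of ORPioid don't matter beyond its length). A toy check, where noid, ioid and dup are hypothetical stand-ins:

```j
noid=: 10 20 30 40             NB. hypothetical stand-in for ORPnoid (x)
ioid=: 20 99 99 30             NB. hypothetical stand-in for ORPioid (y)
noid #/. ioid                  NB. 1 1 1 1 : one count per distinct noid
1 = >./ noid #/. ioid          NB. 1 : the assertion holds
dup=: 10 20 10 40              NB. hypothetical keys with a repeat
dup #/. ioid                   NB. 2 1 1 : 10 appears twice, so it would fail
noid -: ~. noid                NB. 1 : the equivalent "no duplicates" check
```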

Re: [Jprogramming] Connecting the connections

2021-09-06 Thread Raul Miller
Oh... I didn't think this through. Not all of these values are connected. I should also verify an assumption: that connections occur when a value in ORPnoid matches a value in ORPioid (and vice versa). If this assumption is invalid, then this won't be a useful result. Anyways, here's a fixed vers
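A minimal sketch of the idea in this message, not Raul's fixed version (which is cut off above). It assumes item i links to item j exactly when item i's ioid equals item j's noid, skips ioids with no matching noid, and assumes the links contain no cycles, since following a cycle with {~^:_ does not lead to a single root. cnSketch and the toy data are hypothetical.

```j
NB. Hypothetical verb. x: noids (unique), y: ioids (same length).
NB. Assumes item i links to item j when (i{y) = (j{x), and no cycles.
cnSketch=: {{
  valid=. y e. x                               NB. ioids that name a known noid
  cns=. (1+x i. valid#y) (1+I.valid)} i.1+#x   NB. 1-based links; slot 0 unused
  roots=. }. {~^:_ cns                         NB. follow links to a fixed point, drop slot 0
  roots </. i.#x                               NB. group 0-based item indices by root
}}

10 20 30 40 cnSketch 20 99 99 30               NB. boxes: 0 1 and 2 3
```

Item 0 links to item 1 and item 3 links to item 2; the two 99s match nothing, so those items only join a group through links pointing at them.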

Re: [Jprogramming] Connecting the connections

2021-09-06 Thread Pablo Landherr
Unfortunately, it fails. Given time I might realize the problem, but I suspect it's obvious to sharper minds than mine.
   #x=: ORPnoid
31636439
   #y=: ORPioid
31636439
   1=>./x#/.y
1
   #cns=. (1+x i.y) (1+i.#x)} i.1+#x NB. connections
31636440
   r=: (
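One likely reason for the failure, shown on toy data (noid and ioid are hypothetical stand-ins for ORPnoid and ORPioid): i. reports a missing value as the tally of its left argument, so without a filter such as y e. x some links point one past the end of the connection list.

```j
noid=: 10 20 30 40                       NB. hypothetical stand-in for ORPnoid
ioid=: 20 99 99 30                       NB. hypothetical stand-in for ORPioid
noid i. ioid                             NB. 1 4 4 2 : 4 (= #noid) means "not found"
cns=: (1 + noid i. ioid) (1 + i. #noid)} i. 1 + #noid
cns                                      NB. 0 2 5 5 3
(#cns) e. cns                            NB. 1 : 5 is not a valid index into cns
NB. so following the links, e.g. {~^:_ cns, stops with an index error
```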

Re: [Jprogramming] Connecting the connections

2021-09-06 Thread Raul Miller
Ok.. So we can use indices into ORPnoid to uniquely identify each relevant connection, and then use Michal Wallace's algorithm. (This means that in our connection list, we'll add 1 to each of those indexes, since that's what we had working.) In other words, I think that this should work: cnGroup

Re: [Jprogramming] Connecting the connections

2021-09-06 Thread Pablo Landherr
Raul,
True to form, your approach might offer a way forward. To add some details to the actual case:
   #ORPnoid
31636439
   #ORPioid
31636439
   #ORPioid -. ORPnoid NB. some ioid do not connect to any other
563228
   10{.ORPioid NB. the original data is hex in character form, but I convert it to symbol
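For reference, #ORPioid -. ORPnoid counts the ioid values with no matching noid, duplicates included; a membership mask gives the same count. The toy values below are hypothetical:

```j
noid=: 10 20 30 40              NB. hypothetical stand-in for ORPnoid
ioid=: 20 99 99 30              NB. hypothetical stand-in for ORPioid
ioid -. noid                    NB. 99 99 : duplicates in ioid are kept
# ioid -. noid                  NB. 2
+/ -. ioid e. noid              NB. 2 : same count from a membership mask
```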

Re: [Jprogramming] Connecting the connections

2021-09-06 Thread Raul Miller
If you get 1 from either >./ NID #/.OID or >/. OID #/. NID then you can represent the connection matrix with a connection list, which would be about the same size as your OID or NID list. I hope this makes sense, -- Raul On Mon, Sep 6, 2021 at 3:53 AM Pablo Landherr wrote: > > Thank

Re: [Jprogramming] Connecting the connections

2021-09-06 Thread Pablo Landherr
Thank you for your clever solutions. Unfortunately, I underestimated the size of the data I'm processing. As nid and oid of my test data each have a tally of 3.16e7, the connection matrix of my data contains 1e15 bits, which naturally gives me a limit error. I'll have to chew through
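The arithmetic behind the limit error, as a quick J check (n is the approximate tally; as far as I know J stores boolean arrays at one byte per entry, so the real footprint is about 8 times the bit count):

```j
n=: 3.16e7          NB. approximate tally of nid and oid
cells=: *: n        NB. entries in an n-by-n connection matrix, just under 1e15
cells % 8           NB. bytes even if packed one bit per entry, about 1.25e14
```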

Re: [Jprogramming] Connecting the connections

2021-09-03 Thread Raul Miller
Your approach is more compact than mine, which would be a significant advantage for large collections. Also, I'd discard the 0 value before grouping (since 0 here is an artifact of the representation and not a part of the original data set). This gives:
   OID=: 1 9 6 2 10 7 3 11 4
   NID=: 2 10 7 3 1
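A sketch of that grouping step on the sample values (roots is a hypothetical name for the fixed point of the connection list). Keying slot 0 along with everything else produces an extra group holding just 0; dropping it with }. leaves only real labels.

```j
OID=: 1 9 6 2 10 7 3 11 4
NID=: 2 10 7 3 11 12 4 8 5
NB. roots: hypothetical name; each label mapped to the end of its chain
roots=: {~^:_ NID OID} i. 1 + >./ OID,NID
roots </. i. #roots            NB. first box holds only 0, the artifact
(}. roots) </. }. i. #roots    NB. 1 2 3 4 5 ; 6 7 12 ; 8 9 10 11
```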

Re: [Jprogramming] Connecting the connections

2021-09-03 Thread Michal Wallace
Oh, Raul's version using +. (or) on the connection matrix is way nicer than my version, and gets rid of my bug with directions. Do that instead. :)
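One way to read "using +. on the connection matrix": or the matrix with its transpose so that direction stops mattering, then square it under +./ .*. until it stops changing. This is a sketch on the sample values, not Raul's code (n, S, T and groups are hypothetical names). It is fine for small examples, but it needs one cell per pair of nodes, which is exactly what runs out of room on the full data.

```j
OID=: 1 9 6 2 10 7 3 11 4
NID=: 2 10 7 3 11 12 4 8 5
n=: >./ OID,NID
CM=: 1 (<"1 <: OID,.NID)} (n,n) $ 0   NB. directed links; node k is row/column k-1
S=: (CM +. |: CM) +. =/~ i. n         NB. ignore direction; every node reaches itself
T=: +./ .*.~^:_ S                     NB. square the boolean matrix until it stops changing
groups=: <@(>:@I.)"1 ~. T             NB. one box of 1-based labels per distinct row
groups                                NB. 1 2 3 4 5 ; 6 7 12 ; 8 9 10 11
```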

Re: [Jprogramming] Connecting the connections

2021-09-03 Thread Michal Wallace
You don't need both arrays to store this graph. You can combine them into one array that maps from an index to the "next" index.
   [ g =: nid oid } i. 13
0 2 3 4 5 5 7 12 8 10 11 8 12
If an index points to itself, there's no next index. Now you can take the "transitive closure" of this relation
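A sketch of that closure on the array above, assuming the links contain no cycles (with a cycle, {~^:_ may never settle on one root). {~ g is g{g, so each index jumps to its next index's next index; path lengths double with every application, and {~^:_ repeats until nothing changes. r is a hypothetical name.

```j
oid=: 1 9 6 2 10 7 3 11 4
nid=: 2 10 7 3 11 12 4 8 5
g=: nid oid } i. 13
{~ g                 NB. 0 3 4 5 5 5 12 12 8 11 8 8 12 : one doubling step
r=: {~^:_ g          NB. repeat until the result stops changing
r                    NB. 0 5 5 5 5 5 12 12 8 8 8 8 12 : each index's root
```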

Re: [Jprogramming] Connecting the connections

2021-09-03 Thread Raul Miller
I should note that your example connection matrix does not seem to match the oid, nid values you displayed.
   OID=: 1 9 6 2 10 7 3 11 4
   NID=: 2 10 7 3 11 12 4 8 5
Here's the connection matrix I see represented:
   ]CM=: 1 (<:OID,.NID)} 0$~,~>./OID,NID
0 1 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0
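A quick way to check such a matrix against the data: the positions of its 1s should be exactly the pairs <:OID,.NID. This sketch boxes each pair with <"1, the long-standing form for a scattered amend, in case an unboxed index table is read differently by other J versions. ones is a hypothetical name.

```j
OID=: 1 9 6 2 10 7 3 11 4
NID=: 2 10 7 3 11 12 4 8 5
CM=: 1 (<"1 <: OID,.NID)} 0$~,~>./OID,NID
$CM                          NB. 12 12
+/ , CM                      NB. 9 : one 1 per (OID,NID) pair
ones=: ($CM) #: I. , CM      NB. (row,column) of every 1, in row-major order
ones -: /:~ <: OID,.NID      NB. 1 : the 1s sit exactly at the shifted pairs
```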

[Jprogramming] Connecting the connections

2021-09-03 Thread Pablo Landherr
I want to group items that are linked to each other. I tried to use some kind of connection matrix:
   nid =/ oid NB. an example
0 0 0 1 0 0 0 0 0
0 0 0 0 1 0 0 0 0
0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
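What nid =/ oid computes, on the sample values Raul quotes later in the thread: entry (i,j) is 1 when link i ends at the node where link j starts, so rows and columns stand for links rather than nodes. M is a hypothetical name.

```j
oid=: 1 9 6 2 10 7 3 11 4
nid=: 2 10 7 3 11 12 4 8 5
M=: nid =/ oid          NB. M[i;j] is 1 when nid[i] = oid[j]
$M                      NB. 9 9 : one row and one column per link
($M) #: I. , M          NB. pairs (i,j) where link j continues link i
```

The pairs come out as 0 3, 1 4, 2 5, 3 6, 4 7 and 6 8; links 5, 7 and 8 end at nodes where no other link starts.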