Hmm... poking at this with your original data, I do see an issue --
the groups aren't quite long enough.
Also, I should note that my earlier suggestion of ndx {each ORPnoid
does not work -- there, 'each' should be replaced by 'L:0'
Anyways, the result is still useful, but to get the grouping tha
Execution time is going to depend on how big the groups are.
That said, I *do* make mistakes at times, and it's good practice to
test and manually verify against a small sample.
Anyways, good luck,
--
Raul
On Mon, Sep 6, 2021 at 12:13 PM Pablo Landherr wrote:
This appears to work! I'm extremely impressed. Truly awesome. Execution
time is unknown at this point as it's been running for a while without
completing yet. :-)
After all, there are literally billions of comparisons being done by i.
(Should have used a subset first to verify, but I was so keen
I should also note that gmail decided that part of the code was not
code but a part of the previous reply.
Here's a version which hopefully works around that issue:
cnGroupIndices=:{{
NB. x: ORPnoid
NB. y: ORPioid
assert. 1=>./x#/.y
valid=. y e. x
cns=. (1+x i.valid#y) (1+I.valid)} i.1+
Oh... I didn't think this through. Not all of these values are connected.
I should also verify an assumption: that connections occur when a
value in ORPnoid matches a value in ORPioid (and vice versa). If this
assumption is invalid, then this won't be a useful result.
Anyways, here's a fixed vers
Unfortunately, it fails. Given time I might realize the problem but I
suspect it's obvious to sharper minds than mine.
#x=: ORPnoid
31636439
#y=: ORPioid
31636439
1=>./x#/.y
1
#cns=. (1+x i.y) (1+i.#x)} i.1+#x NB. connections
31636440
r=: ( wrote:
Ok..
So we can use indices into ORPnoid to uniquely identify each relevant
connection, and then use Michal Wallace's algorithm. (This means that
in our connection list, we'll add 1 to each of those indexes, since
that's what we had working.)
In other words, I think that this should work:
cnGroup
Raul,
True to form, your approach might offer a way forward. To add some details
to the actual case:
#ORPnoid
31636439
#ORPioid
31636439
#ORPioid -. ORPnoid NB. some ioid do not connect to any other
563228
10{.ORPioid NB. the original data is hex in character form, but I convert it to symbol
If you get 1 from either
>./ NID #/.OID
or
>./ OID #/. NID
then you can represent the connection matrix with a connection list,
which would be about the same size as your OID or NID list.
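As a quick illustration, here is a sketch of that check on the small sample used elsewhere in this thread (not the real ORP data). The names maxNID and maxOID are only for illustration.

```j
NB. Small sample from this thread, not the real data.
OID=: 1 9 6 2 10 7 3 11 4
NID=: 2 10 7 3 11 12 4 8 5

NB. x #/. y counts the items of y that share each distinct value of x.
NB. A maximum count of 1 means no value of x is repeated.
maxNID=: >./ NID #/. OID
maxOID=: >./ OID #/. NID
```

Both results are 1 for this sample, so each id appears at most once on either side and a connection list is enough.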
I hope this makes sense,
--
Raul
On Mon, Sep 6, 2021 at 3:53 AM Pablo Landherr wrote:
Thank you for your clever solutions. Unfortunately I underestimated the
problem of the size of the data I'm processing. As nid and oid of my test
data each have a tally of 3.16e7, the connection matrix of my data contains
1e15 bits, which naturally gives me a limit error. I'll have to chew through
Your approach is more compact than mine, which would be a significant
advantage for large collections.
Also, I'd discard the 0 value before grouping (since 0 here is an
artifact of the representation and not a part of the original data
set).
This gives:
OID=: 1 9 6 2 10 7 3 11 4
NID=: 2 10 7 3 1
Oh, Raul's version using +. (or) on the connection matrix is way nicer than
my version, and gets rid of my bug with directions. Do that instead. :)
On Fri, Sep 3, 2021 at 12:31 PM Raul Miller wrote:
> I should note that your example connection matrix does not seem to
> match the oid, nid values
You don't need both arrays to store this graph.
You can combine them into one array that maps from an index to the "next"
index.
[ g =: nid oid } i. 13
0 2 3 4 5 5 7 12 8 10 11 8 12
If an index points to itself, there's no next index.
Now you can take the "transitive closure" of this relation
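One possible way to take that closure, as a sketch only and not necessarily the code from the original message: keep following the "next" links from every index until nothing changes, then group the indices by where they end up. The names roots and groups are made up for this example, and it assumes the links contain no cycles.

```j
oid=: 1 9 6 2 10 7 3 11 4
nid=: 2 10 7 3 11 12 4 8 5
g=: nid oid } i. 13          NB. same g as in the message above

NB. Follow the "next" links from every index until nothing changes.
NB. Assumes no cycles; with a cycle the iteration would never settle.
roots=: {&g^:_ i. #g

NB. Box together the indices that share a final index, then drop the
NB. first box, which holds index 0 (not one of the original ids).
groups=: }. roots </. i. #g
```

For this sample, roots is 0 5 5 5 5 5 12 12 8 8 8 8 12, and groups holds three boxes: 1 2 3 4 5, then 6 7 12, then 8 9 10 11.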
I should note that your example connection matrix does not seem to
match the oid, nid values you displayed.
OID=: 1 9 6 2 10 7 3 11 4
NID=: 2 10 7 3 11 12 4 8 5
Here's the connection matrix I see represented:
]CM=: 1 (<"1<:OID,.NID)} 0$~,~>./OID,NID
0 1 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0
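For small samples, a connection matrix like this can be turned into groups directly. The following is a sketch under my own assumptions rather than code from the thread: it ORs the matrix with its transpose and the identity, closes it transitively with a boolean matrix product, and then treats identical rows as one group. The names n, R and groups are illustrative only.

```j
OID=: 1 9 6 2 10 7 3 11 4
NID=: 2 10 7 3 11 12 4 8 5
n=: >./ OID,NID
CM=: 1 (<"1 <: OID,.NID)} (n,n) $ 0      NB. one 1 per (OID,NID) pair

NB. Make the relation symmetric and reflexive, then keep OR-ing in the
NB. boolean matrix product until it stops changing (transitive closure).
R=: (+. +./ .*.~)^:_ (= i. n) +. CM +. |: CM

NB. Identical rows of R belong to one group; add 1 to get ids back.
groups=: >:&.> <@I."1 ~. R
```

On this sample, groups holds three boxes: 1 2 3 4 5, then 6 7 12, then 8 9 10 11. The matrix needs n*n cells, so this is only practical for verifying results on a small subset.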
I want to group items that are linked to each other. I tried to use some
kind of connection matrix
nid =/ oid NB. an example
0 0 0 1 0 0 0 0 0
0 0 0 0 1 0 0 0 0
0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0