GitHub user ep9io edited a comment on the discussion: Further LLM Support
@usbrandon To expand on the cosine similarity idea: could it fit into HOP's
existing calculator step? That step already has other distance
calculations, such as Hamming and Levenshtein. Cosine similarity could be
added to that list, so it might not need to be created as a separate plugin.
Below is pseudocode for the function. Of course, it might not run as fast
as a database with a tuned implementation, or as native code in C++/Rust.
```
FUNCTION cosine_similarity(vector1, vector2):
    dot_product = 0
    magnitude1 = 0
    magnitude2 = 0
    FOR i FROM 0 TO LENGTH(vector1) - 1:
        dot_product += vector1[i] * vector2[i]
        magnitude1 += vector1[i]^2
        magnitude2 += vector2[i]^2
    magnitude1 = SQRT(magnitude1)
    magnitude2 = SQRT(magnitude2)
    IF magnitude1 == 0 OR magnitude2 == 0:
        RETURN 0  # Avoid division by zero
    RETURN dot_product / (magnitude1 * magnitude2)
```
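The pseudocode above translates fairly directly into Java, the language HOP
is written in. This is only a minimal sketch: the class and method names are
made up for illustration and are not part of HOP, and the length check is an
extra assumption that the pseudocode does not have.

```java
/** Hypothetical helper, not an existing HOP class. */
final class CosineSimilarity {
    private CosineSimilarity() {}

    /**
     * Cosine similarity of two equal-length vectors.
     * Returns 0 when either vector has zero magnitude, as in the pseudocode.
     */
    static double cosineSimilarity(double[] a, double[] b) {
        // Assumption: mismatched lengths are treated as a caller error.
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double sumSqA = 0.0;
        double sumSqB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            sumSqA += a[i] * a[i];
            sumSqB += b[i] * b[i];
        }
        if (sumSqA == 0.0 || sumSqB == 0.0) {
            return 0.0; // Avoid division by zero
        }
        return dot / (Math.sqrt(sumSqA) * Math.sqrt(sumSqB));
    }
}
```

Calling `CosineSimilarity.cosineSimilarity(a, b)` on two `double[]` values
gives a result between -1 and 1, up to floating-point rounding.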
Collecting the similarities would then look like the pseudocode below. I am
not sure where this fits into HOP's existing features. All of it could be
done within the JavaScript/Groovy step, but that is not ideal.
```
# Store similarities
similarities = []
FOR data_point IN data_points:
    similarity = cosine_similarity(query, data_point)
    ADD similarity TO similarities

# Find the most similar
max_similarity = -INFINITY  # a similarity can be exactly -1
most_similar_index = -1
FOR i FROM 0 TO LENGTH(similarities) - 1:
    IF similarities[i] > max_similarity:
        max_similarity = similarities[i]
        most_similar_index = i
```
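In Java, the two loops above can be folded into a single pass that tracks
the best index so far. Again, this is only a sketch under assumptions: the
class and method names are hypothetical, the vectors are assumed to arrive
as `double[]` values of equal length, and -1 is returned for an empty list.

```java
import java.util.List;

/** Hypothetical helper, not an existing HOP class. */
final class NearestVector {
    private NearestVector() {}

    /** Cosine similarity; returns 0 when either vector has zero magnitude. */
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0.0;
        double sumSqA = 0.0;
        double sumSqB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            sumSqA += a[i] * a[i];
            sumSqB += b[i] * b[i];
        }
        if (sumSqA == 0.0 || sumSqB == 0.0) {
            return 0.0; // Avoid division by zero
        }
        return dot / (Math.sqrt(sumSqA) * Math.sqrt(sumSqB));
    }

    /** Index of the data point most similar to the query, or -1 if there are none. */
    static int mostSimilarIndex(double[] query, List<double[]> dataPoints) {
        double maxSimilarity = Double.NEGATIVE_INFINITY;
        int mostSimilarIndex = -1;
        for (int i = 0; i < dataPoints.size(); i++) {
            double similarity = cosineSimilarity(query, dataPoints.get(i));
            // Strict comparison keeps the first index when similarities tie.
            if (similarity > maxSimilarity) {
                maxSimilarity = similarity;
                mostSimilarIndex = i;
            }
        }
        return mostSimilarIndex;
    }
}
```

This is a linear scan over every data point, so it is probably only
reasonable for modest data sizes; larger collections are where a dedicated
vector database would still have the edge.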
Perhaps Java's Vector API (if enabled) could make this practical within
HOP, reducing the reliance on external software.
GitHub link:
https://github.com/apache/hop/discussions/4732#discussioncomment-11719002