GitHub user ep9io added a comment to the discussion: Further LLM Support

@usbrandon Just to expand on the cosine similarity idea: could it fit into 
Hop's existing calculator step?  It already has other distance calculations in 
there, such as Hamming and Levenshtein.  Cosine similarity could be added to 
that list, so it might not need a separate plugin.  

Below is pseudo-code for the function.  Of course, it might not run as fast as 
a database with a better-optimised implementation, or one written in C++/Rust.  

```
FUNCTION cosine_similarity(vector1, vector2):
    dot_product = 0
    magnitude1 = 0
    magnitude2 = 0
    
    FOR i FROM 0 TO LENGTH(vector1) - 1:
        dot_product += vector1[i] * vector2[i]
        magnitude1 += vector1[i]^2
        magnitude2 += vector2[i]^2
    
    magnitude1 = SQRT(magnitude1)
    magnitude2 = SQRT(magnitude2)
    
    IF magnitude1 == 0 OR magnitude2 == 0:
        RETURN 0  # Avoid division by zero
    
    RETURN dot_product / (magnitude1 * magnitude2)
```
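
For anyone who wants to try the function outside Hop first, here is a minimal 
runnable sketch in Python.  It is only an illustration, not Hop code (a 
calculator implementation would be written in Java), and the length check is 
an extra assumption that the pseudo-code does not make.

```python
import math
from typing import Sequence


def cosine_similarity(vector1: Sequence[float], vector2: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    # Assumption (not in the pseudo-code): reject vectors of different lengths.
    if len(vector1) != len(vector2):
        raise ValueError("vectors must have the same length")

    dot_product = 0.0
    sum_sq1 = 0.0
    sum_sq2 = 0.0
    for a, b in zip(vector1, vector2):
        dot_product += a * b
        sum_sq1 += a * a
        sum_sq2 += b * b

    magnitude1 = math.sqrt(sum_sq1)
    magnitude2 = math.sqrt(sum_sq2)

    # A zero vector has no direction; return 0 to avoid division by zero.
    if magnitude1 == 0 or magnitude2 == 0:
        return 0.0

    return dot_product / (magnitude1 * magnitude2)
```

Perpendicular vectors such as `[1, 0]` and `[0, 1]` give 0, vectors pointing 
the same way give a value close to 1, and opposite vectors give a value close 
to -1.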

The similarity search over a collection would then look like the pseudo-code 
below.  I'm not sure where this fits into Hop's existing features.  All of it 
can be done within the JavaScript/Groovy step, but that is not ideal.

```
# Store similarities
similarities = []

FOR data_point IN data_points:
    similarity = cosine_similarity(query, data_point)
    ADD similarity TO similarities

# Find the most similar
max_similarity = -INFINITY  # cosine similarity can be exactly -1
most_similar_index = -1

FOR i FROM 0 TO LENGTH(similarities) - 1:
    IF similarities[i] > max_similarity:
        max_similarity = similarities[i]
        most_similar_index = i
```
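
The search itself can be sketched the same way in Python.  Again this is only 
an illustration under assumptions: `most_similar` is a hypothetical helper 
name, the sample vectors are made up, and the loop keeps a running best match 
instead of storing every similarity first.  The running maximum starts at 
negative infinity, so a vector pointing exactly opposite the query 
(similarity -1) can still be selected.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Compact version of the function from the first pseudo-code block."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # Avoid division by zero
    return dot / (norm1 * norm2)


def most_similar(query: Sequence[float],
                 data_points: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return (index, similarity) of the best match, or (-1, -inf) if empty."""
    best_index = -1
    best_similarity = float("-inf")
    for i, data_point in enumerate(data_points):
        similarity = cosine_similarity(query, data_point)
        if similarity > best_similarity:
            best_similarity = similarity
            best_index = i
    return best_index, best_similarity


# Hypothetical 2-dimensional "embeddings", made up for illustration.
query = [1.0, 0.0]
data_points = [[0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]]
index, similarity = most_similar(query, data_points)
```

With these sample values the second data point (index 1) wins, because it 
points in almost the same direction as the query.  This is a linear scan over 
every data point, so it is only practical for modest collection sizes.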



GitHub link: 
https://github.com/apache/hop/discussions/4732#discussioncomment-11719002
