jpountz commented on a change in pull request #416:
URL: https://github.com/apache/lucene/pull/416#discussion_r782157204



##########
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsFormat.java
##########
@@ -54,13 +63,19 @@
  *   <li><b>[int32]</b> vector similarity function ordinal
  *   <li><b>[vlong]</b> offset to this field's vectors in the .vec file
  *   <li><b>[vlong]</b> length of this field's vectors, in bytes
- *   <li><b>[vlong]</b> offset to this field's index in the .vex file
+ *   <li><b>[vlong]</b> to this field's index in the .vex file

Review comment:
       why did you remove `offset`?

##########
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsFormat.java
##########
@@ -35,14 +35,23 @@
  * <p>This file stores all the floating-point vector data ordered by field, 
document ordinal, and
  * vector dimension. The floats are stored in little-endian byte order.
  *
- * <h2>.vex (vector index) file</h2>
+ * <h2>.vex (vector index)</h2>
  *
- * <p>Stores graphs connecting the documents for each field. For each document 
having a vector for a
- * given field, this is stored as:
+ * <p>Stores graphs connecting the documents for each field organized as a 
list of nodes' neighbours
+ * as following:
  *
  * <ul>
- *   <li><b>[int32]</b> the number of neighbor nodes
- *   <li><b>array[vint]</b> the neighbor ordinals, delta-encoded (initially 
subtracting -1)
+ *   <li>For each level:
+ *       <ul>
+ *         <li>For each node:
+ *             <ul>
+ *               <li><b>[int32]</b> the number of neighbor nodes
+ *               <li><b>array[int32]</b> the neighbor ordinals
+ *               <li><b>array[int32]</b> padding from empty integers if the 
number of neigbors less

Review comment:
       ```suggestion
    *               <li><b>array[int32]</b> padding from empty integers if the 
number of neighbors less
   ```

##########
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsReader.java
##########
@@ -464,30 +477,45 @@ private void readValue(int targetOrd) throws IOException {
   /** Read the nearest-neighbors graph from the index input */
   private static final class IndexedKnnGraphReader extends KnnGraphValues {
 
-    final FieldEntry entry;
     final IndexInput dataIn;
+    final int[][] nodesByLevel;
+    final long[] graphOffsetsByLevel;
+    final int numLevels;
+    final int entryNode;
+    final int size;
+    final long bytesForConns;
 
     int arcCount;
     int arcUpTo;
     int arc;
 
     IndexedKnnGraphReader(FieldEntry entry, IndexInput dataIn) {
-      this.entry = entry;
       this.dataIn = dataIn;
+      this.nodesByLevel = entry.nodesByLevel;
+      this.numLevels = entry.numLevels;
+      this.entryNode = numLevels > 1 ? nodesByLevel[numLevels - 1][0] : 0;
+      this.size = entry.size();
+      this.graphOffsetsByLevel = entry.graphOffsetsByLevel;
+      this.bytesForConns = ((long) entry.maxConn + 1) * 4;
     }
 
     @Override
-    public void seek(int targetOrd) throws IOException {
+    public void seek(int level, int targetOrd) throws IOException {
+      int targetIndex =
+          level == 0
+              ? targetOrd
+              : Arrays.binarySearch(nodesByLevel[level], 0, 
nodesByLevel[level].length, targetOrd);
+      long graphDataOffset = graphOffsetsByLevel[level] + targetIndex * 
bytesForConns;

Review comment:
       maybe check that `targetIndex` is actually positive since binarySearch 
is allowed to return negative results for missing values

##########
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsReader.java
##########
@@ -464,30 +477,45 @@ private void readValue(int targetOrd) throws IOException {
   /** Read the nearest-neighbors graph from the index input */
   private static final class IndexedKnnGraphReader extends KnnGraphValues {
 
-    final FieldEntry entry;
     final IndexInput dataIn;
+    final int[][] nodesByLevel;
+    final long[] graphOffsetsByLevel;
+    final int numLevels;
+    final int entryNode;
+    final int size;
+    final long bytesForConns;
 
     int arcCount;
     int arcUpTo;
     int arc;
 
     IndexedKnnGraphReader(FieldEntry entry, IndexInput dataIn) {
-      this.entry = entry;
       this.dataIn = dataIn;
+      this.nodesByLevel = entry.nodesByLevel;
+      this.numLevels = entry.numLevels;
+      this.entryNode = numLevels > 1 ? nodesByLevel[numLevels - 1][0] : 0;
+      this.size = entry.size();
+      this.graphOffsetsByLevel = entry.graphOffsetsByLevel;
+      this.bytesForConns = ((long) entry.maxConn + 1) * 4;

Review comment:
       Maybe use `Integer.BYTES` rathen than `4`, it would make it clearer 
where this 4x factor is coming from?

##########
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorsReader.java
##########
@@ -147,7 +137,6 @@ private static IndexInput openDataInput(
               + versionVectorData,
           in);
     }
-    checksumRef[0] = CodecUtil.retrieveChecksum(in);

Review comment:
       Even if we don't do anything with the checksum, we should keep calling 
`retrieveChecksum`, which helps make sure that the codec footer has the correct 
structure. See this bit from `Lucene90CompressingStoredFieldsReader` for 
instance that does this:
   
   ```java
         // NOTE: data file is too costly to verify checksum against all the 
bytes on open,
         // but for now we at least verify proper structure of the checksum 
footer: which looks
         // for FOOTER_MAGIC + algorithmID. This is cheap and can detect some 
forms of corruption
         // such as file truncation.
         CodecUtil.retrieveChecksum(fieldsStream);
   ```

##########
File path: lucene/core/src/java/org/apache/lucene/index/KnnGraphValues.java
##########
@@ -32,25 +33,41 @@
   protected KnnGraphValues() {}
 
   /**
-   * Move the pointer to exactly {@code target}, the id of a node in the 
graph. After this method
+   * Move the pointer to exactly the given {@code level}'s {@code target}. 
After this method
    * returns, call {@link #nextNeighbor()} to return successive (ordered) 
connected node ordinals.
    *
-   * @param target must be a valid node in the graph, ie. &ge; 0 and &lt; 
{@link
+   * @param level level of the graph
+   * @param target ordinal of a node in the graph, must be &ge; 0 and &lt; 
{@link
    *     VectorValues#size()}.
    */
-  public abstract void seek(int target) throws IOException;
+  public abstract void seek(int level, int target) throws IOException;
 
   /** Returns the number of nodes in the graph */
   public abstract int size();
 
   /**
    * Iterates over the neighbor list. It is illegal to call this method after 
it returns
-   * NO_MORE_DOCS without calling {@link #seek(int)}, which resets the 
iterator.
+   * NO_MORE_DOCS without calling {@link #seek(int, int)}, which resets the 
iterator.
    *
    * @return a node ordinal in the graph, or NO_MORE_DOCS if the iteration is 
complete.
    */
   public abstract int nextNeighbor() throws IOException;
 
+  /** Returns the number of levels of the graph */
+  public abstract int numLevels() throws IOException;
+
+  /** Returns graph's entry point on the top level * */
+  public abstract int entryNode() throws IOException;
+
+  /**
+   * Get all nodes on a given level as node 0th ordinals
+   *
+   * @param level level for which to get all nodes
+   * @return an iterator over nodes where {@code nextDoc} returns a next node 
ordinal
+   */
+  // TODO: return a more suitable iterator over nodes than DocIdSetIterator

Review comment:
       Would `IntStream` be a good fit? Or `PrimitiveIterator.OfInt`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to