Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-03-13 Thread via GitHub


alessandrobenedetti commented on PR #3151:
URL: https://github.com/apache/solr/pull/3151#issuecomment-2721866606

   Merged! Thanks, everybody, for the help!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-03-13 Thread via GitHub


alessandrobenedetti merged PR #3151:
URL: https://github.com/apache/solr/pull/3151





Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-03-11 Thread via GitHub


alessandrobenedetti commented on PR #3151:
URL: https://github.com/apache/solr/pull/3151#issuecomment-2714076350

   OK, there have been no updates, comments or help in the last three weeks,
so at the end of the week I'll proceed with fixing the tests and merging. Any
help with the test cleanup is still welcome!





Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-20 Thread via GitHub


alessandrobenedetti commented on PR #3151:
URL: https://github.com/apache/solr/pull/3151#issuecomment-2672286942

   I've done another round of cleanup, documentation and improvements.
   
   There's still the annoying problem with the beforeClass/afterClass and test
leftovers.
   I'll be back in a couple of weeks, feel free to contribute any suggestions!
   
   I believe we are quite close to finalising this contribution, thanks for all
the help!





Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-20 Thread via GitHub


alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1964100880


##
solr/modules/llm/src/test/org/apache/solr/llm/textvectorisation/update/processor/TextToVectorUpdateProcessorFactoryTest.java:
##
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.llm.textvectorisation.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.MultiMapSolrParams;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.llm.TestLlmBase;
+import org.apache.solr.request.SolrQueryRequestBase;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.util.HashMap;
+import java.util.Map;
+
+
+public class TextToVectorUpdateProcessorFactoryTest extends TestLlmBase {
+  private TextToVectorUpdateProcessorFactory factoryToTest =
+  new TextToVectorUpdateProcessorFactory();
+  private NamedList args = new NamedList<>();
+  
+  @BeforeClass
+  public static void initArgs() throws Exception {
+setupTest("solrconfig-llm.xml", "schema.xml", false, false);
+  }
+
+  @AfterClass
+  public static void after() throws Exception {
+afterTest();
+  }
+
+  @Test
+  public void init_fullArgs_shouldInitAllParams() {
+args.add("inputField", "_text_");
+args.add("outputField", "vector");
+args.add("model", "model1");
+factoryToTest.init(args);
+
+assertEquals("_text_", factoryToTest.getInputField());
+assertEquals("vector", factoryToTest.getOutputField());
+assertEquals("model1", factoryToTest.getModelName());
+  }
+
+  @Test
+  public void init_nullInputField_shouldThrowExceptionWithDetailedMessage() {
+args.add("outputField", "vector");
+args.add("model", "model1");
+
+SolrException e = assertThrows(SolrException.class, () -> factoryToTest.init(args));
+assertEquals("Missing required parameter: inputField", e.getMessage());
+  }
+
+  @Test
+  public void init_nullOutputField_shouldThrowExceptionWithDetailedMessage() {
+args.add("inputField", "_text_");
+args.add("model", "model1");
+
+SolrException e = assertThrows(SolrException.class, () -> factoryToTest.init(args));
+assertEquals("Missing required parameter: outputField", e.getMessage());
+  }
+  
+  @Test
+  public void init_nullModel_shouldThrowExceptionWithDetailedMessage() {
+args.add("inputField", "_text_");
+args.add("outputField", "vector");
+
+SolrException e = assertThrows(SolrException.class, () -> factoryToTest.init(args));
+assertEquals("Missing required parameter: model", e.getMessage());
+  }
+
+  /* Following tests depends on a real solr schema */
+  @Test
+  public void init_notExistentOutputField_shouldThrowExceptionWithDetailedMessage() throws Exception {
+args.add("inputField", "_text_");
+args.add("outputField", "notExistentOutput");
+args.add("model", "model1");
+
+Map params = new HashMap<>();
+MultiMapSolrParams mmparams = new MultiMapSolrParams(params);

Review Comment:
   Thanks! Just pushed it!
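
   For readers following the quoted test: it expects `init` to fail with a message of the form "Missing required parameter: <name>" when an argument is absent. The factory quoted elsewhere in this thread gets that from `params.required()`. The sketch below is only a plain-Java stand-in for that check, not the Solr implementation; the class `RequiredParamsSketch`, the method `required` and the use of `IllegalArgumentException` are assumptions made for illustration.

```java
import java.util.Map;

/** Hypothetical sketch of a required-parameter check (not Solr's RequiredSolrParams). */
public class RequiredParamsSketch {

  /**
   * Returns the value stored under {@code name}, or fails with a message
   * shaped like the one the quoted tests assert on.
   */
  public static String required(Map<String, String> args, String name) {
    String value = args.get(name);
    if (value == null) {
      // Assumption: a plain IllegalArgumentException stands in for SolrException here.
      throw new IllegalArgumentException("Missing required parameter: " + name);
    }
    return value;
  }
}
```

   Failing at initialisation time keeps a misconfigured processor chain from being discovered only when the first document is indexed.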






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-20 Thread via GitHub


alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1964094479


##
solr/modules/llm/src/test/org/apache/solr/llm/textvectorisation/update/processor/TextToVectorUpdateProcessorFactoryTest.java:
##
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.llm.textvectorisation.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.MultiMapSolrParams;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.llm.TestLlmBase;
+import org.apache.solr.request.SolrQueryRequestBase;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.util.HashMap;
+import java.util.Map;
+
+
+public class TextToVectorUpdateProcessorFactoryTest extends TestLlmBase {
+  private TextToVectorUpdateProcessorFactory factoryToTest =
+  new TextToVectorUpdateProcessorFactory();
+  private NamedList args = new NamedList<>();

Review Comment:
   Agreed and pushed the changes!






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-20 Thread via GitHub


alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1964064479


##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessor.java:
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.llm.texttovector.model.SolrTextToVectorModel;
+import org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.List;
+
+
+class TextToVectorUpdateProcessor extends UpdateRequestProcessor {
+private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+private final String inputField;
+private final String outputField;
+private final String model;
+private SolrTextToVectorModel textToVector;
+private ManagedTextToVectorModelStore modelStore = null;
+
+public TextToVectorUpdateProcessor(
+String inputField,
+String outputField,
+String model,
+SolrQueryRequest req,
+UpdateRequestProcessor next) {
+super(next);
+this.inputField = inputField;
+this.outputField = outputField;
+this.model = model;
+this.modelStore = ManagedTextToVectorModelStore.getManagedModelStore(req.getCore());
+}
+
+/**
+ * @param cmd the update command in input containing the Document to process
+ * @throws IOException If there is a low-level I/O error
+ */
+@Override
+public void processAdd(AddUpdateCommand cmd) throws IOException {
+this.textToVector = modelStore.getModel(model);
+if (textToVector == null) {
+throw new SolrException(
+SolrException.ErrorCode.BAD_REQUEST,
+"The model requested '"
++ model
++ "' can't be found in the store: "
++ ManagedTextToVectorModelStore.REST_END_POINT);
+}
+
+SolrInputDocument doc = cmd.getSolrInputDocument();
+SolrInputField inputFieldContent = doc.get(inputField);
+if (!isNullOrEmpty(inputFieldContent, doc, inputField)) {
+String textToVectorise = inputFieldContent.getValue().toString();//add null checks and
+float[] vector = textToVector.vectorise(textToVectorise);
+List vectorAsList = new ArrayList(vector.length);
+for (float f : vector) {
+vectorAsList.add(f);
+}
+doc.addField(outputField, vectorAsList);
+}
+super.processAdd(cmd);
+}
+
+protected boolean isNullOrEmpty(SolrInputField inputFieldContent, SolrInputDocument doc, String fieldName) {

Review Comment:
   Addressed; resolving this. Feel free to open a new comment if necessary.
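
   As a rough illustration of the two steps visible in the quoted `processAdd` (guard against a missing or empty input, then box the `float[]` into a list before adding it to the document), here is a minimal plain-Java sketch. It does not use Solr types; the class `VectorFieldSketch` and both method signatures are assumptions, and the body of the real `isNullOrEmpty` is not shown in this message.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the guard and conversion steps, without Solr types. */
public class VectorFieldSketch {

  /** True when there is no usable text to vectorise (null value or empty string). */
  public static boolean isNullOrEmpty(Object fieldValue) {
    return fieldValue == null || fieldValue.toString().isEmpty();
  }

  /** Boxes a primitive float array into a List<Float>, preserving order. */
  public static List<Float> toList(float[] vector) {
    List<Float> boxed = new ArrayList<>(vector.length);
    for (float f : vector) {
      boxed.add(f);
    }
    return boxed;
  }
}
```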






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-20 Thread via GitHub


alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1964083376


##
solr/modules/llm/src/test/org/apache/solr/llm/textvectorisation/search/TextToVectorQParserTest.java:
##
@@ -29,6 +30,11 @@ public static void init() throws Exception {
 loadModel("dummy-model.json");
   }
 
+  @AfterClass
+  public static void cleanup() throws Exception {
+afterTest();
+  }

Review Comment:
   That's the part that is giving me a headache and nasty leftovers in tests
(as detailed in the initial comment).
   
   The reason for that was to index once before all the tests, run the tests
and then do the cleanup at the end.






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-20 Thread via GitHub


alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1964081310


##
solr/modules/llm/src/java/org/apache/solr/llm/textvectorisation/update/processor/TextToVectorUpdateProcessorFactory.java:
##
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.textvectorisation.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.RequiredSolrParams;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.llm.textvectorisation.model.SolrTextToVectorModel;
+import org.apache.solr.llm.textvectorisation.store.rest.ManagedTextToVectorModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.schema.DenseVectorField;
+import org.apache.solr.schema.FieldType;
+import org.apache.solr.schema.IndexSchema;
+import org.apache.solr.schema.SchemaField;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
+
+/**
+ * This class implements an UpdateProcessorFactory for the Text To Vector Update Processor.
+ */
+public class TextToVectorUpdateProcessorFactory extends UpdateRequestProcessorFactory {
+private static final String INPUT_FIELD_PARAM = "inputField";
+private static final String OUTPUT_FIELD_PARAM = "outputField";
+private static final String MODEL_NAME = "model";
+
+private String inputField;
+private String outputField;
+private String modelName;
+private SolrParams params;
+
+
+@Override
+public void init(final NamedList args) {
+params = args.toSolrParams();
+RequiredSolrParams required = params.required();
+inputField = required.get(INPUT_FIELD_PARAM);
+outputField = required.get(OUTPUT_FIELD_PARAM);
+modelName = required.get(MODEL_NAME);
+}
+
+@Override
+public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) {
+IndexSchema latestSchema = req.getCore().getLatestSchema();
+if(!latestSchema.hasExplicitField(inputField)){
+throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "undefined field: \"" + inputField + "\"");
+}
+if(!latestSchema.hasExplicitField(outputField)){
+throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "undefined field: \"" + outputField + "\"");
+}
+   
+final SchemaField outputFieldSchema = latestSchema.getField(outputField);
+assertIsDenseVectorField(outputFieldSchema);
+
+ManagedTextToVectorModelStore modelStore = ManagedTextToVectorModelStore.getManagedModelStore(req.getCore());
+SolrTextToVectorModel textToVector = modelStore.getModel(modelName);

Review Comment:
   I have the same assumption. I debugged it a few times and the managed
resource keeps track of models in a map, so the lookup should work as expected.
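
   To make the "models in a map" point concrete, here is a minimal sketch of a store whose lookup is a plain map read. This is not the `ManagedTextToVectorModelStore` code; the class `ModelStoreSketch`, the nested `Model` type and the choice of `ConcurrentHashMap` are all assumptions for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: a model store that keeps already-loaded models in a map. */
public class ModelStoreSketch {

  /** Hypothetical stand-in for a loaded text-to-vector model. */
  public static final class Model {
    public final String name;

    public Model(String name) {
      this.name = name;
    }
  }

  // Assumption: a concurrent map, since lookups may come from several update threads.
  private final Map<String, Model> models = new ConcurrentHashMap<>();

  /** Registers a model under its own name, replacing any previous entry. */
  public void addModel(Model model) {
    models.put(model.name, model);
  }

  /** Map read only, so no model is rebuilt per call; returns null when the name is unknown. */
  public Model getModel(String name) {
    return models.get(name);
  }
}
```

   With a store shaped like this, calling `getModel` once per `getInstance` costs a hash lookup and returns the same instance each time.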






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-06 Thread via GitHub


dsmiley commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r194470


##
solr/modules/llm/src/java/org/apache/solr/llm/textvectorisation/update/processor/TextToVectorUpdateProcessorFactory.java:
##
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.textvectorisation.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.RequiredSolrParams;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.llm.textvectorisation.model.SolrTextToVectorModel;
+import org.apache.solr.llm.textvectorisation.store.rest.ManagedTextToVectorModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.schema.DenseVectorField;
+import org.apache.solr.schema.FieldType;
+import org.apache.solr.schema.IndexSchema;
+import org.apache.solr.schema.SchemaField;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
+
+/**
+ * This class implements an UpdateProcessorFactory for the Text To Vector Update Processor.
+ */
+public class TextToVectorUpdateProcessorFactory extends UpdateRequestProcessorFactory {
+private static final String INPUT_FIELD_PARAM = "inputField";
+private static final String OUTPUT_FIELD_PARAM = "outputField";
+private static final String MODEL_NAME = "model";
+
+private String inputField;
+private String outputField;
+private String modelName;
+private SolrParams params;
+
+
+@Override
+public void init(final NamedList args) {
+params = args.toSolrParams();
+RequiredSolrParams required = params.required();
+inputField = required.get(INPUT_FIELD_PARAM);
+outputField = required.get(OUTPUT_FIELD_PARAM);
+modelName = required.get(MODEL_NAME);
+}
+
+@Override
+public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) {
+IndexSchema latestSchema = req.getCore().getLatestSchema();
+if(!latestSchema.hasExplicitField(inputField)){
+throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "undefined field: \"" + inputField + "\"");
+}
+if(!latestSchema.hasExplicitField(outputField)){
+throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "undefined field: \"" + outputField + "\"");
+}
+   
+final SchemaField outputFieldSchema = latestSchema.getField(outputField);
+assertIsDenseVectorField(outputFieldSchema);
+
+ManagedTextToVectorModelStore modelStore = ManagedTextToVectorModelStore.getManagedModelStore(req.getCore());
+SolrTextToVectorModel textToVector = modelStore.getModel(modelName);

Review Comment:
   I presume looking up the model is fast/cached if it already exists?



##
solr/modules/llm/src/test/org/apache/solr/llm/textvectorisation/search/TextToVectorQParserTest.java:
##
@@ -29,6 +30,11 @@ public static void init() throws Exception {
 loadModel("dummy-model.json");
   }
 
+  @AfterClass
+  public static void cleanup() throws Exception {
+afterTest();
+  }

Review Comment:
   Weird; why? It's odd to see a test cleanup also be a suite cleanup.
   (Same for TestModelManager.)



##
solr/modules/llm/src/test/org/apache/solr/llm/textvectorisation/update/processor/TextToVectorUpdateProcessorFactoryTest.java:
##
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed unde

Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-06 Thread via GitHub


alessandrobenedetti commented on PR #3151:
URL: https://github.com/apache/solr/pull/3151#issuecomment-2639773032

   Added one iteration of polishing, which should have addressed @cpoerschke's
concerns on vectorisation failures (I took inspiration from the lang detect
update processor).
   
   I also removed the additional test solr config files, addressing @dsmiley's
concerns.
   
   I'm still puzzled by the testing errors I get (the before/after problems I
mentioned in the first comment); any help there would be beneficial.
   
   I think we made progress. I'll wait for a few more iterations and then work
on the documentation.
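
   For context on what handling a vectorisation failure can look like, one possible policy is to log the problem and let the document through without a vector, in the spirit of the `shouldLogAndIndexWithNoVector` test name quoted elsewhere in this thread. The sketch below only illustrates that policy in plain Java; it is not the code in the PR, and the class `VectorisationFailureSketch`, the method `tryVectorise` and the use of `java.util.logging` are assumptions.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.logging.Logger;

/** Hypothetical sketch of a log-and-continue policy for vectorisation failures. */
public class VectorisationFailureSketch {
  private static final Logger log =
      Logger.getLogger(VectorisationFailureSketch.class.getName());

  /**
   * Applies the model to the text. On a runtime failure the error is logged and
   * an empty Optional is returned, so the caller can index the document without
   * adding the vector field.
   */
  public static Optional<float[]> tryVectorise(Function<String, float[]> model, String text) {
    try {
      return Optional.ofNullable(model.apply(text));
    } catch (RuntimeException e) {
      log.warning("Could not vectorise the input, indexing without a vector: " + e.getMessage());
      return Optional.empty();
    }
  }
}
```

   The alternative is to fail the whole update; which of the two is preferable depends on whether a document without its vector is still useful to search.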





Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-05 Thread via GitHub


alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943637205


##
solr/modules/llm/src/test/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessorTest.java:
##
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.client.solrj.SolrQuery;
+import org.apache.solr.llm.TestLlmBase;
+import org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+
+public class TextToVectorUpdateProcessorTest extends TestLlmBase {
+
+@BeforeClass
+public static void init() throws Exception {
+setupTest("solrconfig-llm-indexing.xml", "schema.xml", false, false);
+
+}
+
+@Test
+public void processAdd_inputField_shouldVectoriseInputField()
+throws Exception {
+loadModel("dummy-model.json");
+assertU(adoc("id", "99", "_text_", "Vegeta is the saiyan prince."));
+assertU(adoc("id", "98", "_text_", "Vegeta is the saiyan prince."));
+assertU(commit());
+
+final String solrQuery = "*:*";
+final SolrQuery query = new SolrQuery();
+query.setQuery(solrQuery);
+query.add("fl", "id,vector");
+
+assertJQ(
+"/query" + query.toQueryString(),
+"/response/numFound==2]",
+"/response/docs/[0]/id=='99'",
+"/response/docs/[0]/vector==[1.0, 2.0, 3.0, 4.0]",
+"/response/docs/[1]/id=='98'",
+"/response/docs/[1]/vector==[1.0, 2.0, 3.0, 4.0]");
+
+restTestHarness.delete(ManagedTextToVectorModelStore.REST_END_POINT + "/dummy-1");
+}
+
+/*
+This test looks for the 'dummy-1' model, but such model is not loaded, the model store is empty, so the update fails
+ */
+@Test
+public void processAdd_modelNotFound_shouldRaiseException() {
+assertFailedU("This update should fail but actually succeeded", adoc("id", "99", "_text_", "Vegeta is the saiyan prince."));
+
+checkUpdateU(adoc("id", "99", "_text_", "Vegeta is the saiyan prince."),
+"/response/lst[@name='error']/str[@name='msg']=\"The model requested 'dummy-1' can't be found in the store: /schema/text-to-vector-model-store\"",
+"/response/lst[@name='error']/int[@name='code']='400'");
+}
+
+@Test
+public void processAdd_emptyInputField_shouldLogAndIndexWithNoVector() throws Exception {
+loadModel("dummy-model.json");
+assertU(adoc("id", "99", "_text_", ""));
+assertU(adoc("id", "98", "_text_", "Vegeta is the saiyan prince."));
+assertU(commit());
+
+final String solrQuery = "*:*";
+final SolrQuery query = new SolrQuery();
+query.setQuery(solrQuery);
+query.add("fl", "id,vector");
+
+assertJQ(
+"/query" + query.toQueryString(),
+"/response/numFound==2]",
+"/response/docs/[0]/id=='99'",
+"!/response/docs/[0]/vector==", //no vector field for the document 99

Review Comment:
   It took almost an afternoon to find that, so it deserved a comment :)






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-05 Thread via GitHub


alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943636345


##
solr/modules/llm/src/test/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessorTest.java:
##
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.client.solrj.SolrQuery;
+import org.apache.solr.llm.TestLlmBase;
+import org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+
+public class TextToVectorUpdateProcessorTest extends TestLlmBase {
+
+@BeforeClass
+public static void init() throws Exception {
+setupTest("solrconfig-llm-indexing.xml", "schema.xml", false, false);
+
+}
+
+@Test
+public void processAdd_inputField_shouldVectoriseInputField()
+throws Exception {
+loadModel("dummy-model.json");
+assertU(adoc("id", "99", "_text_", "Vegeta is the saiyan prince."));
+assertU(adoc("id", "98", "_text_", "Vegeta is the saiyan prince."));
+assertU(commit());
+
+final String solrQuery = "*:*";
+final SolrQuery query = new SolrQuery();
+query.setQuery(solrQuery);
+query.add("fl", "id,vector");
+
+assertJQ(
+"/query" + query.toQueryString(),
+"/response/numFound==2]",
+"/response/docs/[0]/id=='99'",
+"/response/docs/[0]/vector==[1.0, 2.0, 3.0, 4.0]",
+"/response/docs/[1]/id=='98'",
+"/response/docs/[1]/vector==[1.0, 2.0, 3.0, 4.0]");
+
+restTestHarness.delete(ManagedTextToVectorModelStore.REST_END_POINT + "/dummy-1");

Review Comment:
   It was cleanup, but not all tests need it, so I added it explicitly, with a
line comment to make that clearer.






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-05 Thread via GitHub


alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943633074


##
solr/modules/llm/src/test/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessorFactoryTest.java:
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.MultiMapSolrParams;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.llm.TestLlmBase;
+import org.apache.solr.request.SolrQueryRequestBase;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.util.HashMap;
+import java.util.Map;
+
+
+public class TextToVectorUpdateProcessorFactoryTest extends TestLlmBase {
+  private TextToVectorUpdateProcessorFactory factoryToTest =
+  new TextToVectorUpdateProcessorFactory();
+  private NamedList args = new NamedList<>();
+  
+  @BeforeClass
+  public static void initArgs() throws Exception {
+setupTest("solrconfig-llm.xml", "schema.xml", false, false);
+  }
+
+  @AfterClass
+  public static void after() throws Exception {
+afterTest();
+  }
+
+  @Test
+  public void init_fullArgs_shouldInitFullClassificationParams() {
+args.add("inputField", "_text_");
+args.add("outputField", "vector");
+args.add("model", "model1");
+factoryToTest.init(args);
+
+assertEquals("_text_", factoryToTest.getInputField());
+assertEquals("vector", factoryToTest.getOutputField());
+assertEquals("model1", factoryToTest.getModelName());
+  }
+
+  @Test
+  public void init_nullInputField_shouldThrowExceptionWithDetailedMessage() {
+args.add("outputField", "vector");
+args.add("model", "model1");
+
+SolrException e = assertThrows(SolrException.class, () -> factoryToTest.init(args));
+assertEquals("Text to Vector UpdateProcessor 'inputField' can not be null", e.getMessage());
+  }
+
+  @Test
+  public void init_notExistentInputField_shouldThrowExceptionWithDetailedMessage() throws Exception {
+args.add("inputField", "notExistentInput");
+args.add("outputField", "vector");
+args.add("model", "model1");
+
+Map params = new HashMap<>();
+MultiMapSolrParams mmparams = new MultiMapSolrParams(params);
+SolrQueryRequestBase req = new SolrQueryRequestBase(solrClientTestRule.getCoreContainer().getCore("collection1"), (SolrParams) mmparams) {};

Review Comment:
   I admit I don't know; I'm not that Java-savvy. I suspect it has to do with
instantiating a subclass of an abstract class?
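
   To illustrate the suspicion above: in Java, a trailing `{}` after a constructor call declares and instantiates an anonymous subclass, which is the usual way to obtain an instance of an abstract class that has no abstract methods left to implement. A minimal sketch follows; `RequestBase` and `newRequest` are hypothetical stand-ins for illustration, not Solr classes.

```java
class AnonymousSubclassDemo {
  // Hypothetical stand-in for an abstract request base class; this is not Solr code.
  public abstract static class RequestBase {
    private final String coreName;

    protected RequestBase(String coreName) {
      this.coreName = coreName;
    }

    public String getCoreName() {
      return coreName;
    }
  }

  public static RequestBase newRequest(String coreName) {
    // "new RequestBase(coreName)" on its own does not compile: the class is abstract.
    // The trailing "{}" declares an anonymous subclass with an empty body and
    // instantiates that subclass instead.
    return new RequestBase(coreName) {};
  }

  public static void main(String[] args) {
    RequestBase req = newRequest("collection1");
    System.out.println(req.getCoreName());
    System.out.println(req.getClass().isAnonymousClass());
  }
}
```

   If the base class in the test is indeed abstract, that would explain why the `{}` is needed there.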






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-05 Thread via GitHub


alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943628396


##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessor.java:
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.llm.texttovector.model.SolrTextToVectorModel;
+import org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.List;
+
+
+class TextToVectorUpdateProcessor extends UpdateRequestProcessor {
+private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+private final String inputField;
+private final String outputField;
+private final String model;
+private SolrTextToVectorModel textToVector;
+private ManagedTextToVectorModelStore modelStore = null;
+
+public TextToVectorUpdateProcessor(
+String inputField,
+String outputField,
+String model,
+SolrQueryRequest req,
+UpdateRequestProcessor next) {
+super(next);
+this.inputField = inputField;
+this.outputField = outputField;
+this.model = model;
+this.modelStore = ManagedTextToVectorModelStore.getManagedModelStore(req.getCore());
+}
+
+/**
+ * @param cmd the update command in input containing the Document to process
+ * @throws IOException If there is a low-level I/O error
+ */
+@Override
+public void processAdd(AddUpdateCommand cmd) throws IOException {
+this.textToVector = modelStore.getModel(model);
+if (textToVector == null) {
+throw new SolrException(
+SolrException.ErrorCode.BAD_REQUEST,
+"The model requested '"
++ model
++ "' can't be found in the store: "
++ ManagedTextToVectorModelStore.REST_END_POINT);
+}
+
+SolrInputDocument doc = cmd.getSolrInputDocument();
+SolrInputField inputFieldContent = doc.get(inputField);
+if (!isNullOrEmpty(inputFieldContent, doc, inputField)) {
+String textToVectorise = inputFieldContent.getValue().toString();//add null checks and
+float[] vector = textToVector.vectorise(textToVectorise);
+List vectorAsList = new ArrayList(vector.length);
+for (float f : vector) {
+vectorAsList.add(f);
+}
+doc.addField(outputField, vectorAsList);
+}
+super.processAdd(cmd);
+}
+
+protected boolean isNullOrEmpty(SolrInputField inputFieldContent, SolrInputDocument doc, String fieldName) {

Review Comment:
   Mmm, I see your point. Would it be better if we just log a warning saying
"vectorisation failed", with the reason "null or empty source field"?
   
   I suspect a silent failure would be equally problematic, because it would be
hard to understand why there are no vectors. (I also just discovered that the
'vectorise' method could throw a runtime exception.)
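
   A rough sketch of the "log a warning and skip" idea, using only the JDK so it runs standalone. Everything here is an assumption for illustration: `vectoriseIfPresent`, the `Function<String, float[]>` standing in for the model, and the use of `java.util.logging` instead of the SLF4J logger the processor actually uses.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.logging.Logger;

class NullOrEmptyFieldDemo {
  private static final Logger log = Logger.getLogger(NullOrEmptyFieldDemo.class.getName());

  /**
   * Returns the vector for the field value, or an empty Optional (after logging a
   * warning) when the source field is null or blank. The model is a hypothetical
   * stand-in, not the real SolrTextToVectorModel.
   */
  public static Optional<float[]> vectoriseIfPresent(
      String docId, String fieldName, Object fieldValue, Function<String, float[]> model) {
    if (fieldValue == null || fieldValue.toString().trim().isEmpty()) {
      log.warning(
          "Vectorisation failed for document '" + docId
              + "': null or empty source field '" + fieldName + "'");
      return Optional.empty();
    }
    // ofNullable guards against a model that returns null instead of a vector.
    return Optional.ofNullable(model.apply(fieldValue.toString()));
  }

  public static void main(String[] args) {
    Function<String, float[]> dummyModel = text -> new float[] {1.0f, 2.0f, 3.0f, 4.0f};
    System.out.println(vectoriseIfPresent("99", "_text_", "Vegeta", dummyModel).isPresent());
    System.out.println(vectoriseIfPresent("98", "_text_", "  ", dummyModel).isPresent());
  }
}
```

   With this shape the document is still indexed without a vector, and the log explains why.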






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-05 Thread via GitHub


alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943616272


##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessor.java:
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;

Review Comment:
   I agree, 'texttovector' is horribly unreadable; maybe 'textvectorisation'?
Adding it in the coming commit.






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-05 Thread via GitHub


alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943614530


##
solr/modules/llm/src/test-files/solr/collection1/conf/solrconfig-llm-indexing-notDenseVectorField.xml:
##


Review Comment:
   I could, but the reason I added it is that I struggled to find testing
methods, such as org.apache.solr.util.RestTestBase#assertU(java.lang.String),
that take the chain as a parameter.
   So I added it as the default so that I could test it.
   
   I would not want it to be the default when indexing docs for the query-time
test.
   If you have any suggestions, I'm open to changes.






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-05 Thread via GitHub


alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943608528


##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessorFactory.java:
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.schema.DenseVectorField;
+import org.apache.solr.schema.FieldType;
+import org.apache.solr.schema.SchemaField;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
+
+/**
+ * This class implements an UpdateProcessorFactory for the Text To Vector Update Processor.
+ */
+public class TextToVectorUpdateProcessorFactory extends UpdateRequestProcessorFactory {
+private static final String INPUT_FIELD_PARAM = "inputField";
+private static final String OUTPUT_FIELD_PARAM = "outputField";
+private static final String MODEL_NAME = "model";
+
+String inputField;
+String outputField;
+String modelName;
+SolrParams params;
+
+
+@Override
+public void init(final NamedList args) {
+if (args != null) {
+params = args.toSolrParams();
+inputField = params.get(INPUT_FIELD_PARAM);
+checkNotNull(INPUT_FIELD_PARAM, inputField);
+
+outputField = params.get(OUTPUT_FIELD_PARAM);
+checkNotNull(OUTPUT_FIELD_PARAM, outputField);
+
+modelName = params.get(MODEL_NAME);
+checkNotNull(MODEL_NAME, modelName);
+}
+}
+
+private void checkNotNull(String paramName, Object param) {
+if (param == null) {
+throw new SolrException(
+SolrException.ErrorCode.SERVER_ERROR,
+"Text to Vector UpdateProcessor '" + paramName + "' can not be null");
+}
+}
+
+@Override
+public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) {
+req.getCore().getLatestSchema().getField(inputField);

Review Comment:
   It checks that 'inputField' is defined in the schema.
   With the latest commit I changed it to make it more explicit, but I am open
to suggestions.
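
   One way to make such a check explicit is to name it and give it its own error message, instead of relying on the side effect of a lookup whose result is discarded. The sketch below is only an illustration of that shape: the schema is modelled as a plain `Set<String>` of field names and `requireFieldInSchema` is a hypothetical helper, not the code from the latest commit.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SchemaFieldCheckDemo {
  /**
   * Fails with a descriptive message when the configured field is not in the schema.
   * The Set of field names is a hypothetical stand-in for a real schema lookup.
   */
  public static void requireFieldInSchema(
      Set<String> schemaFieldNames, String paramName, String fieldName) {
    if (!schemaFieldNames.contains(fieldName)) {
      throw new IllegalArgumentException(
          "'" + paramName + "' refers to a field that is not defined in the schema: '"
              + fieldName + "'");
    }
  }

  public static void main(String[] args) {
    Set<String> schemaFieldNames = new HashSet<>(Arrays.asList("id", "_text_", "vector"));
    requireFieldInSchema(schemaFieldNames, "inputField", "_text_"); // passes silently
    try {
      requireFieldInSchema(schemaFieldNames, "inputField", "notExistentInput");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```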






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-05 Thread via GitHub


alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943595213


##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessorFactory.java:
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.schema.DenseVectorField;
+import org.apache.solr.schema.FieldType;
+import org.apache.solr.schema.SchemaField;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
+
+/**
+ * This class implements an UpdateProcessorFactory for the Text To Vector 
Update Processor.
+ */
+public class TextToVectorUpdateProcessorFactory extends 
UpdateRequestProcessorFactory {
+private static final String INPUT_FIELD_PARAM = "inputField";
+private static final String OUTPUT_FIELD_PARAM = "outputField";
+private static final String MODEL_NAME = "model";
+
+String inputField;
+String outputField;
+String modelName;
+SolrParams params;
+
+
+@Override
+public void init(final NamedList args) {
+if (args != null) {
+params = args.toSolrParams();
+inputField = params.get(INPUT_FIELD_PARAM);
+checkNotNull(INPUT_FIELD_PARAM, inputField);
+
+outputField = params.get(OUTPUT_FIELD_PARAM);
+checkNotNull(OUTPUT_FIELD_PARAM, outputField);
+
+modelName = params.get(MODEL_NAME);
+checkNotNull(MODEL_NAME, modelName);

Review Comment:
   much cleaner, thanks David!






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-05 Thread via GitHub


alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943577028


##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessorFactory.java:
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.schema.DenseVectorField;
+import org.apache.solr.schema.FieldType;
+import org.apache.solr.schema.SchemaField;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
+
+/**
+ * This class implements an UpdateProcessorFactory for the Text To Vector 
Update Processor.
+ */
+public class TextToVectorUpdateProcessorFactory extends 
UpdateRequestProcessorFactory {
+private static final String INPUT_FIELD_PARAM = "inputField";
+private static final String OUTPUT_FIELD_PARAM = "outputField";
+private static final String MODEL_NAME = "model";
+
+String inputField;
+String outputField;
+String modelName;
+SolrParams params;
+
+
+@Override
+public void init(final NamedList args) {
+if (args != null) {

Review Comment:
   My bad, I took inspiration from an old factory. I'll remove this useless
check in the next commit!






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-05 Thread via GitHub


alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943565203


##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessor.java:
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.llm.texttovector.model.SolrTextToVectorModel;
+import 
org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.List;
+
+
+class TextToVectorUpdateProcessor extends UpdateRequestProcessor {
+private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+private final String inputField;
+private final String outputField;
+private final String model;
+private SolrTextToVectorModel textToVector;
+private ManagedTextToVectorModelStore modelStore = null;
+
+public TextToVectorUpdateProcessor(
+String inputField,
+String outputField,
+String model,
+SolrQueryRequest req,
+UpdateRequestProcessor next) {
+super(next);
+this.inputField = inputField;
+this.outputField = outputField;
+this.model = model;
+this.modelStore = 
ManagedTextToVectorModelStore.getManagedModelStore(req.getCore());
+}
+
+/**
+ * @param cmd the update command in input containing the Document to 
process
+ * @throws IOException If there is a low-level I/O error
+ */
+@Override
+public void processAdd(AddUpdateCommand cmd) throws IOException {
+this.textToVector = modelStore.getModel(model);
+if (textToVector == null) {
+throw new SolrException(
+SolrException.ErrorCode.BAD_REQUEST,
+"The model requested '"
++ model
++ "' can't be found in the store: "
++ ManagedTextToVectorModelStore.REST_END_POINT);
+}
+
+SolrInputDocument doc = cmd.getSolrInputDocument();
+SolrInputField inputFieldContent = doc.get(inputField);
+if (!isNullOrEmpty(inputFieldContent, doc, inputField)) {
+String textToVectorise = 
inputFieldContent.getValue().toString();//add null checks and
+float[] vector = textToVector.vectorise(textToVectorise);

Review Comment:
   1) @cpoerschke : I double-checked, and the langchain4j library's 'embed' method
(which is used in our 'vectorise' method) can throw a RuntimeException.
   That's bad, as it could not be detected without investigating the internals of
the code (I hate these practices).
   I'll give it some thought; any suggestion is welcome!
   
   2) @epugh : given that 'update.chain' is a parameter, if you configure a
chain with no vector enrichment and a chain with vector enrichment, what
prevents you from first indexing with the 'no vectors' chain and then slowly
updating the index with atomic updates that add vectors (using the
vector chain)? We should double-check and add this to the documentation once we
consolidate the code. What do you think?
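
   On point 1, one possible direction (just a sketch, not a decision) is to catch the unchecked exception at the call site and rethrow it with the model name attached, so the failure is at least visible and attributable. The names below are hypothetical: `VectorisationException`, `vectoriseOrExplain`, and the `Function<String, float[]>` standing in for the model are not part of the PR.

```java
import java.util.function.Function;

class VectoriseFailureDemo {
  /** Hypothetical unchecked wrapper; the real processor might rethrow a SolrException instead. */
  public static class VectorisationException extends RuntimeException {
    public VectorisationException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Calls the model and converts any RuntimeException into one that names the model. */
  public static float[] vectoriseOrExplain(
      String modelName, String text, Function<String, float[]> model) {
    try {
      return model.apply(text);
    } catch (RuntimeException e) {
      throw new VectorisationException(
          "Model '" + modelName + "' failed to vectorise the input: " + e.getMessage(), e);
    }
  }

  public static void main(String[] args) {
    // A stand-in model that always fails, simulating an unreachable remote service.
    Function<String, float[]> failingModel =
        text -> {
          throw new IllegalStateException("service unavailable");
        };
    try {
      vectoriseOrExplain("dummy-1", "Vegeta is the saiyan prince.", failingModel);
    } catch (VectorisationException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

   Whether the document should then be rejected or indexed without a vector is a separate choice that this sketch does not settle.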






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-05 Thread via GitHub


alessandrobenedetti commented on PR #3151:
URL: https://github.com/apache/solr/pull/3151#issuecomment-2637867724

   > I wanted to follow-up on my feedback to the LLM module concerning use of 
the word "embedding". I first tried to say that I was not familiar with the 
word, and your response was to remove it (completely?) from the module. If 
"embedding" is an appropriate word then use it. The documentation should 
reference it in the ref guide, even if just an "AKA".
   
   "Embedding" is widely used in the field, but it's a bit ambiguous and, to be
honest, I'm with you in not using any term that can cause confusion. Do you
mean I added "embedding" in here somewhere? If that's the case, it's a mistake;
point it out to me and I'll remove it!





Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-05 Thread via GitHub


alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943565203


##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessor.java:
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.llm.texttovector.model.SolrTextToVectorModel;
+import 
org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.List;
+
+
+class TextToVectorUpdateProcessor extends UpdateRequestProcessor {
+private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+private final String inputField;
+private final String outputField;
+private final String model;
+private SolrTextToVectorModel textToVector;
+private ManagedTextToVectorModelStore modelStore = null;
+
+public TextToVectorUpdateProcessor(
+String inputField,
+String outputField,
+String model,
+SolrQueryRequest req,
+UpdateRequestProcessor next) {
+super(next);
+this.inputField = inputField;
+this.outputField = outputField;
+this.model = model;
+this.modelStore = 
ManagedTextToVectorModelStore.getManagedModelStore(req.getCore());
+}
+
+/**
+ * @param cmd the update command in input containing the Document to 
process
+ * @throws IOException If there is a low-level I/O error
+ */
+@Override
+public void processAdd(AddUpdateCommand cmd) throws IOException {
+this.textToVector = modelStore.getModel(model);
+if (textToVector == null) {
+throw new SolrException(
+SolrException.ErrorCode.BAD_REQUEST,
+"The model requested '"
++ model
++ "' can't be found in the store: "
++ ManagedTextToVectorModelStore.REST_END_POINT);
+}
+
+SolrInputDocument doc = cmd.getSolrInputDocument();
+SolrInputField inputFieldContent = doc.get(inputField);
+if (!isNullOrEmpty(inputFieldContent, doc, inputField)) {
+String textToVectorise = 
inputFieldContent.getValue().toString();//add null checks and
+float[] vector = textToVector.vectorise(textToVectorise);

Review Comment:
   1) @cpoerschke : I double-checked, and the langchain4j library's 'embed' method
(which is used in our 'vectorise' method) doesn't declare any exception, but I
agree we should investigate what happens if that request fails (my best guess is
that we get an empty vector or null; I'll add that to the tests).
   
   2) @epugh : given that 'update.chain' is a parameter, if you configure a
chain with no vector enrichment and a chain with vector enrichment, what
prevents you from first indexing with the 'no vectors' chain and then slowly
updating the index with atomic updates that add vectors (using the
vector chain)? We should double-check and add this to the documentation once we
consolidate the code. What do you think?






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-05 Thread via GitHub


alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943565203


##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessor.java:
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.llm.texttovector.model.SolrTextToVectorModel;
+import 
org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.List;
+
+
+class TextToVectorUpdateProcessor extends UpdateRequestProcessor {
+private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+private final String inputField;
+private final String outputField;
+private final String model;
+private SolrTextToVectorModel textToVector;
+private ManagedTextToVectorModelStore modelStore = null;
+
+public TextToVectorUpdateProcessor(
+String inputField,
+String outputField,
+String model,
+SolrQueryRequest req,
+UpdateRequestProcessor next) {
+super(next);
+this.inputField = inputField;
+this.outputField = outputField;
+this.model = model;
+this.modelStore = 
ManagedTextToVectorModelStore.getManagedModelStore(req.getCore());
+}
+
+/**
+ * @param cmd the update command in input containing the Document to 
process
+ * @throws IOException If there is a low-level I/O error
+ */
+@Override
+public void processAdd(AddUpdateCommand cmd) throws IOException {
+this.textToVector = modelStore.getModel(model);
+if (textToVector == null) {
+throw new SolrException(
+SolrException.ErrorCode.BAD_REQUEST,
+"The model requested '"
++ model
++ "' can't be found in the store: "
++ ManagedTextToVectorModelStore.REST_END_POINT);
+}
+
+SolrInputDocument doc = cmd.getSolrInputDocument();
+SolrInputField inputFieldContent = doc.get(inputField);
+if (!isNullOrEmpty(inputFieldContent, doc, inputField)) {
+String textToVectorise = inputFieldContent.getValue().toString();//add null checks and
+float[] vector = textToVector.vectorise(textToVectorise);

Review Comment:
   1) @cpoerschke: I double-checked, and the langchain4j library's 'embed' method 
(which our 'vectorise' method uses) doesn't throw any exception, but I 
agree we should investigate what happens if that request fails (my best guess is 
that we get an empty vector or null; I'll add that to the tests).
   
   2) @epugh: given that 'update.chain' is a parameter, if you configure one 
chain without vector enrichment and one chain with it, what 
prevents you from first indexing with the 'no vectors' chain and then slowly 
updating the index with atomic updates that add vectors (using the 
vector chain)? We should double-check this and add it to the documentation once we 
consolidate the code. What do you think?
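
   A minimal, self-contained sketch of the guard raised in point 1, assuming a failed embedding call surfaces as a null or empty array (this is only the guess stated above, not verified against langchain4j). `Vectoriser` and `VectorGuard` are hypothetical stand-ins rather than classes from the PR, and a plain `Map` stands in for `SolrInputDocument`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the model's vectorise(String) call.
interface Vectoriser {
  float[] vectorise(String text);
}

class VectorGuard {
  /**
   * Adds the vector to the document only when the model returned something usable.
   * Returns true if the output field was populated.
   */
  static boolean addVectorIfPresent(
      Map<String, Object> doc, String inputField, String outputField, Vectoriser model) {
    Object text = doc.get(inputField);
    if (text == null || text.toString().isEmpty()) {
      return false; // nothing to vectorise
    }
    float[] vector = model.vectorise(text.toString());
    if (vector == null || vector.length == 0) {
      return false; // assumed failure mode: leave the document without a vector
    }
    List<Float> values = new ArrayList<>(vector.length);
    for (float v : vector) {
      values.add(v);
    }
    doc.put(outputField, values);
    return true;
  }

  public static void main(String[] args) {
    Map<String, Object> doc = new HashMap<>();
    doc.put("_text_", "hello");
    System.out.println(addVectorIfPresent(doc, "_text_", "vector", t -> new float[] {1f, 2f}));
    System.out.println(doc.get("vector"));
  }
}
```

   Whether skipping the field silently is the right behaviour, or whether the update should fail instead, is exactly the open question; the sketch only shows where such a check could sit.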






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-05 Thread via GitHub


alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943553277


##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessor.java:
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.llm.texttovector.model.SolrTextToVectorModel;
+import 
org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.List;
+
+
+class TextToVectorUpdateProcessor extends UpdateRequestProcessor {
+private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+private final String inputField;
+private final String outputField;
+private final String model;
+private SolrTextToVectorModel textToVector;
+private ManagedTextToVectorModelStore modelStore = null;
+
+public TextToVectorUpdateProcessor(
+String inputField,
+String outputField,
+String model,
+SolrQueryRequest req,
+UpdateRequestProcessor next) {
+super(next);
+this.inputField = inputField;
+this.outputField = outputField;
+this.model = model;
+this.modelStore = 
ManagedTextToVectorModelStore.getManagedModelStore(req.getCore());
+}
+
+/**
+ * @param cmd the update command in input containing the Document to 
process
+ * @throws IOException If there is a low-level I/O error
+ */
+@Override
+public void processAdd(AddUpdateCommand cmd) throws IOException {
+this.textToVector = modelStore.getModel(model);

Review Comment:
   I was debugging the flow to better understand the lifecycle of 
an update request processor.
   
   From what I see in the test, the factory instantiates a new update request 
processor every time a new update request is received.
   I think it's OK to keep it as a class member, but let me see if I can move the 
instantiation to the factory.
   Ideally I wanted that to happen when the factory is initialised, but it seems 
that the update request processor factory is not compatible with resource 
loading (as far as I have debugged and checked).
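
   The lifecycle described above can be illustrated with a small self-contained sketch: the factory is initialised once, while a fresh processor is built for each update request, so resolving the model in the processor's constructor costs one store lookup per request rather than one per document. `CountingModelStore`, `SketchProcessor`, and `SketchProcessorFactory` are hypothetical stand-ins, not Solr classes, and the string "model" is a placeholder for the real model object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the managed model store; it counts lookups.
class CountingModelStore {
  private final Map<String, String> models = new HashMap<>();
  int lookups = 0;

  void put(String name, String model) {
    models.put(name, model);
  }

  String getModel(String name) {
    lookups++;
    return models.get(name);
  }
}

// Created once per update request; the model is resolved in the constructor.
class SketchProcessor {
  private final String model;

  SketchProcessor(String modelName, CountingModelStore store) {
    this.model = store.getModel(modelName);
    if (this.model == null) {
      throw new IllegalArgumentException("Model '" + modelName + "' not found in the store");
    }
  }

  String processAdd(String text) {
    return model + ":" + text; // placeholder for the real vectorisation
  }
}

// Initialised once; it only holds configuration.
class SketchProcessorFactory {
  private final String modelName;

  SketchProcessorFactory(String modelName) {
    this.modelName = modelName;
  }

  SketchProcessor getInstance(CountingModelStore store) {
    return new SketchProcessor(modelName, store);
  }
}

class LifecycleDemo {
  public static void main(String[] args) {
    CountingModelStore store = new CountingModelStore();
    store.put("model1", "m1");
    SketchProcessorFactory factory = new SketchProcessorFactory("model1");
    SketchProcessor perRequest = factory.getInstance(store);
    perRequest.processAdd("doc one");
    perRequest.processAdd("doc two");
    System.out.println("lookups: " + store.lookups);
  }
}
```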
   






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-05 Thread via GitHub


alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943488899


##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessorFactory.java:
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.schema.DenseVectorField;
+import org.apache.solr.schema.FieldType;
+import org.apache.solr.schema.SchemaField;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
+
+/**
+ * This class implements an UpdateProcessorFactory for the Text To Vector 
Update Processor.
+ */
+public class TextToVectorUpdateProcessorFactory extends 
UpdateRequestProcessorFactory {
+private static final String INPUT_FIELD_PARAM = "inputField";
+private static final String OUTPUT_FIELD_PARAM = "outputField";
+private static final String MODEL_NAME = "model";
+
+String inputField;
+String outputField;
+String modelName;
+SolrParams params;

Review Comment:
   Done!






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-05 Thread via GitHub


dsmiley commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943539435


##
solr/test-framework/src/java/org/apache/solr/util/RestTestBase.java:
##
@@ -88,13 +88,33 @@ private static void checkUpdateU(String message, String update, boolean shouldSu
 if (response != null) fail(m + "update was not successful: " + response);
   } else {
 String response = restTestHarness.validateErrorUpdate(update);
-if (response != null) fail(m + "update succeeded, but should have failed: " + response);
+if (response == null) fail(m + "update succeeded, but should have failed: " + response);
   }
 } catch (SAXException e) {
   throw new RuntimeException("Invalid XML", e);
 }
   }
 
+  public static void checkUpdateU(String update, String... tests) {

Review Comment:
   At the moment, RestTestBase is common to basically any test using a 
"REST-based model store", of which the LLM module recently added a new variant; 
hence RestTestBase is used here.  RestTestBase is used a lot.  Preferably we 
wouldn't depend so much on our class hierarchy to accomplish reusable things, 
but there's no realistic action to take right now.






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-05 Thread via GitHub


dsmiley commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943529534


##
solr/modules/llm/src/test/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessorFactoryTest.java:
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.MultiMapSolrParams;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.llm.TestLlmBase;
+import org.apache.solr.request.SolrQueryRequestBase;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.util.HashMap;
+import java.util.Map;
+
+
+public class TextToVectorUpdateProcessorFactoryTest extends TestLlmBase {
+  private TextToVectorUpdateProcessorFactory factoryToTest =
+  new TextToVectorUpdateProcessorFactory();
+  private NamedList args = new NamedList<>();
+  
+  @BeforeClass
+  public static void initArgs() throws Exception {
+setupTest("solrconfig-llm.xml", "schema.xml", false, false);
+  }
+
+  @AfterClass
+  public static void after() throws Exception {
+afterTest();
+  }
+
+  @Test
+  public void init_fullArgs_shouldInitFullClassificationParams() {
+args.add("inputField", "_text_");
+args.add("outputField", "vector");
+args.add("model", "model1");
+factoryToTest.init(args);
+
+assertEquals("_text_", factoryToTest.getInputField());
+assertEquals("vector", factoryToTest.getOutputField());
+assertEquals("model1", factoryToTest.getModelName());
+  }
+
+  @Test
+  public void init_nullInputField_shouldThrowExceptionWithDetailedMessage() {
+args.add("outputField", "vector");
+args.add("model", "model1");
+
+SolrException e = assertThrows(SolrException.class, () -> factoryToTest.init(args));
+assertEquals("Text to Vector UpdateProcessor 'inputField' can not be null", e.getMessage());
+  }
+
+  @Test
+  public void init_notExistentInputField_shouldThrowExceptionWithDetailedMessage() throws Exception {
+args.add("inputField", "notExistentInput");
+args.add("outputField", "vector");
+args.add("model", "model1");
+
+Map params = new HashMap<>();
+MultiMapSolrParams mmparams = new MultiMapSolrParams(params);
+SolrQueryRequestBase req = new SolrQueryRequestBase(solrClientTestRule.getCoreContainer().getCore("collection1"), (SolrParams) mmparams) {};

Review Comment:
   It's an anonymous inner class.  What's probably throwing you off is that 
there are no method overrides, which 99% of the time is the point of writing an 
anonymous inner class.  Here it's because SolrQueryRequestBase (SQRB) is abstract, 
so he's forced to subclass it in order to use it.  I've been thinking about this 
case recently, and I think we should simply make that class non-abstract.
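
   For readers unfamiliar with the idiom, here is a small sketch of an anonymous subclass with an empty body. `AbstractRequest` is a hypothetical stand-in for an abstract base class that declares no abstract methods; it is not the real SolrQueryRequestBase.

```java
// Hypothetical stand-in for an abstract base class that declares no abstract methods.
abstract class AbstractRequest {
  private final String core;

  protected AbstractRequest(String core) {
    this.core = core;
  }

  String getCore() {
    return core;
  }
}

class AnonymousSubclassDemo {
  static AbstractRequest newRequest(String core) {
    // The empty braces declare an anonymous subclass. Nothing is overridden;
    // the subclass exists only because AbstractRequest cannot be instantiated directly.
    return new AbstractRequest(core) {};
  }

  public static void main(String[] args) {
    AbstractRequest req = newRequest("collection1");
    System.out.println(req.getCore());
    System.out.println(req.getClass().isAnonymousClass());
  }
}
```

   If the base class were made non-abstract, as suggested above, the trailing `{}` could simply be dropped.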






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-05 Thread via GitHub


alessandrobenedetti commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943491138


##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessor.java:
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.llm.texttovector.model.SolrTextToVectorModel;
+import 
org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.List;
+
+
+class TextToVectorUpdateProcessor extends UpdateRequestProcessor {
+private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+private final String inputField;
+private final String outputField;
+private final String model;
+private SolrTextToVectorModel textToVector;
+private ManagedTextToVectorModelStore modelStore = null;

Review Comment:
   Sure!






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-05 Thread via GitHub


epugh commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1943278670


##
solr/test-framework/src/java/org/apache/solr/util/RestTestBase.java:
##
@@ -88,13 +88,33 @@ private static void checkUpdateU(String message, String update, boolean shouldSu
 if (response != null) fail(m + "update was not successful: " + response);
   } else {
 String response = restTestHarness.validateErrorUpdate(update);
-if (response != null) fail(m + "update succeeded, but should have failed: " + response);
+if (response == null) fail(m + "update succeeded, but should have failed: " + response);
   }
 } catch (SAXException e) {
   throw new RuntimeException("Invalid XML", e);
 }
   }
 
+  public static void checkUpdateU(String update, String... tests) {

Review Comment:
   Not specific to this PR per se, but I wish we had a clearer plan for the 
future of RestTestBase.  Are we embracing it?



##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessor.java:
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.llm.texttovector.model.SolrTextToVectorModel;
+import 
org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.List;
+
+
+class TextToVectorUpdateProcessor extends UpdateRequestProcessor {
+private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+private final String inputField;
+private final String outputField;
+private final String model;
+private SolrTextToVectorModel textToVector;
+private ManagedTextToVectorModelStore modelStore = null;
+
+public TextToVectorUpdateProcessor(
+String inputField,
+String outputField,
+String model,
+SolrQueryRequest req,
+UpdateRequestProcessor next) {
+super(next);
+this.inputField = inputField;
+this.outputField = outputField;
+this.model = model;
+this.modelStore = 
ManagedTextToVectorModelStore.getManagedModelStore(req.getCore());
+}
+
+/**
+ * @param cmd the update command in input containing the Document to 
process
+ * @throws IOException If there is a low-level I/O error
+ */
+@Override
+public void processAdd(AddUpdateCommand cmd) throws IOException {
+this.textToVector = modelStore.getModel(model);
+if (textToVector == null) {
+throw new SolrException(
+SolrException.ErrorCode.BAD_REQUEST,
+"The model requested '"
++ model
++ "' can't be found in the store: "
++ ManagedTextToVectorModelStore.REST_END_POINT);
+}
+
+SolrInputDocument doc = cmd.getSolrInputDocument();
+SolrInputField inputFieldContent = doc.get(inputField);
+if (!isNullOrEmpty(inputFieldContent, doc, inputField)) {
+String textToVectorise = inputFieldContent.getValue().toString();//add null checks and
+float[] vector = textToVector.vectorise(textToVectorise);

Review Comment:
   I was chatting with @iamsanjay this morning, and I was expounding on the 
thought that a lot of folks might want to first index the document with just 
the core text/string/numbers and then, since enrichment is SLOW, come back 
with a streaming expression to do things like vectorization and an atomic 
update. That way you pump your data in as fast as possible, and then enrich 
at your leisure... This model cons

Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-02-04 Thread via GitHub


dsmiley commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1942178207


##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessorFactory.java:
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.schema.DenseVectorField;
+import org.apache.solr.schema.FieldType;
+import org.apache.solr.schema.SchemaField;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
+
+/**
+ * This class implements an UpdateProcessorFactory for the Text To Vector 
Update Processor.
+ */
+public class TextToVectorUpdateProcessorFactory extends 
UpdateRequestProcessorFactory {
+private static final String INPUT_FIELD_PARAM = "inputField";
+private static final String OUTPUT_FIELD_PARAM = "outputField";
+private static final String MODEL_NAME = "model";
+
+String inputField;
+String outputField;
+String modelName;
+SolrParams params;
+
+
+@Override
+public void init(final NamedList args) {
+if (args != null) {

Review Comment:
   args should never be null
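
   A simplified sketch of what `init` could look like without the outer null guard, assuming (as stated above) that the caller always passes a non-null argument collection. `FactoryInitSketch` is hypothetical: a `Map` stands in for `NamedList`, and `IllegalArgumentException` stands in for `SolrException`. The parameter names and the error message format are taken from the PR's factory and its test.

```java
import java.util.Map;

// Hypothetical, simplified stand-in for the factory's init(NamedList) logic.
class FactoryInitSketch {
  String inputField;
  String outputField;
  String modelName;

  // No null guard around args: the caller is assumed to always pass a non-null collection.
  void init(Map<String, String> args) {
    inputField = require(args, "inputField");
    outputField = require(args, "outputField");
    modelName = require(args, "model");
  }

  // Reads a required parameter, failing with a descriptive message when it is missing.
  private static String require(Map<String, String> args, String name) {
    String value = args.get(name);
    if (value == null) {
      // IllegalArgumentException stands in for SolrException here.
      throw new IllegalArgumentException(
          "Text to Vector UpdateProcessor '" + name + "' can not be null");
    }
    return value;
  }

  public static void main(String[] args) {
    FactoryInitSketch factory = new FactoryInitSketch();
    factory.init(Map.of("inputField", "_text_", "outputField", "vector", "model", "model1"));
    System.out.println(
        factory.inputField + " -> " + factory.outputField + " via " + factory.modelName);
  }
}
```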



##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessorFactory.java:
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.schema.DenseVectorField;
+import org.apache.solr.schema.FieldType;
+import org.apache.solr.schema.SchemaField;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
+
+/**
+ * This class implements an UpdateProcessorFactory for the Text To Vector 
Update Processor.
+ */
+public class TextToVectorUpdateProcessorFactory extends 
UpdateRequestProcessorFactory {
+private static final String INPUT_FIELD_PARAM = "inputField";
+private static final String OUTPUT_FIELD_PARAM = "outputField";
+private static final String MODEL_NAME = "model";
+
+String inputField;
+String outputField;
+String modelName;
+SolrParams params;
+
+
+@Override
+public void init(final NamedList args) {
+if (args != null) {
+params = args.toSolrParams();
+inputField = params.get(INPUT_FIELD_PARAM);
+checkNotNull(INPUT_FIELD_PARAM, inputField);
+
+outputField = params.get(OUTPUT_FIELD_PARAM);
+checkNotNull(OUTPUT_FIELD_PARAM, outputField);
+
+modelName = params.get(MODEL_NAME);
+checkNotNull(MODEL_NAME, modelName);
+}
+}
+
+private void checkNotNull(String paramName, Object param) {
+if (param == null) {
+throw new SolrException(
+SolrException.Er

Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-01-31 Thread via GitHub


cpoerschke commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1937732098


##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessor.java:
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.llm.texttovector.model.SolrTextToVectorModel;
+import 
org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.List;
+
+
+class TextToVectorUpdateProcessor extends UpdateRequestProcessor {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  private final String inputField;
+  private final String outputField;
+  private final String model;
+  private SolrTextToVectorModel textToVector;
+  private ManagedTextToVectorModelStore modelStore = null;
+
+  public TextToVectorUpdateProcessor(
+      String inputField,
+      String outputField,
+      String model,
+      SolrQueryRequest req,
+      UpdateRequestProcessor next) {
+    super(next);
+    this.inputField = inputField;
+    this.outputField = outputField;
+    this.model = model;
+    this.modelStore = ManagedTextToVectorModelStore.getManagedModelStore(req.getCore());
+  }
+
+  /**
+   * @param cmd the update command in input containing the Document to process
+   * @throws IOException If there is a low-level I/O error
+   */
+  @Override
+  public void processAdd(AddUpdateCommand cmd) throws IOException {
+    this.textToVector = modelStore.getModel(model);
+    if (textToVector == null) {
+      throw new SolrException(
+          SolrException.ErrorCode.BAD_REQUEST,
+          "The model requested '"
+              + model
+              + "' can't be found in the store: "
+              + ManagedTextToVectorModelStore.REST_END_POINT);
+    }
+
+    SolrInputDocument doc = cmd.getSolrInputDocument();
+    SolrInputField inputFieldContent = doc.get(inputField);
+    if (!isNullOrEmpty(inputFieldContent, doc, inputField)) {
+      String textToVectorise = inputFieldContent.getValue().toString();//add null checks and
+      float[] vector = textToVector.vectorise(textToVectorise);

Review Comment:
   So text-to-vector in the search case is ephemeral, i.e. in case of errors or 
timeouts the user gets an error or exception back and may or may not choose to 
retry.
   
   For text-to-vector in the update case, might some users prefer to index only 
documents that have all their vectors, while others would rather have the 
document indexed even if some vectors are missing? (I assume, but don't know 
for sure, that the `vectorise` call might throw an exception in certain 
circumstances, and that if it's not caught it might fail the indexing of the 
entire document.)
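
The two preferences described above could be pictured with a minimal sketch like the one below. It is not the PR's code: `OnFailure`, `addVector`, the `Function`-based vectoriser and the map-based document are hypothetical stand-ins chosen so the example runs without Solr on the classpath, and it assumes that a failing model call surfaces as a `RuntimeException`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in; none of these names come from Solr or from the PR.
final class VectoriseFailurePolicySketch {

    /** Hypothetical switch: fail the whole add, or index the document without the vector. */
    enum OnFailure { FAIL_DOCUMENT, SKIP_VECTOR }

    /**
     * Vectorises doc[inputField] into doc[outputField].
     * Returns true if a vector was added, false if it was skipped.
     */
    static boolean addVector(
            Map<String, Object> doc,
            String inputField,
            String outputField,
            Function<String, float[]> vectoriser,
            OnFailure onFailure) {
        Object text = doc.get(inputField);
        if (text == null || text.toString().isEmpty()) {
            return false; // nothing to vectorise
        }
        try {
            doc.put(outputField, vectoriser.apply(text.toString()));
            return true;
        } catch (RuntimeException e) {
            if (onFailure == OnFailure.FAIL_DOCUMENT) {
                throw e; // the caller sees the error and the document is not indexed
            }
            return false; // the document goes on to be indexed without the vector
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("text", "hello");
        // Simulates a model call that times out.
        Function<String, float[]> failing = s -> {
            throw new IllegalStateException("model timeout");
        };
        boolean added = addVector(doc, "text", "vector", failing, OnFailure.SKIP_VECTOR);
        System.out.println("vector added: " + added + ", has vector field: " + doc.containsKey("vector"));
    }
}
```

Whether such a switch belongs in the processor configuration, and what its default should be, is exactly the open question raised in the comment; the sketch only shows that both behaviours fit around a single `try`/`catch`.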






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-01-31 Thread via GitHub


cpoerschke commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1937657567


##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessor.java:
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.llm.texttovector.model.SolrTextToVectorModel;
+import org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.List;
+
+
+class TextToVectorUpdateProcessor extends UpdateRequestProcessor {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  private final String inputField;
+  private final String outputField;
+  private final String model;
+  private SolrTextToVectorModel textToVector;
+  private ManagedTextToVectorModelStore modelStore = null;
+
+  public TextToVectorUpdateProcessor(
+      String inputField,
+      String outputField,
+      String model,
+      SolrQueryRequest req,
+      UpdateRequestProcessor next) {
+    super(next);
+    this.inputField = inputField;
+    this.outputField = outputField;
+    this.model = model;
+    this.modelStore = ManagedTextToVectorModelStore.getManagedModelStore(req.getCore());
+  }
+
+  /**
+   * @param cmd the update command in input containing the Document to process
+   * @throws IOException If there is a low-level I/O error
+   */
+  @Override
+  public void processAdd(AddUpdateCommand cmd) throws IOException {
+    this.textToVector = modelStore.getModel(model);

Review Comment:
   Wondering re: `textToVector` as class member vs. local variable.
   
   Edit: if `model` is configured, i.e. always the same, `textToVector` could 
perhaps be a class member; but if `model` were instead a field in the `cmd` 
document, it would need to be a local variable.
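
To make the distinction concrete, here is a rough sketch of the local-variable variant. It is not the PR's code: `PerDocumentModelLookupSketch`, the map-based model store and the optional `modelField` are hypothetical, used only to show the model being resolved per call so that nothing carries over from one document to the next.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the processor; none of these names come from the PR.
final class PerDocumentModelLookupSketch {
    private final Map<String, Function<String, float[]>> modelStore; // stand-in for the managed store
    private final String configuredModel; // fixed when the processor is built
    private final String modelField; // hypothetical document field naming a model; may be null

    PerDocumentModelLookupSketch(
            Map<String, Function<String, float[]>> modelStore,
            String configuredModel,
            String modelField) {
        this.modelStore = modelStore;
        this.configuredModel = configuredModel;
        this.modelField = modelField;
    }

    float[] vectorise(Map<String, Object> doc, String inputField) {
        // Resolve the model name for this document only.
        String modelName = configuredModel;
        if (modelField != null && doc.get(modelField) != null) {
            modelName = doc.get(modelField).toString();
        }
        // Local variable: no per-document state is kept on the instance.
        Function<String, float[]> textToVector = modelStore.get(modelName);
        if (textToVector == null) {
            throw new IllegalArgumentException(
                    "The model requested '" + modelName + "' can't be found in the store");
        }
        return textToVector.apply(String.valueOf(doc.get(inputField)));
    }

    public static void main(String[] args) {
        Map<String, Function<String, float[]>> store = new HashMap<>();
        store.put("small", s -> new float[] {1f});
        store.put("large", s -> new float[] {2f});
        PerDocumentModelLookupSketch processor =
                new PerDocumentModelLookupSketch(store, "small", "model");

        Map<String, Object> doc = new HashMap<>();
        doc.put("text", "hello");
        doc.put("model", "large"); // this document overrides the configured model
        System.out.println(processor.vectorise(doc, "text")[0]);
    }
}
```

If the model name is always the configured one, the lookup could instead be done once and stored in a `final` field, which is the class-member option mentioned above.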






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-01-31 Thread via GitHub


cpoerschke commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1937653570


##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessor.java:
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.llm.texttovector.model.SolrTextToVectorModel;
+import org.apache.solr.llm.texttovector.store.rest.ManagedTextToVectorModelStore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.List;
+
+
+class TextToVectorUpdateProcessor extends UpdateRequestProcessor {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  private final String inputField;
+  private final String outputField;
+  private final String model;
+  private SolrTextToVectorModel textToVector;
+  private ManagedTextToVectorModelStore modelStore = null;

Review Comment:
   ```suggestion
   private final ManagedTextToVectorModelStore modelStore;
   ```






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-01-31 Thread via GitHub


cpoerschke commented on code in PR #3151:
URL: https://github.com/apache/solr/pull/3151#discussion_r1937651836


##
solr/modules/llm/src/java/org/apache/solr/llm/texttovector/update/processor/TextToVectorUpdateProcessorFactory.java:
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.llm.texttovector.update.processor;
+
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.schema.DenseVectorField;
+import org.apache.solr.schema.FieldType;
+import org.apache.solr.schema.SchemaField;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
+
+/**
+ * This class implements an UpdateProcessorFactory for the Text To Vector Update Processor.
+ */
+public class TextToVectorUpdateProcessorFactory extends UpdateRequestProcessorFactory {
+  private static final String INPUT_FIELD_PARAM = "inputField";
+  private static final String OUTPUT_FIELD_PARAM = "outputField";
+  private static final String MODEL_NAME = "model";
+
+  String inputField;
+  String outputField;
+  String modelName;
+  SolrParams params;

Review Comment:
   ```suggestion
   private String inputField;
   private String outputField;
   private String modelName;
   private SolrParams params;
   ```






Re: [PR] SOLR-17632: Text to Vector Update Request Processor [solr]

2025-01-31 Thread via GitHub


alessandrobenedetti commented on PR #3151:
URL: https://github.com/apache/solr/pull/3151#issuecomment-2627858272

   Let's give it a first round of discussion/brainstorming/improvements.
   
   Then I'll adjust the gradle check, documentation and changes.

