A few architectural questions: http://bit.ly/18vbbaT

I created a local instance of LucidWorks Search on my dev machine. I can 
quite easily save the similarity vectors from the DRMs into docs at special 
locations and index them with LucidWorks. But to ingest the docs and put them 
in separate fields of the same index, we need some new code (unless I've missed 
some Lucid config magic) that does the indexing and integrates with LucidWorks. 
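
To make that concrete, here's a rough SolrJ sketch of the custom indexing step. 
The core name 'item_similarity' and the field names 'similar_items' and 
'cross_similar_items' are placeholders I made up, not anything LucidWorks defines:

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  // Sketch only: pushes one row of the similarity DRM into Solr as a doc
  // keyed by item ID, with the indicator item IDs in separate fields.
  public class SimilarityIndexer {
    public static void main(String[] args) throws Exception {
      SolrClient solr = new HttpSolrClient.Builder(
          "http://localhost:8983/solr/item_similarity").build();

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "item_42");                            // the item this row describes
      doc.addField("similar_items", "item_7 item_99 item_3");   // similarity indicators
      doc.addField("cross_similar_items", "item_12 item_54");   // optional cross-similarity indicators
      solr.add(doc);
      solr.commit();
      solr.close();
    }
  }

In the real thing this would loop over every row of the DRM output rather than 
a single hand-written doc.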

I imagine two indexes. One index holds the similarity matrix and, optionally, 
the cross-similarity matrix in two fields of type 'string'. Another index holds 
users' history; we could put the history docs there for retrieval by user ID. 
The user history docs then become the query against the similarity index, and 
the results are the recommendations. Any history collected or generated in real 
time could be used as well.
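
The query side would then look roughly like this, again with made-up core and 
field names, and assuming the indicator field is tokenized on whitespace so 
multiple item IDs can match and score:

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;

  // Sketch: the item IDs from a user's history become the query terms; the
  // docs that score highest against 'similar_items' are the recommendations.
  public class RecommendQuery {
    public static void main(String[] args) throws Exception {
      SolrClient solr = new HttpSolrClient.Builder(
          "http://localhost:8983/solr/item_similarity").build();

      String userHistory = "item_7 item_99";  // would really come from the user-history index
      SolrQuery query = new SolrQuery("similar_items:(" + userHistory + ")");
      query.setRows(10);                      // top 10 recommendations

      QueryResponse rsp = solr.query(query);
      for (SolrDocument d : rsp.getResults()) {
        System.out.println(d.getFieldValue("id"));
      }
      solr.close();
    }
  }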

Is this what you imagined, Ted? Especially WRT Lucid integration?

Someone could probably donate their free-tier EC2 instance and set this up 
pretty easily. Not sure this would fit within free-tier memory, but it might 
work for small data sets.

To get this available for actual use we'd need:
1-- An instance with an IP address somewhere to run the ingestion and 
customized LucidWorks Search.
2-- Synthetic data created using Ted's tool.
3-- Customized Solr indexing code for integration with LucidWorks? Not sure how 
this is done. I can do the Solr part but have not looked into Lucid integration 
yet.
4-- Flesh out the rest of Ted's outline, though 1-3 will give a minimally 
running example.

Assuming I've got this right, does someone want to help with these?

Another way to approach this is to create a standalone codebase that requires 
Mahout and Solr and supplies an API along the lines of the proposed Mahout SGD 
online recommender or Myrrix. This would be easier to consume but would lack 
all the UI and inspection code of LucidWorks.
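
To give a feel for what that standalone API might look like, here's a purely 
hypothetical interface; none of these names come from Mahout, Myrrix, or 
LucidWorks:

  import java.util.List;

  // Hypothetical shape of a standalone recommender backed by Mahout (for the
  // similarity/cross-similarity computation) and Solr (for indexing and query).
  public interface SearchRecommender {
    // Record a new user-item interaction as it happens.
    void ingest(String userId, String itemId);

    // Return up to howMany item IDs, ranked by the Solr query over the user's history.
    List<String> recommend(String userId, int howMany);

    // Rebuild the similarity/cross-similarity indexes from the accumulated history.
    void rebuildIndexes();
  }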



