A few architectural questions: http://bit.ly/18vbbaT
I created a local instance of LucidWorks Search on my dev machine. I can quite easily save the similarity vectors from the DRMs into docs at special locations and index them with LucidWorks. But to ingest the docs and put them in separate fields of the same index we need some new code (unless I've missed some Lucid config magic) that does the indexing and integrates with LucidWorks.

I imagine two indexes. One index holds the similarity matrix and, optionally, the cross-similarity matrix in two fields of type 'string'. Another index holds users' histories--we could put the docs there for retrieval by user ID. The user-history docs then become the query against the similarity index and would return recommendations. Any history collected or generated in real time could be used too. Is this what you imagined, Ted? Especially WRT Lucid integration?

Someone could probably donate their free-tier EC2 instance and set this up pretty easily. Not sure whether it would fit in free-tier memory, but maybe for small data sets. To get this available for actual use we'd need:
1-- An instance with an IP address somewhere to run the ingestion and the customized LucidWorks Search.
2-- Synthetic data created with Ted's tool.
3-- Customized Solr indexing code for integration with LucidWorks? Not sure how this is done. I can do the Solr part but haven't looked into Lucid integration yet.
4-- The rest of Ted's outline fleshed out, though 1-3 will give a minimally running example.

Assuming I've got this right, does someone want to help with these?

Another way to approach this is to create a stand-alone codebase that requires Mahout and Solr and supplies an API, something like the proposed Mahout SGD online recommender or Myrrix. This would be easier to consume but would lack all the UI and inspection code of LucidWorks.
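To make the two-index scheme concrete, here is a minimal sketch of how a similarity-matrix row could be flattened into an indexable doc, and how a user's history could be turned into a query against it. All field names (item_id, similar_items, cross_similar_items) and the helper functions are my assumptions for illustration, not an existing LucidWorks or Solr schema.

```python
# Illustrative sketch only -- field names and helpers are assumptions,
# not part of any existing LucidWorks/Solr configuration.

def similarity_doc(item_id, similar_items, cross_similar_items=None):
    """Build one doc for a row of the item-similarity matrix.

    The row's neighbor item IDs are flattened into a single
    space-separated string field, so a list of item IDs from a user's
    history can be matched against it as ordinary query terms.
    """
    doc = {"item_id": item_id, "similar_items": " ".join(similar_items)}
    if cross_similar_items:  # optional second field for cross-similarity
        doc["cross_similar_items"] = " ".join(cross_similar_items)
    return doc

def history_query(history_item_ids, field="similar_items"):
    """Turn a user's history (fetched by user ID from the second index,
    or collected in real time) into an OR query over the similarity field."""
    return " OR ".join(f"{field}:{item}" for item in history_item_ids)

# Example: index one similarity row, then query it with a user's history.
doc = similarity_doc("item42", ["item7", "item13"], ["itemB3"])
query = history_query(["item7", "item9"])
```

Running this, `doc` comes out as `{"item_id": "item42", "similar_items": "item7 item13", "cross_similar_items": "itemB3"}` and `query` as `similar_items:item7 OR similar_items:item9`; the actual indexing and search would of course go through the customized Solr/LucidWorks code described in point 3.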