On 8/9/24 21:57, Zara Parst wrote:
Actually, I have a small app called MassiveMark ,  where people insert
different text, Markdown(MathML, LaTex, Chemistry formula etc), codeblock,
images and normal text. Later on we figured out this is mainly used by
students and professors to create lecture notes, exam papers etc. I guess
they are mainly converting text from ChatGPT and downloading it as docx.
However few users requested if we can allow them to store it and later they
can fetch it. We were planning to also let them search in the document. I
have no clue how we are going to search in Organic chemistry, Compound are
mainly manipulated from smile code, which is manipulated during visual or
export to docx.

The same math formula can be written in different ways, e.g. 'y = ax + b' is the same as 'z = i + j * k'. The same goes for SMILES notation (in many cases), which is used for small molecules. Long nucleic acid and protein chains are a whole 'nother story: they are written as strings of letters (not SMILES) and have specialized sequence matching algorithms that are the subject of Bioinformatics 101.

I.e. Solr is the wrong tool for those jobs.

It'll work for text search, it's just that they'll likely get garbage out if they try searching for math or chemical formulas.
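
To make the SMILES point concrete, here is a rough sketch (plain Python, not Solr itself). "CCO", "OCC" and "C(O)C" are all valid SMILES for ethanol, yet a term match only finds the spelling that was typed. The notes and the exact_term_hits helper are made up for illustration; a real text index tokenizes differently, but the mismatch is the same in spirit.

```python
# Three valid SMILES strings for the same molecule, ethanol.
ETHANOL_SMILES = ["CCO", "OCC", "C(O)C"]


def exact_term_hits(query, documents):
    """Return indexes of documents that contain `query` as a whitespace-delimited term.

    Hypothetical stand-in for term matching in a text index.
    """
    return [i for i, doc in enumerate(documents) if query in doc.split()]


# Hypothetical stored notes, each spelling ethanol a different way.
notes = [f"reaction of {smiles} with sodium" for smiles in ETHANOL_SMILES]

# All three notes are about ethanol, but only the first one matches.
print(exact_term_hits("CCO", notes))  # [0]
```

Getting all three notes back would need the structures to be canonicalized (or compared as graphs) by a chemistry toolkit before indexing, which is outside what a text search engine does on its own.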

Dima
