Hi,

Let's say I have an index that contains a field of type BinaryField
called "fingerprint", which stores a small number of bytes (say, 100)
representing some kind of digital fingerprint.

Let's say I want to run queries on that field that sort or filter
documents using a custom distance function, "customDistance". That is,
I pass in a reference fingerprint, and Solr either returns all
documents sorted by
customDistance(<referenceFingerprint>,<documentFingerprint>) or uses
that value in an frange expression to filter them.
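To make the question concrete: one hypothetical choice for
"customDistance" is the Hamming distance, i.e. the number of bits in
which two equal-length fingerprints differ. The plain Java sketch below
only shows the distance computation itself; it is not a Solr API, and
wiring it into Solr (for example through a custom function query
plugin) is not shown.

```java
// Hypothetical "customDistance": Hamming distance between two fingerprints.
// Plain Java sketch; not part of any Solr API.
public class CustomDistance {

    /** Returns the number of bit positions in which the two fingerprints differ. */
    public static int customDistance(byte[] reference, byte[] document) {
        if (reference.length != document.length) {
            throw new IllegalArgumentException("fingerprints must have equal length");
        }
        int distance = 0;
        for (int i = 0; i < reference.length; i++) {
            // XOR leaves a 1 bit wherever the bytes differ; mask to 8 bits before counting.
            distance += Integer.bitCount((reference[i] ^ document[i]) & 0xFF);
        }
        return distance;
    }

    public static void main(String[] args) {
        byte[] a = {(byte) 0b1010_1010, 0x00};
        byte[] b = {(byte) 0b1010_1011, (byte) 0xFF};
        System.out.println(customDistance(a, b)); // prints 9 (1 + 8 differing bits)
    }
}
```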

I have read http://wiki.apache.org/solr/SolrPerformanceFactors, and I
understand that function queries with a custom function are expensive:
they amount to what the SQL world calls a "full table scan", i.e. the
function has to be evaluated against the data of every document in
order to select the matching documents or to sort by the function's
result.
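Conceptually, that "full table scan" looks like the sketch below: the
distance is computed once for every document before anything can be
filtered or sorted, so the cost grows linearly with the number of
documents. This is a simplified, hypothetical model (an in-memory map
from document id to fingerprint, with Hamming distance as the metric),
not a description of how Solr evaluates function queries internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a function-based filter + sort: evaluate the
// distance for EVERY document, keep those in range, then sort by distance.
// Not Solr's actual implementation.
public class FullScanSketch {

    // Hamming distance; assumes both fingerprints have the same length.
    public static int hamming(byte[] a, byte[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) {
            d += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return d;
    }

    /** Returns ids of documents whose distance lies in [lower, upper], nearest first. */
    public static List<String> filterAndSort(Map<String, byte[]> fingerprints,
                                             byte[] reference, int lower, int upper) {
        List<Map.Entry<String, Integer>> hits = new ArrayList<>();
        // One distance computation per document: this loop is the "full table scan".
        for (Map.Entry<String, byte[]> doc : fingerprints.entrySet()) {
            int d = hamming(reference, doc.getValue());
            if (d >= lower && d <= upper) {          // similar in spirit to {!frange l=... u=...}
                hits.add(Map.entry(doc.getKey(), d));
            }
        }
        hits.sort(Map.Entry.comparingByValue());     // similar in spirit to sorting by the function, ascending
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Integer> hit : hits) {
            ids.add(hit.getKey());
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, byte[]> index = new LinkedHashMap<>();
        index.put("doc1", new byte[]{0x0F});         // 4 bits differ from reference 0x00
        index.put("doc2", new byte[]{0x01});         // 1 bit differs
        index.put("doc3", new byte[]{(byte) 0xFF});  // 8 bits differ
        System.out.println(filterAndSort(index, new byte[]{0x00}, 0, 4)); // prints [doc2, doc1]
    }
}
```

In this model, a range filter does not avoid the scan; it only shrinks
the list that has to be sorted afterwards.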

Given all that, and given that I have to use a custom function for my
needs, I would like to know a few more details about Solr's
architecture so I understand what I have to look out for.

I will potentially have millions of records. When it comes to RAM
usage, does the data in other index fields play a role if I only use
the "fingerprint" field for sorting and searching? I am hoping that my
RAM only needs to hold the fingerprint data of all documents for
queries to be fast, not the fingerprint data plus all other indexed or
stored data.

Example: my fingerprint data needs 100 bytes per document, and my
other indexed field data needs 900 bytes per document. With one million
documents, will I need 100 MB or 1 GB to fit all the data needed to
process one query in memory?
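The two figures in the example follow from simple multiplication,
assuming one million documents (the count implied by 100 MB and 1 GB).
The sketch below counts raw payload bytes only, in decimal megabytes,
and ignores any per-document or per-field overhead in the index or the
JVM; the class and method names are purely illustrative.

```java
// Back-of-the-envelope estimate for the two scenarios in the example.
// Assumes 1,000,000 documents and counts raw payload bytes only
// (no index or JVM overhead). Hypothetical helper, not a Solr API.
public class RamEstimate {

    public static long bytesNeeded(long numDocs, long bytesPerDoc) {
        return numDocs * bytesPerDoc;
    }

    public static void main(String[] args) {
        long docs = 1_000_000L;
        long fingerprintOnly = bytesNeeded(docs, 100);     // fingerprint field only
        long everything = bytesNeeded(docs, 100 + 900);    // fingerprint + all other fields
        System.out.println(fingerprintOnly / 1_000_000 + " MB"); // prints 100 MB
        System.out.println(everything / 1_000_000 + " MB");      // prints 1000 MB
    }
}
```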

Are there other things to be aware of?

Thanks,

Robert
