mck created CASSANDRA-14247:
-------------------------------

             Summary: SASI tokenizer for simple delimiter based entries
                 Key: CASSANDRA-14247
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14247
             Project: Cassandra
          Issue Type: Improvement
          Components: sasi
            Reporter: mck

Currently SASI offers only two tokenizer options:
 - NonTokenizingAnalyzer
 - StandardAnalyzer

The latter is built upon Snowball, which is powerful for human languages but overkill for simple tokenization.

A simple tokenizer is proposed here. The need for this arose as a workaround for CASSANDRA-11182, and to avoid the disk usage explosion when having to resort to {{CONTAINS}}.

See https://github.com/openzipkin/zipkin/issues/1861
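
To illustrate what "simple delimiter based" tokenization could mean, here is a minimal sketch in plain Java. It is not the SASI analyzer API: the class name DelimiterTokenizerSketch, the method tokenize, and the choice to skip empty tokens are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT a SASI analyzer implementation.
final class DelimiterTokenizerSketch {
    private DelimiterTokenizerSketch() {}

    // Splits the input on a single delimiter character.
    // Assumption: empty tokens (e.g. from "a,,b") are skipped.
    static List<String> tokenize(String input, char delimiter) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= input.length(); i++) {
            // A token ends at a delimiter or at the end of the input.
            if (i == input.length() || input.charAt(i) == delimiter) {
                if (i > start) {
                    tokens.add(input.substring(start, i));
                }
                start = i + 1;
            }
        }
        return tokens;
    }
}
```

With a value such as "http.method=GET,error,lc=db" and ',' as the delimiter, each entry would become its own indexed term, with no stemming or other language processing involved.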