Incorrect step size when string is longer than the number of n-grams desired #18

dblock · 2014-02-26T22:44:56Z

In https://github.com/artsy/mongoid_fulltext/blob/master/lib/mongoid_fulltext.rb:

```ruby
# Figure out how many ngrams to extract from the string. If we can't afford to extract all ngrams,
# step over the string in evenly spaced strides to extract ngrams. For example, to extract 3 3-letter
# ngrams from 'abcdefghijk', we'd want to extract 'abc', 'efg', and 'ijk'.
if bound_number_returned
  step_size = [((filtered_str.length - config[:ngram_width]).to_f / config[:max_ngrams_to_search]).ceil, 1].max
else
  step_size = 1
end
```
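
As a rough illustration of what the quoted formula does to the example in its own comment, here is a minimal sketch. `strided_ngrams` is a hypothetical helper, not part of mongoid_fulltext, and it assumes the stride is applied to start offsets 0 through `length - ngram_width`; the gem's actual loop may differ.

```ruby
# Hypothetical helper (not the gem's API). It reuses the quoted step_size
# formula and ASSUMES ngrams start at offsets 0, step, 2 * step, ...
# up to filtered_str.length - ngram_width.
def strided_ngrams(filtered_str, ngram_width, max_ngrams_to_search)
  step_size = [((filtered_str.length - ngram_width).to_f / max_ngrams_to_search).ceil, 1].max
  (0..filtered_str.length - ngram_width).step(step_size).map do |i|
    filtered_str[i, ngram_width] # ngram_width characters starting at offset i
  end
end

p strided_ngrams('abcdefghijk', 3, 3) # => ["abc", "def", "ghi"]
```

Under that assumption the result is 'abc', 'def', 'ghi' rather than the 'abc', 'efg', 'ijk' the comment promises.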

To get the 3 n-grams 'abc', 'efg', and 'ijk' from 'abcdefghijk' (11 characters), the start offsets are 0, 4, and 8, so we need a step of 4, not 3.

The current formula gives (11.to_f - 3) / 3 ≈ 2.67, which ceils to 3.

I think the formula should not subtract config[:ngram_width]: 11.to_f / 3 ≈ 3.67, which ceils to 4.
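
A minimal sketch comparing the current step size with the proposed one (dividing the full length instead of `length - ngram_width`). The method names are hypothetical, and it assumes the same offset-stepping loop as the quoted code's comment implies; it also checks that the proposed step never yields more than the requested number of ngrams.

```ruby
# Hypothetical helpers (not the gem's API).
def current_step(length, width, max_ngrams)
  [((length - width).to_f / max_ngrams).ceil, 1].max
end

# Proposed: drop the "- width" term.
def proposed_step(length, width, max_ngrams)
  [(length.to_f / max_ngrams).ceil, 1].max
end

# ASSUMES ngrams start at offsets 0, step, 2 * step, ... up to length - width.
def ngrams_with_step(str, width, step)
  (0..str.length - width).step(step).map { |i| str[i, width] }
end

str = 'abcdefghijk'
p current_step(str.length, 3, 3)                            # => 3
p ngrams_with_step(str, 3, proposed_step(str.length, 3, 3)) # => ["abc", "efg", "ijk"]

# Sanity check: the proposed step never produces more than max ngrams.
(3..40).each do |length|
  (1..6).each do |max|
    count = ngrams_with_step('x' * length, 3, proposed_step(length, 3, max)).size
    raise "too many ngrams (length=#{length}, max=#{max})" if count > max
  end
end
```

Note that the proposed step does not always end exactly at the last character: a 12-character string with the same settings still yields 'abc', 'efg', 'ijk' and skips 'jkl'.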

However, I wonder whether it is the comment that is wrong, and we actually want the first 3 n-grams rather than skipping characters.
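
For comparison, the "first 3 n-grams" reading would look roughly like this. `leading_ngrams` is a hypothetical helper used only to illustrate that alternative; it takes contiguous ngrams from the start of the string and ignores the rest.

```ruby
# Hypothetical alternative (not the gem's API): keep only the first
# max_ngrams contiguous ngrams instead of striding across the string.
def leading_ngrams(str, width, max_ngrams)
  # Range#first(n) returns at most n offsets; an empty range yields [].
  (0..str.length - width).first(max_ngrams).map { |i| str[i, width] }
end

p leading_ngrams('abcdefghijk', 3, 3) # => ["abc", "bcd", "cde"]
```

The trade-off is that striding samples the whole string, while this version only ever looks at its prefix.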

cc: @aaw

dblock added the bug? label Sep 24, 2015