Back in 2006, when I was working on my first Sphinx search
project, I was playing with stopword files. Stopwords are basically a small
set of highly frequent words you often don’t want to search for
(like “I”, “am”, “the”, etc.). For most Sphinx instances they only
waste index space and slow down your search queries, which have to find all
occurrences of these unimportant words.
Say you are searching for “when is jane’s birthday”: you are
actually looking for documents with “jane’s birthday”, and
you don’t really care about the many documents (blog posts, news
articles, etc.) that contain only “when” and “is”.
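To make the idea concrete, here is a minimal Python sketch of what stopword filtering does to that query. It assumes a tiny, hypothetical stopword set and a naive lowercase-and-split-on-whitespace tokenizer; Sphinx’s real tokenizer (driven by its charset settings) splits text differently, so treat this as an illustration of the concept only.

```python
# Hypothetical stopword set for illustration; real stopword files are much longer.
STOPWORDS = {"i", "am", "the", "when", "is"}

def strip_stopwords(query: str) -> list[str]:
    """Lowercase the query, split on whitespace, and drop any stopwords."""
    return [word for word in query.lower().split() if word not in STOPWORDS]

print(strip_stopwords("when is jane's birthday"))
```

Only “jane's” and “birthday” survive, which are the words that actually identify the documents you want.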
Removing those high-frequency words from the search index is usually a
smart move, and ages ago I created two sample stopword files
which I’m still using today.
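One common way to build such a list is to count how often each word appears across your documents and take the most frequent ones (Sphinx’s `indexer` has a `--buildstops` option aimed at this). The sketch below is not how my sample files were made; it is a minimal Python illustration of the frequency-counting approach, assuming a tiny hypothetical corpus and a naive whitespace tokenizer.

```python
from collections import Counter

def top_frequent_words(docs: list[str], n: int) -> list[str]:
    """Return the n most frequent lowercase, whitespace-separated tokens."""
    counts = Counter(word for doc in docs for word in doc.lower().split())
    return [word for word, _ in counts.most_common(n)]

# Hypothetical sample corpus for illustration only.
docs = [
    "the cat is on the mat",
    "when is the party",
    "the party is on friday",
]

# Print one candidate stopword per line, ready to review before saving to a file.
print("\n".join(top_frequent_words(docs, 2)))
```

On a real corpus you would review the output by hand before using it, since a very frequent word can still be meaningful for your particular data.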
…