Why does Doofinder return different results when I search for “estanterías” and “estanterias”? Isn't Doofinder supposed to clean those “special characters” in searches?
We’ll talk about what happens when someone misspells a word in a search.
Several filters are applied to the words when the data feed is processed. We’ll consider two of them:
- Stemming: the process of reducing a word to its root.
- Character cleaning: some characters are replaced, e.g. e instead of è, n instead of ñ, a instead of á…
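As a rough illustration of character cleaning, here is a minimal Python sketch. It assumes the replacements amount to dropping accent marks from letters; the function name `clean_characters` is hypothetical and this is not Doofinder's actual code.

```python
import unicodedata


def clean_characters(text: str) -> str:
    """Replace accented letters with their unaccented base letter (assumed behavior)."""
    # NFD splits a letter such as "é" into "e" plus a separate combining accent mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep the base letters and drop the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(clean_characters("estantería"))  # estanteria
print(clean_characters("señal è á"))   # senal e a
```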
During indexing, character cleaning is done after stemming. For instance, the Spanish word estantería is recognized as a word in the dictionary and the stemming process takes the root estant; when character cleaning takes place, no special character is left to replace.
Imagine someone mistypes this word and writes estanteria, which is wrong and doesn’t belong to the Spanish dictionary. Stemming only removes the final a (a feminine ending in Spanish), so it takes estanteri as the root.
The first root (estant) will match more words than the second (estanteri). For instance, the first root will match every item containing the word estante, while the second won’t.
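The effect can be reproduced with a toy pipeline. The sketch below is only an assumption-laden illustration: `toy_stem` and its suffix list are hypothetical stand-ins for a real Spanish stemmer, and `index_term` simply applies stemming first and character cleaning second, as described above.

```python
import unicodedata

# Hypothetical suffix list, longest first. A real Spanish stemmer is far richer.
SUFFIXES = ("erías", "ería", "es", "as", "os", "e", "a", "o", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters of root."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def clean_characters(word: str) -> str:
    """Drop accent marks, e.g. í becomes i."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_term(word: str) -> str:
    """Stem first, then clean characters."""
    return clean_characters(toy_stem(word.lower()))


for word in ("estantería", "estanteria", "estante"):
    print(word, "->", index_term(word))
# estantería -> estant
# estanteria -> estanteri
# estante -> estant
```

With these toy rules, the correctly spelled word shares a root with estante, while the unaccented spelling ends up with a different root.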
Stemming is done before character cleaning because, if it were done afterwards, a lot of words wouldn’t be recognized as Spanish words. In the example above, estantería would be cleaned to estanteria and stemming wouldn’t recognize it.
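To see why the order matters, the following sketch compares both orders on the same word. It is a simplified model, not the real implementation: the hypothetical `toy_stem` only knows the accented suffix "ería", which stands in for a dictionary that recognizes correctly accented Spanish words.

```python
import unicodedata

# Hypothetical suffix list, longest first; the accented entries model the dictionary.
SUFFIXES = ("erías", "ería", "es", "as", "os", "e", "a", "o", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters of root."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def clean_characters(word: str) -> str:
    """Drop accent marks, e.g. í becomes i."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stem_then_clean(word: str) -> str:
    """The order described in this article."""
    return clean_characters(toy_stem(word))


def clean_then_stem(word: str) -> str:
    """The reversed order: the accent is gone before the stemmer sees the word."""
    return toy_stem(clean_characters(word))


word = "estantería"
print(stem_then_clean(word))  # estant
print(clean_then_stem(word))  # estanteri
```

In the reversed order, even the correctly spelled word loses its accent before stemming, so it gets the narrower root.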
For words that people commonly mistype, synonyms can be used.
Although the behavior won’t be exactly the same with synonyms (because there are some fields where synonyms are not applied), it will improve the results for those misspellings.
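Conceptually, a synonym for a common misspelling works like the sketch below. This is a hypothetical model only: the `SYNONYMS` table and the `expand_query` function are invented for illustration and do not reflect how synonyms are configured or applied in Doofinder.

```python
# Hypothetical synonym table: common misspelling -> correctly accented word.
SYNONYMS = {
    "estanteria": "estantería",
    "estanterias": "estanterías",
}


def expand_query(query: str) -> list[str]:
    """Return the query terms, adding the synonym of any known misspelling."""
    terms = []
    for term in query.lower().split():
        terms.append(term)
        synonym = SYNONYMS.get(term)
        if synonym is not None:
            # The accented form can then be stemmed like a correctly typed word.
            terms.append(synonym)
    return terms


print(expand_query("estanterias metal"))
# ['estanterias', 'estanterías', 'metal']
```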