It’s been more than 10 years since I presented one of my first papers at the first conference on Automated Knowledge Base Construction (AKBC) in Grenoble in 2010 😬
[Read More]
You shall know a word by the company it keeps
Ambiguity and synonym filters in elasticsearch
We’ve already seen some potential pitfalls of synonym token filters in the post about nGram and synonym filters. Another thing to keep in mind when working with synonym filters is ambiguity: a word or even a phrase can have multiple meanings. For instance, the acronym “NL”...
[Read More]
Awesome resources to get into Natural Language Processing
An opinionated list of links
This list is by no means comprehensive.
[Read More]
Demystifying elasticsearch's Decompounder Token Filter
What does the longest match setting do?
The Dictionary decompounder token filter is yet another tool for text analysis with elasticsearch. It is especially useful when building search engines that need to handle languages like German, which tend to form awfully long compound words. If you are not familiar with compound words, you can find a...
[Read More]
Combining Synonym and Ngram Filters in elasticsearch
What could possibly go wrong?
The Synonym token filter and the NGram token filter are two frequently used tools for text analysis with elasticsearch.
[Read More]