It’s been more than 10 years since I presented one of my first papers at the first conference on Automated Knowledge Base Construction (AKBC) in Grenoble in 2010 😬
[Read more]
You shall know a word by the company it keeps
Ambiguity and synonym filters in elasticsearch
We’ve already seen some potential pitfalls when using synonym token filters in the post about nGram and synonym filters. Another thing to keep in mind when working with synonym filters is ambiguity, the problem that a word or even a phrase can have multiple meanings. For instance, the acronym “NL”...
[Read more]
Awesome resources to get into Natural Language Processing
An opinionated list of links
This list is by no means comprehensive.
[Read more]
Demystifying elasticsearch's Decompounder Token Filter
What does the longest match setting do?
The Dictionary decompounder token filter is yet another tool for text analysis with elasticsearch. It is especially useful when creating search engines that need to handle languages like German. These languages tend to create awfully long compound words. If you are not familiar with compound words, you can find a...
[Read more]
Combining Synonym and Ngram Filters in elasticsearch
What could possibly go wrong?
The Synonym token filter and the NGram token filter are two frequently used tools for text analysis with elasticsearch.
[Read more]
My favourite Natural Language Processing tools
Three NLP tools worthy of a test-of-time award
There are many great Natural Language Processing (NLP) libraries and frameworks out there. But if you work in industry, you will find that they often don’t meet your needs, be it in language support, application domain or simply the licensing model. If you have to dig in and implement...
[Read more]