Blog

AKBC 2022

A heavily biased selection of papers, repos, images and thoughts on AKBC 2022

Posted on November 13, 2022

It’s been more than 10 years since I presented one of my first papers at the first conference on Automated Knowledge Base Construction (AKBC) in Grenoble in 2010 😬 [Read more]

You shall know a word by the company it keeps

Ambiguity and synonym filters in elasticsearch

Posted on November 13, 2020

We’ve already seen some potential pitfalls when using synonym token filters in the post about nGram and synonym filters. Another thing to keep in mind when working with synonym filters is ambiguity, the problem that a word or even a phrase can have multiple meanings. For instance, the acronym “NL”... [Read more]

Awesome resources to get into Natural Language Processing

An opinionated list of links

Posted on August 27, 2020

This list is by no means comprehensive. [Read more]

Demystifying elasticsearch's Decompounder Token Filter

What does the longest match setting do?

Posted on August 18, 2020

The Dictionary decompounder token filter is yet another tool for text analysis with elasticsearch. It is especially useful when creating search engines that need to handle languages like German. These languages tend to create awfully long compound words. If you are not familiar with compound words, you can find a... [Read more]

Combining Synonym and Ngram Filters in elasticsearch

What could possibly go wrong?

Posted on July 13, 2020

The Synonym token filter and the NGram token filter are two frequently used tools for text analysis with elasticsearch. [Read more]

My favourite Natural Language Processing tools

Three NLP tools that deserve a test-of-time award

Posted on October 9, 2018

There are many great Natural Language Processing (NLP) libraries and frameworks out there. But if you work in industry, you will find that they often don’t meet your needs, be it in language support, application domain or simply the licensing model. If you have to dig in and implement... [Read more]