Let’s take the sentence, “Dallas, it’s 9 a.m.: I’m walking down the street where it happens, 55 years ago…” Can you guess where the action takes place? Of course, you might say: Dallas, Texas, and the “it happens” certainly refers to the assassination of President Kennedy! If this seems crystal clear to us, it is only because we humans are very good at resolving the ambiguities of natural language.

For a machine, however, “Dallas” could very well be a person’s name, and this scene a dialogue. Even if we tell the computer that Dallas is a place name [1], one question remains: which Dallas are we talking about? According to Wikipedia, at least twenty localities around the world share this name. Without sufficient clues (that is, clues a machine can understand), the computer will probably choose the most popular candidate: for example, the one with the longest Wikipedia page, or the most viewed.
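The “most popular candidate” fallback can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of any particular tool: the `Candidate` type, the `most_popular` function, and the popularity figures are all invented for the example.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    label: str       # human-readable description of the candidate
    popularity: int  # e.g. page length or page views (invented values here)


def most_popular(candidates: list[Candidate]) -> Optional[Candidate]:
    """Return the candidate with the highest popularity, or None if the list is empty."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.popularity)


# Invented figures, for illustration only.
dallas = [
    Candidate("Dallas (city in Texas)", 1000),
    Candidate("Dallas (another locality)", 40),
    Candidate("Dallas (a person's name)", 15),
]

best = most_popular(dallas)
print(best.label)
```

Such a baseline ignores the text entirely, which is why it fails whenever the document is about one of the less famous places.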

The process of automatically linking words in a document (e.g. place names) to an external database (in this case, Wikipedia) goes by several names in the scientific literature: “Entity Linking”, “Named Entity Disambiguation”, “Reconciliation”, “Entity Resolution”, and so on. Although much progress has been made since the seminal works [2], the task is far from solved. Its difficulty varies with the quality and length of the text. A tweet or an SMS, for example, can be ambiguous even for a human. The same goes for short photo captions, as in the Cegesoma database. The temporal context also matters. When a picture from the 1940s says that the scene takes place in “Hamme, Belgium”, we must not forget that, at the time, at least three Belgian municipalities had this name, in three different provinces. Finally, named entity recognition often extracts several place names from the same text. How can we guess which one is the location where the picture was taken?

The method we are working on uses clues from the photo database itself: the thesaurus keywords that Cegesoma’s archivists have applied to folders containing groups of photos on the same theme. These keywords sometimes contain a place name, such as a city, a province, or a region. Using these terms, which we parsed out of the keywords, we will apply an algorithm [3] that queries a database called Wikidata, a kind of Wikipedia containing structured information that is readable by both humans and machines. For each location previously extracted from the photo captions, the algorithm will select the possible candidates and choose the most probable one based on the available clues. When multiple places are mentioned for the same picture, it will use the same clues to rule out the least likely ones. Finally, if the ambiguity is too strong, it will not try to guess further and will simply flag the text or photo for verification by a human.
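To make the idea concrete, here is a minimal sketch of clue-based disambiguation with an “ask a human” fallback. It is not the algorithm mentioned in [3], which is still in development. It assumes a small in-memory list of candidates instead of a live Wikidata query, and the `Place` class, the `link_place` function, the identifiers, and the province names (“Province A”, and so on) are all placeholders invented for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Place:
    identifier: str  # placeholder ID, not a real Wikidata identifier
    name: str
    located_in: set[str] = field(default_factory=set)  # enclosing areas


def link_place(mention: str, candidates: list[Place], clues: set[str]) -> Optional[Place]:
    """Return the candidate best supported by the clues, or None if a human must decide."""
    matching = [p for p in candidates if p.name.lower() == mention.lower()]
    if not matching:
        return None
    if len(matching) == 1:
        return matching[0]  # only one candidate: nothing to disambiguate

    wanted = {clue.lower() for clue in clues}
    # Score each candidate by the number of clues naming an area that contains it.
    scored = [(len({a.lower() for a in p.located_in} & wanted), p) for p in matching]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    best_score, best_place = scored[0]
    if best_score == 0 or scored[1][0] == best_score:
        return None  # no supporting clue, or a tie: too ambiguous to guess
    return best_place


# Three homonymous municipalities, with placeholder provinces.
gazetteer = [
    Place("hamme-1", "Hamme", {"Province A", "Belgium"}),
    Place("hamme-2", "Hamme", {"Province B", "Belgium"}),
    Place("hamme-3", "Hamme", {"Province C", "Belgium"}),
]

resolved = link_place("Hamme", gazetteer, {"Province B"})
unresolved = link_place("Hamme", gazetteer, {"Belgium"})
print(resolved.identifier if resolved else "needs human verification")
print(unresolved.identifier if unresolved else "needs human verification")
```

A folder keyword naming a province is enough to settle the first call, while the keyword “Belgium” alone fits all three candidates equally, so the second call abstains rather than guessing.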

 

[1] This is the role of named entity recognition; see section ???
[2] Rao, D., McNamee, P., & Dredze, M. (2013). Entity linking: Finding extracted entities in a knowledge base. In Multi-source, multilingual information extraction and summarization (pp. 93-115). Springer, Berlin, Heidelberg.
[3] Written in Python. Still in development, but the code will soon be available online.
