Selen writeup on Cohen et al.
This is a review of Cohen_2000_hardening_soft_information_sources by user:Selen.
In this paper, the authors propose a method for the hardening problem: inferring the most likely underlying "hard" database from a "soft" database that contains inconsistencies and duplication.
The method is applied only to co-reference problems; it would be more useful if it were designed for broader domains. For instance, in gene entity databases, researchers usually need to "normalize" the database in order to unify names, abbreviations, and aliases under a common term. It may be possible to extract the underlying "hard" database by, for example, looking at protein associations. Although the method is very elegant, I wonder how it can be generalized to tasks in bioinformatics.
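To make the gene-name setting concrete, here is a toy sketch of what a hardened result could look like. It is not the paper's algorithm: the alias pairs are supplied by hand, whereas the hardening problem is about inferring them. The function name `harden`, the relation name `interacts_with`, and the three records are all assumptions made up for illustration.

```python
def harden(soft_tuples, alias_pairs):
    """Rewrite every name to its canonical form, then drop duplicate tuples.

    alias_pairs holds (alias, canonical) pairs. They are given by hand here;
    inferring them is the hard part and is NOT attempted in this sketch.
    """
    parent = {}

    def find(name):
        # Union-find lookup with path halving.
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for alias, canonical in alias_pairs:
        root_alias, root_canonical = find(alias), find(canonical)
        if root_alias != root_canonical:
            parent[root_alias] = root_canonical

    # Tuples that differ only in naming collapse into a single hard tuple.
    return sorted({tuple(find(value) for value in row) for row in soft_tuples})


# Hypothetical "soft" records: one interaction written three different ways.
soft = [
    ("p53", "interacts_with", "MDM2"),
    ("TP53", "interacts_with", "HDM2"),
    ("TP53", "interacts_with", "MDM2"),
]
aliases = [("p53", "TP53"), ("HDM2", "MDM2")]

hard = harden(soft, aliases)
print(hard)
```

With the alias pairs fixed, the three soft records collapse into one hard tuple. The open question raised above is how such pairs would be inferred from the data itself, for example from protein associations.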