An Entity-Resolution Framework for Mission-Critical Applications
As the volume and velocity of data grow, inference across networks and semantic relationships between entities becomes increasingly difficult. Such challenges amount to a substantial barrier to organizations' ability to fully understand their data, let alone make effective use of predictive analytics to optimize targeting, thresholding, and resource management.
A common data quality problem is that a data collection may inadvertently contain several distinct references to the same underlying real-world entity. Entity resolution is the process of identifying and reconciling these references, that is, of determining which records in the collection refer to the same entity.
Detective Gadget is an innovative tool, developed by Svelto!, for solving entity-resolution tasks even in the presence of dirty data.
Traditionally, the primary tasks involved in entity resolution are deduplication (detecting and removing repeated records), record linkage (identifying records that reference the same entity across different sources), and canonicalization (converting data with more than one possible representation into a standard form).
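The three tasks above can be illustrated with a minimal sketch on toy data. The function names, the sample company names, and the exact-match-after-normalization rule are assumptions made for illustration only; they are not part of Detective Gadget.

```python
def canonicalize(name: str) -> str:
    """Canonicalization: map variant spellings to one standard form."""
    # Lowercase, drop periods, and collapse runs of whitespace.
    return " ".join(name.lower().replace(".", "").split())


def deduplicate(records: list[str]) -> list[str]:
    """Deduplication: keep the first record seen for each canonical form."""
    seen: dict[str, str] = {}
    for record in records:
        seen.setdefault(canonicalize(record), record)
    return list(seen.values())


def link(source_a: list[str], source_b: list[str]) -> list[tuple[str, str]]:
    """Record linkage: pair records from two sources sharing a canonical form."""
    index = {canonicalize(r): r for r in source_b}
    return [
        (r, index[canonicalize(r)])
        for r in source_a
        if canonicalize(r) in index
    ]


# Hypothetical sample data from two sources.
crm = ["Acme Corp.", "ACME  corp", "Globex Inc."]
billing = ["acme corp", "Initech LLC"]

unique_crm = deduplicate(crm)
linked = link(unique_crm, billing)
print(unique_crm)
print(linked)
```

Real data rarely matches exactly after such simple normalization, which is why practical systems rely on richer match functions.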
With Detective Gadget you can optimize this process thanks to its main features:
Detective Gadget adopts a generic approach to entity resolution: it can incorporate a variety of match functions, treated largely as black boxes, to establish whether two records match. This allows maximum flexibility in incorporating known techniques to speed up the initial entity-resolution step.
Detective Gadget does not assume that a data-cleaning step has been performed prior to the entity-resolution phase. On the contrary, it handles data cleaning and entity resolution in an integrated fashion, using a greedy match algorithm that draws on both positive and negative evidence about the matches to refine the entity resolution while at the same time cleaning up the original data set.
Detective Gadget uses a very fast match algorithm that efficiently leverages past positive and negative evidence. The algorithm is based on a novel technique called alias-based hashing, which relies on shadow values of the input records, called aliases.
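To make the first two features more concrete, the following is a minimal sketch of a generic resolver that accepts black-box match functions and greedily merges records using both positive and negative evidence. It is not Detective Gadget's implementation or API, and it does not implement alias-based hashing. All names (`same_email`, `same_name`, `is_match`, `merge`, `resolve`), the sample records, and the policy that any negative verdict vetoes a match are assumptions chosen for illustration.

```python
from typing import Callable, Optional

Record = dict[str, str]
# A match function is a black box returning True (positive evidence),
# False (negative evidence), or None (no opinion).
MatchFn = Callable[[Record, Record], Optional[bool]]


def same_email(a: Record, b: Record) -> Optional[bool]:
    """Positive or negative evidence, but only when both emails are present."""
    if a.get("email") and b.get("email"):
        return a["email"].lower() == b["email"].lower()
    return None


def same_name(a: Record, b: Record) -> Optional[bool]:
    """Positive evidence only: differing names prove nothing here."""
    if a.get("name") and b.get("name") and a["name"].lower() == b["name"].lower():
        return True
    return None


def is_match(a: Record, b: Record, match_fns: list[MatchFn]) -> bool:
    """Assumed policy: any negative verdict vetoes; otherwise one positive suffices."""
    votes = [fn(a, b) for fn in match_fns]
    if any(v is False for v in votes):
        return False
    return any(v is True for v in votes)


def merge(a: Record, b: Record) -> Record:
    """Merge two records, filling empty or missing fields of `a` from `b`."""
    merged = dict(a)
    for field, value in b.items():
        if not merged.get(field):
            merged[field] = value
    return merged


def resolve(records: list[Record], match_fns: list[MatchFn]) -> list[Record]:
    """Greedy single pass: merge each record into the first matching entity."""
    entities: list[Record] = []
    for record in records:
        for i, entity in enumerate(entities):
            if is_match(entity, record, match_fns):
                entities[i] = merge(entity, record)  # cleaning while resolving
                break
        else:
            entities.append(dict(record))
    return entities


# Hypothetical dirty input: missing fields, case differences, a conflicting email.
records = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "ann lee", "email": ""},
    {"name": "Ann Lee", "email": "a.lee@example.org"},
    {"name": "", "email": "ANN@example.com", "phone": "555-0100"},
]
resolved = resolve(records, [same_email, same_name])
print(resolved)
```

In this sketch the third record shares a name with the first entity, but the conflicting email acts as negative evidence and keeps it separate, while the fourth record is merged through its email and contributes the missing phone number. A single greedy pass is a simplification: a fuller resolver would also re-compare entities after each merge.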
This approach truly transforms entity-resolution tasks compared with the traditional way of performing them!