|
"It's a basic principle of data protection that personal information that we give for one purpose should not then be used for another purpose without our consent. This is particularly important since we often have no choice about giving government the information in the first place ? on tax returns, to receive benefits, to drive, or to obtain a passport." Although it is easy to see the many upsides of information exchange, it is hard to get past the potential downsides. Most organizations have adopted a "better safe than sorry" attitude, avoiding information sharing initiatives altogether.
To overcome this barrier, several central challenges must be resolved. How can businesses, governments, and countries effectively exchange knowledge without handing over ownership of their data and control of its disclosure? How can they secure information during the exchange so that they do not reveal sensitive details and thereby compromise the security and privacy of the information they have been entrusted to protect? In other words, how can they achieve knowledge discovery without having to relinquish or disclose knowledge?
Solution Description
At the Las Vegas, Nevada headquarters of IBM's Entity Analytic Solutions group, not far from the California high desert where in 1947 Chuck Yeager challenged the naysayers and broke the sound barrier, the anonymous data sharing barrier has been shattered. IBM's DB2 Anonymous Resolution software enables multiple organizations to share and compare proprietary information assets in a de-identified format that allows the original data holders to maintain control over the flow of what information is revealed and what information is concealed.
Extending the Utility of Existing One-Way Hashing
For years, cryptologists have used one-way hash techniques to accomplish various security functions, such as digital signatures, which can be used to ensure that a document has not been modified. A one-way hash is essentially an algorithm that converts input text of any length into a fixed-length string of alphanumeric characters from which the original input cannot practically be recovered.
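As a minimal sketch of this behavior, the following Python example hashes two inputs of very different lengths. SHA-256 is assumed here purely for illustration (the text does not name a specific algorithm), and the helper name one_way_hash and the sample strings are hypothetical.

```python
import hashlib

def one_way_hash(text: str) -> str:
    """Return the SHA-256 digest of text as a hexadecimal string."""
    # SHA-256 is an assumption for this sketch; any one-way hash would do.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical inputs of very different lengths.
short_digest = one_way_hash("Smith")
long_digest = one_way_hash("Smith, John A., 123 Main Street, Las Vegas, NV")

# Both digests have the same fixed length, regardless of input size.
print(len(short_digest), len(long_digest))  # 64 64

# The same input always yields the same digest.
print(short_digest == one_way_hash("Smith"))  # True
```

The fixed length and repeatability are what make hashed values comparable between sites. The digest alone does not reveal the input text.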
Conceptually, this capability would seem a natural fit for two organizations wishing to share data more securely. Organization A and Organization B would each one-way hash their clear-text identity information, share the results, and compare them for common strings of alphanumeric characters. This approach can work, provided that both sites use the same hash algorithm and that the input text being sought is identical at both sites before it is hashed. That is very unlikely, however, given the inconsistencies and irregularities that plague most identity information stores and that would seriously degrade any insight gleaned from the process.
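The hash-and-compare exchange described above can be sketched as follows. The record values, the variable names, and the choice of SHA-256 are all assumptions made for illustration. The two lists stand in for the identity stores of Organization A and Organization B.

```python
import hashlib

def one_way_hash(text: str) -> str:
    """Hash clear text with SHA-256 (an assumed algorithm for this sketch)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical identity records held by each organization.
org_a = ["John Smith", "Maria Garcia", "Robert Jones"]
org_b = ["John Smith", "Maria  Garcia", "Rob Jones"]  # extra space, nickname

# Each site hashes locally and shares only the hashed values.
hashed_a = {one_way_hash(record) for record in org_a}
hashed_b = {one_way_hash(record) for record in org_b}

# Comparing the shared hashes finds only byte-for-byte identical inputs.
common = hashed_a & hashed_b
print(len(common))  # 1
```

Only the identical "John Smith" record is recognized as common to both sets. The other two people appear in both stores but go undetected because their records are written slightly differently.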
With a standard hashing process, any change to the input, even a single added character or extra space, produces an entirely different hash value. In a well-designed hash function, flipping even one bit of the input string flips, on average, about half of the bits in the hash value. This is called the avalanche effect.
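The avalanche effect can be observed directly by flipping one input bit and counting how many output bits change. This is a sketch that assumes SHA-256. The helper names sha256_bits and differing_bits and the sample input are hypothetical.

```python
import hashlib

def sha256_bits(data: bytes) -> int:
    """Return the 256-bit SHA-256 digest of data as an integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the output bits that differ between the digests of a and b."""
    return bin(sha256_bits(a) ^ sha256_bits(b)).count("1")

original = b"John Smith"
# Flip the lowest bit of the first byte: exactly one input bit changes.
flipped = bytes([original[0] ^ 0x01]) + original[1:]

changed = differing_bits(original, flipped)
print(f"{changed} of 256 output bits differ")
```

For a hash with good avalanche behavior, the printed count should fall close to 128, which is half of the 256 output bits.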
Although the variations in first and last name are slight, the hashed values are completely different, so the records are recognized as three distinct identities with three hashed values. If the objective of sharing these records was to recognize duplicate customers between two data sets, the resulting counts would be inaccurate. It is important to state that the inability of one-way hashing to resolve identity does not reflect a problem with one-way hashing, whose primary function is data de-identification. Rather, it shows the limitation of hashing alone as a means of achieving true knowledge discovery from an information sharing exercise.
DB2 Anonymous Resolution's breakthrough is its ability to correlate identity data within a hashed data set despite poor data quality and inconsistencies in how identities are expressed. Drawing on the patent-pending context-accumulating techniques of IBM DB2 Relationship Resolution, it applies pre-processing to the data before the one-way hash is computed. As a result, Anonymous Resolution (AR) achieves fuzzy-like matching properties, including the ability to recognize ambiguities, misspellings, or partial records within a data set and to resolve identities across all attributes, producing higher levels of information accuracy. In addition, AR can detect non-obvious relationships between individuals inside the same anonymized data space.
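The general idea of pre-processing before hashing can be illustrated with a deliberately simple sketch. This is not IBM's patent-pending technique, whose details are not given here. The normalize and anonymized_key functions, the normalization rules, and the sample names are all assumptions chosen for illustration.

```python
import hashlib
import re

def normalize(name: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace, and sort name tokens."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", name.lower())
    return " ".join(sorted(cleaned.split()))

def anonymized_key(name: str) -> str:
    """Hash the normalized form so equivalent spellings share one digest."""
    return hashlib.sha256(normalize(name).encode("utf-8")).hexdigest()

# Hypothetical variants of one identity: case, spacing, order, punctuation.
variants = ["John Smith", "  smith,  JOHN ", "John  Smith."]
keys = {anonymized_key(variant) for variant in variants}
print(len(keys))  # 1
```

All three variants now resolve to a single hashed value. A rule set this simple handles only formatting differences. A misspelling such as "Jon Smith" would still produce a different hash, which is the kind of gap the fuzzy-like matching described above is meant to close.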
|