Advanced Global Name Recognition Technology
IBM Information Management software
Dr. John C. Hermansen
IBM Distinguished Engineer
Chief Technology Officer
IBM Global Name Recognition
Contents
Introduction
Elements of the IBM High-Precision Name Matching System
IBM Global Name Recognition - Leading the Industry in Advanced Name Recognition
IBM Global Name Recognition Technologies
IBM Global Name Analytics
IBM Global Name Scoring
IBM Global Name Reference Encyclopedia
Platforms Supported
For Additional Information

Introduction
Despite many remarkable advances made in other areas of business automation, automated processing and matching of personal names in databases has languished for decades without significant theoretical or practical advances. The purpose of this paper is to highlight the issues, requirements, and technologies available for automated advanced name recognition.

The problem to be solved is a familiar one for many people: a name is entered in one database with the surname "Rodgers," and in a different database as "Rogers." A person's name is recorded as "Dayton," but should actually be spelled "Deighton." The problem is greatly compounded with names originating outside North America. For example, the same Chinese person may have one set of information recorded under the surname "Xue," and another under the surname "Hsueh."

The earliest attempt at coping with name variation was the Russell Soundex matching algorithm, developed around 1910 as an aid in the manual analysis of U.S. Census records. The original Soundex method of generating 'keys' was later implemented as a software-based algorithm, and is today the most widely used alternative to exact-matching when names are involved in automated search and retrieval systems. Over the years, there have been many attempts to improve on Soundex, but they are all still key-based systems and, therefore, suffer from the same fundamental deficiencies that plague Soundex.
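To make the key-based idea concrete, the following is a minimal Python sketch of the commonly documented American Soundex rules (first letter kept, consonants mapped to digits, adjacent duplicates collapsed, key padded to four characters). It is an illustration only: the function name `soundex` and the table `CODES` are assumptions made for this example, and the code does not represent IBM's technology or any particular production implementation.

```python
# Illustrative sketch of the widely documented American Soundex rules.
# The names `CODES` and `soundex` are chosen for this example only.
CODES = {}
for digit, group in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for letter in group:
        CODES[letter] = str(digit)


def soundex(name: str) -> str:
    """Return a four-character Soundex key such as 'R163'."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    key = first.upper()
    prev = CODES.get(first, "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = CODES.get(ch, "")  # vowels and y have no code
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    return (key + "000")[:4]  # pad with zeros, truncate to four characters


if __name__ == "__main__":
    for a, b in [("Rodgers", "Rogers"), ("Dayton", "Deighton"),
                 ("Xue", "Hsueh"), ("Robert", "Rupert")]:
        print(a, soundex(a), b, soundex(b), soundex(a) == soundex(b))
```

Under these rules, each of the name pairs mentioned above receives two different keys ("Rodgers" R326 and "Rogers" R262, "Dayton" D350 and "Deighton" D235, "Xue" X000 and "Hsueh" H200), so a key-equality search would miss them, while unrelated names such as "Robert" and "Rupert" share the key R163.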
While it is certainly compact and efficient, the key-based approach falls well short of solving many of the problems associated with searching for names. Two extensive studies examined the results of the basic Soundex algorithm, using statistical measures to gauge accuracy.
• Study #1: Only 33% of the matches that would be returned by Soundex would be correct. Even more significant was the finding that fully 25% of correct matches would fail to be discovered by Soundex. (Alan Stanier, September 1990, Computers in Genealogy, Vol. 3, No. 7)
• Study #2: Only 36.37% of Soundex returns were correct, while more than 60% of correct names were never returned by Soundex. (A.J. Lait and B. Randell, 1996)
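The two figures reported in each study correspond to what information retrieval calls precision (the share of returned matches that are correct) and recall (the share of correct matches that are actually returned). The short Python sketch below shows how such figures are computed. The function name `precision_recall` and the record identifiers are hypothetical, and the toy data is chosen only to reproduce the Study #1 percentages; it is not data from either study.

```python
from typing import Set, Tuple


def precision_recall(returned: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    """Compute (precision, recall) for one search.

    returned: record ids the matcher returned
    relevant: record ids that are truly the same name
    """
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data: 9 records returned, 3 of them correct,
# and 1 correct record (id 10) never returned.
returned_ids = set(range(1, 10))
relevant_ids = {1, 2, 3, 10}
precision, recall = precision_recall(returned_ids, relevant_ids)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case, precision is 3/9 (about 33%) and recall is 3/4 (75%), meaning 25% of the correct matches are missed, which mirrors the Study #1 figures.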
Obviously, for mission-critical federal applications such as terrorist watch-lists, INS tracking, visa applications, and fraud detection, failing to identify 25-60% of target names within a database is unacceptable. The Federal Government recognized this deficiency, and worked with IBM Global Name Recognition over the past two decades to develop advanced technology for improving performance across multiple cultures. This approach hinges on the latest advances in computational linguistics: the application of statistics, mathematics, linguistics research, and computational expertise to the problem of name matching. It is now also available to commercial organizations.
IBM Global Name Recognition technology is the ONLY patented name-searching software since Soundex!
Elements of the IBM High-Precision Name Matching System
In order to meet the challenges posed by large, multi-cultural databases in which both predictable and random name-spelling variations are present in a significant number of records, an IBM Global Name Recognition solution provides:
1. Culture-specific matching criteria. Naming systems differ significantly from one culture to the next: in the relative order in which parts of a name appear, in the consistency with which they are written in romanized form, in the way they are abbreviated, and in which parts are considered mandatory for identification. To identify all potential matches accurately, IBM technologies m...