Record matching, the task of identifying entries that refer to the same entity in two or more files, is a vital process in data integration. Most record matching methods are supervised and require the user to provide training data. Such methods are not applicable to the Web database scenario, where query results are generated dynamically, on the fly. To address record matching in this scenario, we present an unsupervised, online record matching method, UDD, which effectively identifies duplicates among the query result records of multiple Web databases. First, same-source duplicates are eliminated using an exact matching method; the "presumed" non-duplicate records from the same source can then be used as training examples. Starting from this non-duplicate set, we use two cooperating classifiers, a weighted component similarity summing classifier and an SVM classifier, to iteratively identify duplicates in the query results from multiple Web databases.
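The first two ingredients described above, exact-match elimination within one source and a weighted sum of per-field similarities across sources, can be illustrated with a minimal Python sketch. It is not the UDD implementation. The function names, the tuple record layout, the field weights, the threshold of 0.85, and the use of `difflib.SequenceMatcher` as the field similarity measure are all assumptions made for illustration. The SVM classifier and the iterative refinement loop are omitted.

```python
from difflib import SequenceMatcher


def dedupe_exact(records):
    """Drop exact duplicates within one source, keeping first occurrences."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def field_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def weighted_similarity(r1, r2, weights):
    """Sum the per-field similarities, each scaled by its field weight."""
    return sum(w * field_similarity(a, b) for w, a, b in zip(weights, r1, r2))


def find_duplicates(source_a, source_b, weights, threshold=0.85):
    """Return cross-source record pairs whose weighted similarity
    reaches the (hypothetical) threshold."""
    a = dedupe_exact(source_a)
    b = dedupe_exact(source_b)
    return [
        (r1, r2)
        for r1 in a
        for r2 in b
        if weighted_similarity(r1, r2, weights) >= threshold
    ]
```

In this sketch the weights are fixed by hand and are expected to sum to 1, so that identical records score 1. The method described above instead starts from the presumed non-duplicate set and lets the two classifiers refine the result iteratively.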
ALEKHYA, V. and NAIK, DS.BHUPAL
"RECORD MATCHING FOR WEB DATABASES BY DOMAIN-SPECIFIC QUERY PROBING,"
International Journal of Computer and Communication Technology: Vol. 7: Iss. 4, Article 5.
Available at: https://www.interscience.in/ijcct/vol7/iss4/5