Cite Article

Record Duplication Detection in Database: A Review


BibTeX

@article{IJASEIT1368,
   author = {Saleh Rehiel Alenazi and Kamsuriah Ahmad},
   title = {Record Duplication Detection in Database: A Review},
   journal = {International Journal on Advanced Science, Engineering and Information Technology},
   volume = {6},
   number = {6},
   year = {2016},
   pages = {838--845},
   keywords = {Duplication Detection Algorithm; Windowing; Blocking; Scalability; Efficiency; Data Quality},
   abstract = {

The recognition of similar entities in databases has gained substantial attention in many application areas. Despite several techniques proposed to recognize and locate duplication of database records, there is a dearth of studies available which rate the effectiveness of the diverse techniques used for duplicate record detection. Calculating time complexity of the proposed methods reveals their performance rating. The time complexity calculation showed that the efficiency of these methods improved when blocking and windowing is applied. Some domain-specific methods train systems to optimize results and improve efficiency and scalability, but they are prone to errors. Most of the existing methods fail to either discuss, or lack thoroughness in consideration of scalability. The process of sorting and searching form an essential part of duplication detection, but they are time-consuming. Therefore this paper proposes the possibility of eliminating the sorting process by utilization of tree structure to improve the record duplication detection. This has added benefits of reducing time required, and offers a probable increase in scalability. For database system, scalability is an inherent feature for any proposed solution, due to the fact that the data size is huge. Improving the efficiency in identifying duplicate records in databases is an essential step for data cleaning and data integration methods. This paper reveals that the current proposed methods lack in providing solutions that are scalable, high accurate, and reduce the processing time during detecting duplication of records in database. The ability to provide solutions to this problem will improve the quality of data that are used for decision making process.

},
   issn = {2088-5334},
   publisher = {INSIGHT - Indonesian Society for Knowledge and Human Development},
   url = {http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=1368},
   doi = {10.18517/ijaseit.6.6.1368}
}

EndNote

%A Alenazi, Saleh Rehiel
%A Ahmad, Kamsuriah
%D 2016
%T Record Duplication Detection in Database: A Review
%! Record Duplication Detection in Database: A Review
%K Duplication Detection Algorithm; Windowing; Blocking; Scalability; Efficiency; Data Quality
%X 

The recognition of similar entities in databases has gained substantial attention in many application areas. Despite several techniques proposed to recognize and locate duplication of database records, there is a dearth of studies available which rate the effectiveness of the diverse techniques used for duplicate record detection. Calculating time complexity of the proposed methods reveals their performance rating. The time complexity calculation showed that the efficiency of these methods improved when blocking and windowing is applied. Some domain-specific methods train systems to optimize results and improve efficiency and scalability, but they are prone to errors. Most of the existing methods fail to either discuss, or lack thoroughness in consideration of scalability. The process of sorting and searching form an essential part of duplication detection, but they are time-consuming. Therefore this paper proposes the possibility of eliminating the sorting process by utilization of tree structure to improve the record duplication detection. This has added benefits of reducing time required, and offers a probable increase in scalability. For database system, scalability is an inherent feature for any proposed solution, due to the fact that the data size is huge. Improving the efficiency in identifying duplicate records in databases is an essential step for data cleaning and data integration methods. This paper reveals that the current proposed methods lack in providing solutions that are scalable, high accurate, and reduce the processing time during detecting duplication of records in database. The ability to provide solutions to this problem will improve the quality of data that are used for decision making process.

%U http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=1368
%R doi:10.18517/ijaseit.6.6.1368
%J International Journal on Advanced Science, Engineering and Information Technology
%V 6
%N 6
%@ 2088-5334

IEEE

Saleh Rehiel Alenazi and Kamsuriah Ahmad, "Record Duplication Detection in Database: A Review," International Journal on Advanced Science, Engineering and Information Technology, vol. 6, no. 6, pp. 838-845, 2016. [Online]. Available: http://dx.doi.org/10.18517/ijaseit.6.6.1368.

RefMan/ProCite (RIS)

TY  - JOUR
AU  - Alenazi, Saleh Rehiel
AU  - Ahmad, Kamsuriah
PY  - 2016
TI  - Record Duplication Detection in Database: A Review
JF  - International Journal on Advanced Science, Engineering and Information Technology
VL  - 6
IS  - 6
Y2  - 2016
SP  - 838
EP  - 845
SN  - 2088-5334
PB  - INSIGHT - Indonesian Society for Knowledge and Human Development
KW  - Duplication Detection Algorithm; Windowing; Blocking; Scalability; Efficiency; Data Quality
N2  - 

The recognition of similar entities in databases has gained substantial attention in many application areas. Despite several techniques proposed to recognize and locate duplication of database records, there is a dearth of studies available which rate the effectiveness of the diverse techniques used for duplicate record detection. Calculating time complexity of the proposed methods reveals their performance rating. The time complexity calculation showed that the efficiency of these methods improved when blocking and windowing is applied. Some domain-specific methods train systems to optimize results and improve efficiency and scalability, but they are prone to errors. Most of the existing methods fail to either discuss, or lack thoroughness in consideration of scalability. The process of sorting and searching form an essential part of duplication detection, but they are time-consuming. Therefore this paper proposes the possibility of eliminating the sorting process by utilization of tree structure to improve the record duplication detection. This has added benefits of reducing time required, and offers a probable increase in scalability. For database system, scalability is an inherent feature for any proposed solution, due to the fact that the data size is huge. Improving the efficiency in identifying duplicate records in databases is an essential step for data cleaning and data integration methods. This paper reveals that the current proposed methods lack in providing solutions that are scalable, high accurate, and reduce the processing time during detecting duplication of records in database. The ability to provide solutions to this problem will improve the quality of data that are used for decision making process.

UR  - http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=1368
DO  - 10.18517/ijaseit.6.6.1368
ER  - 

RefWorks

RT Journal Article
ID 1368
A1 Alenazi, Saleh Rehiel
A1 Ahmad, Kamsuriah
T1 Record Duplication Detection in Database: A Review
JF International Journal on Advanced Science, Engineering and Information Technology
VO 6
IS 6
YR 2016
SP 838
OP 845
SN 2088-5334
PB INSIGHT - Indonesian Society for Knowledge and Human Development
K1 Duplication Detection Algorithm; Windowing; Blocking; Scalability; Efficiency; Data Quality
AB 

The recognition of similar entities in databases has gained substantial attention in many application areas. Despite several techniques proposed to recognize and locate duplication of database records, there is a dearth of studies available which rate the effectiveness of the diverse techniques used for duplicate record detection. Calculating time complexity of the proposed methods reveals their performance rating. The time complexity calculation showed that the efficiency of these methods improved when blocking and windowing is applied. Some domain-specific methods train systems to optimize results and improve efficiency and scalability, but they are prone to errors. Most of the existing methods fail to either discuss, or lack thoroughness in consideration of scalability. The process of sorting and searching form an essential part of duplication detection, but they are time-consuming. Therefore this paper proposes the possibility of eliminating the sorting process by utilization of tree structure to improve the record duplication detection. This has added benefits of reducing time required, and offers a probable increase in scalability. For database system, scalability is an inherent feature for any proposed solution, due to the fact that the data size is huge. Improving the efficiency in identifying duplicate records in databases is an essential step for data cleaning and data integration methods. This paper reveals that the current proposed methods lack in providing solutions that are scalable, high accurate, and reduce the processing time during detecting duplication of records in database. The ability to provide solutions to this problem will improve the quality of data that are used for decision making process.

LK http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=1368
DO 10.18517/ijaseit.6.6.1368