Cite Article
Generation of a Synthetic Dataset for the Study of Fraud through Deep Learning Techniques
BibTeX
@article{IJASEIT14345,
  author    = {Marco Sánchez and Verónica Olmedo and Carlos Narvaez and Myriam Hernández and Luis Urquiza-Aguiar},
  title     = {Generation of a Synthetic Dataset for the Study of Fraud through Deep Learning Techniques},
  journal   = {International Journal on Advanced Science, Engineering and Information Technology},
  volume    = {11},
  number    = {6},
  year      = {2021},
  pages     = {2534--2542},
  keywords  = {Fraud triangle theory; machine learning; deep learning; LSTM; RNN},
  abstract  = {Fraud is defined as any purposeful or deliberate act including cunning, deception, or other unfair means to deprive someone of property or money. Nowadays, fraud-related activities are growing at a dizzying rate, causing substantial economic losses every year. For an adequate analysis of this phenomenon, it is necessary to have data that evidences this behavior. Even so, given that these data are scarce and difficult to find, generating synthetic data for their study is a viable option. We designed two algorithms to generate text to create a synthetic data set that allows fraud analysis. These algorithms rely on the Fraud Triangle Theory proposed by Donald R. Cressey and use Recurrent Neural Network (RNN) and Long Short-Term Memory Networks (LSTM), respectively. The datasets generated were analyzed from the semantic point of view, giving a score about their readability and grammar consistency. The results obtained from this evaluation indicate that the data generation architecture proposed using the LSTM algorithm provides better performance in sentence readability (efficiency greater than 70\%) than RNN (less than 40\%). With LSTM, it was possible to synthesize a comprehensive data set related to the fraud triangle's vertices. This will make it easier to investigate fraudulent actions that are linked to human behavior. We will present a fraud predictor system based on machine learning techniques in the future.},
  issn      = {2088-5334},
  publisher = {INSIGHT - Indonesian Society for Knowledge and Human Development},
  url       = {http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=14345},
  doi       = {10.18517/ijaseit.11.6.14345}
}
EndNote
%0 Journal Article
%A Sánchez, Marco
%A Olmedo, Verónica
%A Narvaez, Carlos
%A Hernández, Myriam
%A Urquiza-Aguiar, Luis
%D 2021
%T Generation of a Synthetic Dataset for the Study of Fraud through Deep Learning Techniques
%! Generation of a Synthetic Dataset for the Study of Fraud through Deep Learning Techniques
%J International Journal on Advanced Science, Engineering and Information Technology
%V 11
%N 6
%P 2534-2542
%@ 2088-5334
%K Fraud triangle theory; machine learning; deep learning; LSTM; RNN
%X Fraud is defined as any purposeful or deliberate act including cunning, deception, or other unfair means to deprive someone of property or money. Nowadays, fraud-related activities are growing at a dizzying rate, causing substantial economic losses every year. For an adequate analysis of this phenomenon, it is necessary to have data that evidences this behavior. Even so, given that these data are scarce and difficult to find, generating synthetic data for their study is a viable option. We designed two algorithms to generate text to create a synthetic data set that allows fraud analysis. These algorithms rely on the Fraud Triangle Theory proposed by Donald R. Cressey and use Recurrent Neural Network (RNN) and Long Short-Term Memory Networks (LSTM), respectively. The datasets generated were analyzed from the semantic point of view, giving a score about their readability and grammar consistency. The results obtained from this evaluation indicate that the data generation architecture proposed using the LSTM algorithm provides better performance in sentence readability (efficiency greater than 70%) than RNN (less than 40%). With LSTM, it was possible to synthesize a comprehensive data set related to the fraud triangle's vertices. This will make it easier to investigate fraudulent actions that are linked to human behavior. We will present a fraud predictor system based on machine learning techniques in the future.
%U http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=14345
%R 10.18517/ijaseit.11.6.14345
IEEE
M. Sánchez, V. Olmedo, C. Narvaez, M. Hernández, and L. Urquiza-Aguiar, "Generation of a Synthetic Dataset for the Study of Fraud through Deep Learning Techniques," International Journal on Advanced Science, Engineering and Information Technology, vol. 11, no. 6, pp. 2534–2542, 2021. [Online]. Available: http://dx.doi.org/10.18517/ijaseit.11.6.14345.
RefMan/ProCite (RIS)
TY  - JOUR
AU  - Sánchez, Marco
AU  - Olmedo, Verónica
AU  - Narvaez, Carlos
AU  - Hernández, Myriam
AU  - Urquiza-Aguiar, Luis
PY  - 2021
TI  - Generation of a Synthetic Dataset for the Study of Fraud through Deep Learning Techniques
JF  - International Journal on Advanced Science, Engineering and Information Technology
VL  - 11
IS  - 6
SP  - 2534
EP  - 2542
SN  - 2088-5334
PB  - INSIGHT - Indonesian Society for Knowledge and Human Development
KW  - Fraud triangle theory
KW  - machine learning
KW  - deep learning
KW  - LSTM
KW  - RNN
N2  - Fraud is defined as any purposeful or deliberate act including cunning, deception, or other unfair means to deprive someone of property or money. Nowadays, fraud-related activities are growing at a dizzying rate, causing substantial economic losses every year. For an adequate analysis of this phenomenon, it is necessary to have data that evidences this behavior. Even so, given that these data are scarce and difficult to find, generating synthetic data for their study is a viable option. We designed two algorithms to generate text to create a synthetic data set that allows fraud analysis. These algorithms rely on the Fraud Triangle Theory proposed by Donald R. Cressey and use Recurrent Neural Network (RNN) and Long Short-Term Memory Networks (LSTM), respectively. The datasets generated were analyzed from the semantic point of view, giving a score about their readability and grammar consistency. The results obtained from this evaluation indicate that the data generation architecture proposed using the LSTM algorithm provides better performance in sentence readability (efficiency greater than 70%) than RNN (less than 40%). With LSTM, it was possible to synthesize a comprehensive data set related to the fraud triangle's vertices. This will make it easier to investigate fraudulent actions that are linked to human behavior. We will present a fraud predictor system based on machine learning techniques in the future.
UR  - http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=14345
DO  - 10.18517/ijaseit.11.6.14345
ER  - 
RefWorks
RT Journal Article
ID 14345
A1 Sánchez, Marco
A1 Olmedo, Verónica
A1 Narvaez, Carlos
A1 Hernández, Myriam
A1 Urquiza-Aguiar, Luis
T1 Generation of a Synthetic Dataset for the Study of Fraud through Deep Learning Techniques
JF International Journal on Advanced Science, Engineering and Information Technology
VO 11
IS 6
YR 2021
SP 2534
OP 2542
SN 2088-5334
PB INSIGHT - Indonesian Society for Knowledge and Human Development
K1 Fraud triangle theory; machine learning; deep learning; LSTM; RNN
AB Fraud is defined as any purposeful or deliberate act including cunning, deception, or other unfair means to deprive someone of property or money. Nowadays, fraud-related activities are growing at a dizzying rate, causing substantial economic losses every year. For an adequate analysis of this phenomenon, it is necessary to have data that evidences this behavior. Even so, given that these data are scarce and difficult to find, generating synthetic data for their study is a viable option. We designed two algorithms to generate text to create a synthetic data set that allows fraud analysis. These algorithms rely on the Fraud Triangle Theory proposed by Donald R. Cressey and use Recurrent Neural Network (RNN) and Long Short-Term Memory Networks (LSTM), respectively. The datasets generated were analyzed from the semantic point of view, giving a score about their readability and grammar consistency. The results obtained from this evaluation indicate that the data generation architecture proposed using the LSTM algorithm provides better performance in sentence readability (efficiency greater than 70%) than RNN (less than 40%). With LSTM, it was possible to synthesize a comprehensive data set related to the fraud triangle's vertices. This will make it easier to investigate fraudulent actions that are linked to human behavior. We will present a fraud predictor system based on machine learning techniques in the future.
LK http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=14345
DO 10.18517/ijaseit.11.6.14345