Generation of a Synthetic Dataset for the Study of Fraud through Deep Learning Techniques

BibTeX

@article{IJASEIT14345,
   author = {Marco Sánchez and Verónica Olmedo and Carlos Narvaez and Myriam Hernández and Luis Urquiza-Aguiar},
   title = {Generation of a Synthetic Dataset for the Study of Fraud through Deep Learning Techniques},
   journal = {International Journal on Advanced Science, Engineering and Information Technology},
   volume = {11},
   number = {6},
   year = {2021},
   pages = {2534--2542},
   keywords = {Fraud triangle theory; machine learning; deep learning; LSTM; RNN.},
   abstract = {Fraud is defined as any purposeful or deliberate act including cunning, deception, or other unfair means to deprive someone of property or money. Nowadays, fraud-related activities are growing at a dizzying rate, causing substantial economic losses every year. For an adequate analysis of this phenomenon, it is necessary to have data that evidences this behavior. Even so, given that these data are scarce and difficult to find, generating synthetic data for their study is a viable option. We designed two algorithms to generate text to create a synthetic data set that allows fraud analysis. These algorithms rely on the Fraud Triangle Theory proposed by Donald R. Cressey and use Recurrent Neural Network (RNN) and Long Short-Term Memory Networks (LSTM), respectively. The datasets generated were analyzed from the semantic point of view, giving a score about their readability and grammar consistency. The results obtained from this evaluation indicate that the data generation architecture proposed using the LSTM algorithm provides better performance in sentence readability (efficiency greater than 70\%) than RNN (less than 40\%). With LSTM, it was possible to synthesize a comprehensive data set related to the fraud triangle's vertices. This will make it easier to investigate fraudulent actions that are linked to human behavior. We will present a fraud predictor system based on machine learning techniques in the future.},
   issn = {2088-5334},
   publisher = {INSIGHT - Indonesian Society for Knowledge and Human Development},
   url = {http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=14345},
   doi = {10.18517/ijaseit.11.6.14345}
}

EndNote

%A Sánchez, Marco
%A Olmedo, Verónica
%A Narvaez, Carlos
%A Hernández, Myriam
%A Urquiza-Aguiar, Luis
%D 2021
%T Generation of a Synthetic Dataset for the Study of Fraud through Deep Learning Techniques
%! Generation of a Synthetic Dataset for the Study of Fraud through Deep Learning Techniques
%K Fraud triangle theory; machine learning; deep learning; LSTM; RNN.
%X Fraud is defined as any purposeful or deliberate act including cunning, deception, or other unfair means to deprive someone of property or money. Nowadays, fraud-related activities are growing at a dizzying rate, causing substantial economic losses every year. For an adequate analysis of this phenomenon, it is necessary to have data that evidences this behavior. Even so, given that these data are scarce and difficult to find, generating synthetic data for their study is a viable option. We designed two algorithms to generate text to create a synthetic data set that allows fraud analysis. These algorithms rely on the Fraud Triangle Theory proposed by Donald R. Cressey and use Recurrent Neural Network (RNN) and Long Short-Term Memory Networks (LSTM), respectively. The datasets generated were analyzed from the semantic point of view, giving a score about their readability and grammar consistency. The results obtained from this evaluation indicate that the data generation architecture proposed using the LSTM algorithm provides better performance in sentence readability (efficiency greater than 70%) than RNN (less than 40%). With LSTM, it was possible to synthesize a comprehensive data set related to the fraud triangle's vertices. This will make it easier to investigate fraudulent actions that are linked to human behavior. We will present a fraud predictor system based on machine learning techniques in the future.
%U http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=14345
%R doi:10.18517/ijaseit.11.6.14345
%J International Journal on Advanced Science, Engineering and Information Technology
%V 11
%N 6
%P 2534-2542
%@ 2088-5334

IEEE

Marco Sánchez, Verónica Olmedo, Carlos Narvaez, Myriam Hernández, and Luis Urquiza-Aguiar, "Generation of a Synthetic Dataset for the Study of Fraud through Deep Learning Techniques," International Journal on Advanced Science, Engineering and Information Technology, vol. 11, no. 6, pp. 2534-2542, 2021. [Online]. Available: http://dx.doi.org/10.18517/ijaseit.11.6.14345.

RefMan/ProCite (RIS)

TY  - JOUR
AU  - Sánchez, Marco
AU  - Olmedo, Verónica
AU  - Narvaez, Carlos
AU  - Hernández, Myriam
AU  - Urquiza-Aguiar, Luis
PY  - 2021
TI  - Generation of a Synthetic Dataset for the Study of Fraud through Deep Learning Techniques
JF  - International Journal on Advanced Science, Engineering and Information Technology
VL  - 11
IS  - 6
Y2  - 2021
SP  - 2534
EP  - 2542
SN  - 2088-5334
PB  - INSIGHT - Indonesian Society for Knowledge and Human Development
KW  - Fraud triangle theory; machine learning; deep learning; LSTM; RNN.
N2  - Fraud is defined as any purposeful or deliberate act including cunning, deception, or other unfair means to deprive someone of property or money. Nowadays, fraud-related activities are growing at a dizzying rate, causing substantial economic losses every year. For an adequate analysis of this phenomenon, it is necessary to have data that evidences this behavior. Even so, given that these data are scarce and difficult to find, generating synthetic data for their study is a viable option. We designed two algorithms to generate text to create a synthetic data set that allows fraud analysis. These algorithms rely on the Fraud Triangle Theory proposed by Donald R. Cressey and use Recurrent Neural Network (RNN) and Long Short-Term Memory Networks (LSTM), respectively. The datasets generated were analyzed from the semantic point of view, giving a score about their readability and grammar consistency. The results obtained from this evaluation indicate that the data generation architecture proposed using the LSTM algorithm provides better performance in sentence readability (efficiency greater than 70%) than RNN (less than 40%). With LSTM, it was possible to synthesize a comprehensive data set related to the fraud triangle's vertices. This will make it easier to investigate fraudulent actions that are linked to human behavior. We will present a fraud predictor system based on machine learning techniques in the future.
UR  - http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=14345
DO  - 10.18517/ijaseit.11.6.14345
ER  - 

RefWorks

RT Journal Article
ID 14345
A1 Sánchez, Marco
A1 Olmedo, Verónica
A1 Narvaez, Carlos
A1 Hernández, Myriam
A1 Urquiza-Aguiar, Luis
T1 Generation of a Synthetic Dataset for the Study of Fraud through Deep Learning Techniques
JF International Journal on Advanced Science, Engineering and Information Technology
VO 11
IS 6
YR 2021
SP 2534
OP 2542
SN 2088-5334
PB INSIGHT - Indonesian Society for Knowledge and Human Development
K1 Fraud triangle theory; machine learning; deep learning; LSTM; RNN.
AB Fraud is defined as any purposeful or deliberate act including cunning, deception, or other unfair means to deprive someone of property or money. Nowadays, fraud-related activities are growing at a dizzying rate, causing substantial economic losses every year. For an adequate analysis of this phenomenon, it is necessary to have data that evidences this behavior. Even so, given that these data are scarce and difficult to find, generating synthetic data for their study is a viable option. We designed two algorithms to generate text to create a synthetic data set that allows fraud analysis. These algorithms rely on the Fraud Triangle Theory proposed by Donald R. Cressey and use Recurrent Neural Network (RNN) and Long Short-Term Memory Networks (LSTM), respectively. The datasets generated were analyzed from the semantic point of view, giving a score about their readability and grammar consistency. The results obtained from this evaluation indicate that the data generation architecture proposed using the LSTM algorithm provides better performance in sentence readability (efficiency greater than 70%) than RNN (less than 40%). With LSTM, it was possible to synthesize a comprehensive data set related to the fraud triangle's vertices. This will make it easier to investigate fraudulent actions that are linked to human behavior. We will present a fraud predictor system based on machine learning techniques in the future.
LK http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=14345
DO 10.18517/ijaseit.11.6.14345