Software Traceability in Agile Development Using Topic Modeling

Nuraisa Novia Hidayati, Siti Rochimah, Agus Budi Raharjo
Department of Informatics Engineering, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
How to cite (IJASEIT):
Hidayati, Nuraisa Novia, et al. “Software Traceability in Agile Development Using Topic Modeling”. International Journal on Advanced Science, Engineering and Information Technology, vol. 12, no. 4, Aug. 2022, pp. 1410-2, doi:10.18517/ijaseit.12.4.15195.
Tracing how requirements are implemented helps build better software. It shows whether the application fulfils users' needs, how far development has progressed, where problems arise in the testing process, and how far the requirements are reflected in the source code. In this paper, we studied the agile method Extreme Programming (XP). The artifacts considered vital in this agile approach are the requirement documents, the test documents, and the source code. We used topic modelling to map content similarities between these documents and build trace links from them. We compared three topic modelling methods: Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and Non-negative Matrix Factorization (NMF). NMF proved the most stable, with an accuracy of 67% for requirements, 59% for tests, and 48% for defect lists. In the second application, NMF was more accurate, at 70%, 79%, and 54% respectively. Although LSA outperformed NMF in the second application (LSA reached 79%, 84%, and 56%), the precision and recall of the two methods were very close. We also located the corresponding links in the source code using keywords extracted from each topic. This research further suggests how requirements can be written in enough detail to simplify tracing, for example by using terms consistently, including technical details, and mentioning all the variables involved. In future work, recognising sentence structure and synonyms during pre-processing should help build better trace links.
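To make the topic-modelling step more concrete, below is a minimal, self-contained Python sketch of NMF-based trace linking. It is not the authors' implementation. The helper names (`tokenize`, `tfidf`, `nmf`, `trace_links`, `top_keywords`, `precision_recall`), the smoothed TF-IDF weighting, the Lee-Seung multiplicative-update solver, the rule that links two artifacts when they share the same dominant topic, and the four toy documents are all assumptions made for illustration. Accuracy is not computed because the abstract does not define it.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens only. Stop-word removal, stemming and the synonym
    # handling proposed as future work are deliberately omitted.
    return re.findall(r"[a-z]+", text.lower())


def tfidf(docs):
    """Return (vocab, V) where V[i][j] is the TF-IDF weight of vocab[j] in docs[i].
    Assumption: smoothed IDF, ln((1 + n) / (1 + df)) + 1."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokens for t in toks})
    df = Counter(t for toks in tokens for t in set(toks))
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    return vocab, [[Counter(toks)[t] * idf[t] for t in vocab] for toks in tokens]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


def nmf(V, k, iters=300, seed=0, eps=1e-9):
    """Factor V (docs x terms) into W (docs x topics) and H (topics x terms)
    using Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H


def reconstruction_error(V, W, H):
    WH = matmul(W, H)
    return math.sqrt(sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)))


def dominant_topic(row):
    return max(range(len(row)), key=row.__getitem__)


def top_keywords(H, vocab, n=3):
    # Highest-weighted terms per topic: candidate keywords to search for in source code.
    return [[vocab[j] for j in sorted(range(len(h)), key=lambda j: -h[j])[:n]] for h in H]


def trace_links(W, sources, targets):
    # Assumption: link two artifacts when their dominant NMF topic is the same.
    return {(s, t) for s in sources for t in targets
            if dominant_topic(W[s]) == dominant_topic(W[t])}


def precision_recall(predicted, truth):
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    docs = [
        "user login with username and password",  # 0: requirement (hypothetical)
        "export monthly sales report to pdf",     # 1: requirement (hypothetical)
        "test login rejects wrong password",      # 2: test case (hypothetical)
        "test report export creates pdf file",    # 3: test case (hypothetical)
    ]
    vocab, V = tfidf(docs)
    W, H = nmf(V, k=2)
    links = trace_links(W, sources=[0, 1], targets=[2, 3])
    print(links, top_keywords(H, vocab))
    print(precision_recall(links, {(0, 2), (1, 3)}))
```

Linking by shared dominant topic is the simplest possible rule. A threshold on the cosine similarity of topic vectors (rows of `W`) is a common alternative. Replacing `nmf` with a truncated SVD would give an LSA-style variant. In practice, scikit-learn's `TfidfVectorizer`, `NMF`, `TruncatedSVD` and `LatentDirichletAllocation` would replace the hand-written matrix code.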

Authors who publish with this journal agree to the following terms:

    1. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
    2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
    3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).