AgnosticChaos: A Tool to Assess Software Applications' Reliability

Odai Hussein Ahmed Al-sayaghi (1), Noraini Che Pa (2), Hafeez Osman (3), Ainita Ban (4)
(1) Department of Software Engineering and Information System, Faculty of Computer Science and Information Technology, University Putra Malaysia, Serdang, Selangor, Malaysia
(2) Department of Software Engineering and Information System, Faculty of Computer Science and Information Technology, University Putra Malaysia, Serdang, Selangor, Malaysia
(3) Department of Software Engineering and Information System, Faculty of Computer Science and Information Technology, University Putra Malaysia, Serdang, Selangor, Malaysia
(4) Department of Software Engineering and Information System, Faculty of Computer Science and Information Technology, University Putra Malaysia, Serdang, Selangor, Malaysia
How to cite (IJASEIT):
O. H. A. Al-sayaghi, N. Che Pa, H. Osman, and A. Ban, “AgnosticChaos: A Tool to Assess Software Applications’ Reliability”, Int. J. Adv. Sci. Eng. Inf. Technol., vol. 15, no. 3, pp. 806–813, Jun. 2025.
Data-intensive software applications that process millions of events per second from IoT sensors need to maintain high availability to deliver continuous data to consumers. Downtime can have significant impacts on service quality, so many cloud vendors, including Microsoft Azure, Amazon Web Services (AWS), and Google Cloud Platform (GCP), provide geo-redundant infrastructure to enhance service reliability. However, despite these provisions, it remains crucial to assess the resilience and reliability of software applications under varied outage and fault scenarios to ensure that they can handle unexpected disruptions. Chaos engineering offers a systematic approach to enhancing this reliability by deliberately introducing controlled failures into systems. This practice enables developers to gain insights into an application's response under stress, ultimately fostering a better understanding of its robustness and identifying areas for improvement. This research introduces AgnosticChaos, a novel tool designed to integrate with Azure's continuous delivery pipelines, enabling the seamless use of multiple cloud vendors and third-party chaos engineering tools. AgnosticChaos provides a streamlined environment for testing applications' resilience and reliability prior to deployment in a production environment. To evaluate its effectiveness, AgnosticChaos was tested on three open-source microservices: an event producer, event receiver, and event retainer. Our findings reveal that AgnosticChaos is not only more efficient and developer-friendly but also offers comparable effectiveness to direct use of third-party chaos engineering tools. This study highlights the value of AgnosticChaos as a vital component in pre-production workflows, offering a comprehensive and adaptable solution for resilience testing across diverse cloud environments.

This work is licensed under a Creative Commons Attribution 4.0 International License.
