Design and Implementation of a Data Preprocessing Automatic Assessment Module in Jupyter Notebook

HakNeung Go (1), Hyunwoo Moon (2), Youngjun Lee (3), Seong-Won Kim (4)
(1) Songjeong Jungang Elementary School, 90 Eodeung-daero 653beon-gil, Gwangsan-Gu, Gwangju, Republic of Korea
(2) Department of Computer Education, Korea National University of Education, 250 Taeseongtabyeon-ro, Cheongju, Republic of Korea
(3) Department of Computer Education, Korea National University of Education, 250 Taeseongtabyeon-ro, Cheongju, Republic of Korea
(4) Department of Computer Education, Busan National University of Education, 24 Gyodae-ro, Yeonje-gu, Busan, Republic of Korea
How to cite (IJASEIT):
Go, HakNeung, et al. “Design and Implementation of a Data Preprocessing Automatic Assessment Module in Jupyter Notebook”. International Journal on Advanced Science, Engineering and Information Technology, vol. 14, no. 6, Dec. 2024, pp. 1922-7, doi:10.18517/ijaseit.14.6.20233.
In data analysis, the preprocessing step is crucial because it directly affects the accuracy and reliability of results. Preprocessing data with a programming language offers advantages in capacity, speed, and reproducibility, but the difficulty of learning a programming language and the scarcity of supporting educational tools pose significant challenges. This study introduces the Data Preprocessing Automatic Assessment (DPAA) module, designed to support learning data preprocessing through programming. The DPAA module, developed with Python and Pandas, uses a model answer-based assessment method. It provides a self-assessment mechanism that displays the output of the student's code alongside the model answer and highlights the discrepancies, and an automatic evaluation method that converts both results into arrays and compares them. When the student's answer is incorrect, the module provides feedback. To validate the DPAA module, an example was constructed based on a tutorial from the official Pandas website. Informatics teachers at a high school for gifted students reviewed the module and the example and confirmed its effectiveness. The DPAA module is expected to support the learning of data preprocessing syntax in Pandas and thereby the broader use of Python in data analysis, making the learning process more interactive and fostering a deeper understanding of data preprocessing techniques.
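
The model answer-based evaluation described in the abstract can be illustrated with a minimal sketch. This is not the DPAA module's actual code: the function name `assess`, the feedback messages, and the sample data are all hypothetical, and the sketch only assumes that both the student's result and the model answer are Pandas DataFrames that are converted to arrays and compared element by element.

```python
from typing import Tuple

import numpy as np
import pandas as pd


def assess(student: pd.DataFrame, model: pd.DataFrame) -> Tuple[bool, str]:
    """Compare a student's result with a model answer (hypothetical helper)."""
    # Structural checks first, so the feedback can name the kind of mistake.
    if student.shape != model.shape:
        return False, f"Shape mismatch: expected {model.shape}, got {student.shape}."
    if list(student.columns) != list(model.columns):
        return False, "Column names or column order differ from the model answer."

    # Convert both results to object arrays so mixed dtypes compare element-wise.
    s = student.to_numpy(dtype=object)
    m = model.to_numpy(dtype=object)

    # Cells match if they are equal, or if both are missing (NaN != NaN otherwise).
    same = np.asarray(s == m, dtype=bool) | (pd.isna(s) & pd.isna(m))
    if same.all():
        return True, "Correct."

    # Report the positions of the differing cells as feedback.
    rows, cols = np.where(~same)
    cells = [(int(r), model.columns[c]) for r, c in zip(rows, cols)]
    return False, f"Values differ at (row, column): {cells}"


# Made-up data for illustration only.
model_answer = pd.DataFrame({"name": ["Ann", "Bo"], "age": [31.0, np.nan]})
student_answer = model_answer.copy()
student_answer.loc[0, "age"] = 13.0
print(assess(student_answer, model_answer))
```

Checking shape and column names before comparing values is a design choice of this sketch: it lets the feedback distinguish a structural mistake (wrong rows or columns selected) from a wrong value in an otherwise correct table.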

This work is licensed under a Creative Commons Attribution 4.0 International License.
