Development of a Python Library to Generate Synthetic Datasets for Artificial Intelligence Education
References
J. McCarthy, “What is artificial intelligence,” 2007, Accessed: Feb. 13, 2024. [Online]. Available: http://cse.unl.edu/~choueiry/S09-476-876/Documents/whatisai.pdf
L. Li, “A comparative study on artificial intelligence curricula,” PhD Thesis, Western Ontario Univ., Canada, 2020. Accessed: Feb. 13, 2024. [Online]. Available: https://search.proquest.com/openview/b13c4d3058c533a76cba308137d8faa6/1?pq-origsite=gscholar&cbl=18750&diss=y
S. Druga, S. T. Vu, E. Likhith, and T. Qiu, “Inclusive AI literacy for kids around the world,” in Proceedings of FabLearn 2019, in FL2019. New York, NY, USA: Association for Computing Machinery, Mar. 2019, pp. 104–111. doi: 10.1145/3311890.3311904.
S. G. Han, “Digital Content to Improve Artificial Intelligence Literacy Ability,” Journal of the Korea Society of Computer and Information, vol. 25, no. 12, pp. 93–100, Dec. 2020, doi: 10.9708/JKSCI.2020.25.12.093.
D. Long and B. Magerko, “What is AI Literacy? Competencies and Design Considerations,” in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, Honolulu HI USA: ACM, Apr. 2020, pp. 1–16. doi: 10.1145/3313831.3376727.
D. T. K. Ng, J. K. L. Leung, S. K. W. Chu, and M. S. Qiao, “Conceptualizing AI literacy: An exploratory review,” Computers and Education: Artificial Intelligence, vol. 2, p. 100041, 2021, doi: 10.1016/j.caeai.2021.100041.
W. Yang, “Artificial Intelligence education for young children: Why, what, and how in curriculum design and implementation,” Computers and Education: Artificial Intelligence, vol. 3, p. 100061, 2022, doi: 10.1016/j.caeai.2022.100061.
I. T. Sanusi, S. S. Oyelere, H. Vartiainen, J. Suhonen, and M. Tukiainen, “A systematic review of teaching and learning machine learning in K-12 education,” Educ Inf Technol, vol. 28, no. 5, pp. 5967–5997, May 2023, doi: 10.1007/s10639-022-11416-7.
P. Langley, “An Integrative Framework for Artificial Intelligence Education,” AAAI, vol. 33, no. 01, pp. 9670–9677, Jul. 2019, doi: 10.1609/aaai.v33i01.33019670.
R. M. Martins and C. Gresse Von Wangenheim, “Findings on Teaching Machine Learning in High School: A Ten-Year Systematic Literature Review,” Informatics in Education, Sep. 2022, doi: 10.15388/infedu.2023.18.
W. Chow, “A Pedagogy that Uses a Kaggle Competition for Teaching Machine Learning: an Experience Sharing,” in 2019 IEEE International Conference on Engineering, Technology and Education (TALE), Yogyakarta, Indonesia: IEEE, Dec. 2019, pp. 1–5. doi: 10.1109/TALE48000.2019.9226005.
M. Tedre, T. Toivonen, J. Kahila, H. Vartiainen, T. Valtonen, I. Jormanainen, and A. Pears, “Teaching Machine Learning in K–12 Classroom: Pedagogical and Technological Trajectories for Artificial Intelligence Education,” IEEE Access, vol. 9, pp. 110558–110572, 2021, doi: 10.1109/ACCESS.2021.3097962.
B. Hutchinson, A. Smart, A. Hanna, E. Denton, C. Greer, O. Kjartansson, P. Barnes, and M. Mitchell, “Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure,” in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Virtual Event Canada: ACM, Mar. 2021, pp. 560–575. doi: 10.1145/3442188.3445918.
I. Evangelista, G. Blesio, and E. Benatti, “Why Are We Not Teaching Machine Learning at High School? A Proposal,” in 2018 World Engineering Education Forum - Global Engineering Deans Council (WEEF-GEDC), Albuquerque, NM, USA: IEEE, Nov. 2018, pp. 1–6. doi: 10.1109/WEEF-GEDC.2018.8629750.
R. Biehler and Y. Fleischer, “Introducing students to machine learning with decision trees using CODAP and Jupyter Notebooks,” Teaching Statistics, vol. 43, no. S1, Jul. 2021, doi: 10.1111/test.12279.
H. Vartiainen, T. Toivonen, I. Jormanainen, J. Kahila, M. Tedre, and T. Valtonen, “Machine learning for middle schoolers: Learning through data-driven design,” International Journal of Child-Computer Interaction, vol. 29, p. 100281, Sep. 2021, doi: 10.1016/j.ijcci.2021.100281.
S. Hooper and L. P. Rieber, “Teaching with technology,” Teaching: Theory into practice, vol. 2013, pp. 154–170, 1995.
T. K. F. Chiu and C. Chai, “Sustainable Curriculum Planning for Artificial Intelligence Education: A Self-Determination Theory Perspective,” Sustainability, vol. 12, no. 14, p. 5568, Jul. 2020, doi: 10.3390/su12145568.
I. Bosnić, I. Čavrak, and A. Zuiderwijk, “Introducing Open Data Concepts to STEM Students Using Real-World Open Datasets,” in 2021 44th International Convention on Information, Communication and Electronic Technology (MIPRO), Sep. 2021, pp. 1530–1535. doi: 10.23919/MIPRO52101.2021.9596998.
S. Kim, K. Kim, and T. Kim, “Development of PISA Mathematical Context-oriented Dataset for K-12 Artificial Intelligence Education,” Journal of The Korean Association of Information Education, vol. 27, no. 3, pp. 255–267, Jun. 2023, doi: 10.14352/jkaie.2023.27.3.255.
T. Coughlan, “The use of open data as a material for learning,” Education Tech Research Dev, vol. 68, no. 1, pp. 383–411, Feb. 2020, doi: 10.1007/s11423-019-09706-y.
K. El Emam, L. Mosquera, and R. Hoptroff, Practical synthetic data generation: balancing privacy and the broad availability of data. O’Reilly Media, 2020. Accessed: Mar. 13, 2024. [Online]. Available: https://books.google.co.kr/books?hl=ko&lr=&id=XWnnDwAAQBAJ&oi=fnd&pg=PP1&dq=emam+mosquera&ots=FouI9cNjuo&sig=Bq-XWVp-mzxsXh24EpJ7tY2MSfo
S. Kim, T. Kim, and Y. Jeon, “Research on the Development and Utility Analysis of K-12 Artificial Intelligence Educational Datasets Using Synthetic Datasets Generation Method,” The Journal of Korean Association of Computer Education, vol. 25, no. 3, pp. 9–21, May 2022, doi: 10.32431/KACE.2022.25.3.002.
A. Rossett, Training needs assessment. Educational Technology, 1987. Accessed: Mar. 13, 2024. [Online]. Available: https://books.google.co.kr/books?hl=ko&lr=&id=IWBppwNMC-QC&oi=fnd&pg=PR7&dq=training+needs+assessment&ots=PazVFE8lP1&sig=3TfbexATFVfucdSu-1POqvV57Hs
T. E. Raghunathan, J. M. Lepkowski, J. Van Hoewyk, and P. Solenberger, “A multivariate technique for multiply imputing missing values using a sequence of regression models,” Survey methodology, vol. 27, no. 1, pp. 85–96, 2001.
J. Kim and M. Park, “Multiple imputation and synthetic data,” The Korean Journal of Applied Statistics, vol. 32, no. 1, pp. 83–97, 2019.
J. P. Reiter, “Releasing multiply imputed, synthetic public use microdata: an illustration and empirical study,” Journal of the Royal Statistical Society Series A: Statistics in Society, vol. 168, no. 1, pp. 185–205, 2005.
J. Lee, “Review on Statistical Methods for Synthetic Data,” M.S. thesis, Dept. of Statistics, University of Seoul (UOS), Seoul, Korea, 2021.
B. Nowok, G. M. Raab, and C. Dibben, “synthpop: Bespoke Creation of Synthetic Data in R,” J. Stat. Soft., vol. 74, no. 11, 2016, doi: 10.18637/jss.v074.i11.
S. Yoo and N. Park, “Synthetic Data Generation for Individual Credit Data Using CART,” Journal of the Korean Official Statistics, vol. 25, no. 1, pp. 1–30, 2020.
Hazy Limited, “hazy/synthpop,” Dec. 16, 2019. Accessed: Dec. 08, 2022. [Online]. Available: https://github.com/hazy/synthpop
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research, vol. 12, no. 85, pp. 2825–2830, 2011.
M. Carlisle, “racist data destruction?,” Medium. Accessed: Jan. 14, 2024. [Online]. Available: https://medium.com/@docintangible/racist-data-destruction-113e3eff54a8
C. H. Lawshe, “A Quantitative Approach to Content Validity,” Personnel Psychology, vol. 28, no. 4, pp. 563–575, Dec. 1975, doi: 10.1111/j.1744-6570.1975.tb01393.x.
M. Bergdahl, M. Ehling, E. Elvers, E. Földesi, T. Körner, A. Kron, P. Lohauß, K. Mag, V. Morais, A. Nimmergut, H. Viggo, K. Szép, U. Timm, and M. J. Zilhão, “Handbook on Data Quality Assessment Methods and Tools,” M. Ehling and T. Körner, Eds., 2007.
This work is licensed under a Creative Commons Attribution 4.0 International License.