Romero Silva · BRACIS 2020
Cloud Computing and Machine Learning for Analysis of Large Volumes of Educational Data
This paper describes the application of supervised and unsupervised machine learning to large volumes of open government data from INEP. The following algorithms are used: K-Nearest Neighbors, Logistic Regression, Decision Tree, Random Forest and K-means. The methodology is based on the CRISP-DM and KDD processes and relies on the Databricks cloud platform, together with Hadoop and Apache Spark cluster technologies. These technologies provided the high processing power needed to run the experiments, which enabled the performance evaluation of the models and the discovery of knowledge about Brazilian basic education.
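
As an illustration of the unsupervised step named above, the following is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is not the implementation used in this work, which ran on a Spark cluster. The function name `kmeans`, the toy two-feature points and the choice of initial centroids are all hypothetical and stand in for real indicators only to show the assignment and update steps.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update until centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two well-separated groups of two-feature records.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
# Assumption: initial centroids are picked by hand, one from each group.
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

On data of the scale described here, the same two steps would be distributed across the cluster by the platform's own K-means implementation rather than run in a single process as in this sketch.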