Vinicius Eiji Martins · BRACIS 2020
Active Learning embedded in incremental decision trees
As technology evolves and electronic devices become widespread, the amount of data produced in the form of streams grows in enormous proportions. Data streams are online sources of data, meaning that they produce data continuously. This creates the need for fast and reliable methods to analyse these sources and extract information from them. Stream mining algorithms exist for this purpose, but the use of supervised machine learning in the stream domain is extremely limited, since it is infeasible to label every instance that arrives to be processed. To tackle this problem, our paper proposes the use of active learning techniques in stream mining algorithms, specifically those based on incremental Hoeffding trees. The active learning techniques were implemented to meet the stream mining constraint of low computational cost: we took advantage of the incremental tree's original structure so that selecting which instances to label does not overburden the tree's original computational cost. In other words, the statistics each incremental tree already maintains in order to grow also support the execution of active learning. Using uncertainty sampling techniques, we were able to drastically reduce the number of labels required at the cost of a very small reduction in accuracy. With Budget Entropy in particular, the average accuracy reduction was about 4% while only 14% of the samples were labelled.
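The abstract does not spell out the exact Budget Entropy rule, so the following is only a minimal sketch under stated assumptions, not the authors' implementation or any library's API. It assumes that an instance is routed to a Hoeffding tree leaf, that the leaf's existing class counts are reused to compute a normalized entropy (which is what keeps the query cheap), and that a label is requested only when that entropy exceeds a threshold and the fraction of labelled instances is still below a budget. The names `normalized_entropy`, `BudgetEntropyQuery`, `budget` and `threshold` are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(class_counts, n_classes):
    """Entropy of a leaf's class counts, scaled to [0, 1] (assumes n_classes >= 2)."""
    total = sum(class_counts.values())
    if total == 0:
        return 1.0  # an empty leaf has no evidence, so treat it as maximally uncertain
    h = -sum((c / total) * math.log(c / total) for c in class_counts.values() if c > 0)
    return h / math.log(n_classes)


class BudgetEntropyQuery:
    """Hypothetical budget-limited entropy query strategy (illustrative only)."""

    def __init__(self, budget, threshold):
        self.budget = budget        # assumed: maximum fraction of instances to label
        self.threshold = threshold  # assumed: minimum normalized entropy to query
        self.seen = 0
        self.labelled = 0

    def should_query(self, leaf_counts, n_classes):
        """Decide whether to request the label of an instance that reached this leaf."""
        self.seen += 1
        if self.labelled >= self.budget * self.seen:
            return False  # budget spent: keep predicting without asking for labels
        if normalized_entropy(leaf_counts, n_classes) >= self.threshold:
            self.labelled += 1
            return True
        return False


strategy = BudgetEntropyQuery(budget=0.14, threshold=0.9)
print(strategy.should_query(Counter({"a": 50}), 2))           # False: pure leaf
print(strategy.should_query(Counter({"a": 25, "b": 25}), 2))  # True: uncertain leaf
print(strategy.should_query(Counter({"a": 25, "b": 25}), 2))  # False: budget reached
```

Because the class counts are already stored at each leaf for growing the tree, the extra work per instance in this sketch is one pass over the leaf's counts, which is in the spirit of the low-cost constraint described above.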