SNIC SUPR
Random Rescaling of Non-Invariant Classifiers and Analysis of Missing Data
Dnr:

SNIC 2018/8-309

Type:

SNAC Small

Principal Investigator:

David Randahl

Affiliation:

Uppsala universitet

Start Date:

2018-10-08

End Date:

2019-11-01

Primary Classification:

10106: Probability Theory and Statistics

Allocation

Abstract

This paper introduces the Random Rescaling Method for the k-nearest-neighbor (kNN) classifier. The method extends kNN so that the standardized variables are randomly rescaled B times. The paper shows that rescaling the standardized data not only improves prediction accuracy but also allows us to make inferences about how the weights of the different variables affect predictions. The optimal weights of the variables can therefore be used to make inferential statements about the relative importance of the different variables in predicting the outcome variable. The Random Rescaling Method can consequently be seen as a machine learning method that, in addition to its predictive capacity, allows for testing, evaluating, and building theoretical arguments about the relationship between the predictors and the outcome variable. The usefulness of the method is tested by training a Random Rescaling kNN model on country-month data for one-sided violence events in the period 1990-2010 and performing an out-of-sample evaluation of the model on the years 2011-2015.
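The abstract does not say how the B rescalings are drawn or how the best one is chosen. The following is only a minimal sketch of the general idea, written under assumptions that are not taken from the paper: each weight is drawn independently from Uniform(0, 1), distances are weighted Euclidean, predictions use a majority vote, and rescalings are compared by leave-one-out accuracy with the unweighted standardized kNN as a baseline. All function names (`standardize`, `knn_predict`, `loo_accuracy`, `random_rescaling`) are hypothetical.

```python
import math
import random
from collections import Counter


def standardize(X):
    """Z-score each column (mean 0, sd 1); constant columns get sd 1 to avoid division by zero."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        sds.append(math.sqrt(var) or 1.0)
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def knn_predict(train_X, train_y, x, k, w):
    """Majority vote among the k nearest points after multiplying variable j by weight w[j]."""
    dists = sorted(
        (
            math.sqrt(sum((wj * (a - b)) ** 2 for wj, a, b in zip(w, row, x))),
            label,
        )
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k, w):
    """Leave-one-out accuracy of weighted kNN (assumed selection criterion)."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        correct += knn_predict(rest_X, rest_y, X[i], k, w) == y[i]
    return correct / len(X)


def random_rescaling(X, y, k=3, B=50, seed=0):
    """Try B random rescalings of the standardized data and keep the best-scoring weights."""
    rng = random.Random(seed)
    Z = standardize(X)
    p = len(Z[0])
    # Assumed baseline: unit weights, i.e. ordinary kNN on standardized data.
    best_w = [1.0] * p
    best_acc = loo_accuracy(Z, y, k, best_w)
    for _ in range(B):
        # Assumed sampling scheme: independent Uniform(0, 1) weight per variable.
        w = [rng.uniform(0.0, 1.0) for _ in range(p)]
        acc = loo_accuracy(Z, y, k, w)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


if __name__ == "__main__":
    # Toy data: the first variable separates the classes, the second is noise.
    X = [[i, (i * 7) % 5] for i in range(20)]
    y = [int(i >= 10) for i in range(20)]
    weights, acc = random_rescaling(X, y, k=3, B=20, seed=1)
    print("best weights:", [round(v, 3) for v in weights], "LOO accuracy:", acc)
```

Because the unit-weight baseline is kept unless a rescaling scores strictly higher, the selected weights never do worse than plain standardized kNN on the chosen criterion. In this sketch the relative sizes of the selected weights are what one would inspect when reasoning about variable importance; how the paper turns them into inferential statements is not described in the abstract.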