Random Rescaling of Non-Invariant Classifiers and Analysis of Missing Data

Dnr:

SNIC 2018/8-309

Type:

SNAC Small Compute

Principal Investigator:

David Randahl

Affiliation:

Uppsala universitet

Start Date:

2018-10-08

End Date:

2019-11-01

Primary Classification:

10106: Probability Theory and Statistics

Resources:

- Crex 1 at UPPMAX: 128 GiB
- Rackham at UPPMAX: 2 x 1000 core-h/month
- Snowy at UPPMAX: 1 x 1000 core-h/month

This paper introduces the Random Rescaling Method for the k-nearest-neighbor (kNN) classifier. The Random Rescaling Method extends kNN by randomly rescaling the standardized variables B times. The paper shows that rescaling the standardized data not only improves prediction accuracy but also allows inferences about how the weights of the different variables affect predictions. The optimal weights can therefore be used to make inferential statements about the relative importance of the different variables in predicting the outcome variable. The Random Rescaling Method can consequently be seen as a machine learning method that, in addition to its predictive capacity, allows for testing, evaluating, and building theoretical arguments about the relationship between the predictors and the outcome variable. The usefulness of the method is tested by training a Random Rescaling kNN model on country-month data for one-sided-violence events in the period 1990-2010, and performing out-of-sample evaluation of the model on the years 2011-2015.
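The core idea described above can be sketched in code. The following is a minimal, hypothetical illustration (not the authors' implementation): B random weight vectors are drawn, each rescales the standardized variables before a weighted-Euclidean kNN vote, and the weights achieving the best validation accuracy are retained; the `knn_predict` and `random_rescaling_knn` names, the weight normalization, and the majority-vote tie-breaking are all assumptions made for this sketch.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k, weights):
    # Weighted Euclidean distance: each variable's squared difference
    # is scaled by its (assumed non-negative) weight before summing.
    dists = []
    for xi, yi in zip(train_X, train_y):
        d = math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, xi, x)))
        dists.append((d, yi))
    dists.sort(key=lambda t: t[0])
    # Majority vote among the k nearest neighbors.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def random_rescaling_knn(train_X, train_y, val_X, val_y, k=3, B=50, seed=0):
    """Draw B random weight vectors and keep the one with the best
    validation accuracy. Inputs are assumed to be standardized already."""
    rng = random.Random(seed)
    p = len(train_X[0])
    best_w, best_acc = [1.0 / p] * p, -1.0
    for _ in range(B):
        w = [rng.random() for _ in range(p)]
        total = sum(w)
        w = [wi / total for wi in w]  # normalize so the weights sum to 1
        preds = [knn_predict(train_X, train_y, x, k, w) for x in val_X]
        acc = sum(yhat == y for yhat, y in zip(preds, val_y)) / len(val_y)
        if acc > best_acc:
            best_acc, best_w = acc, w
    return best_w, best_acc
```

The retained weight vector then serves double duty: it is a tuned classifier, and the relative sizes of its components can be read as a rough measure of each variable's importance for prediction, which is the inferential use the abstract describes.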