A Python implementation of missing value imputation using k-nearest neighbors (kNN).
git clone https://github.com/bwanglzu/Imputer.py.git
cd Imputer.py
# install dependencies
pip install -r requirements.txt
# install imputer
python setup.py install
from imputer import Imputer
impute = Imputer()
Default usage (X should be a pandas.DataFrame or np.ndarray; column is the name or index of the column to impute):
X_imputed = impute.knn(X=data, column='age') # default 10nn
Change the number of neighbors k:
X_imputed = impute.knn(X=data, column='age', k=3)
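To show what k controls, here is a minimal pure-Python sketch of kNN imputation for a numerical column. It is not this package's implementation: the helper name `knn_impute_numeric`, the Euclidean distance over the remaining columns, and the unweighted mean of the neighbors are all assumptions made for illustration.

```python
import math


def knn_impute_numeric(rows, column, k=10):
    """Fill None entries in `column` with the mean of the k nearest donor rows.

    Hypothetical helper for illustration; not part of the imputer package.
    `rows` is a list of equal-length lists of numbers, where None marks a
    missing value in `column` only.
    """
    others = [i for i in range(len(rows[0])) if i != column]
    # Rows with an observed value in `column` act as donors.
    donors = [r for r in rows if r[column] is not None]

    def distance(a, b):
        # Euclidean distance over every column except the one being imputed.
        return math.sqrt(sum((a[i] - b[i]) ** 2 for i in others))

    imputed = []
    for row in rows:
        filled = list(row)  # copy so the input is left unchanged
        if filled[column] is None:
            nearest = sorted(donors, key=lambda d: distance(row, d))[:k]
            filled[column] = sum(d[column] for d in nearest) / len(nearest)
        imputed.append(filled)
    return imputed


data = [
    [1.0, 2.0, 20.0],
    [1.1, 2.1, 22.0],
    [5.0, 6.0, 60.0],
    [1.05, 2.05, None],  # missing value in column 2
]
result = knn_impute_numeric(data, column=2, k=2)
print(result[3][2])  # 21.0
```

With k=2 only the two closest rows contribute, so the distant third row is ignored. Raising k to 3 pulls that row in and shifts the estimate to 34.0, which is why a smaller k can be preferable on small or clustered data.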
By default, the column is imputed as a numerical feature. To impute a categorical feature, set is_categorical=True:
X_imputed = impute.knn(X=data, column='gender', k=10, is_categorical=True)
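A categorical column cannot be averaged, so one common approach is a majority vote among the k nearest neighbors. The sketch below assumes that approach. Whether this package votes in the same way is not stated here, and the helper name `knn_impute_categorical` and the sample data are hypothetical.

```python
import math
from collections import Counter


def knn_impute_categorical(rows, column, k=10):
    """Fill None entries in `column` with the most common label among the
    k nearest donor rows.

    Hypothetical helper for illustration; not part of the imputer package.
    All columns other than `column` are assumed to be numeric.
    """
    others = [i for i in range(len(rows[0])) if i != column]
    # Rows with an observed label in `column` act as donors.
    donors = [r for r in rows if r[column] is not None]

    def distance(a, b):
        # Euclidean distance over the numeric columns only.
        return math.sqrt(sum((a[i] - b[i]) ** 2 for i in others))

    imputed = []
    for row in rows:
        filled = list(row)  # copy so the input is left unchanged
        if filled[column] is None:
            nearest = sorted(donors, key=lambda d: distance(row, d))[:k]
            votes = Counter(d[column] for d in nearest)
            filled[column] = votes.most_common(1)[0][0]
        imputed.append(filled)
    return imputed


people = [
    [25.0, 170.0, "F"],
    [27.0, 168.0, "F"],
    [26.0, 172.0, "M"],
    [60.0, 180.0, "M"],
    [26.0, 169.0, None],  # missing label in column 2
]
result = knn_impute_categorical(people, column=2, k=3)
print(result[4][2])  # F
```

Here the three nearest rows carry the labels F, F, and M, so the vote resolves to F.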
nosetests --with-coverage
Troyanskaya O, Cantor M, Sherlock G, et al. Missing value estimation methods for DNA microarrays[J]. Bioinformatics, 2001, 17(6): 520-525.