| Title: | Clustering-Based K-Nearest Neighbor Regression for Longitudinal Data |
|---|---|
| Description: | Implements the 'CKNNRLD' algorithm (Clustering-Based K-Nearest Neighbor Regression for Longitudinal Data), which improves K-Nearest Neighbor ('KNN') regression on longitudinal data through cluster-based partitioning and localized prediction. It is designed to improve computational efficiency and accuracy on high-volume longitudinal datasets. Reference: Loeloe MS, Tabatabaei SM, Sefidkar R, Mehrparvar AH, Jambarsang S (2025). "Boosting K-nearest neighbor regression performance for longitudinal data through a novel learning approach." BMC Bioinformatics, 26, 232. <doi:10.1186/s12859-025-06205-1>. |
| Authors: | Mohammad Sadegh Loeloe [aut, cre], Seyyed Mohammad Tabatabaei [aut], Reyhane Sefidkar [aut], Amir Houshang Mehrparvar [aut], Sara Jambarsang [aut, ths] |
| Maintainer: | Mohammad Sadegh Loeloe |
| License: | GPL-3 |
| Version: | 0.1.4 |
| Built: | 2026-06-10 06:04:01 UTC |
| Source: | https://github.com/cran/CKNNRLD |
`BestC()` selects the number of clusters (C) for clustering longitudinal outcome trajectories, using the elbow method on the within-cluster sum of squares (WCSS).
```r
BestC(Y, range_clusters = 2:4, method = "kmeans")
```

| Argument | Description |
|---|---|
| `Y` | A matrix or data frame of longitudinal outcomes (subjects x timepoints). |
| `range_clusters` | A numeric vector of candidate numbers of clusters to evaluate (e.g., `2:4`). |
| `method` | Clustering method to use (currently only `"kmeans"`). |
Returns a list with components `best_c` (the selected number of clusters), `criteria`, and `criteria_best`.
```r
set.seed(123)
n <- 20
n_time <- 3  # number of timepoints
y <- matrix(rnorm(n * n_time), nrow = n)
best_c_info <- BestC(Y = y, range_clusters = 2:3)
print(best_c_info$best_c)
```
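The elbow rule itself can be illustrated with base R alone. The sketch below is an illustration under stated assumptions, not the `BestC()` implementation: it clusters the rows of `Y` with `stats::kmeans`, records the WCSS for each candidate C, and takes the elbow as the C where the WCSS curve bends most (largest second difference). That automation rule is an assumption; the package may locate the elbow differently. `elbow_sketch()` is a hypothetical name.

```r
# Hypothetical sketch of elbow-method selection of C (not the package code).
# Assumptions: stats::kmeans on the rows of Y, and the elbow taken as the C
# where the WCSS curve bends most (largest second difference).
elbow_sketch <- function(Y, range_clusters = 2:6, nstart = 10) {
  Y <- as.matrix(Y)
  # At least three candidates are needed to measure a bend in the curve
  stopifnot(length(range_clusters) >= 3)
  # Within-cluster sum of squares for each candidate number of clusters
  wcss <- vapply(range_clusters, function(C) {
    kmeans(Y, centers = C, nstart = nstart)$tot.withinss
  }, numeric(1))
  # Second difference at each interior candidate: larger means a sharper elbow
  bend <- diff(wcss, differences = 2)
  list(best_c = range_clusters[which.max(bend) + 1], wcss = wcss)
}

set.seed(123)
y <- matrix(rnorm(60 * 3), nrow = 60)
res <- elbow_sketch(y, range_clusters = 2:6)
res$best_c
```

Because the second difference is only defined at interior points, this rule can never pick the smallest or largest candidate; a wider `range_clusters` than the one you expect to choose from is therefore advisable.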
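`CKNNRLD()` performs clustering-based KNN regression for longitudinal data: the training data are partitioned into `c` clusters, and each new observation is predicted from its `k` nearest neighbors within the cluster it is assigned to.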
```r
CKNNRLD(xnew, y, x, k = 5, c = 4, cluster_method = "kmeans")
```

| Argument | Description |
|---|---|
| `xnew` | A matrix of predictor values for the test data. |
| `y` | A matrix or data frame of longitudinal responses for the training data (subjects x timepoints). |
| `x` | A matrix or data frame of predictors for the training data. |
| `k` | Number of nearest neighbors to use. |
| `c` | Number of clusters. |
| `cluster_method` | Clustering method; currently only `"kmeans"` is supported. |
Returns a data frame of predicted values together with the cluster assigned to each new observation.
```r
set.seed(123)
n <- 20
n_time <- 3  # number of timepoints
d <- 2       # number of predictors
x <- matrix(runif(n * d), nrow = n)
y <- matrix(rnorm(n * n_time), nrow = n)
train_idx <- sample(1:n, 14)
test_idx <- setdiff(1:n, train_idx)
result <- CKNNRLD(
  x = x[train_idx, ], y = y[train_idx, ], xnew = x[test_idx, ],
  k = 3, c = 2
)
head(result)
```
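To make "partitioning plus localized prediction" concrete, here is a hypothetical base-R sketch. Its details are assumptions, not documented behavior of `CKNNRLD()`: training subjects are clustered on their outcome trajectories with `kmeans`, each new subject is routed to the cluster whose mean predictor vector is closest, and the prediction is the average trajectory of its `k` nearest (Euclidean) neighbors within that cluster. `cknn_sketch()` is a hypothetical name.

```r
# Hypothetical sketch of clustering-based KNN regression (not the package code).
# Assumptions: kmeans on outcome trajectories y; new subjects routed to the
# cluster with the nearest mean predictor vector; Euclidean KNN inside it.
cknn_sketch <- function(xnew, y, x, k = 5, c = 2) {
  x <- as.matrix(x); y <- as.matrix(y); xnew <- as.matrix(xnew)
  cl <- kmeans(y, centers = c, nstart = 10)$cluster
  # Mean predictor vector of each cluster (c x d matrix)
  x_centers <- do.call(rbind, lapply(seq_len(c), function(j) {
    colMeans(x[cl == j, , drop = FALSE])
  }))
  pred <- matrix(NA_real_, nrow(xnew), ncol(y))
  assigned <- integer(nrow(xnew))
  for (i in seq_len(nrow(xnew))) {
    # Route the new subject to the nearest cluster centre in predictor space
    j <- which.min(colSums((t(x_centers) - xnew[i, ])^2))
    members <- which(cl == j)
    # k nearest training subjects within that cluster (k capped at cluster size)
    d_x <- colSums((t(x[members, , drop = FALSE]) - xnew[i, ])^2)
    nn <- members[order(d_x)[seq_len(min(k, length(members)))]]
    pred[i, ] <- colMeans(y[nn, , drop = FALSE])
    assigned[i] <- j
  }
  data.frame(pred, cluster = assigned)
}

set.seed(123)
n <- 40; n_time <- 3; d <- 2
x <- matrix(runif(n * d), nrow = n)
y <- matrix(rnorm(n * n_time), nrow = n)
train_idx <- sample(seq_len(n), 30)
out <- cknn_sketch(x[-train_idx, ], y[train_idx, ], x[train_idx, ], k = 3, c = 2)
head(out)
```

Routing by predictor centroids is only one possible choice: new subjects have no observed outcomes, so some rule based on `x` is needed to place them in a cluster.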
`CKNNRLD.tune()` selects the number of clusters (C) from `C_range` and tunes `CKNNRLD()` by `nfolds`-fold cross-validation, evaluating up to `A` nearest neighbors.
```r
CKNNRLD.tune(
  y, x, nfolds = 10, folds = NULL, seed = NULL, A = 10,
  C_range = 2:4, cluster_method = "kmeans"
)
```

| Argument | Description |
|---|---|
| `y` | Matrix of longitudinal outcomes (subjects x timepoints). |
| `x` | Matrix of predictor variables. |
| `nfolds` | Number of cross-validation folds. |
| `folds` | Optional list of pre-specified fold indices. |
| `seed` | Random seed for reproducibility. |
| `A` | Maximum number of neighbors to evaluate. |
| `C_range` | Range of numbers of clusters to evaluate. |
| `cluster_method` | Clustering method to use (currently only `"kmeans"`). |
Returns a list whose components include `best_c`, `cluster_results`, and `cluster_sizes`.
```r
set.seed(123)
n <- 20
n_time <- 3  # number of timepoints
d <- 2       # number of predictors
x <- matrix(runif(n * d), nrow = n)
y <- matrix(rnorm(n * n_time), nrow = n)
tune_result <- CKNNRLD.tune(
  y = y, x = x, nfolds = 3, A = 4, C_range = 2:3
)
print(tune_result$best_c)
```
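If you prefer to supply `folds` yourself, a list of fold indices can be built with base R. The sketch assumes that each list element holds the row indices held out in one fold; that format is an assumption, so check what the package expects before relying on it. `make_folds()` is a hypothetical helper.

```r
# Hypothetical helper: builds a list of fold indices for the `folds` argument.
# Assumption: each element holds the row indices held out in one fold.
make_folds <- function(n, nfolds = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Shuffle row indices, then deal them round-robin into nfolds groups
  ids <- sample(seq_len(n))
  unname(split(ids, rep_len(seq_len(nfolds), n)))
}

folds <- make_folds(20, nfolds = 3, seed = 123)
lengths(folds)  # 7 7 6
```

Dealing shuffled indices round-robin keeps fold sizes within one of each other. Under the format assumption above, the same list could be passed to both `CKNNRLD.tune()` and `KNNRLD.tune()` so the two methods are compared on identical splits.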
`KNNRLD()` performs KNN regression for longitudinal data without clustering. The predicted outcome trajectory for each new observation is the average of the response trajectories of its `k` nearest training subjects in predictor space.
```r
KNNRLD(xnew, y, x, k = 5)
```

| Argument | Description |
|---|---|
| `xnew` | A matrix of predictor values to predict for (test set). |
| `y` | A matrix or data frame of longitudinal responses (training set; subjects x timepoints). |
| `x` | A matrix or data frame of training predictor values. |
| `k` | Number of nearest neighbors to use; either a scalar or a vector of values. |
Returns a list with one matrix of predicted values per value of `k`; each matrix has dimensions `nrow(xnew)` x `ncol(y)`.
```r
set.seed(123)
n <- 20
n_time <- 3  # number of timepoints
d <- 2       # number of predictors
x <- matrix(runif(n * d), nrow = n)
y <- matrix(rnorm(n * n_time), nrow = n)
train_idx <- sample(1:n, 14)
test_idx <- setdiff(1:n, train_idx)
pred <- KNNRLD(
  xnew = x[test_idx, ], y = y[train_idx, ], x = x[train_idx, ], k = 3
)
head(pred[[1]])
```
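The averaging step described for `KNNRLD()` can be sketched in a few lines of base R. This illustration rests on an assumption the text does not state (Euclidean distance on the raw predictors) and is not the package source; `knn_ld_sketch()` is a hypothetical name. Like `KNNRLD()`, it accepts a vector `k` and returns one `nrow(xnew)` x `ncol(y)` matrix per value.

```r
# Hypothetical sketch of KNN regression for longitudinal outcomes (no clustering).
# Assumption: Euclidean distance on the raw predictors.
knn_ld_sketch <- function(xnew, y, x, k = 5) {
  x <- as.matrix(x); y <- as.matrix(y); xnew <- as.matrix(xnew)
  # For each new row, training row indices sorted by distance (one column per new row)
  ord <- apply(xnew, 1, function(z) order(colSums((t(x) - z)^2)))
  # One prediction matrix per value of k: nrow(xnew) x ncol(y)
  lapply(k, function(kk) {
    do.call(rbind, lapply(seq_len(ncol(ord)), function(i) {
      # Mean trajectory of the kk nearest training subjects
      colMeans(y[ord[seq_len(kk), i], , drop = FALSE])
    }))
  })
}

set.seed(123)
x <- matrix(runif(20 * 2), nrow = 20)
y <- matrix(rnorm(20 * 3), nrow = 20)
pred <- knn_ld_sketch(x[15:20, ], y[1:14, ], x[1:14, ], k = c(1, 3))
dim(pred[[2]])  # 6 3
```

Sorting distances once per new observation and reusing that order for every value of `k` avoids recomputing neighbors for each candidate.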
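`KNNRLD.tune()` selects the number of neighbors k for `KNNRLD()` by `nfolds`-fold cross-validation, evaluating up to `A` neighbors and using the mean squared prediction error (MSPE) as the criterion.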
```r
KNNRLD.tune(y, x, nfolds = 10, folds = NULL, seed = NULL, A = 10, graph = FALSE)
```

| Argument | Description |
|---|---|
| `y` | Matrix of longitudinal outcomes (subjects x timepoints). |
| `x` | Matrix of predictor variables. |
| `nfolds` | Number of cross-validation folds. |
| `folds` | Optional list of pre-specified fold indices. |
| `seed` | Optional random seed. |
| `A` | Maximum number of neighbors to evaluate. |
| `graph` | Logical; if `TRUE`, plots the cross-validated MSPE against k. |
Returns a list with components `crit`, `best_k`, `performance`, and `runtime`.
```r
set.seed(123)
n <- 20
n_time <- 3  # number of timepoints
d <- 2       # number of predictors
x <- matrix(runif(n * d), nrow = n)
y <- matrix(rnorm(n * n_time), nrow = n)
tune_result <- KNNRLD.tune(
  y = y, x = x, nfolds = 3, A = 4
)
str(tune_result)
```
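The tuning loop can likewise be sketched with base R. Assumptions (not taken from the package): Euclidean distance, k evaluated over `1:A`, random folds of near-equal size, and MSPE computed as the mean squared error over all held-out subjects and timepoints. `cv_knn_sketch()` is a hypothetical name.

```r
# Hypothetical sketch of choosing k by nfolds-fold cross-validation on MSPE.
# Assumptions: Euclidean distance, k in 1..A, MSPE over all held-out values.
cv_knn_sketch <- function(y, x, nfolds = 5, A = 10, seed = NULL) {
  x <- as.matrix(x); y <- as.matrix(y)
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(x)
  # Random fold labels of near-equal size
  fold_id <- sample(rep_len(seq_len(nfolds), n))
  # k cannot exceed the smallest training set
  A <- min(A, n - max(tabulate(fold_id, nfolds)))
  sse <- numeric(A)  # summed squared error for each k, across all folds
  for (f in seq_len(nfolds)) {
    test <- which(fold_id == f)
    train <- which(fold_id != f)
    for (i in test) {
      # Training subjects ordered by distance to held-out subject i
      dist2 <- colSums((t(x[train, , drop = FALSE]) - x[i, ])^2)
      o <- train[order(dist2)]
      for (k in seq_len(A)) {
        pred <- colMeans(y[o[seq_len(k)], , drop = FALSE])
        sse[k] <- sse[k] + sum((y[i, ] - pred)^2)
      }
    }
  }
  mspe <- sse / length(y)  # each subject is held out exactly once
  list(mspe = mspe, best_k = which.min(mspe))
}

set.seed(123)
x <- matrix(runif(30 * 2), nrow = 30)
y <- matrix(rnorm(30 * 3), nrow = 30)
cv <- cv_knn_sketch(y, x, nfolds = 3, A = 4, seed = 1)
cv$best_k
```

Picking the k that minimizes MSPE mirrors the curve plotted when `graph = TRUE`, though the package's fold handling and tie-breaking may differ from this sketch.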