
This function applies the TockyKmeansRF model across a specified range of cluster numbers, repeating the evaluation `k` times. For each cluster number, it calculates ROC curves with their confidence intervals and computes the AUC across the repeated runs.

Usage

TockyRFClusterOptimization(
  trainData,
  testData,
  num_cluster_vec = 4:9,
  k = 1,
  ctrl_group = NULL,
  expr_group = NULL,
  iter.max = 10,
  nstart = 1,
  mtry = NULL,
  ntree = 100
)

Arguments

trainData

Training dataset as a TockyPrepData object.

testData

Test dataset as a TockyPrepData object.

num_cluster_vec

A numeric vector of cluster numbers to evaluate.

k

Integer. The number of repeated runs used to estimate confidence intervals. The default is 1.

ctrl_group

The name of the control group within `sampledef`.

expr_group

The name of the experimental group within `sampledef`.

iter.max

The maximum number of iterations allowed. To be passed to kmeans.

nstart

The number of random sets to be used in each clustering. To be passed to kmeans.

mtry

The number of variables randomly sampled as candidates at each split when building a tree within the Random Forest. To be passed to randomForest.

ntree

The number of trees to grow in the Random Forest. The default value is set to 100. To be passed to randomForest.
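
As a rough illustration of how `iter.max` and `nstart` reach `stats::kmeans`, the sketch below clusters a toy matrix once per candidate cluster number. The toy data and the names `toy` and `fits` are assumptions made for this example; the sketch does not reproduce the internals of TockyRFClusterOptimization or the TockyPrepData format.

```r
# Toy two-dimensional data standing in for marker expression values
# (hypothetical; not the TockyPrepData format).
set.seed(1)
toy <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 4), ncol = 2)
)

# Try each candidate cluster number, mirroring num_cluster_vec = 4:9.
num_cluster_vec <- 4:9
fits <- lapply(num_cluster_vec, function(centers) {
  # iter.max and nstart are forwarded unchanged to kmeans.
  stats::kmeans(toy, centers = centers, iter.max = 10, nstart = 1)
})
names(fits) <- num_cluster_vec

# Total within-cluster sum of squares for each cluster number.
sapply(fits, function(fit) fit$tot.withinss)
```

A larger `nstart` makes each clustering less sensitive to the random initial centres, at the cost of extra computation.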

Value

A list containing three elements: 'roc_results', a list of ROC curve objects for each cluster number; 'auc_values', a list of AUC values with their respective confidence intervals for each cluster number; and 'roc_plots', a list of ggplot objects, each depicting a ROC curve with confidence intervals, covering the specified range of cluster numbers.
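
To make the idea of an AUC with an interval across repeated runs concrete, here is a minimal base-R sketch. It is an assumption-laden illustration, not the package's implementation: the helper `auc_rank`, the simulated scores, and the percentile interval are all choices made for this example.

```r
# Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
# positive scores higher than a randomly chosen negative (ties count 1/2).
auc_rank <- function(scores, labels) {
  pos <- labels == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(scores)
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(42)
k <- 50  # number of repeats
labels <- rep(c(0, 1), each = 20)

auc_runs <- replicate(k, {
  # Hypothetical noisy scores standing in for one model run's predictions.
  scores <- labels + rnorm(length(labels), sd = 1)
  auc_rank(scores, labels)
})

# Mean AUC with a 95% percentile interval across the k runs.
c(mean = mean(auc_runs), quantile(auc_runs, c(0.025, 0.975)))
```

In a sketch like this, a single run (`k = 1`) yields an interval that collapses to a point, so a larger `k` is needed for the interval to be informative.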

Examples

if (FALSE) { # \dontrun{
result <- TockyRFClusterOptimization(trainData, testData, num_cluster_vec = 4:9, k = 50)
} # }