Getting Started with TockyRandomForest Analysis

Introduction to Fluorescent Timer and the Tocky System

Fluorescent Timer proteins change their emission spectra over time, serving as powerful tools for monitoring transcriptional dynamics in vivo. We have implemented data preprocessing methods for Timer data in the TockyPrep package (Ono 2024b, 2025) and developed the TockyLocus package (Ono 2024a) for quantitative and statistical analysis of Timer fluorescence dynamics. However, analyzing complex Timer profiles, such as those typically seen in flow cytometric data from Foxp3-Tocky mice, remains challenging.

Aim

Machine learning offers an attractive way to overcome these challenges. The TockyMachineLearning package suite provides methods for identifying feature cells, that is, cells that represent group-specific features in Timer profiles.

Specifically, the current TockyRandomForest package offers Random Forest methods developed for analyzing flow cytometric Fluorescent Timer data.

Relationship to the Packages TockyPrep and TockyLocus

The TockyPrep package is designed to facilitate data preprocessing for flow cytometric Fluorescent Timer data. The TockyLocus package then uses the preprocessed data to apply data categorization methods, enabling quantitative analysis of Timer Angle data (Bending et al. 2018). However, this approach is applicable to one-dimensional data only.
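For orientation, Timer Angle can be thought of as a polar-coordinate summary of the two-dimensional Timer Blue and Timer Red space, which is why an analysis based on Angle alone is one-dimensional. The base-R sketch below is illustrative only. The helper timer_polar is hypothetical, it assumes Blue and Red values that are already normalised and non-negative, and it does not reproduce the preprocessing actually performed by TockyPrep.

```r
# Hypothetical helper (not part of TockyPrep): convert normalised Blue/Red
# fluorescence into an angle from the Blue axis (degrees) and a distance
# from the origin.
timer_polar <- function(blue, red) {
  data.frame(
    Angle     = atan2(red, blue) * 180 / pi,  # 0 = Blue only, 90 = Red only
    Intensity = sqrt(blue^2 + red^2)
  )
}

# Three illustrative cells: Blue only, Blue and Red equal, Red only.
polar <- timer_polar(blue = c(1, 1, 0), red = c(0, 1, 1))
print(polar)
```

Under these assumptions the three cells map to angles of approximately 0, 45 and 90 degrees.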

The TockyRandomForest package uses the special object class TockyPrepData provided by the TockyPrep package to perform machine learning analysis.

Install TockyRandomForest

To begin using TockyRandomForest, you need to install both the TockyRandomForest and TockyPrep packages from GitHub:

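# Install devtools if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
# Install TockyPrep and TockyRandomForest from GitHub
devtools::install_github("MonoTockyLab/TockyPrep")
devtools::install_github("MonoTockyLab/TockyRandomForest")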

Sample Workflow

Identifying CNS2-dependent Foxp3 transcriptional dynamics

This section guides you through a typical analysis workflow using TockyRandomForest to process flow cytometric data of cells expressing Fluorescent Timer proteins. To facilitate the analysis, preprocessed data are provided, and the TockyPrep package offers methods to analyze data using its S4 object class TockyPrepData.

First, load the necessary packages.

library(TockyPrep)
library(TockyRandomForest)
library(gridExtra)

Load example TockyPrepData objects included in the TockyRandomForest package as follows:

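# Locate the example data shipped with the package
file_path <- system.file("extdata", package = "TockyRandomForest")
# Find the CNS2 .rda files (the dot is escaped so it matches literally)
files <- list.files(path = file_path, pattern = "CNS2\\.rda$", full.names = TRUE)
# Load each file; the objects train_x and test_y used below come from these files
for (f in files) load(f)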
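Because load() restores objects under the names they were saved with, it can be useful to check what a file contains before relying on names such as train_x and test_y. The following self-contained sketch uses a temporary directory and a hypothetical object (example_obj) rather than the package data, so all file and object names in it are assumptions made for illustration.

```r
# Save a small stand-in object to a temporary .rda file (hypothetical data).
tmp_dir <- file.path(tempdir(), "tocky_demo")
dir.create(tmp_dir, showWarnings = FALSE)
example_obj <- data.frame(sample = c("a", "b"), group = c("KO", "WT"))
save(example_obj, file = file.path(tmp_dir, "example_CNS2.rda"))

# Escaping the dot and anchoring with $ matches only names ending in "CNS2.rda".
files <- list.files(tmp_dir, pattern = "CNS2\\.rda$", full.names = TRUE)

# load() returns the names of the restored objects; loading into a separate
# environment keeps them from overwriting anything in the workspace.
env <- new.env()
loaded <- unlist(lapply(files, load, envir = env))
print(loaded)
```

The same pattern, applied to the package's extdata directory, lists the objects that the example files provide.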

The dataset was generated by analyzing T cells from Foxp3-Tocky mice (WT) and CRISPR-mediated Foxp3-Tocky mutants, specifically CNS2KO Foxp3-Tocky mice (KO). Here we aim to identify CNS2-dependent Foxp3 transcriptional dynamics in the Timer space in a data-driven manner.

TockyKmeansRF Analysis for Feature Cell Identification

We will use the TockyKmeansRF function to train a Random Forest model on train_x and test it on test_y. Note that these two datasets are independent of each other.

show(train_x)

## TockyPrepData Object:
## Total cell number: 1804517 
## Variables:  file, Angle, Intensity, FSC.A, Timer.Blue, Timer.Red 
## Total sample number: 34 
## Groups:  KO, WT

show(test_y)

## TockyPrepData Object:
## Total cell number: 1974400 
## Variables:  file, Angle, Intensity, FSC.A, Timer.Blue, Timer.Red 
## Total sample number: 49 
## Groups:  KO, WT

The TockyKmeansRF function trains a model on the training dataset and then tests it on the independent test dataset.
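The function name and the call printed in its output (a randomForest model fitted to a table named cluster_train_data_wide) suggest a pipeline of k-means clustering of cells followed by a Random Forest on per-sample cluster features. The sketch below illustrates that general idea on simulated data using base R. It is not the package's implementation: the simulated values, the choice of Angle and Intensity as clustering variables, and the use of per-sample cluster proportions as features are all assumptions.

```r
set.seed(1)

# Simulated stand-in for per-cell Timer data (NOT real Tocky data):
# 6 samples per group, 200 cells each.
n_samples <- 12
cells_per_sample <- 200
samples <- data.frame(
  file  = paste0("sample_", seq_len(n_samples)),
  group = factor(rep(c("KO", "WT"), each = n_samples / 2))
)
cells <- do.call(rbind, lapply(seq_len(n_samples), function(i) {
  shift <- if (samples$group[i] == "KO") 10 else 0  # artificial group effect
  data.frame(
    file      = samples$file[i],
    Angle     = pmin(pmax(rnorm(cells_per_sample, 40 + shift, 15), 0), 90),
    Intensity = rnorm(cells_per_sample, 5, 1)
  )
}))

# Step 1: k-means on the scaled per-cell coordinates.
k <- 6
km <- kmeans(scale(cells[, c("Angle", "Intensity")]),
             centers = k, nstart = 5, iter.max = 50)
cells$cluster <- factor(km$cluster, levels = seq_len(k))

# Step 2: per-sample proportion of cells in each cluster (one row per sample).
counts <- table(cells$file, cells$cluster)
props  <- as.data.frame.matrix(prop.table(counts, margin = 1))
names(props) <- paste0("cluster_", names(props))
props$group <- samples$group[match(rownames(props), samples$file)]

# Step 3: Random Forest on the wide table, only if randomForest is installed.
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(group ~ ., data = props, ntree = 100)
  print(rf)
}
```

With one predictor column per cluster, randomForest's default for classification tries floor(sqrt(p)) variables at each split, which for 18 clusters gives 4. The actual analysis uses the package function on the real data, as follows.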

result_rf <- TockyKmeansRF(train_x, test_y, num_cluster = 18)

## Train Data: 
## 
## Call:
##  randomForest(formula = group ~ ., data = cluster_train_data_wide,      ntree = ntree) 
##                Type of random forest: classification
##                      Number of trees: 100
## No. of variables tried at each split: 4
## 
##         OOB estimate of  error rate: 0%
## Confusion matrix:
##    KO WT class.error
## KO 20  0           0
## WT  0 14           0
## Test Data: 
##     test_predictions
##      KO WT
##   KO 27  0
##   WT  1 21
## [1] "Accuracy: 0.979591836734694"
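The reported accuracy can be reproduced from the test table above: 48 of the 49 test samples are classified correctly. The short base-R check below re-enters the printed counts by hand. It assumes that the rows of the table are the true groups and the columns are the predictions, and the per-group recall is an additional summary that the function does not print.

```r
# Counts copied from the "Test Data" table (rows: actual, columns: predicted).
conf_mat <- matrix(
  c(27, 1,    # predicted KO: actual KO, actual WT
     0, 21),  # predicted WT: actual KO, actual WT
  nrow = 2,
  dimnames = list(actual = c("KO", "WT"), predicted = c("KO", "WT"))
)

# Overall accuracy: correctly classified samples over all samples.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Per-group recall: correct predictions over the number of samples in each group.
recall <- diag(conf_mat) / rowSums(conf_mat)

print(accuracy)
print(round(recall, 3))
```

Under that assumption, the single misclassified sample is a WT sample predicted as KO.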

Clustering Feature Cells

First, plot the importance scores from the trained model:

plotImportanceScores(result_rf, percentile = 0.6)

Next, use the output object result_rf from TockyKmeansRF to cluster feature cells.

result_rf <- ClusteringFeatureCells(result_rf, percentile = 0.6, eps = 2, minPts = 3)

The function ClusteringFeatureCells utilises the DBSCAN algorithm. The parameters eps and minPts may need to be adjusted to optimise the clustering results.
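To build intuition for these two parameters: in DBSCAN, a point is a core point when at least minPts points lie within distance eps of it, and clusters grow outward from core points. The base-R sketch below only flags core points on a small hand-made point set, so it is a partial illustration rather than a full DBSCAN. The helper is_core is hypothetical, and counting the point itself towards minPts follows the convention of the R dbscan package, which ClusteringFeatureCells is assumed, not confirmed, to follow.

```r
# Two tight groups of three points each, plus one isolated point.
pts <- rbind(c(0, 0), c(1, 0), c(0, 1),
             c(10, 10), c(11, 10), c(10, 11),
             c(50, 50))

# Hypothetical helper: a point is "core" if at least minPts points
# (including itself) lie within Euclidean distance eps.
is_core <- function(x, eps, minPts) {
  d <- as.matrix(dist(x))
  rowSums(d <= eps) >= minPts
}

core_default <- is_core(pts, eps = 2, minPts = 3)
core_small   <- is_core(pts, eps = 0.5, minPts = 3)

print(sum(core_default))  # 6: every point in the two tight groups
print(sum(core_small))    # 0: no point has two neighbours within 0.5
```

A larger eps or a smaller minPts makes it easier for points to qualify as core points and so tends to merge clusters, whereas the opposite settings label more cells as noise.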

Use violinPlotFeatureCells to analyse group-specific effects as captured by the TockyKmeansRF model.

p <- violinPlotFeatureCells(result_rf, ncol = 2)

## Warning: Removed 122 rows containing missing values or values outside the scale range
## (`geom_violin()`).

plot(p)

Marker Expression Analysis

Lastly, analyse the marker expression of the identified clusters.

p2 <- plotClusterMFI(test_y, result_rf, min_cells = 10, group = 'WT')
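The argument names suggest that clusters with fewer than min_cells cells are excluded before marker expression is summarised. The base-R sketch below illustrates that idea on a hypothetical per-cell table. It is not the implementation of plotClusterMFI: the data are made up, and using the median as the summary statistic is an assumption, since MFI can denote either mean or median fluorescence intensity.

```r
# Hypothetical per-cell table: a cluster label and one marker's fluorescence.
cells <- data.frame(
  cluster = c(rep("A", 12), rep("B", 4), rep("C", 15)),
  marker  = c(seq(100, 210, by = 10), c(5, 6, 7, 8), seq(10, 150, by = 10))
)

# Keep only clusters with at least min_cells cells.
min_cells <- 10
n_per_cluster <- table(cells$cluster)
keep <- names(n_per_cluster)[n_per_cluster >= min_cells]
kept_cells <- cells[cells$cluster %in% keep, ]

# Median fluorescence per retained cluster.
mfi <- tapply(kept_cells$marker, kept_cells$cluster, median)
print(mfi)
```

Here cluster B, with only four cells, is dropped, which avoids summarising expression from too few events.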

Final Notes

TockyRandomForest is a component of the comprehensive TockyMachineLearning package suite, designed to support advanced machine learning analyses in Tocky studies. Explore the other packages within the suite to fully leverage the potential of your datasets!

References

Bending, David, Paz Prieto Martin, Alina Paduraru, Catherine Ducker, Erik Marzaganov, Marie Laviron, Satsuki Kitano, Hitoshi Miyachi, Tessa Crompton, and Masahiro Ono. 2018. “A Timer for Analyzing Temporally Dynamic Changes in Transcription During Differentiation in Vivo.” Journal of Cell Biology 217 (8): 2931–50.

Ono, Masahiro. 2024a. “TockyLocus: Quantitative Analysis Methods for Flow Cytometric Fluorescent Timer Data.” https://arxiv.org/abs/2411.09386.

Ono, Masahiro. 2024b. “TockyPrep: Data Preprocessing Methods for Flow Cytometric Fluorescent Timer Analysis.” https://arxiv.org/abs/2411.04111.

Ono, Masahiro. 2025. “TockyPrep: Data Preprocessing Methods for Flow Cytometric Fluorescent Timer Analysis.” BMC Bioinformatics 26 (1): 44. https://doi.org/10.1186/s12859-025-06058-8.

Dr Masahiro Ono

2025-02-20