More PENCIL examples

We have prepared more examples to help users reproduce the results in our paper. First, let’s load the required packages:

library(Seurat)
library(reticulate)
library(scales)
library(ggplot2)

As before, the simulated datasets have been preprocessed into Seurat objects, and the two objects can be downloaded to a local directory:

source('data_download_2.R')

Apply PENCIL’s classification mode

In this example, the phenotype labels contain two conditions. We again use PENCIL’s classification mode to identify phenotype-enriched subpopulations.

Load the dataset

load('./data/PENCIL_tutorial_3.Rdata')
dim(sc_data)

## [1] 55737  6350

The condition labels can be visualized on the UMAP computed from the top 2000 most variable genes (MVG2000) as follows. On this UMAP, produced by the standard pipeline, the phenotype labels appear randomly distributed, so without gene selection it is difficult to identify phenotype-associated subpopulations with general clustering algorithms or KNN graph-based methods. All of the MVG2000 genes will later be input to PENCIL to detect the phenotype-associated cell subpopulations and genes.

DimPlot(sc_data, group.by = "cell_phenotype_labels_simulation", reduction = 'umap-mvg2000', pt.size=0.3)

The cell labels of the simulated data were actually generated from clusters computed on the expression of MVG1000-1300 (the ground truth genes, or GT genes). Clusters 3 and 14 (treated as one group) and cluster 5 serve as the ground truth groups (GT groups). Within each GT group, 90% of the cells are assigned the same class and the remaining 10% are randomly assigned the other class, which simulates phenotype-enriched subpopulations. All remaining cells are assigned a class label at random as background interference.
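The labelling scheme can be made concrete with a small NumPy sketch. It is an illustration under stated assumptions, not the code used to build this dataset: the function name simulate_class_labels, the cluster ids passed in, and the rule that the i-th GT group is enriched for class i are all hypothetical.

```python
import numpy as np

def simulate_class_labels(clusters, gt_groups, n_classes, frac_major=0.9, seed=0):
    """Hypothetical sketch of the labelling scheme described above.

    clusters:  cluster id for every cell.
    gt_groups: list of sets of cluster ids; the i-th GT group is assumed
               to be enriched for class i. Requires n_classes >= 2.
    """
    rng = np.random.default_rng(seed)
    clusters = np.asarray(clusters)
    # Background cells: a uniformly random class label for every cell first.
    labels = rng.integers(0, n_classes, size=clusters.size)
    for cls, group in enumerate(gt_groups):
        idx = np.flatnonzero(np.isin(clusters, list(group)))
        # Most cells in the GT group share one class ...
        labels[idx] = cls
        # ... and a random (1 - frac_major) share is moved to a different class.
        n_flip = int(round((1 - frac_major) * idx.size))
        flip = rng.choice(idx, size=n_flip, replace=False)
        offsets = rng.integers(1, n_classes, size=n_flip)  # never 0, so the class changes
        labels[flip] = (cls + offsets) % n_classes
    return labels
```

For example, simulate_class_labels(clusters, [{3, 14}, {5}], n_classes=2) mirrors the setup above, with clusters 3 and 14 enriched for the first class and cluster 5 for the second.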

# Colors: gray for background (non-GT) cells, plus one color per phenotype class
num_classes <- length(unique(sc_data$cell_phenotype_labels_simulation))
pal <- c('gray', hue_pal()(num_classes))

# A: clusters on the GT-gene UMAP; B: GT groups; C: simulated phenotype labels
A <- DimPlot(object=sc_data, reduction='umap', label=TRUE, pt.size=0.3) + theme(legend.position='none')
B <- DimPlot(object=sc_data, reduction='umap', label=TRUE, group.by="true_groups_simulation", cols=pal, pt.size=0.3)
C <- DimPlot(object=sc_data, reduction='umap', label=TRUE, group.by="cell_phenotype_labels_simulation", cols=pal[2:length(pal)], pt.size=0.3)
A + B + C

Execute PENCIL to identify phenotype-enriched subpopulations

PENCIL takes as input an expression matrix over MVG2000 (or more genes) together with the cell labels, and attempts to simultaneously locate the GT genes and the phenotype-enriched cell subpopulations they define.

We extract the data required by PENCIL from the Seurat object.

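# Scaled expression of the variable genes (MVG2000), genes x cells
exp_data = sc_data@assays[["RNA"]]@scale.data[VariableFeatures(sc_data),]
# Phenotype labels as a factor, then as 0-based integer class ids for Python
labels = as.factor(sc_data$cell_phenotype_labels_simulation)
class_names <- levels(labels)
labels_2_ids <- as.numeric(labels) - 1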
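The encoding above maps factor levels to 0-based ids, and the evaluation step later maps predictions back with class_names[pred + 1] in R. A minimal NumPy sketch of the same round trip, for readers who prefer to stay in Python; the helper names encode_labels and decode_predictions are hypothetical and not part of PENCIL.

```python
import numpy as np

def encode_labels(labels):
    """Map labels to 0-based ids, like as.numeric(factor(x)) - 1 in R.

    np.unique sorts the classes, mirroring R's default sorted factor levels
    (R's string ordering is locale-dependent, so exotic names may sort differently).
    """
    class_names, ids = np.unique(np.asarray(labels), return_inverse=True)
    return class_names.tolist(), ids

def decode_predictions(pred_ids, class_names):
    """Inverse mapping: 0-based ids back to class names."""
    return [class_names[int(i)] for i in pred_ids]

names, ids = encode_labels(["class_2", "class_1", "class_2"])
print(names, ids.tolist())  # ['class_1', 'class_2'] [1, 0, 1]
```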

Next, create a new Python chunk to run PENCIL; inside it, r.x accesses the R variable x through reticulate. Since binary classification has been merged into multi-classification, we still call PENCIL’s multi-classification mode here.

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # select a GPU id, or set to '-1' to run on the CPU

import numpy as np
from pencil import *

# Identifiers used for recording the results.
data_name = 'PENCIL_tutorial_3'
expr_id = '0.0.1'

# r.<name> reads an R object via reticulate; transpose genes x cells to cells x genes.
data = r.exp_data.T.copy()
labels = np.array(r.labels_2_ids, dtype=int)

mode = 'multi-classification'
pencil = Pencil(mode, select_genes=True, seed=1234, data_name=data_name, expr_id=expr_id)
pred, confidence = pencil.fit_transform(
    data, labels, 
    test=True, 
    shuffle_rate=1/4,
    lambda_L1=1e-6, 
    lambda_L2=1e-3, 
    lr=0.01,  
    class_weights=None,
    class_names=r.class_names, 
    plot_show=False
    )

## dataset: PENCIL_tutorial_3, expr_id: 0.0.1
## scheme: ce, Sigmoid
## searching c...
## cmin:0.000, cmax:2.000, c:1.000, rejected 0 cells.
## cmin:0.000, cmax:1.000, c:0.500, rejected 2732 cells.
## cmin:0.000, cmax:0.500, c:0.250, rejected 3007 cells.
## cmin:0.000, cmax:0.250, c:0.125, rejected 6079 cells.
## cmin:0.125, cmax:0.250, c:0.188, rejected 4475 cells.
## cmin:0.125, cmax:0.188, c:0.156, rejected 4899 cells.
## cmin:0.125, cmax:0.156, c:0.141, rejected 5285 cells.
## searched c: 0.125
## pretrain 500 epochs...
## cuda is available.
## epoch=0, loss=0.2724, mean_e=0.4199, mean_r=-0.0001, L1_reg=12.6648
## epoch=20, loss=0.2092, mean_e=0.4197, mean_r=-0.6085, L1_reg=91.6539
## epoch=40, loss=0.2052, mean_e=0.4200, mean_r=-0.9008, L1_reg=61.7682
## epoch=60, loss=0.2045, mean_e=0.4205, mean_r=-0.8943, L1_reg=54.1719
## epoch=80, loss=0.2044, mean_e=0.4216, mean_r=-0.8936, L1_reg=70.6355
## epoch=100, loss=0.2044, mean_e=0.4233, mean_r=-0.8745, L1_reg=95.9450
## epoch=120, loss=0.2043, mean_e=0.4241, mean_r=-0.8578, L1_reg=122.1529
## epoch=140, loss=0.2030, mean_e=0.4207, mean_r=-0.8378, L1_reg=144.4272
## epoch=160, loss=0.2023, mean_e=0.4201, mean_r=-0.8224, L1_reg=165.4748
## epoch=180, loss=0.2016, mean_e=0.4183, mean_r=-0.8099, L1_reg=187.8428
## epoch=200, loss=0.2003, mean_e=0.4159, mean_r=-0.7988, L1_reg=210.1351
## epoch=220, loss=0.1991, mean_e=0.4137, mean_r=-0.7892, L1_reg=234.9082
## epoch=240, loss=0.1975, mean_e=0.4103, mean_r=-0.7604, L1_reg=261.9368
## epoch=260, loss=0.1952, mean_e=0.4053, mean_r=-0.7333, L1_reg=291.2603
## epoch=280, loss=0.1925, mean_e=0.3985, mean_r=-0.7089, L1_reg=322.4242
## epoch=300, loss=0.1910, mean_e=0.3951, mean_r=-0.6915, L1_reg=352.7893
## epoch=320, loss=0.1899, mean_e=0.3928, mean_r=-0.6806, L1_reg=380.5822
## epoch=340, loss=0.1891, mean_e=0.3911, mean_r=-0.6697, L1_reg=405.9336
## epoch=360, loss=0.1884, mean_e=0.3898, mean_r=-0.6501, L1_reg=429.7545
## epoch=380, loss=0.1880, mean_e=0.3889, mean_r=-0.6512, L1_reg=451.5576
## epoch=400, loss=0.1874, mean_e=0.3880, mean_r=-0.6341, L1_reg=470.9637
## epoch=420, loss=0.1868, mean_e=0.3872, mean_r=-0.6219, L1_reg=490.0204
## epoch=440, loss=0.1863, mean_e=0.3864, mean_r=-0.6117, L1_reg=508.7124
## epoch=460, loss=0.1857, mean_e=0.3857, mean_r=-0.5972, L1_reg=526.8679
## epoch=480, loss=0.1850, mean_e=0.3849, mean_r=-0.5832, L1_reg=544.9135
## ---train time: 6.6931493282318115 seconds ---
## 
## Number of examples rejected= 4936 / 6350
## num_of_rejcted
## class_2           2509
## class_1           2427
## dtype: int64
## --- without rejection ---
##               precision    recall  f1-score   support
## 
##      class_1       0.66      0.59      0.62      3275
##      class_2       0.61      0.68      0.64      3075
## 
##     accuracy                           0.63      6350
##    macro avg       0.63      0.63      0.63      6350
## weighted avg       0.63      0.63      0.63      6350
## 
## --- with rejection ---
##               precision    recall  f1-score   support
## 
##      class_1       1.00      1.00      1.00       848
##      class_2       1.00      1.00      1.00       566
## 
##     accuracy                           1.00      1414
##    macro avg       1.00      1.00      1.00      1414
## weighted avg       1.00      1.00      1.00      1414
## 
## ---test time: 0.0655519962310791 seconds ---

Evaluate results

PENCIL can also plot the results directly in Python when a 2-D embedding is passed to pencil.fit_transform through the emd parameter:

emd <- sc_data@reductions[["umap"]]@cell.embeddings  # in R
pencil.fit_transform(..., emd=r.emd, plot_show=True)  # in Python; '...' stands for the other arguments used above

Instead, we prefer to pass the results back to R via ‘py$x’ and load them into the Seurat object, which allows more flexible visualization. We show the results on the UMAP generated from the GT genes to make comparison with the GT groups easier.

# Convert 0-based predictions to class names; cells with negative confidence are marked as rejected
pred_labels <- class_names[(py$pred+1)]
pred_labels[py$confidence < 0] = 'Rejected'
pred_labels_names = c('Rejected', as.character(class_names))
pred_labels <- factor(pred_labels, levels = pred_labels_names)
confidence <- as.data.frame(py$confidence, row.names=colnames(sc_data))

# Store predictions and confidence scores as cell metadata
sc_data <- AddMetaData(sc_data, metadata = pred_labels, col.name='pred_labels' )
sc_data <- AddMetaData(sc_data, metadata = confidence, col.name='confidence.score')

FeaturePlot(sc_data, features='confidence.score', pt.size=0.3)

DimPlot(object=sc_data, reduction='umap', label=T, group.by="pred_labels", cols=pal, pt.size=0.3)
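The two classification reports printed by fit_transform compare accuracy over all cells with accuracy over the cells that were not rejected. A minimal NumPy sketch of that comparison, assuming pred, confidence and the 0-based labels are the arrays from the Python chunk above and that a negative confidence marks a rejected cell (the rule used in the R code above). rejection_report is a hypothetical helper, and its numbers are not guaranteed to match PENCIL's printed report exactly.

```python
import numpy as np

def rejection_report(labels, pred, confidence):
    """Accuracy on all cells versus on retained cells only.

    Assumes a cell is rejected when its confidence score is negative.
    """
    labels = np.asarray(labels).ravel()
    pred = np.asarray(pred).ravel()
    keep = np.asarray(confidence).ravel() >= 0
    acc_all = float(np.mean(pred == labels))
    acc_kept = float(np.mean(pred[keep] == labels[keep])) if keep.any() else float("nan")
    return {"n_rejected": int((~keep).sum()), "acc_all": acc_all, "acc_retained": acc_kept}
```

Called as rejection_report(labels, pred, confidence) in the Python chunk, it returns the rejected-cell count together with both accuracies.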

Moreover, visualizing the gene weights learned by PENCIL shows that it selected only a small subset of the genes in this example, some of which indeed lie within the GT genes.

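# in a Python chunk
import numpy as np
import matplotlib.pyplot as plt

w = pencil.gene_weights(plot=True)  # learned gene weights; plot=True draws them
plt.close()
# Count genes whose absolute weight exceeds 0.1
print('number of selected genes: %d.' % np.sum(np.abs(w)>0.1))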

## number of selected genes: 162.
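To check how many of the selected genes fall inside the GT range, the weight vector can be indexed by gene rank. This is a minimal sketch under two assumptions not confirmed by the tutorial: that w follows the order of VariableFeatures() (most variable gene first), and that "MVG1000-1300" means 1-based ranks 1000 to 1300. The helper gt_overlap is hypothetical.

```python
import numpy as np

def gt_overlap(w, gt_start=1000, gt_end=1300, threshold=0.1):
    """Count selected genes and how many fall in the GT rank range.

    Assumes w is ordered like VariableFeatures() and that the GT range is
    given as 1-based ranks, i.e. 0-based indices gt_start-1 .. gt_end-1.
    """
    w = np.asarray(w).ravel()
    selected = np.flatnonzero(np.abs(w) > threshold)
    in_gt = (selected >= gt_start - 1) & (selected <= gt_end - 1)
    return {"n_selected": int(selected.size), "n_in_gt": int(in_gt.sum())}
```

The same call applies to the regression example below, where the selected genes are expected to lie mostly in MVG1000-1300.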

Apply PENCIL’s regression mode

Previously, we provided an example of regression without gene selection. Here we add a new simulated example with gene selection.

Load the dataset

In this dataset, the features of the input single-cell quantification matrix are genes.

load('./data/PENCIL_tutorial_4.Rdata')
dim(sc_data.2)

## [1] 55737  6350

We can visualize this dataset on the UMAP computed from the top 2000 most variable genes (MVG2000), coloring cells by their simulated timepoints. All of the MVG2000 genes will later be input to PENCIL.

DimPlot(sc_data.2, group.by = "cell_timepoints_simulation", reduction = 'umap-mvg2000', pt.size=0.3)

The simulated timepoint labels are again derived from clusters computed on the expression of MVG1000-1300 (the GT genes). Clusters 3, 9, 15, 6 and 1 are used as the GT groups, and each GT group is assigned its own timepoint. The remaining cells are again assigned a random timepoint label as background noise.
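The regression labels follow a simpler rule than the classification ones: one fixed timepoint per GT group and a random timepoint elsewhere. A hypothetical NumPy sketch, not the authors' simulation code; which timepoint goes to which cluster is not stated above, so the pairing of gt_clusters[i] with timepoints[i] is an assumption.

```python
import numpy as np

def simulate_timepoints(clusters, gt_clusters, timepoints, seed=0):
    """Each GT cluster gets one fixed timepoint; all other cells get a random one.

    gt_clusters[i] is assumed to receive timepoints[i].
    """
    rng = np.random.default_rng(seed)
    clusters = np.asarray(clusters)
    timepoints = np.asarray(timepoints)
    # Background noise: a random timepoint for every cell first.
    labels = rng.choice(timepoints, size=clusters.size)
    # Overwrite the GT groups with their assigned timepoint.
    for cl, tp in zip(gt_clusters, timepoints):
        labels[clusters == cl] = tp
    return labels
```

For instance, simulate_timepoints(clusters, [3, 9, 15, 6, 1], [1, 2, 3, 4, 5]) uses the GT clusters listed above with an illustrative choice of timepoints.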

A = DimPlot(sc_data.2, reduction = 'umap', pt.size=0.3, label = T) + theme(legend.position='none')
num_groups = length(unique(sc_data.2$true_groups_simulation))
pal = c(hue_pal()(num_groups-1), 'gray')
B = DimPlot(sc_data.2, group.by = "true_groups_simulation", reduction = 'umap', cols=pal, pt.size=0.3)
C = DimPlot(sc_data.2, group.by = "cell_timepoints_simulation", reduction = 'umap', pt.size=0.3)
A + B + C

Execute PENCIL to detect the phenotype-associated trajectory

We then extract the MVG2000 expression matrix and the timepoint labels, and execute PENCIL in Python.

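# Scaled expression of this object's own variable genes (MVG2000), genes x cells
exp_data = sc_data.2@assays[["RNA"]]@scale.data[VariableFeatures(sc_data.2),]
# Numeric timepoint labels for regression
labels = as.numeric(as.character(sc_data.2$cell_timepoints_simulation))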

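import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # select a GPU id, or set to '-1' to run on the CPU

import numpy as np
from pencil import *

# Identifiers used for recording the results.
data_name = 'PENCIL_tutorial_4'
expr_id = '0.0.1'

# r.<name> reads an R object via reticulate; transpose genes x cells to cells x genes.
data = r.exp_data.T.copy()
labels = np.array(r.labels)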

mode = 'regression'
pencil = Pencil(mode, select_genes=True, seed=1234, data_name=data_name, expr_id=expr_id, dropouts=[0.4, 0.0])
pred, confidence = pencil.fit_transform(
    data, labels, 
    test=True,
    shuffle_rate=0.06,
    lambda_L1=1e-5, 
    lambda_L2=1e-3, 
    lr=0.1, 
    epochs=2000, 
    rej_type='Sigmoid',
    class_weights=None,
    plot_show=False
    )

## dataset: PENCIL_tutorial_4, expr_id: 0.0.1
## scheme: sml1, Sigmoid
## searching c...
## cmin:0.000, cmax:2.000, c:1.000, rejected 4 cells.
## cmin:0.000, cmax:1.000, c:0.500, rejected 3758 cells.
## cmin:0.000, cmax:0.500, c:0.250, rejected 6068 cells.
## cmin:0.250, cmax:0.500, c:0.375, rejected 6350 cells.
## cmin:0.375, cmax:0.500, c:0.438, rejected 6239 cells.
## cmin:0.438, cmax:0.500, c:0.469, rejected 4570 cells.
## cmin:0.438, cmax:0.469, c:0.453, rejected 5779 cells.
## searched c: 0.453125
## cuda is available.
## epoch=0, loss=1.0701, mean_e=1.6800, mean_r=0.0108, L1_reg=22.4290
## epoch=20, loss=0.5689, mean_e=0.8676, mean_r=-1.0000, L1_reg=435.0388
## epoch=40, loss=0.5318, mean_e=0.7375, mean_r=-0.9997, L1_reg=218.8115
## epoch=60, loss=0.5175, mean_e=0.7054, mean_r=-0.8843, L1_reg=117.0956
## epoch=80, loss=0.5162, mean_e=0.7040, mean_r=-0.8777, L1_reg=71.6255
## epoch=100, loss=0.5170, mean_e=0.7052, mean_r=-0.8896, L1_reg=61.7723
## epoch=120, loss=0.5148, mean_e=0.7008, mean_r=-0.8828, L1_reg=67.0572
## epoch=140, loss=0.5133, mean_e=0.6980, mean_r=-0.8861, L1_reg=73.5912
## epoch=160, loss=0.5136, mean_e=0.6996, mean_r=-0.8770, L1_reg=81.7488
## epoch=180, loss=0.5117, mean_e=0.6951, mean_r=-0.8748, L1_reg=91.6401
## epoch=200, loss=0.5106, mean_e=0.6932, mean_r=-0.8594, L1_reg=101.8560
## epoch=220, loss=0.5096, mean_e=0.6926, mean_r=-0.8382, L1_reg=115.2662
## epoch=240, loss=0.5111, mean_e=0.6975, mean_r=-0.8435, L1_reg=125.9257
## epoch=260, loss=0.5100, mean_e=0.6927, mean_r=-0.8407, L1_reg=130.4958
## epoch=280, loss=0.5091, mean_e=0.6935, mean_r=-0.8184, L1_reg=134.8845
## epoch=300, loss=0.5119, mean_e=0.7010, mean_r=-0.8375, L1_reg=133.4451
## epoch=320, loss=0.5090, mean_e=0.6961, mean_r=-0.8279, L1_reg=132.6670
## epoch=340, loss=0.5067, mean_e=0.6890, mean_r=-0.8316, L1_reg=138.9634
## epoch=360, loss=0.5065, mean_e=0.6896, mean_r=-0.8208, L1_reg=150.0667
## epoch=380, loss=0.5041, mean_e=0.6831, mean_r=-0.8002, L1_reg=165.2169
## epoch=400, loss=0.5020, mean_e=0.6768, mean_r=-0.8066, L1_reg=179.9017
## epoch=420, loss=0.5038, mean_e=0.6805, mean_r=-0.7947, L1_reg=194.2641
## epoch=440, loss=0.5049, mean_e=0.6836, mean_r=-0.7970, L1_reg=196.4059
## epoch=460, loss=0.5066, mean_e=0.6864, mean_r=-0.8276, L1_reg=198.7805
## epoch=480, loss=0.5122, mean_e=0.6960, mean_r=-0.8778, L1_reg=188.8816
## epoch=500, loss=0.5145, mean_e=0.7086, mean_r=-0.8525, L1_reg=178.2662
## epoch=520, loss=0.5087, mean_e=0.6888, mean_r=-0.8559, L1_reg=167.9366
## epoch=540, loss=0.5090, mean_e=0.6886, mean_r=-0.8542, L1_reg=165.3694
## epoch=560, loss=0.5088, mean_e=0.6838, mean_r=-0.8849, L1_reg=166.4413
## epoch=580, loss=0.5040, mean_e=0.6790, mean_r=-0.8373, L1_reg=173.5159
## epoch=600, loss=0.5018, mean_e=0.6738, mean_r=-0.7803, L1_reg=194.7737
## epoch=620, loss=0.5025, mean_e=0.6742, mean_r=-0.8352, L1_reg=214.9658
## epoch=640, loss=0.5032, mean_e=0.6785, mean_r=-0.7668, L1_reg=218.9161
## epoch=660, loss=0.5034, mean_e=0.6833, mean_r=-0.7691, L1_reg=220.2961
## epoch=680, loss=0.5386, mean_e=0.7457, mean_r=-0.8257, L1_reg=211.9185
## epoch=700, loss=0.5123, mean_e=0.7031, mean_r=-0.8365, L1_reg=176.3528
## epoch=720, loss=0.5085, mean_e=0.6913, mean_r=-0.8557, L1_reg=156.0370
## epoch=740, loss=0.5040, mean_e=0.6851, mean_r=-0.7974, L1_reg=166.0924
## epoch=760, loss=0.5013, mean_e=0.6742, mean_r=-0.7561, L1_reg=192.0086
## epoch=780, loss=0.4979, mean_e=0.6697, mean_r=-0.7633, L1_reg=211.9528
## epoch=800, loss=0.4959, mean_e=0.6627, mean_r=-0.6958, L1_reg=247.8159
## epoch=820, loss=0.4939, mean_e=0.6558, mean_r=-0.7044, L1_reg=281.8339
## epoch=840, loss=0.4959, mean_e=0.6618, mean_r=-0.6953, L1_reg=287.1404
## epoch=860, loss=0.5023, mean_e=0.6849, mean_r=-0.6994, L1_reg=277.1692
## epoch=880, loss=0.5067, mean_e=0.6888, mean_r=-0.7619, L1_reg=258.4803
## epoch=900, loss=0.5158, mean_e=0.7237, mean_r=-0.7126, L1_reg=243.4805
## epoch=920, loss=0.5559, mean_e=0.8295, mean_r=-0.9953, L1_reg=142.2109
## epoch=940, loss=0.5376, mean_e=0.7621, mean_r=-0.9999, L1_reg=138.4370
## epoch=960, loss=0.5204, mean_e=0.6979, mean_r=-0.9940, L1_reg=140.0235
## epoch=980, loss=0.5098, mean_e=0.6768, mean_r=-0.9107, L1_reg=145.3437
## epoch=1000, loss=0.5052, mean_e=0.6683, mean_r=-0.8637, L1_reg=155.3289
## epoch=1020, loss=0.4992, mean_e=0.6637, mean_r=-0.7041, L1_reg=172.8886
## epoch=1040, loss=0.4957, mean_e=0.6614, mean_r=-0.6318, L1_reg=184.1977
## epoch=1060, loss=0.4988, mean_e=0.6754, mean_r=-0.5746, L1_reg=197.6059
## epoch=1080, loss=0.4961, mean_e=0.6719, mean_r=-0.6054, L1_reg=208.1211
## epoch=1100, loss=0.5031, mean_e=0.6855, mean_r=-0.6141, L1_reg=213.6350
## epoch=1120, loss=0.4942, mean_e=0.6773, mean_r=-0.6249, L1_reg=219.0787
## epoch=1140, loss=0.4973, mean_e=0.6847, mean_r=-0.6007, L1_reg=230.1927
## epoch=1160, loss=0.4865, mean_e=0.6612, mean_r=-0.5667, L1_reg=250.4260
## epoch=1180, loss=0.4827, mean_e=0.6525, mean_r=-0.5773, L1_reg=271.4144
## epoch=1200, loss=0.4811, mean_e=0.6430, mean_r=-0.5366, L1_reg=304.0663
## epoch=1220, loss=0.4821, mean_e=0.6433, mean_r=-0.5292, L1_reg=329.3247
## epoch=1240, loss=0.4885, mean_e=0.6600, mean_r=-0.5239, L1_reg=322.0790
## epoch=1260, loss=0.4849, mean_e=0.6544, mean_r=-0.5429, L1_reg=317.0919
## epoch=1280, loss=0.4958, mean_e=0.6734, mean_r=-0.5859, L1_reg=303.5660
## epoch=1300, loss=0.5046, mean_e=0.7037, mean_r=-0.6003, L1_reg=282.6642
## epoch=1320, loss=0.4953, mean_e=0.6808, mean_r=-0.6131, L1_reg=272.8597
## epoch=1340, loss=0.4884, mean_e=0.6678, mean_r=-0.5812, L1_reg=280.5821
## epoch=1360, loss=0.4835, mean_e=0.6563, mean_r=-0.5381, L1_reg=296.5276
## epoch=1380, loss=0.4828, mean_e=0.6512, mean_r=-0.5479, L1_reg=308.6104
## epoch=1400, loss=0.4744, mean_e=0.6284, mean_r=-0.4841, L1_reg=334.0766
## epoch=1420, loss=0.4788, mean_e=0.6306, mean_r=-0.5067, L1_reg=360.8175
## epoch=1440, loss=0.4837, mean_e=0.6445, mean_r=-0.5243, L1_reg=351.2635
## epoch=1460, loss=0.4898, mean_e=0.6606, mean_r=-0.5271, L1_reg=346.8650
## epoch=1480, loss=0.5020, mean_e=0.6907, mean_r=-0.5802, L1_reg=337.7504
## epoch=1500, loss=0.5185, mean_e=0.7161, mean_r=-0.7800, L1_reg=310.8184
## epoch=1520, loss=0.4962, mean_e=0.6775, mean_r=-0.6213, L1_reg=282.0571
## epoch=1540, loss=0.4878, mean_e=0.6629, mean_r=-0.5661, L1_reg=282.5089
## epoch=1560, loss=0.4849, mean_e=0.6487, mean_r=-0.5741, L1_reg=292.3632
## epoch=1580, loss=0.4817, mean_e=0.6427, mean_r=-0.4344, L1_reg=307.2570
## epoch=1600, loss=0.4808, mean_e=0.6340, mean_r=-0.5572, L1_reg=332.8128
## epoch=1620, loss=0.4833, mean_e=0.6297, mean_r=-0.6440, L1_reg=355.0899
## epoch=1640, loss=0.4812, mean_e=0.6386, mean_r=-0.5208, L1_reg=352.6810
## epoch=1660, loss=0.4840, mean_e=0.6496, mean_r=-0.5084, L1_reg=354.8397
## epoch=1680, loss=0.4993, mean_e=0.6871, mean_r=-0.5817, L1_reg=344.6962
## epoch=1700, loss=0.4969, mean_e=0.6830, mean_r=-0.5823, L1_reg=331.5757
## epoch=1720, loss=0.4933, mean_e=0.6802, mean_r=-0.5909, L1_reg=320.1424
## epoch=1740, loss=0.4847, mean_e=0.6614, mean_r=-0.5080, L1_reg=323.7273
## epoch=1760, loss=0.4816, mean_e=0.6534, mean_r=-0.5185, L1_reg=339.1985
## epoch=1780, loss=0.4750, mean_e=0.6340, mean_r=-0.4764, L1_reg=359.1059
## epoch=1800, loss=0.4695, mean_e=0.6183, mean_r=-0.4517, L1_reg=391.1357
## epoch=1820, loss=0.4720, mean_e=0.6134, mean_r=-0.4572, L1_reg=419.3006
## epoch=1840, loss=0.4833, mean_e=0.6393, mean_r=-0.4671, L1_reg=406.6838
## epoch=1860, loss=0.4826, mean_e=0.6476, mean_r=-0.4553, L1_reg=396.1197
## epoch=1880, loss=0.4932, mean_e=0.6652, mean_r=-0.5051, L1_reg=377.5341
## epoch=1900, loss=0.4982, mean_e=0.6792, mean_r=-0.5540, L1_reg=357.5332
## epoch=1920, loss=0.4969, mean_e=0.6792, mean_r=-0.6122, L1_reg=340.0559
## epoch=1940, loss=0.4822, mean_e=0.6555, mean_r=-0.4997, L1_reg=340.4936
## epoch=1960, loss=0.4809, mean_e=0.6510, mean_r=-0.5255, L1_reg=354.7948
## epoch=1980, loss=0.4754, mean_e=0.6327, mean_r=-0.5227, L1_reg=378.1031
## ---train time: 20.95675015449524 seconds ---
## 
## Number of examples rejected= 4506 / 6350
## ---test time: 0.039072513580322266 seconds ---

Evaluate results

Add PENCIL’s results to the Seurat object for visualization.

# Store PENCIL's predicted time and confidence scores as cell metadata
pred.time <- as.vector(py$pred)
sc_data.2 <- AddMetaData(sc_data.2, metadata = pred.time, col.name='pred.time' )
sc_data.2 <- AddMetaData(sc_data.2, metadata = py$confidence, col.name='confidence.score')
FeaturePlot(sc_data.2, features = 'confidence.score', pt.size=0.3, reduction = 'umap')

# Show predicted time only for retained cells (confidence > 0)
FeaturePlot(sc_data.2, features = 'pred.time', cells=Cells(sc_data.2)[sc_data.2$confidence.score > 0], pt.size = 0.3, reduction = 'umap') + scale_colour_gradientn(colours=c("red","green","blue"))
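Beyond the plot, agreement between the predicted and simulated time on the retained cells can be summarized with a single number. A minimal sketch using Pearson correlation, assuming labels, pred and confidence are the arrays from the regression Python chunk and that cells with confidence > 0 are retained, as in the FeaturePlot call above; retained_time_agreement is a hypothetical helper and the choice of Pearson correlation is ours, not PENCIL's.

```python
import numpy as np

def retained_time_agreement(labels, pred, confidence):
    """Pearson correlation between simulated and predicted time on retained cells.

    Assumes cells with confidence > 0 are retained.
    """
    labels = np.asarray(labels, dtype=float).ravel()
    pred = np.asarray(pred, dtype=float).ravel()
    keep = np.asarray(confidence).ravel() > 0
    if keep.sum() < 2:
        return float("nan")  # correlation is undefined for fewer than two cells
    return float(np.corrcoef(labels[keep], pred[keep])[0, 1])
```

Rejected cells are excluded because PENCIL does not claim a reliable prediction for them.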

Visualizing the gene weights learned by PENCIL shows that the selected genes indeed lie mostly within MVG1000-1300 (the GT genes).

# in a Python chunk
import matplotlib.pyplot as plt

w = pencil.gene_weights(plot=True)  # learned gene weights; plot=True draws them
plt.close()

PENCIL Tutorial in R (additional)

Tao Ren

2023-01-03
