We have prepared additional examples to help users reproduce the results in our paper. First, load the required packages:
library(Seurat)
library(reticulate)
library(scales)
library(ggplot2)
As before, the simulated datasets have been preprocessed into Seurat objects, which can be downloaded to a local directory with:
source('data_download_2.R')
In this example, the phenotype labels contain two conditions. We again use the classification mode of PENCIL to identify phenotype-enriched subpopulations.
load('./data/PENCIL_tutorial_3.Rdata')
dim(sc_data)
## [1] 55737 6350
The condition labels can be visualized on the UMAP computed from the top 2000 most variable genes (MVG2000) as follows. On this UMAP, which was generated with the standard workflow, the cell phenotype labels appear almost randomly distributed. Without gene selection, it is therefore difficult to identify phenotype-associated subpopulations using general clustering algorithms or KNN graph-based methods. All of the MVG2000 will later be passed to PENCIL to detect the phenotypic cell subpopulations and the associated genes.
DimPlot(sc_data, group.by = "cell_phenotype_labels_simulation", reduction = 'umap-mvg2000', pt.size=0.3)
The cell labels of the simulated data were generated from a clustering of the cells on the expression levels of MVG1000-1300 (ground truth genes, GT genes). Clusters (3, 14) and 5 are used as ground truth groups (GT groups). In each GT group, 90% of the cells are assigned the same class, and the remaining 10% are randomly assigned other class labels to simulate phenotype-enriched subpopulations. All other cells are randomly assigned a class label as background interference.
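The labelling scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used to build the tutorial data: the function name `simulate_labels`, the toy cluster sizes, and the choice of which class dominates which GT group are all hypothetical.

```python
import random

def simulate_labels(clusters, gt_groups, classes, purity=0.9, seed=0):
    """Assign one phenotype label per cell.

    clusters  : list of cluster ids, one per cell
    gt_groups : dict mapping a dominant class to the set of clusters forming that GT group
    classes   : list of all class names
    purity    : fraction of cells in a GT group that receive the dominant class
    """
    rng = random.Random(seed)
    # Background interference: every cell starts with a uniformly random class.
    labels = [rng.choice(classes) for _ in clusters]
    for dominant, members in gt_groups.items():
        idx = [i for i, c in enumerate(clusters) if c in members]
        rng.shuffle(idx)
        n_pure = round(purity * len(idx))
        # The first 90% of the shuffled group get the dominant class ...
        for i in idx[:n_pure]:
            labels[i] = dominant
        # ... and the remaining 10% get one of the other classes at random.
        others = [c for c in classes if c != dominant]
        for i in idx[n_pure:]:
            labels[i] = rng.choice(others)
    return labels

# Toy data: which class each GT group is enriched for is an assumption.
clusters = [3] * 1000 + [14] * 1000 + [5] * 1000 + [0] * 3000
gt_groups = {"class_1": {3, 14}, "class_2": {5}}
labels = simulate_labels(clusters, gt_groups, ["class_1", "class_2"])

in_group_1 = [labels[i] for i, c in enumerate(clusters) if c in {3, 14}]
print(in_group_1.count("class_1") / len(in_group_1))  # 0.9
```

Because the background cells carry labels that are unrelated to their expression profiles, a method that cannot reject cells has no way to separate them from the GT groups.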
num_classes <- length(unique(sc_data$cell_phenotype_labels_simulation))
pal <- c('gray', hue_pal()(num_classes))
A <- DimPlot(object=sc_data, reduction='umap', label=TRUE, pt.size=0.3) + theme(legend.position='none')
B <- DimPlot(object=sc_data, reduction='umap', label=TRUE, group.by="true_groups_simulation", cols=pal, pt.size=0.3)
C <- DimPlot(object=sc_data, reduction='umap', label=TRUE, group.by="cell_phenotype_labels_simulation", cols=pal[2:length(pal)], pt.size=0.3)
A + B + C
PENCIL takes as input a matrix of expression data from MVG2000 (or more genes) and cell labels in an attempt to simultaneously localize GT genes and the cell subpopulations from which they arise.
We extract the data required by PENCIL from the Seurat object.
exp_data = sc_data@assays[["RNA"]]@scale.data[VariableFeatures(sc_data),]
labels = as.factor(sc_data$cell_phenotype_labels_simulation)
class_names <- levels(labels)
labels_2_ids <- as.numeric(labels) - 1
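The `- 1` in the last line matters because R factor codes start at 1, while the Python side appears to expect class ids starting at 0. A minimal Python sketch of the same encoding is shown here; `encode_labels` is a hypothetical helper, and it sorts class names by code point, which may differ from R's locale-dependent factor ordering for unusual names.

```python
def encode_labels(labels):
    """Return (class_names, ids), where ids are zero-based positions in class_names."""
    class_names = sorted(set(labels))
    index = {name: i for i, name in enumerate(class_names)}
    return class_names, [index[x] for x in labels]

class_names, ids = encode_labels(["class_2", "class_1", "class_1", "class_2"])
print(class_names)  # ['class_1', 'class_2']
print(ids)          # [1, 0, 0, 1]
```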
Then we can create a new Python chunk to run PENCIL and use r.x to pass R variables into Python. Since binary classification has been merged into multi-class classification, we still call PENCIL’s multi-classification mode here.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # select a GPU id; set to '-1' to hide all GPUs and run on the CPU
from pencil import *
# For recording the results.
data_name = 'PENCIL_tutorial_3'
expr_id = '0.0.1'
data = r.exp_data.T.copy()
labels = np.array(r.labels_2_ids, dtype=int)
mode = 'multi-classification'
pencil = Pencil(mode, select_genes=True, seed=1234, data_name=data_name, expr_id=expr_id)
pred, confidence = pencil.fit_transform(
    data, labels,
    test=True,
    shuffle_rate=1/4,
    lambda_L1=1e-6,
    lambda_L2=1e-3,
    lr=0.01,
    class_weights=None,
    class_names=r.class_names,
    plot_show=False
)
## dataset: PENCIL_tutorial_3, expr_id: 0.0.1
## scheme: ce, Sigmoid
## searching c...
## cmin:0.000, cmax:2.000, c:1.000, rejected 0 cells.
## cmin:0.000, cmax:1.000, c:0.500, rejected 2732 cells.
## cmin:0.000, cmax:0.500, c:0.250, rejected 3007 cells.
## cmin:0.000, cmax:0.250, c:0.125, rejected 6079 cells.
## cmin:0.125, cmax:0.250, c:0.188, rejected 4475 cells.
## cmin:0.125, cmax:0.188, c:0.156, rejected 4899 cells.
## cmin:0.125, cmax:0.156, c:0.141, rejected 5285 cells.
## searched c: 0.125
## pretrain 500 epochs...
## cuda is available.
## epoch=0, loss=0.2724, mean_e=0.4199, mean_r=-0.0001, L1_reg=12.6648
## epoch=20, loss=0.2092, mean_e=0.4197, mean_r=-0.6085, L1_reg=91.6539
## epoch=40, loss=0.2052, mean_e=0.4200, mean_r=-0.9008, L1_reg=61.7682
## epoch=60, loss=0.2045, mean_e=0.4205, mean_r=-0.8943, L1_reg=54.1719
## epoch=80, loss=0.2044, mean_e=0.4216, mean_r=-0.8936, L1_reg=70.6355
## epoch=100, loss=0.2044, mean_e=0.4233, mean_r=-0.8745, L1_reg=95.9450
## epoch=120, loss=0.2043, mean_e=0.4241, mean_r=-0.8578, L1_reg=122.1529
## epoch=140, loss=0.2030, mean_e=0.4207, mean_r=-0.8378, L1_reg=144.4272
## epoch=160, loss=0.2023, mean_e=0.4201, mean_r=-0.8224, L1_reg=165.4748
## epoch=180, loss=0.2016, mean_e=0.4183, mean_r=-0.8099, L1_reg=187.8428
## epoch=200, loss=0.2003, mean_e=0.4159, mean_r=-0.7988, L1_reg=210.1351
## epoch=220, loss=0.1991, mean_e=0.4137, mean_r=-0.7892, L1_reg=234.9082
## epoch=240, loss=0.1975, mean_e=0.4103, mean_r=-0.7604, L1_reg=261.9368
## epoch=260, loss=0.1952, mean_e=0.4053, mean_r=-0.7333, L1_reg=291.2603
## epoch=280, loss=0.1925, mean_e=0.3985, mean_r=-0.7089, L1_reg=322.4242
## epoch=300, loss=0.1910, mean_e=0.3951, mean_r=-0.6915, L1_reg=352.7893
## epoch=320, loss=0.1899, mean_e=0.3928, mean_r=-0.6806, L1_reg=380.5822
## epoch=340, loss=0.1891, mean_e=0.3911, mean_r=-0.6697, L1_reg=405.9336
## epoch=360, loss=0.1884, mean_e=0.3898, mean_r=-0.6501, L1_reg=429.7545
## epoch=380, loss=0.1880, mean_e=0.3889, mean_r=-0.6512, L1_reg=451.5576
## epoch=400, loss=0.1874, mean_e=0.3880, mean_r=-0.6341, L1_reg=470.9637
## epoch=420, loss=0.1868, mean_e=0.3872, mean_r=-0.6219, L1_reg=490.0204
## epoch=440, loss=0.1863, mean_e=0.3864, mean_r=-0.6117, L1_reg=508.7124
## epoch=460, loss=0.1857, mean_e=0.3857, mean_r=-0.5972, L1_reg=526.8679
## epoch=480, loss=0.1850, mean_e=0.3849, mean_r=-0.5832, L1_reg=544.9135
## ---train time: 6.6931493282318115 seconds ---
##
## Number of examples rejected= 4936 / 6350
## num_of_rejcted
## class_2 2509
## class_1 2427
## dtype: int64
## --- without rejection ---
##               precision    recall  f1-score   support
##
##      class_1       0.66      0.59      0.62      3275
##      class_2       0.61      0.68      0.64      3075
##
##     accuracy                           0.63      6350
##    macro avg       0.63      0.63      0.63      6350
## weighted avg       0.63      0.63      0.63      6350
##
## --- with rejection ---
##               precision    recall  f1-score   support
##
##      class_1       1.00      1.00      1.00       848
##      class_2       1.00      1.00      1.00       566
##
##     accuracy                           1.00      1414
##    macro avg       1.00      1.00      1.00      1414
## weighted avg       1.00      1.00      1.00      1414
##
## ---test time: 0.0655519962310791 seconds ---
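The two reports above differ only in which cells are scored: all cells in the first, and only the retained cells in the second. The following sketch shows that idea with plain Python on toy values. It is not PENCIL's own evaluation code; the helper `accuracy` is hypothetical, and it assumes that a negative confidence marks a rejected cell.

```python
def accuracy(true_ids, pred_ids, keep=None):
    """Fraction of matching labels, optionally restricted to cells where keep[i] is True."""
    pairs = [
        (t, p)
        for i, (t, p) in enumerate(zip(true_ids, pred_ids))
        if keep is None or keep[i]
    ]
    if not pairs:
        return float("nan")
    return sum(t == p for t, p in pairs) / len(pairs)

# Toy values, not taken from the run above.
true_ids   = [0, 0, 1, 1, 0, 1]
pred_ids   = [0, 1, 1, 0, 0, 1]
confidence = [0.9, -0.4, 0.7, -0.2, 0.8, 0.6]

keep = [c >= 0 for c in confidence]  # assumed convention: confidence < 0 means rejected

print(round(accuracy(true_ids, pred_ids), 3))  # 0.667
print(accuracy(true_ids, pred_ids, keep))      # 1.0
print(f"rejected {keep.count(False)} / {len(keep)}")  # rejected 2 / 6
```

In the simulated data the labels of background cells are random, so a large gap between the two scores is expected once those cells are rejected.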
The results can be plotted directly in Python by passing the emd parameter to pencil.fit_transform.
emd <- sc_data@reductions[["umap"]]@cell.embeddings #R
pencil.fit_transform(..., emd=r.emd, plot_show=True) #Python
However, we prefer another approach: passing the results back to R via ‘py$x’ and loading them into the Seurat object for more flexible visualization. We present the results on the UMAP generated from the GT genes to facilitate comparison with the GT groups.
pred_labels <- class_names[(py$pred+1)]
pred_labels[py$confidence < 0] = 'Rejected'
pred_labels_names = c('Rejected', as.character(class_names))
pred_labels <- factor(pred_labels, levels = pred_labels_names)
confidence <- as.data.frame(py$confidence, row.names=colnames(sc_data))
sc_data <- AddMetaData(sc_data, metadata = pred_labels, col.name='pred_labels' )
sc_data <- AddMetaData(sc_data, metadata = confidence, col.name='confidence.score')
FeaturePlot(sc_data, features='confidence.score', pt.size=0.3)
DimPlot(object=sc_data, reduction='umap', label=T, group.by="pred_labels", cols=pal, pt.size=0.3)
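For readers who stay on the Python side, the relabelling done in R above can be mirrored as follows. This is a sketch with a hypothetical helper, `label_predictions`, and toy inputs; like the R code, it treats a negative confidence as a rejection.

```python
from collections import Counter

def label_predictions(pred_ids, confidence, class_names, rejected="Rejected"):
    """Map zero-based class ids to names; cells with negative confidence become `rejected`."""
    return [
        rejected if conf < 0 else class_names[pid]
        for pid, conf in zip(pred_ids, confidence)
    ]

# Toy inputs standing in for `pred` and `confidence`.
pred_ids = [0, 1, 1, 0, 1]
confidence = [0.8, -0.3, 0.5, -0.9, 0.2]

names = label_predictions(pred_ids, confidence, ["class_1", "class_2"])
print(names)  # ['class_1', 'Rejected', 'class_2', 'Rejected', 'class_2']

# Per-label counts; the non-rejected counts play the role of "support" in the report.
counts = Counter(names)
print(counts["class_1"], counts["class_2"], counts["Rejected"])  # 1 2 2
```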
Moreover, by visualizing the gene weights learned by PENCIL, we found that PENCIL selected only a small number of genes in this example, and that some of these genes do fall within the GT genes.
# in a Python chunk
w = pencil.gene_weights(plot=True)
plt.close()
print('number of selected genes: %d.' % np.sum(np.abs(w)>0.1))
## number of selected genes: 162.
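To quantify how many selected genes fall within the GT genes, one could threshold the weights as above and check their positions. The sketch below uses toy weights and two hypothetical helpers. It assumes that the weight vector follows the order of VariableFeatures, so that a gene's position equals its variability rank, and that the GT genes occupy zero-based positions 1000 to 1299; both are assumptions about the data layout rather than documented facts.

```python
def selected_genes(weights, threshold=0.1):
    """Indices of genes whose absolute weight exceeds the threshold."""
    return [i for i, w in enumerate(weights) if abs(w) > threshold]

def fraction_in_range(indices, start, stop):
    """Share of indices i with start <= i < stop."""
    if not indices:
        return 0.0
    return sum(start <= i < stop for i in indices) / len(indices)

# Toy weights for 2000 genes.
weights = [0.0] * 2000
for i in (1005, 1100, 1250, 1299):
    weights[i] = 0.5       # selected, inside the assumed GT range
weights[42] = -0.3         # selected, outside the assumed GT range
weights[1500] = 0.05       # below the threshold, so not selected

sel = selected_genes(weights)
print(len(sel))                            # 5
print(fraction_in_range(sel, 1000, 1300))  # 0.8
```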
Previously, we provided an example of regression without gene selection. Here we add a new demo of regression with gene selection on simulated data.
In this dataset, the features of the input single-cell quantification matrix are genes.
load('./data/PENCIL_tutorial_4.Rdata')
dim(sc_data.2)
## [1] 55737 6350
We can visualize this dataset using the UMAP coordinates generated from the top 2000 most variable genes (MVG2000) and color by the simulated cell timepoints. All of the MVG2000 will be input to PENCIL later.
DimPlot(sc_data.2, group.by = "cell_timepoints_simulation", reduction = 'umap-mvg2000', pt.size=0.3)
The simulated timepoint labels are again obtained from a clustering of the cells on the expression levels of MVG1000-1300 (ground truth genes, GT genes). Clusters 3, 9, 15, 6 and 1 are set as the ground truth groups (GT groups), and each GT group is assigned its own timepoint. The other cells are again randomly assigned a timepoint label as background noise.
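The timepoint assignment can be sketched in the same spirit. The snippet is illustrative only: `simulate_timepoints` is a hypothetical helper, and the timepoint values 1 to 5 and their assignment in the listed cluster order are assumptions, since the text does not state which timepoint each GT group receives.

```python
import random

def simulate_timepoints(clusters, gt_order, seed=0):
    """Give each GT cluster its own timepoint; every other cell gets a random one."""
    rng = random.Random(seed)
    # Assumed mapping: the k-th listed cluster receives timepoint k (starting at 1).
    timepoint_of = {cluster: t for t, cluster in enumerate(gt_order, start=1)}
    timepoints = list(timepoint_of.values())
    return [
        timepoint_of[c] if c in timepoint_of else rng.choice(timepoints)
        for c in clusters
    ]

gt_order = [3, 9, 15, 6, 1]           # cluster ids from the text
clusters = [3, 9, 15, 6, 1, 0, 2, 4]  # toy cells; 0, 2 and 4 are background clusters
tp = simulate_timepoints(clusters, gt_order)
print(tp[:5])  # [1, 2, 3, 4, 5]
```

The background cells in positions 5 to 7 receive one of the five timepoints at random, which is the noise PENCIL is expected to reject.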
A = DimPlot(sc_data.2, reduction = 'umap', pt.size=0.3, label = TRUE) + theme(legend.position='none')
num_groups = length(unique(sc_data.2$true_groups_simulation))
pal = c(hue_pal()(num_groups-1), 'gray')
B = DimPlot(sc_data.2, group.by = "true_groups_simulation", reduction = 'umap', cols=pal, pt.size=0.3)
C = DimPlot(sc_data.2, group.by = "cell_timepoints_simulation", reduction = 'umap', pt.size=0.3)
A + B + C
We then extract the MVG2000 expression matrix and the timepoint labels, and execute PENCIL in Python.
exp_data = sc_data.2@assays[["RNA"]]@scale.data[VariableFeatures(sc_data.2),]
labels = as.numeric(as.character(sc_data.2$cell_timepoints_simulation))
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # select a GPU id; set to '-1' to hide all GPUs and run on the CPU
from pencil import *
# For recording the results.
data_name = 'PENCIL_tutorial_4'
expr_id = '0.0.1'
data = r.exp_data.T.copy()
labels = np.array(r.labels)
mode = 'regression'
pencil = Pencil(mode, select_genes=True, seed=1234, data_name=data_name, expr_id=expr_id, dropouts=[0.4, 0.0])
pred, confidence = pencil.fit_transform(
    data, labels,
    test=True,
    shuffle_rate=0.06,
    lambda_L1=1e-5,
    lambda_L2=1e-3,
    lr=0.1,
    epochs=2000,
    rej_type='Sigmoid',
    class_weights=None,
    plot_show=False
)
## dataset: PENCIL_tutorial_4, expr_id: 0.0.1
## scheme: sml1, Sigmoid
## searching c...
## cmin:0.000, cmax:2.000, c:1.000, rejected 4 cells.
## cmin:0.000, cmax:1.000, c:0.500, rejected 3758 cells.
## cmin:0.000, cmax:0.500, c:0.250, rejected 6068 cells.
## cmin:0.250, cmax:0.500, c:0.375, rejected 6350 cells.
## cmin:0.375, cmax:0.500, c:0.438, rejected 6239 cells.
## cmin:0.438, cmax:0.500, c:0.469, rejected 4570 cells.
## cmin:0.438, cmax:0.469, c:0.453, rejected 5779 cells.
## searched c: 0.453125
## cuda is available.
## epoch=0, loss=1.0701, mean_e=1.6800, mean_r=0.0108, L1_reg=22.4290
## epoch=20, loss=0.5689, mean_e=0.8676, mean_r=-1.0000, L1_reg=435.0388
## epoch=40, loss=0.5318, mean_e=0.7375, mean_r=-0.9997, L1_reg=218.8115
## epoch=60, loss=0.5175, mean_e=0.7054, mean_r=-0.8843, L1_reg=117.0956
## epoch=80, loss=0.5162, mean_e=0.7040, mean_r=-0.8777, L1_reg=71.6255
## epoch=100, loss=0.5170, mean_e=0.7052, mean_r=-0.8896, L1_reg=61.7723
## epoch=120, loss=0.5148, mean_e=0.7008, mean_r=-0.8828, L1_reg=67.0572
## epoch=140, loss=0.5133, mean_e=0.6980, mean_r=-0.8861, L1_reg=73.5912
## epoch=160, loss=0.5136, mean_e=0.6996, mean_r=-0.8770, L1_reg=81.7488
## epoch=180, loss=0.5117, mean_e=0.6951, mean_r=-0.8748, L1_reg=91.6401
## epoch=200, loss=0.5106, mean_e=0.6932, mean_r=-0.8594, L1_reg=101.8560
## epoch=220, loss=0.5096, mean_e=0.6926, mean_r=-0.8382, L1_reg=115.2662
## epoch=240, loss=0.5111, mean_e=0.6975, mean_r=-0.8435, L1_reg=125.9257
## epoch=260, loss=0.5100, mean_e=0.6927, mean_r=-0.8407, L1_reg=130.4958
## epoch=280, loss=0.5091, mean_e=0.6935, mean_r=-0.8184, L1_reg=134.8845
## epoch=300, loss=0.5119, mean_e=0.7010, mean_r=-0.8375, L1_reg=133.4451
## epoch=320, loss=0.5090, mean_e=0.6961, mean_r=-0.8279, L1_reg=132.6670
## epoch=340, loss=0.5067, mean_e=0.6890, mean_r=-0.8316, L1_reg=138.9634
## epoch=360, loss=0.5065, mean_e=0.6896, mean_r=-0.8208, L1_reg=150.0667
## epoch=380, loss=0.5041, mean_e=0.6831, mean_r=-0.8002, L1_reg=165.2169
## epoch=400, loss=0.5020, mean_e=0.6768, mean_r=-0.8066, L1_reg=179.9017
## epoch=420, loss=0.5038, mean_e=0.6805, mean_r=-0.7947, L1_reg=194.2641
## epoch=440, loss=0.5049, mean_e=0.6836, mean_r=-0.7970, L1_reg=196.4059
## epoch=460, loss=0.5066, mean_e=0.6864, mean_r=-0.8276, L1_reg=198.7805
## epoch=480, loss=0.5122, mean_e=0.6960, mean_r=-0.8778, L1_reg=188.8816
## epoch=500, loss=0.5145, mean_e=0.7086, mean_r=-0.8525, L1_reg=178.2662
## epoch=520, loss=0.5087, mean_e=0.6888, mean_r=-0.8559, L1_reg=167.9366
## epoch=540, loss=0.5090, mean_e=0.6886, mean_r=-0.8542, L1_reg=165.3694
## epoch=560, loss=0.5088, mean_e=0.6838, mean_r=-0.8849, L1_reg=166.4413
## epoch=580, loss=0.5040, mean_e=0.6790, mean_r=-0.8373, L1_reg=173.5159
## epoch=600, loss=0.5018, mean_e=0.6738, mean_r=-0.7803, L1_reg=194.7737
## epoch=620, loss=0.5025, mean_e=0.6742, mean_r=-0.8352, L1_reg=214.9658
## epoch=640, loss=0.5032, mean_e=0.6785, mean_r=-0.7668, L1_reg=218.9161
## epoch=660, loss=0.5034, mean_e=0.6833, mean_r=-0.7691, L1_reg=220.2961
## epoch=680, loss=0.5386, mean_e=0.7457, mean_r=-0.8257, L1_reg=211.9185
## epoch=700, loss=0.5123, mean_e=0.7031, mean_r=-0.8365, L1_reg=176.3528
## epoch=720, loss=0.5085, mean_e=0.6913, mean_r=-0.8557, L1_reg=156.0370
## epoch=740, loss=0.5040, mean_e=0.6851, mean_r=-0.7974, L1_reg=166.0924
## epoch=760, loss=0.5013, mean_e=0.6742, mean_r=-0.7561, L1_reg=192.0086
## epoch=780, loss=0.4979, mean_e=0.6697, mean_r=-0.7633, L1_reg=211.9528
## epoch=800, loss=0.4959, mean_e=0.6627, mean_r=-0.6958, L1_reg=247.8159
## epoch=820, loss=0.4939, mean_e=0.6558, mean_r=-0.7044, L1_reg=281.8339
## epoch=840, loss=0.4959, mean_e=0.6618, mean_r=-0.6953, L1_reg=287.1404
## epoch=860, loss=0.5023, mean_e=0.6849, mean_r=-0.6994, L1_reg=277.1692
## epoch=880, loss=0.5067, mean_e=0.6888, mean_r=-0.7619, L1_reg=258.4803
## epoch=900, loss=0.5158, mean_e=0.7237, mean_r=-0.7126, L1_reg=243.4805
## epoch=920, loss=0.5559, mean_e=0.8295, mean_r=-0.9953, L1_reg=142.2109
## epoch=940, loss=0.5376, mean_e=0.7621, mean_r=-0.9999, L1_reg=138.4370
## epoch=960, loss=0.5204, mean_e=0.6979, mean_r=-0.9940, L1_reg=140.0235
## epoch=980, loss=0.5098, mean_e=0.6768, mean_r=-0.9107, L1_reg=145.3437
## epoch=1000, loss=0.5052, mean_e=0.6683, mean_r=-0.8637, L1_reg=155.3289
## epoch=1020, loss=0.4992, mean_e=0.6637, mean_r=-0.7041, L1_reg=172.8886
## epoch=1040, loss=0.4957, mean_e=0.6614, mean_r=-0.6318, L1_reg=184.1977
## epoch=1060, loss=0.4988, mean_e=0.6754, mean_r=-0.5746, L1_reg=197.6059
## epoch=1080, loss=0.4961, mean_e=0.6719, mean_r=-0.6054, L1_reg=208.1211
## epoch=1100, loss=0.5031, mean_e=0.6855, mean_r=-0.6141, L1_reg=213.6350
## epoch=1120, loss=0.4942, mean_e=0.6773, mean_r=-0.6249, L1_reg=219.0787
## epoch=1140, loss=0.4973, mean_e=0.6847, mean_r=-0.6007, L1_reg=230.1927
## epoch=1160, loss=0.4865, mean_e=0.6612, mean_r=-0.5667, L1_reg=250.4260
## epoch=1180, loss=0.4827, mean_e=0.6525, mean_r=-0.5773, L1_reg=271.4144
## epoch=1200, loss=0.4811, mean_e=0.6430, mean_r=-0.5366, L1_reg=304.0663
## epoch=1220, loss=0.4821, mean_e=0.6433, mean_r=-0.5292, L1_reg=329.3247
## epoch=1240, loss=0.4885, mean_e=0.6600, mean_r=-0.5239, L1_reg=322.0790
## epoch=1260, loss=0.4849, mean_e=0.6544, mean_r=-0.5429, L1_reg=317.0919
## epoch=1280, loss=0.4958, mean_e=0.6734, mean_r=-0.5859, L1_reg=303.5660
## epoch=1300, loss=0.5046, mean_e=0.7037, mean_r=-0.6003, L1_reg=282.6642
## epoch=1320, loss=0.4953, mean_e=0.6808, mean_r=-0.6131, L1_reg=272.8597
## epoch=1340, loss=0.4884, mean_e=0.6678, mean_r=-0.5812, L1_reg=280.5821
## epoch=1360, loss=0.4835, mean_e=0.6563, mean_r=-0.5381, L1_reg=296.5276
## epoch=1380, loss=0.4828, mean_e=0.6512, mean_r=-0.5479, L1_reg=308.6104
## epoch=1400, loss=0.4744, mean_e=0.6284, mean_r=-0.4841, L1_reg=334.0766
## epoch=1420, loss=0.4788, mean_e=0.6306, mean_r=-0.5067, L1_reg=360.8175
## epoch=1440, loss=0.4837, mean_e=0.6445, mean_r=-0.5243, L1_reg=351.2635
## epoch=1460, loss=0.4898, mean_e=0.6606, mean_r=-0.5271, L1_reg=346.8650
## epoch=1480, loss=0.5020, mean_e=0.6907, mean_r=-0.5802, L1_reg=337.7504
## epoch=1500, loss=0.5185, mean_e=0.7161, mean_r=-0.7800, L1_reg=310.8184
## epoch=1520, loss=0.4962, mean_e=0.6775, mean_r=-0.6213, L1_reg=282.0571
## epoch=1540, loss=0.4878, mean_e=0.6629, mean_r=-0.5661, L1_reg=282.5089
## epoch=1560, loss=0.4849, mean_e=0.6487, mean_r=-0.5741, L1_reg=292.3632
## epoch=1580, loss=0.4817, mean_e=0.6427, mean_r=-0.4344, L1_reg=307.2570
## epoch=1600, loss=0.4808, mean_e=0.6340, mean_r=-0.5572, L1_reg=332.8128
## epoch=1620, loss=0.4833, mean_e=0.6297, mean_r=-0.6440, L1_reg=355.0899
## epoch=1640, loss=0.4812, mean_e=0.6386, mean_r=-0.5208, L1_reg=352.6810
## epoch=1660, loss=0.4840, mean_e=0.6496, mean_r=-0.5084, L1_reg=354.8397
## epoch=1680, loss=0.4993, mean_e=0.6871, mean_r=-0.5817, L1_reg=344.6962
## epoch=1700, loss=0.4969, mean_e=0.6830, mean_r=-0.5823, L1_reg=331.5757
## epoch=1720, loss=0.4933, mean_e=0.6802, mean_r=-0.5909, L1_reg=320.1424
## epoch=1740, loss=0.4847, mean_e=0.6614, mean_r=-0.5080, L1_reg=323.7273
## epoch=1760, loss=0.4816, mean_e=0.6534, mean_r=-0.5185, L1_reg=339.1985
## epoch=1780, loss=0.4750, mean_e=0.6340, mean_r=-0.4764, L1_reg=359.1059
## epoch=1800, loss=0.4695, mean_e=0.6183, mean_r=-0.4517, L1_reg=391.1357
## epoch=1820, loss=0.4720, mean_e=0.6134, mean_r=-0.4572, L1_reg=419.3006
## epoch=1840, loss=0.4833, mean_e=0.6393, mean_r=-0.4671, L1_reg=406.6838
## epoch=1860, loss=0.4826, mean_e=0.6476, mean_r=-0.4553, L1_reg=396.1197
## epoch=1880, loss=0.4932, mean_e=0.6652, mean_r=-0.5051, L1_reg=377.5341
## epoch=1900, loss=0.4982, mean_e=0.6792, mean_r=-0.5540, L1_reg=357.5332
## epoch=1920, loss=0.4969, mean_e=0.6792, mean_r=-0.6122, L1_reg=340.0559
## epoch=1940, loss=0.4822, mean_e=0.6555, mean_r=-0.4997, L1_reg=340.4936
## epoch=1960, loss=0.4809, mean_e=0.6510, mean_r=-0.5255, L1_reg=354.7948
## epoch=1980, loss=0.4754, mean_e=0.6327, mean_r=-0.5227, L1_reg=378.1031
## ---train time: 20.95675015449524 seconds ---
##
## Number of examples rejected= 4506 / 6350
## ---test time: 0.039072513580322266 seconds ---
We add PENCIL’s results to the Seurat object for visualization.
pred.time <- as.vector(py$pred)
sc_data.2 <- AddMetaData(sc_data.2, metadata = pred.time, col.name='pred.time' )
sc_data.2 <- AddMetaData(sc_data.2, metadata = py$confidence, col.name='confidence.score')
FeaturePlot(sc_data.2, features = 'confidence.score', pt.size=0.3, reduction = 'umap')
FeaturePlot(sc_data.2, features = 'pred.time', cells=Cells(sc_data.2)[sc_data.2$confidence.score > 0], pt.size = 0.3, reduction = 'umap') + scale_colour_gradientn(colours=c("red","green","blue"))
Visualizing the gene weights learned by PENCIL, we can see that most of the selected genes do fall within the range MVG1000-1300 (GT genes).
# in a Python chunk
w = pencil.gene_weights(plot=True)
plt.close()
Supervised learning of high-confidence phenotypic subpopulations from single-cell data (2022).
Tao Ren, Ling-Yun Wu and Zheng Xia
R packages loaded in this tutorial:
Seurat 4.0.5
reticulate 1.25
scater 1.22.0
ggplot2 3.3.5
Python packages that PENCIL depends on:
numpy 1.20.3
pandas 1.3.4
torch 1.10.0
seaborn 0.11.2
umap-learn 0.5.2
mlflow 1.23.1