-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathgenerateSCE_Tcell.Rmd
executable file
·186 lines (132 loc) · 6.56 KB
/
generateSCE_Tcell.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
---
title: "generateSCE_Tcell"
author: "SandraTietscher"
date: "2020-12-01"
output: workflowr::wflow_html
editor_options:
chunk_output_type: console
---
# Introduction
This script reads in Cellprofiler output data, image-level metadata and panel information and combines these into a `SingleCellExperiment` (SCE) object.
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message=FALSE)
library(plyr)
library(stringr)
library(purrr)
library(SingleCellExperiment)
library(S4Vectors)
library(dittoSeq)
library(RColorBrewer)
```
## Data import
* __cells.csv__ : contains the cell-level information of different features, the majority of which should be regarded as cell-level metadata (e.g. shape and position). The _MeanIntensity_ and __MeanIntensityCorrected__ columns contain the raw and spill-over-corrected counts per cell and per channel.
* __Image.csv__: contains image-level metadata. These include the image name, number of detected cells, and most importantly the scaling factor, with which the counts need to be multiplied to account for the image encoding (e.g. 16-bit images: scaling factor = 2^{16}-1 = 65535).
* __Panel.csv__: a file that contains the link between metal-tag and channel name.
```{r read-in-data}
input_folder <- "~/projects/BCmet/workflowr_projects/BCmet_immune_wflow/data/"
cells <- read.csv(file = paste0(input_folder, "cell.csv"), stringsAsFactors = FALSE)
image <- read.csv(file = paste0(input_folder, "Image.csv"), stringsAsFactors = FALSE)
panel <- read.csv(file = paste0(input_folder, "panel_BCmet_immune.csv"), stringsAsFactors = FALSE)
```
### Selecting cell-specific intensities
For all count-based analysis, we use the spillover corrected counts averaged across each cell after removing 'hot pixels'. To avoid negative values, we select the non-negative least-squares corrected counts.
```{r select-counts}
cur_counts <- cells[,grepl("Intensity_MeanIntensityCorrected_FullStackFiltered", colnames(cells))]
# Add rownames (combination of image number and object number)
rownames(cur_counts) <- paste0(cells$ImageNumber, "_", cells$ObjectNumber)
```
By default, CellProfiler scales all pixel intensities between 0 and 1 and stores the scaling factor in the image metadata file. To obtain raw cell-specific intensities, mean intensites have to be scaled by this value.
```{r scale-counts-1}
image$Scaling_FullStack
cur_counts <- cur_counts * image$Scaling_FullStack[1]
```
## Cell-specific metadata
Cell-level metadata can be obtained from the __cell.csv__ file and include:
* the cell's number (identifier)
* the cell's location
* the cell's shape features
```{r cell-metadata}
cell_meta <- data.frame(CellNumber = cells$ObjectNumber,
Center_X = cells$AreaShape_Center_X,
Center_Y = cells$AreaShape_Center_Y,
Area = cells$AreaShape_Area,
MajorAxisLength = cells$AreaShape_MajorAxisLength,
MinorAxisLength = cells$AreaShape_MinorAxisLength,
ImageNumber = factor(cells$ImageNumber),
row.names = paste0(cells$ImageNumber, "_", cells$ObjectNumber))
```
### Image-level metadata
Add all relevant image-level metadata to the cell-level metadata.
Patient IDs have been encoded.
```{r clean-meta}
# Read in patient metadata.
meta <- read.csv("data/immune_images_patient_metadata_encoded.csv")
```
The `cells$ImageNumber` entry refers to the individual entries of the `Image` metadata file.
The `image$FileName_FullStack` entry contains information about the acquisition name and the ROI and is equivalent to the `meta$file` entry, thus can be used to associate the central metadata file to the entries in the image metadata file.
```{r select-meta}
# Use mask names as image IDs
imageIDs <- image$FileName_FullStack
# Select and order the central metadata file
rownames(meta) <- meta$file
meta <- meta[imageIDs,]
meta$ImageNumber <- image$ImageNumber
# Add image metadata to cell-specific metadata
cell_meta <- join(cell_meta, meta, by = "ImageNumber")
# Assign unique rownames to the cell-specific metadata
rownames(cell_meta) <- paste0(cell_meta$ImageNumber, "_", cell_meta$CellNumber)
```
Make sure that the current metadata is in the same order as the original cells dataset.
```{r reorder}
all.equal(rownames(cell_meta), rownames(cur_counts))
```
Transform the cell metadata into a DataFrame in order to add it to the SingleCellExperiment as colData.
```{r}
cell_meta <- DataFrame(cell_meta)
```
## Feature-level metadata
Associate the entries in the `panel` object to the channels recorded by `CellProfiler`.
This information can be found in the `_full.csv` files located in the `tiffs` output folder.
Since all images have been processed at the same time, the ordering of channels is the same for all images.
```{r reorder-panel}
# Read in one example file to order channels
tags <- read.csv("data/20200607_ZTMA_266_immune_rows_17_22_s0_a1_ac_full.csv", header = FALSE)
# Order panel by tag number
panel <- panel[match(tags[,1], panel$Metal.Tag),]
# Add channel number
panel$channelNumber <- 1:nrow(tags)
# Use clean target as rownames
rownames(panel) <- panel$target
```
The channels are not correctly ordered in the `cells.csv` files --> reorder the mean intensity entries by channel number.
```{r reorder-channels}
# Get channel number
channelNumber <- as.numeric(sub("^.*_c", "", colnames(cur_counts)))
# Order counts based on channel number
cur_counts <- cur_counts[,order(channelNumber, decreasing = FALSE)]
# Make sure ordering works
as.numeric(sub("^.*_c", "", colnames(cur_counts)))
```
These steps now produced an object that stores the cell- and marker-specific intensities, an object that stores the cell-specific metadata and an object that stores the marker-specific metadata.
# Create the `SingleCellExperiment` object
Create the `SingleCellExperiment` object only based on the counts.
As of Bioconductor convention, cells are stored in the columns and features are stored in the rows.
Cell-level metadata are stored in the `colData()` slot, the panel information in the `rowData()` slot.
```{r create-SCE}
# Create SCE object
sce <- SingleCellExperiment(assays = list(counts = t(cur_counts)))
# Set marker name as rownames and cellID as colnames
rownames(sce) <- rownames(panel)
colnames(sce) <- rownames(cell_meta)
# Store metadata
colData(sce) <- cell_meta
rowData(sce) <- panel
```
Assign unique cell IDs to the `SingleCellExperiment` object.
```{r set-rownames}
colnames(sce) <- paste(sce$ROI_ID, sce$CellNumber, sep = "_")
```
# Save SCE object
```{r save-RDS}
saveRDS(sce, "output/SCEs/Tcell/TcellSCE_initial.rds")
```