code/generateSCE_Tcell.Rmd

---
title: "generateSCE_Tcell"
author: "SandraTietscher"
date: "2020-12-01"
output: workflowr::wflow_html
editor_options:
  chunk_output_type: console
---

# Introduction

This script reads in Cellprofiler output data, image-level metadata and panel information and combines these into a `SingleCellExperiment` (SCE) object.

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message=FALSE)
library(plyr)
library(stringr)
library(purrr)
library(SingleCellExperiment)
library(S4Vectors)
library(dittoSeq)
library(RColorBrewer)
```

## Data import

* __cells.csv__ : contains the cell-level information of different features, the majority of which should be regarded as cell-level metadata (e.g. shape and position). The _MeanIntensity_ and __MeanIntensityCorrected__ columns contain the raw and spill-over-corrected counts per cell and per channel.
* __Image.csv__: contains image-level metadata. These include the image name, number of detected cells, and most importantly the scaling factor, with which the counts need to be multiplied to account for the image encoding (e.g. 16-bit images: scaling factor = 2^{16}-1 = 65535).
* __Panel.csv__: a file that contains the link between metal-tag and channel name. 


```{r read-in-data}
input_folder <- "~/projects/BCmet/workflowr_projects/BCmet_immune_wflow/data/"

cells <- read.csv(file = paste0(input_folder, "cell.csv"), stringsAsFactors = FALSE)
image <- read.csv(file = paste0(input_folder, "Image.csv"), stringsAsFactors = FALSE)
panel <- read.csv(file = paste0(input_folder, "panel_BCmet_immune.csv"), stringsAsFactors = FALSE)
```

### Selecting cell-specific intensities

For all count-based analysis, we use the spillover corrected counts averaged across each cell after removing 'hot pixels'. To avoid negative values, we select the non-negative least-squares corrected counts.

```{r select-counts}
cur_counts <- cells[,grepl("Intensity_MeanIntensityCorrected_FullStackFiltered", colnames(cells))]

# Add rownames (combination of image number and object number)
rownames(cur_counts) <- paste0(cells$ImageNumber, "_", cells$ObjectNumber)
```

By default, CellProfiler scales all pixel intensities between 0 and 1 and stores the scaling factor in the image metadata file. To obtain raw cell-specific intensities, mean intensites have to be scaled by this value. 

```{r scale-counts-1}
image$Scaling_FullStack
cur_counts <- cur_counts * image$Scaling_FullStack[1]
```

## Cell-specific metadata

Cell-level metadata can be obtained from the __cell.csv__ file and include:

* the cell's number (identifier)
* the cell's location
* the cell's shape features

```{r cell-metadata}
cell_meta <- data.frame(CellNumber = cells$ObjectNumber,
                       Center_X = cells$AreaShape_Center_X,
                       Center_Y = cells$AreaShape_Center_Y,
                       Area = cells$AreaShape_Area,
                       MajorAxisLength = cells$AreaShape_MajorAxisLength,
                       MinorAxisLength = cells$AreaShape_MinorAxisLength, 
                       ImageNumber = factor(cells$ImageNumber),
                       row.names = paste0(cells$ImageNumber, "_", cells$ObjectNumber))
```

### Image-level metadata

Add all relevant image-level metadata to the cell-level metadata.

Patient IDs have been encoded.
```{r clean-meta}
# Read in patient metadata. 
meta <- read.csv("data/immune_images_patient_metadata_encoded.csv")
```

The `cells$ImageNumber` entry refers to the individual entries of the `Image` metadata file.

The `image$FileName_FullStack` entry contains information about the acquisition name and the ROI and is equivalent to the `meta$file` entry, thus can be used to associate the central metadata file to the entries in the image metadata file. 

```{r select-meta}
# Use mask names as image IDs
imageIDs <- image$FileName_FullStack

# Select and order the central metadata file
rownames(meta) <- meta$file
meta <- meta[imageIDs,]
meta$ImageNumber <- image$ImageNumber

# Add image metadata to cell-specific metadata
cell_meta <- join(cell_meta, meta, by = "ImageNumber")

# Assign unique rownames to the cell-specific metadata
rownames(cell_meta) <- paste0(cell_meta$ImageNumber, "_", cell_meta$CellNumber)
```

Make sure that the current metadata is in the same order as the original cells dataset.

```{r reorder}
all.equal(rownames(cell_meta), rownames(cur_counts))
```

Transform the cell metadata into a DataFrame in order to add it to the SingleCellExperiment as colData.

```{r}
cell_meta <- DataFrame(cell_meta)
```

## Feature-level metadata

Associate the entries in the `panel` object to the channels recorded by `CellProfiler`.
This information can be found in the `_full.csv` files located in the `tiffs` output folder.
Since all images have been processed at the same time, the ordering of channels is the same for all images.

```{r reorder-panel}
# Read in one example file to order channels
tags <- read.csv("data/20200607_ZTMA_266_immune_rows_17_22_s0_a1_ac_full.csv", header = FALSE)

# Order panel by tag number
panel <- panel[match(tags[,1], panel$Metal.Tag),]

# Add channel number
panel$channelNumber <- 1:nrow(tags)

# Use clean target as rownames
rownames(panel) <- panel$target
```

The channels are not correctly ordered in the `cells.csv` files --> reorder the mean intensity entries by channel number.

```{r reorder-channels}
# Get channel number
channelNumber <- as.numeric(sub("^.*_c", "", colnames(cur_counts)))

# Order counts based on channel number
cur_counts <- cur_counts[,order(channelNumber, decreasing = FALSE)] 

# Make sure ordering works
as.numeric(sub("^.*_c", "", colnames(cur_counts)))
```

These steps now produced an object that stores the cell- and marker-specific intensities, an object that stores the cell-specific metadata and an object that stores the marker-specific metadata.


# Create the `SingleCellExperiment` object

Create the `SingleCellExperiment` object only based on the counts.
As of Bioconductor convention, cells are stored in the columns and features are stored in the rows.
Cell-level metadata are stored in the `colData()` slot, the panel information in the `rowData()` slot.

```{r create-SCE}
# Create SCE object
sce <- SingleCellExperiment(assays = list(counts = t(cur_counts)))

# Set marker name as rownames and cellID as colnames
rownames(sce) <- rownames(panel)
colnames(sce) <- rownames(cell_meta)

# Store metadata
colData(sce) <- cell_meta
rowData(sce) <- panel
```

Assign unique cell IDs to the `SingleCellExperiment` object.

```{r set-rownames}
colnames(sce) <- paste(sce$ROI_ID, sce$CellNumber, sep = "_")
```

# Save SCE object

```{r save-RDS}
saveRDS(sce, "output/SCEs/Tcell/TcellSCE_initial.rds")
```