Building Block
A building block represents a single simple task, such as: segmenting the first channel of a set of images, straightening the second channel of those images using a given set of masks, etc.
To be implemented efficiently, every workflow needs to be divided into single building blocks. For now, these are the available building block types:
- segmentation: "segmentation" building block
- straightening: "straightening" building block
- morphology computation: "morphology_computation" building block
- classification: "classification" building block
- molt detection: "molt_detection" building block
- fluorescence quantification: "fluorescence_quantification" building block
More may be added in the future, and some might be merged together to facilitate the creation of more complex and personalized pipelines.
Everything is centralized in one big YAML configuration file, in which each configuration option is a list. By default, these lists contain only one element (either an int, a string, or another list). If your analysis workflow contains multiple building blocks of the same type, configuration options with only one element will be shared by all the building blocks of that type. For example, say you have two segmentation blocks and the segmentation method is set to:
segmentation_method: ["ilastik"]
Both of your blocks will then use ilastik to perform their segmentation. However, if you specify as many options as there are building blocks of that type, the blocks will use the different options in order. Continuing the example, if the segmentation method were set to:
segmentation_method: ["ilastik", "edge_based"]
The first segmentation block would use ilastik and the second one would use edge-based (Sobel) segmentation.
If you ever want to leave an option empty, use the keyword null.
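As a small sketch of how these rules combine (the model path here is hypothetical), a workflow with two segmentation blocks where only the second one needs a deep learning model could look like:
segmentation_method: ["edge_based", "deep_learning"]
model_path: [null, "./models/example_model.pth"]
The first block leaves model_path empty with null, since edge-based segmentation does not use a model.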
All the results of the block executions will be stored in the experiment's csv file, under a column named "analysis/task_output_name". For example, if you segment the first channel of the raw images, the corresponding column would be: analysis/ch1_seg
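Since the results live in ordinary csv columns, they are easy to inspect outside the pipeline. A minimal sketch, assuming a hypothetical path for the experiment's csv file:

import pandas as pd

# load the experiment's csv file (path is hypothetical)
filemap = pd.read_csv('./analysis/analysis_filemap.csv')

# results of a segmentation block on the first channel are stored
# under the "analysis/ch1_seg" column
print(filemap['analysis/ch1_seg'].head())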
Segmentation
Performs a segmentation task. You can choose which images and which channels to segment, what method to use, and whether or not to apply some preprocessing to the images. Here are the most important configuration options:
- segmentation_method: selects the segmentation method; can be either edge_based, double_threshold, deep_learning, or ilastik.
- segmentation_channels: selects the channel(s) of the images used to perform the segmentation, e.g. [0] will only use the first channel, [0, 1] will use the first and second channels at once (useful for deep learning, for example). You have to put the channel number into brackets, and remember, Python starts counting at 0.
- model_path: path to the saved deep learning model you want to use.
- ilastik_project_path: path to the ilastik project you want to use.
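As a sketch by analogy with the examples given for the other blocks (the project path is hypothetical, and the list-of-lists shape for segmentation_channels is assumed from the convention used elsewhere):
segmentation_method: ["ilastik"]
segmentation_channels: [[0]]
ilastik_project_path: ["./models/pixel_classifier.ilp"]
This would segment the first channel of the images using the given ilastik project.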
Straightening
- straightening_source: the images that will be straightened. The first element of the list is the column of the csv file you want to process; the second element is the channel(s) you want to straighten.
- straightening_masks: the masks that will be used to straighten the source images. For example:
straightening_source: [['analysis/ch1_seg', null]]
straightening_masks: ['analysis/ch1_seg']
Will straighten the masks of channel 1 using the masks of channel 1.
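Similarly, to straighten images rather than masks (the column name and channel here are hypothetical, following the convention above):
straightening_source: [['raw', 1]]
straightening_masks: ['analysis/ch2_seg']
Would straighten the second channel of the raw images using the masks of channel 2.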
Morphology computation
- morphology_computation_masks: the masks used to compute morphological features.
- morphological_features: list of morphological features to compute. Available features are 'length', 'area', 'volume', 'width_mean', 'width_median', 'width_std', 'width_cv', 'width_skew', 'width_kurtosis', 'width_max', and 'width_middle'. For example:
morphology_computation_masks: ['analysis/ch1_seg']
morphological_features: [['length', 'area', 'volume']]
Computes the length, area and volume of the straightened masks of channel 1.
Classification
- classification_source: the images you want classified. For now, this only works with single-channel images or mask images, but this could change later.
- classifier: path to the XGBoost model used for classification. For example:
classification_source: ['analysis/ch1_seg_str']
classifier: ["./models/worm_type_classifier.json"]
Classifies the straightened masks of channel 1 using the given XGBoost model.
Molt detection
- molt_detection_volume: the column containing the volume computations.
- molt_detection_worm_type: the column containing the worm type classifications. For example:
molt_detection_volume: ['ch1_seg_str_volume']
molt_detection_worm_type: ['ch1_seg_str_worm_type']
Fluorescence quantification
- fluorescence_quantification_source: the image column and channel (one channel only) that will be used to measure the fluorescence levels.
- fluorescence_quantification_masks: the mask column. The fluorescence will be measured only in the masked parts of each image.
- fluorescence_quantification_aggregation: how the measurements will be aggregated; can be either sum, mean, median, min, max, or std.
- fluorescence_background_aggregation: how the background measurement is aggregated; can be either median, mean, or min. For example:
fluorescence_quantification_source: [['raw', 0]]
fluorescence_quantification_masks: ['analysis/ch1_seg']
fluorescence_quantification_aggregation: ['sum']
fluorescence_background_aggregation: ['median']
Sums the fluorescence of the first channel of the raw images inside the masks of channel 1, using the median to estimate the background.
Custom script
This block lets you run any external script as part of the pipeline and integrate the results into the filemap. As it is impossible to predict what the output will be, the rerun option for this block is much less smart. If rerun is false, the script won't run if the folder or file it is supposed to output already exists. If rerun is set to true, then everything will be rerun. It is important to note that, for the output of your script to be added correctly to the filemap, it should follow the Time, Point convention: output images should contain Time{time_nb}_Point{point_nb} in their name, and output csv files should have a Time and a Point column so they can be concatenated.
- custom_script_path: the path to your custom script. It can be either a python '.py' or a bash '.sh' file.
- custom_script_name: the name that will be used for the filemap column and the output file / output folder.
- custom_script_return_type: "subdir" if your script outputs a directory of images, "csv" if it outputs a csv file, or null if it outputs nothing.
- custom_script_parameters: a list of parameters that will be added to the command when your script is called. For example:
custom_script_path: ["/home/spsalmon/test_custom_script.py"]
custom_script_name: ["test_custom_script"]
custom_script_return_type: ["subdir"]
custom_script_parameters: [["-i testinput", "-e example"]]
By default, every custom script will be fed the path to the analysis filemap through the -f argument and the output path through the -o argument, so your custom scripts should have a way to parse them. Here is a small example python script that does just that and outputs either a random csv or a random set of images. In this example, -i is a custom option (as defined above in the configuration example); it is not mandatory.
import pandas as pd
import numpy as np
import os
import argparse
from tifffile import imwrite

def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--input', type=str, help='input')
    parser.add_argument('-f', '--filemap', type=str, help='filemap path')
    parser.add_argument('-o', '--output', type=str, help='output')
    # ignore any extra custom parameters (such as the -e option in the
    # configuration example above) instead of crashing on them
    args, _ = parser.parse_known_args()
    return args

def process_data(df):
    # create a dataframe with the same Time and Point columns as df
    # and random values
    new_df = pd.DataFrame(columns=['Time', 'Point', 'Value'])
    new_df['Time'] = df['Time']
    new_df['Point'] = df['Point']
    new_df['Value'] = np.random.randint(0, 100, size=len(df))
    return new_df

def create_test_images(df, output_path):
    # create the output directory in case the pipeline has not already done so
    os.makedirs(output_path, exist_ok=True)
    # create a test image for each row in df, following the
    # Time{time_nb}_Point{point_nb} naming convention
    for _, row in df.iterrows():
        # uint8 so the result is a standard 8-bit image
        img = np.random.randint(0, 255, size=(100, 100), dtype=np.uint8)
        img_path = os.path.join(output_path, f'Time{row["Time"]}_Point{row["Point"]}.tiff')
        imwrite(img_path, img, compression='zlib')

def main():
    args = get_args()
    print(f'input: {args.input}')
    print(f'output: {args.output}')
    df = pd.read_csv(args.filemap)
    # csv variant ("csv" return type):
    # test_output = process_data(df)
    # test_output.to_csv(args.output, index=False)
    create_test_images(df, args.output)

if __name__ == '__main__':
    main()
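For reference, with the configuration above, the pipeline's call to this script would look roughly like the following (a sketch only; the exact invocation and paths depend on your setup):
python3 /home/spsalmon/test_custom_script.py -f ./analysis/analysis_filemap.csv -o ./analysis/test_custom_script -i testinput -e example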