A python toolbox for phenotype analysis of arrayed microbial colonies written by Stephan Kamrad (stephan.kamrad at crick.ac.uk).
For a quick overview, please see our 10 minute video tutorial.
For more information, please see our eLife paper.
Please cite as:
Kamrad, S., RodrĂguez-LĂłpez, M., Cotobal, C., Correia-Melo, C., Ralser M., Bähler J. (2020). Pyphe, a python toolbox for assessing microbial growth and cell viability in high-throughput colony screens. eLife 9:e55160
- Most tools are cross-platform compatible but scanning will only work on a Linux OS. The scanners need to be accessible by SANE and ImageMagick needs to be installed and callable from the comman line.
- Pyphe requires Python 3 and a few common packages, available through the anaconda distribution.
- Install pyphe by running 'pip install pyphe' in your terminal.
- Open a new terminal and try and run 'pyphe-quantify -h' which should show the help page of one of pyphe's command line tools. On Windows, make sure you are using the Anaconda Prompt, not the Anaconda Powershell Prompt.
A typical fitness screen with pyphe will involve:
- Image acquisition with pyphe-scan, or pyphe-scan-timecourse
- Quantification of colony properties from images using pyphe-quantify. In the case of growth curves, parameters are additionally extracted with pyphe-growthcurves.
- Normalisation and data aggregation using pyphe-analyse.
- Statistics and hit calling using pyphe-interpret Please see our paper for a detailed protocol and explanations of the algorithms.
Please check the manuals below carefully, they are also available in the terminal by running the command with the -h option only. If things are still not working, please email me ([email protected]) and I will try and help. If you think you have discovered a bug, or would like to request a new feature, please raise an issue on www.github.com/Bahler-Lab/pyphe.
If you get an error like this, make sure you are not using the Anaconda Powershell Prompt:
python: can't open file 'C:\Users\user1\Anaconda3\Scripts"C:\Users\user1\Anaconda3\Scripts\pyphe-quantify.bat -h ': [Errno 22] Invalid argument
All pyphe tools have a similar command line interface, based on the python argparse package. Generally, parameters are set using --<parameter_name> optionally followed by a value. All pyphe tools can be used with relative file paths so make sure to navigate to the correct working directory before running a pyphe command.
This tools allows you to take consecutive scans of sets of plates, which are then automatically cropped, rotated and named in in a continuos filename scheme of your choice.
-
This tool will only run on Linux operating systems and uses the SANE library for image acquisition.
-
Make sure your scanner is installed correctly and you can acquire images using the scanimage command. The Gray mode will only work on Epson V800 and V850 scanners (potentially the V700 and V750 model as well) and the TPU8x10 transmission scanning source must be enabled. This should work by default if you are using the V800/850 model and a recent Linux OS. Otherwise, there is excellent documentation available from Zackrisson et al. and the scanomatics pipeline for how to make this work using a hacked SANE driver. Please see the instructions in their wiki.
-
Make sure ImageMagick is installed and the 'convert' tool can be called from the command line.
-
If the Pyphe toolbox has been installed correctly, you should be able to run pyphe-scan in your terminal.
-
With a laser cutter, make a fixture to hold your plates in place. We provide an svg file with the cutting shape in the Documentation directory. Use tape to hold your fixture into place, it should be pushed against the back of the scanner (where the cables are) with the top of the plates facing left. Pyphe-scan and pyphe-quantify come pre-configured for using the provided fixture on an Epson V800/V850 scanner but it is easy to add your own fixture and cropping settings. If you want to use your own fixture, see below of how to add the geometry information to pyphe-scan.
-
Open the file manager and navigate to the folder in which you want to save your images. The script will create a sub-folder that begins with the current date to save all your images.
-
Right click and select 'Open in Terminal'
-
Run scanplates with the options as detaild below.
usage: pyphe-scan [-h] [--nplates NPLATES] [--start START] [--prefix PREFIX]
[--postfix POSTFIX] [--fixture {som3_edge,som3}]
[--resolution {150,300,600,900,1200}] [--scanner {1,2,3}]
[--mode {Gray,Color}]
optional arguments:
-h, --help show this help message and exit
--nplates NPLATES Number of plates to scan. This defaults to 100 and the
script can be terminated by Ctr+C when done.
--start START Where to start numbering from. Defaults to 1.
--prefix PREFIX Name prefix for output files. The default is the
current date YYYYMMDD.
--postfix POSTFIX Name postfix for output files. Defaults to empty
string.
--fixture {som3_edge,som3,petrie}
ID of the fixture you are using.
--resolution {150,300,600,900,1200}
Resolution for scanning in dpi. Default is 600.
--scanner {1,2,3} Which scanner to use. Scanners are not uniquely
identified and may switch when turned off/unplugged.
This option does not need to be set when only one
scanner is connected.
--mode {Gray,Color} Which color mode to use for scanning. Defaults to
Gray.
All arguments except the fixture have default values and are optional. A folder prefix_postfix will be created in your current directory and the program will abort if a folder with this name already exists.
This tool acquires image timeseries by scanning in fixed time intervals. For each position in the fixture, a folder is created. Image names contain number of scan. Other options for this tool are similar to pyphe-scan. More than one scanner can be connected and used at the same time. Scanner numbers are defined by the order in which they are connected to the computer. Proceed as follows: (1) disconnect all scanners, (2) prepare the first scanner with plates, connect it and turn it on. (3) start scanning with --scanner 1 option, (4) prepare the second scanner, connect it and turn it on, (5) start scanning with --scanner 2 option. Repeat step (4) and (5), each time incrementing the --scanner argument.
usage: pyphe-scan-timecourse [-h] [--nscans NSCANS] [--interval INTERVAL]
[--prefix PREFIX] [--postfix POSTFIX]
[--fixture {som3_edge,som3}]
[--resolution {150,300,600,900,1200}]
[--scanner {1,2,3}] [--mode {Gray,Color}]
optional arguments:
-h, --help show this help message and exit
--nscans NSCANS Number of time points to scan. This defaults to 100
and the script can be terminated by Ctr+C when done.
--interval INTERVAL Time in minutes between scans. Defaults to 20.
--prefix PREFIX Name prefix for output files. The default is the
current date YYYYMMDD.
--postfix POSTFIX Name postfix for output files. Defaults to empty
string.
--fixture {som3_edge,som3,petrie}
ID of the fixture you are using.
--resolution {150,300,600,900,1200}
Resolution for scanning in dpi. Default is 600.
--scanner {1,2,3} Which scanner to use. Scanners are not uniquely
identified and may switch when turned off/unplugged.
This option does not need to be set when only one
scanner is connected.
--mode {Gray,Color} Which color mode to use for scanning. Defaults to
Gray.
This tool performs non-parametric analysis of growth curves. It was written specifically to analyse colony size timeseries data obtained with pyphe-quantify timeseries.
It is important that your csv with the growth data is in the right format. The file must contain one growth curve per column. The first column must be the timepoints and there must be a header row with unique identifiers for each curve. For example data and expected outputs, check out the files included in this Documentation folder. Sensible default parameters are set for all options but, depending on your data, you may wish to customise these, so check out the help section below.
usage: pyphe-growthcurves [-h] --input INPUT [--fitrange FITRANGE]
[--lag-method {abs,rel}]
[--lag-threshold LAG_THRESHOLD]
[--t0-fitrange T0_FITRANGE] [--plots]
[--plot-ylim PLOT_YLIM]
optional arguments:
-h, --help show this help message and exit
--input INPUT Path to the growth curve file to analyse. This file
contains one growth curve per column. The first column
must be the timepoints and there must be a header row
with unique identifiers for each curve.
--fitrange FITRANGE Number of timepoint over which to fit linear
regression. Defaults to 4. Please adjust this to the
density of your timepoints and use higher values for
more noisy data.
--lag-method {abs,rel}
Method to use for determining lag. "abs" will measure
time until the defined biomass threshold is crossed.
"rel" will fist determine the inital biomass and
measure the time until the biomass has passed this
value times the threshold. Defaults to "rel".
--lag-threshold LAG_THRESHOLD
Threshold to use for determining lag. With method
"abs", this will measure time until the defined
biomass threshold is crossed. With "rel" will fist
determine the inital biomass and measure the time
until the biomass has passed this value times the
threshold. Defaults to 2.0, so with method "rel", this
will measure the time taken for the first doubling.
--t0-fitrange T0_FITRANGE
Specify the number of timepoint to use at the
beginning of the growth curve to determine the initial
biomass by averaging them. Defaults to 3.
--plots Set this option (no argument required) to produce a
plot of all growthcurves as pdf.
--plot-ylim PLOT_YLIM
Specify the upper limit of the y-axis of growth curve
plots. Useful if you want curves to be directly
comparable. If not set, the axis of each curve is
scaled to the data.
Pyphe-growthcurves will produce a csv file with extracted growth parameters. The maximum slope is determined by fitting all possible linear regressions in sliding windows of length n and chosing the one with the highest slope. The lag phase is determined as the first timepoint which exceeds a settable relative or absolute threshold.
Parameter | Explanation |
---|---|
initial biomass | The average of the first n timepoints of the growth curve |
lag | Lag phase |
max_slope | The maximum slope of the growth curve |
r2 | The R2 parameter of the linear regression that produced the highest maximum slope |
t_max | Time at which maximum growth slope is reached (center of the sliding window) |
y-intercept | Y-intercept of the regression which produced the maximum slope |
x-intercept | X-intercept of the regression which produced the maximum slope. This is interpreted as lag phase by some people |
Pyphe quantify extracts colony parameters from images. In can operate in three distinct modes analysing colony sizes for each image individually (batch mode), analysing redness for each image individually (redness mode) or obtaining a growth curve from an image timeseries (timeseries mode). The --grid parameter is required define the position of colonies on the plate. You can either use automatic grid detection, one of our preconfigured positions if you are using the pp3 fixture or define your own (see the manual below). Images can be in any format (e.g. jpg, tiff, png). Images should be cropped closely to the colonies (this is important for good thresholding and automatic grid detection), i.e. not contain parts of the plate edges or surroundings. In batch and timecourse mode, pyphe-quantify assumes that images were acquired using transmission scanning, where colonies appear darker then the surrounding agar. If this is not the case and you took images by reflective scanning or with a camera, use --negate False. In batch and timecourse mode, images are epxected to be grayscale. If they are not, they will be converted (by simply summing all channels) and a warning will be thrown.
usage: pyphe-quantify [-h] --grid GRID [--pattern PATTERN] [--t T] [--d D]
[--s S] [--negate NEGATE] [--reportAll]
[--reportFileNames]
[--hardImageThreshold HARDIMAGETHRESHOLD]
[--hardSizeThreshold HARDSIZETHRESHOLD] [--qc QC]
[--out OUT]
{batch,timecourse,redness}
positional arguments:
{batch,timecourse,redness}
Pyphe-quantify can be run in three different modes. In
batch mode, it quantifies colony sizes for all images
mathcing the pattern individually. A separate results
table and qc image is produced for each. Redness mode
is similar except that the redness of each colony is
quantified. In timecourse mode, all images matching
the pattern are analysed jointly. The final image
matching the pattern is used to create a mask of where
the colonies are and this mask is then applied to all
previous images in the timeseries. A single output
table, where the timepoints are the rows and each
individual colony is a row.
optional arguments:
-h, --help show this help message and exit
--grid GRID This option is required (all others have defaults set)
and specifies the grid in which the colonies are
arranged. The argument has to be in the form of 6
integer numbers separated by "-": <number of colony
rows>-<number of colony columns>-<x-position of the
top left colony>-<y-position of the top left
colony>-<x-position of the bottom right
colony>-<y-position of the bottom right colony>.
Positions must be integers and are the distance in
number of pixels from the image origin in each
dimension (x is width dimension, y is height
dimension). The image origin is, in line with scikit-
image, in the top left corner.
--pattern PATTERN Pattern describing files to analyse. This follows
standard unix convention and can be used to specify
subfolders in which to look for images
(<subfolder>/*.jpg) or the image format (*.tiff,
*.png, etc.). By default, all jpg images in the
working directory are analysed.
--t T By default the intensity threshold to distinguish
colonies from the background is determined by the Otsu
method. The determined value will be multiplied by
this argument to give the final threshold. Useful for
easily fine-tuning colony detection.
--d D The distance between two grid positions will be
divided by this number to compute the maximum distance
a putative colony can be away from its reference grid
position. Decreasing this number towards 2 makes
colony-to-grid-matching more permissive (might help
when some of your plates are at a slight angle or out
of position).
--s S Detected putative colonies will be filtered by size
and small components (usually image noise) will be
excluded. The default threshold is the image
area*0.00005 and is therefore independent of scanning
resolution. This default is then multiplied by this
argument to give the final threshold. Useful for when
colonies have unusual sizes.
--negate NEGATE In images acquired by transmission scanning, the
colonies are darker than the background. Before
thresholding, the image needs to be inverted/negated.
Ignored in redness mode.
--reportAll Sometimes, two putative colonies are identified that
are within the distance threshold of a grid position.
By default, only the closest colony is reported. This
can be changed by setting this option (without
parameter). This option allows pyphe quantify to be
used even if colonies are not arrayed in a regular
grid (you still need to provide a grid parameter
though that spans the colonies you are interested i).
--reportFileNames Only for timecourse mode, otherwise ignored. Use
filenames as index for output table instead of
timepoints. Useful when the ordering of timepoints is
not the same as returned by the pattern.
--hardImageThreshold HARDIMAGETHRESHOLD
Allows a hard (fixed) intensity threshold in the range
[0,1] to be used instead of Otsu thresholding. But
images intensities are re-scaled to [0,1] before
thresholding.
--hardSizeThreshold HARDSIZETHRESHOLD
Allows a hard (fixed) size threshold [number of
pixels] to be used for filtering small colonies.
--qc QC Directory to save qc images in. Defaults to
"qc_images".
--out OUT Directory to save output files in. Defaults to
"pyphe_quant".
Pyphe-analyse is a tool for spatial normalisation and data aggregation across many plates. It implements a grid normalisation based on the concept proposed by Zackrisson et al. 2016 and row/column median normalisation. Please see our paper and the protocol in it to find out more. Pyphe-analyse can be run from the command line, with options below, or using the graphical user interface by running pyphe-analyse-gui.
usage: pyphe-analyse.txt [-h] --edt EDT --format
{gitter,pyphe-redness,pyphe-growthcurves} [--out OUT]
[--load_layouts]
[--gridnorm {standard384,standard1536}]
[--extrapolate_corners] [--rcmedian] [--check CHECK]
[--qc_plots QC_PLOTS]
Welcome to pyphe-analyse, part of the pyphe toolbox. Written by
[email protected] and maintained at https://github.com/Bahler-
Lab/pyphe
optional arguments:
-h, --help show this help message and exit
--edt EDT Path to the Experimental Design Table (EDT) listing
all plates of the experiment. The table must be in csv
format, the first column must contain unique plate IDs
and there must be a column named 'Data_path' that
contains abolute or relative file paths to each
plate's data file. A 'Layout_path' column can be
included, see below. Any additional columns included
in this file will bestored in each plate's meta-data
and included in the final data output.
--format {gitter,pyphe-redness,pyphe-growthcurves}
Type of inout data.
--out OUT Specifies the path where to save the output data
result. By default, the data report is saved in the
working directory as "pyphe-analyse_data_report.csv"
and will overwrite the file if it exists.
--load_layouts Set this option (without parameters) to load layouts
(requires Layout_path column in the EDT).
--gridnorm {standard384,standard1536}
Perform reference grid normalisation. Standard384
refers to plates which are in 384 (16x24) format with
the reference grid in 96 format in the top left
corner. Standard1536 refers to plates in 1536 format
(32x48( with two 96 reference grids in the top left
and bottom right corners.
--extrapolate_corners
If working in standard1536 format, set this option to
extrapolate the reference grid in the bottom left and
top right corner. A linear regression will be trained
across all top left and bottom right corners on plates
in the experiment to predict hypothetical grid colony
sizes in the other two corners.
--rcmedian Perform row/column median normalisation. If --gridnorm
will be performed first if both parameters are set.
--check CHECK Check colony sizes after normalisation for negative
and infinite colony sizes *(normalisation artefacts),
throw a warning and set to NA.
--qc_plots QC_PLOTS Specify a folder in which to save qc plots for each
plate.
Pyphe-interpret reports summary statistics and tests for differential fitness using t-tests. It is flexible and can in theory be used with any dataset in tidy format.
usage: pyphe-interpret [-h] --ld LD [--out OUT] --grouping_column
GROUPING_COLUMN --axis_column AXIS_COLUMN
[--values_column VALUES_COLUMN] --control CONTROL
[--ld_encoding LD_ENCODING] [--circularity CIRCULARITY]
[--set_missing_na]
Welcome to pyphe-interpret, part of the pyphe toolbox. Written by
[email protected] and maintained at https://github.com/Bahler-
Lab/pyphe. Pyphe-interpret calculates summary statistics and p-values from the
data reports generated by pyphe-analyse. For this, specifiying your column
names correctly is crucial. Let us assume you have measured many strains in
many conditions. Now you would like to know for each strain in each condition
(for each condition-strain pair) if it is "significant". There are essentially
two ways of doing this, asking different biological questions. (1) Check for
each condition separately (--grouping_column <condition_column>) if there is a
significant difference in means between a mutant strain and a control strain
(--axis_column <strain_id_column>). Or (2) Check for each strain separately
(--grouping_column <strain_id_column>) if there is a significant difference in
the means of the strain in the assay condition versus the control condition
(--axis_column <condition_column>). The second option tests for condition-
specific growth effects (i.e. is does not return significant results if a
strain is always faster or always slower growing than the grid strain). In
both cases you need to specify the control against which to test using
--control and this has to be a value that appears in the axis column. You
should define the dependent variable of the t-test using --values_column. FDR
correction with the Benjamini-Hochberg method will be applied on each level
set of the grouping_column separately, ie for case (1) p-values will be
corrected across each strain separately, ie more conditions means more
stringent correction, and for case (2) p-values will be corrected for each
condition separately, ie more strains means mpre stringent correction.
optional arguments:
-h, --help show this help message and exit
--ld LD Path to the Data Report Table produced by pyphe-
analyse.
--out OUT Specifies the path where to save the output data
result. By default, a table with all replicates will
be saved as pyphe-interpret-report_reps.csv and the
statistic table will be saved as pyphe-interpret-
report_summaryStats.csv in the current working
directory. Existing files will be overwritten.
--grouping_column GROUPING_COLUMN
Name of the column in the data report to use for
forming groups on which to perform independent sets of
t-tests.
--axis_column AXIS_COLUMN
Name of the column in the data report to repeat
t-tests along within each group. Levels in this column
will be the explanatory/independent variable used for
t-tests.
--values_column VALUES_COLUMN
Name of the column in the data report to use as
fitness values. This will be the dependent variable
for t-tests. Defaults to "Colony_size_corr_checked".
--control CONTROL Name of the control to compare against. This must be a
value found in the axis column.
--ld_encoding LD_ENCODING
Encoding of the data report table to be passed to
pandas.read_csv().
--circularity CIRCULARITY
Exclude colonies from the analysis with a circularity
below the one specified. A circularity of 1
corresponds to a perfect circle. We recommend a
threshold around 0.85.
--set_missing_na Set 0-sized colonies to NA. This is recommended if you
expect no missing colonies in your data, which means
these are probably due to pinning errors.