Optimizing the Visualization

Timothy Tickle edited this page Jun 28, 2016 · 19 revisions

How the Visualization is Made and How you can Affect it

Several options are available to customize the view of your data. They are described here in the order in which they are applied to your data, with the other steps of the analysis mentioned for context. You can also print the usage for all available arguments with either of the following commands.

inferCNV.R -h
inferCNV.R --help

  1. Expression data and positional data are read into the system.

  2. --transform : Input data is expected to be log2(TPM+1). If it is TPM instead, use this flag to have the data transformed to the correct scale on load.

  3. --cutoff : Genes whose average expression in log2((TPM+1)/10) space is below this value are removed. The larger the value, the fewer genes are used for analysis. Using a cutoff is recommended unless your data is already filtered, because single-cell RNA-Seq data can contain many lowly expressed, noisy genes.

  4. If positional files are given, genes are reordered to match the order given in the files. Genes not in the files are removed.

  5. --max_centered_expression : After the data is mean centered, any value above this value is set to this value, and any value below -1 * this value is set to -1 * this value. This thresholding reduces the effect of outliers within a cell.

  6. --window : A moving average is used to smooth the expression of a cell along genomically ordered genes. This value sets the length of the window used for smoothing. The larger the window, the smoother (less variable) the data will look. The choice trades off smoothing of noise (larger values) against sensitivity to changes carried by small groups of contiguous genes (smaller values). This value should be positive and odd.

  7. Cells are centered to reduce the effect of differences in cell complexity.

  8. Remove Background Signal. If no reference samples are given, the average expression of each gene across the whole study is subtracted from that gene. If one group of reference samples is given, the average expression is calculated per gene in the reference samples only, but is subtracted from that gene in all samples (reference or observational). If multiple reference groups are given, an average expression is calculated per gene in each reference group, and the minimum and maximum of these averages are then taken per gene. If a gene's expression is more than the maximum value, it is adjusted by subtracting the maximum value; if it is less than the minimum value, it is adjusted by subtracting the minimum value; otherwise it is set to 0.
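
Steps 5 and 6 above can be illustrated with a small sketch. This is not the inferCNV code itself: the function names are hypothetical, the data is a plain list of one cell's mean-centered values in genomic order, and the way the window is shortened at the ends of the list is an assumption made only so the example runs.

```python
def clamp(values, bound):
    """Step 5 sketch: limit every value to the range [-bound, bound]."""
    return [max(-bound, min(bound, v)) for v in values]


def moving_average(values, window):
    """Step 6 sketch: centered moving average along ordered genes.

    Assumption: near the ends the window is simply truncated to the
    genes that exist; the real tool may treat the ends differently.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window should be positive and odd")
    half = (window - 1) // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# One outlier is capped first, then spread over its neighbors.
cell = [0.0, 0.0, 9.0, 0.0, 0.0]
capped = clamp(cell, 3.0)
smooth = moving_average(capped, 3)
```

A larger window averages over more neighboring genes, which is why the output looks smoother but short runs of changed genes become harder to see.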

More information about reference samples
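
The background removal in step 8 can be sketched for a single gene in a single cell as follows. The function name and the use of a plain list of per-group averages are assumptions for illustration, not the tool's actual interface. With one reference group (or with the study-wide average when no references are given) the list has a single entry, so the minimum and maximum coincide and the rule reduces to subtracting that average.

```python
def remove_background(value, reference_means):
    """Adjust one expression value against per-group reference averages.

    reference_means holds the gene's average expression in each
    reference group (hypothetical input format).
    """
    low = min(reference_means)
    high = max(reference_means)
    if value > high:
        return value - high  # above every reference group
    if value < low:
        return value - low   # below every reference group
    return 0.0               # within the reference range: treated as background


# Two reference groups with averages 0.5 and 1.0 for this gene.
above = remove_background(2.0, [0.5, 1.0])
inside = remove_background(0.75, [0.5, 1.0])
below = remove_background(-1.0, [-0.25, 0.5])
```

Only expression that falls outside the range spanned by the reference groups survives, which is what keeps normal variation between reference groups out of the figure.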

  1. --tail : In the visualization, the beginnings and ends of contigs are less reliable measurements because of the moving average. These tails are clipped and not shown in the output figure or matrix. By default, each clipped region has a length of (window-1)/2, where window is the value given in --window. When this length is too long for an unusually short contig, 1/3 of the length of the contig is used instead. These are the default behaviors; if this flag is given with a value, that value is used as the length clipped from each end of every contig.

  2. --noise_filter : Sets an additional amount by which gene expression must differ from the max / min / average gene expression used as background. Increasing this value sets more expression values to 0 and requires expression to be more distinct from the background signal before it is shown in the visualization.

  3. Current values are written to a file as output.

  4. --vis_bound_threshold : The values in the visualization are bound to a maximum of this value and a minimum of -1 * this value. This allows a more vivid use of the color gradient. Without such a bound, a small number of outliers in the expression matrix would take the most extreme colors in the palette, leaving the rest of the values washed out.

  5. --color_safe : Adding this flag to a call to the program switches the colors used in the matrices of the figure to a color-safe palette.

  6. Data is plotted.
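
The tail clipping and visualization bound described above can be sketched as below. The function names are hypothetical, and two details are assumptions because the text does not specify them: the default tail is treated as "too long" when clipping both ends would leave no genes, and 1/3 of the contig length is rounded down.

```python
def tail_length(window, contig_length, tail=None):
    """Number of genes clipped from each end of a contig."""
    if tail is not None:
        return tail  # an explicit --tail value replaces the defaults
    default = (window - 1) // 2
    # Assumption: "too long" means both tails together cover the contig.
    if 2 * default >= contig_length:
        return contig_length // 3  # assumption: rounded down
    return default


def clip_tails(values, n):
    """Drop n values from the start and n from the end of a contig."""
    return values[n:len(values) - n]


def bound_for_display(values, bound):
    """Limit plotted values to [-bound, bound] (--vis_bound_threshold)."""
    return [max(-bound, min(bound, v)) for v in values]


# A long contig uses (window-1)/2; a short one falls back to 1/3.
long_tail = tail_length(101, 1000)
short_tail = tail_length(101, 60)
shown = bound_for_display(clip_tails([9.0, 0.1, -0.2, 0.3, -7.0, 0.0], 1), 1.5)
```

Bounding the plotted values keeps a few extreme entries from stretching the color scale, so moderate gains and losses remain visible.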
