From 30807fea8ff0156feb5e94e540cde496931cf612 Mon Sep 17 00:00:00 2001 From: christopher-hakkaart Date: Fri, 23 Aug 2024 15:03:38 +0200 Subject: [PATCH 01/10] Rewrite example1 as hello world --- src/pages/example1.md | 86 ++++++++++++++++++++++++------------------- 1 file changed, 48 insertions(+), 38 deletions(-) diff --git a/src/pages/example1.md b/src/pages/example1.md index 7c68420e..d48ecfe1 100644 --- a/src/pages/example1.md +++ b/src/pages/example1.md @@ -7,96 +7,106 @@ layout: "@layouts/MarkdownPage.astro"

Basic pipeline

- This example shows how to write a pipeline with two simple Bash processes, so that the results produced by the first process are consumed by the second process. + This example shows a simple Nextflow pipeline consisting of two Bash processes, where the output from the first process is used as input for the second process.

```groovy #!/usr/bin/env nextflow -params.in = "$baseDir/data/sample.fa" +params.greeting = "Hello World!" /* - * Split a fasta file into multiple files + * Redirect a string to a text file */ -process splitSequences { +process sayHello { input: - path 'input.fa' + val x output: - path 'seq_*' + path 'output.txt' """ - awk '/^>/{f="seq_"++d} {print > f}' < input.fa + echo '$x' > output.txt """ } /* - * Reverse the sequences + * Convert lowercase letters to uppercase letters */ -process reverse { +process convertToUpper { input: - path x + path y output: stdout """ - cat $x | rev + cat $y | tr '[a-z]' '[A-Z]' """ } /* - * Define the workflow + * Workflow definition */ workflow { - splitSequences(params.in) \ - | reverse \ - | view + sayHello(params.greeting) + | convertToUpper + | view } ``` -### Synopsis +### Try it -- **Line 1** The script starts with a shebang declaration. This allows you to launch your pipeline just like any other Bash script. +To try this pipeline: -- **Line 3**: Declares a pipeline parameter named `params.in` that is initialized with the value `$HOME/sample.fa`. This value can be overridden when launching the pipeline, by simply adding the option `--in ` to the script command line. +1. Follow the [Nextflow installation guide](https://www.nextflow.io/docs/latest/install.html#install-nextflow) to install Nextflow. +2. Copy the script above and save it as `hello-world.nf`. +3. Launch the pipeline: -- **Lines 8-19**: The process that splits the provided file. + nextflow run hello-world.nf - - **Line 10**: Opens the input declaration block. The lines following this clause are interpreted as input definitions. +4. Launch the pipeline again with a custom greeting: - - **Line 11**: Declares the process input file, which will be named `input.fa` in the process script. + nextflow run hello-world.nf --greeting "Bonjour le monde!" - - **Line 13**: Opens the output declaration block. The lines following this clause are interpreted as output declarations. 
+### Script synopsis - - **Line 14**: Files whose names match the pattern `seq_*` are declared as the output of this process. +- **Line 1** Declares Nextflow as the interpreter. - - **Lines 16-18**: The actual script executed by the process to split the input file. +- **Line 3**: Declares a pipeline parameter named `greeting` that is initialized with the value `"Hello World!"`. -- **Lines 24-35**: The second process, which receives the splits produced by the - previous process and reverses their content. +- **Lines 8-19**: Declares a process named `sayHello` that redirects a string to a text file. - - **Line 26**: Opens the input declaration block. Lines following this clause are - interpreted as input declarations. + - **Line 10**: Opens the input declaration block. - - **Line 27**: Defines the process input file. + - **Line 11**: Defines the process input `x`. - - **Line 29**: Opens the output declaration block. Lines following this clause are - interpreted as output declarations. + - **Line 13**: Opens the output declaration block. - - **Line 30**: The standard output of the executed script is declared as the process - output. + - **Line 14**: Defines the process output `'output.txt'`. - - **Lines 32-34**: The actual script executed by the process to reverse the content of the input files. + - **Lines 16-18**: Defines a script that redirects the string `x` to a text file named `output.txt`. -- **Lines 40-44**: The workflow that connects everything together! +- **Lines 24-35**: Declares a process named `convertToUpper` that concatenates a file and transforms all of the lowercase letters to uppercase letters. - - **Line 41**: First, the input file specified by `params.in` is passed to the `splitSequences` process. + - **Line 26**: Opens the input declaration block. - - **Line 42**: The outputs of `splitSequences` are passed as inputs to the `reverse` process, which processes each split file in parallel. + - **Line 27**: Defines the process input `y`. 
- - **Line 43**: Finally, each output emitted by `reverse` is printed. + - **Line 29**: Opens the output declaration block. + + - **Line 30**: Defines standard output (`stdout`) as the output. + + - **Lines 32-34**: Defines a script that concatenates the variable `y` and transforms all of the lowercase letters to uppercase letters. + +- **Lines 40-44**: Declares the workflow that connects everything together! + + - **Line 41**: Passes the string specified by `params.greeting` to the `sayHello` process. + + - **Line 42**: Passes the output from `sayHello` to the `convertToUpper` process. + + - **Line 43**: Prints the standard output from `convertToUpper`. From 3dc475dc9c75db26f14b868c6b44984b3a3e787a Mon Sep 17 00:00:00 2001 From: christopher-hakkaart Date: Fri, 6 Sep 2024 08:16:13 +0200 Subject: [PATCH 02/10] Update menu and first two examples --- src/pages/example1.md | 69 ++++++++++------------------ src/pages/example2.md | 40 ++++++++++++++--- src/pages/example3.md | 102 +++++++++++++----------------------- src/pages/example4.md | 96 +++++++++++++-------------------------- src/pages/example5.md | 69 ---------------------------- 5 files changed, 134 insertions(+), 242 deletions(-) delete mode 100644 src/pages/example5.md diff --git a/src/pages/example1.md b/src/pages/example1.md index d48ecfe1..5861fc6d 100644 --- a/src/pages/example1.md +++ b/src/pages/example1.md @@ -7,12 +7,17 @@ layout: "@layouts/MarkdownPage.astro"

Basic pipeline

- This example shows a simple Nextflow pipeline consisting of two Bash processes, where the output from the first process is used as input for the second process. + Nextflow pipelines are made by joining together different processes.

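Stripped of the Nextflow scaffolding, the two steps in this example are ordinary Bash. As an illustrative sketch (not part of the pipeline itself), the same commands can be chained directly in a shell:

```shell
# What sayHello runs: redirect a string to a text file
echo 'Hello World!' > output.txt

# What convertToUpper runs: read the file and map lowercase to uppercase
cat output.txt | tr '[a-z]' '[A-Z]'
# → HELLO WORLD!
```

Nextflow's contribution is wiring these commands together through channels and running each step in its own isolated task directory.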
```groovy #!/usr/bin/env nextflow +/* + * Pipeline parameters + */ + +// Primary input params.greeting = "Hello World!" /* @@ -26,6 +31,7 @@ process sayHello { output: path 'output.txt' + script: """ echo '$x' > output.txt """ @@ -42,6 +48,7 @@ process convertToUpper { output: stdout + script: """ cat $y | tr '[a-z]' '[A-Z]' """ @@ -51,62 +58,32 @@ process convertToUpper { * Workflow definition */ workflow { + + // Redirects a string to a text file sayHello(params.greeting) - | convertToUpper - | view + + // Concatenates a text file and transforms lowercase letters to uppercase letters + convertToUpper(sayHello.out) + + // View convertToUpper output + convertToUpper.out.view() } ``` +### Script synopsis + +This example shows a simple Nextflow pipeline consisting of two Bash processes. The `sayHello` process takes a string as input and redirects it to an output text file. The `convertToUpper` process takes the output text file from `sayHello` as input, concatenates the text, and converts all of the lowercase letters to uppercase letters. The output from the `convertToUpper` process is then printed to screen. + ### Try it To try this pipeline: 1. Follow the [Nextflow installation guide](https://www.nextflow.io/docs/latest/install.html#install-nextflow) to install Nextflow. -2. Copy the script above and save it as `hello-world.nf`. +2. Copy the script above and save as `hello-world.nf`. 3. Launch the pipeline: - nextflow run hell-world.nf - -4. Launch the pipeline again with a custom greeting: - - nextflow run hello-world.nf --greeting "Bonjour le monde!" - -### Script synopsis - -- **Line 1** Declares Nextflow as the interpreter. - -- **Line 3**: Declares a pipeline parameter named `greeting` that is initialized with the value `"Hello World!"`. - -- **Lines 8-19**: Declares a process named `sayHello` that redirects a string to a text file. - - - **Line 10**: Opens the input declaration block. - - - **Line 11**: Defines the process input `x`. 
- - - **Line 13**: Opens the output declaration block. - - - **Line 14**: Defines the process output `'output.txt'`. - - - **Lines 16-18**: Defines a script that redirects the string `x` to a text file named `output.txt`. - -- **Lines 24-35**: Declares a process named `convertToUpper` that concatenates a file and transforms all of the lowercase letters to uppercase letters. - - - **Line 26**: Opens the input declaration block. - - - **Line 27**: Defines the process input `y`. - - - **Line 29**: Opens the output declaration block. - - - **Line 30**: Defines standard output (`stdout`) as the output. - - - **Lines 32-34**: Defines a script that concatenates the variable `y` and transforms all of the lowercase letters to uppercase letters. - -- **Lines 40-44**: Declares the workflow that connects everything together! - - - **Line 41**: Passes the string specified by `params.greeting` to the `sayHello` process. - - - **Line 42**: Passes of the output from `sayHello` to the `convertToUpper` process. + nextflow run hello-world.nf - - **Line 43**: Prints the standard output from `convertToUpper`. +**NOTE**: To run this example with versions of Nextflow older than 22.04.0, you must include the `-dsl2` flag with `nextflow run`. diff --git a/src/pages/example2.md b/src/pages/example2.md index 4df392d3..605402ec 100644 --- a/src/pages/example2.md +++ b/src/pages/example2.md @@ -7,18 +7,27 @@ layout: "@layouts/MarkdownPage.astro"

Mixing scripting languages

- With Nextflow, you are not limited to Bash scripts -- you can use any scripting language! In other words, for each process you can use the language that best fits the specific task or that you simply prefer. + You are not limited to Bash scripts with Nextflow -- you can use any scripting language that can be executed by the Linux platform.

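As a toy preview of the data flow (plain shell, not part of the Nextflow script): the first stage emits comma-separated number pairs, and the second stage averages each pair, mirroring what the Perl and Python scripts in this example do.

```shell
# Stage 1 stands in for the Perl script (emits "x, y" pairs);
# stage 2 stands in for the Python script (prints the mean of each pair)
printf '1.0, 3.0\n2.0, 4.0\n' | awk -F', ' '{ print ($1 + $2) / 2 }'
# → 2
# → 3
```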
```groovy #!/usr/bin/env nextflow +/* + * Pipeline parameters + */ + +// Range params.range = 100 /* * A trivial Perl script that produces a list of number pairs */ process perlTask { + + input: + val x + output: stdout @@ -29,7 +38,7 @@ process perlTask { use warnings; my $count; - my $range = !{params.range}; + my $range = !{x}; for ($count = 0; $count < 10; $count++) { print rand($range) . ', ' . rand($range) . "\n"; } @@ -40,12 +49,14 @@ process perlTask { * A Python script which parses the output of the previous script */ process pyTask { + input: stdin output: stdout + script: """ #!/usr/bin/env python import sys @@ -64,7 +75,15 @@ process pyTask { } workflow { - perlTask | pyTask | view + + // A Perl script that produces a list of number pairs + perlTask(params.range) + + // A Python script which parses the output of the previous script + pyTask(perlTask.out) + + // View pyTask output + pyTask.out.view() } ``` @@ -72,9 +91,16 @@ workflow { ### Synopsis -In the above example we define a simple pipeline with two processes. +This example shows a simple Nextflow pipeline consisting of two processes written in different languages. The `perlTask` process starts with a Perl _shebang_ declaration and executes a Perl script that produces pairs of numbers. Since Perl uses the `$` character for variables, the special `shell` block is used instead of the normal `script` block to distinguish the Perl variables from Nextflow variables. Similarly, the `pyTask` process starts with a Python _shebang_ declaration. It takes the output from the Perl script and executes a Python script that averages the number pairs. The output from the `pyTask` process is then printed to screen. + +### Try it + +To try this pipeline: + +1. Follow the [Nextflow installation guide](https://www.nextflow.io/docs/latest/install.html#install-nextflow) to install Nextflow. +2. Copy the script above and save as `mixed-languages.nf`. +3. 
Launch the pipeline: -The first process executes a Perl script, because the script block definition starts -with a Perl _shebang_ declaration (line 14). Since Perl uses the `$` character for variables, we use the special `shell` block instead of the normal `script` block to easily distinguish the Perl variables from the Nextflow variables. + nextflow run mixed-languages.nf -In the same way, the second process will execute a Python script, because the script block starts with a Python shebang (line 36). +**NOTE**: To run this example with versions of Nextflow older than 22.04.0, you must include the `-dsl2` flag with `nextflow run`. diff --git a/src/pages/example3.md b/src/pages/example3.md index 89373cab..31e87c20 100644 --- a/src/pages/example3.md +++ b/src/pages/example3.md @@ -1,96 +1,86 @@ --- -title: BLAST pipeline +title: RNA-Seq pipeline layout: "@layouts/MarkdownPage.astro" ---
-

BLAST pipeline

+

RNA-Seq pipeline

- This example splits a FASTA file into chunks and executes a BLAST query for each chunk in parallel. Then, all the sequences for the top hits are collected and merged into a single result file. + This example shows how to put together a basic RNA-Seq pipeline. It maps a collection of read-pairs to a given reference genome and outputs the respective transcript model.

```groovy #!/usr/bin/env nextflow /* - * Defines the pipeline input parameters (with a default value for each one). - * Each of the following parameters can be specified as command line options. + * The following pipeline parameters specify the reference genomes + * and read pairs and can be provided as command line options */ -params.query = "$baseDir/data/sample.fa" -params.db = "$baseDir/blast-db/pdb/tiny" -params.out = "result.txt" -params.chunkSize = 100 +params.reads = "$baseDir/data/ggal/ggal_gut_{1,2}.fq" +params.transcriptome = "$baseDir/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa" +params.outdir = "results" -db_name = file(params.db).name -db_dir = file(params.db).parent +workflow { + read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists: true ) + INDEX(params.transcriptome) + FASTQC(read_pairs_ch) + QUANT(INDEX.out, read_pairs_ch) +} -workflow { - /* - * Create a channel emitting the given query fasta file(s). - * Split the file into chunks containing as many sequences as defined by the parameter 'chunkSize'. - * Finally, assign the resulting channel to the variable 'ch_fasta' - */ - Channel - .fromPath(params.query) - .splitFasta(by: params.chunkSize, file:true) - .set { ch_fasta } - - /* - * Execute a BLAST job for each chunk emitted by the 'ch_fasta' channel - * and emit the resulting BLAST matches. - */ - ch_hits = blast(ch_fasta, db_dir) - - /* - * Each time a file emitted by the 'blast' process, an extract job is executed, - * producing a file containing the matching sequences. - */ - ch_sequences = extract(ch_hits, db_dir) - - /* - * Collect all the sequences files into a single file - * and print the resulting file contents when complete. 
- */ - ch_sequences - .collectFile(name: params.out) - .view { file -> "matching sequences:\n ${file.text}" } +process INDEX { + tag "$transcriptome.simpleName" + + input: + path transcriptome + + output: + path 'index' + + script: + """ + salmon index --threads $task.cpus -t $transcriptome -i index + """ } +process FASTQC { + tag "FASTQC on $sample_id" + publishDir params.outdir -process blast { input: - path 'query.fa' - path db + tuple val(sample_id), path(reads) output: - path 'top_hits' + path "fastqc_${sample_id}_logs" + script: """ - blastp -db $db/$db_name -query query.fa -outfmt 6 > blast_result - cat blast_result | head -n 10 | cut -f 2 > top_hits + fastqc.sh "$sample_id" "$reads" """ } +process QUANT { + tag "$pair_id" + publishDir params.outdir -process extract { input: - path 'top_hits' - path db + path index + tuple val(pair_id), path(reads) output: - path 'sequences' + path pair_id + script: """ - blastdbcmd -db $db/$db_name -entry_batch top_hits | head -n 10 > sequences + salmon quant --threads $task.cpus --libType=U -i $index -1 ${reads[0]} -2 ${reads[1]} -o $pair_id """ } ```
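A note on the reads pattern: the `{1,2}` in `ggal_gut_{1,2}.fq` covers both mates of a read pair, which `channel.fromFilePairs` then groups under a shared sample ID. Bash brace expansion gives a rough feel for what the pattern refers to (illustrative only; Nextflow performs its own glob matching against files that actually exist):

```shell
# Expand the pair pattern to see the two mate filenames it denotes
echo data/ggal/ggal_gut_{1,2}.fq
# → data/ggal/ggal_gut_1.fq data/ggal/ggal_gut_2.fq
```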
### Try it on your computer To run this pipeline on your computer, you will need: @@ -100,12 +90,12 @@ To run this pipeline on your computer, you will need: Install Nextflow by entering the following command in the terminal: $ curl -fsSL https://get.nextflow.io | bash Then launch the pipeline with this command: - $ ./nextflow run blast-example -with-docker + $ nextflow run rnaseq-nf -with-docker -It will automatically download the pipeline [GitHub repository](https://github.com/nextflow-io/blast-example) and the associated Docker images, thus the first execution may take a few minutes to complete depending on your network connection. +It will automatically download the pipeline [GitHub repository](https://github.com/nextflow-io/rnaseq-nf) and the associated Docker images, thus the first execution may take a few minutes to complete depending on your network connection. **NOTE**: To run this example with versions of Nextflow older than 22.04.0, you must include the `-dsl2` flag with `nextflow run`. diff --git a/src/pages/example4.md index 31e87c20..dda443da 100644 --- a/src/pages/example4.md +++ b/src/pages/example4.md @@ -1,81 +1,49 @@ --- -title: RNA-Seq pipeline +title: Machine Learning pipeline layout: "@layouts/MarkdownPage.astro" ---
-

RNA-Seq pipeline

+

Machine Learning pipeline

- This example shows how to put together a basic RNA-Seq pipeline. It maps a collection of read-pairs to a given reference genome and outputs the respective transcript model. + This example shows how to put together a basic Machine Learning pipeline. It fetches a dataset from OpenML, trains a variety of machine learning models on a prediction target, and selects the best model based on some evaluation criteria.

```groovy #!/usr/bin/env nextflow -/* - * The following pipeline parameters specify the reference genomes - * and read pairs and can be provided as command line options - */ -params.reads = "$baseDir/data/ggal/ggal_gut_{1,2}.fq" -params.transcriptome = "$baseDir/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa" -params.outdir = "results" +params.dataset_name = 'wdbc' +params.train_models = ['dummy', 'gb', 'lr', 'mlp', 'rf'] +params.outdir = 'results' workflow { - read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists: true ) - - INDEX(params.transcriptome) - FASTQC(read_pairs_ch) - QUANT(INDEX.out, read_pairs_ch) -} - -process INDEX { - tag "$transcriptome.simpleName" - - input: - path transcriptome - - output: - path 'index' - - script: - """ - salmon index --threads $task.cpus -t $transcriptome -i index - """ -} - -process FASTQC { - tag "FASTQC on $sample_id" - publishDir params.outdir - - input: - tuple val(sample_id), path(reads) - - output: - path "fastqc_${sample_id}_logs" - - script: - """ - fastqc.sh "$sample_id" "$reads" - """ + // fetch dataset from OpenML + ch_datasets = fetch_dataset(params.dataset_name) + + // split dataset into train/test sets + (ch_train_datasets, ch_predict_datasets) = split_train_test(ch_datasets) + + // perform training + (ch_models, ch_train_logs) = train(ch_train_datasets, params.train_models) + + // perform inference + ch_predict_inputs = ch_models.combine(ch_predict_datasets, by: 0) + (ch_scores, ch_predict_logs) = predict(ch_predict_inputs) + + // select the best model based on inference score + ch_scores + | max { + new JsonSlurper().parse(it[2])['value'] + } + | subscribe { dataset_name, model_type, score_file -> + def score = new JsonSlurper().parse(score_file) + println "The best model for ${dataset_name} was ${model_type}, with ${score['name']} = ${score['value']}" + } } -process QUANT { - tag "$pair_id" - publishDir params.outdir +// view the entire code on GitHub ... 
- input: - path index - tuple val(pair_id), path(reads) - - output: - path pair_id - - script: - """ - salmon quant --threads $task.cpus --libType=U -i $index -1 ${reads[0]} -2 ${reads[1]} -o $pair_id - """ -} ```
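The selection step at the end of this workflow — `max` over the parsed score files — is an arg-max over (model, score) records. A throwaway shell equivalent on toy data (not the pipeline's actual JSON handling) makes the logic concrete:

```shell
# Toy score table: keep the row with the highest score, as the max operator does
printf 'rf 0.95\nlr 0.91\ngb 0.93\n' | sort -k2 -nr | head -n 1
# → rf 0.95
```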
@@ -94,8 +62,8 @@ Install Nextflow by entering the following command in the terminal: Then launch the pipeline with this command: - $ nextflow run rnaseq-nf -with-docker + $ nextflow run ml-hyperopt -profile wave -It will automatically download the pipeline [GitHub repository](https://github.com/nextflow-io/rnaseq-nf) and the associated Docker images, thus the first execution may take a few minutes to complete depending on your network connection. +It will automatically download the pipeline [GitHub repository](https://github.com/nextflow-io/ml-hyperopt) and build a Docker image on-the-fly using [Wave](https://seqera.io/wave/), thus the first execution may take a few minutes to complete depending on your network connection. -**NOTE**: To run this example with versions of Nextflow older than 22.04.0, you must include the `-dsl2` flag with `nextflow run`. +**NOTE**: Nextflow 22.10.0 or newer is required to run this pipeline with Wave. diff --git a/src/pages/example5.md b/src/pages/example5.md deleted file mode 100644 index dda443da..00000000 --- a/src/pages/example5.md +++ /dev/null @@ -1,69 +0,0 @@ ---- -title: Machine Learning pipeline -layout: "@layouts/MarkdownPage.astro" ---- - -
-

Machine Learning pipeline

- -

- This example shows how to put together a basic Machine Learning pipeline. It fetches a dataset from OpenML, trains a variety of machine learning models on a prediction target, and selects the best model based on some evaluation criteria. -

- -```groovy -#!/usr/bin/env nextflow - -params.dataset_name = 'wdbc' -params.train_models = ['dummy', 'gb', 'lr', 'mlp', 'rf'] -params.outdir = 'results' - -workflow { - // fetch dataset from OpenML - ch_datasets = fetch_dataset(params.dataset_name) - - // split dataset into train/test sets - (ch_train_datasets, ch_predict_datasets) = split_train_test(ch_datasets) - - // perform training - (ch_models, ch_train_logs) = train(ch_train_datasets, params.train_models) - - // perform inference - ch_predict_inputs = ch_models.combine(ch_predict_datasets, by: 0) - (ch_scores, ch_predict_logs) = predict(ch_predict_inputs) - - // select the best model based on inference score - ch_scores - | max { - new JsonSlurper().parse(it[2])['value'] - } - | subscribe { dataset_name, model_type, score_file -> - def score = new JsonSlurper().parse(score_file) - println "The best model for ${dataset_name} was ${model_type}, with ${score['name']} = ${score['value']}" - } -} - -// view the entire code on GitHub ... - -``` - -
- -### Try it in your computer - -To run this pipeline on your computer, you will need: - -- Unix-like operating system -- Java 11 (or higher) -- Docker - -Install Nextflow by entering the following command in the terminal: - - $ curl -fsSL get.nextflow.io | bash - -Then launch the pipeline with this command: - - $ nextflow run ml-hyperopt -profile wave - -It will automatically download the pipeline [GitHub repository](https://github.com/nextflow-io/ml-hyperopt) and build a Docker image on-the-fly using [Wave](https://seqera.io/wave/), thus the first execution may take a few minutes to complete depending on your network connection. - -**NOTE**: Nextflow 22.10.0 or newer is required to run this pipeline with Wave. From f146ed26adcb951860144c2dc087bcaf59b09039 Mon Sep 17 00:00:00 2001 From: christopher-hakkaart Date: Fri, 6 Sep 2024 08:28:32 +0200 Subject: [PATCH 03/10] Fix headings --- src/components/Menu.astro | 13 +++---------- src/pages/example2.md | 4 ++-- 2 files changed, 5 insertions(+), 12 deletions(-) diff --git a/src/components/Menu.astro b/src/components/Menu.astro index 8eed03b9..0484d2dd 100644 --- a/src/components/Menu.astro +++ b/src/components/Menu.astro @@ -37,16 +37,9 @@ const isHomepage = currentPath === "/" || currentPath === "/index.html"; Examples diff --git a/src/pages/example2.md b/src/pages/example2.md index 605402ec..95fac98e 100644 --- a/src/pages/example2.md +++ b/src/pages/example2.md @@ -1,10 +1,10 @@ --- -title: Mixing scripting languages +title: Mixed language pipeline layout: "@layouts/MarkdownPage.astro" ---
-

Mixing scripting languages

+

Mixed language pipeline

You are not limited to Bash scripts with Nextflow -- you can use any scripting language that can be executed by the Linux platform. From c8b1df49a01cecc1f5cbf5109ce0d06493e19b5e Mon Sep 17 00:00:00 2001 From: christopher-hakkaart Date: Fri, 6 Sep 2024 13:08:38 +0200 Subject: [PATCH 04/10] Add variant calling --- src/components/Menu.astro | 3 +- src/pages/example1.md | 4 +- src/pages/example2.md | 4 +- src/pages/example3.md | 59 +++++++----- src/pages/example4.md | 194 +++++++++++++++++++++++++++++--------- src/pages/example5.md | 69 ++++++++++++++ 6 files changed, 263 insertions(+), 70 deletions(-) create mode 100644 src/pages/example5.md diff --git a/src/components/Menu.astro b/src/components/Menu.astro index 0484d2dd..51665723 100644 --- a/src/components/Menu.astro +++ b/src/components/Menu.astro @@ -39,7 +39,8 @@ const isHomepage = currentPath === "/" || currentPath === "/index.html";

  • Basic pipeline
  • Mixed language pipeline
  • RNA-Seq pipeline
  • -
  • Machine Learning pipeline
  • +
  • Variant calling pipeline
  • +
  • Machine Learning pipeline
  • diff --git a/src/pages/example1.md b/src/pages/example1.md index 5861fc6d..df0011d6 100644 --- a/src/pages/example1.md +++ b/src/pages/example1.md @@ -7,7 +7,7 @@ layout: "@layouts/MarkdownPage.astro"

    Basic pipeline

    - Nextflow pipelines are made by joining together different processes. + This example shows a simple Nextflow pipeline consisting of two Bash processes.

    ```groovy @@ -80,7 +80,7 @@ This example shows a simple Nextflow pipeline consisting of two Bash processes. To try this pipeline: -1. Follow the [Nextflow installation guide](https://www.nextflow.io/docs/latest/install.html#install-nextflow) to install Nextflow. +1. Follow the [Nextflow installation guide](https://www.nextflow.io/docs/latest/install.html#install-nextflow) to install Nextflow (if not already available). 2. Copy the script above and save as `hello-world.nf`. 3. Launch the pipeline: diff --git a/src/pages/example2.md b/src/pages/example2.md index 95fac98e..c07e0113 100644 --- a/src/pages/example2.md +++ b/src/pages/example2.md @@ -7,7 +7,7 @@ layout: "@layouts/MarkdownPage.astro"

    Mixed language pipeline

    - You are not limited to Bash scripts with Nextflow -- you can use any scripting language that can be executed by the Linux platform. + This example shows a simple Nextflow pipeline consisting of two processes written in different languages.

    ```groovy @@ -97,7 +97,7 @@ This example shows a simple Nextflow pipeline consisting of two processes writte To try this pipeline: -1. Follow the [Nextflow installation guide](https://www.nextflow.io/docs/latest/install.html#install-nextflow) to install Nextflow. +1. Follow the [Nextflow installation guide](https://www.nextflow.io/docs/latest/install.html#install-nextflow) to install Nextflow (if not already available). 2. Copy the script above and save as `mixed-languages.nf`. 3. Launch the pipeline: diff --git a/src/pages/example3.md b/src/pages/example3.md index 31e87c20..74c90358 100644 --- a/src/pages/example3.md +++ b/src/pages/example3.md @@ -7,27 +7,24 @@ layout: "@layouts/MarkdownPage.astro"

    RNA-Seq pipeline

    - This example shows how to put together a basic RNA-Seq pipeline. It maps a collection of read-pairs to a given reference genome and outputs the respective transcript model. + This example shows how to put together a basic RNA-Seq pipeline.

```groovy #!/usr/bin/env nextflow /* - * The following pipeline parameters specify the reference genomes - * and read pairs and can be provided as command line options + * Pipeline parameters */ -params.reads = "$baseDir/data/ggal/ggal_gut_{1,2}.fq" -params.transcriptome = "$baseDir/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa" -params.outdir = "results" -workflow { - read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists: true ) +// Input data +params.reads = "${projectDir}/data/ggal/ggal_gut_{1,2}.fq" - INDEX(params.transcriptome) - FASTQC(read_pairs_ch) - QUANT(INDEX.out, read_pairs_ch) -} +// Reference file +params.transcriptome = "${projectDir}/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa" + +// Output directory +params.outdir = "results" process INDEX { tag "$transcriptome.simpleName" @@ -78,24 +75,40 @@ process QUANT { } + +workflow { + + // Paired input reads + read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists: true ) + + // Index reference transcriptome file + INDEX(params.transcriptome) + + // Generate FastQC reports + FASTQC(read_pairs_ch) + + // Quantify reads + QUANT(INDEX.out, read_pairs_ch) +} + ``` 
-### Try it on your computer +### Synopsis -To run this pipeline on your computer, you will need: +This example shows a basic Nextflow pipeline consisting of three processes. The `INDEX` process uses Salmon to create an index from the reference transcriptome. The `FASTQC` process generates a FastQC quality-control report for each pair of reads. Finally, the `QUANT` process uses Salmon and the index created by `INDEX` to quantify each pair of reads. -- Unix-like operating system -- Java 11 (or higher) -- Docker +### Try it -Install Nextflow by entering the following command in the terminal: +This pipeline is available on the [nextflow-io/rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf) GitHub repository. - $ curl -fsSL https://get.nextflow.io | bash +An active internet connection and Docker are required for Nextflow to download the pipeline and the necessary Docker images, and to run the pipeline within containers. The data used by this pipeline is stored on the GitHub repository and will be downloaded automatically. -Then launch the pipeline with this command: +To try this pipeline: - $ nextflow run rnaseq-nf -with-docker +1. Follow the [Nextflow installation guide](https://www.nextflow.io/docs/latest/install.html#install-nextflow) to install Nextflow. +2. Follow the [Docker installation guide](https://docs.docker.com/get-started/get-docker/) to install Docker. +3. Launch the pipeline: -It will automatically download the pipeline [GitHub repository](https://github.com/nextflow-io/rnaseq-nf) and the associated Docker images, thus the first execution may take a few minutes to complete depending on your network connection. + nextflow run nextflow-io/rnaseq-nf -with-docker -**NOTE**: To run this example with versions of Nextflow older than 22.04.0, you must include the `-dsl2` flag with `nextflow run`. 
+**NOTE**: The `rnaseq-nf` pipeline on GitHub is under active development and may differ from the example shown above. diff --git a/src/pages/example4.md b/src/pages/example4.md index dda443da..b9b5cb07 100644 --- a/src/pages/example4.md +++ b/src/pages/example4.md @@ -1,69 +1,179 @@ --- -title: Machine Learning pipeline +title: Simple variant calling pipeline layout: "@layouts/MarkdownPage.astro" ---
    -

    Machine Learning pipeline

    +

    Simple variant calling pipeline

    - This example shows how to put together a basic Machine Learning pipeline. It fetches a dataset from OpenML, trains a variety of machine learning models on a prediction target, and selects the best model based on some evaluation criteria. + This example shows a simple variant calling pipeline using container technology.

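One scripting detail worth flagging before the code: the joint-genotyping step builds one `-V <file>` argument per input GVCF with `vcfs.collect { "-V ${it}" }.join(' ')`. The equivalent argument-building in plain shell, using hypothetical file names for illustration:

```shell
# Build "-V a -V b ..." from a list of (hypothetical) GVCF file names
set -- sampleA.g.vcf sampleB.g.vcf sampleC.g.vcf
args=''
for vcf in "$@"; do
  args="$args -V $vcf"
done
args="${args# }"   # drop the leading space
echo "$args"
# → -V sampleA.g.vcf -V sampleB.g.vcf -V sampleC.g.vcf
```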
```groovy -#!/usr/bin/env nextflow +/* + * Pipeline parameters + */ -params.dataset_name = 'wdbc' -params.train_models = ['dummy', 'gb', 'lr', 'mlp', 'rf'] -params.outdir = 'results' +// Primary input +params.reads_bam = "${workflow.projectDir}/data/bam/*.bam" -workflow { - // fetch dataset from OpenML - ch_datasets = fetch_dataset(params.dataset_name) - - // split dataset into train/test sets - (ch_train_datasets, ch_predict_datasets) = split_train_test(ch_datasets) - - // perform training - (ch_models, ch_train_logs) = train(ch_train_datasets, params.train_models) - - // perform inference - ch_predict_inputs = ch_models.combine(ch_predict_datasets, by: 0) - (ch_scores, ch_predict_logs) = predict(ch_predict_inputs) - - // select the best model based on inference score - ch_scores - | max { - new JsonSlurper().parse(it[2])['value'] - } - | subscribe { dataset_name, model_type, score_file -> - def score = new JsonSlurper().parse(score_file) - println "The best model for ${dataset_name} was ${model_type}, with ${score['name']} = ${score['value']}" - } +// Accessory files +params.reference = "${workflow.projectDir}/data/ref/ref.fasta" +params.reference_index = "${workflow.projectDir}/data/ref/ref.fasta.fai" +params.reference_dict = "${workflow.projectDir}/data/ref/ref.dict" +params.calling_intervals = "${workflow.projectDir}/data/ref/intervals.bed" + +// Base name for final output file +params.cohort_name = "family_trio" + +/* + * Generate BAM index file + */ +process SAMTOOLS_INDEX { + + container 'community.wave.seqera.io/library/samtools:1.20--b5dfbd93de237464' + conda "bioconda::samtools=1.20" + + input: + path input_bam + + output: + tuple path(input_bam), path("${input_bam}.bai") + + script: + """ + samtools index '$input_bam' + """ +} + +/* + * Call variants with GATK HaplotypeCaller in GVCF mode + */ +process GATK_HAPLOTYPECALLER { + + container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867" + conda "bioconda::gatk4=4.5.0.0" + + input: + tuple 
path(input_bam), path(input_bam_index)
+        path ref_fasta
+        path ref_index
+        path ref_dict
+        path interval_list
+
+    output:
+        path "${input_bam}.g.vcf"
+        path "${input_bam}.g.vcf.idx"
+
+    """
+    gatk HaplotypeCaller \
+        -R ${ref_fasta} \
+        -I ${input_bam} \
+        -O ${input_bam}.g.vcf \
+        -L ${interval_list} \
+        -ERC GVCF
+    """
 }
-// view the entire code on GitHub ...
+/*
+ * Consolidate GVCFs and apply joint genotyping analysis
+ */
+process GATK_JOINTGENOTYPING {
+
+    container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"
+    conda "bioconda::gatk4=4.5.0.0"
+
+    input:
+        path vcfs
+        path idxs
+        val cohort_name
+        path ref_fasta
+        path ref_index
+        path ref_dict
+        path interval_list
+
+    output:
+        path "${cohort_name}.joint.vcf"
+        path "${cohort_name}.joint.vcf.idx"
+
+    script:
+    def input_vcfs = vcfs.collect { "-V ${it}" }.join(' ')
+    """
+    gatk GenomicsDBImport \
+        ${input_vcfs} \
+        --genomicsdb-workspace-path ${cohort_name}_gdb \
+        -L ${interval_list}
+
+    gatk GenotypeGVCFs \
+        -R ${ref_fasta} \
+        -V gendb://${cohort_name}_gdb \
+        -O ${cohort_name}.joint.vcf \
+        -L ${interval_list}
+    """
+}
+workflow {
+
+    // Create input channel from BAM files
+    // Each BAM file matching the glob pattern is emitted as a separate item
+    bam_ch = Channel.fromPath(params.reads_bam, checkIfExists: true)
+
+    // Create reference channels using the fromPath channel factory
+    // The collect converts from a queue channel to a value channel
+    // See https://www.nextflow.io/docs/latest/channel.html#channel-types for details
+    ref_ch = Channel.fromPath(params.reference, checkIfExists: true).collect()
+    ref_index_ch = Channel.fromPath(params.reference_index, checkIfExists: true).collect()
+    ref_dict_ch = Channel.fromPath(params.reference_dict, checkIfExists: true).collect()
+    calling_intervals_ch = Channel.fromPath(params.calling_intervals, checkIfExists: true).collect()
+
+    // Create index file for input BAM 
file
+    SAMTOOLS_INDEX(bam_ch)
+
+    // Call variants from the indexed BAM file
+    GATK_HAPLOTYPECALLER(
+        SAMTOOLS_INDEX.out,
+        ref_ch,
+        ref_index_ch,
+        ref_dict_ch,
+        calling_intervals_ch
+    )
+
+    // Collect the per-sample GVCFs and their .idx index files
+    all_vcfs = GATK_HAPLOTYPECALLER.out[0].collect()
+    all_idxs = GATK_HAPLOTYPECALLER.out[1].collect()
+
+    // Consolidate GVCFs and apply joint genotyping analysis
+    GATK_JOINTGENOTYPING(
+        all_vcfs,
+        all_idxs,
+        params.cohort_name,
+        ref_ch,
+        ref_index_ch,
+        ref_dict_ch,
+        calling_intervals_ch
+    )
+}
```
    -### Try it in your computer
+### Synopsis

-To run this pipeline on your computer, you will need:
+This example shows a basic variant calling Nextflow pipeline consisting of three processes. The `SAMTOOLS_INDEX` process creates index files for the input BAM files. The `GATK_HAPLOTYPECALLER` process takes the indexed BAM files created by the `SAMTOOLS_INDEX` process, along with accessory files, and creates variant call files. Finally, the `GATK_JOINTGENOTYPING` process consolidates the variant call files generated by `GATK_HAPLOTYPECALLER` and applies a joint genotyping analysis.

-- Unix-like operating system
-- Java 11 (or higher)
-- Docker
+### Try it

-Install Nextflow by entering the following command in the terminal:
+This pipeline is available on the [seqeralabs/nf-hello-gatk](https://github.com/seqeralabs/nf-hello-gatk) GitHub repository.

- $ curl -fsSL get.nextflow.io | bash
+An active internet connection and Docker are required for Nextflow to download the pipeline and necessary Docker images and run the pipeline within containers. The data used by this pipeline is stored on the GitHub repository and will download automatically.

-Then launch the pipeline with this command:
+To try this pipeline:

- $ nextflow run ml-hyperopt -profile wave
+1. Follow the [Nextflow installation guide](https://www.nextflow.io/docs/latest/install.html#install-nextflow) to install Nextflow.
+2. Follow the [Docker installation guide](https://docs.docker.com/get-started/get-docker/) to install Docker.
+2. Launch the pipeline:

-It will automatically download the pipeline [GitHub repository](https://github.com/nextflow-io/ml-hyperopt) and build a Docker image on-the-fly using [Wave](https://seqera.io/wave/), thus the first execution may take a few minutes to complete depending on your network connection.
+    nextflow run seqeralabs/nf-hello-gatk

-**NOTE**: Nextflow 22.10.0 or newer is required to run this pipeline with Wave. 
+**NOTE**: The `nf-hello-gatk` pipeline will use Docker to manage software dependencies by default. To use an alternate method the Docker configuration option in the `nextflow.config` must be set to false. diff --git a/src/pages/example5.md b/src/pages/example5.md new file mode 100644 index 00000000..55f1e118 --- /dev/null +++ b/src/pages/example5.md @@ -0,0 +1,69 @@ +--- +title: Machine Learning pipeline +layout: "@layouts/MarkdownPage.astro" +--- + +
    +

    Machine Learning pipeline

    + +

    + This example shows how to put together a basic Machine Learning pipeline. It fetches a dataset from OpenML, trains a variety of machine learning models on a prediction target, and selects the best model based on some evaluation criteria. +

    + +```groovy +#!/usr/bin/env nextflow + +params.dataset_name = 'wdbc' +params.train_models = ['dummy', 'gb', 'lr', 'mlp', 'rf'] +params.outdir = 'results' + +workflow { + // fetch dataset from OpenML + ch_datasets = fetch_dataset(params.dataset_name) + + // split dataset into train/test sets + (ch_train_datasets, ch_predict_datasets) = split_train_test(ch_datasets) + + // perform training + (ch_models, ch_train_logs) = train(ch_train_datasets, params.train_models) + + // perform inference + ch_predict_inputs = ch_models.combine(ch_predict_datasets, by: 0) + (ch_scores, ch_predict_logs) = predict(ch_predict_inputs) + + // select the best model based on inference score + ch_scores + | max { + new JsonSlurper().parse(it[2])['value'] + } + | subscribe { dataset_name, model_type, score_file -> + def score = new JsonSlurper().parse(score_file) + println "The best model for ${dataset_name} was ${model_type}, with ${score['name']} = ${score['value']}" + } +} + +// view the entire code on GitHub ... + +``` + +
    + +### Try it in your computer + +To run this pipeline on your computer, you will need: + +- Unix-like operating system +- Java 11 (or higher) +- Docker + +Install Nextflow by entering the following command in the terminal: + + curl -fsSL get.nextflow.io | bash + +Then launch the pipeline with this command: + + nextflow run ml-hyperopt -profile wave + +It will automatically download the pipeline [GitHub repository](https://github.com/nextflow-io/ml-hyperopt) and build a Docker image on-the-fly using [Wave](https://seqera.io/wave/), thus the first execution may take a few minutes to complete depending on your network connection. + +**NOTE**: Nextflow 22.10.0 or newer is required to run this pipeline with Wave. From 4eea052c89383f9350169500dfcbd39ef9e3d79b Mon Sep 17 00:00:00 2001 From: christopher-hakkaart Date: Fri, 6 Sep 2024 13:10:41 +0200 Subject: [PATCH 05/10] Fix number --- src/components/Menu.astro | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/components/Menu.astro b/src/components/Menu.astro index 51665723..5d7f469b 100644 --- a/src/components/Menu.astro +++ b/src/components/Menu.astro @@ -39,7 +39,7 @@ const isHomepage = currentPath === "/" || currentPath === "/index.html";
  • Basic pipeline
  • Mixed language pipeline
  • RNA-Seq pipeline
  • -
  • Variant calling pipeline
  • +
  • Variant calling pipeline
  • Machine Learning pipeline
  • From 4b3e7a784094153709a50da2116b5a7891338694 Mon Sep 17 00:00:00 2001 From: christopher-hakkaart Date: Fri, 6 Sep 2024 16:15:29 +0200 Subject: [PATCH 06/10] Fix typos and add channel factory --- src/pages/example1.md | 7 +++++-- src/pages/example2.md | 7 +++++-- src/pages/example3.md | 2 +- 3 files changed, 11 insertions(+), 5 deletions(-) diff --git a/src/pages/example1.md b/src/pages/example1.md index df0011d6..61924c0c 100644 --- a/src/pages/example1.md +++ b/src/pages/example1.md @@ -59,8 +59,11 @@ process convertToUpper { */ workflow { + // Creates channel using the Channel.of() channel factory + greeting_ch = Channel.of(params.greeting) + // Redirects a string to a text file - sayHello(params.greeting) + sayHello(greeting_ch) // Concatenates a text file and transforms lowercase letters to uppercase letters convertToUpper(sayHello.out) @@ -81,7 +84,7 @@ This example shows a simple Nextflow pipeline consisting of two Bash processes. To try this pipeline: 1. Follow the [Nextflow installation guide](https://www.nextflow.io/docs/latest/install.html#install-nextflow) to install Nextflow (if not already available). -2. Copy the script above and save as `hello-world.nf`. +2. Copy the script above and save it as `hello-world.nf`. 3. Launch the pipeline: nextflow run hello-world.nf diff --git a/src/pages/example2.md b/src/pages/example2.md index c07e0113..e57c867c 100644 --- a/src/pages/example2.md +++ b/src/pages/example2.md @@ -76,8 +76,11 @@ process pyTask { workflow { + // Creates channel using the Channel.of() channel factory + range_ch = Channel.of(params.range) + // A Perl script that produces a list of number pairs - perlTask(params.range) + perlTask(range_ch) // A Python script which parses the output of the previous script pyTask(perlTask.out) @@ -98,7 +101,7 @@ This example shows a simple Nextflow pipeline consisting of two processes writte To try this pipeline: 1. 
Follow the [Nextflow installation guide](https://www.nextflow.io/docs/latest/install.html#install-nextflow) to install Nextflow (if not already available). -2. Copy the script above and save as `mixed-languages.nf`. +2. Copy the script above and save it as `mixed-languages.nf`. 3. Launch the pipeline: nextflow run mixed-languages.nf diff --git a/src/pages/example3.md b/src/pages/example3.md index 74c90358..5ee138a7 100644 --- a/src/pages/example3.md +++ b/src/pages/example3.md @@ -107,7 +107,7 @@ To try this pipeline: 1. Follow the [Nextflow installation guide](https://www.nextflow.io/docs/latest/install.html#install-nextflow) to install Nextflow. 2. Follow the [Docker installation guide](https://docs.docker.com/get-started/get-docker/) to install Docker. -2. Launch the pipeline: +3. Launch the pipeline: nextflow run nextflow-io/rnaseq-nf -with-docker From 501dd195b2d0fd9125ae5932aba648d72c90e5aa Mon Sep 17 00:00:00 2001 From: christopher-hakkaart Date: Fri, 6 Sep 2024 17:05:31 +0200 Subject: [PATCH 07/10] Fix fixes --- src/pages/example3.md | 2 +- src/pages/example4.md | 2 +- src/pages/example5.md | 20 ++++++++------------ 3 files changed, 10 insertions(+), 14 deletions(-) diff --git a/src/pages/example3.md b/src/pages/example3.md index 5ee138a7..3e5dbabd 100644 --- a/src/pages/example3.md +++ b/src/pages/example3.md @@ -95,7 +95,7 @@ workflow { ### Synopsis -This example shows a basic Nextflow pipeline consisting of three processes. The `INDEX` process creates index files for the input BAM files. The `GATK_HAPLOTYPECALLER` process takes the index bam files created by the `SAMTOOLS_INDEX` process and accessory files and creates variant call files. Finally, the `GATK_JOINTGENOTYPING` process consolidates the variant call files generated by `GATK_HAPLOTYPECALLER` and applies a joint genotyping analysis. +This example shows a basic Nextflow pipeline consisting of three processes. The `INDEX` process creates index files for the input BAM files. 
The `FASTQC` process takes the index bam files created by the `SAMTOOLS_INDEX` process and accessory files and creates variant call files. Finally, the `GATK_JOINTGENOTYPING` process consolidates the variant call files generated by `GATK_HAPLOTYPECALLER` and applies a joint genotyping analysis. ### Try it diff --git a/src/pages/example4.md b/src/pages/example4.md index b9b5cb07..4ab9888c 100644 --- a/src/pages/example4.md +++ b/src/pages/example4.md @@ -172,7 +172,7 @@ To try this pipeline: 1. Follow the [Nextflow installation guide](https://www.nextflow.io/docs/latest/install.html#install-nextflow) to install Nextflow. 2. Follow the [Docker installation guide](https://docs.docker.com/get-started/get-docker/) to install Docker. -2. Launch the pipeline: +3. Launch the pipeline: nextflow run seqeralabs/nf-hello-gatk diff --git a/src/pages/example5.md b/src/pages/example5.md index 55f1e118..3ec8e27a 100644 --- a/src/pages/example5.md +++ b/src/pages/example5.md @@ -48,22 +48,18 @@ workflow { -### Try it in your computer +### Try it -To run this pipeline on your computer, you will need: +This pipeline is available on the [nextflow-io/ml-hyperopt](https://github.com/nextflow-io/ml-hyperopt) GitHub repository. -- Unix-like operating system -- Java 11 (or higher) -- Docker +An active internet connection and Docker are required for Nextflow to download the pipeline and necessary images and run the pipeline. The data used by this pipeline will download automatically. -Install Nextflow by entering the following command in the terminal: +To try this pipeline: - curl -fsSL get.nextflow.io | bash +1. Follow the [Nextflow installation guide](https://www.nextflow.io/docs/latest/install.html#install-nextflow) to install Nextflow. +2. Follow the [Docker installation guide](https://docs.docker.com/get-started/get-docker/) to install Docker. +3. 
Launch the pipeline: -Then launch the pipeline with this command: - - nextflow run ml-hyperopt -profile wave - -It will automatically download the pipeline [GitHub repository](https://github.com/nextflow-io/ml-hyperopt) and build a Docker image on-the-fly using [Wave](https://seqera.io/wave/), thus the first execution may take a few minutes to complete depending on your network connection. + nextflow run nextflow-io/ml-hyperopt -profile wave **NOTE**: Nextflow 22.10.0 or newer is required to run this pipeline with Wave. From ffcef841fc4cdecd3655aad7e155197cffae6dec Mon Sep 17 00:00:00 2001 From: christopher-hakkaart Date: Fri, 6 Sep 2024 17:19:10 +0200 Subject: [PATCH 08/10] Fix commit --- src/pages/example1.md | 2 +- src/pages/example2.md | 2 +- src/pages/example3.md | 2 +- src/pages/example4.md | 2 +- src/pages/example5.md | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/src/pages/example1.md b/src/pages/example1.md index 61924c0c..c4df72a5 100644 --- a/src/pages/example1.md +++ b/src/pages/example1.md @@ -87,6 +87,6 @@ To try this pipeline: 2. Copy the script above and save it as `hello-world.nf`. 3. Launch the pipeline: - nextflow run hello-world.nf + nextflow run hello-world.nf **NOTE**: To run this example with versions of Nextflow older than 22.04.0, you must include the `-dsl2` flag with `nextflow run`. diff --git a/src/pages/example2.md b/src/pages/example2.md index e57c867c..fe776608 100644 --- a/src/pages/example2.md +++ b/src/pages/example2.md @@ -104,6 +104,6 @@ To try this pipeline: 2. Copy the script above and save it as `mixed-languages.nf`. 3. Launch the pipeline: - nextflow run mixed-languages.nf + nextflow run mixed-languages.nf **NOTE**: To run this example with versions of Nextflow older than 22.04.0, you must include the `-dsl2` flag with `nextflow run`. 
diff --git a/src/pages/example3.md b/src/pages/example3.md index 3e5dbabd..487af6f4 100644 --- a/src/pages/example3.md +++ b/src/pages/example3.md @@ -109,6 +109,6 @@ To try this pipeline: 2. Follow the [Docker installation guide](https://docs.docker.com/get-started/get-docker/) to install Docker. 3. Launch the pipeline: - nextflow run nextflow-io/rnaseq-nf -with-docker + nextflow run nextflow-io/rnaseq-nf -with-docker **NOTE**: The `rnaseq-nf` pipeline on GitHub is under active development and may differ from the example shown above. diff --git a/src/pages/example4.md b/src/pages/example4.md index 4ab9888c..059226d8 100644 --- a/src/pages/example4.md +++ b/src/pages/example4.md @@ -174,6 +174,6 @@ To try this pipeline: 2. Follow the [Docker installation guide](https://docs.docker.com/get-started/get-docker/) to install Docker. 3. Launch the pipeline: - nextflow run seqeralabs/nf-hello-gatk + nextflow run seqeralabs/nf-hello-gatk **NOTE**: The `nf-hello-gatk` pipeline will use Docker to manage software dependencies by default. To use an alternate method the Docker configuration option in the `nextflow.config` must be set to false. diff --git a/src/pages/example5.md b/src/pages/example5.md index 3ec8e27a..20fc0df4 100644 --- a/src/pages/example5.md +++ b/src/pages/example5.md @@ -60,6 +60,6 @@ To try this pipeline: 2. Follow the [Docker installation guide](https://docs.docker.com/get-started/get-docker/) to install Docker. 3. Launch the pipeline: - nextflow run nextflow-io/ml-hyperopt -profile wave + nextflow run nextflow-io/ml-hyperopt -profile wave **NOTE**: Nextflow 22.10.0 or newer is required to run this pipeline with Wave. 
From 6a39f76134d31d493d592805982791f736aa2386 Mon Sep 17 00:00:00 2001 From: christopher-hakkaart Date: Fri, 6 Sep 2024 17:21:49 +0200 Subject: [PATCH 09/10] Fix code block --- src/pages/example3.md | 1 - 1 file changed, 1 deletion(-) diff --git a/src/pages/example3.md b/src/pages/example3.md index 487af6f4..f7eaffdc 100644 --- a/src/pages/example3.md +++ b/src/pages/example3.md @@ -73,7 +73,6 @@ process QUANT { salmon quant --threads $task.cpus --libType=U -i $index -1 ${reads[0]} -2 ${reads[1]} -o $pair_id """ } -``` workflow { From 626d888e157a639f1b1bcec0b6fa259f2590338b Mon Sep 17 00:00:00 2001 From: christopher-hakkaart Date: Mon, 9 Sep 2024 13:56:14 +0200 Subject: [PATCH 10/10] More updates --- src/pages/example3.md | 63 ++++++++++++++++++++++++++++++++++--------- src/pages/example4.md | 4 +-- src/pages/example5.md | 8 ++++-- 3 files changed, 59 insertions(+), 16 deletions(-) diff --git a/src/pages/example3.md b/src/pages/example3.md index f7eaffdc..d32c2661 100644 --- a/src/pages/example3.md +++ b/src/pages/example3.md @@ -7,7 +7,7 @@ layout: "@layouts/MarkdownPage.astro"

    RNA-Seq pipeline

    - This example shows how to put together a basic RNA-Seq pipeline. + This example shows a basic RNA-Seq pipeline.

    ```groovy
@@ -18,16 +18,21 @@ layout: "@layouts/MarkdownPage.astro"
  */
 
 // Input data
-params.reads = "${projectDir}/data/ggal/ggal_gut_{1,2}.fq"
+params.reads = "${workflow.projectDir}/data/ggal/ggal_gut_{1,2}.fq"
 
 // Reference file
-params.transcriptome = "${projectDir}/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa"
+params.transcriptome = "${workflow.projectDir}/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa"
 
 // Output directory
 params.outdir = "results"
 
+/*
+ * Index reference transcriptome file
+ */
 process INDEX {
     tag "$transcriptome.simpleName"
+    container "community.wave.seqera.io/library/salmon:1.10.3--482593b6cd04c9b7"
+    conda "bioconda::salmon=1.10.3"
 
     input:
     path transcriptome
@@ -41,9 +46,14 @@ process INDEX {
     """
 }
 
+/*
+ * Generate FastQC reports
+ */
 process FASTQC {
     tag "FASTQC on $sample_id"
-    publishDir params.outdir
+    publishDir params.outdir, mode:'copy'
+    container "community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42"
+    conda "bioconda::fastqc=0.12.1"
 
     input:
     tuple val(sample_id), path(reads)
@@ -53,13 +63,19 @@ process FASTQC {
     script:
     """
-    fastqc.sh "$sample_id" "$reads"
+    mkdir fastqc_${sample_id}_logs
+    fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads}
     """
 }
 
+/*
+ * Quantify reads
+ */
 process QUANT {
     tag "$pair_id"
-    publishDir params.outdir
+    publishDir params.outdir, mode:'copy'
+    container "community.wave.seqera.io/library/salmon:1.10.3--482593b6cd04c9b7"
+    conda "bioconda::salmon=1.10.3"
 
     input:
     path index
@@ -74,6 +90,26 @@ process QUANT {
     """
 }
 
+/*
+ * Generate MultiQC report
+ */
+process MULTIQC {
+    publishDir params.outdir, mode:'copy'
+    container "community.wave.seqera.io/library/multiqc:1.24.1--789bc3917c8666da"
+    conda "bioconda::multiqc=1.24.1"
+
+    input:
+    path '*'
+
+    output:
+    path 'multiqc_report.html'
+
+    script:
+    """
+    multiqc . 
+    """
+}
+
 workflow {
 
     // Paired reference data
@@ -87,6 +123,9 @@ workflow {
 
     // Quantify reads
     QUANT(INDEX.out, read_pairs_ch)
+
+    // Generate MultiQC report
+    MULTIQC(QUANT.out.mix(FASTQC.out).collect())
 }
 ```
@@ -94,20 +133,20 @@ workflow {
 
 ### Synopsis
 
-This example shows a basic Nextflow pipeline consisting of three processes. The `INDEX` process creates index files for the input BAM files. The `FASTQC` process takes the index bam files created by the `SAMTOOLS_INDEX` process and accessory files and creates variant call files. Finally, the `GATK_JOINTGENOTYPING` process consolidates the variant call files generated by `GATK_HAPLOTYPECALLER` and applies a joint genotyping analysis.
+This example shows a basic Nextflow pipeline consisting of four processes. The `INDEX` process indexes a reference transcriptome file. The `FASTQC` process creates reports for the input FASTQ files. The `QUANT` process takes the indexed transcriptome and input FASTQ files and quantifies the reads. The `MULTIQC` process collects the output from the `QUANT` and `FASTQC` processes and generates an HTML report.
 
 ### Try it
 
-This pipeline is available on the [nextflow-io/rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf) GitHub repository.
+This pipeline is available on the [nextflow-io/rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf/tree/example) GitHub repository.
 
-An active internet connection and Docker are required for Nextflow to download the pipeline and necessary Docker images and run the pipeline within containers. The data used by this pipeline is stored on the GitHub repository and will download automatically.
+An active internet connection and Docker are required for Nextflow to download the pipeline and the necessary Docker images to run the pipeline within containers. The data used by this pipeline is stored on the GitHub repository and will download automatically.
 
 To try this pipeline:
 
 1. 
Follow the [Nextflow installation guide](https://www.nextflow.io/docs/latest/install.html#install-nextflow) to install Nextflow. 2. Follow the [Docker installation guide](https://docs.docker.com/get-started/get-docker/) to install Docker. -3. Launch the pipeline: +3. Launch the `example` branch of the pipeline: - nextflow run nextflow-io/rnaseq-nf -with-docker + nextflow run nextflow-io/rnaseq-nf -r example -**NOTE**: The `rnaseq-nf` pipeline on GitHub is under active development and may differ from the example shown above. +**NOTE**: The main branch of the `rnaseq-nf` pipeline on GitHub is under active development and differs from the example shown above. The `rnaseq-nf` pipeline will use Docker to manage software dependencies by default. diff --git a/src/pages/example4.md b/src/pages/example4.md index 059226d8..c1fbd253 100644 --- a/src/pages/example4.md +++ b/src/pages/example4.md @@ -166,7 +166,7 @@ This example shows a basic variant calling Nextflow pipeline consisting of three This pipeline is available on the [seqeralabs/nf-hello-gatk](https://github.com/seqeralabs/nf-hello-gatk) GitHub repository. -An active internet connection and Docker are required for Nextflow to download the pipeline and necessary Docker images and run the pipeline within containers. The data used by this pipeline is stored on the GitHub repository and will download automatically. +An active internet connection and Docker are required for Nextflow to download the pipeline and the necessary Docker images to run the pipeline within containers. The data used by this pipeline is stored on the GitHub repository and will download automatically. To try this pipeline: @@ -176,4 +176,4 @@ To try this pipeline: nextflow run seqeralabs/nf-hello-gatk -**NOTE**: The `nf-hello-gatk` pipeline will use Docker to manage software dependencies by default. To use an alternate method the Docker configuration option in the `nextflow.config` must be set to false. 
+**NOTE**: The `nf-hello-gatk` pipeline will use Docker to manage software dependencies by default. diff --git a/src/pages/example5.md b/src/pages/example5.md index 20fc0df4..cf3a45fb 100644 --- a/src/pages/example5.md +++ b/src/pages/example5.md @@ -7,7 +7,7 @@ layout: "@layouts/MarkdownPage.astro"

    Machine Learning pipeline

    - This example shows how to put together a basic Machine Learning pipeline. It fetches a dataset from OpenML, trains a variety of machine learning models on a prediction target, and selects the best model based on some evaluation criteria. + This example shows how to put together a basic Machine Learning pipeline.

    ```groovy @@ -48,11 +48,15 @@ workflow { +### Synopsis + +This example shows how to put together a basic Machine Learning pipeline. It fetches a dataset from OpenML, trains a variety of machine learning models on a prediction target, and selects the best model based on some evaluation criteria. + ### Try it This pipeline is available on the [nextflow-io/ml-hyperopt](https://github.com/nextflow-io/ml-hyperopt) GitHub repository. -An active internet connection and Docker are required for Nextflow to download the pipeline and necessary images and run the pipeline. The data used by this pipeline will download automatically. +An active internet connection and Docker are required for Nextflow to download the pipeline and the necessary Docker images to run the pipeline within containers. The data used by this pipeline is stored on the GitHub repository and will download automatically. To try this pipeline: