Bioconvert

Bioconvert is a collaborative project to facilitate the interconversion of life science data from one format to another.

Contributions: Want to add a converter? Please join #1
How to cite: Caro et al., BioConvert: a comprehensive format converter for life sciences (2023), NAR Genomics and Bioinformatics 5(3). https://doi.org/10.1093/nargab/lqad074

Overview

Life sciences use many different data formats. Some are old and some have a complex syntax, so converting between them can be a challenge. Bioconvert aims to provide a common tool and interface to convert life science data from one format to another.

Many conversion tools already exist, but they may be dispersed, focused on a few specific formats, difficult to install, or not optimised. With Bioconvert, we plan to cover a wide spectrum of format conversions; we re-use existing tools when possible and provide facilities to compare different conversion tools or methods via benchmarking. New implementations are provided when they are considered better than existing ones.

As of January 2023, Bioconvert supported 50 formats and 100 direct conversions.

https://raw.githubusercontent.com/bioconvert/bioconvert/main/doc/conversion.png

Installation

BioConvert is developed in Python. Please use conda or any other Python environment manager, and install BioConvert with the pip command:

pip install bioconvert

Roughly 50% of the conversions should work out of the box. However, many conversions require external tools, which is why we recommend using a conda environment. In particular, most external tools are available on the bioconda channel. For instance, to convert a SAM file to a BAM file you would need to install samtools as follows:

conda install -c bioconda samtools
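Before running a conversion that relies on an external tool, it can help to check that the tool is actually on your PATH. The following is a minimal sketch using only the Python standard library; the helper name `missing_tools` is hypothetical and is not part of BioConvert.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on the PATH."""
    # shutil.which returns None when no executable with that name is found
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # samtools is the external tool needed for the SAM to BAM example above
    missing = missing_tools(["samtools"])
    if missing:
        print("Please install:", ", ".join(missing))
    else:
        print("All external tools found.")
```

If a tool is reported as missing, installing it from bioconda as shown above should make it visible in the active environment.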

Since BioConvert is available on bioconda, one solution that installs BioConvert and all its dependencies is to use conda/mamba:

conda create --name bioconvert mamba
conda activate bioconvert
mamba install -c conda-forge -c bioconda bioconvert
bioconvert --help

See the Installation section for more details and alternative solutions (docker, singularity).

Quick Start

There are many conversions available. Type:

bioconvert --help

to get a list of the available conversions. For example, to convert a FastQ file into a FastA file, you could use any of the following commands, depending on the file extensions and compression you need:

bioconvert fastq2fasta input.fastq output.fasta
bioconvert fastq2fasta input.fq    output.fasta
bioconvert fastq2fasta input.fq.gz output.fasta.gz
bioconvert fastq2fasta input.fq.gz output.fasta.bz2
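When many files need the same conversion, the command lines above can be generated from a script. The sketch below only builds the argument lists and does not run anything; the function name `fastq2fasta_commands` and the choice of the `*.fastq` pattern are assumptions made for illustration.

```python
from pathlib import Path


def fastq2fasta_commands(directory, compress=False):
    """Build one 'bioconvert fastq2fasta' command per *.fastq file in `directory`."""
    suffix = ".fasta.gz" if compress else ".fasta"
    commands = []
    # sorted() makes the order of the commands reproducible
    for infile in sorted(Path(directory).glob("*.fastq")):
        # reads.fastq becomes reads.fasta (or reads.fasta.gz)
        outfile = infile.with_name(infile.stem + suffix)
        commands.append(["bioconvert", "fastq2fasta", str(infile), str(outfile)])
    return commands
```

Each returned list can then be passed to `subprocess.run(command, check=True)` from the standard library, provided `bioconvert` is installed in the active environment.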

When there is no ambiguity, you can be implicit:

bioconvert input.fastq output.fasta
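To illustrate the idea behind the implicit mode, the converter name can be guessed from the file extensions. This is a simplified sketch and not BioConvert's actual resolution logic: the `COMPRESSION` and `ALIASES` tables and both function names are assumptions based only on the examples shown above.

```python
from pathlib import Path

# Assumed for illustration: compression suffixes and extension aliases
COMPRESSION = {".gz", ".bz2"}
ALIASES = {"fq": "fastq"}


def guess_format(filename):
    """Return the format suggested by the extension, ignoring compression."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    # Drop trailing compression suffixes such as .gz or .bz2
    while suffixes and suffixes[-1] in COMPRESSION:
        suffixes.pop()
    if not suffixes:
        raise ValueError(f"cannot guess a format for {filename!r}")
    extension = suffixes[-1].lstrip(".")
    return ALIASES.get(extension, extension)


def guess_converter(infile, outfile):
    """Build a converter name of the form <input format>2<output format>."""
    return f"{guess_format(infile)}2{guess_format(outfile)}"
```

With these assumptions, `guess_converter("input.fq.gz", "output.fasta.bz2")` yields the name `fastq2fasta`. When an extension is shared by several formats the guess is ambiguous, which is when the explicit form is needed.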

The default conversion method is used unless you select another one. Check out the available methods with:

bioconvert fastq2fasta --show-methods

For more help about a conversion, just type:

bioconvert fastq2fasta --help

and more generally:

bioconvert --help

You may also call BioConvert from a Python shell:

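# Import a converter
from bioconvert.fastq2fasta import FASTQ2FASTA

# Input and output file names (example values)
infile = "input.fastq"
outfile = "output.fasta"

# Instantiate with the input/output file names
convert = FASTQ2FASTA(infile, outfile)

# Run the conversion itself
convert()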
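The import above suggests a naming pattern: the module is named after the converter and the class is the same name in upper case. Assuming this pattern holds for other converters (it is generalized here from the single `fastq2fasta` example, so treat it as an assumption), a converter could be loaded by name as sketched below. The helper names `converter_location` and `run_conversion` are hypothetical.

```python
import importlib


def converter_location(name):
    """Map 'fastq2fasta' to ('bioconvert.fastq2fasta', 'FASTQ2FASTA')."""
    # Assumption: module bioconvert.<name> holds a class named <NAME>
    return f"bioconvert.{name}", name.upper()


def run_conversion(name, infile, outfile):
    """Import the converter called `name` and run it on the given files."""
    module_name, class_name = converter_location(name)
    # Requires bioconvert to be installed in the current environment
    module = importlib.import_module(module_name)
    converter_class = getattr(module, class_name)
    converter = converter_class(infile, outfile)
    converter()
```

For example, `run_conversion("fastq2fasta", "input.fastq", "output.fasta")` would be equivalent to the explicit import shown above.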

Available Converters

Conversion table
Converter Default method
abi2fasta BIOPYTHON
abi2fastq BIOPYTHON
abi2qual BIOPYTHON
bam2bedgraph BEDTOOLS
bam2bigwig DEEPTOOLS
bam2cov BEDTOOLS
bam2cram SAMTOOLS
bam2fasta SAMTOOLS
bam2fastq SAMTOOLS
bam2json BAMTOOLS
bam2sam SAMBAMBA
bam2tsv SAMTOOLS
bam2wiggle WIGGLETOOLS
bcf2vcf BCFTOOLS
bcf2wiggle WIGGLETOOLS
bed2wiggle WIGGLETOOLS
bedgraph2bigwig UCSC
bedgraph2cov BIOCONVERT
bedgraph2wiggle WIGGLETOOLS
bigbed2bed DEEPTOOLS
bigbed2wiggle WIGGLETOOLS
bigwig2bedgraph DEEPTOOLS
bigwig2wiggle WIGGLETOOLS
bplink2plink PLINK
bplink2vcf PLINK
bz22gz Unix commands
clustal2fasta BIOPYTHON
clustal2nexus GOALIGN
clustal2phylip BIOPYTHON
clustal2stockholm BIOPYTHON
cram2bam SAMTOOLS
cram2fasta SAMTOOLS
cram2fastq SAMTOOLS
cram2sam SAMTOOLS
csv2tsv BIOCONVERT
csv2xls Pandas
dsrc2gz DSRC software
embl2fasta BIOPYTHON
embl2genbank BIOPYTHON
fasta2clustal BIOPYTHON
fasta2faa BIOCONVERT
fasta2fasta_agp BIOCONVERT
fasta2fastq PYSAM
fasta2genbank BIOCONVERT
fasta2nexus GOALIGN
fasta2phylip BIOPYTHON
fasta2twobit UCSC
fasta_qual2fastq PYSAM
fastq2fasta BIOCONVERT available
fastq2fasta_qual BIOCONVERT
fastq2qual READFQ
genbank2embl BIOPYTHON
genbank2fasta BIOPYTHON
genbank2gff3 BIOCODE
gfa2fasta BIOCONVERT
gff22gff3 BIOCONVERT
gff32gff2 BIOCONVERT
gff32gtf BIOCONVERT
gz2bz2 pigz/pbzip2 software
gz2dsrc DSRC software
json2yaml Python
maf2sam BIOCONVERT
newick2nexus GOTREE
newick2phyloxml GOTREE
nexus2clustal GOALIGN
nexus2fasta BIOPYTHON
nexus2newick GOTREE
nexus2phylip GOALIGN
nexus2phyloxml GOTREE
ods2csv pyexcel library
pdb2faa BIOCONVERT
phylip2clustal BIOPYTHON
phylip2fasta BIOPYTHON
phylip2nexus GOALIGN
phylip2stockholm BIOPYTHON
phylip2xmfa BIOPYTHON
phyloxml2newick GOTREE
phyloxml2nexus GOTREE
plink2bplink PLINK
plink2vcf PLINK
sam2bam SAMTOOLS
sam2cram SAMTOOLS
sam2paf BIOCONVERT
scf2fasta BIOCONVERT
scf2fastq BIOCONVERT
sra2fastq FASTQDUMP
stockholm2clustal BIOPYTHON
stockholm2phylip BIOPYTHON
tsv2csv BIOCONVERT
twobit2fasta DEEPTOOLS
vcf2bcf BCFTOOLS
vcf2bed BIOCONVERT
vcf2bplink PLINK
vcf2plink PLINK
vcf2wiggle WIGGLETOOLS
wig2bed BEDOPS
xls2csv  
xlsx2csv Pandas library
xmfa2phylip BIOPYTHON
yaml2json Pandas library
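Converter names follow the pattern <input format>2<output format>. Because some format names themselves contain or end with the digit 2 (for example bz22gz or gff22gff3), splitting on the first "2" is not enough. The sketch below resolves the split using a set of known format names; the function name and the small `formats` set in the usage note are assumptions for illustration, not part of BioConvert.

```python
def split_converter_name(name, formats):
    """Split e.g. 'gff22gff3' into ('gff2', 'gff3') using known format names."""
    matches = []
    for index, char in enumerate(name):
        if char != "2":
            continue
        # Try every "2" as the separator between input and output format
        source, target = name[:index], name[index + 1:]
        if source in formats and target in formats:
            matches.append((source, target))
    if len(matches) != 1:
        raise ValueError(f"cannot split {name!r} unambiguously")
    return matches[0]
```

With `formats = {"bz2", "gz", "gff2", "gff3", "fastq", "fasta"}`, the name `bz22gz` is split into `bz2` and `gz`, and `gff32gff2` into `gff3` and `gff2`.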

Contributors

Setting up and maintaining Bioconvert has been possible thanks to users and contributors. Thanks to all:

https://contrib.rocks/image?repo=bioconvert/bioconvert

Changes

Version Description
1.1.1
  • Fix benchmark labels.
  • NEW: fast52pod5 conversion
  • FIX: set goalign and gotree instead of go requirements
1.1.0
  • Benchmarks can now measure CPU and memory usage, not just time
1.0.0
  • Fix bam2fastq for paired data that computed useless intermediate file #325
  • more realistic fastq simulator
  • pin openpyxl to <=3.0.10 to prevent regression error in v3.1.0
0.6.3
  • add picard method in bam2sam
  • Fixed all CI workflows to use mamba
  • drop python3.7 support and add 3.10 support
  • update bedops test file to fit the latest bedops 2.4.41 version
  • revisit logging system
0.6.2
  • added gff3 to gtf conversion.
  • Added pdb to faa conversion
  • Added missing --reference argument to the cram2sam conversion
0.6.1
  • output file can be in sub-directories, allowing syntax such as 'bioconvert fastq2fasta test.fastq outputs/test.fasta'
  • fix all CI actions
  • add more examples as notebooks in ./examples
  • add a Snakefile for the paper in ./doc/Snakefile_paper
0.6.0
  • Fix bug in bam2sam (method sambamba)
  • Fix graph layout
  • add threading in fastq2fasta (seqkit method)
  • multibenchmark feature added
  • stable version used for web interface
0.5.2
  • Update requirements and environment.yml and add a conda spec-file.txt file
0.5.1
  • add genbank2gff3 requirement material in bioconvert.utils.biocode
0.5.0
  • Add CI actions for all converters
  • remove sniffer (now in biosniff on pypi https://pypi.org/project/biosniff/)
  • A complete benchmarking suite (see doc/Snakefile_benchmark file and benchmarking)
  • documentation and tests for all converters
  • removed the validators (we assume inputs are correct)
0.4.X
  • (Aug 2019) added nexus2fasta, cram2fasta, fasta2faa ... ; 1-to-many and many-to-one converters are now part of the API.
0.3.X
  • (May 2019) new methods abi2qual, bigbed2bed, etc.
  • added --threads option
0.2.X
  • (Aug 2018) abi2fastx, bioconvert_stats tool added
0.1.X
  • major refactoring to have subcommands with implicit/explicit mode