metaquest is a command-line tool that searches all Sequence Read Archive (SRA) datasets for containment of specified genomes. By analyzing the metadata of the matching datasets, it provides insight into where different species may be found.
- Clone the repository:
git clone https://github.com/FOI-Bioinformatics/MetaQuest.git
cd MetaQuest
- Install the requirements:
pip install -r requirements.txt
- Install MetaQuest:
python setup.py install
To get started, you can download a test genome using the download_test_genome command. This command fetches a sample genome from NCBI based on a predefined accession number.
metaquest download_test_genome
After acquiring the genome, you can run the mastiff command to search for matches in the SRA.
metaquest mastiff --genomes-folder genomes --matches-folder matches
- genomes-folder: The directory where genome files are located.
- matches-folder: The directory where the results will be saved.
After the mastiff run, you can summarize the results using the parse_containment command. This generates a parsed containment file and a summary file.
metaquest parse_containment --matches_folder matches --parsed_containment_file parsed_containment.txt --summary_containment_file summary_containment.txt --step_size 0.05
Example output: summary.txt and containment.txt
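The --step_size 0.05 option suggests that the summary is built over a grid of containment thresholds. As a rough illustration only, the sketch below counts how many datasets reach each threshold for one genome. This reading of the option is an assumption and is not taken from the MetaQuest source; the function name count_at_thresholds and the sample values are hypothetical.

```python
def count_at_thresholds(containments, step_size=0.05):
    """Count how many containment values reach each threshold on a grid.

    Assumed interpretation of --step_size; not MetaQuest's actual code.
    """
    # Derive thresholds from integer steps to avoid float accumulation errors.
    n_steps = round(1.0 / step_size)
    counts = {}
    for i in range(n_steps, 0, -1):
        threshold = round(i * step_size, 10)
        counts[threshold] = sum(1 for c in containments if c >= threshold)
    return counts


# Hypothetical containment values for one genome across five SRA datasets.
example = [0.99, 0.96, 0.91, 0.52, 0.10]
for threshold, n in count_at_thresholds(example, step_size=0.25).items():
    print(f"{threshold:.2f}\t{n}")
```

A coarse step of 0.25 keeps the printed table short; a step of 0.05 gives twenty rows.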
To get additional information about each SRA dataset, you can download metadata using the download_metadata command.
metaquest download_metadata --matches_folder matches --metadata_folder metadata --threshold 0.95 --email [EMAIL]
- matches_folder: Directory containing match files.
- metadata_folder: Directory where the metadata files will be saved.
- threshold: Only consider matches with containment above this threshold.
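To make the effect of the threshold concrete, here is a minimal sketch of the filtering idea: keep only the accessions whose containment is above the cutoff. It assumes the match files are CSV tables with columns named "SRA accession" and "containment"; those column names, the function name accessions_above and the example rows are assumptions for illustration, so check them against your own match files.

```python
import csv
import io


def accessions_above(csv_text, threshold=0.95,
                     accession_col="SRA accession",
                     containment_col="containment"):
    """Return the accessions whose containment is above the threshold.

    The column names are assumptions; adjust them to the real match files.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    selected = set()
    for row in reader:
        if float(row[containment_col]) > threshold:
            selected.add(row[accession_col])
    return sorted(selected)


# Made-up example data, for illustration only.
example = """SRA accession,containment
SRR0000001,0.99
SRR0000002,0.40
SRR0000003,0.97
"""
print(accessions_above(example))
```

Raising the threshold reduces the number of datasets for which metadata has to be downloaded.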
Once the metadata is downloaded, you can parse it to generate a more concise and readable format.
metaquest parse_metadata --metadata_folder metadata --metadata_table_file parsed_metadata.txt
Example output: parsed_metadata.txt
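The four commands introduced so far (mastiff, parse_containment, download_metadata, parse_metadata) can be chained from a small script. The sketch below only reuses the flags shown in this README; the helper names build_commands and run_pipeline and the placeholder email address are hypothetical, and the script defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess


def build_commands(email, genomes="genomes", matches="matches",
                   metadata="metadata", threshold=0.95):
    """Return the documented commands as argument lists, in order."""
    return [
        ["metaquest", "mastiff",
         "--genomes-folder", genomes, "--matches-folder", matches],
        ["metaquest", "parse_containment",
         "--matches_folder", matches,
         "--parsed_containment_file", "parsed_containment.txt",
         "--summary_containment_file", "summary_containment.txt",
         "--step_size", "0.05"],
        ["metaquest", "download_metadata",
         "--matches_folder", matches, "--metadata_folder", metadata,
         "--threshold", str(threshold), "--email", email],
        ["metaquest", "parse_metadata",
         "--metadata_folder", metadata,
         "--metadata_table_file", "parsed_metadata.txt"],
    ]


def run_pipeline(email, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(email):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run only; replace the placeholder address before a real run.
    run_pipeline("you@example.org", dry_run=True)
```

Passing dry_run=False runs the steps one after another and aborts as soon as one of them exits with an error.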
The check_metadata_attributes command gives an overview of how the metadata attributes are distributed.
metaquest check_metadata_attributes --file-path parsed_metadata.txt --output-file parsed_metadata_overview.txt
Example output: parsed_metadata_overview.txt
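As an illustration of what such an overview can contain, the sketch below counts, for each attribute, how many datasets carry a non-empty value. It works on in-memory rows to avoid guessing the file layout; the function name attribute_overview, the Collection_Date attribute and the example rows are hypothetical, and the actual output of check_metadata_attributes may differ.

```python
from collections import Counter


def attribute_overview(rows):
    """Count, per attribute, how many rows carry a non-empty value."""
    filled = Counter()
    for row in rows:
        for attribute, value in row.items():
            if value not in (None, ""):
                filled[attribute] += 1
    return dict(filled)


# Made-up metadata rows, one dictionary per SRA dataset.
rows = [
    {"Sample_Scientific_Name": "soil metagenome", "Collection_Date": "2019"},
    {"Sample_Scientific_Name": "gut metagenome", "Collection_Date": ""},
    {"Sample_Scientific_Name": "soil metagenome"},
]
for attribute, n in attribute_overview(rows).items():
    print(f"{attribute}\t{n}")
```

Attributes that are filled in for most datasets are usually the most useful choices for --metadata-column in the later steps.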
The genome_count command shows how genomes are distributed across datasets, grouped by a chosen metadata column.
metaquest genome_count --summary-file summary.txt --metadata-file parsed_metadata.txt --metadata-column Sample_Scientific_Name --threshold 0.95 --output-file genome_counts.txt
Example output: genome_counts.txt
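The counting idea behind this step can be sketched as follows: for every genome, count the datasets whose containment is above the threshold, grouped by the value of the chosen metadata column. The in-memory data structures, the function name genome_counts and the example accessions are assumptions made for illustration; this is not MetaQuest's implementation.

```python
from collections import Counter, defaultdict


def genome_counts(summary, metadata, metadata_column, threshold=0.95):
    """Count datasets per genome and metadata value above a threshold.

    summary:  {accession: {genome: containment}}
    metadata: {accession: {column: value}}
    """
    counts = defaultdict(Counter)
    for accession, genomes in summary.items():
        # Datasets without metadata are grouped under "unknown".
        value = metadata.get(accession, {}).get(metadata_column, "unknown")
        for genome, containment in genomes.items():
            if containment > threshold:
                counts[genome][value] += 1
    return {genome: dict(c) for genome, c in counts.items()}


# Made-up example data using the genome accession from this README.
summary = {
    "SRR0000001": {"GCF_000008985.1": 0.99},
    "SRR0000002": {"GCF_000008985.1": 0.97},
    "SRR0000003": {"GCF_000008985.1": 0.30},
}
metadata = {
    "SRR0000001": {"Sample_Scientific_Name": "soil metagenome"},
    "SRR0000002": {"Sample_Scientific_Name": "soil metagenome"},
    "SRR0000003": {"Sample_Scientific_Name": "gut metagenome"},
}
print(genome_counts(summary, metadata, "Sample_Scientific_Name"))
```

In this example only the two soil datasets pass the 0.95 threshold, so the gut dataset does not appear in the counts.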
To analyze a single sample from the summary, you can use the single_sample command.
metaquest single_sample --summary-file summary.txt --metadata-file parsed_metadata.txt --summary-column GCF_000008985.1 --metadata-column Sample_Scientific_Name --threshold 0.95
Example output: collected_stats.txt
We welcome contributions to metaquest! Whether you want to report a bug, suggest a feature, or contribute code, your input is valuable. Here's how to get started:
- Fork the Repository: Create your own fork of the metaquest repository.
- Clone Your Fork: Clone your fork to your local machine and set the upstream repository.
- Create a New Branch: Make a new branch for your feature or bugfix.
- Make Your Changes: Implement your feature or fix the bug and commit your changes.
- Push to Your Fork: Push your changes to your fork on GitHub.
- Create a Pull Request: From your fork, open a new pull request in the metaquest repository.