cravatp-galaxy-docker

A docker image flavor extended from the Galaxy docker image to include the CRAVAT-P tool and visualization plugin. https://jraysajulga.github.io/cravatp-galaxy-docker.


CRAVAT-P Galaxy Docker

A Docker image containing a fully-operational Galaxy instance with pre-installed demonstration material for CRAVAT-P.

[Figure: main screen]

Created as a demonstration for the following technical note for the Journal of Proteome Research:

Bridging the Chromosome-Centric and Biology and Disease Human Proteome Projects: Accessible and automated tools for interpreting biological and pathological impact of protein sequence variants detected via proteogenomics

Ray Sajulga, Subina Mehta, Praveen Kumar, James E. Johnson, Candace R. Guerrero, Michael C. Ryan, Rachel Karchin, Pratik D. Jagtap, and Timothy J. Griffin

Installation Guide

1.) Install Docker for Mac or PC. Open Docker.

2.) Open your terminal. Run the following command:

docker run -d -p 8080:80 galaxyp/cravatp

The image will now download from the public repository galaxyp/cravatp on Docker Hub. The download should take around 15 minutes.

In the meantime, feel free to look over the different components of this Docker command. You can also read the CRAVAT-P background information in the next section.

Component        Type          Description
docker           Base command  The base command for the Docker CLI (command-line interface)
run              Command       Run a command in a new container
-d, --detach     OPTION        Run container in background and print container ID
-p, --publish    OPTION        Publish a container’s port(s) to the host (here, container port 80 on host port 8080)
galaxyp/cravatp  IMAGE         galaxyp’s cravatp image

More documentation can be found at Docker’s documentation website.
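
To make the port mapping concrete, the Python sketch below assembles the same command as an argument list. The helper name docker_run_args and its host_port parameter are illustrative assumptions; only the flags and image name listed above are used. It may help if port 8080 is already taken on your machine.

```python
import shlex

IMAGE = "galaxyp/cravatp"  # image name from the command above
CONTAINER_PORT = 80        # container side of the 8080:80 mapping


def docker_run_args(host_port=8080):
    """Return the `docker run` argument list for the CRAVAT-P image.

    `host_port` is the port on your machine. It is published to port 80
    inside the container, following the -p HOST:CONTAINER convention.
    """
    if not 1 <= host_port <= 65535:
        raise ValueError("host_port must be between 1 and 65535")
    return ["docker", "run", "-d", "-p", f"{host_port}:{CONTAINER_PORT}", IMAGE]


if __name__ == "__main__":
    # Print the command instead of running it, so nothing starts by accident.
    print(shlex.join(docker_run_args(8081)))
```

With host port 8081, Galaxy would be reachable at http://localhost:8081 instead of http://localhost:8080.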

3.) Once the command is finished, wait a few moments for the Docker image to initialize as a container. Open http://localhost:8080 and follow the CRAVAT-P tutorial to access the CRAVAT-P suite. If you do not see the Galaxy screen, wait a few seconds and then reload the page.
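
Instead of reloading the page by hand, you could poll the address until Galaxy answers. The following is a minimal sketch using only the Python standard library; the function name wait_for_galaxy and the default timeout and interval values are assumptions chosen for illustration, not part of the image.

```python
import time
import urllib.error
import urllib.request


def wait_for_galaxy(url="http://localhost:8080", timeout=300.0, interval=5.0):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    Returns True once the page responds, False if the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not ready yet (connection refused, reset, HTTP error, ...)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling wait_for_galaxy() right after the docker run command returns would block until the Galaxy page is reachable or five minutes have passed.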

Once you are finished using this container, you can clean up your workspace by simply exiting out of Docker.
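
As a more explicit alternative to quitting Docker, the sketch below stops and removes any running container started from the galaxyp/cravatp image. It relies on the standard docker ps, docker stop, and docker rm commands; the function name and the injectable run parameter are assumptions made so the logic can be tried without Docker installed.

```python
import subprocess

IMAGE = "galaxyp/cravatp"


def stop_cravatp_containers(run=subprocess.run):
    """Stop and remove running containers started from the CRAVAT-P image.

    `run` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without Docker installed.
    Returns the list of container IDs that were cleaned up.
    """
    # List the IDs of running containers whose image is galaxyp/cravatp.
    listing = run(
        ["docker", "ps", "-q", "--filter", f"ancestor={IMAGE}"],
        capture_output=True, text=True, check=True,
    )
    container_ids = listing.stdout.split()
    for container_id in container_ids:
        run(["docker", "stop", container_id], check=True)
        run(["docker", "rm", container_id], check=True)
    return container_ids
```

Removing a container should be expected to discard the Galaxy histories stored inside it, so download anything you want to keep first.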


Background

CRAVAT-P

(Cancer Related Analysis of VAriants Toolkit - Proteomics)

CRAVAT-P is a proteomic extension of CRAVAT (http://cravat.us) developed for the Galaxy-P (http://galaxyp.org) bioinformatics platform. CRAVAT-P exists as a downstream analysis suite for peptide variants. Current support is tailored towards workflows that generate peptide sequences mapped to genomic locations.


Galaxy Tool

[Figure: CRAVAT-P Galaxy tool]

The figure above shows the Galaxy tool developed for submitting jobs to the CRAVAT server. It extends an earlier version of In Silico Solutions’ Galaxy tool (cravat_score_and_annotate). In our CRAVAT-P tool, we added support for additional parameters: CHASM classifiers (e.g., breast, brain-glioblastoma-multiforme, etc.) and the older GRCh37/hg19 human genome build. We also added proteomic support, as highlighted by the red outlined box. Here, a proBED file can be provided for intersection with the genomic input file, a VCF (Variant Call Format) file. You can specify whether you want to output the intersected VCF file or submit only the intersected variants.

Example input files

VCF (Variant Call Format)

ID      Chr.   Position   Strand  Ref. base  Alt. base
VAR527  chr12  6561055    +       T          C
VAR529  chr12  110339630  +       C          T
VAR532  chr14  102083954  +       C          T
VAR539  chr19  17205335   +       A          T
VAR541  chr19  17205973   +       T          C
VAR542  chr19  18856059   +       C          T

ProBED (Proteomic Browser Extensible Data)

Chr.   Start      End        Peptide                   Strand
chr12  6561014    6561056    STGVILANDANAER            -
chr12  110339607  110339637  EWGSGSDILR                +
chr14  102083930  102083972  GVVDSENLPLNISR            -
chr19  17205327   17206022   GRMGEPGAEPGHFGVCVDSLTSDK  +
chr19  18856027   18856078   EAIDSPVSFLVLHNQIR         +
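
The intersection described for the Galaxy tool can be illustrated with the example rows above. This is a minimal sketch, not the tool's actual implementation: it assumes BED-style coordinates for the proBED rows (0-based start, exclusive end) and a 1-based variant position, and it ignores strand. The names Variant, Peptide, and intersect are hypothetical.

```python
from collections import namedtuple

Variant = namedtuple("Variant", "var_id chrom pos strand ref alt")
Peptide = namedtuple("Peptide", "chrom start end sequence strand")

# Rows copied from the example tables above.
VARIANTS = [
    Variant("VAR527", "chr12", 6561055, "+", "T", "C"),
    Variant("VAR529", "chr12", 110339630, "+", "C", "T"),
    Variant("VAR532", "chr14", 102083954, "+", "C", "T"),
    Variant("VAR539", "chr19", 17205335, "+", "A", "T"),
    Variant("VAR541", "chr19", 17205973, "+", "T", "C"),
    Variant("VAR542", "chr19", 18856059, "+", "C", "T"),
]
PEPTIDES = [
    Peptide("chr12", 6561014, 6561056, "STGVILANDANAER", "-"),
    Peptide("chr12", 110339607, 110339637, "EWGSGSDILR", "+"),
    Peptide("chr14", 102083930, 102083972, "GVVDSENLPLNISR", "-"),
    Peptide("chr19", 17205327, 17206022, "GRMGEPGAEPGHFGVCVDSLTSDK", "+"),
    Peptide("chr19", 18856027, 18856078, "EAIDSPVSFLVLHNQIR", "+"),
]


def intersect(variants, peptides):
    """Return (variant, peptide) pairs where the variant falls in the peptide region.

    Assumption: peptide regions are 0-based, half-open [start, end) and
    variant positions are 1-based, so a hit means start < pos <= end.
    """
    hits = []
    for variant in variants:
        for peptide in peptides:
            same_chrom = variant.chrom == peptide.chrom
            if same_chrom and peptide.start < variant.pos <= peptide.end:
                hits.append((variant, peptide))
    return hits


if __name__ == "__main__":
    for variant, peptide in intersect(VARIANTS, PEPTIDES):
        print(variant.var_id, variant.chrom, variant.pos, peptide.sequence)
```

Under these assumptions, every example variant falls inside one of the example peptide regions, and the two chr19 variants VAR539 and VAR541 map to the same peptide. The nested loop is fine for a handful of rows; larger inputs would call for sorting or an interval index.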

Galaxy Workflow

[Figure: Galaxy workflows]

Galaxy workflows are tailored pipelines that promote reproducibility, ease of use, and preservation of complex analyses. Two workflows of differing complexity are shown above. The simple workflow (top left panel) was used for the paper and the Docker image to keep the focus on the downstream analysis, i.e., CRAVAT-P’s outputs and viewer. A fully-fledged workflow (bottom panel) is shown as an example of a highly complex workflow. The top right panel shows how workflows can automate parameter selection and offer additional options such as e-mail notification and output cleanup.


Galaxy Viewer Plugin

Galaxy uses JavaScript-based visualization plugins that let you explore your data interactively.

Panel A shows the actual viewer; panels B-E are enlarged views that show further detail.

(A-i) Sidebar for showing additional information, mainly column visibility toggling. There are many columns from CRAVAT’s annotation to sift through.

(A-ii) An embedded webpage from the CRAVAT server termed their “Single Variants Page” feature.

(B) Built on the DataTables JavaScript library, this table can be sorted and filtered. By default, it is sorted by p-value (from the machine learning analysis, i.e., VEST or CHASM) from most impactful to least. The boxed area shows a peptide column that highlights the variant amino acid within a peptide hit. Since some cells may contain large amounts of text, the full content of a cell is shown in the display box at the top.

[Figure: CRAVAT-P viewer, panels A-E]

(C) CRAVAT uses Protein Diagrams to show lollipop mutations from your given protein variant. You can also choose TCGA (The Cancer Genome Atlas) tissue mutations. You can mouse over different parts to show domains, binding sites, and other regions of interest.

(D) CRAVAT uses the cytoscape.js library to display gene enrichment networks housed by the NDEx (Network Data Exchange) infrastructure. You can move elements around and examine different pathways.

(E) CRAVAT uses another project developed by the same lab (Professor Rachel Karchin’s lab at Johns Hopkins University) called MuPIT (Mutation Position Imaging Toolbox), which is designed to show the location of single nucleotide variants (SNVs) on interactive three-dimensional protein structures. You can click on individual residues and adjust the display options.
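
The table behavior described for panel (B), default sorting by p-value and highlighting of the variant residue in a peptide, can be sketched in a few lines of Python. This is only an illustration of the idea, not the plugin's JavaScript code. The function names, the row layout, and the assumption that "most impactful" means the smallest p-value are all hypothetical, and the p-values and residue indices in the usage example are made up.

```python
def highlight_residue(peptide, index, open_mark="[", close_mark="]"):
    """Wrap the residue at 0-based `index` in markers for display."""
    if not 0 <= index < len(peptide):
        raise IndexError("index is outside the peptide")
    return peptide[:index] + open_mark + peptide[index] + close_mark + peptide[index + 1:]


def sort_by_p_value(rows):
    """Return rows ordered from smallest to largest p-value.

    Rows without a p-value (None or missing) are placed last.
    """
    def sort_key(row):
        p_value = row.get("p_value")
        return (p_value is None, p_value if p_value is not None else 0.0)

    return sorted(rows, key=sort_key)


if __name__ == "__main__":
    # Made-up p-values and variant indices, for illustration only.
    rows = [
        {"peptide": "EWGSGSDILR", "variant_index": 3, "p_value": 0.20},
        {"peptide": "GVVDSENLPLNISR", "variant_index": 5, "p_value": None},
        {"peptide": "STGVILANDANAER", "variant_index": 0, "p_value": 0.01},
    ]
    for row in sort_by_p_value(rows):
        marked = highlight_residue(row["peptide"], row["variant_index"])
        print(marked, row["p_value"])
```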


CRAVAT-P Tutorial

Overview

Import the input files → Run the workflow → Access the viewer

1.) Import the input files from the data library

[Figure: step 1]

2.) Log in and run the workflow

[Figure: step 2]

3.) Access the viewer

[Figure: step 3]