============
== vtranq ==
============

Genome App

Takeaway

  • Varsome provides the most complete ACMG classification, far more complete than ClinVar and InterVar
  • The cost-effective approach is to integrate the Varsome API into the app
  • Varsome cannot be scraped (even through a proxy) because it checks authentication and then checks that user's request limit

Overview

Filter

  • On the client response, the user can filter results by gene (a comma-separated list of several genes is supported)
  • When a filter is applied, an extra header ADDITIONAL_USER_CLINSIG is automatically appended at the end, and the values for this column are filled in automatically (taken from the CLINVAR table)
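
A minimal sketch of the gene filter described above, not the app's actual code. The gene column name SYMBOL and the dict standing in for the CLINVAR table are assumptions for illustration; only ADDITIONAL_USER_CLINSIG and Uploaded_variation come from these notes.

```python
def parse_gene_list(raw):
    """Split a comma-separated gene string into a set of upper-case symbols."""
    return {g.strip().upper() for g in raw.split(",") if g.strip()}


def filter_by_genes(header, rows, raw_genes, clinsig_lookup,
                    gene_col="SYMBOL", variant_col="Uploaded_variation"):
    """Keep rows whose gene is in the filter and append the extra column.

    clinsig_lookup is a hypothetical stand-in for the CLINVAR table:
    it maps a variant identifier to its clinical significance.
    """
    genes = parse_gene_list(raw_genes)
    gene_idx = header.index(gene_col)
    variant_idx = header.index(variant_col)

    # The extra header is appended at the end, as described in the notes.
    new_header = header + ["ADDITIONAL_USER_CLINSIG"]
    new_rows = []
    for row in rows:
        if row[gene_idx].upper() in genes:
            # Variants missing from the lookup get an empty value.
            new_rows.append(row + [clinsig_lookup.get(row[variant_idx], "")])
    return new_header, new_rows


header = ["Uploaded_variation", "SYMBOL"]
rows = [["rs1", "TTN"], ["rs2", "BRCA1"], ["rs3", "TP53"]]
new_header, new_rows = filter_by_genes(header, rows, "ttn, tp53", {"rs1": "Benign"})
```

Gene symbols are compared case-insensitively here so that "ttn, tp53" matches TTN and TP53; whether the app does the same is not stated in these notes.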

Troubleshooting

ERROR 1: Processing spins forever on a large tsv file (e.g. > 100MB), or runs out of memory during processing

  • Possibly because the file error.bed is not writable, which causes liftOver to loop during processing
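
One way to catch this early is to check that error.bed is writable before liftOver is started. The helper below is only a sketch; the function name is made up for illustration and it does not call liftOver itself.

```python
import os


def check_writable(path):
    """Return True if `path` can be written.

    If the file exists, the file itself must be writable; otherwise the
    parent directory must exist and be writable so the file can be created.
    """
    if os.path.exists(path):
        return os.path.isfile(path) and os.access(path, os.W_OK)
    parent = os.path.dirname(os.path.abspath(path))
    return os.path.isdir(parent) and os.access(parent, os.W_OK)


if not check_writable("error.bed"):
    print("error.bed is not writable; fix permissions before running liftOver")
```

Failing fast with a clear message is cheaper than discovering the problem after a large file has been spinning for minutes.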

ClinVar

Contains clinical significance information for variants

ACMG

Annotation tools

git clone https://github.com/WGLab/InterVar.git

Then enter the InterVar directory, where you will find the Python script InterVar.py. Call it on a normal VCF file as in the following example (I use python3.7, but any distribution >3.3 should work properly):

python3.7 InterVar.py  \
    -b hg19 \
    -i <your_input.vcf> \
     --input_type=VCF \
     -o <your_output>

You will then find your output files where you specified. Please note that you must download the ANNOVAR executables (annotate_variation.pl, convert2annovar.pl, retrieve_seq_from_fasta.pl, coding_change.pl, table_annovar.pl, variants_reduction.pl) into the same InterVar directory. The first run will take a long time because ANNOVAR downloads all its reference databases into the "humandb" directory, which will appear in the same "InterVar" directory where you launch the script.
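
Before the first run it can help to confirm that the ANNOVAR scripts listed above are actually present. This is a small sketch with a made-up helper name; the script names are taken from the paragraph above.

```python
from pathlib import Path

# ANNOVAR executables that must sit in the InterVar directory (from the notes).
ANNOVAR_SCRIPTS = [
    "annotate_variation.pl",
    "convert2annovar.pl",
    "retrieve_seq_from_fasta.pl",
    "coding_change.pl",
    "table_annovar.pl",
    "variants_reduction.pl",
]


def missing_annovar_scripts(intervar_dir):
    """Return the ANNOVAR script names not found in `intervar_dir`."""
    base = Path(intervar_dir)
    return [name for name in ANNOVAR_SCRIPTS if not (base / name).is_file()]


missing = missing_annovar_scripts("InterVar")
if missing:
    print("Missing ANNOVAR scripts:", ", ".join(missing))
```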

Genome testing labs

ClinPred data

NOTE: dbNSFP also includes ClinPred scores

API

dbNSFP

REF: https://sites.google.com/site/jpopgen/dbNSFP

dbNSFP is a database developed for functional prediction and annotation of all potential non-synonymous single-nucleotide variants (nsSNVs) in the human genome. Its current version is based on the Gencode release 29 / Ensembl version 94 and includes a total of 84,013,490 nsSNVs and ssSNVs (splicing-site SNVs). It compiles prediction scores from 37 prediction algorithms (SIFT, SIFT4G, Polyphen2-HDIV, Polyphen2-HVAR, LRT, MutationTaster2, MutationAssessor, FATHMM, MetaSVM, MetaLR, CADD, CADD_hg19, VEST4, PROVEAN, FATHMM-MKL coding, FATHMM-XF coding, fitCons x 4, LINSIGHT, DANN, GenoCanyon, Eigen, Eigen-PC, M-CAP, REVEL, MutPred, MVP, MPC, PrimateAI, DEOGEN2, BayesDel_addAF, BayesDel_noAF, ClinPred, LIST-S2, ALoFT), 9 conservation scores (PhyloP x 3, phastCons x 3, GERP++, SiPhy and bStatistic) and other related information including allele frequencies observed in the 1000 Genomes Project phase 3 data, UK10K cohorts data, ExAC consortium data, gnomAD data and the NHLBI Exome Sequencing Project ESP6500 data, various gene IDs from different databases, functional descriptions of genes, gene expression and gene interaction information, etc.

Some dbNSFP contents (may not be up-to-date though) can also be accessed through variant tools, ANNOVAR, KGGSeq, VarSome, UCSC Genome Browser’s Variant Annotation Integrator, Ensembl Variant Effect Predictor, SnpSift and HGMD. Please cite our papers (see below) if you used dbNSFP contents through those tools.

Varsome API

    "acmg_annotation": {
        "verdict": {
            "ACMG_rules": {
                "approx_score": -7,
                "verdict": "Benign",
                "pathogenic_subscore": "Uncertain Significance",
                "clinical_score": 1.245138902391938,
                "benign_subscore": "Benign"
            },
            "classifications": [
                "BS1",
                "BS2",
                "BP1"
            ]
        },
        "version_name": "10.2.4",
        "classifications": [
            {
                "user_explain": [
                    "GnomAD exomes allele frequency = 0.00304 is greater than 0.00161  (threshold derived from the 17 011 clinically reported variants in gene TTN) (unable to check gnomAD exomes coverage)."
                ],
                "met_criteria": true,
                "name": "BS1"
            },
            {
                "user_explain": [
                    "Observed in healthy adults: gnomAD genomes allele count = 343 is greater than 5 for dominant gene TTN (good gnomAD genomes coverage = 31.5)."
                ],
                "met_criteria": true,
                "name": "BS2"
            },
            {
                "user_explain": [
                    "1 141 out of 1 185 non-VUS missense variants in gene TTN are benign = 96.3% which is more than threshold of 51.0%, and 5 708 out of 17 011 clinically reported variants in gene TTN are benign = 33.6% which is more than threshold of 24.0%."
                ],
                "met_criteria": true,
                "name": "BP1"
            }
        ],
        "coding_impact": "missense",
        "gene_symbol": "TTN",
        "transcript": "NM_001256850.1",
        "transcript_reason": "user-selected",
        "gene_id": 35160
    },
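
To show how the fields above might be consumed, here is a short sketch that pulls the verdict and the met criteria out of an acmg_annotation object. The sample is a trimmed copy of the response above; the function name is made up for illustration, and other responses may carry fields not handled here.

```python
import json

# Trimmed copy of the acmg_annotation object shown above.
SAMPLE = json.loads("""
{
    "verdict": {
        "ACMG_rules": {"approx_score": -7, "verdict": "Benign"},
        "classifications": ["BS1", "BS2", "BP1"]
    },
    "classifications": [
        {"met_criteria": true, "name": "BS1"},
        {"met_criteria": true, "name": "BS2"},
        {"met_criteria": true, "name": "BP1"}
    ],
    "gene_symbol": "TTN",
    "transcript": "NM_001256850.1"
}
""")


def summarize_acmg(annotation):
    """Return gene, transcript, overall verdict and the names of met criteria."""
    rules = annotation["verdict"]["ACMG_rules"]
    met = [c["name"] for c in annotation.get("classifications", [])
           if c.get("met_criteria")]
    return {
        "gene": annotation.get("gene_symbol"),
        "transcript": annotation.get("transcript"),
        "verdict": rules["verdict"],
        "met_criteria": met,
    }


summary = summarize_acmg(SAMPLE)
print(summary["gene"], summary["verdict"], ",".join(summary["met_criteria"]))
```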
  • The Varsome API accepts additional information such as sex, age, and ethnicity
Available GET parameters:
-----------------------

add-all-data = 1 or 0
add-region-databases = 1 or 0
expand-pubmed-articles = 1 or 0
add-main-data-points = 1 or 0
add-varsome-user-entries = 1 or 0
add-source-databases = all or none or dbnsfp-premium,uniprot-variants,sanger-cosmic,refseq-transcripts,iarc-tp53-somatic,gnomad-genomes-coverage,ncbi-clinvar2,gerp,iarc-tp53-germline,gnomad-exomes,wustl-civic,gwas,ensembl-transcripts,nih-gdc,isb-kaviar3,ncbi-dbsnp,pharmgkb,dbnsfp,gnomad-genomes,cadd,dann-snvs,saphetor-known-pathogenicity,cancer-hotspots,bravo,dbnsfp-dbscsnv,mitomap,sanger-cosmic-licensed,jax-ckb,weill-cornell-medicine-pmkb,variant-pubmed-automap,phastcons100way,gnomad-mito,cbio-portal,gnomad-exomes-coverage,phylop100way,icgc-somatic
allele-frequency-threshold = float
add-ACMG-annotation = 1 or 0
minimum-clinvar-stars = 0 or 1 or 2 or 3 or 4
exclude-source-databases = dbnsfp-premium,uniprot-variants,sanger-cosmic,refseq-transcripts,iarc-tp53-somatic,gnomad-genomes-coverage,ncbi-clinvar2,gerp,iarc-tp53-germline,gnomad-exomes,wustl-civic,gwas,ensembl-transcripts,nih-gdc,isb-kaviar3,ncbi-dbsnp,pharmgkb,dbnsfp,gnomad-genomes,cadd,dann-snvs,saphetor-known-pathogenicity,cancer-hotspots,bravo,dbnsfp-dbscsnv,mitomap,sanger-cosmic-licensed,jax-ckb,weill-cornell-medicine-pmkb,variant-pubmed-automap,phastcons100way,gnomad-mito,cbio-portal,gnomad-exomes-coverage,phylop100way,icgc-somatic
use-canonical-transcript = 1 or 0
override-transcript = str
add-AMP-annotation = 1 or 0
cancer-type = str
tissue-type = str
sex = Female or Male or f or m or female or male
age = int
ethnicity = AFR or ASJ or EAS or FIN or NFE or AMR or SAS or OTH
annotation-mode = somatic or germline
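
The parameters above can be assembled into a query string as sketched below. The base URL is a placeholder and the helper name is made up for illustration; the real endpoint and the authentication header must come from the Varsome API documentation. Only the parameter names and allowed values are taken from the list above.

```python
from urllib.parse import quote, urlencode

# Allowed values copied from the parameter list above.
ALLOWED_SEX = {"Female", "Male", "f", "m", "female", "male"}
ALLOWED_ETHNICITY = {"AFR", "ASJ", "EAS", "FIN", "NFE", "AMR", "SAS", "OTH"}
ALLOWED_MODE = {"somatic", "germline"}


def build_lookup_url(base_url, variant, acmg=True, sex=None, age=None,
                     ethnicity=None, annotation_mode=None):
    """Build a GET URL; base_url is a placeholder, not the real endpoint."""
    params = {"add-ACMG-annotation": 1 if acmg else 0}
    if sex is not None:
        if sex not in ALLOWED_SEX:
            raise ValueError(f"unsupported sex: {sex!r}")
        params["sex"] = sex
    if age is not None:
        params["age"] = int(age)
    if ethnicity is not None:
        if ethnicity not in ALLOWED_ETHNICITY:
            raise ValueError(f"unsupported ethnicity: {ethnicity!r}")
        params["ethnicity"] = ethnicity
    if annotation_mode is not None:
        if annotation_mode not in ALLOWED_MODE:
            raise ValueError(f"unsupported annotation-mode: {annotation_mode!r}")
        params["annotation-mode"] = annotation_mode
    # Percent-encode the variant so it is safe as a path segment.
    return f"{base_url.rstrip('/')}/{quote(variant, safe='')}?{urlencode(params)}"


url = build_lookup_url("https://api.example.invalid/lookup", "rs123",
                       sex="female", age=40, ethnicity="EAS")
```

Validating values on the client side avoids spending a request from the per-user limit on a call that the server would reject anyway.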

Papers

  • GREAT: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7584987/
    • Uses Varsome, ClinVar, InterVar, and ClinPred
    • Implemented with MongoDB
      • The database is built on MongoDB v4.2.1, a NoSQL query engine, to speed up database user queries and the variant calling file (VCF)-oriented analysis (Figure 2). HADA is built in Shiny v1.3.2, an R v3.6.1 package (R Foundation for Statistical Computing, Vienna, Austria) for building web apps. Specifically, we used ShinyJS v1.0 to run JavaScript code within the web app frontend and Plotly v4.9.0 to generate interactive plots. ANNOVAR v18.04.16 is used to provide annotations from the database in the uploads. To preserve potentially sensitive information included in the uploaded VCFs, HADA uses an encrypting code based on Cryfa [20] that automatically secures the access to the file and decrypts it once returned to the user. This provides a high level of security and transfers the data control to the user. No sensitive sample information is stored or maintained in the server.
    • (Great) HADA: https://github.com/genomicsITER/HADA
    • Also uses ANNOVAR
    • not all variants affecting function in HAE are described in current ClinVar or InterVar versions
    • Pathogenic probabilities according to ClinPred [17] and the ACMG pathogenic classification as determined by ClinVar [18] (March 5, 2019 release), InterVar [19] (January 18, release), and VarSome (accessed June 13, 2020) were also annotated
    • ClinVar offered very limited information on this set of variants affecting function, as only 34 of them (7.6%) had corresponding ACMG class assignment: 2 are reported as benign or likely benign, 6 are indicated as VUS, and 26 are classified as pathogenic and likely pathogenic. InterVar included information for half of the set (226/450, 50.2%). However, 171 (75.7%) of these were classified as VUS. VarSome was the only resource that allowed assigning ACMG classes to all retrieved variants affecting function. According to VarSome, most of the HAE variants affecting function are classified as pathogenic (183/450, 40.6%) or likely pathogenic (171/450, 38.0%) (Figure 3). Although VarSome did not classify any of the HAE variants affecting function as benign or likely benign, 96 of them (21.3%) were still reported as VUS. Besides, precalculated pathogenicity predictors were available for a mean of 243 of the variants affecting function in the database. Taken together, these results highlight the existing gap in current interpretations of variant pathogenicity

gs.force.vn

  • DIRECT_MUTATION_TASTING is temporarily turned off (the feature that converts build 38 positions to build 37 positions and creates a link to MutationTaster for the Uploaded_variation column)
    • With this feature on, filtering All variants runs very slowly on large data files

Resources