About

Authors

D. Stanek1, D. Bis2, C. Saghira2, P. Lassuthova1, P. Seeman1, S. Zuchner2

1 Department of Pediatric Neurology, DNA Lab, 2nd Faculty of Medicine, Charles University in Prague and University Hospital Motol, Prague, Czech Republic

2 Department of Human Genetics and John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL 33136, USA

Introduction:

Many tools exist for evaluating NGS variants, but none of them reports whether a specific variant falls within an annotated domain or feature of a protein. The NCBI database provides information about domains and features, but without any link to genomic coordinates. Making such mapped information available would help in characterizing and annotating DNA variants detected in exome and genome studies.

Materials and methods:

All necessary data were gathered with the Entrez Programming Utilities [1], which provide access to the desired protein information from NCBI. After processing, the protein features were reverse translated, mapped to the reference genome hg19, and stored in an SQL-based database (SQL Server and MySQL).
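
The mapping step can be illustrated with a minimal Python sketch. It is not the pipeline used for the database; it only shows one way a protein feature given in amino-acid positions could be converted to genomic intervals that are split at exon boundaries. The function names, the 1-based inclusive coordinate convention, and the example exon coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end); assumed convention


def protein_to_cds_range(aa_start: int, aa_end: int) -> Interval:
    """Convert a 1-based amino-acid range to a 1-based CDS nucleotide range."""
    return (aa_start - 1) * 3 + 1, aa_end * 3


def map_feature_to_genome(
    aa_start: int,
    aa_end: int,
    cds_exons: List[Interval],
    strand: str,
) -> List[Interval]:
    """Map a protein feature to genomic intervals, one per overlapped exon.

    cds_exons holds the coding part of each exon in genomic coordinates,
    sorted by ascending start. strand is "+" or "-".
    """
    cds_start, cds_end = protein_to_cds_range(aa_start, aa_end)
    # Walk the exons in transcription order (reversed for the minus strand).
    ordered = cds_exons if strand == "+" else list(reversed(cds_exons))
    pieces: List[Interval] = []
    offset = 0  # CDS bases consumed by the exons already visited
    for g_start, g_end in ordered:
        length = g_end - g_start + 1
        block_start, block_end = offset + 1, offset + length
        lo, hi = max(cds_start, block_start), min(cds_end, block_end)
        if lo <= hi:  # the feature overlaps this exon
            if strand == "+":
                pieces.append((g_start + lo - block_start, g_start + hi - block_start))
            else:
                pieces.append((g_end - (hi - block_start), g_end - (lo - block_start)))
        offset += length
    return sorted(pieces)


# Hypothetical two-exon coding sequence (9 bp + 12 bp = 7 codons).
EXAMPLE_EXONS = [(101, 109), (201, 212)]
```

With these hypothetical exons, a feature spanning amino acids 3 to 5 on the plus strand crosses the exon boundary and is returned as two genomic segments, which matches the idea of dividing a feature so that it covers only exons.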

Results:

The resulting database provides information about 760,487 features from 42,371 proteins. There are 522,660 protein Regions (19,375 unique types) with an average length of 647.37 bp and 237,827 protein Sites (19 unique types) with an average length of 14.78 bp. After entering genomic coordinates (e.g. chr2:15229777) into our website, a user retrieves a list of the protein features that cover the entered coordinates. Each record contains the gene name, RefSeq IDs (NM_xxxx and NP_xxxx), type (Region/Site), all NCBI content, and the chromosomal start and end of the feature (divided so that it covers only exons).
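
The coordinate lookup described above amounts to an interval-overlap query. The sketch below shows the idea in Python with an in-memory SQLite database standing in for SQL Server or MySQL. The table name, column names, and all rows are hypothetical placeholders invented for this example; they do not reflect the actual schema or content of the database.

```python
import sqlite3
from typing import List, Tuple


def parse_position(text: str) -> Tuple[str, int]:
    """Split a string such as 'chr2:15229777' into chromosome and position."""
    chrom, pos = text.split(":")
    return chrom, int(pos.replace(",", ""))


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory table with hypothetical feature segments."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE feature_segment (
               gene TEXT, nm_id TEXT, np_id TEXT, feature_type TEXT,
               note TEXT, chrom TEXT, seg_start INTEGER, seg_end INTEGER)"""
    )
    rows = [
        # All rows are made up for illustration only.
        ("GENE_A", "NM_xxxx", "NP_xxxx", "Region", "hypothetical domain", "chr2", 15229700, 15229800),
        ("GENE_A", "NM_xxxx", "NP_xxxx", "Site", "hypothetical site", "chr2", 15229777, 15229779),
        ("GENE_B", "NM_xxxx", "NP_xxxx", "Region", "hypothetical domain", "chr2", 15230000, 15230100),
        ("GENE_C", "NM_xxxx", "NP_xxxx", "Region", "hypothetical domain", "chr3", 15229700, 15229800),
    ]
    conn.executemany("INSERT INTO feature_segment VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def features_at(conn: sqlite3.Connection, position: str) -> List[tuple]:
    """Return every feature segment that covers the given genomic position."""
    chrom, pos = parse_position(position)
    cur = conn.execute(
        """SELECT gene, nm_id, np_id, feature_type, note, seg_start, seg_end
           FROM feature_segment
           WHERE chrom = ? AND seg_start <= ? AND seg_end >= ?
           ORDER BY seg_start""",
        (chrom, pos, pos),
    )
    return cur.fetchall()
```

Storing one row per exon-level segment keeps the query a plain range comparison, so a position inside an intron between two segments of the same feature returns nothing.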

Conclusion:

We have created the most comprehensive mapping of protein features to chromosomal coordinates to enhance DNA variant annotation. The information in this database will help identify novel functional variation and will further assist in sifting through the large number of coding variants produced by NGS.

Supported by: AZV-16-30206, GAUK 388217 and NINDS R01NS075764

The project was presented:

ESHG 2017 poster in pdf

manuscript (in preparation)