Management Of A Molecular Database In OI
Raymond Dalgleish
Department of Genetics, University of Leicester, Leicester, United Kingdom.
Osteogenesis imperfecta is the result of mutations in the genes encoding the alpha1 and alpha2 chains of type I collagen, COL1A1 and COL1A2 respectively. The types of mutations observed in these genes are comparable with those in many other genes and can be categorised as:
Single amino acid substitutions
Mutations affecting RNA splicing
Insertions, deletions and other complex rearrangements
Polymorphisms not resulting in a discernible phenotype
The mutation data for OI are available on the web at http://www.le.ac.uk/genetics/collagen as part of a larger project to list all known mutations in collagen types I and III (Dalgleish, 1997; 1998). It is intended that data for collagen types II and V will be added in the near future.
A necessary early aspect of the development of the database was the establishment of reference cDNA sequences for the two alpha chains. These reference sequences were compiled from existing published sequences and were annotated and deposited in the DNA databases with accession numbers Z74615 and Z74616 respectively for the alpha1(I) and alpha2(I) cDNAs. Initially, the reporting of mutations was based on a numbering system which designated the first base of each cDNA as "1". In accordance with the recommendations of Antonarakis et al. (1998), a revised numbering system has now been adopted in which base "1" is the first base of the start codon of each chain. As a consequence, the numbers now used to designate mutations are lower than under the original system, by 119 for alpha1(I) and by 139 for alpha2(I).
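The renumbering described above is a fixed subtraction per chain and can be sketched in a few lines of Python. This is only an illustration: the function name and the gene-symbol keys are hypothetical, and the handling of positions upstream of the start codon assumes the convention that there is no base "0" (the base before "1" is "-1").

```python
# Offsets taken from the text: new numbers are 119 lower for alpha1(I)
# and 139 lower for alpha2(I). The dictionary keys are illustrative.
START_CODON_OFFSET = {"COL1A1": 119, "COL1A2": 139}


def to_start_codon_numbering(gene: str, old_position: int) -> int:
    """Convert a position counted from the first base of the reference cDNA
    into one counted from the first base of the start codon."""
    if old_position < 1:
        raise ValueError("original-style positions start at 1")
    new_position = old_position - START_CODON_OFFSET[gene]
    # Assumption: there is no base 0, so positions upstream of the start
    # codon run -1, -2, ... rather than 0, -1, ...
    if new_position <= 0:
        new_position -= 1
    return new_position
```

Under these assumptions, base 120 of the alpha1(I) reference cDNA becomes base 1 in the revised system, and base 119 becomes -1.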
The reference cDNA sequences have served their intended purpose well but they are based solely on the published cDNAs available in 1996. Since then, the Human Genome Project has brought about a vast increase in the amount of sequence information available for type I collagen. The UniGene project (http://www.ncbi.nlm.nih.gov/UniGene/index.html) collates cDNA data for individual genes derived from expressed sequence tags (ESTs) into clusters. In June 1999 there were 961 alpha1(I) and 875 alpha2(I) cDNA sequences in the UniGene clusters Hs.172928 and Hs.179573 respectively. Aligning the individual cDNA sequences of each cluster with one another would provide two types of information about the coding sequences for each chain. First, new reference consensus sequences could be compiled, based on much more extensive data and, secondly, the data would reveal naturally occurring polymorphisms amongst the individuals whose cDNAs are represented in the clusters. However, the computational resources necessary to carry out these analyses are formidable and there is also the requirement for the sequences to be individually checked, where possible, against the automated DNA sequencing traces from which they are derived. Thankfully, the first aspect has already been undertaken by The Institute for Genomic Research (http://www.tigr.org) and the reference sequences will be updated in light of these data. Mining the EST data for polymorphisms remains a problem but preliminary attempts are underway.
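The two kinds of information that an alignment would yield can be illustrated with a minimal Python sketch. It assumes the sequences have already been aligned to equal length, with "-" for gaps and "N" for uncalled bases. The function name and the `min_minor_count` threshold are hypothetical, and this is not the method used by UniGene or TIGR. A real analysis would also need the trace-level checking described above.

```python
from collections import Counter


def consensus_and_variants(aligned, min_minor_count=2):
    """Return (consensus, variants) for a list of pre-aligned sequences.

    variants maps a 1-based column number to the base counts at that column,
    for columns where a non-consensus base is seen at least min_minor_count
    times (a crude guard against single sequencing errors).
    """
    if not aligned:
        raise ValueError("at least one sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    variants = {}
    for column in range(length):
        counts = Counter(seq[column].upper() for seq in aligned)
        # Gaps and uncalled bases carry no information about the base.
        for symbol in "-N":
            counts.pop(symbol, None)
        if not counts:
            consensus.append("N")
            continue
        ranked = counts.most_common()
        consensus.append(ranked[0][0])
        # Candidate polymorphism: a second base seen often enough.
        if any(n >= min_minor_count for _, n in ranked[1:]):
            variants[column + 1] = dict(counts)
    return "".join(consensus), variants
```

Requiring a minor base to appear more than once is a deliberate simplification: single-pass EST reads are error-prone, so a difference seen in only one read is more likely an artefact than a polymorphism.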
The primary means of distribution of the OI mutation data is the web pages. At present, the presentation format is deliberately kept simple so that the information can (mostly) be accommodated on an 800 pixel-wide screen, reducing the need to scroll the screen to the right to read long lines. Where possible, the citation to the published account of each mutation is linked to PubMed (http://www.ncbi.nlm.nih.gov/PubMed/) and, in time, links will be provided to more detailed reports of individual mutations.
In compliance with published guidelines (Scriver et al., 1999), data concerning all type I collagen mutations are being stored in a Microsoft Access 97 database from which web pages can be generated. Ideally, the data should also be searchable by links from the web pages to the database itself. Alternatively, the database can be used to output structured data files that can be searched by generic search engines such as SRS (http://srs.ebi.ac.uk/). Both approaches are being taken.
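As a rough illustration of the second approach, the Python sketch below writes database records out as a tagged flat file of the general kind that indexing tools can parse. The two-letter tags, the field names and the "//" record terminator are all hypothetical choices made for this example. They are not the actual schema of the Access database or a format defined by SRS.

```python
# Hypothetical mapping from flat-file line tags to record field names.
FIELD_TAGS = [
    ("ID", "record_id"),
    ("GN", "gene"),
    ("MU", "mutation"),
    ("RF", "reference"),
]


def format_record(record):
    """Render one record as tagged lines followed by a '//' terminator."""
    lines = []
    for tag, key in FIELD_TAGS:
        value = record.get(key)
        if value:  # skip fields that are missing or empty
            lines.append(f"{tag}   {value}")
    lines.append("//")
    return "\n".join(lines)


def export_flat_file(records):
    """Concatenate all records into a single flat-file string."""
    return "\n".join(format_record(r) for r in records) + "\n"
```

Keeping one field per line with a fixed tag means the same file can be read by eye, diffed between releases, and indexed field by field.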
Mutation maps indicating the single amino acid substitutions were developed during the early years of the OI database but were subsequently abandoned because of the ever-increasing density of data and the lack of an easy method for distributing the maps. The wider availability of low-cost colour inkjet printers means that maps no longer need to be monochrome, so the plotting of data has become less of a problem. The widespread availability of the Adobe Acrobat plug-in for web browsers has also solved the distribution problem.
The first OI mutation was published in 1984 but, following this, progress in finding many others was slow. From 1990, greater progress was made, driven by widespread adoption of the polymerase chain reaction (PCR) which markedly speeded up mutation detection. The recent downturn in the number of published mutations probably has little to do with mutation detection per se. Rather, it is a reflection of the increasing difficulty in having papers accepted for publication where no major new insight into the pathology of OI is provided by precise knowledge of the underlying mutation.
The possibility of having publishers accept many more reports of OI mutations must be in some doubt. Consequently, the future dissemination of information about these mutations almost certainly lies with well-curated locus- or disease-specific databases. In an attempt to address the problem, the present OI mutation web site also hosts password-protected pages of unpublished mutations. The information has been contributed by a number of laboratories, which have access to one another's data. The private nature of these data will need to be reviewed once a system of mutation accession numbering and comprehensive reporting is well established.
Antonarakis SE, the Nomenclature Working Group (1998) Recommendations for a nomenclature system for human gene mutations. Hum Mutat 11: 1-3
Dalgleish R (1997) The human type I collagen mutation database. Nucl Acids Res 25: 181-187
Dalgleish R (1998) The human collagen mutation database 1998. Nucl Acids Res 26: 253-255
Scriver CR, Nowacki PM, Lehväslaiho H (1999) Guidelines and recommendations for content, structure, and deployment of mutation databases. Hum Mutat 13: 344-350
Reference: Proceedings of the 7th International Conference on Osteogenesis Imperfecta. Montreal, Canada, 1999.