GENEVA is a supplementary website for Zheng, Y.
|
SVG stands for segmentally variable gene. SVGs are genes that, by sequence comparison analysis, have one or more sizable variable regions interspersed among well-conserved regions. Most current sequence analyses focus on regions that are widely conserved across diverse lineages, so variable regions are sometimes ignored.
|
It is known that the variable domains inside some gene families, e.g., the C-5 DNA methyltransferases, confer multiple specificities on them. Using a bioinformatic approach, we found hundreds of genes that have similar variability structures along their sequences. Several questions follow immediately: What kinds of genes tend to have such variable regions? What could the function of those variable regions be? Do these functions have anything in common? The juxtaposition of well-conserved regions and variable regions suggests that their functions, if any, may be related. In cases where the conserved portion has already been assigned a biochemical function, initial guesses about the function of the variable region can be formulated: either binding to a different target molecule, or having a different sequence specificity. These hypotheses can later be tested by biochemists. Although the answer may differ from case to case, we have suggested that a common function of the variable regions may be to mediate interactions with other molecules.
|
On this site, you will find many examples of such variable regions in a number of completely sequenced microbial genomes. Click around to see how prevalent SVGs are in the sequence universe. You may find genes whose function is of interest to you. Has the variable region been characterized in that gene family? If yes, let us know (zhengyu@bu.edu). If not, and you have some wild guesses, let us know as well. We would be very happy to hear your comments.
|
Great care is needed when interpreting results for a variable region inside a gene family. Other factors that can contribute to regional variability include:
|
| Sequences are searched against PFAM release 7.6 (2002). Since PFAM evolves over time, users are advised to run a quick PFAM search if a PFAM domain listed here is not found. |
| In this diagram, three kinds of colored blocks are shown. Blue blocks are nongapped HSPs between the query protein sequence and the hit protein sequence. Yellow and grey blocks are unaligned regions. When two unaligned regions are of similar length and are anchored by the same set of conserved HSPs, they are colored yellow and are candidate regions to be considered "variable". When two unaligned regions are of significantly different length (e.g., gap content larger than 0.3, as defined in the paper), they are likely due to segment insertion or deletion in the protein sequences; they are colored grey and are not considered candidate variable regions. |
| Hierarchical clustering is performed on the collection of corresponding variable regions (regions bounded by the same set of well-conserved regions) among a family of similar genes. Briefly, at each clustering step, the two clusters with the smallest distance are joined. The distance between two variable regions is based on their percent identity, calculated from the ClustalW report. The cluster-cluster distance is defined as the average of all pairwise distances between the elements of the two clusters. The clustering procedure stops when no distance between any two clusters is beyond 30%. Usually, variable regions from phylogenetically close species are grouped together. (The result is shown for only several genomes.) |
| Clusters are reported after the hierarchical clustering procedure has been run on the variable regions. Phylogenetically close species are usually grouped into the same cluster, which reduces the bias in the data. MEME is then used to look for conserved short motifs that may exist among these distinct clusters. One sequence is chosen from each cluster as a representative, and this dataset is fed into MEME. The reported short motifs may be suggestive of a common function of the variable regions, if any. Users should be cautious about short motifs found near the ends adjacent to the conserved regions, because they may simply be extensions of the conserved regions caused by imprecise detection of the boundaries of the variable regions. (The result is shown for only several genomes.) |
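
The clustering and representative-selection steps described in the notes above can be illustrated with a minimal Python sketch. It is not the code used to build this site, and it rests on several assumptions: the distance is taken as 100 minus the percent identity, the 30% cutoff is exposed as a `threshold` parameter and read as "stop once the closest pair of clusters is farther apart than the threshold", and the representative of each cluster is simply its first member. The function names and the example identities are hypothetical.

```python
from itertools import combinations


def identity_to_distance(percent_identity):
    # Assumption: distance = 100 - percent identity (not stated on this site).
    return 100.0 - percent_identity


def average_linkage(distance, threshold=30.0):
    """Agglomerative clustering with average linkage.

    distance:  dict mapping frozenset({a, b}) to the pairwise distance
    threshold: stop once the closest pair of clusters is farther apart
               than this value (assumed reading of the 30% cutoff)
    """
    items = sorted({name for pair in distance for name in pair})
    clusters = [[name] for name in items]

    def cluster_distance(c1, c2):
        # Average of all pairwise distances between members of c1 and c2.
        total = sum(distance[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        (i, j), best = min(
            (((i, j), cluster_distance(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda entry: entry[1],
        )
        if best > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def representatives(clusters):
    # Assumption: take the first member of each cluster as its representative.
    return [cluster[0] for cluster in clusters]


# Hypothetical percent identities between three variable regions.
identities = {("A", "B"): 90.0, ("A", "C"): 20.0, ("B", "C"): 25.0}
distances = {frozenset(pair): identity_to_distance(pid)
             for pair, pid in identities.items()}

clusters = average_linkage(distances, threshold=30.0)
reps = representatives(clusters)
print(clusters, reps)
```

In this toy input, regions A and B are close and would be expected to merge, while C stays on its own; the representatives would then be the sequences passed on to a motif finder such as MEME.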