Phylogenetic assessment and in silico characterization of cytochrome b protein of three alpheid shrimps
Muthusamy THANGARAJ

Cytochrome b (cyt b) is one of the cytochrome proteins involved in electron transport in the mitochondrial respiratory chain. It is the only member of the cytochrome complex encoded by mitochondrial DNA, and it is among the most widely used genes for phylogenetic assessment and studies of interspecies variation. Here, the amino acid sequences of cyt b from three snapping shrimps, Alpheus lobidens, A. randali and A. bellulus, were analysed; A. lobidens and A. randali showed the highest similarity, as reflected in the phylogenetic tree. This study describes the application of bioinformatics tools to predict the physicochemical characteristics of the cyt b protein. The protein contained the lowest percentage of Cys (0.8%) and the highest percentage of Leu (13.8%). The maximum molecular weight (MW) was predicted as 42.62 kDa in A. randali. The theoretical pI ranged from 8.35 to 8.36, indicating that cyt b is basic in nature. The instability index ranged from 42.29 to 46.94; because ProtParam predicts a protein to be stable only when this index is below 40, these values suggest that the protein may be unstable in vitro. The secondary structure was composed primarily of α-helices and random coils, indicating a stable structure. Comparative modelling was performed with SWISS-MODEL, using the 3-D crystal structure of bovine cyt bc1 (6haw1.c) as the template. Ramachandran plot analysis showed that most of the amino acids (>92%) fell in the favoured region. Seven conserved motifs were identified by MEME analysis. The modelled 3-D structure was validated by PROCHECK and QMEAN. The transmembrane topology and helix probability curve were predicted with the TMHMM server. Protein-protein interactions were analysed with the STRING tool, which revealed the network of cyt b with related proteins. The results of this study may provide valuable insights into the fundamental characteristics of cyt b in alpheid shrimps.


Introduction
Cytochrome b is one of the proteins that occur in eukaryotic mitochondria. Its main role is electron transport with the help of its heme groups, and it is the major subunit of the transmembrane cytochrome bc1 and b6f complexes (Esposti et al., 1993). Cyt b is an integral membrane protein of about 400 amino acids comprising eight transmembrane segments (Berry et al., 2000). Although the cyt b gene is highly conserved in some groups of vertebrates, it is used in interspecific molecular systematic studies (Chen et al., 2009). Mutations in cyt b may have adverse effects in various animals and reduce their fitness (Singh et al., 2012).
Snapping shrimps (Family: Alpheidae) are one of the most taxonomically and ecologically diverse groups of coral-reef and sponge-dwelling shrimps, with over 500 species worldwide, and many species are symbionts of a diverse range of larger host taxa such as corals, anemones, sponges, fishes and mangroves (Chace, 1988). Alpheid shrimps are commonly distributed in tropical, shallow waters, and some species occur in deeper waters (Anker et al., 2015). In the family Alpheidae, about 4 genera and 23 species have been recorded in India; many of these may be cryptic species and need taxonomic revision (Jha et al., 2019). Compared with the genomic data, the proteomic information on alpheid shrimps is very meagre to date.
Comparative modelling, or homology modelling, builds an atomic-resolution model of a target protein from its amino acid sequence and the experimentally determined structure of a related template protein (Martí-Renom et al., 2000). It can produce high-quality structural models when the target and template are closely similar. The method has become attractive in bioinformatics because insight into the 3-D structure of a protein helps in understanding its structural and functional aspects (Gupta et al., 2009).
Since no crystal structure or predicted structure of cyt b is available for alpheid shrimps, this study was planned to predict the 3-D structure of cyt b in Alpheus lobidens, A. randali and A. bellulus and to share some important findings about this protein.

Materials and Methods
Sequence retrieval and phylogenetic analysis
The amino acid sequences of cyt b from Alpheus lobidens, A. randali and A. bellulus were retrieved from the National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/protein) in FASTA format. The genetic variation among the three species was analysed, and a phylogenetic tree was constructed with the neighbour-joining algorithm, in MEGA 4.0 software (Tamura et al., 2007).
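The K2P (Kimura two-parameter) distances reported in the Results are defined for aligned nucleotide sequences. As a minimal sketch of that formula only, and not of MEGA's implementation, the function below assumes two pre-aligned DNA strings and skips any site that is not A, C, G or T in both; the function name and the toy sequences in the usage note are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned nucleotide sequences.

    d = -1/2 * ln(1 - 2P - Q) - 1/4 * ln(1 - 2Q),
    where P and Q are the proportions of transitional and
    transversional differences among the compared sites.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: gaps and ambiguous bases are simply ignored.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

For example, `k2p_distance("ACGTACGTAC", "GCGTACGTAC")` compares ten sites with a single transition (P = 0.1, Q = 0), so the corrected distance is slightly larger than the raw proportion of differences.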

Physicochemical characterization
The physicochemical properties, namely amino acid composition, isoelectric point (pI), molecular weight (MW), total number of positively (+R) and negatively (-R) charged residues, instability index (II), aliphatic index (AI) and GRAVY (grand average of hydropathicity), were computed with ExPASy-ProtParam.
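Two of these indices are simple functions of amino acid composition and can be reproduced by hand. The sketch below is an illustration rather than ProtParam's own code: it assumes the Kyte and Doolittle (1982) hydropathy scale for GRAVY and the standard aliphatic index formula AI = X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)], where X is the mole percent of each residue. The function names and the toy peptides in the usage note are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean hydropathy over all residues.

    Raises KeyError for non-standard residue codes.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )
```

As a quick check on toy input, `gravy("AL")` averages 1.8 and 3.8, and `aliphatic_index("ALIV")` uses 25 mol% of each aliphatic residue. A positive GRAVY value indicates a hydrophobic sequence, in line with the interpretation used for cyt b in this study.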

Secondary structure and conserved motif prediction
The secondary structure was predicted with the SOPMA (Self-Optimized Prediction Method with Alignment) tool (Geourjon and Deléage, 1995) using the default parameters (window width, 17; similarity threshold, 8; number of states, 4). Domains and conserved protein motifs were analysed using a protein BLAST in MEME (Multiple Em for Motif Elicitation) (http://meme-suite.org/doc/fasta-format.html) (Bailey et al., 2009).

3-D structure construction and evaluation
The three-dimensional structure of the proteins was modelled with SWISS-MODEL (Arnold et al., 2006). A suitable template was selected by running PDB-BLAST against the Protein Data Bank. The PDB-BLAST search returned 50 templates, of which the crystal structure of bovine cyt bc1 (6haw1.c) showed the highest similarity (66.31%). For homology modelling, the SWISS-MODEL server (https://swissmodel.expasy.org/) (Bordoli et al., 2009) was used to model the cyt b structure with 6haw1.c as the template. The resulting model was saved as a .pdb file and visualized with Swiss-PdbViewer (Guex and Peitsch, 1997). The stereochemical properties (φ and ψ distributions) and structural consistency of the predicted model were evaluated by Ramachandran plot analysis in RAMPAGE (http://mordred.bioc.cam.ac.uk/~rapper/rampage.php) (Lovell et al., 2003). The predicted 3-D model was validated with PROCHECK (Laskowski et al., 1996), QMEAN (Qualitative Model Energy Analysis) (Benkert et al., 2011) and QMEANDisCo (Studer et al., 2020). The transmembrane topology and helix probability curve were predicted with the TMHMM server v.2.0 (http://www.cbs.dtu.dk/services/TMHMM.2.0).

Tertiary structure analysis
The tertiary structure of A. lobidens cyt b was analysed with PyMOL (Schrodinger and DeLano, 2020). The quinone reduction (Qi) site residues His202, Lys228 and Asp229 were identified. The heme-binding residues His84 and His183, and His98 and His197, were verified and highlighted.
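The φ and ψ values used in the Ramachandran analysis are backbone dihedral angles: φ is defined by the atoms C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). As a minimal sketch of how one such angle can be computed from four atomic coordinates, and not of how RAMPAGE or PROCHECK do it internally, the function below assumes each atom is given as an (x, y, z) tuple; the function name and the toy coordinates in the usage note are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3) -> float:
    """Dihedral angle in degrees (-180 to 180) defined by four points.

    The angle is measured about the p1 -> p2 bond; a clockwise rotation
    from p0 to p3 when looking from p1 towards p2 is positive.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)

    # Unit vector along the central bond.
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / length for c in b1)

    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))

    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

With the toy points `(1, 0, 0)`, `(0, 0, 0)`, `(0, 0, 1)` and `(0, 1, 1)` the four atoms form a right-angle twist about the z axis. Applying the same function to consecutive backbone atoms read from a .pdb file gives the (φ, ψ) pair of each residue, which is what a Ramachandran plot displays.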

Protein interaction analysis
Protein-protein interactions are a central part of the cellular network and are known to have various impacts. To examine the interactions of A. lobidens cyt b with other closely related proteins, the STRING v 10.0 server (http://string-db.org/) (Szklarczyk et al., 2015) was used. Cyt b of A. lobidens was selected as the query sequence and a functional protein association network was generated.

Results and Discussion
The K2P genetic distances among the three species are shown in Table 1. The maximum K2P distance (0.032) was found between A. randali and A. bellulus, and the mean genetic distance was 0.030. Figure 1 shows the phylogenetic relationship based on cyt b protein sequences inferred with the neighbour-joining algorithm. The tree shows a significant phylogenetic relationship among the studied shrimp species.
The physicochemical properties of cyt b in the three shrimps were predicted using ExPASy-ProtParam and the results are shown in Table 2. The sequence length was 378 amino acids, and the amino acid composition of the protein is given in Figure 2. Cyt b contained the lowest percentage of Cys (0.8%) and the highest percentage of Leu (13.8%). In total, 14 positively charged and 16 negatively charged amino acids were recorded in this protein. The maximum molecular weight (MW) was predicted as 42.62 kDa in A. randali. The theoretical pI ranged from 8.35 to 8.36, indicating that cyt b is basic in nature. The instability index ranged from 42.29 to 46.94; because ProtParam predicts a protein to be stable only when this index is below 40, these values suggest that the protein may be unstable in vitro. The aliphatic index (AI) of proteins from thermophilic bacteria has been found to be higher than that of ordinary proteins, so the index can be used as a measure of protein thermostability. The index increases with the relative content of Ala, Ile, Leu and Val in a protein (Idicula-Thomas and Balaji, 2005). In this study, these four amino acids were abundant and the AI values ranged from 114.29 to 115.05 (Table 2). The GRAVY was calculated to measure the hydrophilicity or hydrophobicity of the protein. A positive GRAVY value indicates hydrophobicity, while a negative value indicates hydrophilicity (Kyte and Doolittle, 1982). Here, the GRAVY values were 0.764, 0.770 and 0.771, indicating that cyt b in the three shrimps is hydrophobic in nature. The secondary structure of cyt b was predicted by SOPMA and the results are given in Table 3.
The secondary structure of this protein was composed primarily of α-helices and random coils, indicating a stable structure. In A. lobidens, of the 378 amino acid residues, 37.30% formed α-helices, 21.42% formed extended strands, 7.67% formed β-turns and 33.61% formed random coils.
The transmembrane topology and helix probability curve of cyt b were predicted with the TMHMM server and the result is presented in Figure 3D. Cyt b was found to be a transmembrane protein containing an N-terminal signal peptide and 9 transmembrane helices. The domains and conserved motifs of this protein were analysed by MEME and the results are depicted in Figure 4. Seven conserved motifs were observed in the three protein sequences, with motif widths ranging from 29 to 50 residues. A similar transmembrane topology was reported in fishes (Ebenezer et al., 2005; Gosh et al., 2020) and a lizard (Chen et al., 2009).
The 3-D structure of the protein was modelled with the homology modelling program SWISS-MODEL. The final modelled structure was visualized with Swiss-PdbViewer and is shown in Figure 3A. For model validation, Ramachandran plot and PROCHECK analyses were carried out, so that the geometrical and structural consistency of the predicted model was evaluated by different approaches. A Ramachandran plot visualizes the backbone dihedral angles ψ against φ of the amino acid residues in a protein structure. The φ and ψ distributions of the Ramachandran map generated by non-glycine, non-proline residues are summarized in Table 3 and Figure 3B. In A. lobidens, 94.2% of the amino acids were found in the favoured region, 5.5% were in the additionally allowed region and only 0.3% were in the outlier region. In A. randali, 92.6% were in the favoured region, 7.1% were in the additionally allowed region and 0.3% were in the outlier region. These findings agree with the results of a previous study on Carangoides equula (Ebenezer et al., 2005). In A. bellulus, 95.1% were in the favoured region, 4.6% were in the additionally allowed region and the remaining residues were in the outlier region. The Gly residue has a hydrogen atom as its variable (R) group and can therefore provide much flexibility to adjacent residues for conformational changes.
Therefore, it is not surprising that Gly plays a crucial role in the structure and function of any protein (Yan and Sun, 1997). In this study, a maximum of 25 Gly residues was detected among the cyt b proteins of the three shrimps. The QMEAN scores, which indicate the quality of the modelled protein, are given in Table 4. The QMEAN Z-score values for C-β, interaction, packing, torsion, S-S agreement and QMEAN4 confirmed the structural confidence of the model.
The prediction shows that the cyt b protein of A. lobidens spans the mitochondrial membrane with nine transmembrane (TM) helices, with both the N- and C-termini located in the mitochondrial matrix. In the tertiary structure, the nine helices (H1-H9) are arranged in two helical bundles, one consisting of three helices (H1-H3) and the other of six helices (H4-H9) (Figure 5). The TM helices are connected by seven extra-membrane loops (L1-L7), comprising four long loops (L3, L4, L5, L6) and three short loops (L1, L2, L7). Two small helices (SH1, SH2) were found near the H1 and H7 transmembrane helices. Between loops L4 and L7, two short helices, CD1 and CD2, form a hairpin arrangement. Of the seven loops, L3, L5 and L6 are on the matrix side, while the remaining four loops (L1, L2, L4, L7) are on the intermembrane space (IMS) side. In addition, a small helix (AH) is located very close to the N-terminus.
Cyt b contains two bound hemes and two ubiquinol/ubiquinone (Qo/Qi) binding sites. The heme-binding residues are His84, His183, His98 and His197; these four histidines are highly conserved in nature (Gao et al., 2003). The four long loops may be the most important to the function of cyt b, as they are the primary participants in the formation of the quinol oxidation (Qo) and quinone reduction (Qi) sites. In the Qi site, the residues His202, Lys228 and Asp229 are highly conserved (Chen et al., 2009). Gao et al. (2003) explained that His202 binds to a carbonyl oxygen of the bound ubiquinone through a water molecule, while Asp229 interacts with the other carbonyl oxygen via another water molecule that is stabilized by Lys228. Hacker et al. (1993) reported that a mutation in any of these three residues may decrease the rate of quinone reduction. This suggests that these three residues play a crucial role in the Qi binding site.
In this study, two short helices (CD1, CD2) were observed between loops L4 and L7. According to Chen et al. (2009), L4 and L7 form the predominant part of the ISP (iron-sulfur protein) binding site, and instability of these loops could affect the function of the two short helices (CD1, CD2). The protein-protein interaction (PPI) network of A. lobidens cyt b is shown in Figure 6. The PPI network clearly demonstrates that cyt b interacts mostly with other mitochondrial proteins such as Nd1, Nd2, Nd3, Nd4, COI, COII and COIII.

Conclusions
Bioinformatics studies on proteins, nucleic acids and other biomolecules help to solve many problems in almost all fields of biological research. Advances in computing tools offer the opportunity to analyse the functional, physical and chemical properties of genes and gene products. Characterizing a protein experimentally with sophisticated instruments is costly and time-consuming. Molecular modelling techniques facilitate the design of new drugs and molecules that bind accurately to their targets. Comparative modelling is also a powerful method in bioinformatics because knowledge of the 3-D structure of a protein is an invaluable aid in delineating its structural and functional details.
Because alpheids hide in burrows and do not swim actively, they may conserve much energy. The hypothetical structures of cyt b in alpheid shrimps may provide a good basis for experimental analysis of their oxygen consumption and energy utilization.

Authors' Contributions
The author read and approved the final manuscript.
Ethical approval (for research involving animals or humans)
Not applicable.