Primate ABO Gene is under Weak Positive Selection

The ABO locus has three main alleles: A, B and O. A and B encode glycosyltransferases that catalyze the addition of N-acetyl-D-galactosamine (GalNAc) and D-galactose, respectively, to a precursor substance (H substance), producing the A and B antigens, while the O allele does not produce a functional protein. The presence of A and B antigens has been associated with resistance to infectious agents, which may use these antigens as attachment factors, increasing the virulence of some parasitic agents. As these antigens are not restricted to humans, analysing them in other species, for instance non-human primates, may be crucial to understanding the relationship between pathogens and ABO phenotypes. Despite the relevance of this issue, few studies in the last decade have addressed it, particularly in New World Monkeys (NWM), natural reservoirs of tropical diseases in the Amazon Region. To understand the evolution of the ABO system in primates, partial sequences of the most important exon of the ABO gene (exon 7) were obtained in the platyrrhine families Atelidae, Pitheciidae and Cebidae. These sequences were then compared with those available in the literature, and the selective pressure was measured. The results show that residues 266 and 268 are also crucial for distinguishing A and B phenotypes in platyrrhines, as in catarrhines, and that codon 266 is under positive selection, although most codon sites are under purifying selection.


Introduction
The ABO histo-blood group system is the best known and most clinically important blood group system and was one of the first genetic traits applied to human population studies, but its biological and functional role has remained an enigma. The co-dominant alleles A and B encode the A and B glycosyltransferases, which add N-acetyl-D-galactosamine (GalNAc) and D-galactose, respectively, to a precursor substance (H substance), producing the A and B antigens. The O allele is a recessive null allele that produces a non-functional enzyme because of a single-nucleotide deletion that shifts the reading frame (Bennett et al., 1995; Yamamoto et al., 1995; Yazer, 2005). After the molecular structure of the ABO gene was identified, many molecular polymorphisms were found at this locus in humans and other primates. A current theory holds that this genetic variability may be driven by infectious agents that use specific sugars on the erythrocyte surface, such as the A and B antigens, as receptors for invasion (Baum et al., 2002; Gagneux and Varki, 1999); diversity would then be a response to constant pressure on the host to combat the pathogen (Haldane, 1949; Pfennig, 2001).
Indeed, a growing number of publications indicate a relationship between the ABO group (and other blood group systems) and infectious agents such as Escherichia coli, Plasmodium falciparum, P. vivax, Candida spp., Helicobacter pylori, Vibrio cholerae, human immunodeficiency virus (HIV), parvovirus and influenza virus (Aho et al., 1980; Anstee, 2010; Germain et al., 2011; Neil et al., 2005; Rowe et al., 2007; Tregouet et al., 2009). A well-established example of selective pressure exerted by pathogens on the ABO blood group is the high frequency of the O phenotype in areas endemic for P. falciparum and the prevalence of groups A and B in areas with V. cholerae, owing to the vulnerability associated with specific ABO phenotypes (Harris et al., 2005; Williams, 2006).
The study of blood groups in non-human primates may therefore clarify important points about tolerance and the selective pressure exerted by pathogens on human populations, mainly because some parasitic diseases are common to most primates. Despite the relevance of this issue, few studies in the last decade have addressed it, especially in Platyrrhini, also known as New World Monkeys (NWM), an important natural reservoir of tropical diseases in the Amazon Region.
Given the relevance of this approach, the main region of the ABO gene (exon 7) was characterized in 14 species of the platyrrhine families Atelidae, Cebidae and Pitheciidae. The ABO blood type of some of these samples is unknown, but their probable phenotypes could be inferred from the molecular characteristics of the gene region studied. These samples were compared with sequences of catarrhine primates (human, apes and Old World monkeys) available in the literature in order to identify differences among primates and to measure the selective pressure on each lineage and codon in this region of the ABO gene.
Selective pressure on lineages and on codons was assessed with Branch Models and Site Models, respectively. These analyses used primate data sets only, excluding the Rodentia sequences.

Materials and methods
Extraction, PCR amplification and sequencing of DNA
Genomic DNA from 22 NWM samples (obtained from the Blood Sample Bank of the Laboratory of Molecular Biology, UFPA) was extracted by the standard phenol-chloroform method described by Sambrook et al. (1989).
The analysis focused on exon 7 because it encodes 344 amino acids, corresponding to 65% of the catalytic domain of the ABO protein, and contains the critical residues at codons 266 and 268 that distinguish the A and B transferases in catarrhine primates. Exon 6, which is shorter and encodes only 45 amino acids, was sequenced but not used in this analysis. The primers were designed from a previously described human sequence. They are as follows: ABO6.3: 5'-CACCATTGGGTTAACTG-3', ABO1403: 5'-AGTGCTTCTCCGCCGTCTCC-3', ABO 7a1: 5'-CCACTACTATGTCTTCACCG-3' and ABO 7c2: 5'-CAAACCCACCAAGGTGCTC-3'. The polymerase chain reaction (PCR) was performed using 1X reaction buffer, 100 ng DNA, 0.4 mM of each primer, 0.03 U/μl Taq DNA polymerase, 1.4 mM MgCl2 and 10 mM of each dNTP. Amplification used the following temperature profile: 5 min at 95°C followed by 35 cycles of 95°C for 1 min, 55°C for 1 min, and 72°C for 1 min. The PCR product was purified using the Wizard PCR Preps kit (Promega) and sequenced by the dideoxynucleotide chain-termination method (Sanger et al., 1977) on an ABI 377 (Applied Biosystems) automatic sequencer.
The sequences were aligned using MUSCLE as implemented in MEGA 5.05, which allows nucleotide sequences to be aligned on the basis of the amino acid sequences (Edgar, 2004; Tamura et al., 2011). Mus musculus (AK035261) and Rattus norvegicus (AB081649) sequences were included in the data set to compare the amino acid residues.

Analysis of molecular evolution
This analysis used the dN/dS ratio (the ratio of nonsynonymous to synonymous substitution rates, known as the ω parameter), estimated in a maximum likelihood framework. The value of ω indicates the type of natural selection acting on a protein: ω < 1 indicates negative (purifying) selection; ω = 1, neutral evolution; and ω > 1, positive selection. To detect positive selection on the primate lineages and at the codon sites, branch models and site models were used, respectively.
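The interpretation of ω can be written as a small decision rule. The Python sketch below is illustrative only: the function name, the tolerance around ω = 1 and the example rates are assumptions, and in practice departures from ω = 1 are assessed statistically rather than with a fixed tolerance.

```python
import math


def classify_omega(dn, ds, tol=1e-6):
    """Return (omega, regime) for nonsynonymous (dn) and synonymous (ds) rates.

    The tolerance `tol` around omega = 1 is an assumption made for this sketch.
    """
    if ds == 0:
        if dn == 0:
            # No substitutions of either kind: omega is undefined.
            return math.nan, "undefined"
        # dS = 0 gives an infinite omega; such estimates are unreliable.
        return math.inf, "positive"
    omega = dn / ds
    if omega > 1 + tol:
        return omega, "positive"
    if omega < 1 - tol:
        return omega, "purifying"
    return omega, "neutral"


if __name__ == "__main__":
    # Hypothetical rates, not values from this study.
    for dn, ds in [(0.02, 0.10), (0.10, 0.10), (0.30, 0.10), (0.05, 0.0)]:
        omega, regime = classify_omega(dn, ds)
        print(f"dN={dn} dS={ds} omega={omega:.2f} -> {regime}")
```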

Branch models
This model, also known as the "free-ratio" model, assumes an independent ω ratio for each branch of a phylogeny, allowing positive selection to be detected on particular lineages. For this analysis, the species-tree topology described by Chatterjee et al. (2009) was assumed, and the phylogenetic relationships among the main human allele sequences (A1, A2, B and O) were taken from Ogasawara et al. (1996). Samples of the same species and phenotype were removed from the data set, so that only 35 samples were used in total. The phylogenetic tree was constructed manually in Newick format and supplied to Codeml, a program of the PAML package, version 4.5 (Yang, 2007).
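A free-ratio run of this kind is usually configured through a Codeml control file. The Python sketch below writes such a file; the option names are standard Codeml options, but the file names and every value other than `model = 1` (the free-ratio setting) are assumptions chosen for illustration and are not taken from this study.

```python
def codeml_ctl(seqfile, treefile, outfile, model=0, nssites=(0,),
               fix_omega=0, omega=0.4):
    """Build the text of a Codeml control file for a codon-based analysis.

    Defaults (codon frequency model, starting kappa and omega) are
    illustrative assumptions, not the settings used in the study.
    """
    lines = [
        f"seqfile = {seqfile}",      # codon alignment
        f"treefile = {treefile}",    # user tree in Newick format
        f"outfile = {outfile}",
        "runmode = 0",               # evaluate the user-supplied tree
        "seqtype = 1",               # codon sequences
        "CodonFreq = 2",             # F3x4 codon frequencies (assumed)
        f"model = {model}",          # 0: one ratio, 1: free-ratio
        "NSsites = " + " ".join(str(n) for n in nssites),
        "icode = 0",                 # universal genetic code
        "fix_kappa = 0",
        "kappa = 2",
        f"fix_omega = {fix_omega}",
        f"omega = {omega}",
        "cleandata = 0",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names for a free-ratio (branch model) run.
    print(codeml_ctl("abo_exon7.phy", "primates.nwk", "free_ratio.out", model=1))
```

The same helper could describe site models by keeping `model = 0` and setting `NSsites` to 1, 2, 7 or 8 (M1a, M2a, M7 and M8); M8a corresponds to `NSsites = 8` with `fix_omega = 1` and `omega = 1`.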

Site models
Sites under positive selection were identified using Codeml (Yang, 2007), which estimates the parameters of the site models M1a, M2a, M7, M8 and M8a; these models are compared in pairs to test neutrality and selection hypotheses. M1a, M7 and M8a are models of neutral evolution (null hypotheses), whereas M2a and M8 are models of positive selection (alternative hypotheses). For each model, the data set and the phylogenetic tree were supplied to Codeml, and a log-likelihood value and the model parameters were estimated. The fit of the pairs M1a-M2a, M7-M8 and M8-M8a was compared using the χ² distribution based on the log-likelihood scores obtained for each model (likelihood ratio test, LRT; Yang and Bielawski, 2000). Two degrees of freedom (df) were used for the M1a-M2a and M7-M8 pairs, and 1 df for the M8-M8a pair, a conservative alternative to the 50:50 mixture of a point mass at 0 and χ² with 1 df (Self and Liang, 1987). A site was accepted as being under positive selection when the posterior probability of the codon belonging to the ω > 1 class was greater than 95% under the Bayes Empirical Bayes (BEB) approach (Yang et al., 2005).
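The likelihood ratio test described above reduces to a short calculation. The sketch below is a minimal Python illustration using only the standard library; the function names are assumptions, and the closed-form tail probabilities cover only the 1 df and 2 df cases used here.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Twice the log-likelihood difference between alternative and null models."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution for df = 1 or 2."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 and df = 2 are implemented in this sketch")


def mixture_p(x):
    """p-value under a 50:50 mixture of a point mass at 0 and chi-square (1 df)."""
    return 1.0 if x <= 0 else 0.5 * chi2_sf(x, 1)


if __name__ == "__main__":
    # LRT statistics as reported in the Results of this paper.
    for pair, stat, df in [("M1a-M2a", 1.50, 2), ("M7-M8", 8.52, 2), ("M8-M8a", 5.80, 1)]:
        print(f"{pair}: chi2={stat}, df={df}, p={chi2_sf(stat, df):.4f}")
    print(f"M8-M8a under the 50:50 mixture: p={mixture_p(5.80):.4f}")
```

Applied to the statistics reported in the Results (1.50, 8.52 and 5.80), these functions give p-values consistent with the conclusions stated there at the 5% and 1% levels.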

Exon 7 of ABO gene of the New World Monkeys
In this work, 184 codons (codons 126 to 309) were examined, including all the codon sites considered most important for distinguishing the A and B transferases, i.e. codons 176, 235, 266 and 268. Comparison of the primate amino acid sequences showed considerable variation, mainly among the platyrrhine sequences, which showed 25% divergence. In the catarrhine sequences the variation is lower, at 9.78%. The greater diversity in exon 7 of Platyrrhini could be due to the greater age of the group, about 35 million years, some 10 million years older than the catarrhines (Glazko and Nei, 2003; Schrago, 2007).
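The text does not state how the divergence percentages were computed. One plausible reading, offered here only as an assumption, is the proportion of variable amino acid sites in the aligned 184-codon region; under that reading, 25% corresponds to 46 sites and 9.78% to 18 sites. The Python sketch below illustrates the calculation on a hypothetical toy alignment.

```python
def percent_variable_sites(sequences):
    """Percentage of alignment columns containing more than one residue.

    Assumes equal-length, gap-free aligned sequences (a simplification).
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    variable = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    return 100.0 * variable / length


if __name__ == "__main__":
    # Hypothetical four-residue alignment, not data from this study.
    toy_alignment = ["MLGA", "MLGA", "MMGS"]
    print(percent_variable_sites(toy_alignment))
    # Site counts implied by the reported percentages over 184 codons.
    print(round(100 * 46 / 184, 2), round(100 * 18 / 184, 2))
```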
Despite this, most amino acids are conserved and some substitutions are shared among lineages. For instance, alanine, lysine and serine at positions 166, 215 and 226 are common to NWM and rodents and appear to be ancestral, because these residues are encoded by the same or a similar codon: for NWM and OWM, alanine at position 166 is encoded by GCN codons (N = T, C, G or A), lysine at position 215 by AAG, and serine at position 226 by TCC in NWM and TCA in rodents.
The specificity of the A and B transferases in NWM is determined by codons 266 and 268, as in the ABO glycosyltransferases of catarrhines. Among the typed samples, Callicebus brunneus is an exception: it has methionine and serine at positions 266 and 268 and yet has the A phenotype, as detected by the saliva inhibition test (Rocha et al., 1992). The same molecular pattern was observed in other Callicebus samples (C. personatus and C. moloch), but their phenotypes are unknown. In contrast, an experiment with A and B transferase chimeras indicated that methionine and serine at positions 266 and 268 result in both A and B transferase activities. Thus, the A phenotype of Callicebus despite this pattern could be explained if (1) the B determinant has weak activity, insufficient to be detected by the inhibition technique, or (2) position 266 is not crucial for determining enzymatic specificity in this genus.
The A or O "molecular pattern", with leucine and glycine at codons 266 and 268 respectively, is predominant among the NWM analysed, including those with unknown phenotypes; it is therefore more likely that these NWM have A or O phenotypes. This observation is consistent with earlier works that found more platyrrhine species with the A or O phenotype (Corvelo et al., 1985; Hamel et al., 1988; Rocha et al., 1992; Schneider et al., 1993). However, Cebus olivaceus seems to have the AB phenotype because it has methionine and glycine at positions 266 and 268.
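The residue patterns discussed in this section can be summarised as a lookup from the amino acids at positions 266 and 268 to a putative phenotype. The Python sketch below encodes only the patterns mentioned in the text; the function name and the wording of the labels are assumptions, and any other combination is reported as undetermined rather than guessed.

```python
# One-letter residues at positions (266, 268) -> putative phenotype,
# restricted to the patterns described in the text.
PATTERNS_266_268 = {
    ("L", "G"): "A or O",
    ("M", "G"): "AB (putative, as in Cebus olivaceus)",
    ("M", "S"): "A by saliva inhibition in Callicebus; A and B activity in chimeras",
}


def predict_phenotype(res266, res268):
    """Return the putative ABO phenotype for residues at codons 266 and 268."""
    key = (res266.upper(), res268.upper())
    return PATTERNS_266_268.get(key, "undetermined")


if __name__ == "__main__":
    for pair in [("L", "G"), ("M", "G"), ("M", "S"), ("Q", "Q")]:
        print(pair, "->", predict_phenotype(*pair))
```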
Our reconstruction of ancestral sequences (performed with Codeml) corroborates the hypothesis suggested by Saitou and Yamamoto (1997) that B alleles arose independently in primates and that the A allele is the ancestral form.

Free-ratio model
The "free-ratio" (M1) model revealed, in general, very low dN/dS ratios, well below 1 for most branches, indicating purifying selection. However, these values vary considerably within some primate lineages: some branches (those of Brachyteles and the human B allele) showed evidence of evolution under relaxed purifying selection (ω > 0.5), while others showed an infinite ω value (when dS = 0), suggesting adaptive selection. It is therefore likely that the ABO gene has evolved under different kinds of selective pressure in different primate lineages, because they are adapted to particular environmental conditions, with varying demographic and biological histories; in addition, they may differ in the tissue expression of the A and B antigenic epitopes, a trait that may influence the selective pressure on each lineage (Kosiol et al., 2008). The phylogenetic tree and the ω value obtained for each branch under the "free-ratio" model are shown in Fig. 1.
Fig. 1. Estimates of dN/dS (ω parameter) under the M1 "free-ratio" model, obtained with Codeml. The numbers of nonsynonymous and synonymous sites under this model are 488.5 and 63.5, respectively. Species and their ABO phenotypes (if known) are shown (lnL = -2283.96).

Site model analyses-codeml results
The LRT indicated that the models of the M1a-M2a pair (χ² = 1.50, 2 df, p > 0.05), which compares a nearly neutral model with a positive selection model, were not significantly different. The other two pairs, M7-M8 and M8-M8a, differed significantly (M7-M8: χ² = 8.52, 2 df, p < 0.05; M8-M8a: χ² = 5.80, 1 df, p < 0.05), although at the 1% critical values no pair differs significantly. The M2a model identified two sites under positive selection, 238 and 266, but neither had a significant posterior probability (< 95%). The M8 model identified four sites, of which only codon 266 was statistically supported (266: P = 0.99, ω = 2.39 ± 1.38). According to Yang (2007), the M8 model produces false positives for positive selection in half of such cases, but finding codon 266 under positive selection is reasonable because of its specific role in the protein; moreover, high variation among sequences was observed at the first nucleotide of this codon (nucleotide position 796), indicating that it may be critical for ABO polymorphism. The estimated parameter values are given in Tab. 2.
Our results weakly support the hypothesis of positive selection; however, because few codons were analysed, the number of codons under positive selection may have been underestimated, and a more complete study is necessary to fully understand the evolution of the ABO gene.
Indeed, ABO polymorphism in primates reflects adaptive evolution (restricted to one site), which has been associated with protection against pathogens and maternal-fetal incompatibilities, and it could have been maintained by balancing selection (Saitou and Yamamoto, 1997). Nevertheless, finding a relationship between pathogens and ABO phenotypes is often difficult because of the concurrent effects of evolutionary and demographic forces (Hahn et al., 2004); despite this, evidence supporting this association has been described. Lindén et al. (2008) found strong evidence of adaptive evolution in rhesus monkeys with the B and Se^w (weak-secretor) phenotypes, which would be more tolerant to H. pylori infection than those with other ABO groups. Interestingly, the B phenotype is prevalent in the regions even in human populations (Malekasgar, 2005).
The maintenance of different amino acids at various codon sites supports the action of balancing selection on the ABO gene of primates. Ferrer-Admetlla et al. (2008) described balancing selection as fundamental to the evolution of innate immunity genes. It is therefore likely that it also acts on other genes involved in mechanisms of protection against infectious agents, such as the ABO gene.

Conclusions
The present study found that most sites of the ABO glycosyltransferases are under purifying selection and only one site is under positive selection; these results agree with Anisimova et al. (2001), who observed that adaptive selection occurs at only a few sites rather than at all sites of a functional gene. The results lead to three main conclusions: (1) codons 266 and 268 are also important for distinguishing the A and B transferases in Platyrrhini; (2) the A allele is the ancestral allele in primates; and (3) adaptive and purifying selection both act on the ABO gene of primates, the former creating diversity and the latter preserving the function and structure of the ABO glycosyltransferases.