Comparative Study of Various Genetic Distance Measures between Populations for the ABO Gene

Quantifying the genetic distance between populations is essential in many genetic research programs. Several formulae have been proposed for estimating the genetic distance between populations from gene frequency data, and Nei’s D has been the most widely used of these measures. However, selecting a suitable measure for estimating genetic distance between real-world human populations is a difficult task. The present study was undertaken to estimate the genetic distance between Barak Valley Muslims (BVM) and 24 other nations from ABO blood group gene frequency data using seven different formulae, to estimate the correlation coefficients between the distance measures, and to work out regression equations. Seven genetic distance measures, namely Nei’s D, Nei’s Nm, La, Nei’s Da, Dc, Re and Nei’s Ne, were calculated between BVM and each of the 24 other nations. Correlation coefficients of Nei’s D with the other measures were determined to find out which measures were similar to Nei’s D, and linear regression equations of the other distance measures on Nei’s D were determined. Nei’s D showed a highly significant (p=0.01) positive correlation with the Cavalli-Sforza and Edwards chord distance Dc (0.90), Reynolds Re (0.90), Nei’s Da (0.74) and Nei’s Ne (0.63), but a negative correlation with Nei’s Nm and La. Since Nei’s D had a very high positive correlation with the Dc and Re distance measures, any one of these three measures could be reliably used in genetic analysis instead of all three for estimating genetic distance between populations.


Introduction
Quantification of the genetic distance between populations is instrumental in many genetic research programs. A large number of formulae have been proposed for this purpose. However, the selection of an appropriate measure for assessing genetic distance between real-world human populations that diverged as a result of mechanisms that are not fully known can be a challenging task (Libiger et al., 2009).
Nei's standard genetic distance (D) has been the most widely used measure of genetic distance between populations. Since several other formulae have been proposed for measuring genetic distance, it is essential to identify which of them show a close similarity to Nei's D. The present study was undertaken to estimate the genetic distance between Barak Valley Muslims (BVM) and each of 24 other populations from ABO blood group gene frequency data using seven different genetic distance measures. These seven measures were Nei's D, Nei's Nm, Latter's La, Nei's Da, Cavalli-Sforza and Edwards Dc (CE), Reynolds Re and Nei's Ne. To identify the distance measure(s) that show similarity to Nei's D, a correlation analysis was performed between the estimates of Nei's D and those of the other distance measures. Regression equations of the different distance measures on Nei's D were worked out so that the value of a particular distance measure could be determined for a given value of Nei's D.

Materials and methods
In this study, ABO blood group distribution data of 25 populations excluding Barak Valley Muslims were obtained from the published literature and websites. The ABO blood group distribution data in Barak Valley Muslims were estimated by the author (Chakraborty, 2010). The frequencies of the O, A and B alleles of the ABO blood group system were estimated for each population from ABO blood group phenotyping data using the formulae suggested by Hedrick (2005). Latter's distance (La) was computed according to Latter (1972), Nei's Da distance according to Nei et al. (1983), the Cavalli-Sforza and Edwards chord distance (Dc or CE) according to Cavalli-Sforza and Edwards (1967), and Reynolds genetic distance (Re) according to Reynolds et al. (1983).
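The allele frequency estimation step can be illustrated with a short sketch. It assumes Bernstein's square-root estimator under Hardy-Weinberg equilibrium, which is a common textbook method for ABO data; the exact formulae taken from Hedrick (2005) may differ. The function name, the final rescaling step and the phenotype counts are illustrative assumptions, not values from this study.

```python
import math

def abo_allele_frequencies(n_a, n_b, n_ab, n_o):
    """Estimate A, B and O allele frequencies from ABO phenotype counts.

    Assumption: Bernstein's square-root estimator under Hardy-Weinberg
    equilibrium. The rescaling at the end (so that the three estimates
    sum to one) is a simplification chosen for this sketch.
    """
    total = n_a + n_b + n_ab + n_o
    if total <= 0:
        raise ValueError("phenotype counts must sum to a positive number")
    f_a, f_b, f_o = n_a / total, n_b / total, n_o / total

    r = math.sqrt(f_o)              # O allele: freq(O phenotype) = r^2
    p = 1.0 - math.sqrt(f_b + f_o)  # A allele: freq(B) + freq(O) = (q + r)^2
    q = 1.0 - math.sqrt(f_a + f_o)  # B allele: freq(A) + freq(O) = (p + r)^2

    s = p + q + r                   # equals 1 only under exact equilibrium
    return {"A": p / s, "B": q / s, "O": r / s}

# Hypothetical phenotype counts (A, B, AB, O), for illustration only.
freqs = abo_allele_frequencies(3200, 1500, 400, 4900)
print(freqs)
```

The AB count enters only through the total in this estimator, which is why a rescaling or correction step is normally applied when the raw estimates do not sum to one.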

Genetic distance measurement
The ABO gene frequency data (Tab. 1) were used to estimate the genetic distance between Barak Valley Muslims and each of the remaining 24 populations using seven distance measures as given below.
Let the genetic distance for $m$ loci with $v$ alleles per locus be studied in populations 1 and 2 with $n_1$ and $n_2$ individuals, respectively, and let $n$ be the average number of individuals. Let $p_{lu1}$ and $p_{lu2}$ be the frequencies of allele $u$ at locus $l$ in populations 1 and 2, respectively, and let $P_{lu1}$ and $P_{lu2}$ be the numbers of individuals that carry allele $u$ at locus $l$ in populations 1 and 2, respectively. The seven distance measures were then estimated from these quantities. Nei's standard genetic distance (D) between two populations was estimated without bias correction according to Nei (1972). Nei's minimum distance (Nm) and Nei's geometric distance (Ne) were also estimated, the latter from genotype frequency data rather than gene frequency data.
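Four of the gene-frequency-based measures can be sketched in code as follows. The sketch assumes the commonly cited forms of Nei's standard distance, Nei's minimum distance, Nei's Da and the Cavalli-Sforza and Edwards chord distance, all without sample-size bias correction; the exact forms used in this study may differ. Latter's La, Reynolds Re and Nei's Ne are left out because their forms are not certain from the text. All function names and the example frequencies are hypothetical.

```python
import math

# A population is a list of loci; each locus is a list of allele
# frequencies in a fixed allele order, e.g. [[f_O, f_A, f_B]] for ABO.

def _identities(pop1, pop2):
    """Average gene identities J1, J2 and J12 over the m loci."""
    m = len(pop1)
    j1 = sum(x * x for locus in pop1 for x in locus) / m
    j2 = sum(y * y for locus in pop2 for y in locus) / m
    j12 = sum(x * y for a, b in zip(pop1, pop2) for x, y in zip(a, b)) / m
    return j1, j2, j12

def nei_standard_d(pop1, pop2):
    """Nei's standard distance: D = -ln(J12 / sqrt(J1 * J2))."""
    j1, j2, j12 = _identities(pop1, pop2)
    return -math.log(j12 / math.sqrt(j1 * j2))

def nei_minimum_distance(pop1, pop2):
    """Nei's minimum distance: Nm = (J1 + J2) / 2 - J12."""
    j1, j2, j12 = _identities(pop1, pop2)
    return (j1 + j2) / 2.0 - j12

def nei_da(pop1, pop2):
    """Nei's Da: 1 - (1/m) * sum over loci and alleles of sqrt(p1 * p2)."""
    m = len(pop1)
    shared = sum(math.sqrt(x * y)
                 for a, b in zip(pop1, pop2) for x, y in zip(a, b))
    return 1.0 - shared / m

def chord_dc(pop1, pop2):
    """Chord distance: (2 / (pi * m)) * sum over loci of
    sqrt(2 * (1 - sum over alleles of sqrt(p1 * p2)))."""
    m = len(pop1)
    total = 0.0
    for a, b in zip(pop1, pop2):
        shared = sum(math.sqrt(x * y) for x, y in zip(a, b))
        total += math.sqrt(2.0 * max(0.0, 1.0 - shared))  # guard rounding
    return 2.0 * total / (math.pi * m)

# Hypothetical O, A, B frequencies for two populations (not from Tab. 1).
pop_x = [[0.60, 0.18, 0.22]]
pop_y = [[0.65, 0.25, 0.10]]
for name, fn in [("D", nei_standard_d), ("Nm", nei_minimum_distance),
                 ("Da", nei_da), ("Dc", chord_dc)]:
    print(name, round(fn(pop_x, pop_y), 4))
```

For a single locus such as ABO, these forms give Dc = (2/π)·sqrt(2·Da), so Dc and Da are monotonically related and rank population pairs identically.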

Correlation and regression analysis
The correlation coefficient between any two distance measures was calculated as per Harris et al. (2007). Each correlation coefficient was tested for significance by the 't' test at p=0.01 and p=0.05. The linear regression equation of each distance measure (as the dependent variable) on Nei's D (as the independent variable) was estimated by the method of least squares as per Harris et al. (2007).
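The correlation and regression steps can be sketched as below. The sketch assumes the Pearson product-moment correlation coefficient, the usual test statistic t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom, and an ordinary least-squares line y = A + Bx; the exact procedure followed from Harris et al. (2007) may differ in detail. The function names and the data values are hypothetical.

```python
import math

def _centered_sums(x, y):
    """Return mean(x), mean(y) and the centered sums Sxx, Syy, Sxy."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    _, _, sxx, syy, sxy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom.

    Undefined for |r| = 1 (division by zero).
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def least_squares_line(x, y):
    """Intercept A and slope B of the least-squares line y = A + B * x."""
    mx, my, sxx, _, sxy = _centered_sums(x, y)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical distance estimates for five population pairs:
# x holds Nei's D (independent), y another measure (dependent).
nei_d = [0.002, 0.010, 0.015, 0.028, 0.039]
other = [0.030, 0.095, 0.120, 0.210, 0.280]

r = pearson_r(nei_d, other)
A, B = least_squares_line(nei_d, other)
print("r =", round(r, 3), "t =", round(t_statistic(r, len(nei_d)), 2))
print("predicted value at D = 0.020:", round(A + B * 0.020, 4))
```

The computed t value is compared with the tabulated critical value for n − 2 degrees of freedom at the chosen significance level.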

Results and discussion
The Barak Valley area, named after the river Barak that flows through it, is located in the southern part of Assam state in North East India. The valley has been inhabited by one of the major endogamous religious groups, the Muslims, for several centuries. Barak Valley has a total population of about 3.21 million, including Hindus, Muslims and Christians, and a land area of 6,992 square kilometers. The region is characterized by undulating topography with wide plains, low-lying waterlogged tracts and hillocks. The climate of the Barak Valley is subtropical, warm and humid, with an average annual rainfall of 318 cm and 146 rainy days. Nearly 80% of the total population depends on agriculture for its livelihood.

Gene frequency
The frequencies of the O, A and B alleles of the ABO gene in the different nations/populations were estimated from the ABO blood group distribution data of each population (Tab. 1). In general, the frequency of the O allele was the highest in all the populations. The B allele was not reported in Australians.

Genetic distance between populations
The estimates of various genetic distance measures (expressed in percent) between Barak Valley Muslims (BVM) and each of the twenty-four populations were calculated on the basis of ABO gene frequency data (Tab. 2).
The Nei's D estimate was the lowest (0.0015) between BVM and India (in general), indicating the smallest genetic distance and hence the highest genetic identity between these two populations for the ABO gene. On the other hand, the highest Nei's D value (0.0395) was found between BVM and Australia, suggesting the greatest genetic distance and the lowest genetic identity between these two populations for the ABO gene among the 24 combinations. Unlike all the other genetic distance measures, Nei's geometric distance (Ne) was calculated on the basis of genotypic data estimated from ABO gene frequencies. The Nei's Da estimate ranged from 0.0009 between BVM and India to 0.1000 between BVM and Australia. The Cavalli-Sforza and Edwards chord distance (Dc) ranged from 0.0265 between BVM and India to 0.2847 between BVM and Australia. Reynolds genetic distance (Re) ranged from 0.00002 between BVM and Bulgaria to 0.0073 between BVM and Australia. The Nei's Ne estimate ranged from 0.0169 between BVM and Sudan to 0.0808 between BVM and South China.
Several studies have been carried out on genetic distance measurements across different populations. Genetic distance and gene diversity studies by Roy et al. (1990) among 10 endogamous groups in Chattisgarh, India, using the gene frequency data of three genetic loci revealed that the gene differentiation among these population groups is only about 2 per cent (0.02).
Genetic differentiation studies by Papiha et al. (1982) revealed that genetic differentiation in Indian populations was low (0.26-1.70%). In Assam, genetic variation studies by Das (1979) among three caste populations, namely Brahmin, Kalita and Kaibarta, on the basis of the ABO blood groups and other anthropometric characters revealed that the Kaibarta stand apart from the Brahmin and the Kalita, who are similar to each other. A genetic study by Danker-Hopfe et al. (1988) among 13 Assamese populations, including two Muslim groups, on the distribution of anthropometric, anthroposcopic and dermatoglyphic traits revealed that the Muslims in Assam fell into two distinct groups: Marias (who seemed to be more closely related to Mongoloid populations) and Sheikhs (whose phenotypic appearance was more like that of Hindu caste groups).
Genetic distance studies by Roychoudhury et al. (1982) between Jews and Non-Jews using gene frequency data of nine blood groups and protein loci revealed that the Yemenite Jews have a high degree of genetic affinity to the Israeli Arabs and the Iranian Jews to the Iranians. Genetic distance studies by Triantaphyllidis et al. (1983) between the inhabitants of nine Mediterranean countries and the three major human races using the gene frequency data of several genetic markers suggested that the Algerians were closer to Negroids while the other Mediterraneans were closer to Caucasoids.
Genetic and taxonomic distance studies by Sokal (1988) among 3466 samples of human populations in Europe, based on 97 allele frequencies and 10 cranial variables, demonstrated that speakers of different language families in Europe differ genetically and that this difference remains even after geographic differentiation is allowed for.

Correlation analysis
The estimates of correlation coefficients between any two distance measures (Tab. 3) revealed that Nei's D showed a highly significant (p=0.01) positive correlation with the Cavalli-Sforza and Edwards chord distance Dc (0.90), Reynolds Re (0.90), Nei's Da (0.74) and Nei's Ne (0.63). This indicated great similarity between Nei's D and these four distance measures, so that any one of them could be used in genetic analysis instead of all of them. However, because the positive correlation of Nei's D with the Cavalli-Sforza and Edwards chord distance Dc and Reynolds Re was very high, the use of any one of these three measures would be more effective in genetic analysis. Nei's D showed a non-significant negative correlation with Nei's minimum distance Nm and Latter's distance La.

Regression analysis
Nei's D is the most widely used genetic distance measure in research programs. Taking Nei's D as the independent variable and any one of the remaining distance measures (Da, Dc, Re or Ne) as the dependent variable, the linear regression equations of the latter on Nei's D were estimated (Tab. 4). Since Nei's minimum distance Nm and Latter's distance La did not show a significant correlation with Nei's D, they were not used as dependent variables in determining linear regression equations on Nei's D.
These regression equations could be used to estimate the magnitude of a particular genetic distance measure for a given value of Nei's D between two populations. However, the accuracy of a distance estimate calculated from a given value of Nei's D using these linear regression equations decreases as the correlation coefficient decreases. In the regression equation y = A+Bx, B is the regression coefficient (slope) of the linear regression and the regression constant A is the y-intercept, i.e. the distance from the origin to the point where the straight line intersects the y-axis.

Conclusions
The present study revealed that the Barak Valley Muslims had the highest genetic distance from Australians for the ABO gene but the lowest from the Indians. Nei's D showed a highly significant positive correlation with the Cavalli-Sforza and Edwards chord distance Dc and the Reynolds Re measure, indicating great similarity between these three distance measures. In contrast, Nei's D showed a negative correlation with Nei's minimum distance Nm and Latter's distance La.