| Title: | Phylogenetics via Root Distances Method Under the Coalescent |
|---|---|
| Description: | Estimates phylogenetic trees from allele count data using the root distance method under the Coalescent Model. Given a matrix of allele counts across taxa and loci, the package estimates pairwise root distances under the Coalescent Model using maximum likelihood estimation. Then, it estimates a labeled phylogenetic tree from the estimated root distances. See Peng et al. (2021) <doi:10.1016/j.ympev.2021.107142>. |
| Authors: | Arindam RoyChoudhury [aut, cre, cph], Ying Li [aut] |
| Maintainer: | Arindam RoyChoudhury <[email protected]> |
| License: | AGPL-3 |
| Version: | 0.1.0 |
| Built: | 2026-05-28 12:10:03 UTC |
| Source: | https://github.com/arindamroychoudhury/coalescentphylo |
This dataset is a matrix of allele counts used to demonstrate phylogenetic tree estimation
with the CoalescentPhylo package. It contains allele counts from 8 ingroup human
populations and 1 outgroup human population (San), measured across 2,000 loci.
    data(Human_Allele_Count_Data)
A numeric matrix with 9 rows and 2,000 columns, where:
Rows: populations (8 ingroup + 1 outgroup). Row names correspond to population labels from the ALFRED database, with ALFRED sample IDs given in parentheses:
Papuan New Guinean (SA001501H)
Uyghur (SA001492Q)
Hazara (SA001477T)
Yi (SA001485S)
Dai (SA001493R)
Japanese (SA002260K)
Mongolian (SA001489W)
Karitiana (SA001514L)
San (SA001469U), outgroup
Columns: loci (2,000 SNP sites). Each entry is a non-negative
integer allele count not exceeding fixed.n.at.tips.
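The constraints described above (non-negative integer counts no larger than fixed.n.at.tips, with populations as named rows) can be checked before estimation. The following is a minimal sketch in base R. The helper check_allele_counts is hypothetical and not part of CoalescentPhylo, the toy matrix holds made-up values rather than ALFRED data, and treating missing values as a problem is an assumption, since this manual does not say how RD() handles them.

```r
# Hypothetical helper (not part of CoalescentPhylo): reports violations of the
# documented constraints on an allele count matrix.
check_allele_counts <- function(mat, fixed.n.at.tips = 4) {
  stopifnot(is.matrix(mat), is.numeric(mat))
  problems <- character(0)
  # Assumption: missing values are flagged; RD()'s handling of NA is not documented here
  if (anyNA(mat)) problems <- c(problems, "missing values")
  vals <- mat[!is.na(mat)]
  if (any(vals < 0)) problems <- c(problems, "negative counts")
  if (any(vals != round(vals))) problems <- c(problems, "non-integer counts")
  if (any(vals > fixed.n.at.tips)) problems <- c(problems, "counts above fixed.n.at.tips")
  if (is.null(rownames(mat))) problems <- c(problems, "no row names")
  problems
}

# Toy matrix: 3 populations x 4 loci (made-up values, not from ALFRED)
toy <- matrix(c(0, 4, 2, 1,
                3, 3, 0, 4,
                1, 2, 4, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("PopA", "PopB", "PopC"), NULL))

ok_problems <- check_allele_counts(toy)   # empty: no violations found

bad <- toy
bad[1, 1] <- 5                            # exceeds the default fixed.n.at.tips = 4
bad_problems <- check_allele_counts(bad)  # "counts above fixed.n.at.tips"
```

Returning a character vector of problems, rather than stopping at the first one, lets a caller report every violation at once.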
The populations were selected to represent diverse geographic regions across Africa, Central Asia, East Asia, Oceania, and the Americas, providing a broad test case for coalescent-based phylogenetic inference. San is included as the outgroup, consistent with the early divergence of Southern African populations in human evolutionary history.
ALFRED — The ALlele FREquency Database (https://alfred.med.yale.edu). A resource of gene frequency data on human populations supported by Biomedical Informatics and Data Science, Yale University.
    data(Human_Allele_Count_Data)

    # Check dimensions: 9 populations x 2000 loci
    dim(Human_Allele_Count_Data)

    # View population names
    rownames(Human_Allele_Count_Data)

    # Estimate phylogenetic tree
    tree <- RD(mat_allele_count = Human_Allele_Count_Data, n.cores = 1)
    plot(tree$labeled_tree)
Estimates a phylogenetic tree from a matrix of allele counts using the root distance method under the Coalescent Model.
    RD(
      mat_allele_count,
      theta = 1,
      fixed.n.at.tips = 4,
      newick_br_length_digits = 3,
      n.cores = NULL
    )
| Argument | Description |
|---|---|
| mat_allele_count | A numeric matrix of allele counts where rows represent populations or samples and columns represent alleles or loci. Each entry should be a non-negative integer. |
| theta | A positive numeric scalar representing the scaled mutation rate. Defaults to 1. |
| fixed.n.at.tips | A positive integer specifying the fixed sample size assumed at each tip of the tree. Used to correct for sampling effects when computing distances. Defaults to 4. |
| newick_br_length_digits | A non-negative integer controlling the number of decimal places used when formatting branch lengths in the Newick string output. Defaults to 3. |
| n.cores | Optional integer specifying the number of CPU cores used for parallel computation. Defaults to NULL, which uses all available cores. |
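To make the effect of newick_br_length_digits concrete, here is a small base R illustration of formatting branch lengths to a fixed number of decimal places before assembling a Newick string. This is only a sketch: the internal formatting used by RD() is not shown in this manual, so the use of formatC(), the padding with trailing zeros, and the made-up branch lengths are all assumptions.

```r
# Illustration only: how RD() formats branch lengths internally is an assumption.
br_lengths <- c(0.123456, 1.5, 0.0004)   # made-up branch lengths

# Format to a fixed number of decimal places
fmt <- function(x, digits) formatC(x, format = "f", digits = digits)

three_dp <- fmt(br_lengths, 3)   # "0.123" "1.500" "0.000"
six_dp   <- fmt(br_lengths, 6)   # "0.123456" "1.500000" "0.000400"

# Assemble a toy three-tip Newick string from the formatted lengths
newick <- sprintf("(A:%s,B:%s,C:%s);", three_dp[1], three_dp[2], three_dp[3])
# "(A:0.123,B:1.500,C:0.000);"
```

With three decimal places the shortest branch prints as 0.000, so a larger value may be preferable when very short branches matter in the exported string.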
A named list with two elements:
A phylogenetic tree of class "phylo" with
estimated branch lengths but no tip labels assigned.
labeled_tree: A phylogenetic tree of class "phylo" with
estimated branch lengths and tip labels derived from the row names
of mat_allele_count.
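Objects of class "phylo" can be inspected and exported with the ape package, which defines that class. The sketch below assumes ape is installed and uses a toy tree with made-up labels and branch lengths as a stand-in for a labeled tree. It does not call RD().

```r
library(ape)  # assumption: ape is installed

# Toy rooted tree (made-up labels and branch lengths)
toy_tree <- read.tree(text = "((PopA:0.12,PopB:0.10):0.05,PopC:0.20);")

tip_names <- toy_tree$tip.label     # tip labels, in the order stored in the tree
n_tips    <- Ntip(toy_tree)         # number of tips
br_len    <- toy_tree$edge.length   # one branch length per edge

# Export as a Newick string; 'digits' controls the numeric precision of branch lengths
newick_out <- write.tree(toy_tree, digits = 3)
```

The same accessors apply to any "phylo" object, so they can be used on a tree returned by RD() in place of toy_tree.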
Peng J, Rajeevan H, Kubatko L, RoyChoudhury A (2021). A fast likelihood approach for estimation of large phylogenies from continuous trait data. Molecular Phylogenetics and Evolution, 161, 107142. doi:10.1016/j.ympev.2021.107142
    # Load built-in example dataset (9 taxa x 2000 loci)
    data(Human_Allele_Count_Data)

    # Inspect dimensions: rows = taxa, columns = loci
    dim(Human_Allele_Count_Data)

    # Preview first few rows and columns
    Human_Allele_Count_Data[1:3, 1:5]

    # Check for missing data
    anyNA(Human_Allele_Count_Data)

    # NOTE: For CRAN testing, we use n.cores = 1 for compatibility.
    # In practice, users may set n.cores = NULL to use all available cores
    # and speed up computation.

    # Estimate phylogenetic tree using the Coalescent Model
    tree <- RD(mat_allele_count = Human_Allele_Count_Data, n.cores = 1)

    # Summarize the result
    print(tree)

    # Plot the labeled phylogenetic tree
    plot(tree$labeled_tree)