Title: | Build Graphs for Landscape Genetics Analysis |
---|---|
Description: | Build graphs for landscape genetics analysis. This set of functions can be used to import and convert spatial and genetic data initially in different formats, import landscape graphs created with 'GRAPHAB' software (Foltête et al., 2012) <doi:10.1016/j.envsoft.2012.07.002>, make diagnostic plots of isolation-by-distance relationships in order to choose how to build genetic graphs, create graphs with a large range of pruning methods, weight their links with several genetic distances, plot and analyse graphs, and compare them with other graphs. It uses functions from other packages such as 'adegenet' (Jombart, 2008) <doi:10.1093/bioinformatics/btn129> and 'igraph' (Csardi and Nepusz, 2006) <https://igraph.org/>. It also implements methods commonly used in landscape genetics to create graphs, described by Dyer and Nason (2004) <doi:10.1111/j.1365-294X.2004.02177.x> and Greenbaum and Fefferman (2017) <doi:10.1111/mec.14059>, and to analyse distance data (van Strien et al., 2015) <doi:10.1038/hdy.2014.62>. |
Authors: | Paul Savary [aut, cre] |
Maintainer: | Paul Savary |
License: | GPL-2 |
Version: | 1.8.0 |
Built: | 2025-02-15 04:11:53 UTC |
Source: | https://github.com/cran/graph4lg |
The function adds attributes to the nodes of a graph from
either an object of class data.frame
or from a shapefile layer.
The node IDs in the input objects must be the same as in the graph object.
add_nodes_attr( graph, input = "df", data, dir_path = NULL, layer = NULL, index = "Id", include = "all" )
graph |
A graph object of class igraph |
input |
A character string indicating the nature of the input data from which the attributes to add to the nodes are taken: "df" (data.frame) or "shp" (shapefile layer).
In both cases, the input attribute table or data.frame must have a column with exactly the same values as the node IDs. |
data |
(only if 'input = "df"') The name of the object of class data.frame containing the attributes to add to the nodes. |
dir_path |
(only if 'input = "shp"') The path (character string) to the directory containing the shapefile layer of type point whose attribute table contains the attributes to add to the nodes. |
layer |
(only if 'input = "shp"') The name (character string) of the shapefile layer of type point (without extension, ex.: "nodes" refers to "nodes.shp" layer) whose attribute table contains the attributes to add to the nodes. |
index |
The name (character string) of the column with the node names in the input data (column of the attribute table or of the data.frame). |
include |
A character string (vector) indicating which columns of the input data will be added as node attributes. By default, 'include = "all"', i.e. every column of the input data is added. Alternatively, 'include' can be a vector with the names of the columns to add (e.g. c('x', 'y', 'pop_name')). |
The graph can be created with the function graphab_to_igraph by importing output from Graphab projects. Values of the metrics computed at the node level with Graphab can then be added to such a graph with this function.
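The matching step can be pictured with a minimal base-R sketch. It is not the package's implementation: the helper add_attrs_by_id() is hypothetical, and the plain vector node_names stands in for igraph::V(graph)$name. Each node receives the row of the input table whose index column equals its ID, so the table may be in any order.

```r
# Hypothetical helper (not part of graph4lg): attach columns of a data.frame
# to nodes by matching node IDs against the 'index' column.
add_attrs_by_id <- function(node_names, df, index = "Id", include = "all") {
  ids <- as.character(df[[index]])
  if (!all(node_names %in% ids)) {
    stop("Every node ID must appear in column '", index, "'.")
  }
  cols <- if (identical(include, "all")) setdiff(names(df), index) else include
  rows <- match(node_names, ids)  # row of df for each node, in node order
  out <- data.frame(name = node_names, stringsAsFactors = FALSE)
  for (col in cols) out[[col]] <- df[[col]][rows]
  out
}

nodes <- c("1", "2", "3")
df_nodes <- data.frame(Id = c("3", "1", "2"), Area = c(30, 10, 20))
add_attrs_by_id(nodes, df_nodes, include = "Area")
```

On an actual igraph object, a column obtained this way could then be stored with igraph::set_vertex_attr(graph, "Area", value = ...).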
A graph object of class igraph
P. Savary
data("data_tuto")
graph <- data_tuto[[3]]
df_nodes <- data.frame(Id = igraph::V(graph)$name, Area = runif(50, min = 10, max = 60))
graph <- add_nodes_attr(graph, data = df_nodes, input = "df", index = "Id", include = "Area")
The function computes modules from a graph by maximising modularity.
compute_graph_modul( graph, algo = "fast_greedy", node_inter = NULL, nb_modul = NULL )
graph |
An object of class igraph |
algo |
A character string indicating the algorithm used to create the modules with igraph.
|
node_inter |
(optional, default = NULL) A character string indicating whether the links of the graph are weighted by distances or by similarity indices. It is only used to compute the modularity index. It can be:
|
nb_modul |
(optional, default = NULL) A numeric or integer value indicating the number of modules in the graph. When this number is not specified, the optimal value is retained. |
A data.frame with the node names and the corresponding module ID.
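For reference, the modularity index that these algorithms maximise is Newman's Q = (1/2m) * sum over i, j of [A_ij - k_i * k_j / (2m)] * delta(c_i, c_j), where A is the adjacency (or weight) matrix, k_i the degree (or strength) of node i, m the number (or total weight) of links, and delta equals 1 when nodes i and j are in the same module. The base-R sketch below computes Q for a given partition; modularity_q() is a hypothetical helper. Larger weights count as stronger ties in Q, so distance weights would need converting to similarities first; this is our reading of why node_inter matters, not a documented detail.

```r
# Hypothetical helper: Newman modularity of a partition, from a symmetric
# adjacency (or similarity-weight) matrix A and a membership vector.
modularity_q <- function(A, membership) {
  k <- rowSums(A)                              # degrees (strengths if weighted)
  two_m <- sum(A)                              # each undirected link counted twice
  same <- outer(membership, membership, "==")  # delta(c_i, c_j)
  sum((A - outer(k, k) / two_m)[same]) / two_m
}

# Two triangles joined by a single link
A <- matrix(0, 6, 6)
links <- rbind(c(1, 2), c(2, 3), c(1, 3), c(4, 5), c(5, 6), c(4, 6), c(3, 4))
A[links] <- 1
A[links[, 2:1]] <- 1
modularity_q(A, c(1, 1, 1, 2, 2, 2))  # 5/14
```

On an igraph object, igraph::modularity(graph, membership, weights = ...) computes the same index.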
P. Savary
data("data_tuto")
mat_gen <- data_tuto[[1]]
graph <- gen_graph_thr(mat_w = mat_gen, mat_thr = mat_gen, thr = 0.8)
res_mod <- compute_graph_modul(graph = graph, algo = "fast_greedy", node_inter = "distance")
The function computes graph-theoretic metric values at the node level.
compute_node_metric( graph, metrics = c("deg", "close", "btw", "str", "siw", "miw"), weight = TRUE )
graph |
An object of class igraph |
metrics |
Character vector specifying the graph-theoretic metrics computed at the node level in the graphs. Graph-theoretic metrics can be:
By default, the vector |
weight |
Logical indicating whether the links are weighted during the calculation of the centrality indices betweenness and closeness (default: TRUE). |
A data.frame with the node names and the metrics computed.
P. Savary
data(data_ex_genind)
mat_gen <- mat_gen_dist(x = data_ex_genind, dist = "DPS")
graph <- gen_graph_thr(mat_w = mat_gen, mat_thr = mat_gen, thr = 0.8)
res_met <- compute_node_metric(graph)
The function fits a model to convert Euclidean distances into cost-distances, as implemented in Graphab software.
convert_cd( mat_euc, mat_ld, to_convert, method = "log-log", fig = TRUE, line_col = "black", pts_col = "#999999" )
mat_euc |
A symmetric |
mat_ld |
A symmetric |
to_convert |
A numeric value or numeric vector with Euclidean distances to convert into cost-distances. |
method |
A character string indicating the method used to fit the model.
|
fig |
Logical (default = TRUE) indicating whether a figure is plotted. |
line_col |
(if 'fig = TRUE') Character string indicating the color used to plot the line (default: "black"). It must be a hexadecimal color code or a color used by default in R. |
pts_col |
(if 'fig = TRUE') Character string indicating the color used to plot the points (default: "#999999"). It must be a hexadecimal color code or a color used by default in R. |
IDs in 'mat_euc' and 'mat_ld' must be the same and refer to the same sampling sites or populations, and both matrices must be ordered in the same way.
The matrix of Euclidean distances 'mat_euc' can be computed using the function mat_geo_dist.
The matrix of landscape distances 'mat_ld' can be computed using the function mat_cost_dist.
Before the log calculation, 0 distance values are converted into 1, so that they are 0 after this calculation.
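A "log-log" fit can be sketched in base R as a linear regression of log(cost-distance) on log(Euclidean distance) over the pairs of the upper triangle, back-transformed to predict cost-distances. This is an illustration under that assumption, not the package's or Graphab's code; convert_loglog() is a hypothetical name. The 0-to-1 replacement mirrors the note above.

```r
# Hypothetical sketch: fit log(cd) = a + b * log(euc) on pairwise values,
# then predict cost-distances for new Euclidean distances.
convert_loglog <- function(mat_euc, mat_ld, to_convert) {
  keep <- upper.tri(mat_euc)            # each pair once, no diagonal
  euc <- mat_euc[keep]
  ld <- mat_ld[keep]
  euc[euc == 0] <- 1                    # log(1) = 0
  ld[ld == 0] <- 1
  fit <- lm(log(ld) ~ log(euc))
  a <- unname(coef(fit)[1])
  b <- unname(coef(fit)[2])
  list(converted = exp(a + b * log(to_convert)),
       param = c(a = a, b = b),
       r2 = cor(log(euc), log(ld))^2)   # R2 of a simple linear regression
}

# Toy data where cost-distance = 3 * euclidean^1.5 exactly
pts <- c(0, 1, 3, 7, 12)
mat_euc <- as.matrix(dist(pts)) * 100
mat_ld <- 3 * mat_euc^1.5
convert_loglog(mat_euc, mat_ld, to_convert = c(100, 400))$converted
# approximately 3000 and 24000
```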
A list of outputs (converted values, estimated parameters, R2) and, optionally, a ggplot2 object to plot
P. Savary
Foltête J, Clauzel C, Vuidel G (2012). “A software tool dedicated to the modelling of landscape networks.” Environmental Modelling & Software, 38, 316–327.
data("data_tuto")
mat_ld <- data_tuto[[2]][1:10, 1:10] * 1000
mat_euc <- data_tuto[[1]][1:10, 1:10] * 50000
to_convert <- c(30000, 40000)
res <- convert_cd(mat_euc = mat_euc, mat_ld = mat_ld, to_convert = to_convert, fig = FALSE)
Genetic dataset from a genetic simulation with CDPOP: 200 individuals, 10 populations, 20 microsatellite loci (3-digit coding), 100 generations simulated
data_ex_genind
An object of type 'genind'
The simulation was run with CDPOP for 100 generations. Dispersal was possible between the 10 populations. Its probability depended on the cost distance between populations, calculated on a simulated resistance surface (raster). Mutations were not possible. There were initially 600 alleles in total (many disappeared because of drift). Population size stayed constant, with a sex ratio of 1. Generations did not overlap. This simulation includes some stochasticity, and these data result from a single simulation run.
Landguth EL, Cushman SA (2010). “CDPOP: a spatially explicit cost distance population genetics program.” Molecular Ecology Resources, 10(1), 156–161.
data("data_ex_genind")
length(unique(data_ex_genind@pop))
Genetic dataset from a genetic simulation with CDPOP: 200 individuals, 10 populations, 20 microsatellite loci (3-digit coding), 100 generations simulated
data_ex_gstud
A 'data.frame' with columns:
Individual ID
Population name
20 loci columns with microsatellite data with 3-digit coding, alleles separated by ":", and blank missing data (class 'locus' from gstudio)
data("data_ex_gstud")
str(data_ex_gstud)
length(unique(data_ex_gstud$POP))
Genetic dataset from a genetic simulation with CDPOP: 200 individuals, 10 populations, 20 microsatellite loci (3-digit coding), 100 generations simulated
data_ex_loci
An object of class 'loci' and 'data.frame' with the columns:
Population name
20 loci columns with microsatellite data with 3-digit coding, alleles separated by "/", and missing data noted "NA/NA"
Row names correspond to individual IDs
data("data_ex_loci")
length(unique(data_ex_loci$population))
Genetic dataset from a genetic simulation with CDPOP: 1500 individuals, 50 populations, 20 microsatellite loci (3-digit coding), 50 generations simulated
data_simul_genind
An object of type 'genind'
The simulation was run with CDPOP for 50 generations. Dispersal was possible between the 50 populations. Its probability depended on the cost distance between populations, calculated on a simulated resistance surface (raster). Mutations were not possible. There were initially 600 alleles in total (many disappeared because of drift). Population size stayed constant, with a sex ratio of 1. Generations did not overlap. This simulation includes some stochasticity, and these data result from a single simulation run.
Landguth EL, Cushman SA (2010). “CDPOP: a spatially explicit cost distance population genetics program.” Molecular Ecology Resources, 10(1), 156–161.
data("data_simul_genind")
length(unique(data_simul_genind@pop))
Data used to generate the vignette
data_tuto
Several outputs or inputs to show how the package works in a list
Genetic distance matrix example
Second genetic distance matrix example
Genetic independence graph example
Output of the function 'dist_max_corr'
Landscape graph example
Landscape distance matrix example
data("data_tuto")
mat_dps <- data_tuto[[1]]
str(mat_dps)
The function converts an edge-list data.frame into a symmetric pairwise matrix
df_to_pw_mat(data, from, to, value)
data |
An object of class |
from |
A character string indicating the name of the column with the ID of the origins |
to |
A character string indicating the name of the column with the ID of the arrivals |
value |
A character string indicating the name of the column with the values corresponding to each pair |
The returned matrix is symmetric. Do not provide a data.frame with different values for the same pair (for example, one value for the pair 1-2 and another one for the pair 2-1). Ideally, for a complete matrix, data should have n(n-1)/2 rows if values are computed between n objects.
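The conversion amounts to writing each value at both [from, to] and [to, from]. Below is a minimal base-R sketch with a hypothetical edges_to_pw_mat() helper; filling absent pairs and the diagonal with 0 is an assumption of the sketch. It also shows why conflicting rows are a problem: if the pairs 1-2 and 2-1 carried different values, the mirrored writes would leave the matrix asymmetric.

```r
# Hypothetical helper: symmetric pairwise matrix from an edge list.
edges_to_pw_mat <- function(data, from, to, value) {
  fr <- as.character(data[[from]])
  to_id <- as.character(data[[to]])
  ids <- sort(unique(c(fr, to_id)))
  mat <- matrix(0, length(ids), length(ids), dimnames = list(ids, ids))
  i <- match(fr, ids)
  j <- match(to_id, ids)
  mat[cbind(i, j)] <- data[[value]]
  mat[cbind(j, i)] <- data[[value]]   # mirror every value
  mat
}

# n = 3 objects, so n(n-1)/2 = 3 rows give a complete matrix
df <- data.frame(X1 = c("A", "A", "B"), X2 = c("B", "C", "C"), w = c(1.5, 2, 4))
edges_to_pw_mat(df, from = "X1", to = "X2", value = "w")
```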
A pairwise matrix
P. Savary
data(pts_pop_simul)
suppressWarnings(mat_geo <- mat_geo_dist(pts_pop_simul, ID = "ID", x = "x", y = "y"))
g <- gen_graph_topo(mat_w = mat_geo, mat_topo = mat_geo, topo = "comp")
df <- data.frame(igraph::as_edgelist(g))
df$w <- igraph::E(g)$weight
df_to_pw_mat(df, from = "X1", to = "X2", value = "w")
The function computes the distance at which the correlation between genetic distance and landscape distance is maximal, using a method similar to that employed by van Strien et al. (2015). Distance threshold values are tested iteratively. For each value, all the population pairs separated by a landscape distance larger than the threshold are removed before the Mantel correlation coefficient between genetic distance and landscape distance is computed. The distance threshold at which the correlation is the strongest is then identified. A figure showing how the correlation coefficient changes as the landscape distance threshold increases can be plotted.
dist_max_corr( mat_gd, mat_ld, interv, from = NULL, to = NULL, fig = TRUE, thr_gd = NULL, line_col = "black", pts_col = "#999999" )
mat_gd |
A symmetric |
mat_ld |
A symmetric |
interv |
A numeric or integer value indicating the interval between the different distance thresholds for which the correlation coefficients are computed. |
from |
(optional) The minimum distance threshold value at which the correlation coefficient is computed. |
to |
(optional) The maximum distance threshold value at which the correlation coefficient is computed. |
fig |
Logical (default = TRUE) indicating whether a figure is plotted. |
thr_gd |
(optional) A numeric or integer value used to remove genetic distance values from the data before the calculation. All genetic distance values above 'thr_gd' are removed from the data. This parameter is especially useful when there are outliers. |
line_col |
(optional, if fig = TRUE) A character string indicating the color used to plot the line (default: "black"). It must be a hexadecimal color code or a color used by default in R. |
pts_col |
(optional, if fig = TRUE) A character string indicating the color used to plot the points (default: "#999999"). It must be a hexadecimal color code or a color used by default in R. |
IDs in 'mat_gd' and 'mat_ld' must be the same and refer to the same
sampling sites or populations, and both matrices must be ordered
in the same way.
The correlation coefficient computed between genetic distance and landscape distance is a Mantel correlation coefficient. If there are fewer than 50 pairwise values, the correlation is not computed, as in van Strien et al. (2015). Such a method can be criticised from a strict statistical point of view, because correlation coefficients computed from samples of different sizes are compared.
The matrix of genetic distances 'mat_gd' can be computed using mat_gen_dist.
The matrix of landscape distances 'mat_ld' can be computed using mat_geo_dist when the landscape distance needed is a Euclidean geographical distance.
Mantel correlation coefficients are computed using the function mantel.
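The threshold search can be sketched in base R. The sketch computes only Pearson's r between the unfolded upper triangles of the two matrices, which is the statistic a Mantel test reports; the permutation test itself is omitted. corr_by_threshold() is a hypothetical name, and its output list only loosely mirrors the return value of dist_max_corr. Thresholds leaving fewer than 50 pairs yield NA, following the rule stated above.

```r
# Hypothetical sketch of a threshold scan on two distance matrices.
corr_by_threshold <- function(mat_gd, mat_ld, thresholds, min_pairs = 50) {
  keep <- upper.tri(mat_gd)            # each population pair once
  gd <- mat_gd[keep]
  ld <- mat_ld[keep]
  r <- vapply(thresholds, function(thr) {
    sel <- ld <= thr                   # drop pairs farther apart than thr
    if (sum(sel) < min_pairs) return(NA_real_)
    cor(gd[sel], ld[sel])
  }, numeric(1))
  # which.max() ignores NA values
  list(dist_max = thresholds[which.max(r)], vec_r = r, vec_thr = thresholds)
}

# Toy data: 20 sites on a line; genetic distance rises with landscape
# distance up to 5 units and then plateaus
mat_ld <- as.matrix(dist(0:19))
mat_gd <- pmin(mat_ld, 5)
corr_by_threshold(mat_gd, mat_ld, thresholds = c(2, 5, 10, 19))$dist_max  # 5
```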
A list of objects:
The distance at which the correlation is the highest.
The vector of correlation coefficients at the different distance thresholds
The vector of the different distance thresholds
A ggplot2 object to plot
P. Savary
Van Strien MJ, Holderegger R, Van Heck HJ (2015). “Isolation-by-distance in landscapes: considerations for landscape genetics.” Heredity, 114(1), 27.
data("data_tuto")
mat_gen <- data_tuto[[1]]
mat_dist <- data_tuto[[2]] * 1000
res_dmc <- dist_max_corr(mat_gd = mat_gen, mat_ld = mat_dist, from = 32000, to = 42000, interv = 5000, fig = FALSE)
The function prunes a graph by removing the links with the largest weights until the graph breaks into two components. The returned graph is the last graph with only one component.
g_percol(x, val_step = 20)
x |
A symmetric |
val_step |
The number of classes to create to search for the threshold value without testing all the possibilities. By default, 'val_step = 20'. |
A graph object of type igraph
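The pruning rule can be sketched with an exhaustive search in base R: distinct link weights are tried from the largest down, links heavier than the current value are removed, and the search stops just before the graph splits. The helpers is_connected() and percol_threshold() are hypothetical; unlike g_percol, the sketch does not group values into 'val_step' classes, and it assumes every off-diagonal weight is positive, so that the starting graph is complete. The threshold found this way equals the largest link weight in a minimum spanning tree of the complete graph.

```r
# Breadth-first search from node 1 on a logical adjacency matrix
is_connected <- function(adj) {
  n <- nrow(adj)
  seen <- c(TRUE, rep(FALSE, n - 1))
  frontier <- 1
  while (length(frontier) > 0) {
    reach <- colSums(adj[frontier, , drop = FALSE]) > 0
    frontier <- which(reach & !seen)
    seen[frontier] <- TRUE
  }
  all(seen)
}

# Hypothetical sketch: lowest weight threshold keeping one component
percol_threshold <- function(x) {
  w <- sort(unique(x[upper.tri(x)]), decreasing = TRUE)
  last_ok <- w[1]                      # complete graph: every link kept
  for (thr in w[-1]) {
    adj <- x <= thr                    # remove links heavier than thr
    diag(adj) <- FALSE
    if (!is_connected(adj)) break
    last_ok <- thr
  }
  adj <- x <= last_ok
  diag(adj) <- FALSE
  list(threshold = last_ok, adj = adj)
}

x <- as.matrix(dist(c(0, 1, 2, 10, 11)))
percol_threshold(x)$threshold  # 8
```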
P. Savary
data(data_ex_genind)
suppressWarnings(mat_w <- graph4lg::mat_geo_dist(data = pts_pop_ex, ID = "ID", x = "x", y = "y"))
g_percol(x = mat_w)
The function creates genetic graphs from genetic data by applying the conditional independence principle. Populations whose allelic frequencies covary significantly once the covariance with the other populations has been taken into account are linked in the graphs.
gen_graph_indep( x, dist = "basic", cov = "sq", pcor = "magwene", alpha = 0.05, test = "EED", adj = "none", output = "igraph" )
x |
An object of class |
dist |
A character string indicating the method used to compute the multilocus genetic distance between populations
|
cov |
A character string indicating the formula used to compute the covariance matrix from the distance matrix
|
pcor |
A character string indicating the way the partial correlation matrix is computed from the covariance matrix.
|
alpha |
A numeric value corresponding to the statistical tolerance threshold used to test the difference from 0 of the partial correlation coefficients. By default, 'alpha=0.05'. |
test |
A character string indicating the method used to test the significance of the partial correlation coefficients.
|
adj |
A character string indicating the method used to adjust the p-values before assessing their significance
|
output |
A character string indicating the matrices included in the output list.
|
Many parameters can be varied, such as the genetic distance used, the formula used to compute the covariance, the statistical tolerance threshold and the p-value adjustment.
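Partial correlations are commonly obtained from a covariance matrix through its inverse, the precision matrix P, as pcor_ij = -P_ij / sqrt(P_ii * P_jj) (see, e.g., Whittaker, 2009). The base-R sketch below applies this textbook relation; whether a given 'pcor' option follows exactly this route is not stated here, so partial_cor() should be read as a hypothetical illustration. In the toy example, variables 1 and 3 are related only through variable 2, so their partial correlation is zero and they would not be linked in a conditional independence graph.

```r
# Hypothetical helper: partial correlation matrix from a covariance matrix.
partial_cor <- function(S) {
  P <- solve(S)                          # precision (inverse covariance) matrix
  pc <- -P / sqrt(outer(diag(P), diag(P)))
  diag(pc) <- 1
  pc
}

# Correlations of a chain 1 - 2 - 3: r12 = 0.5, r23 = 0.5, r13 = 0.5 * 0.5
S <- matrix(c(1,    0.5, 0.25,
              0.5,  1,   0.5,
              0.25, 0.5, 1), nrow = 3, byrow = TRUE)
round(partial_cor(S), 3)
```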
A list of objects of class matrix, an object of class matrix or a graph object of class igraph
P. Savary
Dyer RJ, Nason JD (2004). “Population graphs: the graph theoretic shape of genetic structure.” Molecular Ecology, 13(7), 1713–1727.
Benjamini Y, Hochberg Y (1995). “Controlling the false discovery rate: a practical and powerful approach to multiple testing.” Journal of the Royal Statistical Society. Series B (Methodological), 289–300.
Bowcock AM, Ruiz-Linares A, Tomfohrde J, Minch E, Kidd JR, Cavalli-Sforza LL (1994). “High resolution of human evolutionary trees with polymorphic microsatellites.” Nature, 368(6470), 455–457.
Everitt B, Hothorn T (2011). An introduction to applied multivariate analysis with R. Springer.
Excoffier L, Smouse PE, Quattro JM (1992). “Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data.” Genetics, 131(2), 479–491.
Fortuna MA, Albaladejo RG, Fernández L, Aparicio A, Bascompte J (2009). “Networks of spatial genetic variation across species.” Proceedings of the National Academy of Sciences, 106(45), 19044–19049.
Holm S (1979). “A simple sequentially rejective multiple test procedure.” Scandinavian Journal of Statistics, 65–70.
Magwene PM (2001). “New tools for studying integration and modularity.” Evolution, 55(9), 1734–1745.
Wermuth N, Scheidt E (1977). “Algorithm AS 105: fitting a covariance selection model to a matrix.” Journal of the Royal Statistical Society. Series C (Applied Statistics), 26(1), 88–92.
Whittaker J (2009). Graphical models in applied multivariate statistics. Wiley Publishing.
data(data_ex_genind)
dist_graph_test <- gen_graph_indep(x = data_ex_genind, dist = "basic", cov = "sq", pcor = "magwene", alpha = 0.05, test = "EED", adj = "none", output = "igraph")
The function constructs a genetic graph whose link weights are larger or lower than a specific threshold.
gen_graph_thr(mat_w, mat_thr = NULL, thr, mode = "larger")
mat_w |
A symmetric (pairwise) |
mat_thr |
(optional) A symmetric (pairwise) distance |
thr |
The threshold value (integer or numeric), logically between min(mat_thr) and max(mat_thr) |
mode |
|
If 'mat_thr' is not defined, 'mat_w' is used for the pruning. Matrices 'mat_w' and 'mat_thr' must have the same dimensions and the same row and column names. Values in the 'mat_thr' matrix must be positive. Negative values from 'mat_w' are transformed into zeros. The function works only for undirected graphs. If dist objects are specified, it is assumed that colnames and row.names of mat_w and mat_thr refer to the same populations/locations.
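Threshold pruning itself reduces to an element-wise comparison. In the base-R sketch below, the hypothetical helper prune_by_threshold() keeps a link when its value in mat_thr lies on the chosen side of thr and gives it the corresponding weight from mat_w. The keep = "below"/"above" switch is an invention of the sketch: the exact meaning of the package's mode argument is not reproduced here. Negative weights are set to zero, following the note above.

```r
# Hypothetical helper: weighted adjacency matrix after threshold pruning.
prune_by_threshold <- function(mat_w, mat_thr = mat_w, thr,
                               keep = c("below", "above")) {
  force(mat_thr)                           # evaluate default before mat_w changes
  keep <- match.arg(keep)
  stopifnot(identical(dim(mat_w), dim(mat_thr)), all(mat_thr >= 0))
  mat_w[mat_w < 0] <- 0                    # negative weights become zeros
  linked <- if (keep == "below") mat_thr < thr else mat_thr > thr
  diag(linked) <- FALSE                    # no self-loops
  ifelse(linked, mat_w, 0)                 # 0 means "no link"
}

mat_thr <- as.matrix(dist(c(0, 1, 5)))     # pairwise distances 1, 5 and 4
mat_w <- matrix(c(0,   0.2, 0.9,
                  0.2, 0,   0.4,
                  0.9, 0.4, 0), nrow = 3)
prune_by_threshold(mat_w, mat_thr, thr = 4.5)
```

The resulting matrix could then be turned into a graph with igraph::graph_from_adjacency_matrix(adj, mode = "undirected", weighted = TRUE).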
A graph object of class igraph
P. Savary
mat_w <- mat_gen_dist(x = data_ex_genind, dist = 'DPS')
suppressWarnings(mat_thr <- mat_geo_dist(pts_pop_ex, ID = "ID", x = "x", y = "y"))
mat_thr <- mat_thr[row.names(mat_w), colnames(mat_w)]
graph <- gen_graph_thr(mat_w, mat_thr, thr = 6000, mode = "larger")
The function constructs a genetic graph with a specific topology from genetic and/or geographical distance matrices
gen_graph_topo(mat_w, mat_topo = NULL, topo = "gabriel", k = NULL)
mat_w |
A symmetric (pairwise) |
mat_topo |
(optional) A symmetric (pairwise) distance |
topo |
Which topology does the created graph have?
|
k |
(if 'topo = "knn"') An integer indicating the number of nearest neighbors considered to create the k-nearest neighbor graph. k must be lower than the total number of nodes minus 1. |
If 'mat_topo' is not defined, 'mat_w' is used for the pruning. Matrices 'mat_w' and 'mat_topo' must have the same dimensions and the same row and column names. Values in the 'mat_topo' matrix must be positive. Negative values from 'mat_w' are transformed into zeros. The function works only for undirected graphs. Note that the topology 'knn' works best when 'mat_topo' contains distance values from a continuous value range, thereby avoiding equal distances between a node and the others. Otherwise, there can be more than k nodes located at distances among the k smallest distances. If dist objects are specified, it is assumed that colnames and row.names of mat_w and mat_topo refer to the same populations/locations.
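A k-nearest-neighbor topology can be sketched in base R as follows. The hypothetical helper knn_adjacency() breaks ties at the k-th distance by position (through order()), which is exactly the situation the note above warns about, and it links two nodes when either counts the other among its k nearest neighbors. That "or" symmetrisation is a choice made for the sketch, not a documented package behaviour.

```r
# Hypothetical helper: symmetric k-nearest-neighbor adjacency matrix.
knn_adjacency <- function(d, k) {
  n <- nrow(d)
  stopifnot(k >= 1, k < n - 1)
  adj <- matrix(FALSE, n, n, dimnames = dimnames(d))
  for (i in seq_len(n)) {
    others <- setdiff(seq_len(n), i)
    nearest <- others[order(d[i, others])][seq_len(k)]  # ties kept in index order
    adj[i, nearest] <- TRUE
  }
  adj | t(adj)                              # link if either node selects the other
}

d <- as.matrix(dist(c(0, 1, 3, 7)))
knn_adjacency(d, k = 1)                     # a chain 1-2, 2-3, 3-4
```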
A graph object of class igraph
P. Savary
Gabriel KR, Sokal RR (1969). “A new statistical approach to geographic variation analysis.” Systematic zoology, 18(3), 259–278.
mat_w <- mat_gen_dist(x = data_ex_genind, dist = 'DPS')
suppressWarnings(mat_topo <- mat_geo_dist(pts_pop_ex, ID = "ID", x = "x", y = "y"))
mat_topo <- mat_topo[row.names(mat_w), colnames(mat_w)]
graph <- gen_graph_topo(mat_w, mat_topo, topo = "mst")
The function converts a text file in the format used by GENEPOP software into a genind object
genepop_to_genind(path, n.loci, pop_names = NULL, allele.digit.coding = 3)
path |
A character string with the path leading to the GENEPOP file in format .txt, or alternatively the name of this file in the working directory. |
n.loci |
The number of loci in the GENEPOP file (integer or numeric). |
pop_names |
(optional) Populations' names in the same order as in the GENEPOP file. Vector object (class character) of the same length as the number of populations. Without this parameter, populations are numbered from 1 to the number of populations. |
allele.digit.coding |
Number indicating whether alleles are coded with 3 (default) or 2 digits. |
This function uses functions from the pegas package. The GENEPOP file can include microsatellite loci or SNPs, with allele names of length 2 or 3 (coded as 01, 02, 03 or 04 for SNPs). The loci line(s) must not start with a space.
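In a GENEPOP file, each diploid genotype is written as two concatenated allele codes, e.g. 120124 with 3-digit coding or 0102 with 2-digit coding, which is what allele.digit.coding refers to. The hypothetical helper split_genepop_genotype() below splits such strings in base R. Treating a genotype as missing when either allele code is 0 is an assumption of this sketch.

```r
# Hypothetical helper: split GENEPOP genotype strings into two alleles.
split_genepop_genotype <- function(geno, digits = 3) {
  stopifnot(digits %in% c(2, 3), all(nchar(geno) == 2 * digits))
  a1 <- substr(geno, 1, digits)
  a2 <- substr(geno, digits + 1, 2 * digits)
  missing <- as.integer(a1) == 0 | as.integer(a2) == 0   # e.g. "000000"
  a1[missing] <- NA
  a2[missing] <- NA
  data.frame(allele1 = a1, allele2 = a2, stringsAsFactors = FALSE)
}

split_genepop_genotype(c("120124", "000000", "118118"))
split_genepop_genotype(c("0102", "0404"), digits = 2)
```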
An object of type genind.
P. Savary
Raymond M (1995). “GENEPOP: Population genetics software for exact tests and ecumenism. Vers. 1.2.” Journal of Heredity, 86, 248–249.
For more details about GENEPOP file formatting:
https://genepop.curtin.edu.au:443/help_input.html
For the opposite conversion, see genind_to_genepop.
The output object can be used to compute a pairwise FST matrix with mat_pw_fst.
path_in <- system.file('extdata', 'gpop_simul_10_g100_04_20.txt', package = 'graph4lg')
file_n <- file.path(tempdir(), "gpop_simul_10_g100_04_20.txt")
file.copy(path_in, file_n, overwrite = TRUE)
genepop_to_genind(path = file_n, n.loci = 20, pop_names = as.character(order(as.character(1:10))))
file.remove(file_n)
The function converts an object of class genind into a GENEPOP file. This makes it possible to use the GENEPOP software and its R version (the genepop package), as well as some functions from other packages (differentiation tests, F-statistics calculations, HWE tests, ...). It is designed for genind objects containing diploid microsatellite data with alleles coded with 2 or 3 digits, or SNPs.
genind_to_genepop(x, output = "data.frame")
x |
An object of class |
output |
A character string indicating the option used to select what the function will return:
|
An object of type data.frame if output = "data.frame".
If output is the path and/or the file name of a text file, nothing is returned in the R environment, but a text file with the specified file name is created, either in the current working directory or in the specified folder.
Do not confuse this function with genind2genpop from adegenet. The latter converts an object of class genind into an object of class genpop, whereas genind_to_genepop converts an object of class genind into a text file compatible with GENEPOP software (Rousset, 2008).
This function can handle genetic data with different allele coding: 2 or 3 digit coding for microsatellite data or 2 digit coding for SNPs (A,C,T,G become respectively 01, 02, 03, 04).
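The SNP recoding described above (A, C, T, G becoming 01, 02, 03, 04) amounts to a lookup table. The base-R sketch below builds 2-digit GENEPOP genotype strings from two allele columns; the helper name is hypothetical, and writing unknown bases as the missing genotype "0000" is an assumption of the sketch.

```r
# Hypothetical helper: SNP alleles to 2-digit GENEPOP genotype strings.
snp_genotype_to_genepop <- function(allele1, allele2) {
  # Coding stated in the documentation: A = 01, C = 02, T = 03, G = 04
  codes <- c(A = "01", C = "02", T = "03", G = "04")
  a1 <- unname(codes[toupper(allele1)])   # unknown bases give NA
  a2 <- unname(codes[toupper(allele2)])
  ifelse(is.na(a1) | is.na(a2), "0000", paste0(a1, a2))
}

snp_genotype_to_genepop(c("A", "c", "G"), c("G", "T", "N"))
# "0104" "0203" "0000"
```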
When individuals in the input data are not ordered by population, individuals from the same population can be separated by individuals from other populations. This can be problematic when pairwise distance matrices are calculated later. Therefore, in such a case, individuals are ordered by population and populations are sorted in alphabetical order.
P. Savary
Raymond M (1995). “GENEPOP: Population genetics software for exact tests and ecumenism. Vers. 1.2.” Journal of Heredity, 86, 248–249.
For more details about GENEPOP file formatting:
https://genepop.curtin.edu.au:443/help_input.html
For the opposite conversion, see genepop_to_genind.
The output file can be used to compute a pairwise FST matrix with mat_pw_fst.
data(data_ex_genind)
x <- data_ex_genind
df_genepop <- suppressWarnings(genind_to_genepop(x, output = "data.frame"))
The function checks whether Graphab (.jar) is present on the user's machine and downloads it if absent. It also checks that Java is installed on the user's machine.
get_graphab(res = TRUE, return = FALSE)
res |
Logical indicating whether a message says if Graphab has been downloaded or not. |
return |
Logical indicating whether the function returns a 1 or a 0 to indicate if Graphab has been downloaded or not. |
If the download does not work, you can create a directory named 'graph4lg_jar' in the directory rappdirs::user_data_dir() and copy into it the Graphab software downloaded from
https://thema.univ-fcomte.fr/productions/download.php?name=graphab&version=2.8&username=Graph4lg&institution=R
If res = TRUE, the function displays a message indicating to users what has been done. If return = TRUE, it returns a 0 if Graphab is already on the machine and a 1 if it has been downloaded.
P. Savary
## Not run:
get_graphab()
## End(Not run)
The function gets a linkset computed in the Graphab project
get_graphab_linkset(proj_name, linkset, proj_path = NULL)
proj_name |
A character string indicating the Graphab project name. The project name is also the name of the project directory in which the file proj_name.xml is. |
linkset |
A character string indicating the name of the link set
whose properties are imported. The link set has been created with Graphab
or using |
proj_path |
(optional) A character string indicating the path to the
directory that contains the project directory. It should be used when the
project directory is not in the current working directory. Default is NULL.
When 'proj_path = NULL', the project directory is equal to |
See more information in the Graphab 2.8 manual:
https://sourcesup.renater.fr/www/graphab/download/manual-2.8-en.pdf
This function works only if the get_graphab function works correctly.
A data.frame with the link properties (from, to, cost-distance, Euclidean distance)
P. Savary
## Not run:
get_graphab_linkset(proj_name = "grphb_ex", linkset = "lkst1")
## End(Not run)
The function extracts the cost values associated with a linkset in a Graphab project
get_graphab_linkset_cost(proj_name, linkset, proj_path = NULL)
proj_name |
A character string indicating the Graphab project name. The project name is also the name of the project directory in which the file proj_name.xml will be created. |
linkset |
(optional, default=NULL) A character string indicating the
name of the link set used to create the graph. Link sets can be created
with |
proj_path |
(optional) A character string indicating the path to the
directory that contains the project directory. It should be used when the
project directory is not in the current working directory. Default is NULL.
When 'proj_path = NULL', the project directory is equal to |
The function returns a data.frame with the cost values corresponding to every raster code value.
P. Savary
## Not run:
proj_name <- "grphb_ex"
get_graphab_linkset_cost(proj_name = proj_name, linkset = "lkst1")
## End(Not run)
The function gets the metrics computed at the node-level in the Graphab project
get_graphab_metric(proj_name, proj_path = NULL)
proj_name |
A character string indicating the Graphab project name. The project name is also the name of the project directory in which the file proj_name.xml is. |
proj_path |
(optional) A character string indicating the path to the
directory that contains the project directory. It should be used when the
project directory is not in the current working directory. Default is NULL.
When 'proj_path = NULL', the project directory is equal to |
The imported metrics describe the patches and have been computed from the different graphs created in the Graphab project. See more information in Graphab 2.8 manual: https://sourcesup.renater.fr/www/graphab/download/manual-2.8-en.pdf
A data.frame with metrics computed at the patch level.
P. Savary
## Not run:
get_graphab_metric(proj_name = "grphb_ex")
## End(Not run)
The function extracts unique raster codes from a Graphab project
get_graphab_raster_codes(proj_name, mode = "all", proj_path = NULL)
proj_name |
A character string indicating the Graphab project name. The project name is also the name of the project directory in which the file proj_name.xml is. |
mode |
A character string equal to either 'all' (default) or 'habitat' indicating whether the returned codes are all the codes of the source raster used for creating the project or only the codes corresponding to the habitat patches. |
proj_path |
(optional) A character string indicating the path to the
directory that contains the project directory. It should be used when the
project directory is not in the current working directory. Default is NULL.
When 'proj_path = NULL', the project directory is equal to |
The function returns a vector of integer values corresponding to the source raster codes (all the codes or only those corresponding to habitat patches).
P. Savary
## Not run:
proj_name <- "grphb_ex"
get_graphab_raster_codes(proj_name = proj_name, mode = "all")
## End(Not run)
The function computes the Adjusted Rand Index (ARI) to compare the partitions of two graphs into modules or, more generally, two classifications into clusters. Both graphs must have the same number of nodes, but not necessarily the same number of links. They must also have the same node names, in the same order.
graph_modul_compar( x, y, mode = "graph", nb_modul = NULL, algo = "fast_greedy", node_inter = "distance", data = NULL )
x |
The first graph object
|
y |
The second graph object
Same classes possible as for |
mode |
A character string indicating whether x and y are igraph objects,
vectors or columns from a data.frame. |
nb_modul |
(if x and y are igraph objects) A numeric or integer value or a numeric vector with 2 elements indicating the number of modules to create in both graphs.
|
algo |
(if x and y are igraph objects) A character string indicating the algorithm used to create the modules with igraph.
|
node_inter |
(optional, if x and y are igraph objects, default is 'distance') A character string indicating whether the links of the graph are weighted by distances or by similarity indices. It is only used to compute the modularity index. It can be:
Two different weightings can be used to create the modules of the two graphs.
|
data |
(if x and y are columns from a data.frame) An object of class data.frame with at least two columns and as many rows as there are nodes in the graphs compared. The columns indicate the modules of each node in 2 different classifications. |
This index takes values between -1 and 1. It measures how often
pairs of nodes pertaining to the same module in one graph also pertain to
the same module in the other graph.
Therefore, large values indicate that both partitions are similar.
The Rand Index can be defined as the frequency of agreement between two
classifications into discrete classes. It is the number of times a pair of
elements are classified into the same class or in two different classes
in both compared classifications, divided by the total number of possible
pairs of elements. The Rand Index is between 0 and 1 but its maximum value
depends on the number of elements. Thus, another 'adjusted' index was
created, the Adjusted Rand Index. Following Hubert and Arabie's formula,
the ARI is computed as follows:

$$\mathrm{ARI} = \frac{\text{Index} - \text{Expected index}}{\text{Maximum index} - \text{Expected index}}$$

where the values of Index, Expected index and Maximum index are computed
from a contingency table.
This function uses
adjustedRandIndex
from package mclust which
applies the Hubert and Arabie's formula for the ARI.
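The formula can be sketched in base R from the contingency table of two module membership vectors. This is a minimal illustration under the assumption of two equal-length label vectors. It is not the mclust code that graph4lg actually calls, and the helper name `ari_sketch` is hypothetical.

```r
# Hypothetical helper: Hubert & Arabie Adjusted Rand Index from two label vectors
ari_sketch <- function(a, b) {
  stopifnot(length(a) == length(b))
  comb2 <- function(n) n * (n - 1) / 2    # number of pairs among n elements
  tab <- table(a, b)                      # contingency table of the two partitions
  index <- sum(comb2(tab))                # pairs grouped together in both partitions
  sum_a <- sum(comb2(rowSums(tab)))       # pairs grouped together in partition a
  sum_b <- sum(comb2(colSums(tab)))       # pairs grouped together in partition b
  expected <- sum_a * sum_b / comb2(length(a))
  maximum <- (sum_a + sum_b) / 2
  (index - expected) / (maximum - expected)
}

# Made-up module memberships of six nodes in two graphs
modules_x <- c(1, 1, 1, 2, 2, 2)
modules_y <- c(1, 1, 2, 2, 3, 3)
ari_sketch(modules_x, modules_y)
```

Identical partitions give 1 whatever the module labels are. The sketch does not guard against the degenerate case in which the maximum index equals the expected index (for example, when all nodes fall in one module).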
This function works for undirected graphs only.
The value of the ARI
P. Savary
Dyer RJ, Nason JD (2004). “Population graphs: the graph theoretic shape of genetic structure.” Molecular ecology, 13(7), 1713–1727. Hubert L, Arabie P (1985). “Comparing partitions.” Journal of classification, 2(1), 193–218. Clauset A, Newman ME, Moore C (2004). “Finding community structure in very large networks.” Physical review E, 70(6). Blondel VD, Guillaume J, Lambiotte R, Lefebvre E (2008). “Fast unfolding of communities in large networks.” Journal of Statistical Mechanics - Theory and Experiment, 10. Brandes U, Delling D, Gaertler M, Gorke R, Hoefer M, Nikoloski Z, Wagner D (2008). “On modularity clustering.” IEEE transactions on knowledge and data engineering, 20(2), 172–188. Pons P, Latapy M (2006). “Computing communities in large networks using random walks.” J. Graph Algorithms Appl., 10(2), 191–218.
data(data_ex_genind)
data(pts_pop_ex)
mat_dist <- suppressWarnings(graph4lg::mat_geo_dist(data = pts_pop_ex,
                                                    ID = "ID", x = "x", y = "y"))
mat_dist <- mat_dist[order(as.character(row.names(mat_dist))),
                     order(as.character(colnames(mat_dist)))]
graph_obs <- gen_graph_thr(mat_w = mat_dist, mat_thr = mat_dist,
                           thr = 24000, mode = "larger")
mat_gen <- mat_gen_dist(x = data_ex_genind, dist = "DPS")
graph_pred <- gen_graph_topo(mat_w = mat_gen, mat_topo = mat_dist,
                             topo = "gabriel")
ARI <- graph_modul_compar(x = graph_obs, y = graph_pred)
The function computes a correlation coefficient between the values of a graph-theoretic metric computed at the node level in two graphs sharing the same nodes. It allows users to assess whether the connectivity properties of the nodes in one graph are similar to those of the same nodes in the other graph. Alternatively, the correlation is computed between the values of a graph-theoretic metric and the values of an attribute associated with the nodes of a graph.
graph_node_compar( x, y, metrics = c("siw", "siw"), method = "spearman", weight = TRUE, test = TRUE )
x |
An object of class |
y |
An object of class |
metrics |
Two-element character vector specifying the graph-theoretic metrics computed at the node-level in the graphs or the node attribute values to be correlated to these metrics. Graph-theoretic metrics can be:
Node attributes must have the same names as in the |
method |
A character string indicating which correlation coefficient
is to be computed ( |
weight |
Logical which indicates whether the links are weighted during
the calculation of the centrality indices betweenness and closeness.
(default: |
test |
Logical. Should significance testing be performed? (default = TRUE) |
The correlation coefficients between the metrics can be computed
in different ways, as initial assumptions (e.g. a linear relationship) are
rarely verified. Pearson's r, Spearman's rho and Kendall's tau can be
computed (with the function cor).
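The node-level comparison can be illustrated in base R. The sketch computes the same node metric in two graphs that share their nodes and correlates the two vectors with `cor()`. Degree is used only because it is simple. It is not necessarily one of the metrics offered by graph_node_compar. The helper `edges_to_adj` and the two edge lists are hypothetical.

```r
# Hypothetical helper: symmetric 0/1 adjacency matrix from an undirected edge list
edges_to_adj <- function(from, to, nodes) {
  adj <- matrix(0, length(nodes), length(nodes),
                dimnames = list(nodes, nodes))
  adj[cbind(from, to)] <- 1   # character matrix indexing uses the dimnames
  adj[cbind(to, from)] <- 1   # undirected graph: keep the matrix symmetric
  adj
}

nodes <- c("A", "B", "C", "D", "E")
# Two made-up graphs sharing the same nodes
adj_x <- edges_to_adj(c("A", "A", "B", "C", "D"), c("B", "C", "C", "D", "E"), nodes)
adj_y <- edges_to_adj(c("A", "A", "A", "B", "D"), c("B", "C", "D", "C", "E"), nodes)

# Node-level metric: degree (number of links of each node)
deg_x <- rowSums(adj_x)
deg_y <- rowSums(adj_y)

# Rank-based correlation, which does not assume a linear relationship
rho <- cor(deg_x, deg_y, method = "spearman")
rho
```

Spearman's rho is computed on ranks, and ties receive average ranks. This makes it a reasonable default when the relationship between two metrics is monotonic but not linear.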
When x is identical to y, the correlation is computed
between two metrics characterizing the nodes of the same graph.
A list summarizing the correlation analysis.
P. Savary
data(data_ex_genind)
data(pts_pop_ex)
mat_dist <- suppressWarnings(graph4lg::mat_geo_dist(data = pts_pop_ex,
                                                    ID = "ID", x = "x", y = "y"))
mat_dist <- mat_dist[order(as.character(row.names(mat_dist))),
                     order(as.character(colnames(mat_dist)))]
graph_obs <- gen_graph_thr(mat_w = mat_dist, mat_thr = mat_dist,
                           thr = 9500, mode = "larger")
mat_gen <- mat_gen_dist(x = data_ex_genind, dist = "DPS")
graph_pred <- gen_graph_topo(mat_w = mat_gen, mat_topo = mat_dist,
                             topo = "gabriel")
res_cor <- graph_node_compar(x = graph_obs, y = graph_pred,
                             metrics = c("siw", "siw"), method = "spearman",
                             test = TRUE, weight = TRUE)
The function constructs a graph with a minimum planar graph topology
graph_plan(crds, ID = NULL, x = NULL, y = NULL, weight = TRUE)
crds |
A
|
ID |
A character string indicating the name of the column
of |
x |
A character string indicating the name of the column
of |
y |
A character string indicating the name of the column
of |
weight |
A logical indicating whether the links of the graph are weighted by Euclidean distances (TRUE, default) or not (FALSE). When the graph links are not weighted by Euclidean distances, each link is given a weight of 1. |
A Delaunay triangulation is performed in order to obtain the planar graph.
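The Delaunay criterion behind the planar graph can be sketched in base R with a brute-force test. A triangle is kept when no other point lies strictly inside its circumcircle, and the sides of the kept triangles form the links. This O(n^4) sketch assumes a few points in general position (no four cocircular points). It is not the triangulation routine graph_plan relies on, and `delaunay_edges` is a hypothetical name.

```r
# Hypothetical helper: brute-force Delaunay triangulation (empty-circumcircle test)
delaunay_edges <- function(x, y, ids = as.character(seq_along(x))) {
  n <- length(x)
  stopifnot(n >= 3, length(y) == n, length(ids) == n)
  edges <- character(0)
  for (i in 1:(n - 2)) for (j in (i + 1):(n - 1)) for (k in (j + 1):n) {
    # Proportional to the signed area of triangle (i, j, k); zero means collinear
    d <- 2 * (x[i] * (y[j] - y[k]) + x[j] * (y[k] - y[i]) + x[k] * (y[i] - y[j]))
    if (abs(d) < 1e-12) next
    s <- c(x[i]^2 + y[i]^2, x[j]^2 + y[j]^2, x[k]^2 + y[k]^2)
    # Circumcentre (ux, uy) and squared circumradius r2
    ux <- (s[1] * (y[j] - y[k]) + s[2] * (y[k] - y[i]) + s[3] * (y[i] - y[j])) / d
    uy <- (s[1] * (x[k] - x[j]) + s[2] * (x[i] - x[k]) + s[3] * (x[j] - x[i])) / d
    r2 <- (x[i] - ux)^2 + (y[i] - uy)^2
    others <- setdiff(seq_len(n), c(i, j, k))
    inside <- (x[others] - ux)^2 + (y[others] - uy)^2 < r2 - 1e-9
    if (!any(inside)) {
      pairs <- combn(c(i, j, k), 2)   # the three sides of a Delaunay triangle
      edges <- c(edges, paste(ids[pairs[1, ]], ids[pairs[2, ]], sep = "-"))
    }
  }
  sort(unique(edges))
}

# Four made-up population locations forming a quadrilateral
e <- delaunay_edges(x = c(0, 4, 4, 0), y = c(0, 0, 3.5, 3),
                    ids = c("A", "B", "C", "D"))
e   # A-B, A-D, B-C, B-D, C-D: only one diagonal (B-D) is kept
```

Weighting these links by Euclidean distances, as with `weight = TRUE`, would only require computing the length of each retained pair.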
A planar graph of class igraph
P. Savary
data(pts_pop_ex)
g_plan <- graph_plan(crds = pts_pop_ex, ID = "ID", x = "x", y = "y")
The function compares two spatial graphs by plotting them and highlighting the topological similarities and differences between them. Both graphs should share the same nodes and cannot be directed graphs.
graph_plot_compar(x, y, crds)
x |
A graph object of class |
y |
A graph object of class |
crds |
A
|
The graphs x and y of class igraph must have node names (not
necessarily in the same order as the IDs in crds, given that a merge is
performed).
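Because nodes are matched by name, the topological comparison amounts to labelling each link as present in both graphs or in only one of them. The base R sketch below illustrates this with hypothetical link lists and a hypothetical `edge_key` helper. It is not the plotting code of graph_plot_compar.

```r
# Canonical key for an undirected link: node names sorted alphabetically
edge_key <- function(from, to) {
  paste(pmin(from, to), pmax(from, to), sep = "--")
}

# Made-up link lists of two graphs sharing nodes A-D
links_x <- edge_key(c("A", "B", "C"), c("B", "C", "D"))
links_y <- edge_key(c("B", "D", "A"), c("A", "C", "D"))

all_links <- union(links_x, links_y)
status <- ifelse(all_links %in% links_x & all_links %in% links_y, "both",
                 ifelse(all_links %in% links_x, "x only", "y only"))
data.frame(link = all_links, status = status)
```

Sorting the two node names makes B-A and A-B the same undirected link, which is why the comparison is restricted to undirected graphs.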
A ggplot2 object to plot
P. Savary
data(pts_pop_ex)
data(data_ex_genind)
mat_w <- mat_gen_dist(data_ex_genind, dist = "DPS")
mat_dist <- mat_geo_dist(data = pts_pop_ex, ID = "ID", x = "x", y = "y")
mat_dist <- mat_dist[order(as.character(row.names(mat_dist))),
                     order(as.character(colnames(mat_dist)))]
g1 <- gen_graph_topo(mat_w = mat_w, topo = "mst")
g2 <- gen_graph_topo(mat_w = mat_w, mat_topo = mat_dist, topo = "gabriel")
g <- graph_plot_compar(x = g1, y = g2, crds = pts_pop_ex)
The function converts a graph into an edge list data.frame.
graph_to_df(graph, weight = TRUE)
graph |
A graph object of class |
weight |
Logical. If TRUE (default), then the column 'link' of the output data.frame contains the weights of the links. If FALSE, it contains only 0 and 1. |
The 'graph' nodes must have names. Links must have weights if 'weight = TRUE'.
An object of class data.frame with a link ID, the origin nodes ('from'),
the arrival nodes ('to') and the link value ('link', weighted or binary).
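The structure of this output can be illustrated in base R, starting from a weighted adjacency matrix instead of an igraph object. In this sketch each undirected link is listed once, and the 'link' column holds either the weight or 1. The helper `adj_to_df`, its ID format and the matrix values are hypothetical. They only mirror the columns described above.

```r
# Hypothetical helper: undirected weighted adjacency matrix -> edge list data.frame
adj_to_df <- function(adj, weight = TRUE) {
  # Upper triangle only: each undirected link appears once, self loops are ignored
  idx <- which(upper.tri(adj) & adj != 0, arr.ind = TRUE)
  nodes <- rownames(adj)
  data.frame(
    id = paste(nodes[idx[, "row"]], nodes[idx[, "col"]], sep = "-"),
    from = nodes[idx[, "row"]],
    to = nodes[idx[, "col"]],
    link = if (weight) adj[idx] else rep(1, nrow(idx)),
    stringsAsFactors = FALSE
  )
}

nodes <- c("A", "B", "C")
adj <- matrix(c(0, 1200, 0,
                1200, 0, 800,
                0, 800, 0), nrow = 3, dimnames = list(nodes, nodes))
df_w <- adj_to_df(adj, weight = TRUE)    # links A-B (1200) and B-C (800)
df_b <- adj_to_df(adj, weight = FALSE)   # same links, 'link' equal to 1
```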
P. Savary
data(pts_pop_ex)
suppressWarnings(mat_geo <- mat_geo_dist(pts_pop_ex, ID = "ID", x = "x", y = "y"))
g1 <- gen_graph_thr(mat_w = mat_geo, mat_thr = mat_geo, thr = 20000)
g1_df <- graph_to_df(g1, weight = TRUE)
The function exports a spatial graph to shapefile layers.
graph_to_shp( graph, crds, mode = "both", crds_crs, layer, dir_path, metrics = FALSE )
graph |
A graph object of class |
crds |
(if 'mode = "spatial"') A
|
mode |
Indicates which shapefile layers will be created
|
crds_crs |
An integer indicating the EPSG code of the coordinate reference system to use. The projection and datum are given in the PROJ.4 format. |
layer |
A character string indicating the suffix of the name of the layers to be created. |
dir_path |
A character string corresponding to the path to the directory
in which the shapefile layers will be exported. If |
metrics |
(not considered if 'mode = "link"') Logical. Should graph node attributes be integrated in the attribute table of the node shapefile layer? (default: FALSE) |
The function creates shapefile layers in the directory specified with the parameter 'dir_path'.
P. Savary
## Not run:
data(data_tuto)
mat_w <- data_tuto[[1]]
gp <- gen_graph_topo(mat_w = mat_w, topo = "gabriel")
crds_crs <- 2154
crds <- pts_pop_simul
layer <- "graph_dps_gab"
graph_to_shp(graph = gp, crds = pts_pop_simul, mode = "both",
             crds_crs = crds_crs, layer = "test_fonct",
             dir_path = tempdir(), metrics = FALSE)
## End(Not run)
The function computes several indices in order to compare two graph topologies. One of the graphs has the "true" topology, which the other is supposed to reproduce. The indices are then a way to assess the reliability of the latter graph. Both graphs must have the same number of nodes, but not necessarily the same number of links. They must also have the same node names, in the same order.
graph_topo_compar(obs_graph, pred_graph, mode = "mcc", directed = FALSE)
obs_graph |
A graph object of class |
pred_graph |
A graph object of class |
mode |
A character string specifying which index to compute in order to compare the topologies of the graphs.
|
directed |
Logical (TRUE or FALSE) specifying whether both graphs are directed or not. |
The indices are calculated from a confusion matrix counting
the number of links that are in the "observed" graph ("true") and also
in the "predicted" graph (true positives : TP), that are in the "observed"
graph but not in the "predicted" graph (false negatives : FN), that are not
in the "observed" graph but in the "predicted" graph (false positives : FP)
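and that are in neither the "observed" nor the "predicted" graph
(true negatives: TN). K is the total number of possible links (node pairs)
in the graphs: K = n(n-1) if the graphs are directed and K = n(n-1)/2
if they are not directed, with n the number of nodes.
OP = TP + FN, ON = TN + FP, PP = TP + FP and PN = FN + TN.
The Matthews Correlation Coefficient (MCC) is computed as follows:
$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{PP \times OP \times ON \times PN}}$$
The Kappa index is computed as follows:
$$Kappa = \frac{P_{obs} - P_{exp}}{1 - P_{exp}}, \quad P_{obs} = \frac{TP + TN}{K}, \quad P_{exp} = \frac{OP \times PP + ON \times PN}{K^2}$$
The False Discovery Rate (FDR) is calculated as follows:
$$FDR = \frac{FP}{TP + FP}$$
The Accuracy is calculated as follows:
$$Accuracy = \frac{TP + TN}{K}$$
The Sensitivity is calculated as follows:
$$Sensitivity = \frac{TP}{TP + FN}$$
The Specificity is calculated as follows:
$$Specificity = \frac{TN}{TN + FP}$$
The Precision is calculated as follows:
$$Precision = \frac{TP}{TP + FP}$$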
Self loops are not taken into account.
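The confusion-matrix computation can be sketched in base R for two undirected graphs given as adjacency matrices with the same node order. Using the upper triangle counts each node pair once and excludes self loops, so K = n(n-1)/2. The helper `topo_indices` and the example graphs are hypothetical. This is an illustration of the definitions, not the graph4lg implementation.

```r
# Hypothetical helper: topology comparison indices for two undirected graphs
topo_indices <- function(adj_obs, adj_pred) {
  keep <- upper.tri(adj_obs)             # each node pair once, no self loops
  obs <- adj_obs[keep] != 0
  pred <- adj_pred[keep] != 0
  TP <- as.numeric(sum(obs & pred));  FN <- as.numeric(sum(obs & !pred))
  FP <- as.numeric(sum(!obs & pred)); TN <- as.numeric(sum(!obs & !pred))
  K <- TP + FN + FP + TN                 # equals n(n-1)/2
  OP <- TP + FN; ON <- TN + FP; PP <- TP + FP; PN <- FN + TN
  p_exp <- (OP * PP + ON * PN) / K^2     # agreement expected by chance
  c(mcc = (TP * TN - FP * FN) / sqrt(PP * OP * ON * PN),
    kappa = ((TP + TN) / K - p_exp) / (1 - p_exp),
    fdr = FP / PP,
    accuracy = (TP + TN) / K,
    sensitivity = TP / OP,
    specificity = TN / ON,
    precision = TP / PP)
}

nodes <- c("A", "B", "C", "D")
adj_obs <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
adj_pred <- adj_obs
adj_obs[cbind(c("A", "B", "C"), c("B", "C", "D"))] <- 1   # observed: A-B, B-C, C-D
adj_pred[cbind(c("A", "B", "A"), c("B", "C", "D"))] <- 1  # predicted: A-B, B-C, A-D
adj_obs <- adj_obs + t(adj_obs)                           # make both graphs undirected
adj_pred <- adj_pred + t(adj_pred)
res <- topo_indices(adj_obs, adj_pred)
```

Here TP = 2, FN = 1, FP = 1 and TN = 2, so both the MCC and the Kappa index equal 1/3.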
The value of the index computed
P. Savary
Dyer RJ, Nason JD (2004). “Population graphs: the graph theoretic shape of genetic structure.” Molecular ecology, 13(7), 1713–1727. Baldi P, Brunak S, Chauvin Y, Andersen CA, Nielsen H (2000). “Assessing the accuracy of prediction algorithms for classification: an overview.” Bioinformatics, 16(5), 412–424. Matthews BW (1975). “Comparison of the predicted and observed secondary structure of T4 phage lysozyme.” Biochimica et Biophysica Acta (BBA)-Protein Structure, 405(2), 442–451.
data(data_ex_genind)
data(pts_pop_ex)
mat_dist <- suppressWarnings(graph4lg::mat_geo_dist(data = pts_pop_ex,
                                                    ID = "ID", x = "x", y = "y"))
mat_dist <- mat_dist[order(as.character(row.names(mat_dist))),
                     order(as.character(colnames(mat_dist)))]
graph_obs <- gen_graph_thr(mat_w = mat_dist, mat_thr = mat_dist,
                           thr = 15000, mode = "larger")
mat_gen <- mat_gen_dist(x = data_ex_genind, dist = "DPS")
graph_pred <- gen_graph_topo(mat_w = mat_gen, mat_topo = mat_dist,
                             topo = "gabriel")
graph_topo_compar(obs_graph = graph_obs, pred_graph = graph_pred,
                  mode = "mcc", directed = FALSE)
The function computes custom capacities of patches in the Graphab project
graphab_capacity( proj_name, mode = "area", patch_codes = NULL, exp = NULL, ext_file = NULL, thr = NULL, linkset = NULL, codes = NULL, cost_conv = FALSE, weight = FALSE, proj_path = NULL, alloc_ram = NULL )
proj_name |
A character string indicating the Graphab project name.
The project name is also the name of the project directory in which the
file proj_name.xml is. It can be created with |
mode |
A character string indicating the way capacities are computed. It must be either:
|
patch_codes |
(optional, default=NULL) An integer value or vector
specifying the codes corresponding to the habitat pixel whose corresponding
patches are included to compute the capacity as the area of the habitat
when |
exp |
An integer value specifying the power to which patch areas are
raised when |
ext_file |
A character string specifying the name of the .csv file in
which patch capacities are stored. It must be located either in the working
directory or in the directory defined by |
thr |
(optional, default=NULL) An integer or numeric value indicating
the maximum distance in cost distance units (except when
|
linkset |
(optional, default=NULL) A character string indicating the
name of the link set used to take distance into account when computing
the capacity. Only used when |
codes |
An integer value or a vector of integer values specifying the
codes of the raster cells taken into account when computing the capacity in
the neighbourhood of the patches, when |
cost_conv |
FALSE (default) or TRUE. Logical indicating whether numeric
|
weight |
A logical indicating whether the cells are weighted by a weight decreasing with the distance from the patches (TRUE) or not (FALSE). The weights follow a negative exponential decline such that wi = exp(-alpha*di), where wi is the weight of cell i, di its distance from the patch and alpha a parameter determined such that wi = 0.05 when di = thr. |
proj_path |
(optional) A character string indicating the path to the
directory that contains the project directory. It should be used when the
project directory is not in the current working directory. Default is NULL.
When 'proj_path = NULL', the project directory is equal to |
alloc_ram |
(optional, default = NULL) Integer or numeric value indicating RAM gigabytes allocated to the java process. Increasing this value can speed up the computations. Too large values may not be compatible with your machine settings. |
See more information in the Graphab 2.8 manual: https://sourcesup.renater.fr/www/graphab/download/manual-2.8-en.pdf
Be careful when capacities have been changed: the latest changes are taken into account in subsequent calculations in the project.
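The weighting described for the `weight` argument can be sketched in base R. Solving exp(-alpha * thr) = 0.05 gives alpha = -log(0.05) / thr. The sketch also assumes, as a plausible reading of the neighbourhood mode, that the capacity is the number of selected cells within `thr` of a patch, or the sum of their weights when `weight = TRUE`. The helper `cell_weight` and the cell distances are hypothetical. The exact Graphab computation is documented in its manual.

```r
# Hypothetical helper: negative exponential weight of a cell at distance d,
# calibrated so that the weight equals 0.05 at distance thr
cell_weight <- function(d, thr) {
  alpha <- -log(0.05) / thr   # solves exp(-alpha * thr) = 0.05
  exp(-alpha * d)
}

# Made-up distances of cells with the selected codes around one patch
d_cells <- c(0, 100, 400, 800, 1200)
thr <- 1000
in_range <- d_cells <= thr
cap_unweighted <- sum(in_range)                               # plain cell count
cap_weighted <- sum(cell_weight(d_cells[in_range], thr = thr)) # distance-weighted
```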
P. Savary
## Not run:
graphab_capacity(proj_name = "grphb_ex", mode = "area")
## End(Not run)
The function computes corridors around the least-cost paths which have been computed in the Graphab project.
graphab_corridor( proj_name, graph, maxcost, format = "raster", cost_conv = FALSE, proj_path = NULL, alloc_ram = NULL )
proj_name |
A character string indicating the Graphab project name.
The project name is also the name of the project directory in which the
file proj_name.xml is. It can be created with |
graph |
A character string indicating the name of the graph with the
links from which the corridors are computed.
This graph has been created with Graphab or using |
maxcost |
An integer or numeric value indicating the maximum cost
distance from the least-cost paths considered for creating the corridors,
in cost distance units (except when |
format |
(optional, default = "raster") A character string indicating whether the output is a raster file or a shapefile layer. |
cost_conv |
FALSE (default) or TRUE. Logical indicating whether numeric
|
proj_path |
(optional) A character string indicating the path to the
directory that contains the project directory. It should be used when the
project directory is not in the current working directory. Default is NULL.
When 'proj_path = NULL', the project directory is equal to |
alloc_ram |
(optional, default = NULL) Integer or numeric value indicating RAM gigabytes allocated to the java process. Increasing this value can speed up the computations. Too large values may not be compatible with your machine settings. |
See more information in the Graphab 2.8 manual: https://sourcesup.renater.fr/www/graphab/download/manual-2.8-en.pdf
Be careful when capacities have been changed: the latest changes are taken into account in subsequent calculations in the project.
P. Savary
## Not run:
graphab_corridor(proj_name = "grphb_ex", graph = "graph", maxcost = 1000,
                 format = "raster", cost_conv = FALSE)
## End(Not run)
The function creates a graph from a link set in a Graphab project
graphab_graph( proj_name, linkset = NULL, name = NULL, thr = NULL, cost_conv = FALSE, proj_path = NULL, alloc_ram = NULL )
proj_name |
A character string indicating the Graphab project name.
The project name is also the name of the project directory in which the
file proj_name.xml is. It can be created with |
linkset |
(optional, default=NULL) A character string indicating the
name of the link set used to create the graph. If |
name |
(optional, default=NULL) A character string indicating the
name of the graph created. If |
thr |
(optional, default=NULL) An integer or numeric value indicating the maximum distance associated with the links of the created graph. It allows users to create a pruned graph based on a distance threshold. Note that when the link set used has a planar topology, the graph is necessarily a pruned graph (not complete) and adding this threshold parameter can remove other links. When the link set has been created with cost-distances, the parameter is expressed in cost-distance units whereas when the link set is based upon Euclidean distances, the parameter is expressed in meters. |
cost_conv |
FALSE (default) or TRUE. Logical indicating whether numeric
|
proj_path |
(optional) A character string indicating the path to the
directory that contains the project directory. It should be used when the
project directory is not in the current working directory. Default is NULL.
When 'proj_path = NULL', the project directory is equal to |
alloc_ram |
(optional, default = NULL) Integer or numeric value indicating RAM gigabytes allocated to the java process. Increasing this value can speed up the computations. Too large values may not be compatible with your machine settings. |
By default, intra-patch distances are considered for metric calculation. See more information in the Graphab 2.8 manual: https://sourcesup.renater.fr/www/graphab/download/manual-2.8-en.pdf
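Pruning with `thr` can be pictured as filtering a link table on its distance column. The base R sketch below uses a made-up complete link set expressed in cost-distance units. It assumes that links whose distance equals the threshold are kept. It only illustrates the idea and does not reproduce what Graphab does internally.

```r
# Made-up complete link set between four patches (cost-distance units)
links <- data.frame(from = c(1, 1, 1, 2, 2, 3),
                    to   = c(2, 3, 4, 3, 4, 4),
                    cost = c(350, 1200, 2100, 800, 1500, 40))
thr <- 1000
# Keep only the links whose distance does not exceed the threshold (assumed <=)
pruned <- links[links$cost <= thr, ]
nrow(pruned)   # 3 of the 6 links remain
```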
P. Savary
## Not run:
graphab_graph(proj_name = "grphb_ex", linkset = "lcp", name = "graph")
## End(Not run)
The function creates a raster with interpolated connectivity metric values from a metric already computed in the Graphab project.
graphab_interpol( proj_name, name, reso, linkset, graph, var, dist, prob = 0.05, thr = NULL, summed = FALSE, proj_path = NULL, alloc_ram = NULL )
proj_name |
A character string indicating the Graphab project name.
The project name is also the name of the project directory in which the
file proj_name.xml is. It can be created with |
name |
A character string indicating the name of the raster to be created after the interpolation. |
reso |
An integer indicating the spatial resolution in meters of the raster resulting from the metric interpolation. |
linkset |
A character string indicating the name of the link set used for the interpolation. It should be the link set used to create the graph and to compute the metric. |
graph |
A character string indicating the name of the graph from which
the metric was computed and whose links are considered for a potential
multi-linkage with patches.
This graph has been created with Graphab or using |
var |
A character string indicating the name of the already computed metric to be interpolated. |
dist |
A numeric or integer value specifying the distance at which we
assume a probability equal to |
prob |
A numeric or integer value specifying the probability
at distance |
thr |
(default NULL) If NULL, the value of each pixel is computed from
the value of the metric at the nearest habitat patch, weighted by a
probability depending on distance. If an integer, the value of each pixel
depends on the values of the metric taken at several of the nearest habitat
patches, up to a distance (cost or Euclidean distance, depending on the type
of linkset) equal to |
summed |
Logical (default = FALSE) only used if |
proj_path |
(optional) A character string indicating the path to the
directory that contains the project directory. It should be used when the
project directory is not in the current working directory. Default is NULL.
When 'proj_path = NULL', the project directory is equal to |
alloc_ram |
(optional, default = NULL) Integer or numeric value indicating RAM gigabytes allocated to the java process. Increasing this value can speed up the computations. Too large values may not be compatible with your machine settings. |
See more information in the Graphab 2.8 manual: https://sourcesup.renater.fr/www/graphab/download/manual-2.8-en.pdf
Be careful when capacities have been changed: the latest changes are taken into account in subsequent calculations in the project.
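The default case (`thr = NULL`) can be sketched in base R. The value of a pixel is the metric of the nearest habitat patch, weighted by a probability that decays with distance. The sketch assumes a negative exponential decay calibrated so that the probability equals `prob` at distance `dist`. It also uses Euclidean distances, whereas Graphab uses the distances of the link set. The helper `interp_pixel` and the patch values are hypothetical.

```r
# Hypothetical helper: interpolated value of one pixel from the nearest patch only
interp_pixel <- function(px, py, patch_x, patch_y, metric, dist, prob) {
  alpha <- -log(prob) / dist                       # exp(-alpha * dist) == prob
  d <- sqrt((patch_x - px)^2 + (patch_y - py)^2)   # Euclidean distance to each patch
  nearest <- which.min(d)
  metric[nearest] * exp(-alpha * d[nearest])
}

# Two patches with made-up metric values
patch_x <- c(0, 1000); patch_y <- c(0, 0); metric <- c(10, 4)
v <- interp_pixel(300, 0, patch_x, patch_y, metric, dist = 600, prob = 0.5)
v   # 10 * 0.5^(300 / 600), i.e. about 7.07
```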
P. Savary
## Not run:
graphab_interpol(proj_name = "grphb_ex", name = "F_interp", reso = 20,
                 linkset = "lcp", graph = "graph",
                 var = "F_d600_p0.5_beta1_graph", dist = 600, prob = 0.5)
## End(Not run)
The function creates a link set between habitat patches in the Graphab project.
graphab_link( proj_name, distance = "cost", name, cost = NULL, topo = "planar", remcrosspath = FALSE, proj_path = NULL, alloc_ram = NULL )
proj_name |
A character string indicating the Graphab project name.
The project name is also the name of the project directory in which the
file proj_name.xml is. It can be created with |
distance |
A character string indicating whether links between patches are computed based on:
In the resulting link set, each link will be associated with its
corresponding cost-distance and the length of the least-cost path in meters
(if |
name |
A character string indicating the name of the created linkset. |
cost |
This argument could be:
|
topo |
A character string indicating the topology of the created link set. It can be:
|
remcrosspath |
(optional, default = FALSE) A logical indicating whether links crossing patches are removed (TRUE). |
proj_path |
(optional) A character string indicating the path to the
directory that contains the project directory. It should be used when the
project directory is not in the current working directory. Default is NULL.
When 'proj_path = NULL', the project directory is equal to |
alloc_ram |
(optional, default = NULL) Integer or numeric value indicating RAM gigabytes allocated to the java process. Increasing this value can speed up the computations. Too large values may not be compatible with your machine settings. |
By default, links crossing patches are neither ignored nor broken into two links. For example, a link from patch A to patch C crossing patch B is created, and it takes into account the distance inside patch B. This can be a problem when computing the BC index. See more information in the Graphab 2.8 manual: https://sourcesup.renater.fr/www/graphab/download/manual-2.8-en.pdf
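A cost table such as `df_cost` in the example below maps each raster code to a resistance value. The base R sketch shows that mapping on a toy land cover matrix with `match()`. It produces the cost surface over which least-cost paths would then be computed. The raster values are made up, and the path computation itself is left to Graphab.

```r
# Made-up land cover raster stored as a matrix of integer codes
land <- matrix(c(1, 1, 2,
                 3, 2, 5,
                 4, 4, 1), nrow = 3, byrow = TRUE)
df_cost <- data.frame(code = 1:5, cost = c(1, 10, 100, 1000, 1))

# Replace each code by its cost while keeping the raster layout
cost_surface <- matrix(df_cost$cost[match(land, df_cost$code)],
                       nrow = nrow(land))
cost_surface
```

A code missing from the cost table would become NA, so every code of the source raster needs a cost.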
P. Savary, T. Rudolph
## Not run:
df_cost <- data.frame(code = 1:5, cost = c(1, 10, 100, 1000, 1))
graphab_link(proj_name = "grphb_ex", distance = "cost", name = "lcp",
             cost = df_cost, topo = "complete")
## End(Not run)
The function computes connectivity metrics on a graph from a link set in a Graphab project
graphab_metric( proj_name, graph, metric, multihab = FALSE, dist = NULL, prob = 0.05, beta = 1, cost_conv = FALSE, return_val = TRUE, proj_path = NULL, alloc_ram = NULL )
proj_name |
A character string indicating the Graphab project name. The project name is also the name of the project directory in which the file proj_name.xml is. |
graph |
A character string indicating the name of the graph on which
the metric is computed. This graph has been created with Graphab
or using |
metric |
A character string indicating the metric which will be computed on the graph. This metric can be:
For most metrics, the interaction probability is computed for each pair of
patches from the path that minimizes the distance d (or the cost) between
them. It then maximizes |
multihab |
A logical (default = FALSE) indicating whether the
'multihabitat' mode is used when computing the metric. It only applies to
the following metrics: 'EC', 'F', 'IF' and 'BC'. If TRUE, then the project
must have been created with the option |
dist |
A numeric or integer value specifying the distance at which
dispersal probability is equal to 'prob'. |
prob |
A numeric or integer value specifying the dispersal probability
at distance 'dist'. |
beta |
A numeric or integer value between 0 and 1 specifying the
exponent associated with patch areas in the computation of metrics
weighted by patch area. By default, 'beta = 1'. |
cost_conv |
FALSE (default) or TRUE. Logical indicating whether numeric
|
return_val |
Logical (default = TRUE) indicating whether metric values are returned in R (TRUE) or only stored in the patch attribute layer (FALSE) |
proj_path |
(optional) A character string indicating the path to the
directory that contains the project directory. It should be used when the
project directory is not in the current working directory. Default is NULL.
When 'proj_path = NULL', the project directory is equal to getwd(). |
alloc_ram |
(optional, default = NULL) Integer or numeric value indicating RAM gigabytes allocated to the java process. Increasing this value can speed up the computations. Too large values may not be compatible with your machine settings. |
The metrics are described in the Graphab 2.8 manual:
https://sourcesup.renater.fr/www/graphab/download/manual-2.8-en.pdf
The Graphab software can also compute other metrics.
Be careful: when the same metric is computed several times, the option return_val = TRUE does not return the right columns. In these cases, use get_graphab_metric.
If return_val = TRUE, the function returns a data.frame with the computed metric values and the corresponding patch IDs when the metric is a local or delta metric, or the numeric value of the metric when it is a global metric.
P. Savary
## Not run:
graphab_metric(proj_name = "grphb_ex", graph = "graph", metric = "PC", dist = 1000, prob = 0.05, beta = 1)
## End(Not run)
The function creates modules from a graph by maximising modularity.
graphab_modul( proj_name, graph, dist, prob = 0.05, beta = 1, nb = NULL, return = TRUE, proj_path = NULL, alloc_ram = NULL )
proj_name |
A character string indicating the Graphab project name. The project name is also the name of the project directory in which the file proj_name.xml is. |
graph |
A character string indicating the name of the graph on which
the modularity index is computed. This graph has been created with Graphab
or using |
dist |
A numeric or integer value specifying the distance at which
dispersal probability is equal to 'prob'. |
prob |
A numeric or integer value specifying the dispersal probability
at distance 'dist'. |
beta |
A numeric or integer value between 0 and 1 specifying the
exponent associated with patch areas in the computation of metrics
weighted by patch area. By default, 'beta = 1'. |
nb |
(optional, default=NULL) An integer or numeric value indicating the number of modules to be created. By default, it is the number that maximises the modularity index. |
return |
Logical (default=TRUE) indicating whether results are returned to the user. |
proj_path |
(optional) A character string indicating the path to the
directory that contains the project directory. It should be used when the
project directory is not in the current working directory. Default is NULL.
When 'proj_path = NULL', the project directory is equal to getwd(). |
alloc_ram |
(optional, default = NULL) Integer or numeric value indicating RAM gigabytes allocated to the java process. Increasing this value can speed up the computations. Too large values may not be compatible with your machine settings. |
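This function maximises a modularity index by searching for the
node partition that involves a large number of links within modules and a small
number of inter-module links. Each link is given a weight in the computation,
such that the weight of the link between patches i and j is:
$$w_{ij} = (a_i a_j)^{\beta} e^{-\alpha d_{ij}}$$
where $a_i$ and $a_j$ are the areas of patches i and j, $d_{ij}$ is the distance between them, $\beta$ is given by 'beta', and $\alpha$ is set so that the dispersal probability $e^{-\alpha d}$ equals 'prob' at distance 'dist'. This function does not automatically convert Euclidean distances into cost-distances. See more information in the Graphab 2.8 manual: https://sourcesup.renater.fr/www/graphab/download/manual-2.8-en.pdf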
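A minimal base R sketch of this link weighting, not taken from Graphab. It assumes the weight has the form w_ij = (a_i a_j)^beta exp(-alpha d_ij), with a_i the patch areas, d_ij the inter-patch distances and alpha chosen so that exp(-alpha * dist) equals prob. The helper name modul_weights() and the toy data are hypothetical.

```r
# Hypothetical helper (not part of graph4lg): link weights used for modularity,
# assuming w_ij = (a_i * a_j)^beta * exp(-alpha * d_ij)
modul_weights <- function(area, d, dist, prob = 0.05, beta = 1) {
  alpha <- -log(prob) / dist            # so that exp(-alpha * dist) == prob
  w <- outer(area, area)^beta * exp(-alpha * d)
  diag(w) <- 0                          # no self-links
  w
}

# Toy example: 3 patches with areas and pairwise distances (same units as dist)
area <- c(10, 20, 5)
d <- matrix(c(   0, 1000, 3000,
              1000,    0, 2000,
              3000, 2000,    0), nrow = 3, byrow = TRUE)
w <- modul_weights(area, d, dist = 1000, prob = 0.05, beta = 1)
```

With these values, patches 1 and 2 are exactly 'dist' apart, so their weight is 10 * 20 * 0.05 = 10.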
If return = TRUE, the function returns a message indicating whether the partition has been done. New options are being developed.
P. Savary
## Not run:
graphab_modul(proj_name = "grphb_ex", graph = "graph", dist = 1000, prob = 0.05, beta = 1)
## End(Not run)
The function adds a spatial point set to the Graphab project, allowing users to identify the closest habitat patch to each point and to retrieve the corresponding connectivity metrics.
graphab_pointset( proj_name, linkset, pointset, id = "ID", return_val = TRUE, proj_path = NULL, alloc_ram = NULL )
proj_name |
A character string indicating the Graphab project name. The project name is also the name of the project directory in which the file proj_name.xml is. |
linkset |
A character string indicating the name of the link set used.
The link set is here used to get the defined cost values and compute the
distance from the point to the patches. Link sets can be created
with |
pointset |
Can be either:
The point ID column must be 'ID' by default but can also be specified
with the 'id' argument. |
id |
A character string indicating the name of the column in either the .csv table, data.frame or attribute table, corresponding to the ID of the points. By default, it should be 'ID'. This column is used for naming the points when returning the output. |
return_val |
Logical (default=TRUE) indicating whether the metrics associated with the habitat patches closest to the points are returned to the user. |
proj_path |
(optional) A character string indicating the path to the
directory that contains the project directory. It should be used when the
project directory is not in the current working directory. Default is NULL.
When 'proj_path = NULL', the project directory is equal to getwd(). |
alloc_ram |
(optional, default = NULL) Integer or numeric value indicating RAM gigabytes allocated to the java process. Increasing this value can speed up the computations. Too large values may not be compatible with your machine settings. |
Point coordinates must be in the same coordinate reference system as the habitat patches (and initial raster layer). See more information in Graphab 2.8 manual: https://sourcesup.renater.fr/www/graphab/download/manual-2.8-en.pdf
If return_val = TRUE, the function returns a data.frame with the properties of the nearest patch to every point in the point set, as well as the distance from each point to the nearest patch.
P. Savary
## Not run:
graphab_pointset(proj_name = "grphb_ex", linkset = "lcp", pointset = "pts.shp")
## End(Not run)
The function creates a Graphab project from a raster file on which habitat patches can be delimited.
graphab_project( proj_name, raster, habitat, nomerge = FALSE, minarea = 0, nodata = NULL, maxsize = NULL, con8 = FALSE, alloc_ram = NULL, proj_path = NULL )
proj_name |
A character string indicating the Graphab project name. The project name is also the name of the project directory in which the file proj_name.xml will be created. |
raster |
A character string indicating the name of the .tif raster file or of its path. If the path is not specified, the raster must be present in the current working directory. Raster cell values must be in INT2S encoding. |
habitat |
An integer or numeric value or vector indicating the code(s) (cell value(s)) of the habitat cells in the raster file. |
nomerge |
(optional, default=FALSE) A logical indicating whether
contiguous patches corresponding to different pixel codes are merged
(FALSE, default) or not merged (TRUE).
Be careful, the |
minarea |
(optional, default=0) An integer or numeric value specifying the minimum area, in hectares, that a habitat patch must have to become a graph node. |
nodata |
(optional, default=NULL) An integer or numeric value specifying the code in the raster file associated with nodata values (often corresponding to peripheral cells). |
maxsize |
(optional, default=NULL) An integer or numeric value
specifying the maximum side length of the rectangular full extent of each
habitat patch in metric units. If this side length exceeds |
con8 |
(optional, default=FALSE) A logical indicating whether a
neighborhood of 8 pixels (TRUE) is used for patch definition. By default,
'con8 = FALSE' and a neighborhood of 4 pixels is used. |
alloc_ram |
(optional, default = NULL) Integer or numeric value indicating RAM gigabytes allocated to the java process. Increasing this value can speed up the computations. Too large values may not be compatible with your machine settings. |
proj_path |
(optional) A character string indicating the path to the
directory that contains the project directory. It should be used when the
project directory is not in the current working directory. Default is NULL.
When 'proj_path = NULL', the project directory is equal to getwd(). |
A habitat patch consists of the central pixel with its eight neighbors if they are of the same value (8-connexity) and the path geometry is not simplified. See more information in Graphab 2.8 manual: https://sourcesup.renater.fr/www/graphab/download/manual-2.8-en.pdf
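To illustrate what the con8 argument changes, here is a hypothetical base R flood-fill labelling of habitat cells using 4- or 8-connectivity. It is only a sketch: Graphab delimits patches itself, and this helper (label_patches(), a name made up for the example) merges all habitat codes together, as with nomerge = FALSE, and ignores minarea and maxsize.

```r
# Hypothetical helper (not part of graph4lg or Graphab): label habitat patches
# in a raster-like matrix by flood fill, with 4- or 8-connectivity.
label_patches <- function(mat, habitat, con8 = FALSE) {
  is_hab <- matrix(mat %in% habitat, nrow(mat), ncol(mat))
  labels <- matrix(0L, nrow(mat), ncol(mat))
  moves <- rbind(c(-1, 0), c(1, 0), c(0, -1), c(0, 1))          # 4 neighbours
  if (con8) moves <- rbind(moves, c(-1, -1), c(-1, 1), c(1, -1), c(1, 1))
  next_id <- 0L
  for (i in seq_len(nrow(mat))) {
    for (j in seq_len(ncol(mat))) {
      if (!is_hab[i, j] || labels[i, j] > 0L) next
      next_id <- next_id + 1L             # start a new patch
      labels[i, j] <- next_id
      stack <- list(c(i, j))
      while (length(stack) > 0) {
        cell <- stack[[length(stack)]]
        stack[[length(stack)]] <- NULL    # pop the last cell
        for (m in seq_len(nrow(moves))) {
          ni <- cell[1] + moves[m, 1]
          nj <- cell[2] + moves[m, 2]
          if (ni < 1 || ni > nrow(mat) || nj < 1 || nj > ncol(mat)) next
          if (is_hab[ni, nj] && labels[ni, nj] == 0L) {
            labels[ni, nj] <- next_id     # same patch as the current cell
            stack[[length(stack) + 1]] <- c(ni, nj)
          }
        }
      }
    }
  }
  labels
}

# Two habitat cells (code 5) touching only by a corner
r <- matrix(c(5, 0,
              0, 5), nrow = 2, byrow = TRUE)
max(label_patches(r, habitat = 5, con8 = FALSE))  # 2 patches
max(label_patches(r, habitat = 5, con8 = TRUE))   # 1 patch
```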
P. Savary, T. Rudolph
## Not run:
proj_name <- "grphb_ex"
raster <- "rast_ex.tif"
habitat <- 5
graphab_project(proj_name = proj_name, raster = raster, habitat = habitat)
## End(Not run)
The function describes the objects of a Graphab project.
graphab_project_desc( proj_name, mode = "patches", linkset = NULL, proj_path = NULL, fig = FALSE, return_val = TRUE )
proj_name |
A character string indicating the Graphab project name. The project name is also the name of the project directory in which the file proj_name.xml is. |
mode |
A character string indicating the objects of the project that are described. It must be either:
|
linkset |
A character string indicating the name of the link set
whose properties are imported. The link set has been created with Graphab
or using |
proj_path |
(optional) A character string indicating the path to the
directory that contains the project directory. It should be used when the
project directory is not in the current working directory. Default is NULL.
When 'proj_path = NULL', the project directory is equal to getwd(). |
fig |
Logical (default = FALSE) indicating whether to plot a figure of
the resulting spatial graph. The figure is plotted using function
|
return_val |
Logical (default = TRUE) indicating whether the project features are returned as a list (TRUE) or only displayed in the R console (FALSE). |
P. Savary
## Not run:
graphab_project_desc(proj_name = "grphb_ex", mode = "patches", fig = FALSE)
## End(Not run)
The function creates a landscape graph from a link set created with Graphab software or with different functions of this package, and converts it into a graph object of class igraph.
The graph has weighted links and is undirected.
Node attributes present in the Graphab project are included, including connectivity metrics when they have been computed.
graphab_to_igraph( proj_name, linkset, nodes = "patches", weight = "cost", proj_path = NULL, fig = FALSE, crds = FALSE )
proj_name |
A character string indicating the project name. It is also the name of the directory in which the proj_name.xml file is found. By default, 'proj_name' is searched for in the current working directory. |
linkset |
A character string indicating the name of the linkset used to
create the graph links. The linkset must have been created previously (see
the function |
nodes |
A character string indicating whether the nodes of the created
graph are given all the attributes or metrics computed in Graphab or only
those specific to a given graph previously created with
|
weight |
A character string ("euclid" or "cost") indicating whether to weight the links with Euclidean distance or cost-distance (default) values. |
proj_path |
(optional) A character string indicating the path to the directory that contains the project directory ('proj_name'). By default, 'proj_name' is searched for in the current working directory. |
fig |
Logical (default = FALSE) indicating whether to plot a figure of
the resulting spatial graph. The figure is plotted using function
|
crds |
Logical (default = FALSE) indicating whether to create an object
of class |
A graph object of class igraph (if crds = FALSE) or a list of objects: a graph object of class igraph and a data.frame with the nodes' spatial coordinates (if crds = TRUE).
P. Savary
Foltête J, Clauzel C, Vuidel G (2012). “A software tool dedicated to the modelling of landscape networks.” Environmental Modelling & Software, 38, 316–327.
## Not run:
proj_path <- system.file('extdata', package = 'graph4lg')
proj_name <- "grphb_ex"
linkset <- "lkst1"
nodes <- "graph"
graph <- graphab_to_igraph(proj_name = proj_name, linkset = linkset, nodes = nodes, weight = "cost", proj_path = proj_path, crds = FALSE, fig = FALSE)
## End(Not run)
The function converts a file formatted for the gstudio or popgraph packages into a genind object (adegenet package).
gstud_to_genind(x, pop_col, ind_col = NULL)
x |
An object of class |
pop_col |
A character string indicating the name of the column with
populations' names in 'x'. |
ind_col |
(optional) A character string indicating the name of the
column with individuals' ID in 'x'. |
This function uses functions from the pegas package. It can handle genetic data in which allele codings do not all have the same length (for example, 99:101). If the names of the loci include '.' characters, they are replaced by '_'.
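The base R sketch below shows the two string manipulations described above: splitting gstudio-style genotypes such as "99:101" into two alleles whose codings may differ in length, and replacing '.' by '_' in locus names. It assumes genotypes are stored as "allele1:allele2" strings; split_genotypes() is a hypothetical helper, not the package's internal code.

```r
# Hypothetical helper: split "allele1:allele2" genotype strings into two columns.
# Allele codings may have different lengths (e.g. "99" and "101").
split_genotypes <- function(geno) {
  parts <- strsplit(as.character(geno), ":", fixed = TRUE)
  data.frame(allele1 = vapply(parts, `[`, character(1), 1),
             allele2 = vapply(parts, `[`, character(1), 2),
             stringsAsFactors = FALSE)
}

g <- split_genotypes(c("99:101", "101:101"))

# Locus names: '.' characters replaced by '_'
loci <- gsub(".", "_", c("loc.1", "loc.2"), fixed = TRUE)
```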
An object of class genind.
P. Savary
data("data_ex_gstud")
x <- data_ex_gstud
pop_col <- "POP"
ind_col <- "ID"
data_genind <- gstud_to_genind(x, pop_col, ind_col)
The function computes the constant parameters of a dispersal kernel with a negative exponential distribution.
kernel_param(p, d_disp, mode = "A")
p |
A numeric value indicating the dispersal probability at a distance equal to 'd_disp' under a negative exponential distribution. |
d_disp |
A numeric value indicating the distance at which dispersal probability is equal to 'p' under a negative exponential distribution. |
mode |
A character string indicating the value to return:
|
If the resulting parameter when mode = "A" is $a$ and the resulting parameter when mode = "B" is $b$, then $p = e^{-a \, d_{disp}} = 10^{-b \, d_{disp}}$ and $a = b \ln(10)$.
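The relation above can be checked with a few lines of base R. This is a sketch assuming p = exp(-a * d_disp) = 10^(-b * d_disp); kernel_coefs() is a hypothetical helper, not the package's kernel_param().

```r
# Hypothetical helper: constants of a negative exponential dispersal kernel
# such that p = exp(-a * d_disp) = 10^(-b * d_disp)
kernel_coefs <- function(p, d_disp) {
  stopifnot(p > 0, p < 1, d_disp > 0)
  a <- -log(p) / d_disp    # natural-log form (mode = "A" in the text)
  b <- -log10(p) / d_disp  # base-10 form (mode = "B" in the text)
  c(A = a, B = b)
}

coefs <- kernel_coefs(p = 0.5, d_disp = 3000)
coefs[["A"]] / coefs[["B"]]  # equals log(10)
```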
A numeric value
P. Savary
p <- 0.5
d_disp <- 3000
alpha <- kernel_param(p, d_disp, mode = "A")
This function is identical to loci2genind
from the pegas package.
loci_to_genind(x, ploidy = 2, na.alleles = c("NA"))
x |
An object of class |
ploidy |
An integer indicating the ploidy level (by default, 'ploidy = 2') |
na.alleles |
A character vector indicating the coding of the alleles to be treated as missing data (by default, 'na.alleles = c("NA")') |
An object of class genind
P. Savary
data("data_ex_loci")
genind <- loci_to_genind(data_ex_loci, ploidy = 2, na.alleles = "NA")
The function computes cost-distances associated with least-cost paths between pairs of points on a raster with specified cost values.
mat_cost_dist( raster, pts, cost, method = "gdistance", return = "mat", direction = 8, parallel.java = 1, alloc_ram = NULL )
raster |
A parameter indicating the raster file on which cost distances are computed. It can be:
All the raster cell values must be present in the column 'code' from 'cost'. |
pts |
A parameter indicating the points between which cost distances are computed. It can be either:
The point coordinates must be in the same spatial coordinate reference system as the raster file. |
cost |
A
|
method |
A character string indicating the method used to compute the cost distances. It must be:
|
return |
A character string indicating whether the returned object is a
|
direction |
An integer (4, 8, 16) indicating the directions in which
movement can take place from a cell. Only used when |
parallel.java |
An integer indicating how many computer cores are used
to run the .jar file. By default, 'parallel.java = 1'. |
alloc_ram |
(optional, default = NULL) Integer or numeric value indicating RAM gigabytes allocated to the java process when used. Increasing this value can speed up the computations. Too large values may not be compatible with your machine settings. |
The function returns:
If return = "mat", a pairwise matrix with cost-distance values between points.
If return = "df", an object of type data.frame with three columns:
from: A character string indicating the ID of the point of origin.
to: A character string indicating the ID of the point of destination.
cost_dist: A numeric value indicating the accumulated cost-distance along the least-cost path between points 'from' and 'to'.
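A cost-distance is the minimum accumulated cost over all paths between two cells. The base R sketch below computes it with Dijkstra's algorithm on a small cost grid. It is not how gdistance or the Graphab .jar file compute costs: the step-cost rule (mean of the two cell costs times the step length) and the helper name cost_dist_from() are assumptions made for illustration. Its directions argument loosely mirrors the 4- and 8-neighbour options of 'direction'.

```r
# Hypothetical helper (not part of graph4lg): accumulated cost-distance from one
# start cell to every cell of a cost grid, using Dijkstra's algorithm.
# Assumption: a step between adjacent cells costs the mean of their two cost
# values times the step length (1 orthogonally, sqrt(2) diagonally).
cost_dist_from <- function(cost_grid, start, directions = 8) {
  n_row <- nrow(cost_grid)
  n_col <- ncol(cost_grid)
  moves <- rbind(c(-1, 0), c(1, 0), c(0, -1), c(0, 1))
  if (directions == 8) {
    moves <- rbind(moves, c(-1, -1), c(-1, 1), c(1, -1), c(1, 1))
  }
  acc <- matrix(Inf, n_row, n_col)      # tentative accumulated costs
  done <- matrix(FALSE, n_row, n_col)   # cells whose cost is final
  acc[start[1], start[2]] <- 0
  repeat {
    # Select the unsettled cell with the smallest tentative cost
    cand <- acc
    cand[done] <- Inf
    idx <- which.min(cand)
    if (is.infinite(cand[idx])) break   # every reachable cell is settled
    ri <- (idx - 1) %% n_row + 1
    ci <- (idx - 1) %/% n_row + 1
    done[ri, ci] <- TRUE
    # Relax the edges towards each neighbouring cell
    for (m in seq_len(nrow(moves))) {
      nr <- ri + moves[m, 1]
      nc <- ci + moves[m, 2]
      if (nr < 1 || nr > n_row || nc < 1 || nc > n_col || done[nr, nc]) next
      step <- sqrt(sum(moves[m, ]^2)) *
        (cost_grid[ri, ci] + cost_grid[nr, nc]) / 2
      if (acc[ri, ci] + step < acc[nr, nc]) acc[nr, nc] <- acc[ri, ci] + step
    }
  }
  acc
}

cost <- matrix(c(1, 100, 1,
                 1, 100, 1,
                 1,   1, 1), nrow = 3, byrow = TRUE)  # column 2 is mostly a barrier
acc <- cost_dist_from(cost, start = c(1, 1), directions = 4)
acc[1, 3]  # 6: the least-cost path detours through the bottom row
```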
P. Savary
## Not run:
x <- raster::raster(ncol = 10, nrow = 10, xmn = 0, xmx = 100, ymn = 0, ymx = 100)
raster::values(x) <- sample(c(1, 2, 3, 4), size = 100, replace = TRUE)
pts <- data.frame(ID = 1:4, x = c(10, 90, 10, 90), y = c(90, 10, 10, 90))
cost <- data.frame(code = 1:4, cost = c(1, 10, 100, 1000))
mat_cost_dist(raster = x, pts = pts, cost = cost, method = "gdistance")
## End(Not run)
The function computes a pairwise matrix of genetic distances between populations and implements several formulas.
mat_gen_dist(x, dist = "basic", null_val = FALSE)
x |
An object of class |
dist |
A character string indicating the method used to compute the multilocus genetic distance between populations
|
null_val |
(optional) Logical. Should negative and null FST, FST_lin, GST or D values be replaced by half of the minimum positive value? This option makes it possible to compute Gabriel graphs from these "distances". Default is 'null_val = FALSE'. This option only works if 'dist' is "FST", "FST_lin", "GST" or "D". |
Negative values are converted to 0.
The Euclidean genetic distance between populations i and j is computed as follows:
$$D_{ij} = \sqrt{\sum_{k=1}^{n} (p_{ik} - p_{jk})^2}$$
where $p_{ik}$ is the allelic frequency of allele k in population i and n is the total number of alleles. Note that when 'dist = "weight"', the formula becomes
$$D_{ij} = \sqrt{\sum_{k=1}^{n} \frac{1}{K p_{k}} (p_{ik} - p_{jk})^2}$$
where K is the number of alleles at the locus of allele k and $p_{k}$ is the frequency of allele k in all populations.
Note that when 'dist = "PCA"', n is the number of retained independent principal components and $p_{ik}$ is the value taken by principal component k in population i.
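A base R sketch of the Euclidean and weighted formulas, applied to a toy allele-frequency matrix (rows are populations, columns are alleles of all loci pooled). It assumes that $p_k$ is the unweighted mean of the population frequencies; the data and variable names are hypothetical, and this is not the package's own code, which starts from a genind object.

```r
# Toy allele frequencies: 3 populations, 2 loci with 2 alleles each
freq <- matrix(c(0.5, 0.5, 0.2, 0.8,
                 0.1, 0.9, 0.6, 0.4,
                 0.5, 0.5, 0.2, 0.8),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("P1", "P2", "P3"),
                               c("L1.a", "L1.b", "L2.a", "L2.b")))

# Euclidean genetic distance: sqrt(sum_k (p_ik - p_jk)^2)
d_eucl <- as.matrix(dist(freq, method = "euclidean"))

# Weighted variant: allele k weighted by 1 / (K * p_k), where K is the number
# of alleles at its locus and p_k its mean frequency over populations (assumption)
locus <- c("L1", "L1", "L2", "L2")
K <- as.numeric(table(locus)[locus])
p_k <- colMeans(freq)
# Scaling each column by sqrt(weight) turns the plain Euclidean distance
# into the weighted one
d_weight <- as.matrix(dist(sweep(freq, 2, sqrt(1 / (K * p_k)), `*`)))
```

Populations P1 and P3 have identical frequencies, so their distance is 0 under both formulas.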
An object of class matrix
P. Savary
Bowcock AM, Ruiz-Linares A, Tomfohrde J, Minch E, Kidd JR, Cavalli-Sforza LL (1994). “High resolution of human evolutionary trees with polymorphic microsatellites.” Nature, 368(6470), 455–457.
Excoffier L, Smouse PE, Quattro JM (1992). “Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data.” Genetics, 131(2), 479–491.
Dyer RJ, Nason JD (2004). “Population graphs: the graph theoretic shape of genetic structure.” Molecular Ecology, 13(7), 1713–1727.
Fortuna MA, Albaladejo RG, Fernández L, Aparicio A, Bascompte J (2009). “Networks of spatial genetic variation across species.” Proceedings of the National Academy of Sciences, 106(45), 19044–19049.
Weir BS, Cockerham CC (1984). “Estimating F-statistics for the analysis of population structure.” Evolution, 38(6), 1358–1370.
Hedrick PW (2005). “A standardized genetic differentiation measure.” Evolution, 59(8), 1633–1638.
Jost L (2008). “GST and its relatives do not measure differentiation.” Molecular Ecology, 17(18), 4015–4026.
data(data_ex_genind)
x <- data_ex_genind
D <- mat_gen_dist(x = x, dist = "basic")
The function computes geographic distances between points given their spatial coordinates, either in a metric projected Coordinate Reference System (Euclidean distance) or in a polar coordinate system (great-circle distance).
mat_geo_dist( data, ID = NULL, x = NULL, y = NULL, crds_type = "proj", gc_formula = "vicenty" )
data |
An object of class:
|
ID |
(if |
x |
(if |
y |
(if |
crds_type |
A character string indicating the type of coordinate reference system:
|
gc_formula |
A character string indicating the formula used to compute the Great Circle distance:
|
When a projected coordinate reference system is used, the function calculates the classical Euclidean geographic distance between two points using Pythagoras' theorem. When a polar coordinate reference system is used, it calculates the great-circle distance between points using different methods.
When data is a data.frame, projected coordinates are assumed by default, unless crds_type = "polar".
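Both cases can be sketched in base R. For projected coordinates, dist() gives the Pythagorean distance. For polar coordinates, the sketch uses the haversine formula on a spherical Earth of radius 6371 km. This is a deliberate simplification: the function's default gc_formula is "vicenty", which refers to Vincenty's ellipsoidal formula, so values will differ slightly. The haversine() helper and the toy points are hypothetical.

```r
# Projected coordinates (metres): Euclidean distance via Pythagoras' theorem
pts <- data.frame(ID = c("A", "B", "C"), x = c(0, 3000, 0), y = c(0, 4000, 1000))
d_proj <- as.matrix(dist(pts[, c("x", "y")]))
dimnames(d_proj) <- list(pts$ID, pts$ID)

# Polar coordinates (decimal degrees): haversine great-circle distance in metres,
# assuming a spherical Earth (not Vincenty's ellipsoidal formula)
haversine <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  h <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(h)))  # pmin guards against rounding above 1
}

city <- data.frame(name = c("New York City", "Chicago"),
                   lon = c(-73.99420, -87.63940),
                   lat = c(40.75170, 41.87440))
d_gc <- outer(seq_len(nrow(city)), seq_len(nrow(city)), function(i, j)
  haversine(city$lon[i], city$lat[i], city$lon[j], city$lat[j]))
dimnames(d_gc) <- list(city$name, city$name)
```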
A pairwise matrix of geographic distances between points in meters
P. Savary
# Projected CRS
data(pts_pop_simul)
mat_dist <- mat_geo_dist(data = pts_pop_simul, ID = "ID", x = "x", y = "y")
# Polar CRS
city_us <- data.frame(name = c("New York City", "Chicago", "Los Angeles", "Atlanta"),
                      lat = c(40.75170, 41.87440, 34.05420, 33.75280),
                      lon = c(-73.99420, -87.63940, -118.24100, -84.39360))
mat_geo_us <- mat_geo_dist(data = city_us, ID = "name", x = "lon", y = "lat", crds_type = "polar")
The function plots graphs, whether spatial or not.
plot_graph_lg( graph, crds = NULL, mode = "aspatial", node_inter = NULL, link_width = NULL, node_size = NULL, module = NULL, pts_col = NULL )
graph |
A graph object of class |
crds |
(optional, default = NULL) If 'mode = "spatial"', it is a
This argument is not used when 'mode = "aspatial"' and is mandatory when 'mode = "spatial"'. |
mode |
A character string indicating whether the graph is spatial ('mode = "spatial"') or not ('mode = "aspatial"', default). |
node_inter |
(optional, default = NULL) A character string indicating whether the links of the graph are weighted by distances or by similarity indices. It is only used when 'mode = "aspatial"' to compute the node positions with the Fruchterman and Reingold algorithm. It can be equal to:
|
link_width |
(optional, default = NULL) A character string indicating how the width of the link is set on the figure. Their width can be:
|
node_size |
(optional, default = NULL) A character string indicating the graph node attribute used to set the node size on the figure. It must be the name of a numeric or integer node attribute from the graph. |
module |
(optional, default = NULL) A character string indicating the graph node modules used to set the node color on the figure. It must be the name of a node attribute from the graph with discrete values. |
pts_col |
(optional, default = NULL) A character string indicating the color used to plot the nodes (default: "#F2B950"). It must be a hexadecimal color code or a color used by default in R. It cannot be used if 'module' is specified. |
When the graph is not spatial ('mode = "aspatial"'),
the node coordinates are calculated with the Fruchterman and Reingold algorithm.
The graph object graph of class igraph must have node names
(not necessarily in the same order as the IDs in crds, because the two are merged).
A ggplot2 object to plot
P. Savary
Fruchterman TM, Reingold EM (1991). “Graph drawing by force-directed placement.” Software: Practice and experience, 21(11), 1129–1164.
data(pts_pop_ex)
data(data_ex_genind)
mat_w <- mat_gen_dist(data_ex_genind, dist = "DPS")
gp <- gen_graph_topo(mat_w = mat_w, topo = "mst")
g <- plot_graph_lg(graph = gp, crds = pts_pop_ex, mode = "spatial", link_width = "inv_w")
The function plots a histogram to visualize the distribution of the link weights.
plot_w_hist(graph, fill = "#396D35", class_width = NULL)
graph |
A graph object of class |
fill |
A character string indicating the color used to fill the bars (default: "#396D35"). It must be a hexadecimal color code or a color used by default in R. |
class_width |
(default = NULL) A numeric or an integer specifying the width of the classes displayed on the histogram. When it is not specified, the width is equal to the difference between the maximum and minimum values divided by 80. |
A ggplot2 object to plot
P. Savary
data(data_ex_genind)
mat_w <- mat_gen_dist(data_ex_genind, dist = "DPS")
gp <- gen_graph_topo(mat_w = mat_w, topo = "gabriel")
hist <- plot_w_hist(gp)
The function computes population-level genetic indices from an object of class genind.
pop_gen_index(x, pop_names = NULL, indices = c("Nb_ind", "A", "He", "Ho"))
x |
An object of class |
pop_names |
(optional) A character vector indicating population names. It must have the same length as the number of populations. Without this argument, populations are given the names they initially have in the 'genind' object (which is sometimes only a number). The order of the population names must match their order in the 'genind' object. The function does not reorder them, so users must be careful. |
indices |
(optional) A character vector indicating the population-level indices to compute. These indices can be:
By default, 'indices = c("Nb_ind", "A", "He", "Ho")'. |
An object of class data.frame whose rows correspond to populations and columns to population attributes (ID, size, genetic indices). By default, the first column corresponds to the population names (ID). The order of the columns depends on the vector 'indices'.
P. Savary
data(data_ex_genind)
x <- data_ex_genind
pop_names <- levels(x@pop)
df_pop_indices <- pop_gen_index(x = x, pop_names = pop_names, indices = c("Nb_ind", "A"))
Simulated dataset: 10 populations located on a simulated landscape.
pts_pop_ex
An object of class 'data.frame' with the following columns:
Population ID of the 10 populations
Site longitude (RGF93)
Site latitude (RGF93)
Landguth EL, Cushman SA (2010). “CDPOP: a spatially explicit cost distance population genetics program.” Molecular Ecology Resources, 10(1), 156–161. There are as many rows as there are sampled populations.
data("pts_pop_ex")
str(pts_pop_ex)
Simulated dataset: 50 populations located on a simulated landscape.
pts_pop_simul
An object of class 'data.frame' with the following columns:
Population ID of the 50 populations
Site longitude (RGF93)
Site latitude (RGF93)
Landguth EL, Cushman SA (2010). “CDPOP: a spatially explicit cost distance population genetics program.” Molecular Ecology Resources, 10(1), 156–161. There are as many rows as there are sampled populations.
data("pts_pop_simul")
str(pts_pop_simul)
The function converts a pairwise matrix into an edge-list data.frame.
pw_mat_to_df(pw_mat)
pw_mat |
A pairwise matrix which can be:
|
An object of class data.frame
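The conversion can be sketched in base R by keeping one value per unordered pair of a symmetric matrix (its upper triangle). The helper pw_to_edges() and its column names from, to and value are hypothetical and may not match the columns returned by pw_mat_to_df().

```r
# Hypothetical helper: symmetric pairwise matrix -> edge list, one row per pair
pw_to_edges <- function(pw_mat) {
  idx <- which(upper.tri(pw_mat), arr.ind = TRUE)  # (row, col) of each pair
  data.frame(from = rownames(pw_mat)[idx[, "row"]],
             to = colnames(pw_mat)[idx[, "col"]],
             value = pw_mat[idx],
             stringsAsFactors = FALSE)
}

m <- matrix(c(0, 1, 2,
              1, 0, 3,
              2, 3, 0), nrow = 3,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
edges <- pw_to_edges(m)  # pairs A-B, A-C, B-C
```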
P. Savary
data(data_tuto)
pw_mat <- data_tuto[[1]]
df <- pw_mat_to_df(pw_mat)
The function reorders the rows and columns of a symmetric matrix according to a specified order.
reorder_mat(mat, order)
mat |
An object of class |
order |
A character vector with the row and column names of the matrix
in the order in which the function will place them. All its elements
must be row and column names of the matrix 'mat'. |
The matrix mat must be symmetric and have row and column names. Its values are not modified.
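Because rows and columns share the same names, reordering a symmetric matrix amounts to indexing it by the same name vector on both dimensions. A base R sketch follows; reorder_sym() is a hypothetical helper whose checks mirror the requirements stated above, not the package's own code.

```r
# Hypothetical helper: reorder a named symmetric matrix by name
reorder_sym <- function(mat, order) {
  stopifnot(isSymmetric(mat), !is.null(rownames(mat)),
            length(order) == nrow(mat), setequal(order, rownames(mat)))
  mat[order, order]  # same permutation for rows and columns keeps symmetry
}

m <- matrix(c(0, 1, 2,
              1, 0, 3,
              2, 3, 0), nrow = 3,
            dimnames = list(c("C", "A", "B"), c("C", "A", "B")))
r <- reorder_sym(m, c("A", "B", "C"))
```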
A reordered symmetric matrix
P. Savary
mat <- matrix(rnorm(36), 6)
mat[lower.tri(mat)] <- t(mat)[lower.tri(mat)]
row.names(mat) <- colnames(mat) <- c("A", "C", "E", "B", "D", "F")
order <- c("A", "B", "C", "D", "E", "F")
mat <- reorder_mat(mat = mat, order = order)
The function plots scatterplots to visualize the relationship between genetic distance (or differentiation) and landscape distance (Euclidean distance, cost-distance, etc.) between populations or sample sites.
scatter_dist( mat_gd, mat_ld, method = "loess", thr_gd = NULL, thr_ld = NULL, se = TRUE, smooth_col = "black", pts_col = "#999999" )
mat_gd |
A symmetric |
mat_ld |
A symmetric |
method |
A character string indicating the smoothing method used to fit a line on the scatterplot. Possible values are the same as for the function 'geom_smooth()' from ggplot2: 'lm', 'glm', 'gam', 'loess' (default). |
thr_gd |
(optional) A numeric or integer value used to remove values
from the data before plotting. All genetic distance values above
|
thr_ld |
(optional) A numeric or integer value used to remove values
from the data before plotting. All landscape distance values above
|
se |
Logical (optional, default = TRUE) indicating whether the confidence interval around the smooth line is displayed. |
smooth_col |
(optional) A character string indicating the color used to plot the smoothing line (default: "black"). It must be a hexadecimal color code or a color name known to R (see colors()). |
pts_col |
(optional) A character string indicating the color used to plot the points (default: "#999999"). It must be a hexadecimal color code or a color name known to R (see colors()). |
IDs in mat_gd
and mat_ld
must be the same and refer
to the same sampling sites or populations, and both matrices must be ordered
in the same way.
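This requirement matters because each point of the scatterplot pairs the values found at the same position in both matrices. The base R sketch below makes that explicit; it is not the code of scatter_dist. The helpers align_to and pair_values are hypothetical, and the threshold handling assumes, as the argument descriptions suggest, that pairs with values above thr_gd or thr_ld are dropped.

```r
# Hypothetical helpers (not graph4lg code)
# Reorder a matrix so that its rows and columns follow the given IDs
align_to <- function(mat, ids) mat[ids, ids, drop = FALSE]

# Pair the values of two distance matrices, one row per pair of sites
pair_values <- function(mat_gd, mat_ld, thr_gd = NULL, thr_ld = NULL) {
  # Same IDs in the same order in both matrices
  stopifnot(identical(rownames(mat_gd), rownames(mat_ld)),
            identical(colnames(mat_gd), colnames(mat_ld)))
  keep <- lower.tri(mat_gd)  # each pair once, diagonal excluded
  df <- data.frame(gd = mat_gd[keep], ld = mat_ld[keep])
  # Drop pairs whose values exceed the optional thresholds
  if (!is.null(thr_gd)) df <- df[df$gd <= thr_gd, , drop = FALSE]
  if (!is.null(thr_ld)) df <- df[df$ld <= thr_ld, , drop = FALSE]
  df
}

ids <- c("P1", "P2", "P3")
gd <- matrix(c(0,   0.1, 0.4,
               0.1, 0,   0.2,
               0.4, 0.2, 0), nrow = 3, dimnames = list(ids, ids))
# Landscape distances for the same sites, listed in another order
ld_ids <- c("P3", "P1", "P2")
ld <- matrix(c(0,  30, 20,
               30, 0,  10,
               20, 10, 0), nrow = 3, dimnames = list(ld_ids, ld_ids))
all_pairs <- pair_values(gd, align_to(ld, ids))
close_pairs <- pair_values(gd, align_to(ld, ids), thr_gd = 0.3)
```

Calling pair_values(gd, ld) directly stops with an error because ld lists the sites in a different order; aligning it first with align_to(ld, ids) fixes that.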
The genetic distance matrix mat_gd can be computed using mat_gen_dist.
The landscape distance matrix mat_ld can be computed using mat_geo_dist when the landscape distance needed is a Euclidean geographical distance.
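For planar (projected) coordinates, a Euclidean distance matrix of this kind can be sketched with base R's dist(). This is only an illustration under that assumption, not mat_geo_dist itself, which may handle coordinate systems or units differently. The helper euclid_mat and the toy coordinates are hypothetical; the ID, x and y column names mirror the arguments used in the examples.

```r
# Hypothetical helper: pairwise Euclidean distances from planar x/y coordinates
euclid_mat <- function(df, id = "ID", x = "x", y = "y") {
  coords <- as.matrix(df[, c(x, y)])
  rownames(coords) <- df[[id]]
  # dist() stores the lower triangle; as.matrix() expands it to a full matrix
  as.matrix(dist(coords, method = "euclidean"))
}

# Toy coordinates (made up for illustration)
pts <- data.frame(ID = c("A", "B", "C"),
                  x = c(0, 3, 0),
                  y = c(0, 4, 8))
d <- euclid_mat(pts)
```

Here d["A", "B"] is 5 and d["A", "C"] is 8; the matrix is symmetric with zeros on its diagonal, and its row and column names come from the ID column.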
A ggplot2 object to plot
P. Savary
library(graph4lg)
data(data_tuto)
mat_dps <- data_tuto[[1]]
mat_dist <- suppressWarnings(mat_geo_dist(data = pts_pop_simul,
                                          ID = "ID", x = "x", y = "y"))
mat_dist <- mat_dist[order(as.character(row.names(mat_dist))),
                     order(as.character(colnames(mat_dist)))]
scatterplot_ex <- scatter_dist(mat_gd = mat_dps, mat_ld = mat_dist)
The function plots scatterplots of the relationship between two distances (often a genetic distance and a landscape distance between populations or sample sites), while highlighting the population pairs between which a link was kept when creating a graph whose nodes are populations (or sample sites). It thereby shows how intensely the graph was pruned.
scatter_dist_g( mat_y, mat_x, graph, thr_y = NULL, thr_x = NULL, pts_col_1 = "#999999", pts_col_2 = "black" )
mat_y |
A symmetric (complete) |
mat_x |
A symmetric (complete) |
graph |
A graph object of class |
thr_y |
(optional) A numeric or integer value used to remove values
from the data before plotting. All values from |
thr_x |
(optional) A numeric or integer value used to remove values
from the data before plotting. All values from |
pts_col_1 |
(optional) A character string indicating the color used to plot the points associated with all pairs of populations or sample sites (default: "#999999"). It must be a hexadecimal color code or a color name known to R (see colors()). |
pts_col_2 |
(optional) A character string indicating the color used to plot the points associated with pairs of populations or sample sites connected on the graph (default: "black"). It must be a hexadecimal color code or a color name known to R (see colors()). |
IDs in mat_y
and mat_x
must be the same and refer
to the same sampling sites or populations, and both matrices must be ordered
in the same way.
Matrices of genetic distance can be computed using mat_gen_dist.
Matrices of landscape distance can be computed using
mat_geo_dist
when the landscape distance needed is a
Euclidean geographical distance.
This function is based on the scatter_dist function.
A ggplot2 object to plot
P. Savary
library(graph4lg)
data(data_tuto)
mat_gen <- data_tuto[[1]]
mat_dist <- suppressWarnings(mat_geo_dist(data = pts_pop_simul,
                                          ID = "ID", x = "x", y = "y"))
mat_dist <- mat_dist[order(as.character(row.names(mat_dist))),
                     order(as.character(colnames(mat_dist)))]
x <- gen_graph_topo(mat_w = mat_gen, mat_topo = mat_dist, topo = "gabriel")
scat <- scatter_dist_g(mat_y = mat_gen, mat_x = mat_dist, graph = x)
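The highlighting done by scatter_dist_g boils down to flagging, for each pair of sites, whether the graph links them. The sketch below assumes the graph is given as a symmetric 0/1 adjacency matrix sharing the row and column order of the distance matrices; flag_connected is a hypothetical helper, not the package's code. With igraph, such a matrix can be obtained with as_adjacency_matrix(graph, sparse = FALSE), after checking that the vertex order matches.

```r
# Hypothetical helper: pair two distance matrices and flag linked pairs.
# adj is assumed to be a symmetric 0/1 adjacency matrix of the graph.
flag_connected <- function(mat_y, mat_x, adj) {
  # All three matrices must list the same sites in the same order
  stopifnot(identical(dimnames(mat_y), dimnames(mat_x)),
            identical(dimnames(mat_y), dimnames(adj)))
  keep <- lower.tri(mat_y)  # each pair once, diagonal excluded
  data.frame(y = mat_y[keep],
             x = mat_x[keep],
             connected = adj[keep] != 0)
}

ids <- c("A", "B", "C")
gen <- matrix(c(0,   0.2, 0.5,
                0.2, 0,   0.3,
                0.5, 0.3, 0), nrow = 3, dimnames = list(ids, ids))
geo <- matrix(c(0, 5, 8,
                5, 0, 5,
                8, 5, 0), nrow = 3, dimnames = list(ids, ids))
# Toy graph with links A-B and B-C only
adj <- matrix(c(0, 1, 0,
                1, 0, 1,
                0, 1, 0), nrow = 3, dimnames = list(ids, ids))
res <- flag_connected(gen, geo, adj)
# Colors mirroring the defaults of pts_col_1 and pts_col_2
cols <- ifelse(res$connected, "black", "#999999")
```

The points could then be drawn with, for instance, plot(res$x, res$y, col = cols, pch = 19), so that the pairs kept in the graph stand out from all the others.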
The function converts a text file in STRUCTURE format into a genind object for use in R.
structure_to_genind( path, pop_names = NULL, loci_names = NULL, ind_names = NULL )
path |
A character string indicating the path to the STRUCTURE file in .txt format, or alternatively the name of the file in the working directory. The STRUCTURE file must only have:
The row with the loci names is optional but recommended. Each individual spans 2 rows. |
pop_names |
(optional) A character vector indicating the population names in the same order as in the STRUCTURE file. It is of the same length as the number of populations. Without this argument, populations are numbered from 1 to the total number of populations. |
loci_names |
A character vector with the names of the loci, if they are not given in the first row of the file. This argument is mandatory if the STRUCTURE file does not include the loci names in its first row. Otherwise, the loci names are extracted from the first row of the file. |
ind_names |
(optional) A character vector indicating the individual names in the same order as in the STRUCTURE file. It is of the same length as the number of individuals. Without this argument, individuals are numbered from 1 to the total number of individuals. |
The column order of the resulting object can differ from that of objects returned by gstud_to_genind and genepop_to_genind, depending on allele and loci coding.
This function uses functions from the pegas package.
For details about the STRUCTURE file format, see the STRUCTURE user manual.
An object of type genind.
P. Savary
library(graph4lg)
data("data_ex_genind")
loci_names <- levels(data_ex_genind@loc.fac)
pop_names <- levels(data_ex_genind@pop)
ind_names <- row.names(data_ex_genind@tab)
path_in <- system.file('extdata', 'data_ex_str.txt', package = 'graph4lg')
file_n <- file.path(tempdir(), "data_ex_str.txt")
file.copy(path_in, file_n, overwrite = TRUE)
str <- structure_to_genind(path = file_n, loci_names = loci_names,
                           pop_names = pop_names, ind_names = ind_names)
file.remove(file_n)
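Since each individual spans two rows in a STRUCTURE file, reading it comes down to pairing consecutive rows into diploid genotypes. The sketch below illustrates that idea on a simplified layout that is an assumption: a first line with locus names, then whitespace-separated rows holding the individual name, the population and one allele per locus. It ignores missing data and the other options of the real format, and it stops at a data.frame instead of building a genind object; read_structure_sketch is a hypothetical helper.

```r
# Hypothetical parser for a simplified, assumed STRUCTURE-like layout:
# line 1 = locus names; then two rows per individual, each holding
# <individual> <population> <one allele per locus>
read_structure_sketch <- function(lines) {
  loci <- strsplit(trimws(lines[1]), "\\s+")[[1]]
  rows <- strsplit(trimws(lines[-1]), "\\s+")
  stopifnot(length(rows) %% 2 == 0)
  first <- rows[c(TRUE, FALSE)]   # odd rows: first allele copy
  second <- rows[c(FALSE, TRUE)]  # even rows: second allele copy
  ind <- vapply(first, `[`, character(1), 1)
  pop <- vapply(first, `[`, character(1), 2)
  # Both rows of a pair must belong to the same individual
  stopifnot(identical(ind, vapply(second, `[`, character(1), 1)))
  # Join the two allele copies of every locus as "a/b"
  geno <- do.call(rbind, Map(function(a, b) {
    paste(a[-(1:2)], b[-(1:2)], sep = "/")
  }, first, second))
  colnames(geno) <- loci
  data.frame(ind = ind, pop = pop, geno,
             stringsAsFactors = FALSE, check.names = FALSE)
}

txt <- c("loc1 loc2",
         "ind1 pop1 101 205",
         "ind1 pop1 103 205",
         "ind2 pop2 101 207",
         "ind2 pop2 101 209")
str_df <- read_structure_sketch(txt)
```

The locus columns of such a data.frame could, in principle, be passed to adegenet's df2genind() with sep = "/", although structure_to_genind itself uses functions from pegas.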