
In this paper, a graph is denoted as $G=(V, E)$, which consists of a node set $V=\{r_1, r_2, \cdots, r_N\}$ and an edge set $E$. A directed graph contains only directed edges; an undirected graph contains only undirected edges. A graph is usually represented by an adjacency matrix $M_G$. For a directed graph, there is an edge from node $r_i$ to node $r_j$ if and only if the element $m_{ji}$ in the $j$th row and $i$th column is nonzero. For an undirected graph, the adjacency matrix is symmetric, and a nonzero element represents an undirected edge. Given a directed graph, its skeleton is the undirected graph obtained by stripping away all arrowheads from its edges and removing redundant undirected edges. If there is a directed edge from node $r_i$ to node $r_j$, then $r_i$ is a parent of $r_j$ and $r_j$ is a child of $r_i$. If there is a directed path from node $r_i$ to node $r_j$, then $r_i$ is an ancestor of $r_j$ and $r_j$ is a descendant of $r_i$.

Given a directed acyclic graph (DAG) $G=(V, E)$, each node denotes a random variable and each directed edge represents a causal effect from one random variable to another. A causal model represented by $G$ generates a joint probability distribution over the nodes, denoted as $P(V)$.

Definition 1. For a DAG $G=(V, E)$ and a distribution $P(V)$ generated by the causal model represented by $G$, $G$ and $P$ satisfy the causal Markov condition if and only if for every node $r\in V$, $r$ is independent of $V \setminus (Descendants(r)\cup Parents(r))$ given $Parents(r)$^{[23]}.

When the causal Markov condition is satisfied, the conditional independence relationships in the generated distribution $P$ are entailed in the DAG $G$ via the concept of d-separation. For a 3-node path $r_1\rightarrow r_2 \leftarrow r_3$, the middle node $r_2$ is called a collider.

Definition 2. A path is said to be d-separated by a set $Z$ if and only if it contains at least one non-collider that is in $Z$, or at least one collider such that neither it nor any of its descendants is in $Z$. Two nodes are said to be d-separated by a set $Z$ if and only if all paths between the two nodes are d-separated by $Z$^{[23]}.

If two nodes $r_i$ and $r_j$ in a DAG $G$ are d-separated by a node set $Z$, then $r_i$ and $r_j$ are conditionally independent in the distribution $P$ given $Z$. There is no edge between nodes $r_i$ and $r_j$ if and only if they can be d-separated by some node set. This means that the nodes $r_i$ and $r_j$ are conditionally independent in the distribution $P$ given some node set if there is no edge in the DAG $G$ between the two nodes. However, the reverse statement is not true. For some DAG $G$ and distribution $P$ that satisfy the causal Markov condition, the conditional independence between two nodes $r_i$ and $r_j$ does not necessarily imply the absence of an edge between $r_i$ and $r_j$. In order to exclude these cases, two equivalent concepts, stability^{[26]} and faithfulness^{[23]}, are introduced.

Definition 3. A distribution $P$ is faithful to a DAG $G$ if all and only the conditional independence relations in $P$ are entailed by $G$, i.e., under the faithfulness assumption, there is no edge between two nodes $r_i$ and $r_j$ in $G$ if and only if $r_i$ and $r_j$ are conditionally independent in $P$ given some node set^{[23]}.

For the details of Bayesian networks and causal Bayesian networks, please refer to the two seminal texts^{[23, 26]}. For applications of Bayesian networks in neuroscience, refer to two recent reviews^{[27, 28]}.
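The collider case in Definition 2 can be checked numerically: in the path $r_1\rightarrow r_2 \leftarrow r_3$, the nodes $r_1$ and $r_3$ are d-separated by the empty set but not by $\{r_2\}$, so they are marginally independent yet become dependent once $r_2$ is controlled for. A minimal linear-Gaussian simulation (coefficients are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100_000

# Collider structure r1 -> r2 <- r3: r1 and r3 have no common cause,
# but both feed into r2.
x1 = rng.normal(size=T)
x3 = rng.normal(size=T)
x2 = x1 + x3 + 0.5 * rng.normal(size=T)

def partial_corr(a, b, z):
    """Correlation of a and b after regressing out the columns of z."""
    z = np.column_stack([np.ones(len(a)), z])
    ra = a - z @ np.linalg.lstsq(z, a, rcond=None)[0]
    rb = b - z @ np.linalg.lstsq(z, b, rcond=None)[0]
    return np.corrcoef(ra, rb)[0, 1]

marginal = np.corrcoef(x1, x3)[0, 1]             # near 0: d-separated by {}
conditional = partial_corr(x1, x3, x2[:, None])  # strongly negative: path opened

print(f"corr(x1, x3)      = {marginal:+.3f}")
print(f"corr(x1, x3 | x2) = {conditional:+.3f}")
```

Conditioning on the collider induces the well-known "explaining away" dependence; for these coefficients the conditional correlation is approximately $-0.8$.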

In this paper, we assume the mechanism that generates BOLD signals can be represented by a DAG
$G=(V, E)$, where $V=\{r_1, r_2, \cdots, r_N\}$. The node $r_i$ denotes the random variable of the $i$th region of interest (ROI). The random variable $r_i$ generates $T$ samples of the BOLD signal of the $i$th ROI, denoted as $X_i=\{x_i^1, x_i^2, \cdots, x_i^T\}$. All random variables generate the same number of samples. The Pearson's correlation coefficient $\rho_{i, j}$ between $r_i$ and $r_j$ can be estimated from samples as follows:

\begin{equation} \hat{\rho}_{i, j} = \frac{\sum\limits_{t=1}^{T}(x_i^t - \bar{x}_i)(x_j^t - \bar{x}_j)}{\sqrt{\sum\limits_{t=1}^{T}(x_i^t - \bar{x}_i)^2 \sum\limits_{t=1}^{T}(x_j^t - \bar{x}_j)^2}} \end{equation}

(1)

where $\bar{x}_i$ and $\bar{x}_j$ are the sample means of random variables $r_i$ and $r_j$. The Pearson's correlation coefficient is also called full correlation.

Full correlation counts both direct and indirect effects between two random variables. To remove indirect effects, we can calculate the second order correlation between
$r_i$ and $r_j$ by removing the effects of a node set $Z \subseteq (V\setminus\{r_i, r_j\})$. This second order correlation is called the partial correlation $\rho_{i, j\cdot Z}$ obtained by controlling the node set $Z$. Similarly, partial correlation can be estimated from samples:

\begin{equation} \hat{\rho}_{i, j\cdot Z} = \frac{\sum\limits_{t=1}^{T} (\varepsilon_{i\cdot Z}^t - \bar{\varepsilon}_{i\cdot Z}) (\varepsilon_{j\cdot Z}^t - \bar{\varepsilon}_{j\cdot Z})} {\sqrt{\sum\limits_{t=1}^{T}(\varepsilon_{i\cdot Z}^t - \bar{\varepsilon}_{i\cdot Z})^2 \sum\limits_{t=1}^{T} (\varepsilon_{j\cdot Z}^t - \bar{\varepsilon}_{j\cdot Z})^2}} \end{equation}

(2)

where $\{\varepsilon_{i \cdot Z}^t\}$ is the set of residuals of the linear regression with $r_i$ as the response variable and $Z$ as the predictor variable set, and likewise for $\{\varepsilon_{j \cdot Z}^t\}$. The sample partial correlation is thus the sample full correlation of the residuals after the controlling factors have been regressed out.

The partial correlation between two nodes varies with the controlling set. For a node set
$V$, the minimum partial correlation (MPC) between two nodes is the minimum of the absolute values of the partial correlations obtained by controlling all possible subsets of the other nodes, which is formally defined as follows.

Definition 4. Given two nodes $r_i$ and $r_j$ in a node set $V$, the minimum partial correlation $\omega_{i, j}$ between the two nodes is

\begin{equation} \omega_{i, j} = \min \{|\rho_{i, j\cdot Z}| \mid Z \in 2^{(V\setminus\{r_i, r_j\})}\} \end{equation}

(3)

where $\rho_{i, j\cdot Z}$ is the partial correlation between $r_i$ and $r_j$ obtained by controlling the node set $Z$, and $2^{(V\setminus\{r_i, r_j\})}$ represents all subsets of the set $V\setminus\{r_i, r_j\}$. A controlling set $Z$ that satisfies $|\rho_{i, j\cdot Z}|=\omega_{i, j}$ is called a minimum controlling set for nodes $r_i$ and $r_j$.

Next, we will discuss an important property of minimum partial correlation that implies that it is a good measure of the existence of functional connectivity.
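For small node sets, Definition 4 can be computed by brute force: regress out each candidate controlling set, correlate the residuals as in (2), and take the minimum absolute value over all subsets. A minimal numpy sketch (function names are ours, not from the paper):

```python
import numpy as np
from itertools import combinations

def partial_corr(X, i, j, Z):
    """Sample partial correlation of columns i and j of X (T x N),
    controlling the columns in Z via linear-regression residuals."""
    if len(Z) == 0:
        return np.corrcoef(X[:, i], X[:, j])[0, 1]
    P = np.column_stack([np.ones(X.shape[0]), X[:, list(Z)]])
    res_i = X[:, i] - P @ np.linalg.lstsq(P, X[:, i], rcond=None)[0]
    res_j = X[:, j] - P @ np.linalg.lstsq(P, X[:, j], rcond=None)[0]
    return np.corrcoef(res_i, res_j)[0, 1]

def min_partial_corr(X, i, j):
    """Minimum partial correlation (Definition 4) by brute force:
    minimum of |rho_{i,j.Z}| over all subsets Z of the other nodes."""
    others = [k for k in range(X.shape[1]) if k not in (i, j)]
    return min(abs(partial_corr(X, i, j, Z))
               for r in range(len(others) + 1)
               for Z in combinations(others, r))
```

The enumeration over all $2^{N-2}$ subsets per pair is exactly what makes the exact computation intractable for large $N$, motivating the approximation discussed later in the paper.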
Proposition 1. Assuming the distribution
$P$ of the nodes $V$ in a DAG $G$ is multivariate Gaussian and faithful to $G$, there is an edge between two nodes $r_i$ and $r_j$ in $G$ if and only if the minimum partial correlation $\omega_{i, j} \neq 0$.

According to [29], the partial correlation $\rho_{i, j \cdot Z}=0$ if and only if $r_i$ and $r_j$ are conditionally independent given $Z$ under the Gaussian assumption. Based on this result and Definition 3, Proposition 1 can be easily proved.

According to Proposition 1, minimum partial correlation is a measure of the existence of functional connectivity (i.e., of the links in functional networks). Because minimum partial correlation is symmetric, it cannot infer the directions of functional connectivity. According to Definition 4, minimum partial correlation can be calculated based only on partial correlations. Thus, minimum partial correlation can be estimated from samples, and this estimate is denoted as
$\hat{\omega}_{i, j}$.

We denote the cardinality of the controlling node set $Z$ as $|Z|$. According to [30], the distribution of the estimated partial correlation $\hat{\rho}_{i, j\cdot Z}$ is the same as that of a full correlation estimated from $T-|Z|$ samples. Under the hypothesis that $\rho_{i, j\cdot Z}=0$, the $z$-score $\tilde{\rho}_{i, j\cdot Z}$ of the estimated partial correlation $\hat{\rho}_{i, j\cdot Z}$ can be calculated as follows:

\begin{equation} \label{eq:zscore} \tilde{\rho}_{i, j\cdot Z} = \frac{1}{2}\log\Big(\frac{1+\hat{\rho}_{i, j\cdot Z}}{1-\hat{\rho}_{i, j\cdot Z}}\Big)\sqrt{T-|Z|-3}. \end{equation}

(4)

However, the distribution of the estimated minimum partial correlation is unknown. In this paper, we approximate the absolute value of the $z$-score $\tilde{\omega}_{i, j}$ of the estimated minimum partial correlation $\hat{\omega}_{i, j}$ as

\begin{equation} \tilde{\omega}_{i, j} = \min \{|\tilde{\rho}_{i, j\cdot Z}| \mid Z \in 2^{(V\setminus\{r_i, r_j\})}\}. \end{equation}
(5)

Given the BOLD signal samples from all ROIs of a brain, we can calculate the estimated value of the minimum partial correlation, or the absolute value of its $z$-score, for each pair of ROIs. These values form a symmetric matrix $\Omega$. The matrix $\Omega$ represents the skeleton of functional connectivity in a fractional way. Given a directed graph $G$, its skeleton is the undirected graph obtained by stripping away all arrowheads from its links and removing redundant undirected links. A pair of ROIs with a higher estimated minimum partial correlation value, or a higher absolute value of the $z$-score, is more likely to have a link between them. The matrix $\Omega$ is therefore a parameter-free measure of functional connectivity, which can be used for decoding cognitive states, evaluating brain disease, etc.

There are three advantages of parameter-free methods for inferring functional connectivity. The most obvious one is that parameter-free methods are user-friendly: prior knowledge and experience are not required. A second advantage is that parameter-free methods increase the comparability of results from different studies. If a method has tuning parameters, the results usually vary with the values of those parameters. Even if all experimental conditions are identical in two studies, it is not valid to compare the two results if different parameter values are used. A third advantage is that the interpretability of parameter-free methods is usually better than that of other methods. Taking ICOV as an example, the physiological meaning of a connection that exists under some values of the regularization parameter but not under others is hard to explain.
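Equation (4) is the Fisher transformation of the estimated partial correlation, scaled by the effective sample size $T-|Z|-3$. It can be written directly (the function name is ours):

```python
import numpy as np

def zscore_partial_corr(rho_hat, T, card_Z):
    """z-score of an estimated partial correlation under the null
    rho = 0, per (4): Fisher transform times sqrt(T - |Z| - 3)."""
    z = 0.5 * np.log((1 + rho_hat) / (1 - rho_hat))
    return z * np.sqrt(T - card_Z - 3)
```

The resulting $|z|$ is approximately standard normal under the null, so it can be compared against the critical value $N^{-1}(1-\frac{\alpha}{2})$, which is how the edge-deletion test in the algorithms of this paper works.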

Partial correlation controlling a given node set can be estimated in polynomial time. However, estimating minimum partial correlation requires enumerating all possible controlling sets, and the number of such sets is exponential. Several algorithms for causal inference can be easily modified to calculate minimum partial correlation, such as the SGS algorithm^{[23]}, the PC-algorithm^{[23]}, TPDA^{[31]}, MMHC^{[32]} and TPMB^{[33]}.
This paper focuses on extending the PC-algorithm to approximately calculate minimum partial correlation. The PC-algorithm is composed of two main steps: skeleton graph inference and edge direction determination. Since this paper aims at generating functional connectivity without edge orientations, we concentrate on the first step of the PC-algorithm.
The PC-algorithm uses an important feature of Bayesian networks to achieve fast computation: if two nodes $r_i$ and $r_j$ can be d-separated, then there is a subset of their neighbours that d-separates them. Therefore, there is no need to enumerate all possible controlling sets; instead, the PC-algorithm only considers subsets of the neighbour sets. Because the structure of the graph is not known in advance, the PC-algorithm implements an iterative strategy: each step in the PC-algorithm works on the graph reduced by the previous steps. The edges of the graph are removed according to a statistical test and a significance threshold. The more edges are removed, the smaller the search space becomes. However, if an edge is deleted incorrectly, the search space is improperly reduced, and such mistakes propagate to the following steps.

The edge deletion is highly dependent on the choice of the significance threshold $\alpha$. The null hypothesis of the statistical test in the PC-algorithm is that there is no edge between the selected two nodes. With a lower significance threshold $\alpha$, it is less likely that the null hypothesis is rejected. In other words, edges are more likely to be removed, which increases the type II error. To increase the accuracy of edge removal at each step, a larger $\alpha$ is required. However, increasing $\alpha$ increases the calculation time, as more edges are retained. Therefore, the significance threshold $\alpha$ balances the accuracy of the result against the complexity of the computation.

It is difficult to determine $\alpha$ optimally, because the accuracy and execution time for a given $\alpha$ cannot be predicted in advance. A common choice of $\alpha$ is 0.05. We can first set $\alpha$ to 0.05 and gradually increase its value to obtain further results with higher accuracy. This avoids a situation in which the PC-algorithm runs for an undesirably long time while evaluating a large choice of $\alpha$. For example, suppose we would like to choose $\alpha$ from the set $\{0.05, 0.1, 0.15, 0.2, 0.25\}$. We first run the PC-algorithm with $\alpha=0.05$ and then with larger values. If the algorithm keeps running for a long time when $\alpha=0.2$, we can simply record the results for $\alpha=0.15$, as they are the most accurate results that can be obtained within an acceptable execution time.

Although it has been demonstrated that the above approach achieves satisfactory accuracy for inferring functional connectivity in simulated fMRI datasets^{[34]}, it brings in extra computational load, as the results at lower significance thresholds are not reused in the calculations at higher significance thresholds. To save computational effort, this paper proposes the elastic PC-algorithm, borrowing the concept of elasticity from cloud computing^{[35]}. Our method automatically increases the significance threshold within a given time budget in order to maximally approach the minimum partial correlation. More specifically, an execution of the elastic PC-algorithm includes several iterations. The execution starts from an iteration with a low significance threshold and continues with iterations at higher thresholds. The execution stops when the given time budget is used up. An iteration with a higher significance threshold can exploit the results of the iterations with lower significance thresholds in order to avoid repeated calculation of the same partial correlations in different iterations. An iteration of the elastic PC-algorithm is shown in Algorithm 1 and the execution process is shown in Algorithm 2.
Algorithm 1. An iteration of the elastic PC-algorithm

Input: signal samples $\{x_{i}^{t} \mid t\in[1:T], i\in[1:N]\}$, significance threshold $\alpha$, previous significance threshold $\beta$, previous scube ${D}$ for $\beta$

Output: scube ${C}$ for $\alpha$

1) for each ordered node pair $(r_i, r_j)$ do
2) ${C}(i, j, 0:(N-2)) \gets |\tilde{\rho}_{i, j}|$
3) end for
4) for $k=1:(N-2)$ do
5) initialize the reference skeletons ${S}_{\alpha}$ and ${S}_{\beta}$ as complete undirected graphs
6) for each unordered node pair $(r_i, r_j)$ do
7) if ${C}(i, j, k-1) \leq N^{-1}(1-\frac{\alpha}{2})$ then
8) delete the undirected edge $(r_i, r_j)$ from the reference skeleton ${S}_{\alpha}$
9) end if
10) if ${D}(i, j, k-1) \leq N^{-1}(1-\frac{\beta}{2})$ then
11) delete the undirected edge $(r_i, r_j)$ from the reference skeleton ${S}_{\beta}$
12) end if
13) end for
14) ${C}(:, :, k) \gets {D}(:, :, k)$
15) for each ordered node pair $(r_i, r_j)$ that satisfies $|adj({S}_{\alpha}, r_i) \backslash \{r_j\}| \geq k$ do
16) for each controlling node set $Z \subseteq adj({S}_{\alpha}, r_i) \backslash \{r_j\}$ that satisfies $|Z|=k$ and $Z \cup \{r_j\} \not\subseteq adj({S}_{\beta}, r_i)$ do
17) if ${C}(i, j, k) > |\tilde{\rho}_{i, j \cdot Z}|$ then
18) ${C}(i, j, k:(N-2)) \gets |\tilde{\rho}_{i, j \cdot Z}|$
19) ${C}(j, i, k:(N-2)) \gets |\tilde{\rho}_{i, j \cdot Z}|$
20) end if
21) end for
22) end for
23) end for
Algorithm 2. Elastic PC-algorithm (EPC)

Input: signal samples $\{x_{i}^{t} \mid t\in[1:T], i\in[1:N]\}$, step $\delta$

Output: scube ${C}$

1) for each ordered node pair $(r_i, r_j)$ do
2) ${C}(i, j, 0:(N-2)) \gets |\tilde{\rho}_{i, j}|$
3) end for
4) $\beta = 0$
5) $\alpha = \beta + \delta$
6) while remaining time budget $> 0$ do
7) ${C} \gets$ Call Algorithm 1 $\left(\{ x_{i}^{t} \}, \alpha, \beta, {C}\right)$
8) $\beta \gets \alpha$
9) $\alpha \gets \alpha + \delta$
10) end while
The elastic PC-algorithm is based on the PC-stable algorithm^{[36]}, which is a recent variant of the PC-algorithm. The symbol $N^{-1}(\cdot)$ represents the inverse of the standard normal cumulative distribution function (CDF). The $z$-score of the full correlation $\tilde{\rho}_{i, j}$ and the $z$-score of the partial correlation $\tilde{\rho}_{i, j \cdot Z}$ can be calculated according to (4). The symbol $adj(G, r_i)$ indicates the neighbour set of node $r_i$ in graph $G$.

In the elastic PC-algorithm, we introduce a 3D matrix that contains the minimum partial correlation values at different steps. We call this matrix the scube for short. In the algorithm, there are two scubes ${C}$ and ${D}$, for the significance thresholds $\alpha$ and $\beta$ respectively. As there are $N$ nodes in the graph $G$ and $k$ cannot exceed $|adj(G, r_i)\setminus \{r_j\}|$ for any pair of nodes $(r_i, r_j)$, the maximum value of $k$ is $N-2$. Thus, $k$, which indicates the size of the controlling sets, can vary from $0$ to $N-2$. The dimension of the scube is $N\times N \times (N-1)$, where the $k$th slice ${C}(:, :, k)$ contains the minimum partial correlations calculated in Step $k$.

The inputs of the elastic PC-algorithm include the signal samples, the significance threshold $\alpha$, the previous significance threshold $\beta$ and its corresponding scube ${D}$. At the beginning, the scube ${C}$ is initialized using the full correlations between all node pairs. In the $k$th iteration, the reference skeletons ${S}_{\alpha}$ and ${S}_{\beta}$ are derived using the $(k-1)$th slices of the scubes ${C}$ and ${D}$ respectively. A reference skeleton is used as an approximation of the skeleton of the graph for calculating the neighbour sets of nodes. Afterwards, the $k$th slice of ${C}$ is calculated. For an ordered pair $(r_i, r_j)$ that is adjacent in ${S}_{\alpha}$ but not adjacent in ${S}_{\beta}$, the minimum partial correlation must be calculated, as it is not included in ${D}$. For an ordered pair $(r_i, r_j)$ that is adjacent in both ${S}_{\alpha}$ and ${S}_{\beta}$, some steps of the minimum partial correlation calculation have already been done in ${D}$ and can be directly reused. That is, if a subset of neighbours $Z$ for $(r_i, r_j)$ in ${S}_{\alpha}$ is also a subset of neighbours for $(r_i, r_j)$ in ${S}_{\beta}$, there is no need to recalculate the partial correlation $\tilde{\rho}_{i, j \cdot Z}$. In this way, computational effort is saved by avoiding repeated calculations. The estimated $z$-score matrix $\tilde{\Omega}$ of minimum partial correlation for significance threshold $\alpha$ is ${C}(:, :, N-2)$. As shown in Algorithm 2, Algorithm 1 is repeatedly executed with increasing significance thresholds within the given time budget. An example of the elastic PC-algorithm is illustrated in Fig. 4.

Figure 4. An example of the elastic PC-algorithm. (a) and (b) display the reference skeletons ${S}_{\alpha}$ and ${S}_{\beta}$ in Algorithm 1, satisfying $\alpha>\beta$. The table in (c) shows the calculations required at Step 1 with the significance threshold $\alpha$. The column "need calculation?" indicates whether the corresponding partial correlation needs to be calculated or can be directly obtained from ${D}$. For the node pairs $(r_2, r_3)$, $(r_2, r_5)$ and $(r_3, r_4)$, whose edges are shown in red in (a), all partial correlations need to be calculated, as ${S}_{\beta}$ in (b) does not contain the edges between them. For the node pair $(r_1, r_2)$, the partial correlation on the set $\{r_5\}$ needs to be calculated, because in ${S}_{\beta}$, $\{r_5\}$ is not a subset of the neighbours of $(r_1, r_2)$.
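To make the structure of the skeleton phase concrete, here is a simplified, single-threshold sketch of the PC-stable skeleton search (no scube reuse, so it does not show the elastic part; helper names are ours):

```python
import numpy as np
from itertools import combinations
from statistics import NormalDist

def partial_corr(X, i, j, Z):
    """Sample partial correlation via linear-regression residuals."""
    P = np.column_stack([np.ones(X.shape[0])] + [X[:, z] for z in Z])
    ri = X[:, i] - P @ np.linalg.lstsq(P, X[:, i], rcond=None)[0]
    rj = X[:, j] - P @ np.linalg.lstsq(P, X[:, j], rcond=None)[0]
    return np.corrcoef(ri, rj)[0, 1]

def fisher_z(rho, T, k):
    """z-score of a partial correlation per (4), with |Z| = k."""
    return 0.5 * np.log((1 + rho) / (1 - rho)) * np.sqrt(T - k - 3)

def pc_stable_skeleton(X, alpha=0.05):
    """Skeleton phase of PC-stable with one fixed threshold: delete
    edge (i, j) once some size-k subset Z of the current neighbours
    of i makes the conditional-independence test insignificant."""
    T, N = X.shape
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    adj = {i: set(range(N)) - {i} for i in range(N)}
    for k in range(N - 1):                       # k = 0, ..., N-2
        frozen = {i: set(adj[i]) for i in adj}   # PC-stable: freeze per level
        for i in range(N):
            for j in list(adj[i]):
                for Z in combinations(frozen[i] - {j}, k):
                    z = fisher_z(partial_corr(X, i, j, Z), T, k)
                    if abs(z) <= crit:
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
    return adj
```

The elastic variant differs in that it stores the running minimum $|z|$-scores in the scube and reuses slices computed at the previous threshold, rather than retesting every subset at every threshold.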

In this section, the elastic PC-algorithm was evaluated using a synthetic dataset, NetSim^{[6]}. Its application is illustrated using a resting-state fMRI dataset from the human connectome project^{[24]}. Results of the elastic PC-algorithm were compared with full correlation, fully partial correlation, regularized inverse covariance (ICOV)^{[16, 17]}, network deconvolution (ND)^{[37]} and global silencing (GS)^{[38]}. A brief description of these methods is as follows:
1) Full correlation (Full): The full correlation of two ROIs $r_i$ and $r_j$ can be calculated using (1).

2) Fully partial correlation (FP): The fully partial correlation of two ROIs $r_i$ and $r_j$ can be calculated using (2) by setting $Z=V\backslash\{r_i, r_j\}$, where $V$ is the set of all ROIs.

3) ICOV: Let ${\mathrm{\Sigma}}_{\mathrm{sample}}$ be the sample covariance matrix calculated according to (1). The ICOV method finds an optimal matrix $\mathrm{\Lambda}$ according to $\mathrm{\Lambda}={\mathrm{argmin}}_{\mathrm{\Lambda}\succ 0}\ \mathrm{tr}(\mathrm{\Lambda}{\mathrm{\Sigma}}_{\mathrm{sample}}) - \log\det\mathrm{\Lambda} + \lambda {\left\|\mathrm{\Lambda}\right\|}_{l_1}$, where ${\left\|\cdot\right\|}_{l_1}$ is the element-wise $l_1$ norm of the elements of the matrix $\mathrm{\Lambda}$. ICOV is a parameter-dependent method, and the value of the regularization parameter $\lambda$ greatly influences the results. For the simulation dataset, the regularization parameter $\lambda$ was set to 5 and 100, as suggested by the designers of this dataset^{[6]}. For the real dataset, a cross-validation method was used to tune the regularization parameter $\lambda$ as described in [17], although we do not think this cross-validation method is appropriate.

4) ND: Let ${\mathrm{\Sigma}}_{\mathrm{sample}}$ be the sample covariance matrix calculated according to (1). In ND, ${\mathrm{\Sigma}}_{\mathrm{sample}}$ is modelled as the sum of the direct dependence $\mathrm{\Lambda}$ and the indirect dependence due to indirect paths. $\mathrm{\Lambda}$ is derived from $\mathrm{\Lambda}={\mathrm{\Sigma}}_{\mathrm{sample}}{(I+{\mathrm{\Sigma}}_{\mathrm{sample}})}^{-1}$, where $I$ is an identity matrix.

5) GS: In the GS method, the relationship between $\mathrm{\Lambda}$ and ${\mathrm{\Sigma}}_{\mathrm{sample}}$ is approximately modelled as $\mathrm{\Lambda}=({\mathrm{\Sigma}}_{\mathrm{sample}}-I+\varphi(({\mathrm{\Sigma}}_{\mathrm{sample}}-I){\mathrm{\Sigma}}_{\mathrm{sample}})){\mathrm{\Sigma}}_{\mathrm{sample}}^{-1}$, where $\varphi(\cdot)$ sets the off-diagonal terms of a matrix to zero.
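As an implementation note on the FP baseline: rather than running one regression per pair as in (2), the fully partial correlations of all pairs can be obtained at once from the precision matrix $P=\Sigma^{-1}$ via the standard identity $\rho_{i, j\cdot V\setminus\{r_i, r_j\}} = -P_{ij}/\sqrt{P_{ii}P_{jj}}$. A sketch:

```python
import numpy as np

def fully_partial_corr(X):
    """Fully partial correlation matrix of X (T x N) from the
    precision matrix: rho_{ij.rest} = -P_ij / sqrt(P_ii * P_jj)."""
    P = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return R
```

This is also the quantity that ICOV estimates implicitly, since the sparse matrix $\mathrm{\Lambda}$ it returns is a regularized precision matrix.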
The NetSim dataset^{[6]} designed by Smith et al. was used to evaluate the accuracy of our method. Because this paper focuses on skeletons, we did not evaluate direction inference. The BOLD signals were generated using DCM with the nonlinear balloon model for the "vascular dynamics"^{[7]}. The dynamics of the signals in the neural network was defined as $\dot{s}=\sigma \mathrm{\Phi} s+\mathrm{\Psi} u$, where $s$ is the neural time series, $\dot{s}$ is its rate of change, $u$ is the external inputs, $\mathrm{\Psi}$ contains the weights controlling how the external inputs feed into the network, $\mathrm{\Phi}$ defines the network connections between nodes with the diagonal elements set to $-1$, and $\sigma$ controls both the within-node temporal inertia and the neural lag between nodes. The neural signals were then fed into the nonlinear balloon model for vascular dynamics to generate the BOLD signals. The values of the balloon model parameters were almost identical to the prior means used in [7].

All network topologies were constructed from the same building block: a 5-node ring. The ring was a variation of a chain, whose head is linked to its tail. Networks with more than 5 nodes were built by linking several 5-node rings together. 28 networks were designed for different purposes of evaluation. Please refer to [6] for the details of these networks. For each network, 50 simulated subjects were generated by slightly changing the values of the parameters.
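The linear neural dynamics $\dot{s}=\sigma\mathrm{\Phi} s+\mathrm{\Psi} u$ for a 5-node ring can be integrated with a simple Euler scheme; the following sketch uses illustrative values of $\sigma$, $\mathrm{\Psi}$, the step size and the input noise (these are not the NetSim settings):

```python
import numpy as np

rng = np.random.default_rng(0)
N, dt, steps, sigma = 5, 0.01, 5000, 1.0

# 5-node ring: edge i -> (i+1) mod N, diagonal set to -1 (self-decay)
Phi = -np.eye(N)
for i in range(N):
    Phi[(i + 1) % N, i] = 0.5

Psi = np.eye(N)                  # each node receives its own external input
s = np.zeros(N)
trace = np.empty((steps, N))
for t in range(steps):
    u = rng.normal(size=N)       # illustrative noisy external inputs
    s = s + dt * (sigma * (Phi @ s) + Psi @ u)   # Euler step of s' = sigma*Phi*s + Psi*u
    trace[t] = s
```

Because the diagonal of $\mathrm{\Phi}$ is $-1$ and the ring weights are below 1, the system is stable and `trace` stays bounded; in NetSim these neural series are then passed through the balloon model to produce BOLD signals.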
Following [6], we used the c-sensitivity of skeleton inference to measure accuracy. Given a directed graph $G$, its skeleton is the undirected graph obtained by stripping away all arrowheads from its edges and removing redundant undirected edges. Suppose that each element of the symmetric matrix $\hat{A}$ represents the estimated "confidence" about whether there is a connection between the corresponding nodes. All elements of the matrix $\hat{A}$ are assumed to be non-negative. The distribution of "true positive" (TP) connections includes all the elements in the matrix $\hat{A}$ whose corresponding connections are found in the skeleton of graph $G$. Similarly, the distribution of "false positive" (FP) connections includes all the elements in the matrix $\hat{A}$ whose corresponding connections do not exist in the skeleton of graph $G$. The c-sensitivity is defined as the fraction of true positives that have a higher value in the matrix $\hat{A}$ than the 95th percentile of the false positive distribution. The c-sensitivity shows the ability of the matrix $\hat{A}$ to differentiate true connections from spurious connections. If there is no overlap between the TP and FP distributions, the c-sensitivity is 100%. It is worth noting that diagonal elements are ignored when calculating c-sensitivity. Fig. 5 shows an example of calculating c-sensitivity.

Figure 5. An example of calculating c-sensitivity. (a) The true network that generates the observations. (b) The skeleton of the true network, constructed by stripping away all arrowheads from the links of the true network and removing redundant undirected links. (c) The estimated matrix, each element of which represents the "confidence" about whether there is a link between the corresponding nodes. (d) The true positive distribution and false positive distribution, calculated using both the estimated matrix and the skeleton. For example, the element 0.16 in Row 1 and Column 2 belongs to the true positive distribution because Node 1 and Node 2 are linked together in the skeleton. The element 0.11 in Row 1 and Column 3 belongs to the false positive distribution since there is no connection between Node 1 and Node 3 in the skeleton. The 95th percentile of the false positive distribution is 0.15, which is marked with a red circle. Four elements in the true positive distribution are larger than 0.15 and one element is smaller, so the c-sensitivity is 0.8.
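The c-sensitivity described above can be computed directly from the confidence matrix and the true skeleton (only off-diagonal upper-triangular elements are used, since both matrices are symmetric):

```python
import numpy as np

def c_sensitivity(A_hat, skeleton):
    """Fraction of true-positive confidences exceeding the 95th
    percentile of the false-positive distribution (diagonal ignored)."""
    iu = np.triu_indices_from(A_hat, k=1)
    conf = A_hat[iu]
    truth = skeleton[iu].astype(bool)
    tp, fp = conf[truth], conf[~truth]
    return float(np.mean(tp > np.percentile(fp, 95)))
```

For a perfectly separated estimate (every true-edge confidence above every non-edge confidence), the function returns 1.0.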
We investigated the performance of the different methods by estimating the resulting c-sensitivity values. The results are summarized in Table 1, where each value is the mean over 50 simulations. ICOV5 and ICOV100 represent the ICOV method with the regularization parameter set to 5 and 100, respectively. These settings give good ICOV results according to [6].
Table 1. Summary of c-sensitivity results over simulations and methods
The figures highlighted in yellow in Table 1 are the highest c-sensitivity scores in their corresponding simulations. Generally speaking, the elastic PC-algorithm (EPC) performed best. The elastic PC-algorithm attained the highest c-sensitivity in 24 out of 28 simulations. Full correlation (Full) and ICOV100 each worked best for only 1 simulation, while fully partial correlation (FP) achieved the maximal c-sensitivity values for 2 simulations. ICOV5 and network deconvolution (ND) attained the maximal values for 3 simulations, while global silencing (GS) worked best for 4 simulations. For the results obtained with the elastic PC-algorithm, 19 simulations had c-sensitivity values larger than 85%. The results from all methods for the 11th simulation were less than 20%, because the synthetic signals for this simulation were generated in a way that makes each ROI's signal greatly influence all the others. Some c-sensitivity values in Table 1 are slightly different from the previous results in [6]. These differences are presumably caused by minor variations in the c-sensitivity calculation in our experiment.
We further investigated the relationship between c-sensitivity and the significance threshold for the elastic PC-algorithm. Fig. 6 shows the c-sensitivity improvements with increasing significance thresholds. Negative values in Fig. 6 mean the c-sensitivity decreased at higher significance thresholds. Theoretically, these cases should not happen, but fMRI data do not strictly follow the faithfulness and Gaussian assumptions. Although the c-sensitivity improvements in Fig. 6 show some local fluctuations, the overall increasing trend can still be clearly observed. An obvious increase for a large fraction of networks occurred at the first two steps, i.e., from 0.05 to 0.10 and from 0.10 to 0.15.
Figure 6. Relationship between c-sensitivity improvements and significance thresholds in the elastic PC-algorithm
The elastic PC-algorithm can save computational effort by avoiding repeated calculations: some partial correlations at a higher significance threshold can be directly obtained from the results at the previous significance threshold. To assess this, we measured the computational effort saved at each significance threshold as the proportion of times that the last condition in Line 16 of Algorithm 1 was not satisfied.
Fig. 7 shows the relationship between saved computational effort and significance thresholds for the different simulations. It can be seen that the maximum proportion approached 100%. For some simulations, the proportion reached 100% at some significance thresholds, meaning that the elastic PC-algorithm returned stable results once the significance threshold was above a certain value. All results at higher thresholds were then directly obtained from previous results, so 100% of the computational effort was saved in these cases.
Figure 7. Relationship between saved computational effort and significance thresholds in the elastic PC-algorithm
The proportion of saved effort varied from 23% to 100%, with a mean of 84.3%. When the significance threshold rose above 0.15, 27 out of 28 networks had proportions larger than 50%. The largest increase in the proportion for most simulations occurred when the significance threshold increased from 0.05 to 0.15. As pointed out above, the significant increase of c-sensitivity in Fig. 6 occurred over the same range. These observations demonstrate that the elastic PC-algorithm can save computational effort while maintaining accuracy.

A real resting-state fMRI dataset from the human connectome project^{[24]} was used to illustrate the elastic PC-algorithm and compare it with the other methods listed above. In contrast to the simulation dataset, there is no "ground truth" in the real dataset, so accuracy measures of the results, such as c-sensitivity, cannot be calculated. Here we only present and discuss the properties of the results.
The real dataset included resting-state fMRI data of 10 unrelated subjects. All were healthy (6 women, 4 men). There were two subjects in the 22–25 age range, three subjects in the 26–30 age range, and five subjects in the 31–35 age range. For each subject, the functional data were acquired in four approximately 15-minute runs, carried out in two separate imaging sessions. Within each session, the oblique axial acquisition of one run was phase encoded in a right-to-left direction, while the other run was phase encoded in a left-to-right direction. All functional images were acquired using a gradient-echo echo-planar imaging (EPI) sequence with the following parameters: repetition time (TR) = 720 ms, echo time (TE) = 33.1 ms, flip angle = 52$^\circ$, FOV = 208$\times$180 mm, slice thickness = 2.0 mm, 72 slices, frames per run = 1 200, run duration = 14:33. The data were processed using the FMRIB software library (FSL)^{[39]}, FreeSurfer^{[40]} and the Connectome Workbench image analysis suites^{[41]}, and denoised using an independent component analysis based method^{[42, 43]}. The details of the pipelines are presented in [44].
The automated anatomical labeling (AAL) atlas^{[45]} was used to parcellate the whole brain into 116 ROIs, comprising 90 regions in the cerebrum and 26 regions in the cerebellum. The time series of an ROI was defined as the average time series of all voxels in that ROI. Although this paper only aims at estimating the connections between nodes, we also faced the problem of parcellating the whole brain into nodes, because there is as yet no widely accepted parcellation for functional connectivity inference^{[46, 47]}. Although it has been recommended to use data-driven approaches, such as high-dimensional independent component analysis (ICA), rather than gross structural-atlas-based parcellation that may not correspond finely to real functional boundaries in the data^{[48]}, we chose the AAL atlas (which shows greater stability than other atlases tested^{[49]}) because of its simplicity and interpretability.
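The ROI time series definition above (the average over all voxel time series in a region) can be sketched as follows; the array shapes, variable names and toy values are illustrative assumptions, not taken from the original pipeline.

```python
import numpy as np

def roi_time_series(voxel_data, roi_labels, n_rois=116):
    """Average the time series of all voxels within each ROI.

    voxel_data: (n_voxels, n_timepoints) array of voxel time series.
    roi_labels: (n_voxels,) array of AAL labels in 1..n_rois.
    Returns an (n_rois, n_timepoints) array of ROI time series.
    """
    n_t = voxel_data.shape[1]
    series = np.zeros((n_rois, n_t))
    for r in range(1, n_rois + 1):
        mask = roi_labels == r
        if mask.any():  # leave the row at zero if the ROI is empty
            series[r - 1] = voxel_data[mask].mean(axis=0)
    return series

# Toy example: 5 voxels, 3 time points, 2 ROIs
data = np.array([[1., 2., 3.],
                 [3., 4., 5.],
                 [0., 0., 0.],
                 [2., 2., 2.],
                 [4., 4., 4.]])
labels = np.array([1, 1, 2, 2, 2])
ts = roi_time_series(data, labels, n_rois=2)
```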
For the elastic PC-algorithm, the initial significance threshold was 0.05 and the step size was 0.05. Ten steps were executed for the single subject analysis. For the multiple subject analysis, only three steps were executed to save computational time, so the significance threshold ranged from 0.05 to 0.15. For ICOV, the regularisation parameter was set to 6.5, which was obtained from a cross-validation method as described in [17], although the validity of this method is still unclear.
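The threshold schedule described above (initial value 0.05, step 0.05) can be written out as a short sketch; the function and variable names here are illustrative, not from the original implementation.

```python
# Assumed parameters from the text: initial threshold 0.05, step 0.05.
INITIAL_THRESHOLD = 0.05
STEP_SIZE = 0.05

def threshold_schedule(n_steps):
    """List of significance thresholds visited by the elastic run."""
    return [round(INITIAL_THRESHOLD + k * STEP_SIZE, 2) for k in range(n_steps)]

single_subject = threshold_schedule(10)  # thresholds 0.05 .. 0.50
multi_subject = threshold_schedule(3)    # thresholds 0.05 .. 0.15
```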
Single subject analysis:
We first arbitrarily selected a subject to visually compare the elastic PC-algorithm with the other methods. The results of each method formed a matrix representing the functional connectivity skeleton. All matrices were normalized in the same way so that all elements range from 0 to 1; an element with a higher score in the matrix indicates higher confidence in the existence of a connection. Fig. 8 gives a straightforward overview of the different results: the 10 images in the first two columns show the results of the elastic PC-algorithm over different significance thresholds, and the 5 images in the last column display the results of the 5 other methods. Each image presents a $116 \times 116$ matrix. The colour indicates the values of the elements of these matrices, which range from 0 to 1. An image with more white area means that there are more zero elements in the corresponding matrix.
Figure 8. Single subject results of different methods. The $x$ and $y$ axes indicate the labels of the 116 ROIs
More elements in the results of the elastic PC-algorithm approached zero with an increasing significance threshold. The resulting matrix with the significance threshold of 0.50 was the sparsest. Many elements were consistently large in all results of the elastic PC-algorithm, especially the elements adjacent to the diagonal, suggesting that neighbouring labelled ROIs in the AAL atlas are more likely to be connected. Because two homotypic regions in the left and right hemispheres were assigned adjacent labels in the AAL atlas (e.g., lingual regions 47 and 48 in the left and right hemispheres), this observation is consistent with the observation that two homotypic regions are more likely to be highly correlated with each other.
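The normalisation of the result matrices to the range 0 to 1 can be sketched as min-max scaling; the text does not spell out the exact formula, so this is one plausible reading rather than the authors' exact procedure.

```python
import numpy as np

def normalize_matrix(m):
    """Min-max scale the magnitudes of a result matrix into [0, 1].

    Taking absolute values first is an assumption: correlation-based
    methods can produce negative entries, while confidence scores are
    treated as non-negative in the comparison figures.
    """
    m = np.abs(np.asarray(m, dtype=float))
    lo, hi = m.min(), m.max()
    if hi == lo:  # constant matrix: no contrast to normalize
        return np.zeros_like(m)
    return (m - lo) / (hi - lo)

a = np.array([[0., 2.], [-4., 1.]])
n = normalize_matrix(a)
```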
Compared with the other methods, including full correlation, fully partial correlation, ICOV with a regularisation parameter of 6.5, ND and GS, the results from the elastic PC-algorithm with significance thresholds larger than 0.10 were sparser than those from all the other methods. The ND method performed poorly in terms of sparsity, while the GS method generated results with patterns different from those of the other methods, which appear unreliable. The fully partial correlation and ICOV methods returned results with patterns similar to those of the elastic PC-algorithm, but less sparse. All these observations suggest that the results from the elastic PC-algorithm may be more plausible and reliable than those from the other methods.
The relationship between the saved computational effort and the significance thresholds was also studied. The computational effort saved at each significance threshold was represented as the proportion of times that the last condition in Line 16 of Algorithm 1 was not satisfied. The proportion of saved computational effort varied from 9.2% to 72%. When the significance threshold increased from 0.05 to 0.10, only 9.2% of the computational effort was saved, meaning that the reference skeleton at threshold 0.10 changed greatly from the reference skeleton at threshold 0.05. When the significance threshold further increased above 0.10, the proportion of saved effort remained above 50%. Thus, the results of the elastic PC-algorithm reached a relatively stable state, in which the number of recalculations was limited.
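The saved-effort measure is simply the fraction of tests that could be reused rather than recomputed; a minimal sketch of this bookkeeping follows, with the counter names assumed (they do not appear in the original algorithm).

```python
def saved_effort_proportion(reused_tests, total_tests):
    """Fraction of conditional-independence tests skipped at a threshold.

    reused_tests: how often the recomputation condition (Line 16 of
    Algorithm 1) was NOT satisfied, so the previous result was reused.
    total_tests: how often the condition was checked in total.
    """
    if total_tests == 0:
        return 0.0
    return reused_tests / total_tests

# e.g. 92 of 1000 tests reused corresponds to the 9.2% reported at 0.10
p = saved_effort_proportion(92, 1000)
```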
The patterns of the results of the elastic PC-algorithm with different significance thresholds were similar, but became sparser at higher significance thresholds (Fig. 8). Moreover, the changes between results at two adjacent significance thresholds were calculated, measured as the root mean square difference of the elements. The change at threshold 0.10 was as high as 0.17. When the significance threshold increased above 0.20, the changes remained at a constant level lower than 0.02, meaning that the results became relatively stable for significance thresholds larger than 0.20. This is consistent with the observation that the proportion of saved computational effort remained high when the significance threshold was larger than 0.2.
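The change measure used above, the root mean square difference over all matrix elements, can be sketched directly; the function name is an assumption for illustration.

```python
import numpy as np

def rms_difference(m1, m2):
    """Root mean square difference between two result matrices."""
    d = np.asarray(m1, dtype=float) - np.asarray(m2, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))

# Toy example: two 2x2 matrices whose elements all differ by 0.1
a = np.zeros((2, 2))
b = np.full((2, 2), 0.1)
r = rms_difference(a, b)
```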
Multi-subject analysis:
After the single subject analysis, we applied the different methods to infer functional connectivity for all 10 subjects. As in the single subject analysis, functional connectivity was represented in matrices whose elements were normalized between 0 and 1. The matrix for each subject was inferred using each of the methods, and the average matrix across the 10 subjects was then obtained for each method. This generated 6 matrices, one for each of the 6 different methods: full correlation (Full), fully partial correlation (FP), ICOV, network deconvolution (ND), global silencing (GS) and the elastic PC-algorithm (EPC). To compare these 6 results, we selected the 20 largest elements of each matrix and used their corresponding connections to form a functional network for each method, representing its 20 most confident connections. There were 40 connections selected by at least one method, and many connections were detected by at least 2 different methods.
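The aggregation step above (average the per-subject matrices for one method, then keep the largest off-diagonal elements of the symmetric average) can be sketched as follows; the function name and toy data are illustrative assumptions.

```python
import numpy as np

def top_connections(subject_matrices, k=20):
    """Average symmetric matrices across subjects and return the k
    strongest undirected connections as (i, j, value) triples."""
    avg = np.mean(subject_matrices, axis=0)
    n = avg.shape[0]
    iu = np.triu_indices(n, k=1)           # each undirected pair once
    order = np.argsort(avg[iu])[::-1][:k]  # indices of the k largest
    return [(iu[0][i], iu[1][i], avg[iu][i]) for i in order]

# Toy example: two 3x3 subject matrices, top single connection
m1 = np.array([[0., .9, .1], [.9, 0., .2], [.1, .2, 0.]])
m2 = np.array([[0., .7, .3], [.7, 0., .2], [.3, .2, 0.]])
best = top_connections([m1, m2], k=1)
```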
Table 2 shows the details of these connections, classified into three categories according to the pair of regions they link: 1) homotypic regions in the left and right hemispheres (Type Ⅰ connections), 2) heterotypic regions in the same hemisphere (Type Ⅱ connections), and 3) heterotypic regions in different hemispheres (Type Ⅲ connections). If a connection was found by a certain method, the corresponding value in the table was set to 1; otherwise it was set to 0. There were 16 Type Ⅰ connections, 22 Type Ⅱ connections and 2 Type Ⅲ connections. The connections discovered by the elastic PC-algorithm were similar to those detected by fully partial correlation and ICOV. However, the results of full correlation were very different from those of the other methods, consistent with the expectation that the full correlation method detects indirect, as well as direct, effects.
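The three-way classification can be sketched from the AAL labels alone, assuming the standard AAL convention that homotypic left/right regions carry consecutive labels with odd labels on the left and even labels on the right (consistent with the lingual example of labels 47 and 48 in the text); this helper is an illustration, not part of the original analysis.

```python
def connection_type(label_i, label_j):
    """Classify a connection between two AAL labels (1..116).

    Assumes odd labels are left-hemisphere and even labels are
    right-hemisphere, with homotypic pairs labelled (2k-1, 2k).
    """
    a, b = sorted((label_i, label_j))
    same_hemisphere = (a % 2) == (b % 2)
    homotypic_pair = (b - a == 1) and (a % 2 == 1)
    if homotypic_pair:
        return "I"    # homotypic regions, opposite hemispheres
    if same_hemisphere:
        return "II"   # heterotypic regions, same hemisphere
    return "III"      # heterotypic regions, different hemispheres

# e.g. lingual regions 47 (left) and 48 (right) form a Type I pair
t = connection_type(47, 48)
```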
Table 2. Top 20 connections detected by different methods
There were 6 Type Ⅰ connections that were coherently discovered by all 6 methods. These were found between the following hemispherically homotypic ROIs: Calcarine, Cuneus, Occipital_Mid, Postcentral, Supramarginal and Precuneus. Compared to the other types, the fraction of Type Ⅰ connections that were consistently detected by different methods was relatively high. Only 1 out of 22 Type Ⅱ connections was discovered by all 6 methods (the connection between the precentral and postcentral gyri in the right hemisphere), and ten Type Ⅱ connections were found by only one method.
There were only two Type Ⅲ connections (between the left calcarine and right lingual regions, and between the right cuneus and left superior occipital regions), and these were identified only by the full correlation method. As shown in Fig. 9, it is quite likely that these Type Ⅲ connections were enhanced by indirect effects.
Figure 9. Connections detected by various methods between the following ROIs: calcarine, lingual, cuneus and superior occipital regions
We also investigated the distribution of the top 1% of connections found by the elastic PC-algorithm. In total, 67 connections were selected, with the corresponding elements in the matrix ranging from 0.20 to 0.75. These connections were plotted using BrainNet Viewer^{[50]} (Fig. 10). There were 26 Type Ⅰ connections, 40 Type Ⅱ connections and only 1 Type Ⅲ connection. It is interesting that we obtained only a very limited number of connections between two heterotypic regions in different hemispheres. This suggests that in the resting state, functional connectivity might be composed primarily of Type Ⅰ and Type Ⅱ connections.
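The figure of 67 connections follows from the matrix size: a symmetric 116 × 116 matrix has 116 · 115 / 2 = 6670 distinct undirected connections, and 1% of that, rounded, is 67. A quick sketch of this arithmetic:

```python
def top_percent_count(n_rois=116, percent=0.01):
    """Number of connections in the top `percent` of a symmetric
    n_rois x n_rois connectivity matrix (off-diagonal pairs only)."""
    n_pairs = n_rois * (n_rois - 1) // 2  # 6670 for 116 ROIs
    return round(n_pairs * percent)

c = top_percent_count()
```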
Figure 10. Top 1% of connections selected by the elastic PC-algorithm. The nodes outside the cerebrum are in the cerebellum. The figure was generated by BrainNet Viewer^{[50]} to show all sides of the brain, including the lateral and medial sides of each hemisphere, the dorsal and ventral sides, and the anterior and posterior sides of the brain.