Software

GitHub

R packages can be installed directly from GitHub via devtools::install_github("dksewell/[repo]"), where [repo] is the name of the repository.

Network-Informed Constrained Divisive Pooled Testing Assignments

Frequent universal testing in a finite population is an effective approach to preventing large infectious disease outbreaks. Yet when the target group has many constituents, this strategy can be cost-prohibitive. One approach to alleviate the resource burden is to group multiple individual tests into one unit in order to determine if further tests at the individual level are necessary. This approach, referred to as group testing or pooled testing, has received much attention in finding the minimum cost pooling strategy. Existing approaches, however, assume either independence or very simple dependence structures between individuals. This assumption ignores the fact that in the context of infectious diseases there is an underlying transmission network that connects individuals. We develop a constrained divisive hierarchical clustering algorithm that assigns individuals to pools based on the contact patterns between individuals. In a simulation study based on real networks, we show the benefits of using our proposed approach compared to random assignments even when the network is imperfectly measured and there is a high degree of missingness in the data.
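To make the cost argument concrete, here is a minimal Python sketch of the expected testing burden under classical two-stage pooling. It is not the clustering algorithm from the paper: it assumes independent infections (the simplification the abstract says existing approaches rely on) and a perfectly accurate test, and the function names are hypothetical.

```python
def expected_tests_per_person(prevalence: float, pool_size: int) -> float:
    """Expected tests per person under two-stage pooling.

    Assumes independent infections and a perfect test: one pooled test is
    run per pool, and every member is retested only if the pool is positive.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    if pool_size == 1:
        return 1.0  # no pooling: everyone is tested individually
    prob_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
    # 1/k for the shared pooled test, plus one retest if the pool is positive
    return 1.0 / pool_size + prob_pool_positive


def best_pool_size(prevalence: float, max_pool_size: int = 30) -> int:
    """Pool size in 1..max_pool_size with the fewest expected tests per person."""
    return min(
        range(1, max_pool_size + 1),
        key=lambda k: expected_tests_per_person(prevalence, k),
    )


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.10):
        k = best_pool_size(p)
        print(p, k, round(expected_tests_per_person(p, k), 4))
```

Under these assumptions the savings shrink quickly as prevalence rises. A network-informed assignment targets a different lever: when infections cluster along contacts, placing connected individuals in the same pool tends to concentrate positives into fewer pools.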

Simulation-free estimation of an individual-based SEIR

The ongoing COVID-19 pandemic has overwhelmingly demonstrated the need to accurately evaluate the effects of implementing new or altering existing nonpharmaceutical interventions. Since these interventions applied at the societal level cannot be evaluated through traditional experimental means, public health officials and other decision makers must rely on statistical and mathematical epidemiological models. Nonpharmaceutical interventions are typically focused on contacts between members of a population, and yet most epidemiological models rely on homogeneous mixing, which has repeatedly been shown to be an unrealistic representation of contact patterns. An alternative approach is individual-based models (IBMs), but these are often time-intensive and computationally expensive to implement, requiring a high degree of expertise and computational resources. More often, decision makers need to know the effects of potential public policy decisions in a very short time window using limited resources. This paper presents a computational algorithm for an IBM designed to evaluate nonpharmaceutical interventions. By utilizing recursive relationships, our method can quickly compute the expected epidemiological outcomes even for large populations based on any arbitrary contact network.

Importance sampling for NAM

This code implements an importance sampler for the network autocorrelation model, as described in:

Li and Sewell (2021). A comparison of estimators for the network autocorrelation model based on observed social networks. Social Networks 66, 202-210.
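For readers unfamiliar with the technique, the following Python sketch shows generic self-normalized importance sampling on a toy one-dimensional target. It is not the sampler implemented in this package: the target, proposal, and all function names are assumptions chosen purely for illustration.

```python
import math
import random


def normal_log_pdf(x: float, mean: float, sd: float) -> float:
    """Log density of a normal distribution."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def self_normalized_is(log_target, sample_proposal, log_proposal, h, n_draws, rng):
    """Estimate E_target[h(X)] when the target is known only up to a constant.

    Returns the estimate and the effective sample size of the weights.
    """
    draws = [sample_proposal(rng) for _ in range(n_draws)]
    log_w = [log_target(x) - log_proposal(x) for x in draws]
    shift = max(log_w)  # subtract the max before exponentiating for stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    estimate = sum(wi * h(x) for wi, x in zip(w, draws)) / total
    ess = total * total / sum(wi * wi for wi in w)
    return estimate, ess


# Toy example: unnormalized N(2, 1) target with a wider N(0, 3) proposal.
rng = random.Random(1)
estimate, ess = self_normalized_is(
    log_target=lambda x: -0.5 * (x - 2.0) ** 2,
    sample_proposal=lambda r: r.gauss(0.0, 3.0),
    log_proposal=lambda x: normal_log_pdf(x, 0.0, 3.0),
    h=lambda x: x,
    n_draws=20000,
    rng=rng,
)
print(estimate, ess)
```

The effective sample size is a quick diagnostic: a value far below the number of draws suggests the proposal covers the target poorly.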

Linear autocorrelation models for egocentric data

Network autocorrelation models have been widely used for decades to model the joint distribution of the attributes of a network's actors. This class of models can estimate both the effect of individual characteristics as well as the network effect, or social influence, on some actor attribute of interest. Collecting data on the entire network, however, is very often infeasible or impossible if the network boundary is unknown or difficult to define. Obtaining egocentric network data overcomes these obstacles, but as of yet there has been no clear way to model this type of data and still appropriately capture the network effect on the actor attributes in a way that is compatible with a joint distribution on the full network data. This paper adapts the class of network autocorrelation models to handle egocentric data. The proposed methods thus incorporate the complex dependence structure of the data induced by the network rather than simply using ad hoc measures of the egos' networks to model the mean structure, and can estimate the network effect on the actor attribute of interest. The vast quantities of unknown information about the network can be succinctly represented in such a way that only depends on the number of alters in the egocentric network data and not on the total number of actors in the network. Estimation is done within a Bayesian framework.
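As background, the sketch below illustrates the standard full-network form of the network autocorrelation model, y = rho * W y + X beta + error, which the abstract does not write out. It is not the egocentric extension proposed in the paper. The fixed-point solver, the row-normalized weight matrix, and all names are assumptions made for illustration; the iteration converges because |rho| < 1 and each row of W sums to at most 1.

```python
def row_normalize(adjacency):
    """Divide each row by its sum so rows sum to 1 (all-zero rows stay zero)."""
    weights = []
    for row in adjacency:
        total = sum(row)
        weights.append([v / total if total else 0.0 for v in row])
    return weights


def solve_nam(weights, rho, exogenous, tol=1e-12, max_iter=10_000):
    """Solve y = rho * W y + b by fixed-point iteration.

    `exogenous` plays the role of b = X beta + error for each actor.
    """
    if not abs(rho) < 1:
        raise ValueError("this sketch requires |rho| < 1")
    n = len(exogenous)
    y = list(exogenous)
    for _ in range(max_iter):
        y_next = [
            exogenous[i] + rho * sum(weights[i][j] * y[j] for j in range(n))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(y_next, y)) < tol:
            return y_next
        y = y_next
    raise RuntimeError("fixed-point iteration did not converge")


# Toy example: four actors on a path 0 - 1 - 2 - 3.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
W = row_normalize(adjacency)
b = [1.0, 0.0, -0.5, 2.0]  # stands in for X beta + error
rho = 0.4
y = solve_nam(W, rho, b)
print(y)
```

Each actor's response depends on the responses of its neighbors, so fitting this form directly requires W for the whole network. That requirement is the obstacle the egocentric approach is designed to avoid.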

STAR: Simultaneous and temporal autoregressive network models

While logistic regression models are easily accessible to researchers, when applied to network data there are unrealistic assumptions made about the dependence structure of the data. For temporal networks measured in discrete time, recent work has made good advances (Almquist and Butts, 2014), but there is still the assumption that the dyads are conditionally independent given the edge histories. This assumption can be quite strong and is sometimes difficult to justify. If time steps are rather large, one would typically expect not only the existence of temporal dependencies among the dyads across observed time points but also the existence of simultaneous dependencies affecting how the dyads of the network co-evolve. We propose a general observation driven model for dynamic networks which overcomes this problem by modeling both the mean and the covariance structures as functions of the edge histories using a flexible autoregressive approach. This approach can be shown to fit into a generalized linear mixed model framework. We propose a visualization method which provides evidence concerning the existence of simultaneous dependence.

Latent space models for ranked dynamic networks

The formation of social networks and the evolution of their structures have been of interest to researchers for many decades. We wish to answer questions about network stability, group formation and popularity effects. We propose a latent space model for ranked dynamic networks that can be used to intuitively frame and answer these questions.

Latent space models for network perceptions

Social networks wherein the edges represent non-behavioral relations such as friendship, power, and influence can be difficult to measure and model. A powerful tool to address this is cognitive social structures (CSS; Krackhardt, 1987), where the perception of the entire network is elicited from each actor. We provide a formal statistical framework in which to analyze informants' reports on the network, leveraging information across actors while accounting for the ways in which individual perceptions may vary. We implement a latent space network model directly on the CSS data, thus estimating, e.g., homophilic effects while accounting for informant error. Additionally, the proposed method provides a visualization tool and estimates of the informants' biases and variances, and we describe a method for sidestepping forced-choice designs.

Model-based edge clustering

Relational data can be studied using network analytic techniques which define the network as a set of actors and a set of edges connecting these actors. One important facet of network analysis that receives significant attention is community detection. However, while most community detection algorithms focus on clustering the actors of the network, it is very intuitive to cluster the edges. Connections exist because they were formed within some latent environment such as, in the case of a social network, a workplace or religious group, and hence by clustering the edges of a network we may gain some insight into these latent environments. We propose a model-based approach to clustering the edges of a network using a latent space model describing the features of both actors and latent environments. We derive a generalized EM algorithm for estimation and gradient-based Monte Carlo algorithms, and we demonstrate that the computational cost grows linearly in the number of actors for sparse networks rather than quadratically.

Visualizing data through curvilinear representations of matrices

Most high dimensional data visualization techniques embed or project the data onto a low dimensional space which is then used for viewing. Results are thus limited by how much of the information in the data can be conveyed in two or three dimensions. We describe a lossless functional representation of any real matrix that can capture key features of the data, such as distances and correlations. Our approach can be used to visualize both subjects and variables as curves, allowing one to see patterns of subjects, patterns of variables, and how the subject and variable patterns relate to one another. We provide a theoretical justification for our approach and illustrate various facets of the method’s usefulness on both synthetic and real data sets.

Latent space models for dynamic networks

Dynamic networks are used in a variety of fields to represent the structure and evolution of the relationships between entities. We present a model which embeds longitudinal network data as trajectories in a latent Euclidean space. A Markov chain Monte Carlo algorithm is proposed to estimate the model parameters and latent positions of the actors in the network. The model yields meaningful visualization of dynamic networks, giving the researcher insight into the evolution and the structure, both local and global, of the network. The model handles directed or undirected edges, easily handles missing edges, and lends itself well to predicting future edges. Further, a novel approach is given to detect and visualize an attracting influence between actors using only the edge information. We use the case-control likelihood approximation to speed up the estimation algorithm, modifying it slightly to account for missing data.
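The idea of embedding a network as trajectories can be illustrated with the simplest distance-based latent space form, in which the log-odds of an edge equal an intercept minus the Euclidean distance between two actors' latent positions. The Python sketch below assumes that simplified form. It is not the paper's exact likelihood or its MCMC estimation, and the names and toy trajectories are hypothetical.

```python
import math


def edge_probability(z_i, z_j, intercept: float) -> float:
    """Edge probability with logit(p) = intercept - ||z_i - z_j||."""
    distance = math.dist(z_i, z_j)
    return 1.0 / (1.0 + math.exp(-(intercept - distance)))


def edge_probabilities_over_time(trajectory_i, trajectory_j, intercept: float):
    """Edge probability at each time point for two latent trajectories."""
    return [
        edge_probability(z_i, z_j, intercept)
        for z_i, z_j in zip(trajectory_i, trajectory_j)
    ]


# Toy trajectories in a 2-D latent space: actor b drifts toward actor a.
actor_a = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
actor_b = [(3.0, 0.0), (2.0, 0.0), (1.0, 0.0)]
probs = edge_probabilities_over_time(actor_a, actor_b, intercept=1.0)
print([round(p, 3) for p in probs])
```

As actor b moves closer to actor a in the latent space, the implied edge probability rises, which is what makes plotted trajectories interpretable as a picture of how the network evolves.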

Model-based longitudinal clustering

It is often of interest to perform clustering on longitudinal data, yet it is difficult to formulate an intuitive model for which estimation is computationally feasible. We propose a model-based clustering method for clustering objects that are observed over time. The proposed model can be viewed as an extension of the normal mixture model for clustering to longitudinal data. While existing models only account for clustering effects, we propose modeling the distribution of the observed values of each object as a blending of a cluster effect and an individual effect, hence also giving an estimate of how much the behavior of an object is determined by the cluster to which it belongs. Further, it is important to detect how explanatory variables affect the clustering. An advantage of our method is that it can handle multiple explanatory variables of any type through a linear modeling of the cluster transition probabilities.

Dynamic network clustering

Embedding dyadic data into a latent space has long been a popular approach to modeling networks of all kinds. While clustering has been done using this approach for static networks, this paper gives two methods of community detection within dynamic network data, building upon the distance and projection models previously proposed in the literature. Our proposed approaches capture the time-varying aspect of the data, can model directed or undirected edges, inherently incorporate transitivity and account for each actor’s individual propensity to form edges. We provide Bayesian estimation algorithms, and apply these methods to a ranked dynamic friendship network and world export/import data.

Note: This is an old package with dependencies that no longer function in the same way as when I first wrote this package. My apologies, but I do not currently have the capacity to fully diagnose and correct any issues.

GPAQ Calibration

This code will recalibrate the self-reported global physical activity questionnaire (GPAQ) to more accurately resemble ground truth as measured via accelerometers.