GitHub

R packages can be installed directly from GitHub via devtools::install_github("dksewell/[repo]"), replacing [repo] with the repository name.
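As a concrete instance of the command above, the following sketch installs the DARE package, whose repository (dksewell/dare) is named in its entry on this page. The check for devtools is an added convenience, not something the page itself prescribes.

```r
# Install devtools once if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# "dksewell/dare" is the repository given in the DARE entry on this page;
# substitute another repository name to install a different package
devtools::install_github("dksewell/dare")
```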

DARE

Dose Accrual Rate Estimation

Estimating risk factors for the incidence of a disease is crucial for understanding its etiology. For diseases caused by enteric pathogens, off-the-shelf statistical model-based approaches do not consider the biological mechanisms through which infection occurs and thus can only be used to make comparatively weak statements about the association between risk factors and incidence. Building off of established work in quantitative microbiological risk assessment, we propose a new approach to determining the association between risk factors and dose accrual rates. Our more mechanistic approach achieves a higher degree of biological plausibility, incorporates currently ignored sources of variability, and provides regression parameters that are easily interpretable as the dose accrual rate ratio due to changes in the risk factors under study. We also describe a method for leveraging information across multiple pathogens. The proposed methods are available as an R package at https://github.com/dksewell/dare. Our simulation study shows unacceptable coverage rates from generalized linear models, while the proposed approach empirically maintains the nominal rate even when the model is misspecified. Finally, we demonstrate the proposed approach by applying it to infant data obtained through the PATHOME study (https://reporter.nih.gov/project-details/10227256), discovering the impact of various environmental factors on infant enteric infections.

Network-Informed Constrained Divisive Pooled Testing Assignments

Frequent universal testing in a finite population is an effective approach to preventing large infectious disease outbreaks. Yet when the target group has many constituents, this strategy can be cost prohibitive. One approach to alleviate the resource burden is to group multiple individual tests into one unit in order to determine if further tests at the individual level are necessary. This approach, referred to as group testing or pooled testing, has received much attention, particularly in finding the minimum-cost pooling strategy. Existing approaches, however, assume either independence or very simple dependence structures between individuals. This assumption ignores the fact that in the context of infectious diseases there is an underlying transmission network that connects individuals. We develop a constrained divisive hierarchical clustering algorithm that assigns individuals to pools based on the contact patterns between individuals. In a simulation study based on real networks, we show the benefits of using our proposed approach compared to random assignments even when the network is imperfectly measured and there is a high degree of missingness in the data.
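To illustrate why pool assignment matters, here is a minimal sketch that counts the tests used by a simple two-stage pooling scheme (one test per pool, then individual retests in positive pools). It is not the constrained divisive clustering algorithm described above. The function count_tests, the toy data, and the assumption of a perfectly accurate assay are all hypothetical and chosen only for illustration.

```r
# Hypothetical helper: number of tests used by two-stage pooled testing,
# assuming a perfectly accurate assay (an illustrative simplification).
#   infected: logical vector of true infection statuses
#   pool:     vector giving each individual's pool label
count_tests <- function(infected, pool) {
  stopifnot(length(infected) == length(pool))
  pool_positive <- tapply(infected, pool, any)     # result of each pooled test
  pool_size     <- tapply(infected, pool, length)  # number of members per pool
  # One test per pool, plus individual retests for members of positive pools
  length(pool_positive) + sum(pool_size[pool_positive])
}

# Toy data: 12 individuals, with infections concentrated among
# three close contacts (individuals 1 to 3)
infected <- c(TRUE, TRUE, TRUE, rep(FALSE, 9))

# Pools that keep close contacts together
network_pools <- rep(1:4, each = 3)
# Pools that split the close contacts across different pools
scattered_pools <- rep(1:4, times = 3)

tests_network   <- count_tests(infected, network_pools)    # 4 pools + 3 retests = 7
tests_scattered <- count_tests(infected, scattered_pools)  # 4 pools + 9 retests = 13
```

In this toy case, keeping the connected infections in one pool requires 7 tests, while scattering them across pools requires 13, more than the 12 needed to test everyone individually.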

Simulation-free estimation of an individual-based SEIR

The ongoing COVID-19 pandemic has overwhelmingly demonstrated the need to accurately evaluate the effects of implementing new or altering existing nonpharmaceutical interventions. Since these interventions applied at the societal level cannot be evaluated through traditional experimental means, public health officials and other decision makers must rely on statistical and mathematical epidemiological models. Nonpharmaceutical interventions are typically focused on contacts between members of a population, and yet most epidemiological models rely on homogeneous mixing, which has repeatedly been shown to be an unrealistic representation of contact patterns. An alternative approach is individual-based models (IBMs), but these are often time intensive and computationally expensive to implement, requiring a high degree of expertise and computational resources. More often, decision makers need to know the effects of potential public policy decisions in a very short time window using limited resources. This paper presents a computational algorithm for an IBM designed to evaluate nonpharmaceutical interventions. By utilizing recursive relationships, our method can quickly compute the expected epidemiological outcomes even for large populations based on any arbitrary contact network.

Importance sampling for NAM

This code implements an importance sampler for the network autocorrelation model, as described in:

Li and Sewell (2021). A comparison of estimators for the network autocorrelation model based on observed social networks. Social Networks 66, pp. 202-210.
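For readers unfamiliar with the model, the following minimal sketch simulates data from a network autocorrelation model in its standard form, y = rho * W y + X beta + eps. It does not implement the importance sampler and does not use the package's functions. The network, sample size, and parameter values are all hypothetical.

```r
set.seed(1)
n <- 50

# Hypothetical random undirected network with no self-loops
A <- matrix(rbinom(n * n, 1, 0.1), n, n)
A[lower.tri(A)] <- t(A)[lower.tri(A)]  # copy the upper triangle to make A symmetric
diag(A) <- 0

# Row-normalize the adjacency matrix; isolated actors keep all-zero rows
deg <- rowSums(A)
W <- A / ifelse(deg > 0, deg, 1)

# Hypothetical covariates and parameters
X    <- cbind(1, rnorm(n))  # intercept plus one covariate
beta <- c(1, 0.5)           # covariate effects
rho  <- 0.3                 # network effect (social influence)
eps  <- rnorm(n)            # independent standard normal errors

# Solve (I - rho * W) y = X beta + eps for y
y <- solve(diag(n) - rho * W, X %*% beta + eps)
```

Because W is row-normalized and |rho| < 1, the matrix I - rho * W is invertible, so y is well defined.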

Linear autocorrelation models for egocentric data

Network autocorrelation models have been widely used for decades to model the joint distribution of the attributes of a network's actors.  This class of models can estimate both the effect of individual characteristics as well as the network effect, or social influence, on some actor attribute of interest.  Collecting data on the entire network, however, is very often infeasible or impossible if the network boundary is unknown or difficult to define.  Obtaining egocentric network data overcomes these obstacles, but as of yet there has been no clear way to model this type of data and still appropriately capture the network effect on the actor attributes in a way that is compatible with a joint distribution on the full network data.  This paper adapts the class of network autocorrelation models to handle egocentric data.  The proposed methods thus incorporate the complex dependence structure of the data induced by the network rather than simply using ad hoc measures of the egos' networks to model the mean structure, and can estimate the network effect on the actor attribute of interest.  The vast quantities of unknown information about the network can be succinctly represented in such a way that only depends on the number of alters in the egocentric network data and not on the total number of actors in the network.  Estimation is done within a Bayesian framework.

STAR: Simultaneous and temporal autoregressive network models

While logistic regression models are easily accessible to researchers, when applied to network data there are unrealistic assumptions made about the dependence structure of the data.  For temporal networks measured in discrete time, recent work has made good advances (Almquist and Butts, 2014), but there is still the assumption that the dyads are conditionally independent given the edge histories.  This assumption can be quite strong and is sometimes difficult to justify.  If time steps are rather large, one would typically expect not only the existence of temporal dependencies among the dyads across observed time points but also the existence of simultaneous dependencies affecting how the dyads of the network co-evolve.  We propose a general observation-driven model for dynamic networks which overcomes this problem by modeling both the mean and the covariance structures as functions of the edge histories using a flexible autoregressive approach.  This approach can be shown to fit into a generalized linear mixed model framework.  We propose a visualization method which provides evidence concerning the existence of simultaneous dependence.

Latent space models for ranked dynamic networks

The formation of social networks and the evolution of their structures have been of interest to researchers for many decades.  We wish to answer questions about network stability, group formation and popularity effects.  We propose a latent space model for ranked dynamic networks that can be used to intuitively frame and answer these questions.

Latent space models for network perceptions

Social networks wherein the edges represent non-behavioral relations, such as friendship, power, and influence, can be difficult to measure and model.  A powerful tool to address this is cognitive social structures (CSS; Krackhardt, 1987), where the perception of the entire network is elicited from each actor.  We provide a formal statistical framework in which to analyze informants' reports on the network, leveraging information across actors while accounting for the ways in which individual perceptions may vary.  We implement a latent space network model directly on the CSS data, thus estimating, e.g., homophilic effects while accounting for informant error.  Additionally, the proposed method provides a visualization method and estimates of the informants' biases and variances, and we describe a method for sidestepping forced-choice designs.

SUBSET

Subspace Shrinkage via Exponential Tilting

It is common to hold prior beliefs that are not characterized by points in the parameter space but instead are relational in nature and can be described by a linear subspace. While some previous work has been done to account for such prior beliefs, the focus has primarily been on point estimators within a regression framework. We argue, however, that prior beliefs about parameters ought to be encoded into the prior distribution rather than in the formation of a point estimator. In this way, the prior beliefs help shape all inference. Through exponential tilting, we propose a fully generalizable method of taking existing prior information from, e.g., a pilot study, and combining it with additional prior beliefs represented by parameters lying on a linear subspace. We provide computationally efficient algorithms for posterior inference that, once inference is made using a non-tilted prior, do not depend on the sample size. We illustrate our proposed approach on an antihypertensive clinical trial dataset where we shrink towards a power law dose-response relationship, and on monthly influenza and pneumonia data where we shrink moving average lag parameters towards smoothness.

WECAN

Clustering is a fundamental task in network analysis, essential for uncovering hidden structures within complex systems. Edge clustering, which focuses on relationships between nodes rather than the nodes themselves, has gained increased attention in recent years. However, existing edge clustering algorithms often overlook the significance of edge weights, which can represent the strength or capacity of connections, and fail to account for noisy edges—connections that obscure the true structure of the network. To address these challenges, the Weighted Edge Clustering Adjusting for Noise (WECAN) model is introduced. This novel algorithm integrates edge weights into the clustering process and includes a noise component that filters out spurious edges. WECAN offers a data-driven approach to distinguishing between meaningful and noisy edges, avoiding the arbitrary thresholding commonly used in network analysis. Its effectiveness is demonstrated through simulation studies and applications to real-world datasets, showing significant improvements over traditional clustering methods.

Model-based edge clustering

Relational data can be studied using network analytic techniques which define the network as a set of actors and a set of edges connecting these actors. One important facet of network analysis that receives significant attention is community detection. However, while most community detection algorithms focus on clustering the actors of the network, it is very intuitive to cluster the edges. Connections exist because they were formed within some latent environment such as, in the case of a social network, a workplace or religious group, and hence by clustering the edges of a network we may gain some insight into these latent environments. We propose a model-based approach to clustering the edges of a network using a latent space model describing the features of both actors and latent environments. We derive a generalized EM algorithm for estimation and gradient-based Monte Carlo algorithms, and we demonstrate that the computational cost grows linearly in the number of actors for sparse networks rather than quadratically. 

Visualizing data through curvilinear representations of matrices

Most high dimensional data visualization techniques embed or project the data onto a low dimensional space which is then used for viewing. Results are thus limited by how much of the information in the data can be conveyed in two or three dimensions. We describe a lossless functional representation of any real matrix that can capture key features of the data, such as distances and correlations. Our approach can be used to visualize both subjects and variables as curves, allowing one to see patterns of subjects, patterns of variables, and how the subject and variable patterns relate to one another. We provide a theoretical justification for our approach and illustrate various facets of the method’s usefulness on both synthetic and real data sets.

Latent space models for dynamic networks

Dynamic networks are used in a variety of fields to represent the structure and evolution of the relationships between entities.  We present a model which embeds longitudinal network data as trajectories in a latent Euclidean space.  A Markov chain Monte Carlo algorithm is proposed to estimate the model parameters and latent positions of the actors in the network.  The model yields meaningful visualization of dynamic networks, giving the researcher insight into the evolution and the structure, both local and global, of the network.  The model handles directed or undirected edges, easily handles missing edges, and lends itself well to predicting future edges.  Further, a novel approach is given to detect and visualize an attracting influence between actors using only the edge information. We use the case-control likelihood approximation to speed up the estimation algorithm, modifying it slightly to account for missing data.

Model-based longitudinal clustering

It is often of interest to perform clustering on longitudinal data, yet it is difficult to formulate an intuitive model for which estimation is computationally feasible.  We propose a model-based clustering method for clustering objects that are observed over time.  The proposed model can be viewed as an extension of the normal mixture model for clustering to longitudinal data.  While existing models only account for clustering effects, we propose modeling the distribution of the observed values of each object as a blending of a cluster effect and an individual effect, hence also giving an estimate of how much the behavior of an object is determined by the cluster to which it belongs.  Further, it is important to detect how explanatory variables affect the clustering.  An advantage of our method is that it can handle multiple explanatory variables of any type through a linear modeling of the cluster transition probabilities. 

Dynamic network clustering

Embedding dyadic data into a latent space has long been a popular approach to modeling networks of all kinds. While clustering has been done using this approach for static networks, this paper gives two methods of community detection within dynamic network data, building upon the distance and projection models previously proposed in the literature. Our proposed approaches capture the time-varying aspect of the data, can model directed or undirected edges, inherently incorporate transitivity and account for each actor’s individual propensity to form edges. We provide Bayesian estimation algorithms, and apply these methods to a ranked dynamic friendship network and world export/import data.

 

Note: This is an old package with dependencies that no longer function in the same way as when I first wrote this package.  My apologies, but I do not currently have the capacity to fully diagnose and correct any issues.

GPAQ Calibration

This code will recalibrate the self-reported global physical activity questionnaire (GPAQ) to more accurately resemble ground truth as measured via accelerometers.