Complex Data Analysis: Large Scale Data Fusion and Linkage Studies



Epidemiological analyses often need data drawn from multiple populations, particularly when investigating treatment effects in small subgroups or rare outcomes, which require very large populations to provide sufficiently precise estimates. No single public or private data source in the United States provides coverage on a broad scale, so such analyses often involve pooling data from multiple sources. Working with large databases, I weave together statistical methods and computational inference, linking and combining publicly available and alternative data from different sources for the same overlapping population to build a complete picture of health.


Knowledge Discovery with Text

Features and Patterns from Documents and Messages


By 2025, 80% of data worldwide is projected to be unstructured. Of particular interest is research in mining dark data generated by healthcare facilities and the biomedical literature, each of which poses unique challenges and opportunities.

In particular, text data generated in different settings can use different terms to describe the same information. Conversely, the same term may be used to describe disparate information (See: Analysis of Ovarian Cancer Survivorship and Prediagnosis Symptoms via Text Mining). In addition, a large number of text documents is typically available, but only a small portion is annotated by experts or linked to patient outcomes (See: Crowdsourcing Awareness [Knowledge Gap Analysis]).

My work bridges statistical and symbolic methodologies in natural language processing (NLP) to investigate the behavioral and social determinants of health (See: American Public’s Perceptions of Antibiotic Use) and to extract risk signals from FDA adverse event databases for pharmacovigilance studies. My intent is to leverage text scaling (See: Assessing Mental Models from Communications - Coming Soon!) and computational linguistics to quantify associations between diseases and symptoms related to cardiotoxicities and B-cell malignant cancers.


Epidemiological Crowdsourcing: Web Survey Design with Embedded Quality Control



“Crowdsourcing” is the act of allocating work traditionally performed by employees or contractors to an anonymous collection of individuals in the form of an open call. Organized via online platforms such as CrowdFlower and Amazon Mechanical Turk, these crowds have an incredible capacity for tackling diverse and complex problems at scale.

Crowdsourcing is a powerful and efficient model, which I leveraged to answer questions about complex public health issues such as ovarian cancer survivorship (See: Crowdsourcing Awareness), perceptions of antibiotic use (See: American Public’s Perceptions of Antibiotic Use and Knowledge about Antibiotic Resistance), and access to health information (See: Evaluation of a Novel Conjunctive Exploratory Navigation Interface for Consumer Health Information).

Unique implementation challenges come with engaging the crowd. Overlooking or poorly addressing these challenges can detrimentally impact the quality of outcomes and experiences for everyone involved.


Early Career: Behavioral Research in Addiction Psychiatry



Helping others may support maintenance of long-term sobriety. (See: “10 Year Course of AA Participation”)

Altruism may decrease some of the psychological markers of addiction (i.e., narcissism and entitlement), which can make teenagers more inclined toward addictive behaviors and less likely to enter a treatment program.

In my early work (See: “Addiction and Generation Me”), I investigated the link between self-centered/narcissistic behavior and addiction. Specifically, I explored how volunteering and other-oriented/altruistic actions could serve as a potential counter to self-centered behavior in adolescents with substance dependency disorder.