Hi, I'm Jianna.

I am currently pursuing a Master of Science in Data Science at the University of Washington. My journey started at UC Santa Barbara, where I double-majored in Psychological and Brain Sciences and in Statistics and Data Science.

I am passionate about bridging the gap between technical complexity and human understanding. I specialize in transforming messy, high-dimensional data into meaningful stories and interpretable results that drive social impact and better decision-making.

Programming Languages

  • Python
  • R
  • SQL

💼 Professional Experience

Algorithm Developer Intern

Applied Materials • Santa Clara, CA

June 2024 — Sept 2024
  • Developed a Computer Vision solution using OpenCV to classify normal vs. defective wafer dies.
  • Achieved 94% accuracy with a Deep Learning model (MobileNet) trained on 40,000 images, outperforming Random Forest and SVM baselines.
  • Engineered a Tkinter-based GUI to automate die extraction and generate color-coded defect maps for intuitive interpretation.
  • Collaborated with an international R&D team to integrate tools into existing production workflows.

Research Assistant

Bionic Vision Lab • Santa Barbara, CA

Sept 2023 — June 2025
  • Compared scene description ratings across BERT, SBERT, and ChatGPT to assess AI alignment with human judgment.
  • Applied Image Processing and ML methods to research on simulated vision and degenerative eye diseases.
  • Designed and conducted eye-tracking studies and a spatial navigation VR task built in Unity.
  • Cleaned and preprocessed textual data in Python, analyzing error counts across varying viewing conditions.

Technical Projects

Cooking Helper!

Python • API Integration • CI/CD • Plotly Dash

View Repository
Project Overview

Developed an end-to-end grocery planning tool designed to reduce barriers to home cooking for individuals in food desert regions. The system enables users to select recipes from a database of 500k+ records and automatically generates a store-specific grocery list with real-time pricing and availability.

Technical Implementation
  • Data Pipeline: Automated ingestion and normalization of large-scale Kaggle recipe datasets.
  • API Integration: Real-time retail data fetching via the Kroger Development API.
  • Environment: Managed dependencies and reproducibility using Conda.
Engineering Excellence
  • CI/CD: Implemented automated build and test workflows via GitHub Actions.
  • Quality Assurance: Tracked test coverage with Coveralls to keep the codebase reliable.
  • Visualization: Integrated spatial food access data via interactive map components.

U.S. Hospital Satisfaction (2016–2020)

Interactive Data Storytelling • Tableau • Tableau Prep

View Interactive Dashboard
Executive Summary

The United States spans a wide range of environments, but the need for quality medical care is universal. This project assesses patient satisfaction ratings across the country to show communities how their local hospitals compare with nationwide standards. Using a top-down geographical approach, we visualized data from over 4,300 unique hospitals across 53 states and territories.

The Design Process

Data Engineering: Merged five years of HCAHPS datasets (1.6M+ records) in Python. We pivoted the data from wide to tall format to standardize satisfaction indicators like Nurse Communication and Cleanliness.
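To illustrate the wide-to-tall pivot described above, here is a minimal sketch in plain Python. The column names (`facility_id`, `year`, `nurse_communication`, `cleanliness`) and the ratings are made up for illustration and are not the actual HCAHPS schema or data. In pandas, the same reshaping is typically done with `DataFrame.melt`.

```python
from typing import Any, Dict, List, Sequence


def wide_to_tall(
    rows: Sequence[Dict[str, Any]],
    id_cols: Sequence[str],
    var_name: str = "measure",
    value_name: str = "star_rating",
) -> List[Dict[str, Any]]:
    """Turn one row per facility (one column per indicator) into
    one row per facility-indicator pair."""
    tall: List[Dict[str, Any]] = []
    for row in rows:
        # Identifier columns are copied onto every output row.
        ids = {col: row[col] for col in id_cols}
        for col, value in row.items():
            if col in id_cols:
                continue
            tall.append({**ids, var_name: col, value_name: value})
    return tall


# Illustrative input only: hypothetical columns and values.
wide_rows = [
    {"facility_id": "A001", "year": 2016, "nurse_communication": 4, "cleanliness": 3},
    {"facility_id": "B002", "year": 2016, "nurse_communication": 3, "cleanliness": 5},
]

tall_rows = wide_to_tall(wide_rows, id_cols=["facility_id", "year"])
for record in tall_rows:
    print(record)
```

The tall layout keeps every satisfaction indicator in a single column, which makes it easier to stack multiple years and filter by indicator in a dashboard.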

Geocoding: Leveraged the Google Maps API to map exact hospital coordinates, ensuring high-fidelity spatial accuracy in the final visualization.

Key Findings & Testing

Insights: Found that while clinical communication ratings are generally high (3.41 stars), environmental factors such as Quietness (2.97 stars) remain significant pain points for patients.

Usability: Conducted evaluations with healthcare professionals (nurses, pharmacists) to refine the "drill-down" navigation from state to county to specific facility.

Big Data Analysis: U.S. Voter Turnout

Case Study

PySpark • Databricks • Distributed Computing

UCSB • June 2025

Leveraged Databricks and PySpark to process millions of records, analyzing how household demographics affect civic engagement. I built a scalable pipeline to categorize voter segments and visualized geographic patterns through custom choropleth maps.

Technical Highlights
  • Scalable Processing: Implemented custom PySpark UDFs for demographic segmentation of large-scale datasets.
  • Modeling: Compared Logistic Regression and Random Forest variants in a distributed computing environment.
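As a rough illustration of the demographic segmentation step, the sketch below shows the kind of row-level rule that could be wrapped in a PySpark UDF. The function name `household_segment`, its inputs, and the segment labels are hypothetical and are not taken from the project code. It is written in plain Python so it runs without a Spark cluster; in PySpark, a function like this could be registered with `pyspark.sql.functions.udf` and a `StringType` return type.

```python
from typing import Optional


def household_segment(
    household_size: Optional[int], is_homeowner: Optional[bool]
) -> str:
    """Assign a coarse segment label from two household attributes.

    Hypothetical rule for illustration only; the real segmentation
    logic may use different fields and categories.
    """
    # Missing values get their own bucket instead of raising an error,
    # which matters when the function runs row by row over a large dataset.
    if household_size is None or is_homeowner is None:
        return "unknown"
    size = "single" if household_size == 1 else "multi"
    tenure = "owner" if is_homeowner else "renter"
    return f"{size}_{tenure}"


# Illustrative records only.
sample_households = [(1, False), (3, True), (None, True)]
for size, owner in sample_households:
    print(size, owner, household_segment(size, owner))
```

Keeping the rule as a pure function makes it easy to unit test locally before applying it to a distributed DataFrame.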
Key Discovery

"Analysis revealed that homeownership is a primary driver of turnout, while single-person households showed the lowest probability of voting across all segments."


Technical Toolkit

💻 Languages & Frameworks

  • Python: Pandas, NumPy, Scikit-learn, PyTorch, OpenCV
  • R & SQL: Tidyverse, ggplot2, SparkSQL
  • Other: MATLAB, SAS

☁️ Cloud & Big Data

  • Platforms: Databricks, AzureML, Google Cloud
  • Processing: PySpark, SparkSQL
  • Data Mining: RapidMiner, OpenRefine

🚀 Dashboards & Apps

  • BI Tools: Tableau, Excel (MS Office)
  • App Frameworks: Streamlit, Dash, Tkinter
  • Reporting: Technical Writing, Storytelling

🧠 Strategic & Soft Skills

  • Data Storytelling
  • Collaboration
  • Analytical Thinking
  • Problem Solving
  • Adaptability