Jianna Wong | Data Science Portfolio

Hi, I'm Jianna.

I am currently pursuing a Master of Science in Data Science at the University of Washington. My journey started at UC Santa Barbara, where I double-majored in Psychological and Brain Sciences and Statistics and Data Science.

I am passionate about bridging the gap between technical complexity and human understanding. I specialize in transforming messy, high-dimensional data into meaningful stories and interpretable results that drive social impact and better decision-making.

Programming Languages

Python
R
SQL

💼 Professional Experience

Algorithm Developer Intern

Applied Materials • Santa Clara, CA

June 2024 — Sept 2024

Developed a Computer Vision solution using OpenCV to classify normal vs. defective wafer dies.
Achieved 94% accuracy with a Deep Learning model (MobileNet) trained on 40,000 images, outperforming Random Forest and SVM baselines.
Engineered a Tkinter-based GUI to automate die extraction and generate color-coded defect maps for intuitive interpretation.
Collaborated with an international R&D team to integrate tools into existing production workflows.
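
To illustrate the color-coded defect map idea, here is a minimal sketch that turns per-die predictions into a grid of color names. It is not the production tool: the label names, the color scheme, and the helper names (build_defect_map, defect_rate) are assumptions made for this example, and the classifier itself is left out.

```python
# Hypothetical labels and colors; the real tool's scheme is not shown here.
COLORS = {"normal": "green", "defective": "red"}


def build_defect_map(predictions, n_rows, n_cols, missing="gray"):
    """Return a grid (list of rows) of color names for one wafer.

    predictions maps (row, col) -> label. Grid positions with no
    classified die (for example, outside the wafer edge) get `missing`.
    """
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            label = predictions.get((r, c))
            row.append(COLORS.get(label, missing))
        grid.append(row)
    return grid


def defect_rate(predictions):
    """Fraction of classified dies labeled defective."""
    if not predictions:
        return 0.0
    defective = sum(1 for label in predictions.values() if label == "defective")
    return defective / len(predictions)


# Toy input, made up for illustration only.
toy_predictions = {(0, 0): "normal", (0, 1): "defective", (1, 0): "normal"}
wafer_map = build_defect_map(toy_predictions, n_rows=2, n_cols=2)
```

A GUI layer such as Tkinter could then paint one rectangle per grid cell using these color names.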

Research Assistant

Bionic Vision Lab • Santa Barbara, CA

Sept 2023 — June 2025

Compared scene description ratings across BERT, SBERT, and ChatGPT to assess AI alignment with human judgment.
Applied Image Processing and ML methods to research on simulated vision and degenerative eye diseases.
Designed and conducted eye-tracking studies and a spatial navigation VR task built in Unity.
Cleaned and preprocessed textual data in Python, analyzing error counts across varying viewing conditions.
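
One simple way to quantify how closely a model's ratings track human judgment is a correlation between the two sets of scores. The sketch below assumes Pearson correlation as the measure, which is an illustrative choice and not necessarily the metric used in the study. The model names and all numbers are placeholders.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (spread_x * spread_y)


def rank_by_alignment(human_ratings, model_scores):
    """Sort models from most to least correlated with human ratings."""
    pairs = [(name, pearson(human_ratings, scores))
             for name, scores in model_scores.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Placeholder values, not study data.
human = [1, 2, 3, 4, 5]
scores = {"model_a": [2, 4, 6, 8, 10], "model_b": [5, 4, 3, 2, 1]}
ranking = rank_by_alignment(human, scores)
```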

Technical Projects

Cooking Helper!

Python • API Integration • CI/CD • Plotly Dash

View Repository

Project Overview

Developed an end-to-end grocery planning tool designed to reduce barriers to home cooking for individuals in food desert regions. The system enables users to select recipes from a database of 500k+ records and automatically generates a store-specific grocery list with real-time pricing and availability.

Technical Implementation

Data Pipeline: Automated ingestion and normalization of large-scale Kaggle recipe datasets.
API Integration: Real-time retail data fetching via the Kroger Development API.
Environment: Managed dependencies and reproducibility using Conda.
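
The core step of turning selected recipes into a priced grocery list can be sketched as follows. This is a simplified illustration, not the project's code: the recipe structure, the build_grocery_list helper, and the price_lookup dictionary (a stand-in for a live retail API response) are all assumptions.

```python
from collections import defaultdict


def build_grocery_list(selected_recipes, price_lookup):
    """Combine ingredient quantities across recipes and attach store prices.

    selected_recipes: list of dicts with an "ingredients" mapping of
        ingredient name -> quantity.
    price_lookup: ingredient name -> unit price for one store. This is a
        hypothetical stand-in for data fetched from a retail API.
    Items missing from price_lookup are reported as unavailable.
    """
    totals = defaultdict(float)
    for recipe in selected_recipes:
        for name, quantity in recipe["ingredients"].items():
            # Normalize names so "Egg" and "egg" are merged.
            totals[name.strip().lower()] += quantity

    grocery_list = []
    for name in sorted(totals):
        unit_price = price_lookup.get(name)
        cost = None if unit_price is None else round(unit_price * totals[name], 2)
        grocery_list.append({
            "item": name,
            "quantity": totals[name],
            "available": unit_price is not None,
            "cost": cost,
        })
    return grocery_list


# Toy data, made up for illustration.
recipes = [
    {"name": "omelet", "ingredients": {"Egg": 3, "butter": 1}},
    {"name": "pancakes", "ingredients": {"egg": 2, "flour": 1}},
]
prices = {"egg": 0.25, "flour": 2.0}
shopping = build_grocery_list(recipes, prices)
```

Keeping the price lookup as a plain mapping means the same function works whether prices come from a live request or a cached response.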

Engineering Excellence

CI/CD: Implemented automated build and test workflows via GitHub Actions.
Quality Assurance: Maintained high code reliability with Coveralls for coverage tracking.
Visualization: Integrated spatial food access data via interactive map components.

U.S. Hospital Satisfaction (2016–2020)

Interactive Data Storytelling • Tableau • Tableau Prep

View Interactive Dashboard

Executive Summary

The United States is a melting pot of environments; however, the need for quality medical care is universal. This project assesses satisfaction rates across the country to inform communities of how their local hospitals compare to nationwide standards. Using a top-down geographical approach, we visualized data from over 4,300 unique hospitals across 53 states and territories.

The Design Process

Data Engineering: Merged five years of HCAHPS datasets (1.6M+ records) in Python. We pivoted the data from wide to tall format to standardize satisfaction indicators like Nurse Communication and Cleanliness.

Geocoding: Leveraged the Google Maps API to map exact hospital coordinates, ensuring high-fidelity spatial accuracy in the final visualization.
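
The wide-to-tall pivot mentioned under Data Engineering can be sketched in plain Python. In practice a library function such as pandas.melt would be the usual tool; this version uses only the standard library so it runs anywhere. The field names and values are invented for the example and do not come from the HCAHPS files.

```python
def wide_to_tall(rows, id_fields, var_name="measure", value_name="star_rating"):
    """Reshape wide records (one column per measure) into tall records
    (one row per hospital, year, and measure).

    Measures with a missing value (None) are skipped.
    """
    tall_rows = []
    for row in rows:
        ids = {field: row[field] for field in id_fields}
        for key, value in row.items():
            if key in id_fields or value is None:
                continue
            tall_rows.append({**ids, var_name: key, value_name: value})
    return tall_rows


# Toy records, made up for illustration.
wide = [
    {"hospital_id": "H001", "year": 2016, "Nurse Communication": 4, "Cleanliness": 3},
    {"hospital_id": "H002", "year": 2016, "Nurse Communication": 5, "Cleanliness": None},
]
tall = wide_to_tall(wide, id_fields=("hospital_id", "year"))
```

In the tall layout every satisfaction indicator shares one value column, which makes it straightforward to filter or color by measure in a dashboard.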

Key Findings & Testing

Insights: Identified that while clinical communication is generally rated high (3.41 stars), environmental factors like Quietness (2.97 stars) remain significant pain points for patients.

Usability: Conducted evaluations with healthcare professionals (nurses, pharmacists) to refine the "drill-down" navigation from state to county to specific facility.

Big Data Analysis: U.S. Voter Turnout

Case Study

PySpark • Databricks • Distributed Computing

UCSB • June 2025

Leveraged Databricks and PySpark to process millions of records, analyzing how household demographics affect civic engagement. I built a scalable pipeline to categorize voter segments and visualized geographic patterns through custom choropleth maps.

Technical Highlights

Scalable Processing: Implemented custom PySpark UDFs for demographic segmentation of large-scale datasets.
Modeling: Compared Logistic Regression and Random Forest variants in a distributed computing environment.
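
The segmentation logic behind a UDF can be sketched as an ordinary Python function, which could then be wrapped with pyspark.sql.functions.udf and applied to a DataFrame column. The sketch below stays in plain Python so it runs without a cluster. The segment labels, the field names (household_size, owns_home, voted), and the toy records are assumptions for illustration, not the project's actual schema or results.

```python
from collections import defaultdict


def segment_household(household_size, owns_home):
    """Assign a coarse segment label to one voter record."""
    if household_size is None or owns_home is None:
        return "unknown"
    size = "single" if household_size == 1 else "multi"
    tenure = "owner" if owns_home else "renter"
    return f"{size}_{tenure}"


def turnout_by_segment(records):
    """Share of records that voted, computed per segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [voted, total]
    for record in records:
        segment = segment_household(record.get("household_size"),
                                    record.get("owns_home"))
        counts[segment][0] += 1 if record.get("voted") else 0
        counts[segment][1] += 1
    return {segment: voted / total for segment, (voted, total) in counts.items()}


# Toy records, made up for illustration.
toy_records = [
    {"household_size": 1, "owns_home": False, "voted": False},
    {"household_size": 1, "owns_home": False, "voted": True},
    {"household_size": 3, "owns_home": True, "voted": True},
    {"household_size": None, "owns_home": True, "voted": False},
]
rates = turnout_by_segment(toy_records)
```

On large datasets, built-in Spark column expressions are usually faster than Python UDFs, so a UDF is best reserved for logic that is awkward to express with them.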

Key Discovery

"Analysis revealed that homeownership is a primary driver of turnout, while single-person households showed the lowest probability of voting across all segments."

Technical Toolkit

💻

Languages & Frameworks

Python: Pandas, NumPy, Scikit-learn, PyTorch, OpenCV
R & SQL: Tidyverse, ggplot2, SparkSQL
Other: MATLAB, SAS

☁️

Cloud & Big Data

Platforms: Databricks, AzureML, Google Cloud
Processing: PySpark, SparkSQL
Data Mining: RapidMiner, OpenRefine

🚀

Dashboards & Apps

BI Tools: Tableau, Excel (MS Office)
App Frameworks: Streamlit, Dash, Tkinter
Reporting: Technical Writing, Storytelling

🧠

Strategic & Soft Skills

Data Storytelling
Collaboration
Analytical Thinking
Problem Solving
Adaptability