Data Scientist · Analytics Engineer

Building the pipeline. Building the model.

14 years in analytics and data systems. I work in Python, SQL, and BigQuery on GCP (Vertex AI, Docker), partnering with data engineers to productionize analytics via dbt. Co-author of the eda-toolkit open-source Python library. M.S. Applied Data Science, 2023.

JupyterCon 2025 Co-Presenter · Education Data Science Summit 2021
Oscar Gil Data
14 Years Experience
M.S. Applied Data Science
2 Conference Presentations
32 dbt Models in Production

What I bring to the table

Languages & Query

Python · SQL · R

Cloud & Infrastructure

GCP · BigQuery · Vertex AI · Docker · dbt

Machine Learning

Predictive Modeling · SHAP · scikit-learn · Causal / A·B

Visualization

Looker · Tableau · Power BI · Matplotlib

Data Engineering

SQL Server · SSRS / SSIS · ETL Pipelines · Data Warehousing

Analysis & EDA

pandas · statsmodels · SciPy · NumPy · pytest · DBeaver

Experience

Jan 2024 – Present · University of California, Riverside

Data Scientist

Developed and evaluated predictive models — including Random Forest, Neural Networks, and Deep Neural Networks (BigQuery ML) — to identify at-risk students, enabling earlier intervention and improved decision-making. Engineered behavioral features from LMS and SIS data including engagement trends, activity rates, and temporal signals used directly in production models.

Owned end-to-end development of a production-grade analytics pipeline using dbt (32 models) in BigQuery, transforming raw institutional data into analytics-ready feature marts supporting both modeling and stakeholder reporting. Implemented data quality testing and validation within dbt to ensure reliability and trust in downstream use cases.

Migrated consultant-built ML applications into GCP-native environments using Vertex AI, Docker, and Cloud Workstations, improving reproducibility and scalability. Modernized institutional reporting by refactoring legacy stored procedure logic into warehouse-native SQL, integrating Python-based statistical analysis, and building a Looker dashboard supporting leadership — recognized with the UCR STAR Award (February 2026).

Python · BigQuery · BigQuery ML · Vertex AI · dbt · Docker · Looker · SQL · scikit-learn · SHAP

2024 – Present · Data Science Dynamics

Co-Founder & Data Science Consultant

Co-founded a boutique data science consultancy delivering machine learning solutions, predictive modeling, and advanced analytics across research and data-driven organizations.

Co-developed the EDA Toolkit, a production-grade open-source Python library for exploratory data analysis and statistical workflows, under contract with Taylor & Francis (anticipated publication late 2026/early 2027). Presented EDA Toolkit at JupyterCon 2025 alongside collaborator Leon Shpaner (UCLA Health), demonstrating reproducible analytics pipeline design to an international audience.

Python · pandas · SciPy · statsmodels · SHAP · scikit-learn

Summer 2026 – Present · University of San Diego, Professional & Continuing Education

Adjunct Instructor, Data Engineering

Facilitating coursework in applied data engineering with a focus on pipeline design, ingestion workflows, and data quality best practices. Guiding students in designing production-oriented data workflows using SQL and cloud-based data platforms; evaluating student-built pipelines for reliability, scalability, and validation rigor.

SQL · Python · Cloud Data Platforms · Pipeline Design

2017 – 2023 · National School District

Lead Database Analyst

Designed and maintained large-scale ETL pipelines and analytics workflows supporting enterprise reporting and operational decision-making across the district. Built scalable star schema data models to support cross-functional analytics use cases across multiple source systems.

Developed dashboards and reporting solutions (SSRS) for operational and executive stakeholders; optimized SQL queries and BI infrastructure to improve performance and reduce latency. Led stakeholder sessions to translate business requirements into structured analytical solutions and reporting frameworks.

SQL Server · Python · SSRS · SSIS · ETL · Star Schema · Data Warehousing

2011 – 2017 · National School District

Database Analyst

Developed ETL pipelines integrating multiple data systems using SQL, SSIS, and Python; built reporting solutions and automated workflows supporting business operations. Maintained and supported SQL Server-based data infrastructure, improving data accessibility and consistency across the organization.

SQL Server · SSIS · Python · Excel · Reporting

Jan 2014 – Jul 2014 · Fallbrook Union Elementary School District

Data Migration Consultant

Project lead for a student information system migration. Scoped infrastructure requirements (VPN, SFTP, SQL Server), set up database restore processes from remote servers, and authored migration scripts that successfully transferred all student records.

SQL Server · SFTP · Data Migration

Jan 2009 – Jun 2011 · San Diego County Office of Education

System Technician II

Diagnosed and resolved student information system issues using SQL, Access, and Excel. Assisted clients with data extracts meeting vendor file layout specifications.

SQL · Access · Excel

Portfolio

Conference Presentation · 2025

JupyterCon 2025 — EDA Toolkit Tutorial

Co-presented a 25-minute session introducing EDA Toolkit to the JupyterCon audience — covering summary tables, automated profiling, distribution and crosstab visuals, and reproducible export workflows.

Open Source Library · Under Contract with Taylor & Francis

EDA Toolkit — PyPI Python Library

Co-developed an open-source Python library for fast, reproducible exploratory data analysis. Includes EDA tools, contingency table creation, hypothesis testing, and reproducible workflow helpers. Under contract with Taylor & Francis for publication (anticipated late 2026/early 2027).

Nonprofit / Pro Bono

Leadership Survey Data Analysis

Collaborated with a nonprofit via Catchafire to analyze pre- and post-retreat leadership survey data. Designed efficient data staging and visualization workflows to summarize participant feedback and support program evaluation.

Technical Guide

Detecting SQL Server Truncation Errors with Python

Proactively identified and resolved a SQL Server data truncation issue by combining SQL Agent monitoring with Python-based inspection — catching the problem before users were impacted.

Technical Guide

From Pivot Tables to Python Crosstab

Demonstrated replacing manual Excel/Google Sheets pivot tables with automated, repeatable crosstab analysis in Python using COVID-19 vaccine data from the California Open Data Portal.

M.S. Capstone · 2022

Predicting ELPAC Proficiency for K–6 English Learners

Developed ML models to predict English proficiency (ELPAC) levels using five years of California school district data, enabling early intervention for English Learner students.

Education & Credentials

Education

M.S. Applied Data Science

University of San Diego

2023

B.S. Computer Networks

Coleman University

Certifications & Recognition

STAR Award

UC Riverside — Modernizing institutional reporting with dbt, BigQuery & Looker

February 2026

Taylor & Francis — EDA Toolkit (Under Contract)

Anticipated publication late 2026 / early 2027

CTO Mentor Program Graduate (CCTO)

CITE — California IT in Education

JupyterCon 2025 Speaker

NumFOCUS / JupyterCon Conference

2025

Education Data Science Summit Presenter

Co-presented on ML & attendance modeling

2021

Open to new projects & opportunities

If you need help with data science, analytics, automation, SQL, Python, or report development — I'd love to hear about it.