Experience
Data Science Intern, CASFER
Jun 2024 – Present · Cleveland, OH
- Built a data ingestion pipeline with a concurrent web crawler leveraging goroutines.
- Streamed XML/JSON metadata through Apache Kafka, enabling real‑time downstream processing.
- Transformed unstructured XML/JSON/HTML records with Apache Spark, normalizing spatial and temporal spans into query‑ready schemas.
- Deployed a SQL data warehouse on AWS Redshift to support complex queries over spatio‑temporal dimensions.
- Implemented a cloud‑native data lake architecture on AWS (S3 + Glue + Redshift), integrating streaming (Kafka) and batch (Spark) pipelines for scalable geospatial data management.
- Deployed a full‑stack RAG geospatial search engine as a web application on AWS EC2, using a Go backend, FastAPI microservices, LLaMA 3.2 80b for query processing, and a Next.js frontend.
Data Science Research Assistant, SDLE
Jan 2024 – Present · Cleveland, OH
- Developed a linear‑optimization model to determine optimal routes for transport of nitrogen‑based fertilizers in Ohio using Python and Google OR‑Tools.
- Created algorithms for conducting spatial joins on datasets with inconsistent coordinate reference systems and location names using R.
- Wrote Python tools for named‑entity recognition of geospatial data entities to improve indexing and searchability.
- Orchestrated daily data ingestion pipelines for scraping and transforming RDF triples using Python and Apache Airflow.
Data Science Intern, Lawrence Livermore National Laboratory
Jul 2024 – Aug 2024 · Livermore, CA
- Selected for a competitive national ML research program to analyze electrocardiogram and electroanatomic mapping data.
- Engineered deep learning models (CNNs) achieving 95.2% accuracy in classifying cardiac arrhythmias from time–frequency ECG representations.
- Generated high‑resolution cardiac voltage maps using fast Fourier transforms for signal processing and CNN‑based interpolation, achieving a 10.2 ms RMSE across 75 heart regions.
- Presented a first‑author research poster summarizing results at the lab’s annual symposium.
Projects
CanvasLM
Built and deployed a FastAPI backend with REST microservices and a Next.js frontend, enabling users to interact with AI‑generated study materials through a responsive web interface.
Technologies: Python, FastAPI, Next.js
View on GitHub
Geospatial Search Engine (RAG + LLMs)
Developed a retrieval‑augmented generation pipeline using LLaMA‑based large language models to query structured and unstructured geospatial data.
Designed the backend with Go and FastAPI microservices to parse metadata (XML/JSON) and integrate semantic search over spatial and temporal dimensions.
Technologies: Go, FastAPI, Python, Next.js
View on GitHub
Geospatial Web Scraper
Concurrent geospatial web crawler built in Go to collect metadata across thousands of datasets. Integrates with Kafka and Spark for streaming and batch ingestion.
Technologies: Go, Python, Kafka, Apache Spark
View on GitHub
Contact
If you’re interested in collaborating or have any questions, feel free to reach out!
Email: atharva.jgupta@gmail.com
GitHub: github.com/atharva789
LinkedIn: linkedin.com/in/atharvagupta