Hi, I'm Atharva Gupta

Aspiring Data Engineer & Python Developer

I’m a Data Science & Applied Mathematics student at Case Western Reserve University. I enjoy designing scalable data systems and building full‑stack applications.

Experience

Data Science Intern, CASFER

Jun 2024 – Present · Cleveland, OH

  • Built a data ingestion pipeline around a concurrent Go web crawler that leverages goroutines.
  • Used Apache Kafka to stream XML/JSON metadata, enabling real‑time downstream processing.
  • Transformed semi‑structured XML/JSON/HTML records with Apache Spark, normalizing spatial and temporal spans into query‑ready schemas.
  • Deployed an Amazon Redshift data warehouse to support complex SQL queries over spatio‑temporal dimensions.
  • Implemented a cloud‑native data lake architecture on AWS (S3 + Glue + Redshift), integrating streaming (Kafka) and batch (Spark) pipelines for scalable geospatial data management.
  • Deployed a full‑stack retrieval‑augmented generation (RAG) web application for geospatial search on Amazon EC2, using a Go backend, FastAPI microservices, LLaMA 3.2 for query processing, and a Next.js frontend.

Data Science Research Assistant, SDLE

Jan 2024 – Present · Cleveland, OH

  • Developed a linear‑optimization model in Python with Google OR‑Tools to determine optimal routes for transporting nitrogen‑based fertilizers in Ohio.
  • Created R algorithms for performing spatial joins on datasets with inconsistent coordinate reference systems and location names.
  • Wrote Python tools for named‑entity recognition of geospatial entities to improve indexing and searchability.
  • Orchestrated daily data ingestion pipelines for scraping and transforming RDF triples using Python and Apache Airflow.

Data Science Intern, Lawrence Livermore National Laboratory

Jul 2024 – Aug 2024 · Livermore, CA

  • Selected for a competitive national ML research program to analyze electrocardiogram and electroanatomic mapping data.
  • Engineered convolutional neural networks (CNNs) that classify cardiac arrhythmias from time–frequency ECG representations with 95.2% accuracy.
  • Generated high‑resolution cardiac voltage maps using fast Fourier transforms (FFTs) for signal processing and CNN‑based interpolation, achieving an RMSE of 10.2 ms across 75 heart regions.
  • Presented a first‑author research poster summarizing results at the lab’s annual symposium.

Projects

CanvasLM

Built and deployed a FastAPI backend with REST microservices and a Next.js frontend, enabling users to interact with AI‑generated study materials through a responsive web interface.

Technologies: Python, FastAPI, Next.js

View on GitHub

Geospatial Search Engine (RAG + LLMs)

Developed a retrieval‑augmented generation pipeline using LLaMA‑based large language models to query structured and unstructured geospatial data.

Designed the backend as Go and FastAPI microservices that parse XML/JSON metadata and integrate semantic search over spatial and temporal dimensions.

Technologies: Go, FastAPI, Python, Next.js

View on GitHub

Geospatial Web Scraper

Concurrent geospatial web crawler built in Go to collect metadata from thousands of datasets. Integrates with Kafka and Spark for streaming and batch ingestion.

Technologies: Go, Python, Apache Kafka, Apache Spark

View on GitHub

Contact

If you’re interested in collaborating or have any questions, feel free to reach out!