~/marzouk $
Open to co-ops & internships from December 2026

// about

I'm Muhammad, a first-year MS AI student at Northeastern. I came in with a CS undergrad degree from Rochester and a few years across software engineering, data science, and ML.

What I think makes me a bit different is that the software engineering background sits underneath everything else. I've built production tools, worked with real codebases and real APIs, and I think carefully about system design even when the work is mostly modeling. That means I can contribute at both the modeling and the engineering layer of a project.

My interests got more specific the deeper I went. I came in thinking I'd be primarily on the engineering side of AI systems. Independent research on genetic algorithms and the Multidimensional Knapsack Problem made it clear the modeling and math side is just as interesting to me. Combinatorial optimization, stochastic methods, probabilistic modeling. I'm as engaged deriving feasibility distributions from generating functions as I am building infrastructure around them.

So what I'm really drawn to is the intersection. ML infrastructure and applied modeling together. Some flavor of ML engineer or research engineer role where both how things are designed and how they actually ship matter.

When I'm not in front of a screen I'm usually out playing golf or cricket, or down some rabbit hole reading about history, different cultures, and political science.

Muhammad Marzouk Baig
currently: MS AI @ Northeastern
based in: Portland, ME
focus: ML Engineering + Research
email: baig.muham@northeastern.edu

// skills

Languages
Python, TypeScript, JavaScript
ML & Deep Learning
PyTorch, LLM fine-tuning (QLoRA / LoRA), Transformers, NLP, scikit-learn
MLOps & Infrastructure
Docker, CI/CD, Git, REST APIs, Model quantization (GGUF), llama.cpp
Data & Modeling
Pandas, NumPy, spaCy, Statistical modeling, Feature engineering, Data engineering
Research
Combinatorial optimization, Genetic algorithms, Probabilistic modeling, GPU Monte Carlo

// experience

IT Support Consultant

Simon Business School, University of Rochester · Rochester, NY

Jul 2023 – Dec 2024
  • Resolved 100+ hardware and software tickets monthly while maintaining 99% system uptime; used Jira for ticket tracking, triage, and workflow management.
  • Developed internal application widgets in Python to automate network-based administrative workflows using CI/CD pipelines.
  • Rebuilt and digitized the IT documentation platform, reducing new employee onboarding time by 30%.

Software Engineering Intern

1010data · New York, NY

Jun 2022 – Aug 2022
  • Designed and built a pip-installable Python library mirroring the Pandas API, converting method calls to 1010data's proprietary XML query language via a custom translation algorithm.
  • Integrated the library end-to-end: SSO auth, custom REST endpoints, server-side execution, live results in the 1010data GUI.
  • Built an HTTP request-based automation layer replacing a Selenium prototype that failed internal security review; new REST endpoints cut codebase size by over 30%.
  • Authored dual-audience technical documentation in Confluence covering library architecture and a client-facing user manual.

Full Stack Developer

Cronus · Rochester, NY

Feb 2021 – Aug 2021
  • Integrated the Google Places API into a React Native mobile application, implementing real-time address auto-fill with API key management and request throttling.
  • Built a two-step user registration and authentication flow with client-side form validation, persisting data to Firestore with real-time read/write.
  • Designed and implemented 10+ screens including the Vendor Profile screen, using Figma for mockups and establishing reusable component patterns.

// projects

Featured Research

Generating Function Initialization for Genetic Algorithms

MS Research, Northeastern. April 2026 to now.

Standard genetic algorithms die on tight Multidimensional Knapsack instances: every solution in the initial population is infeasible, selection never operates, and the search is dead on arrival. I built a different initialization approach that uses combinatorial generating functions to derive per-item sampling probabilities, with no objective values and no LP relaxation needed. GPU Monte Carlo estimates all 100 probabilities in about 6 seconds on a T4. On the tight benchmark, uniform initialization gets 0% feasibility; this approach gets 52.5% and converges to 96.5% of the reference best. Confirmed novelty through correspondence with Hill (1999), the closest prior work. Paper and code on GitHub.
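A minimal sketch of the generating-function idea, not the project's code. It assumes a toy case with a single knapsack constraint and small integer weights, where the polynomial prod_i (1 + x^{w_i}) can be expanded exactly; the function names `count_by_weight` and `inclusion_probabilities` are hypothetical. The real multi-constraint problem is where Monte Carlo estimation would come in.

```python
from typing import List


def count_by_weight(weights: List[int], capacity: int) -> List[int]:
    """Coefficients of prod_i (1 + x^{w_i}), truncated at x^capacity.

    coeffs[c] is the number of item subsets whose total weight is exactly c.
    """
    coeffs = [0] * (capacity + 1)
    coeffs[0] = 1  # the empty subset
    for w in weights:
        # Walk downward so each item is counted at most once per subset.
        for c in range(capacity, w - 1, -1):
            coeffs[c] += coeffs[c - w]
    return coeffs


def inclusion_probabilities(weights: List[int], capacity: int) -> List[float]:
    """P(item i is included | subset is feasible), uniform over feasible subsets."""
    feasible_total = sum(count_by_weight(weights, capacity))
    probs = []
    for i, w in enumerate(weights):
        if w > capacity:
            probs.append(0.0)  # this item can never appear in a feasible subset
            continue
        others = weights[:i] + weights[i + 1:]
        # Feasible subsets containing i = feasible subsets of the others
        # under the reduced capacity.
        with_item = sum(count_by_weight(others, capacity - w))
        probs.append(with_item / feasible_total)
    return probs


# Toy instance (made-up numbers): 5 feasible subsets out of 8.
probs = inclusion_probabilities([2, 3, 4], capacity=5)
```

Sampling each bit of an initial individual with these per-item probabilities biases the population toward the feasible region without looking at objective values, which is the property the project description highlights.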

Python, PyTorch, GPU, Genetic Algorithms, Combinatorial Optimization, Research
$ git commit -m "feat: ..."

Committed

Fine-tuned commit-message model. 2026.

A 1.7B model I fine-tuned to turn a git diff into a Conventional Commits message. On a 442-diff held-out eval, fine-tuning lifted commit-type accuracy from 0.13 to 0.64 and faithfulness from 0.43 to 0.86 over the base model. It serves as a ~1 GB quantized GGUF on llama.cpp, CPU-only, so nothing leaves your machine, and a GBNF grammar makes every output a valid commit by construction.
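To make "valid commit" and "commit-type accuracy" concrete, here is a rough Python sketch of a Conventional Commits header check. It is not the project's grammar or eval code: the allowed type list is an assumption (the spec itself only defines `feat` and `fix`; the rest follow the common Angular-style convention), and `commit_type` and `type_accuracy` are hypothetical helpers.

```python
import re
from typing import List, Optional

# Assumed type vocabulary; the project's actual list may differ.
ALLOWED_TYPES = (
    "feat", "fix", "docs", "style", "refactor",
    "perf", "test", "build", "ci", "chore",
)

# <type>[(scope)][!]: <description>
HEADER_RE = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(?:\((?P<scope>[^()\r\n]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def commit_type(message: str) -> Optional[str]:
    """Return the commit type if the first line is a valid header, else None."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    match = HEADER_RE.match(header)
    return match.group("type") if match else None


def type_accuracy(predictions: List[str], reference_types: List[str]) -> float:
    """Fraction of predicted messages whose parsed type equals the reference type."""
    hits = sum(
        commit_type(pred) == ref
        for pred, ref in zip(predictions, reference_types)
    )
    return hits / len(reference_types)
```

A post-hoc check like this can only reject bad output. Constraining decoding with a grammar, as the project does, rules out invalid headers before they are generated.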

Fine-tuning, QLoRA, LLMs, MLOps
Try the live demo → · View on GitHub →
>>> df.groupby('income_tier')['housing'].value_counts()

Listening to Southern Maine

United Way of Southern Maine. Northeastern XN Program. Spring 2026.

Mixed-methods analysis of 1,702 community survey responses for United Way of Southern Maine. The data engineering was messy: 8 different survey instruments, 18 questions, none asking the same things, producing a sparse ragged matrix where most empty cells are unasked rather than unanswered. We harmonized the schema, canonicalized open-text responses with NLP, and ran statistical modeling to answer three questions: what residents say their biggest challenges are, which challenges differ across economic strata, and who feels unheard. Main finding: housing cost is cited by 82% of respondents across every income group, with a chi-square test confirming no significant difference between groups. Any housing initiative framed as low-income support misses most of the people it affects. Full 25-page report and 14-slide deck in the repo.
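For readers unfamiliar with the test, a chi-square test of independence on an income-tier by cited-housing table can be sketched in plain Python as below. The counts are invented for illustration and are not the survey data, and `chi_square_independence` is a hypothetical helper. In practice `scipy.stats.chi2_contingency` computes the statistic, p-value, and expected counts in one call.

```python
from typing import List, Tuple


def chi_square_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Rows: three income tiers. Columns: [cited housing, did not cite housing].
# Made-up counts, chosen only to show the mechanics.
toy_counts = [[80, 20], [85, 15], [81, 19]]
statistic, dof = chi_square_independence(toy_counts)

# 5.991 is the 0.05 critical value of the chi-square distribution with 2 dof.
significant_at_5_percent = statistic > 5.991
```

With similar rates in every row the statistic stays well below the critical value, which is the pattern behind a "no significant difference" result.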

Python, NLP, spaCy, scikit-learn, Statistical Modeling, Data Engineering
>>> model.fit(X_proximity, y_gentrify)

Gentrification and University Proximity

Independent research, University of Rochester. Fall 2023 to Fall 2024.

Quantitative study on whether proximity to private universities in Monroe County, NY correlates with neighborhood-level gentrification indicators in Rochester. Full pipeline: data acquisition from census and geospatial sources, feature engineering on demographic variables, cross-validated logistic and linear regression in scikit-learn. The interesting part was operationalizing gentrification correctly before touching any models.

Python, scikit-learn, Pandas, Geospatial, Regression

GPT from Scratch

Independent project. 2026.

A working ~10M-parameter character-level GPT, built in PyTorch from scratch and trained on Shakespeare until it produces coherent Shakespearean text. Every component implemented by hand: multi-head self-attention, transformer blocks with residual connections and layernorm, positional encoding, and autoregressive generation. The endpoint of the from-scratch track below; follows Karpathy's Neural Networks: Zero to Hero.
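The core of the attention component can be sketched without any framework. The snippet below is a simplified illustration, not the project's PyTorch code: it assumes a single head, takes queries, keys, and values as given (no learned projections), and uses plain Python lists for readability.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(
    q: List[Vector], k: List[Vector], v: List[Vector]
) -> List[Vector]:
    """Scaled dot-product attention where position t only sees positions 0..t."""
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Causal mask: only keys up to and including position t are scored.
        scores = [
            sum(qi * ki for qi, ki in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible values.
        out.append(
            [sum(w * v[s][j] for s, w in enumerate(weights)) for j in range(len(v[0]))]
        )
    return out


# Two positions, two dimensions (toy values).
attended = causal_self_attention(
    q=[[1.0, 0.0], [0.0, 1.0]],
    k=[[1.0, 0.0], [0.0, 1.0]],
    v=[[1.0, 2.0], [3.0, 4.0]],
)
```

The first position can only attend to itself, so its output is exactly its own value vector. That restriction is what makes autoregressive generation possible. A multi-head version runs several of these in parallel on learned projections and concatenates the results.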

Python, PyTorch, Transformers, Deep Learning, NLP

Neural Networks from Scratch

Independent projects. 2026.

Building the ML stack from the ground up to understand every layer before reaching for a framework. First a scalar-valued autograd engine in pure Python: backpropagation over a dynamically built DAG, no PyTorch or NumPy in the core. Then a language-modeling progression: bigram counts, a Bengio-style MLP with character embeddings, manual Xavier and Kaiming init and batch normalization, and full backprop through softmax, cross-entropy, and batchnorm by hand, validated against PyTorch autograd, up to a WaveNet-style model. The working GPT above is where this track lands.
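A scalar autograd engine of the kind described fits in a few dozen lines. The sketch below is a minimal illustration in the spirit of that design rather than the project's code. It supports only addition and multiplication, and the `Value` class with its attribute names is an assumption.

```python
class Value:
    """A scalar that records how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = tuple(parents)
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        """Backpropagate from this node through the DAG in reverse topological order."""
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


a, b = Value(2.0), Value(3.0)
c = a * b + a  # c = 8, dc/da = b + 1 = 4, dc/db = a = 2
c.backward()
```

Gradients accumulate with `+=` because a node such as `a` can feed into the output along more than one path, and the chain rule sums the contributions from each path.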

Python, PyTorch, Autograd, Backpropagation, Language Models
more projects on GitHub →

// contact

Open to co-ops and internships from December 2026, along with research collaborations and ML engineering roles. Feel free to reach out.

email: baig.muham@northeastern.edu
github: github.com/marzoukbaig14
linkedin: linkedin.com/in/muhammadmarzoukbaig