Restricted · PACE Audio Deck

Pierre's rubric walk-through.
DS301 NLP project · private preview

PACE DS301 — rubric audio deck

Pierre Sutherland · 48 rubric items, one audio clip each · ~1 hour 40 min total · UK English, Sonia voice · click play-all for autoplay, or jump by section.

48 / 48 rubric items
~1h 40m total audio
~2 min per item
10 min overview
976 report words
Intro
Overview — ~10 min
Importing packages and data
#1 — Import the data file Google_12_months.xlsx into a dataframe.
#2 — Import the data file Trustpilot_12_months.xlsx into a dataframe.
#3 — Remove any rows with missing values in the Comment column (Google review) and Review Content column (Trustpilot).
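A minimal sketch of items 1-3, assuming the file and column names quoted in the brief (`Google_12_months.xlsx`, `Comment`, `Review Content`) and pandas as the dataframe library:

```python
import pandas as pd

def drop_missing_reviews(df: pd.DataFrame, text_col: str) -> pd.DataFrame:
    """Drop rows whose review text is missing, then reset the index."""
    return df.dropna(subset=[text_col]).reset_index(drop=True)

# File and column names follow the brief's wording:
# google = drop_missing_reviews(pd.read_excel("Google_12_months.xlsx"), "Comment")
# trustpilot = drop_missing_reviews(pd.read_excel("Trustpilot_12_months.xlsx"), "Review Content")
```

Reading the workbooks requires `openpyxl` alongside pandas; the dropna helper is the only part shown executing.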
Conducting initial data investigation
#4 — Find the number of unique locations in the Google data set. Find the number of unique locations in the Trustpilot data set. Use Club's Name for the Google data set. Use Location Name for the Trustpilot data set.
#5 — Find the number of common locations between the Google data set and the Trustpilot data set.
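Items 4-5 reduce to set arithmetic on the two location columns. A sketch, assuming `google` and `trustpilot` are the cleaned dataframes and the column names are those named in item 4:

```python
def location_stats(google, trustpilot,
                   g_col="Club's Name", t_col="Location Name") -> dict:
    """Count unique locations per data set and the overlap between them."""
    g_locs = set(google[g_col].dropna())
    t_locs = set(trustpilot[t_col].dropna())
    return {
        "google_unique": len(g_locs),
        "trustpilot_unique": len(t_locs),
        "common": len(g_locs & t_locs),  # set intersection
    }
```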
#6 — Perform preprocessing of the data – change to lower case, remove stopwords using NLTK, and remove numbers.
#7 — Tokenise the data using word_tokenize from NLTK.
#8 — Find the frequency distribution of the words from each data set's reviews separately. You can use nltk.FreqDist.
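Note the class is `nltk.FreqDist` (capital F). A sketch over a stand-in token list; in the project the tokens would come from the preprocessing step:

```python
from nltk import FreqDist

tokens = ["gym", "staff", "gym", "clean", "staff", "gym"]  # stand-in for real tokens
fdist = FreqDist(tokens)
top_words = fdist.most_common(10)  # (word, count) pairs, most frequent first
# fdist.plot(10) draws the same counts; a matplotlib bar plot also satisfies item 9
```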
#9 — Plot a histogram/bar plot showing the top 10 words from each data set.
#10 — Use the wordcloud library on the cleaned data and plot the word cloud.
#11 — Create a new dataframe by filtering out the data to extract only the negative reviews from both data sets.
• For Google reviews, overall scores < 3 can be considered negative scores.
• For Trustpilot reviews, stars < 3 can be considered negative scores.
Repeat the frequency distribution and wordcloud steps on the filtered data consisting of only negative reviews.
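The negative-review filter in item 11 is one comparison; a sketch, where the score column names are assumptions inferred from the brief's wording ("overall scores" and "stars"):

```python
import pandas as pd

def negative_reviews(df: pd.DataFrame, score_col: str, threshold: float = 3) -> pd.DataFrame:
    """Keep rows whose score is below the threshold (scores < 3 count as negative)."""
    return df[df[score_col] < threshold].copy()

# Score column names are assumptions; check the actual headers in the files:
# neg_google = negative_reviews(google, "Overall Score")
# neg_trustpilot = negative_reviews(trustpilot, "Stars")
```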
Conducting initial topic modelling
#12 — With the data frame created in the previous step:
• Filter out the reviews that are from the locations common to both data sets.
• Merge the reviews to form a new list.
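A sketch of the item 12 merge, assuming the negative-review dataframes from item 11 and the column names given in item 4:

```python
def reviews_for_common_locations(google, trustpilot,
                                 g_loc="Club's Name", t_loc="Location Name",
                                 g_text="Comment", t_text="Review Content") -> list[str]:
    """Keep only reviews from locations present in both data sets, merged into one list."""
    common = set(google[g_loc]) & set(trustpilot[t_loc])
    return (google.loc[google[g_loc].isin(common), g_text].tolist()
            + trustpilot.loc[trustpilot[t_loc].isin(common), t_text].tolist())
```

The resulting list of strings is the document set BERTopic consumes in item 13.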
#13 — Preprocess this data set. Use BERTopic on this cleaned data set.
#14 — Output: List out the top topics along with their document frequencies.
#15 — For the top 2 topics, list out the top words.
#16 — Show an interactive visualisation of the topics to identify the cluster of topics and to understand the intertopic distance map.
#17 — Show a barchart of the topics, displaying the top 5 words in each topic.
#18 — Plot a heatmap, showcasing the similarity matrix.
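Items 16-18 map directly onto three BERTopic methods. A sketch that assumes `topic_model` is an already-fitted `bertopic.BERTopic` instance (fitting itself is omitted because it downloads a sentence-transformer model):

```python
def make_topic_figures(topic_model, n_words: int = 5) -> dict:
    """Collect the three interactive Plotly figures items 16-18 ask for.

    `topic_model` is assumed to be a fitted bertopic.BERTopic instance,
    e.g. the result of BERTopic().fit(docs) on the merged review list.
    """
    return {
        "distance_map": topic_model.visualize_topics(),               # intertopic distance map
        "barchart": topic_model.visualize_barchart(n_words=n_words),  # top words per topic
        "heatmap": topic_model.visualize_heatmap(),                   # topic similarity matrix
    }

# In a notebook, each returned figure renders with .show()
```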
#19 — For 10 clusters, provide a brief description in the Notebook of the topics they comprise, along with the general theme of the cluster, evidenced by the top words within each cluster's topics.
Performing further data investigation
#20 — List out the top 20 locations with the highest number of negative reviews. Do this separately for Google and Trustpilot's reviews, and comment on the result. Are the locations roughly similar in both data sets?
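The item 20 ranking is a `value_counts` over the negative subset. A sketch, assuming `neg_df` is a negative-reviews dataframe from item 11:

```python
def top_negative_locations(neg_df, loc_col: str, n: int = 20):
    """Locations ranked by number of negative reviews, highest first."""
    return neg_df[loc_col].value_counts().head(n)

# top_negative_locations(neg_google, "Club's Name")
# top_negative_locations(neg_trustpilot, "Location Name")
```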
#21 — Merge the 2 data sets using Location Name and Club's Name. Now, list out the following:
• Locations
• Number of Trustpilot reviews for this location
• Number of Google reviews for this location
• Total number of reviews for this location (sum of Google reviews and Trustpilot reviews)
Sort based on the total number of reviews.
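A sketch of the item 21 table: count reviews per location in each data set, keep only locations common to both (inner join), and sort by the combined total. Column names are those from item 4:

```python
import pandas as pd

def review_counts(google, trustpilot,
                  g_loc="Club's Name", t_loc="Location Name") -> pd.DataFrame:
    """Per-location review counts for both sources, sorted by combined total."""
    g = google.groupby(g_loc).size().rename("google_reviews")
    t = trustpilot.groupby(t_loc).size().rename("trustpilot_reviews")
    merged = pd.concat([g, t], axis=1, join="inner")  # keep common locations only
    merged["total_reviews"] = merged["google_reviews"] + merged["trustpilot_reviews"]
    return merged.sort_values("total_reviews", ascending=False)
```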
#22 — For the top 30 locations, redo the word frequency and word cloud. Comment on the results, and highlight if the results are different from the first run.
#23 — For the top 30 locations, combine the reviews from Google and Trustpilot and run them through BERTopic. Comment on the following:
• Are the results any different from the first run of BERTopic?
• If so, what has changed?
• Are there any additional insights compared to the first run?
Conducting emotion analysis
#24 — Import the BERT model bhadresh-savani/bert-base-uncased-emotion from Hugging Face, and set up a pipeline for text classification.
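Items 24-26 can be structured around one helper that works with any Hugging Face classification pipeline. The pipeline construction is shown only in a comment because the model download is large; the docstring shows the assumed setup:

```python
def top_emotion(classifier, text: str) -> str:
    """Return the highest-scoring emotion label for one review.

    `classifier` is assumed to be a Hugging Face pipeline, e.g.:
        from transformers import pipeline
        classifier = pipeline("text-classification",
                              model="bhadresh-savani/bert-base-uncased-emotion",
                              top_k=None)  # score every emotion class, not just the best
    """
    scores = classifier(text)
    if scores and isinstance(scores[0], list):  # some versions nest per-input results
        scores = scores[0]
    return max(scores, key=lambda s: s["score"])["label"]

# Capturing the top emotion per review (item 26):
# google["top_emotion"] = google["Comment"].map(lambda r: top_emotion(classifier, r))
```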
#25 — With the help of an example sentence, run the model and display the different emotion classifications that the model outputs.
#26 — Run this model on both data sets, and capture the top emotion for each review.
#27 — Use a bar plot to show the top emotion distribution for all negative reviews in both data sets.
#28 — Extract all the negative reviews (from both data sets) where anger is the top emotion.
#29 — Run BERTopic on the output of the previous step.
#30 — Visualise the clusters from this run. Comment on whether it is any different from the previous runs, and whether it is possible to narrow down the primary issues that have led to an angry review.
Using a large language model from Hugging Face
#31 — Load the following model: tiiuae/falcon-7b-instruct. Set up the pipeline for text generation with a max length of 1,000 for each review.
#32 — Add the following prompt to every review, before passing it on to the model: "In the following customer review, pick out the main 3 topics. Return them in a numbered list format, with each one on a new line." Run the model. Note: If the execution time is too high, you can use a subset of the bad reviews (instead of the full set) to run this model.
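A sketch of the prompt assembly for items 31-32. The prompt text is the one quoted in the brief; the generation pipeline is shown only in comments because Falcon-7B is a multi-gigabyte download:

```python
TOPIC_PROMPT = ("In the following customer review, pick out the main 3 topics. "
                "Return them in a numbered list format, with each one on a new line.\n\n")

def build_prompt(review: str) -> str:
    """Prepend the brief's topic-extraction instruction to one review."""
    return TOPIC_PROMPT + review.strip()

# The generation pipeline itself (commented out due to model size):
# from transformers import pipeline
# generator = pipeline("text-generation", model="tiiuae/falcon-7b-instruct",
#                      max_length=1000)
# outputs = [generator(build_prompt(r))[0]["generated_text"] for r in bad_reviews]
```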
#33 — The output of the model will be the top 3 topics from each review. Append each of these topics from each review to create a comprehensive list.
#34 — Use this list as input to run BERTopic again.
#35 — Comment on the output of BERTopic. Highlight any changes or improvements, and whether any further insights have been obtained.
#36 — Use the comprehensive list from Step 3. Pass it to the model as the input, but prefix the following to the prompt: "For the following text topics obtained from negative customer reviews, can you give some actionable insights that would help this gym company?" Run the Falcon-7b-Instruct model.
#37 — List the output, ideally in the form of suggestions, that the company can employ to address customer concerns.
Using Gensim
#38 — Perform the preprocessing required to run the LDA model from Gensim. Use the list of negative reviews (combined Google and Trustpilot reviews).
#39 — Using Gensim, perform LDA on the tokenised data. Specify the number of topics = 10.
#40 — Show the visualisations of the topics, displaying the distance maps and the bar chart listing out the most salient terms.
#41 — Comment on the output and whether it is similar to other techniques, and whether any extra insights were obtained.
Report: Communicating business impact and insights
#42 — The report is between 800–1000 words.
#43 — The report documents the approach used.
#44 — The report is clear, well-organised, and engaging to facilitate learning from the analysis.
#45 — Conclusions drawn are clearly supported by the data.
#46 — The code is well-organised and well-presented.
#47 — The report captures and summarises the comments requested in earlier steps.
#48 — The report comprises final insights, based on the output obtained from the various models employed.
Learnings
Top 20 cohort learnings — ~5 min
Closing
Closing — ~90 s