JoeyLLM

TechLauncher Team 2026 — Building specialised language models from the ground up

Project Vision

JoeyLLM is a hands-on project focused on building language-model workflows from end to end, with a strong emphasis on Australian and other domain-specific language use.

The goal is not just to produce models, but to deeply understand how data quality, infrastructure, and training choices shape the behaviour of modern LLM systems.

Our Pipeline

  1. Clean & Filter: Process large web datasets (60 TB FineWeb corpus), filter and normalise text
  2. Classify: Build classifiers to identify region, domain, and language patterns
  3. Fine-Tune: Use curated, high-quality datasets to fine-tune specialised language models
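
As a rough illustration of the Clean & Filter step, the sketch below normalises raw text and drops low-quality documents. It is a minimal example under stated assumptions, not the project's actual tooling: the function names (normalise, keep, clean_and_filter) and both thresholds are hypothetical, and a real run over FineWeb would stream data in shards rather than hold documents in memory.

```python
import re
import unicodedata

# Hypothetical thresholds, chosen for illustration only.
MIN_CHARS = 200
MIN_ALPHA_RATIO = 0.6

_WHITESPACE = re.compile(r"\s+")


def normalise(text: str) -> str:
    """Apply NFKC normalisation, drop non-printable characters, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    # Keep printable characters and ordinary whitespace; drop control characters.
    text = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    return _WHITESPACE.sub(" ", text).strip()


def keep(text: str, min_chars: int = MIN_CHARS,
         min_alpha_ratio: float = MIN_ALPHA_RATIO) -> bool:
    """Return True if the text is long enough and mostly letters and spaces."""
    if len(text) < min_chars:
        return False
    alpha = sum(ch.isalpha() or ch.isspace() for ch in text)
    return alpha / len(text) >= min_alpha_ratio


def clean_and_filter(docs):
    """Yield normalised documents that pass the quality filter."""
    for doc in docs:
        cleaned = normalise(doc)
        if keep(cleaned):
            yield cleaned
```

Normalising before filtering means the length and character-ratio checks see the same text that would later reach a classifier or training job.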

Semester Roadmap

S1 Data & Infrastructure

  • Explore & clean FineWeb dataset
  • Filter content and normalise text
  • Build text classifiers (region, domain, metadata)
  • Produce high-quality filtered datasets for training
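
One possible starting point for the region classifier is a keyword baseline, sketched below. Everything in it is an assumption made for illustration: the marker vocabularies are hand-picked and hypothetical, and the names region_scores and classify_region are not part of any existing project code. A trained classifier on labelled data would replace this, but a baseline like this can help sanity-check labels early on.

```python
import re
from collections import Counter

# Hypothetical marker vocabularies, for illustration only; a real classifier
# would be trained on labelled data rather than hand-picked terms.
REGION_MARKERS = {
    "australia": {"arvo", "servo", "ute", "centrelink", "bunnings"},
    "canada": {"toonie", "loonie", "poutine", "tuque"},
}

_TOKEN = re.compile(r"[a-z]+")


def region_scores(text: str) -> dict:
    """Count how many tokens in the text match each region's marker set."""
    counts = Counter(_TOKEN.findall(text.lower()))
    return {
        region: sum(counts[term] for term in markers)
        for region, markers in REGION_MARKERS.items()
    }


def classify_region(text: str, min_hits: int = 2) -> str:
    """Return the best-scoring region, or "unknown" when evidence is weak or tied."""
    scores = region_scores(text)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_score), (_, second_score) = ranked[0], ranked[1]
    if best_score < min_hits or best_score == second_score:
        return "unknown"
    return best
```

Returning "unknown" for weak or tied evidence keeps ambiguous documents out of the regional datasets instead of forcing a label.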

S2 Model Training

  • Fine-tune existing models on curated datasets
  • Regional models: Australian English, Canadian English, etc.
  • Domain models: banking, defence, science, hobbies
  • Understand how datasets shape model behaviour

Project Goals

Deliverables

  • Tools for cleaning large web datasets
  • Text classification models
  • Training workflows and pipelines
  • Fine-tuned language models for specialised contexts

Compute Environment

  • Remote GPU servers with L4 GPUs
  • JupyterHub environment at 10.55.0.245
  • WireGuard VPN for secure access
  • A100 GPU clusters for large training jobs

Repository Structure

Folder           Purpose
knowledge-base/  Combined compute infrastructure, data, models, learning resources, Q&A, papers, and platform references
roadmap/         Project goals, planning docs, semester overviews
team-members/    Combined member profiles and notebook progress tracking
management/      Weekly reports, weekly TODOs, and coordination tracking

Resources & Links

Team Members

  • Alisonsun7 (u8018638)
  • Sean593380 (u8125484)
  • XingyuLi2 (u7994712)
  • 2513238602 (u8188942)
  • Posture627K (u8260186)
