🛒 Task 3 — NorgesGruppen Object Detection

Score = 0.7 × detection_mAP + 0.3 × classification_mAP (IoU ≥ 0.5)  ·  Rank #38 / 214  ·  Sandbox: Python 3.11 · CUDA 12.4 · L4 GPU · 300s timeout · no network
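The weighting above can be sanity-checked with a couple of lines (the component values below are illustrative placeholders, not actual leaderboard breakdowns):

```python
# Leaderboard metric as stated above: 70% detection mAP, 30% classification mAP,
# both evaluated at IoU >= 0.5.
def leaderboard_score(detection_map: float, classification_map: float) -> float:
    return 0.7 * detection_map + 0.3 * classification_map

# Illustrative only: if both components were 0.8331, the total is also 0.8331.
print(round(leaderboard_score(0.8331, 0.8331), 4))  # 0.8331
```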

| Metric | Value | Notes |
|---|---|---|
| Best score | 0.8331 | +0.067 over baseline |
| Rank | #38 | of 214 teams |
| Train images | 3,431 | 211 real + 3,220 synthetic |
| Classes | 356 | grocery products |
| Val mAP50 | 0.454 | YOLOv26l at ep 35 |

Submission History

| Date | Score | File (repo root) | Notes |
|---|---|---|---|
| Mar 19 | 0.7664 | norgesgruppen_submission.zip | Baseline: YOLOv26l 50ep · T4 · real data only · no TTA |
| Mar 20 17:28 | 0.6959 ↓ | norgesgruppen_submission_knn.zip | ❌ KNN re-classifier — studio/shelf domain gap killed it |
| Mar 20 18:45 | 0.8331 ✅ | norgesgruppen_submission_tta.zip | Current best: same 50ep model + hflip TTA + WBF + conf=0.05 |
| ETA Mar 21 ~10:00 | ~0.87–0.93 | norgesgruppen_ensemble.zip | 3-model ensemble: 50ep + YOLOv26l-synth + RT-DETR · multi-scale TTA · WBF |

Active Training Jobs (Vertex AI — V100)

YOLOv26l + Synthetic  Running

- Epoch: ~38 / 150
- Val mAP50: 0.454 (matches 50ep baseline at ep 35 — ✅ on track)
- Losses: box=0.90 ↓ · cls=1.05 ↓ · dfl=0.0077 ↓
- Speed: ~7.5 it/s · ~8 min/epoch
- Config: imgsz=1280 · freeze=10 · copy_paste=0.8 · patience=30
- Data: 3,431 images (211 real + 3,220 synthetic)
- Job ID: 6947538410814832640
- Output: gs://…/output/vertex_yolo26l_synth_v100/

~25% complete · ETA ~10 more hours (early stopping patience=30)

RT-DETR-l  Running

- Epoch: ~18 / 100
- Val mAP50: ~0.001 (expected — transformer warm-up, see below)
- Losses: giou=0.865 ↓ · cls=1.39 ↓ · l1=0.475 ↓
- Speed: ~3.1 it/s · ~18 min/epoch
- Config: imgsz=1280 · freeze=0 · copy_paste=0.5 · patience=30
- Data: 3,431 images (211 real + 3,220 synthetic)
- Job ID: 3468507698671124480
- Output: gs://…/output/vertex_rtdetr_v100/

~18% complete · ETA ~24 more hours (mAP expected to jump around ep 40–50)

What is RT-DETR?

RT-DETR (Real-Time Detection Transformer, Baidu 2023) is a transformer-based detector — the first fast enough to compete with YOLO on speed while matching or exceeding accuracy on small/occluded objects.


| | YOLOv26l | RT-DETR-l |
|---|---|---|
| Architecture | CNN + grid head | CNN backbone + transformer decoder |
| Detection | Grid-based: cells predict boxes | Query-based: 300 queries each find one object |
| Post-processing | NMS baked into ONNX graph | End-to-end — no NMS, Hungarian matching |
| Training warmup | Fast — mAP by epoch 10 | Slow — transformer queries stabilise ~ep 40–50 |
| Strength | Dense scenes, fast convergence | Small/occluded objects, fewer false positives |

Why both? YOLO and RT-DETR make different errors. WBF-fusing their predictions produces better results than either alone. Near-zero RT-DETR mAP at epoch 18 is expected — the Hungarian matcher needs time to stabilise query assignments before meaningful detection begins.
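As a rough illustration of what WBF does (the real pipeline presumably uses a full implementation such as the ensemble-boxes package; this two-box toy version only shows the core idea of confidence-weighted averaging of matched boxes):

```python
# Toy Weighted Boxes Fusion for exactly two detections of the same class.
# Boxes are [x1, y1, x2, y2]; matched boxes are averaged, weighted by confidence.

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def wbf_pair(box_a, score_a, box_b, score_b, iou_thr=0.55):
    """Fuse two boxes if they overlap enough; otherwise keep the stronger one."""
    if iou(box_a, box_b) < iou_thr:
        return (box_a, score_a) if score_a >= score_b else (box_b, score_b)
    total = score_a + score_b
    fused = [(ca * score_a + cb * score_b) / total
             for ca, cb in zip(box_a, box_b)]
    return fused, total / 2  # fused score: mean of the contributing scores
```

Unlike NMS, which discards all but the highest-scoring box, WBF keeps the positional information from every model's prediction, which is why fusing detectors with different error profiles helps.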

Ensemble Strategy

Current (deployed): Single model + hflip TTA + WBF

Planned: 3-model ensemble + multi-scale TTA + WBF weights
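The hflip half of the TTA step comes down to mirroring x-coordinates back after inference on the flipped image — a minimal sketch, assuming `[x1, y1, x2, y2]` pixel boxes (`unflip_boxes` is an illustrative name, not the actual run.py code):

```python
# Boxes predicted on a horizontally flipped image must be mirrored back
# before they can be fused (e.g. via WBF) with the normal-pass predictions.
def unflip_boxes(boxes, img_width):
    """Map [x1, y1, x2, y2] boxes from a horizontally flipped image back."""
    return [[img_width - x2, y1, img_width - x1, y2]
            for x1, y1, x2, y2 in boxes]
```

Note that x1 and x2 swap roles under the mirror, so the left edge of the un-flipped box comes from the flipped box's right edge.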

run.py auto-discovers all *.onnx files in the submission directory — no code changes needed, just add files.
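The discovery itself is likely just a glob over the directory; a sketch of how such a loop could look (`discover_models` is a hypothetical helper, not reproduced from run.py):

```python
from pathlib import Path

def discover_models(submission_dir: str) -> list[Path]:
    """Return every *.onnx file in the submission directory, sorted by name.

    Sorting makes the ensemble order deterministic across runs, which matters
    if per-model WBF weights are assigned positionally.
    """
    return sorted(Path(submission_dir).glob("*.onnx"))
```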

Other ensemble options considered

| Option | Est. gain | Effort | Status |
|---|---|---|---|
| Add 50ep model to ensemble | +0.5–1% | Trivial (file copy) | Do now |
| Multi-scale TTA (1024/1280/1536 px) | +2–3% | Re-export with dynamic=True | On re-export |
| WBF weights per model | +0.2–0.5% | Trivial | Do now |
| SAHI (sliced tiled inference) | +2–4% | High — custom tile/merge logic | If time |
| Crop classifier (boosts 30% component) | +1–3% | Very high — separate model + training | Low priority |
| KNN re-classifier | –0.07 (tested) | — | Dead end |

Dataset

Real data (data/train/)

Synthetic data (data/synthetic/)

Why KNN Failed (–0.07)

❌ EfficientNet-B0 + KNN re-classifier

Rule: never replace a model's output with an external classifier unless it was trained on the same domain distribution.

Model Weights

| File | Size | Status | Use |
|---|---|---|---|
| weights/best_yolo26l_50ep.pt | 54 MB | Ready | PyTorch weights from v1 training |
| weights/best_yolo26l_50ep.onnx | 96 MB | Ready | ONNX export — currently in submission/model.onnx |
| weights/best_yolo26l_synth.pt | ~54 MB | Pending | Download from gs://…/output/vertex_yolo26l_synth_v100/weights/best.pt |
| weights/best_rtdetr.pt | ~120 MB | Pending | Download from gs://…/output/vertex_rtdetr_v100/weights/best.pt |

Steps When Training Finishes

# 1. Download weights
gcloud storage cp gs://ai-nm26osl-1852-norgesgruppen/output/vertex_yolo26l_synth_v100/weights/best.pt norgesgruppen/weights/best_yolo26l_synth.pt
gcloud storage cp gs://ai-nm26osl-1852-norgesgruppen/output/vertex_rtdetr_v100/weights/best.pt norgesgruppen/weights/best_rtdetr.pt

# 2. Export ONNX — use dynamic=True for multi-scale TTA support
#    (local ultralytics >= 8.3 only — NOT sandbox 8.1.0)
uv run python -c "
from ultralytics import YOLO
YOLO('norgesgruppen/weights/best_yolo26l_synth.pt').export(format='onnx', imgsz=1280, opset=17, dynamic=True)
YOLO('norgesgruppen/weights/best_rtdetr.pt').export(format='onnx', imgsz=1280, opset=17, dynamic=True)
"

# 3. Populate submission/ — run.py auto-discovers all *.onnx files
cp norgesgruppen/weights/best_yolo26l_50ep.onnx    norgesgruppen/submission/model_yolo26l_50ep.onnx
cp norgesgruppen/weights/best_yolo26l_synth.onnx   norgesgruppen/submission/model_yolo26l_synth.onnx
cp norgesgruppen/weights/best_rtdetr.onnx           norgesgruppen/submission/model_rtdetr.onnx

# 4. Rebuild ZIP (~300 MB for 3 models — within 420 MB limit)
cd norgesgruppen/submission
zip -r ../../norgesgruppen_ensemble.zip . -x ".*" "__MACOSX/*"

# 5. Submit at app.ainm.no/submit/norgesgruppen-data

Inference Architecture (submission/run.py)

Timing budget (3 models × 2 TTA): 6 passes × ~25ms × 250 images = ~37s total inference. Well inside 300s.
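A quick back-of-envelope check of that figure (the ~25 ms per-pass number is the estimate stated above):

```python
# Inference budget: 3 models x 2 TTA views, ~25 ms per forward pass per image.
passes = 3 * 2
ms_per_pass = 25
images = 250
total_s = passes * ms_per_pass * images / 1000
print(total_s)  # 37.5
```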

GPU Selection

| GPU | VRAM | Batch | Speed | Outcome |
|---|---|---|---|---|
| T4 | 16 GB | 1 | ~11 min/ep | ✅ Used for v1 · completed |
| L4 | 24 GB | 2–4 | ~4 min/ep est. | ❌ "Insufficient resources" in europe-west4 |
| A100 | 40 GB | 4–8 | ~2 min/ep est. | ❌ Queue timeout 30+ min, never provisioned |
| V100 ✅ | 16 GB | 1 | ~6–8 min/ep | ✅ 2× faster than T4 · currently training both jobs |

Sandbox Constraints

Monitor Training

# YOLOv26l + Synthetic
gcloud ai custom-jobs stream-logs projects/545349690507/locations/europe-west4/customJobs/6947538410814832640

# RT-DETR
gcloud ai custom-jobs stream-logs projects/545349690507/locations/europe-west4/customJobs/3468507698671124480