🛒 Task 3 — NorgesGruppen Object Detection

Score = 0.7 × detection_mAP + 0.3 × classification_mAP (IoU ≥ 0.5)  ·  Rank #38 / 214  ·  Sandbox: Python 3.11 · CUDA 12.4 · L4 GPU · 300s timeout · no network
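The weighting above can be sanity-checked with a couple of lines (the component values below are illustrative placeholders, not actual leaderboard breakdowns):

```python
# Leaderboard metric as stated above: 70% detection mAP, 30% classification mAP,
# both evaluated at IoU >= 0.5.
def leaderboard_score(detection_map: float, classification_map: float) -> float:
    return 0.7 * detection_map + 0.3 * classification_map

# Illustrative only: if both components were 0.8331, the total is also 0.8331.
print(round(leaderboard_score(0.8331, 0.8331), 4))  # 0.8331
```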

| Metric | Value | Notes |
|---|---|---|
| Best score | 0.8331 | +0.067 over baseline |
| Rank | #38 | of 214 teams |
| Train images | 3,431 | 211 real + 3,220 synthetic |
| Classes | 356 | grocery products |
| Val mAP50 | 0.454 | YOLOv26l at ep 35 |

Submission History

| Date | Score | File (repo root) | Notes |
|---|---|---|---|
| Mar 19 | 0.7664 | norgesgruppen_submission.zip | Baseline: YOLOv26l 50ep · T4 · real data only · no TTA |
| Mar 20 17:28 | 0.6959 ↓ | norgesgruppen_submission_knn.zip | ❌ KNN re-classifier — studio/shelf domain gap killed it |
| Mar 20 18:45 | 0.8331 ✅ | norgesgruppen_submission_tta.zip | Current best: same 50ep model + hflip TTA + WBF + conf=0.05 |
| ETA Mar 21 ~10:00 | ~0.87–0.93 | norgesgruppen_ensemble.zip | 3-model ensemble: 50ep + YOLOv26l-synth + RT-DETR · multi-scale TTA · WBF |

Active Training Jobs (Vertex AI — V100)

YOLOv26l + Synthetic  Running

- Epoch: ~38 / 150
- Val mAP50: 0.454 (matches 50ep baseline at ep 35 — ✅ on track)
- Losses: box=0.90 ↓ · cls=1.05 ↓ · dfl=0.0077 ↓
- Speed: ~7.5 it/s · ~8 min/epoch
- Config: imgsz=1280 · freeze=10 · copy_paste=0.8 · patience=30
- Data: 3,431 images (211 real + 3,220 synthetic)
- Job ID: 6947538410814832640
- Output: gs://…/output/vertex_yolo26l_synth_v100/

~25% complete · ETA ~10 more hours (early stopping patience=30)

RT-DETR-l  Running

- Epoch: ~18 / 100
- Val mAP50: ~0.001 (expected — transformer warm-up, see below)
- Losses: giou=0.865 ↓ · cls=1.39 ↓ · l1=0.475 ↓
- Speed: ~3.1 it/s · ~18 min/epoch
- Config: imgsz=1280 · freeze=0 · copy_paste=0.5 · patience=30
- Data: 3,431 images (211 real + 3,220 synthetic)
- Job ID: 3468507698671124480
- Output: gs://…/output/vertex_rtdetr_v100/

~18% complete · ETA ~24 more hours (mAP expected to jump around ep 40–50)

What is RT-DETR?

RT-DETR (Real-Time Detection Transformer, Baidu 2023) is a transformer-based detector — the first fast enough to compete with YOLO on speed while matching or exceeding accuracy on small/occluded objects.


| | YOLOv26l | RT-DETR-l |
|---|---|---|
| Architecture | CNN + grid head | CNN backbone + transformer decoder |
| Detection | Grid-based: cells predict boxes | Query-based: 300 queries each find one object |
| Post-processing | NMS baked into ONNX graph | End-to-end — no NMS, Hungarian matching |
| Training warmup | Fast — mAP by epoch 10 | Slow — transformer queries stabilise ~ep 40–50 |
| Strength | Dense scenes, fast convergence | Small/occluded objects, fewer false positives |

Why both? YOLO and RT-DETR make different errors. WBF-fusing their predictions produces better results than either alone. Near-zero RT-DETR mAP at epoch 18 is expected — the Hungarian matcher needs time to stabilise query assignments before meaningful detection begins.
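As a rough illustration of what WBF does (the real pipeline presumably uses a full implementation such as the ensemble-boxes package; this two-box toy version only shows the core idea of confidence-weighted averaging of matched boxes):

```python
# Toy Weighted Boxes Fusion for exactly two detections of the same class.
# Boxes are [x1, y1, x2, y2]; matched boxes are averaged, weighted by confidence.

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def wbf_pair(box_a, score_a, box_b, score_b, iou_thr=0.55):
    """Fuse two boxes if they overlap enough; otherwise keep the stronger one."""
    if iou(box_a, box_b) < iou_thr:
        return (box_a, score_a) if score_a >= score_b else (box_b, score_b)
    total = score_a + score_b
    fused = [(ca * score_a + cb * score_b) / total
             for ca, cb in zip(box_a, box_b)]
    return fused, total / 2  # fused score: mean of the contributing scores
```

Unlike NMS, which discards all but the highest-scoring box, WBF keeps the positional information from every model's prediction, which is why fusing detectors with different error profiles helps.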

Ensemble Strategy

Current (deployed): Single model + hflip TTA + WBF

Planned: 3-model ensemble + multi-scale TTA + WBF weights
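The hflip half of the TTA step comes down to mirroring x-coordinates back after inference on the flipped image — a minimal sketch, assuming `[x1, y1, x2, y2]` pixel boxes (`unflip_boxes` is an illustrative name, not the actual run.py code):

```python
# Boxes predicted on a horizontally flipped image must be mirrored back
# before they can be fused (e.g. via WBF) with the normal-pass predictions.
def unflip_boxes(boxes, img_width):
    """Map [x1, y1, x2, y2] boxes from a horizontally flipped image back."""
    return [[img_width - x2, y1, img_width - x1, y2]
            for x1, y1, x2, y2 in boxes]
```

Note that x1 and x2 swap roles under the mirror, so the left edge of the un-flipped box comes from the flipped box's right edge.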

run.py auto-discovers all *.onnx files in the submission directory — no code changes needed, just add files.
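The discovery itself is likely just a glob over the directory; a sketch of how such a loop could look (`discover_models` is a hypothetical helper, not reproduced from run.py):

```python
from pathlib import Path

def discover_models(submission_dir: str) -> list[Path]:
    """Return every *.onnx file in the submission directory, sorted by name.

    Sorting makes the ensemble order deterministic across runs, which matters
    if per-model WBF weights are assigned positionally.
    """
    return sorted(Path(submission_dir).glob("*.onnx"))
```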

Other ensemble options considered

| Option | Est. gain | Effort | Status |
|---|---|---|---|
| Add 50ep model to ensemble | +0.5–1% | Trivial (file copy) | Do now |
| Multi-scale TTA (1024/1280/1536 px) | +2–3% | Re-export with dynamic=True | On re-export |
| WBF weights per model | +0.2–0.5% | Trivial | Do now |
| SAHI (sliced tiled inference) | +2–4% | High — custom tile/merge logic | If time |
| Crop classifier (boosts 30% component) | +1–3% | Very high — separate model + training | Low priority |
| KNN re-classifier | –0.07 (tested) | — | Dead end |

Dataset

Real data (data/train/)

Synthetic data (data/synthetic/)

Why KNN Failed (–0.07)

❌ EfficientNet-B0 + KNN re-classifier

Rule: never replace a model's output with an external classifier unless it was trained on the same domain distribution.

Model Weights

| File | Size | Status | Use |
|---|---|---|---|
| weights/best_yolo26l_50ep.pt | 54 MB | Ready | PyTorch weights from v1 training |
| weights/best_yolo26l_50ep.onnx | 96 MB | Ready | ONNX export — currently in submission/model.onnx |
| weights/best_yolo26l_synth.pt | ~54 MB | Pending | Download from gs://…/output/vertex_yolo26l_synth_v100/weights/best.pt |
| weights/best_rtdetr.pt | ~120 MB | Pending | Download from gs://…/output/vertex_rtdetr_v100/weights/best.pt |

Steps When Training Finishes

# 1. Download weights
gcloud storage cp gs://ai-nm26osl-1852-norgesgruppen/output/vertex_yolo26l_synth_v100/weights/best.pt norgesgruppen/weights/best_yolo26l_synth.pt
gcloud storage cp gs://ai-nm26osl-1852-norgesgruppen/output/vertex_rtdetr_v100/weights/best.pt norgesgruppen/weights/best_rtdetr.pt

# 2. Export ONNX — use dynamic=True for multi-scale TTA support
#    (local ultralytics >= 8.3 only — NOT sandbox 8.1.0)
uv run python -c "
from ultralytics import YOLO
YOLO('norgesgruppen/weights/best_yolo26l_synth.pt').export(format='onnx', imgsz=1280, opset=17, dynamic=True)
YOLO('norgesgruppen/weights/best_rtdetr.pt').export(format='onnx', imgsz=1280, opset=17, dynamic=True)
"

# 3. Populate submission/ — run.py auto-discovers all *.onnx files
cp norgesgruppen/weights/best_yolo26l_50ep.onnx    norgesgruppen/submission/model_yolo26l_50ep.onnx
cp norgesgruppen/weights/best_yolo26l_synth.onnx   norgesgruppen/submission/model_yolo26l_synth.onnx
cp norgesgruppen/weights/best_rtdetr.onnx           norgesgruppen/submission/model_rtdetr.onnx

# 4. Rebuild ZIP (~300 MB for 3 models — within 420 MB limit)
cd norgesgruppen/submission
zip -r ../../norgesgruppen_ensemble.zip . -x ".*" "__MACOSX/*"

# 5. Submit at app.ainm.no/submit/norgesgruppen-data

Inference Architecture (submission/run.py)

Timing budget (3 models × 2 TTA): 6 passes × ~25ms × 250 images = ~37s total inference. Well inside 300s.
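A quick back-of-envelope check of that figure (the ~25 ms per-pass number is the estimate stated above):

```python
# Inference budget: 3 models x 2 TTA views, ~25 ms per forward pass per image.
passes = 3 * 2
ms_per_pass = 25
images = 250
total_s = passes * ms_per_pass * images / 1000
print(total_s)  # 37.5
```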

GPU Selection

| GPU | VRAM | Batch | Speed | Outcome |
|---|---|---|---|---|
| T4 | 16 GB | 1 | ~11 min/ep | ✅ Used for v1 · completed |
| L4 | 24 GB | 2–4 | ~4 min/ep est. | ❌ "Insufficient resources" in europe-west4 |
| A100 | 40 GB | 4–8 | ~2 min/ep est. | ❌ Queue timeout 30+ min, never provisioned |
| V100 ✅ | 16 GB | 1 | ~6–8 min/ep | ✅ 2× faster than T4 · currently training both jobs |

Sandbox Constraints

Monitor Training

# YOLOv26l + Synthetic
gcloud ai custom-jobs stream-logs projects/545349690507/locations/europe-west4/customJobs/6947538410814832640

# RT-DETR
gcloud ai custom-jobs stream-logs projects/545349690507/locations/europe-west4/customJobs/3468507698671124480