[Help needed] Test-time augmentation (TTA) #503
Comments
@mingxingtan I was reviewing your repo and saw this. I might be able to help. I designed the Ultralytics TTA strategy for yolov3 (and now https://github.com/ultralytics/yolov5) to increase mAP while minimizing the added FLOPs. For yolov5x we see a mAP increase from 48.4 to 50.0 after applying TTA, with about a 2-3X slowdown, I think. I'll try to run the numbers and post here.
Ok, here are the results. I've created a documentation issue on our yolov5 repo to help everyone understand also:

Before You Start

Clone YOLOv5 repo and install requirements.txt dependencies, including Python>=3.7 and PyTorch>=1.5.

git clone https://github.com/ultralytics/yolov5 # clone repo
cd yolov5
pip install -r requirements.txt # install requirements.txt

Test Normally

This command tests YOLOv5x on COCO val2017 at image size 672 pixels.

python test.py --weights yolov5x.pt --data coco.yaml --img 672

Output:

Namespace(augment=False, batch_size=32, conf_thres=0.001, data='./data/coco.yaml', device='', img_size=672, iou_thres=0.65, merge=False, save_json=True, single_cls=False, task='val', verbose=False, weights='yolov5x.pt')
Using CUDA device0 _CudaDeviceProperties(name='Tesla P100-PCIE-16GB', total_memory=16280MB)
Model Summary: 407 layers, 8.89652e+07 parameters, 8.89652e+07 gradients
Fusing layers...
Model Summary: 284 layers, 8.89222e+07 parameters, 8.89222e+07 gradients
Caching labels ../coco/labels/val2017.npy (4952 found, 0 missing, 48 empty, 0 duplicate, for 5000 images): 100% 5000/5000 [00:00<00:00, 13153.65it/s]
Class Images Targets P R mAP@.5 mAP@.5:.95: 100% 157/157 [03:04<00:00, 1.17s/it]
all 5e+03 3.63e+04 0.426 0.746 0.66 0.469
Speed: 22.9/2.1/25.0 ms inference/NMS/total per 672x672 image at batch-size 32
COCO mAP with pycocotools... saving detections_val2017_yolov5x_results.json...
loading annotations into memory...
Done (t=0.40s)
creating index...
index created!
Loading and preparing results...
DONE (t=4.08s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=90.56s).
Accumulating evaluation results...
DONE (t=11.87s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.484
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.668
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.528
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.311
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.535
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.628
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.371
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.609
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.663
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.500
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.715
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.807

Test with TTA

Append --augment to the test.py command to enable TTA. The command below also raises the image size from 672 to 832 pixels.

python test.py --weights yolov5x.pt --data coco.yaml --img 832 --augment

Output:

Namespace(augment=True, batch_size=32, conf_thres=0.001, data='./data/coco.yaml', device='', img_size=832, iou_thres=0.65, merge=False, save_json=True, single_cls=False, task='val', verbose=False, weights='yolov5x.pt')
Using CUDA device0 _CudaDeviceProperties(name='Tesla P100-PCIE-16GB', total_memory=16280MB)
Model Summary: 407 layers, 8.89652e+07 parameters, 8.89652e+07 gradients
Fusing layers...
Model Summary: 284 layers, 8.89222e+07 parameters, 8.89222e+07 gradients
Caching labels ../coco/labels/val2017.npy (4952 found, 0 missing, 48 empty, 0 duplicate, for 5000 images): 100% 5000/5000 [00:00<00:00, 14939.35it/s]
Class Images Targets P R mAP@.5 mAP@.5:.95: 100% 157/157 [07:48<00:00, 2.98s/it]
all 5e+03 3.63e+04 0.313 0.794 0.671 0.483
Speed: 77.5/3.2/80.7 ms inference/NMS/total per 832x832 image at batch-size 32 < ---------- slower
COCO mAP with pycocotools... saving detections_val2017_yolov5x_results.json...
loading annotations into memory...
Done (t=0.43s)
creating index...
index created!
Loading and preparing results...
DONE (t=6.01s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=110.23s).
Accumulating evaluation results...
DONE (t=14.43s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.500 < ---------- increased AP
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.678
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.546
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.336
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.545
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.644
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.381
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.628
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.689 < ---------- increased AR
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.534
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.734
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.826
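
For anyone comparing the two runs, here is a small arithmetic sketch using only figures copied from the two logs above (total ms per image, AP and AR at IoU=0.50:0.95 with maxDets=100). The dictionary and variable names are made up for illustration. Note that the TTA run also used a larger image size (832 vs 672), so the slowdown reflects both changes together.

```python
# Figures copied from the two test.py logs above (YOLOv5x, COCO val2017).
# The dict/variable names are illustrative, not part of any YOLOv5 API.
baseline = {"img": 672, "total_ms": 25.0, "ap": 0.484, "ar100": 0.663}
tta = {"img": 832, "total_ms": 80.7, "ap": 0.500, "ar100": 0.689}

# Per-image slowdown (inference + NMS), TTA run relative to the normal run.
slowdown = tta["total_ms"] / baseline["total_ms"]

# Gains in percentage points.
ap_gain = (tta["ap"] - baseline["ap"]) * 100
ar_gain = (tta["ar100"] - baseline["ar100"]) * 100

# AP points gained per extra millisecond spent on each image.
ap_per_extra_ms = ap_gain / (tta["total_ms"] - baseline["total_ms"])

print(f"slowdown: {slowdown:.2f}x")
print(f"AP gain: +{ap_gain:.1f}, AR gain: +{ar_gain:.1f}")
print(f"AP points per extra ms: {ap_per_extra_ms:.3f}")
```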
@glenn-jocher Thanks for the information! This is very nice: +1.2AP at about a 3x slowdown. Could you share the main idea behind this fast TTA (or is there any doc for it)?
@mingxingtan yes, +1.6AP actually :) Honestly though, single-model inference improvements will probably always be better than TTA in terms of +mAP per unit of time or per FLOP, but yes, it is one final step that you can take to boost your single-model results. For EfficientDet it may boost it above 55.0 (!), but EfficientDet may also benefit less than yolov5 because it already has 5 output maps rather than the 3 in yolov5, so it is already exploiting a wider range of multi-scale features than yolov5. It will see the same improvement from left-right flips though. I don't have any documentation beyond the tutorials in the yolov5 repo. Lots of people have been asking for an arxiv paper, but I simply have not had time. I am aiming to get a paper out by the end of the year after some more experiments. Maybe we could do a video call to discuss? You can email me or send a Google Calendar invite to [email protected]. I'm on California time.
obsolete issues.
Anyone interested in helping implement test-time augmentation (multi-scale testing, flipping)?
This is a follow-up to #491: TTA seems like an easy way to boost AP (although I don't know how helpful it is in real products). If you are interested, feel free to assign it to yourself!
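
As a starting point for anyone picking this up, here is a minimal pure-Python sketch of the bookkeeping that multi-scale testing and flipping need: boxes predicted on each augmented view are mapped back to original image coordinates, pooled, and de-duplicated with NMS. This is not the YOLOv5 or EfficientDet implementation. All function names (`deaugment`, `iou`, `nms`, `tta_merge`) are hypothetical, the view order (scale the image, then flip it left-right) is an assumption, and the 0.65 IoU threshold is simply borrowed from `iou_thres=0.65` in the logs in this thread.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score
View = Tuple[List[Box], float, bool]            # boxes, scale, flipped left-right


def deaugment(boxes: List[Box], scale: float, flipped: bool, width: float) -> List[Box]:
    """Map boxes from an augmented view back to original image coordinates."""
    out = []
    for x1, y1, x2, y2, score in boxes:
        # Undo the resize first: the view was `scale` times the original size.
        x1, y1, x2, y2 = x1 / scale, y1 / scale, x2 / scale, y2 / scale
        if flipped:
            # Undo the left-right flip; the left and right edges swap roles.
            x1, x2 = width - x2, width - x1
        out.append((x1, y1, x2, y2, score))
    return out


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (score field ignored)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_thres: float) -> List[Box]:
    """Greedy NMS: keep the highest-scoring box, drop overlapping ones."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= iou_thres for k in kept):
            kept.append(box)
    return kept


def tta_merge(views: List[View], width: float, iou_thres: float = 0.65) -> List[Box]:
    """Pool de-augmented boxes from every view, then de-duplicate with NMS."""
    pooled: List[Box] = []
    for boxes, scale, flipped in views:
        pooled.extend(deaugment(boxes, scale, flipped, width))
    return nms(pooled, iou_thres)


# Toy example: one object seen in three views of a 640-pixel-wide image.
views = [
    ([(100, 50, 200, 150, 0.9)], 1.0, False),  # original view
    ([(440, 50, 540, 150, 0.8)], 1.0, True),   # left-right flipped view
    ([(50, 25, 100, 75, 0.7)], 0.5, False),    # half-scale view
]
merged = tta_merge(views, width=640)
print(merged)
```

In the toy example all three views describe the same object, so a single box survives the merge. A real implementation would run the detector on each resized/flipped image to produce `views`, and might merge overlapping boxes by weighted averaging instead of plain NMS.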