[Help needed] Test-time augmentation (TTA) #503

Closed
mingxingtan opened this issue Jun 12, 2020 · 5 comments


mingxingtan commented Jun 12, 2020

Anyone interested in helping implement test-time augmentation (multi-scale testing, flipping)?

This is a follow-up to #491: TTA seems like an easy way to boost AP (although I don't know how helpful it is in real products). If you are interested, feel free to assign it to yourself!

@mingxingtan mingxingtan added the help wanted Extra attention is needed label Jun 12, 2020

glenn-jocher commented Jul 5, 2020

@mingxingtan I was reviewing your repo and saw this. I might be able to help. I designed the Ultralytics TTA strategy for yolov3 (and now https://github.com/ultralytics/yolov5) to increase mAP while minimizing the added FLOPs. For yolov5x we see a mAP increase from 48.4 to 50.0 after applying TTA, with about a 2-3X slowdown, I think. I'll try to run the numbers and post here.


glenn-jocher commented Jul 5, 2020

Ok, here are the results. I've also created a documentation issue on our yolov5 repo to help everyone understand:
ultralytics/yolov5#303

Before You Start

Clone YOLOv5 repo and install requirements.txt dependencies, including Python>=3.7 and PyTorch>=1.5.

git clone https://github.com/ultralytics/yolov5 # clone repo
cd yolov5
pip install -r requirements.txt # install requirements.txt

Test Normally

This command tests YOLOv5x on COCO val2017 at an image size of 672 pixels. yolov5x.pt is the largest and most accurate model available. Other options are yolov5s.pt, yolov5m.pt and yolov5l.pt, or your own checkpoint from training a custom dataset, ./weights/best.pt. For details on all available models, please see our README table.

python test.py --weights yolov5x.pt --data coco.yaml --img 672

Output:

Namespace(augment=False, batch_size=32, conf_thres=0.001, data='./data/coco.yaml', device='', img_size=672, iou_thres=0.65, merge=False, save_json=True, single_cls=False, task='val', verbose=False, weights='yolov5x.pt')
Using CUDA device0 _CudaDeviceProperties(name='Tesla P100-PCIE-16GB', total_memory=16280MB)

Model Summary: 407 layers, 8.89652e+07 parameters, 8.89652e+07 gradients
Fusing layers...
Model Summary: 284 layers, 8.89222e+07 parameters, 8.89222e+07 gradients
Caching labels ../coco/labels/val2017.npy (4952 found, 0 missing, 48 empty, 0 duplicate, for 5000 images): 100% 5000/5000 [00:00<00:00, 13153.65it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100% 157/157 [03:04<00:00,  1.17s/it]
                 all       5e+03    3.63e+04       0.426       0.746        0.66       0.469
Speed: 22.9/2.1/25.0 ms inference/NMS/total per 672x672 image at batch-size 32

COCO mAP with pycocotools... saving detections_val2017_yolov5x_results.json...
loading annotations into memory...
Done (t=0.40s)
creating index...
index created!
Loading and preparing results...
DONE (t=4.08s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=90.56s).
Accumulating evaluation results...
DONE (t=11.87s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.484
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.668
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.528
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.311
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.535
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.628
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.371
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.609
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.663
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.500
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.715
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.807

Test with TTA

Append --augment to any existing test.py command to enable TTA, and increase the image size by about 30% for improved results. Note that inference with TTA enabled will typically take about 3X the time of normal inference, because the images are left-right flipped and processed at 3 different resolutions, with the outputs merged before NMS.

python test.py --weights yolov5x.pt --data coco.yaml --img 832 --augment

Output:

Namespace(augment=True, batch_size=32, conf_thres=0.001, data='./data/coco.yaml', device='', img_size=832, iou_thres=0.65, merge=False, save_json=True, single_cls=False, task='val', verbose=False, weights='yolov5x.pt')
Using CUDA device0 _CudaDeviceProperties(name='Tesla P100-PCIE-16GB', total_memory=16280MB)

Model Summary: 407 layers, 8.89652e+07 parameters, 8.89652e+07 gradients
Fusing layers...
Model Summary: 284 layers, 8.89222e+07 parameters, 8.89222e+07 gradients
Caching labels ../coco/labels/val2017.npy (4952 found, 0 missing, 48 empty, 0 duplicate, for 5000 images): 100% 5000/5000 [00:00<00:00, 14939.35it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100% 157/157 [07:48<00:00,  2.98s/it]
                 all       5e+03    3.63e+04       0.313       0.794       0.671       0.483
Speed: 77.5/3.2/80.7 ms inference/NMS/total per 832x832 image at batch-size 32  < ---------- slower

COCO mAP with pycocotools... saving detections_val2017_yolov5x_results.json...
loading annotations into memory...
Done (t=0.43s)
creating index...
index created!
Loading and preparing results...
DONE (t=6.01s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=110.23s).
Accumulating evaluation results...
DONE (t=14.43s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.500  < ---------- increased AP
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.678
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.546
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.336
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.545
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.644
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.381
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.628
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.689  < ---------- increased AR
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.534
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.734
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.826
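
For readers who want a feel for the "flip + multiple resolutions, merged before NMS" idea described above, here is a minimal sketch. It is not the yolov5 implementation: the function names, the `detect` callable, and the scale values are all hypothetical, chosen only to show how boxes predicted on a scaled and flipped image can be mapped back to the original image before everything is pooled for a single NMS pass.

```python
from typing import Callable, List, Tuple

# (x1, y1, x2, y2, confidence), coordinates in pixels
Box = Tuple[float, float, float, float, float]


def deaugment_box(box: Box, scale: float, flipped: bool, orig_width: float) -> Box:
    """Map a box predicted on a scaled (and optionally left-right flipped)
    image back to the coordinate frame of the original image."""
    x1, y1, x2, y2, conf = box
    # Undo the resize first.
    x1, y1, x2, y2 = x1 / scale, y1 / scale, x2 / scale, y2 / scale
    if flipped:
        # Undo the left-right flip: x -> W - x, swapping so that x1 <= x2.
        x1, x2 = orig_width - x2, orig_width - x1
    return (x1, y1, x2, y2, conf)


def merge_tta_boxes(
    detect: Callable[[float, bool], List[Box]],
    orig_width: float,
    variants: List[Tuple[float, bool]],
) -> List[Box]:
    """Run the (hypothetical) detector once per augmentation variant and pool
    all de-augmented boxes. The pooled list is what would then go to NMS."""
    merged: List[Box] = []
    for scale, flipped in variants:
        for box in detect(scale, flipped):
            merged.append(deaugment_box(box, scale, flipped, orig_width))
    return merged


# Toy usage: a fake detector that always "finds" one known object, reported
# in the coordinate frame of the augmented image.
WIDTH = 640.0
TRUE_BOX = (100.0, 50.0, 200.0, 150.0)


def fake_detect(scale: float, flipped: bool) -> List[Box]:
    x1, y1, x2, y2 = (v * scale for v in TRUE_BOX)
    if flipped:
        x1, x2 = WIDTH * scale - x2, WIDTH * scale - x1
    return [(x1, y1, x2, y2, 0.9)]


# Illustrative (scale, flipped) pairs only, not the values yolov5 uses.
VARIANTS = [(1.0, False), (0.75, True), (0.5, False)]

merged = merge_tta_boxes(fake_detect, WIDTH, VARIANTS)
for b in merged:
    print(b)  # every variant should map back onto TRUE_BOX
```

With a real model the pooled boxes would overlap rather than coincide exactly, and NMS (or a box-merging variant of it) would then pick or combine them.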

@mingxingtan

@glenn-jocher Thanks for the information!

This is very nice: +1.2AP with about a 3x slowdown. Could you share the main idea of this fast TTA? (Or is there any doc for it?)

@glenn-jocher

@mingxingtan yes, +1.6AP actually :)

Honestly though, single-model inference improvements will probably always be better than TTA in terms of +mAP per unit of time or FLOP, but yes, it is one final step that you can take to boost your single-model results. For EfficientDet it may boost it above 55.0 (!), but EfficientDet may also benefit less than yolov5 because it already has 5 output maps rather than the 3 in yolov5, so it already exploits a wider range of multi-scale features. It will see the same improvement from left-right flips though.
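
For anyone checking the numbers, the gain and the slowdown can be recomputed from the two logs posted above. This is just arithmetic on the printed values; the variable names are made up for the example. Keep in mind that the two runs also differ in image size (672 vs 832), so the delta reflects TTA plus the larger input, not TTA alone.

```python
# Values copied from the pycocotools summaries and "Speed:" lines above.
baseline = {"img": 672, "ap": 0.484, "ar": 0.663, "inference_ms": 22.9, "total_ms": 25.0}
tta = {"img": 832, "ap": 0.500, "ar": 0.689, "inference_ms": 77.5, "total_ms": 80.7}

# AP / AR gains in percentage points (AP@[0.50:0.95], AR@100).
ap_gain = round((tta["ap"] - baseline["ap"]) * 100, 1)
ar_gain = round((tta["ar"] - baseline["ar"]) * 100, 1)

# Per-image slowdown, for inference only and for inference + NMS.
inference_slowdown = tta["inference_ms"] / baseline["inference_ms"]
total_slowdown = tta["total_ms"] / baseline["total_ms"]

# Extra milliseconds per image paid for each AP point gained.
extra_ms_per_ap_point = (tta["total_ms"] - baseline["total_ms"]) / ap_gain

print(f"AP gain: +{ap_gain}, AR gain: +{ar_gain}")
print(f"slowdown: {inference_slowdown:.2f}x inference, {total_slowdown:.2f}x total")
print(f"cost: {extra_ms_per_ap_point:.1f} ms per image per AP point")
```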

I don't have any documentation beyond the tutorials in the yolov5 repo. Lots of people have been asking for an arxiv paper, but I simply have not had time. I am aiming to get a paper out by the end of the year after some more experiments.

Maybe we could do a video call to discuss? You can email me or send a Google Calendar invite to [email protected]. I'm on California time.

@mingxingtan

obsolete issues.
