
YOLOv3

Description

This model is a neural network for real-time object detection that detects 80 different object classes (the COCO categories). It is designed to run fast while remaining accurate.

Model

| Model | Download | Download (with sample test data) | ONNX version | Opset version | Accuracy |
|---|---|---|---|---|---|
| YOLOv3 | 237 MB | 222 MB | 1.5 | 10 | mAP of 0.553 |
| YOLOv3-12 | 237 MB | 222 MB | 1.9 | 12 | mAP of 0.2874 |
| YOLOv3-12-int8 | 60 MB | 53 MB | 1.9 | 12 | mAP of 0.2693 |

Compared with YOLOv3-12, YOLOv3-12-int8's mAP drops by 0.0181 (from 0.2874 to 0.2693), while its inference performance improves by 2.19x.
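The accuracy and size trade-off can be checked with a little arithmetic. The relative drop and size ratio below are derived from the table values, not separately reported results.

```python
# mAP values of the FP32 and INT8 models, taken from the table above
fp32_map = 0.2874
int8_map = 0.2693

abs_drop = fp32_map - int8_map   # absolute mAP decline
rel_drop = abs_drop / fp32_map   # decline relative to the FP32 baseline

# Model file sizes (MB) from the "Download" column
size_ratio = 237 / 60

print(f"absolute drop: {abs_drop:.4f}")      # absolute drop: 0.0181
print(f"relative drop: {rel_drop:.1%}")      # relative drop: 6.3%
print(f"size reduction: {size_ratio:.2f}x")  # size reduction: 3.95x
```

In other words, the INT8 model keeps roughly 94% of the FP32 mAP at about a quarter of the file size.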

Note that the performance depends on the test hardware.

Performance data here was collected on an Intel® Xeon® Platinum 8280 processor with 1 socket and 4 cores per instance, CentOS Linux 8.3, and a batch size of 1.


Inference

Input to model

The model takes two inputs: the resized image, of shape (1x3x416x416), and the original image size, of shape (1x2), given as [image.size[1], image.size[0]], i.e. [height, width].

Preprocessing steps

The image pixel values have to be scaled into the range [0, 1]. This scaling should preferably happen during preprocessing.

The following code shows how to preprocess an image into an NCHW tensor:

import numpy as np
from PIL import Image

# this function is from yolo3.utils.letterbox_image
def letterbox_image(image, size):
    '''resize image with unchanged aspect ratio using padding'''
    iw, ih = image.size
    w, h = size
    scale = min(w/iw, h/ih)
    nw = int(iw*scale)
    nh = int(ih*scale)

    image = image.resize((nw,nh), Image.BICUBIC)
    # paste the resized image centered on a gray (128, 128, 128) canvas
    new_image = Image.new('RGB', size, (128,128,128))
    new_image.paste(image, ((w-nw)//2, (h-nh)//2))
    return new_image

def preprocess(img):
    model_image_size = (416, 416)  # (height, width)
    # letterbox_image expects (width, height), hence the reversal
    boxed_image = letterbox_image(img, tuple(reversed(model_image_size)))
    image_data = np.array(boxed_image, dtype='float32')
    image_data /= 255.                                 # scale pixel values to [0, 1]
    image_data = np.transpose(image_data, [2, 0, 1])  # HWC -> CHW
    image_data = np.expand_dims(image_data, 0)        # add batch dimension -> NCHW
    return image_data

img_path = 'path/to/image.jpg'  # placeholder: path to your input image
image = Image.open(img_path)
# input
image_data = preprocess(image)
# original size as [height, width] (PIL's image.size is (width, height))
image_size = np.array([image.size[1], image.size[0]], dtype=np.int32).reshape(1, 2)
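To make the letterbox step concrete, the helper below reproduces only the size and offset arithmetic of letterbox_image, using a hypothetical 640x480 input. `letterbox_geometry` is an illustrative name, not part of yolo3.utils.

```python
def letterbox_geometry(iw, ih, w=416, h=416):
    """Return (nw, nh, dx, dy): the resized size and paste offset used by letterbox_image."""
    scale = min(w / iw, h / ih)             # largest scale that fits both dimensions
    nw, nh = int(iw * scale), int(ih * scale)
    dx, dy = (w - nw) // 2, (h - nh) // 2   # gray padding split evenly on both sides
    return nw, nh, dx, dy

# A 640x480 (width x height) image is scaled by 0.65 to 416x312,
# leaving 52 rows of gray padding above and below it.
print(letterbox_geometry(640, 480))  # (416, 312, 0, 52)
```

Because the same scale is applied to both axes, objects keep their aspect ratio; the padding absorbs the difference between the image shape and the square 416x416 input.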

Output of model

The model has 3 outputs:

- boxes: (1x'n_candidates'x4), the coordinates of all anchor boxes.
- scores: (1x80x'n_candidates'), the scores of all anchor boxes per class.
- indices: ('nbox'x3), selected indices from the boxes tensor. The selected index format is (batch_index, class_index, box_index).

The class list is here.
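A minimal sketch of producing these outputs with ONNX Runtime, assuming `image_data` and `image_size` come from the preprocessing code and that the outputs are returned in the order listed above (boxes, scores, indices). Because input names are not stated here, the hypothetical helper `build_feed` assigns inputs by tensor rank rather than by a hard-coded name.

```python
def build_feed(session, image_data, image_size):
    """Map each model input to its array: the 4-D input gets the image, the other gets the size."""
    feed = {}
    for inp in session.get_inputs():
        feed[inp.name] = image_data if len(inp.shape) == 4 else image_size
    return feed

def run_yolov3(session, image_data, image_size):
    # Passing None as output names returns all outputs in the model's output order;
    # the (boxes, scores, indices) order is an assumption based on the list above.
    boxes, scores, indices = session.run(None, build_feed(session, image_data, image_size))
    return boxes, scores, indices

# Usage (requires onnxruntime and a downloaded model file):
#   import onnxruntime as ort
#   session = ort.InferenceSession("yolov3-12.onnx", providers=["CPUExecutionProvider"])
#   boxes, scores, indices = run_yolov3(session, image_data, image_size)
```

Matching on rank keeps the sketch independent of the exact input names, which may differ between model versions.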

Postprocessing steps

The following code collects the selected boxes, scores, and classes from the model outputs:

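# boxes, scores, indices are the three model outputs (NumPy arrays)
out_boxes, out_scores, out_classes = [], [], []
for idx_ in indices:
    # idx_ is (batch_index, class_index, box_index)
    out_classes.append(idx_[1])
    out_scores.append(scores[tuple(idx_)])
    idx_1 = (idx_[0], idx_[2])
    out_boxes.append(boxes[idx_1])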

out_boxes, out_scores, and out_classes are lists of the resulting boxes, scores, and classes.
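The postprocessing loop can also be written with NumPy advanced indexing, which makes the (batch_index, class_index, box_index) layout explicit. This is a sketch; the `score_threshold` filter is an optional, hypothetical addition, since the model's own selection already produced `indices`.

```python
import numpy as np

def gather_detections(boxes, scores, indices, score_threshold=0.0):
    """Vectorized equivalent of the postprocessing loop.

    boxes:   (1, n_candidates, 4)
    scores:  (1, 80, n_candidates)
    indices: (nbox, 3), rows of (batch_index, class_index, box_index)
    """
    idx = np.asarray(indices, dtype=np.int64).reshape(-1, 3)
    b, c, k = idx[:, 0], idx[:, 1], idx[:, 2]
    out_boxes = boxes[b, k]        # (nbox, 4): box coordinates per selection
    out_scores = scores[b, c, k]   # (nbox,): score of the selected class
    keep = out_scores >= score_threshold
    return out_boxes[keep], out_scores[keep], c[keep]
```

With the default threshold of 0.0 it returns the same boxes, scores, and classes as the loop, but as arrays instead of lists.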


Dataset (Train and validation)

We use pretrained weights from pjreddie.com here.


Validation accuracy

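YOLOv3: Metric is COCO box mAP, computed over 2017 COCO val data. The mAP of 0.553 is based on the original YOLOv3 model here. Note that 0.553 matches the AP at IoU=0.50 (AP50) reported for YOLOv3-416 in the YOLOv3 paper, where its AP averaged over IoU 0.50:0.95 is 31.0, so it is not directly comparable to the YOLOv3-12 numbers below.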

YOLOv3-12 & YOLOv3-12-int8: Metric is COCO box mAP@[IoU=0.50:0.95 | area=all | maxDets=100], computed over 2017 COCO val data.
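The "IoU=0.50:0.95" range means AP is computed at ten IoU thresholds (0.50, 0.55, ..., 0.95) and then averaged. The sketch below illustrates that definition with a plain IoU function and the threshold grid; it is not the evaluation code behind the numbers above (COCO evaluation is normally run with the official COCO API), and `coco_map` is a hypothetical helper that assumes per-threshold AP values are already computed.

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (min0, min1, max0, max1) in the same order."""
    inter0 = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter1 = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter0 * inter1
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# The ten COCO IoU thresholds: 0.50, 0.55, ..., 0.95
iou_thresholds = np.linspace(0.5, 0.95, 10)

def coco_map(ap_per_threshold):
    """Average AP values computed at each IoU threshold (each already averaged over classes)."""
    return float(np.mean(ap_per_threshold))

# Two 2x2 boxes overlapping by half: intersection 2, union 6, IoU = 1/3
print(iou((0, 0, 2, 2), (1, 0, 3, 2)))
```

A detection counts as a true positive at threshold t only if its IoU with a ground-truth box is at least t, so averaging over the grid rewards tighter localization than AP at 0.50 alone.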


Quantization

YOLOv3-12-int8 is obtained by quantizing the YOLOv3-12 model. We use Intel® Neural Compressor with the ONNX Runtime backend to perform quantization. View the instructions to understand how to use Intel® Neural Compressor for quantization.
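As background, INT8 quantization maps float values to 8-bit integers through a scale and a zero point. The sketch below shows a generic asymmetric (uint8) affine scheme calibrated from the tensor's min/max; it illustrates the idea only, and Intel® Neural Compressor's actual calibration method and per-operator settings may differ.

```python
import numpy as np

def quantize_uint8(x):
    """Asymmetric affine quantization of a float array to uint8 using its min/max range."""
    x_min = min(float(x.min()), 0.0)  # include 0 in the range so zero is exactly representable
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / 255.0 or 1.0  # fall back to 1.0 for an all-zero tensor
    zero_point = int(round(-x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map uint8 values back to approximate floats."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([-1.0, 0.0, 0.5, 2.0], dtype=np.float32)
q, scale, zp = quantize_uint8(x)
x_hat = dequantize(q, scale, zp)
# Within the calibrated range, the rounding error is at most half a quantization step.
print(np.abs(x - x_hat).max() <= scale / 2 + 1e-6)  # True
```

This per-value rounding error is the source of the small mAP decline reported for YOLOv3-12-int8, while storing 8-bit instead of 32-bit weights accounts for most of the size reduction.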

Environment

onnx: 1.9.0
onnxruntime: 1.10.0

Prepare model

wget https://github.com/onnx/models/raw/main/vision/object_detection_segmentation/yolov3/model/yolov3-12.onnx

Model quantize

# --input_model: model path as *.onnx
bash run_tuning.sh --input_model=path/to/model \
                   --config=yolov3.yaml \
                   --data_path=path/to/COCO2017 \
                   --output_model=path/to/save

Publication/Attribution

Joseph Redmon, Ali Farhadi. YOLOv3: An Incremental Improvement. arXiv:1804.02767, 2018.


License

MIT License