Detection - Grounding DINO

Run a detection task that uses text prompts to produce bounding boxes (bbox), masks, or both.

Supported models:

Grounding-Dino-1
Grounding-Dino-1.5-Edge
Grounding-Dino-1.5-Pro
Grounding-Dino-1.6-Edge
Grounding-Dino-1.6-Pro

Usage Pattern

First, install the SDK with pip:

pip install dds-cloudapi-sdk

Then trigger the algorithm through DetectionTask:

from dds_cloudapi_sdk import Config
from dds_cloudapi_sdk import Client
from dds_cloudapi_sdk import DetectionTask
from dds_cloudapi_sdk import TextPrompt
from dds_cloudapi_sdk import DetectionModel
from dds_cloudapi_sdk import DetectionTarget

# Step 1: initialize the config
token = "Your API token here"
config = Config(token)

# Step 2: initialize the client
client = Client(config)

# Step 3: run the task with the DetectionTask class
image_url = "https://algosplt.oss-cn-shenzhen.aliyuncs.com/test_files/tasks/detection/iron_man.jpg"
# to process a local image file, upload it to the DDS server first to get an image URL
# image_url = client.upload_file("/path/to/your/prompt/image.png")

task = DetectionTask(
    image_url=image_url,
    prompts=[TextPrompt(text="iron man")],
    targets=[DetectionTarget.Mask, DetectionTarget.BBox],  # detect both bbox and mask
    model=DetectionModel.GDino1_5_Pro,  # detect with GroundingDino-1.5-Pro model
)

client.run_task(task)
result = task.result

print(result.mask_url)

objects = result.objects  # the list of detected objects
for idx, obj in enumerate(objects):
    print(obj.score)  # 0.42

    print(obj.category)  # "iron man"

    print(obj.bbox)  # [635.0, 458.0, 704.0, 508.0]

    print(obj.mask.counts)  # RLE compressed to string, ]o`f08fa14M3L2O2M2O1O1O1O1N2O1N2O1N2N3M2O3L3M3N2M2N3N1N2O...

    # convert the RLE format to RGBA image
    mask_image = task.rle2rgba(obj.mask)
    print(mask_image.size)  # (1600, 1170)

    # save the image to a file (the data/ directory must already exist)
    mask_image.save(f"data/mask_{idx}.png")

    break
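Because result.objects is an ordinary list, post-processing can be done in plain Python. The following is a minimal sketch of filtering detections by score. It uses SimpleNamespace stand-ins instead of real DetectionObject instances so that it runs without an API token; filter_by_score and min_score are hypothetical names and not part of the SDK.

```python
from types import SimpleNamespace


def filter_by_score(objects, min_score):
    """Keep objects whose score is at least min_score, highest score first."""
    kept = [obj for obj in objects if obj.score >= min_score]
    return sorted(kept, key=lambda obj: obj.score, reverse=True)


# Stand-ins for DetectionObject instances (only the fields used here).
detections = [
    SimpleNamespace(score=0.42, category="iron man", bbox=[635.0, 458.0, 704.0, 508.0]),
    SimpleNamespace(score=0.27, category="iron man", bbox=[10.0, 20.0, 50.0, 80.0]),
    SimpleNamespace(score=0.81, category="iron man", bbox=[100.0, 120.0, 300.0, 400.0]),
]

confident = filter_by_score(detections, min_score=0.4)
for obj in confident:
    print(f"{obj.category}: {obj.score:.2f} at {obj.bbox}")
```

With a real task, the same helper could be called as filter_by_score(task.result.objects, 0.4), since it only reads the score attribute.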

API Reference

class DetectionTask(image_url, prompts, targets, model, bbox_threshold=0.25, iou_threshold=0.8)

Trigger a detection task.

Parameters:

image_url (str) – the URL of the image to run detection on.
prompts (List[TextPrompt]) – the text prompts, a list of TextPrompt.
targets (List[DetectionTarget]) – the outputs to produce, a list of DetectionTarget.
model (DetectionModel) – the model to use for detection; supported models are enumerated by DetectionModel.
bbox_threshold (float) – the detection threshold for bounding boxes, defaults to 0.25.
iou_threshold (float) – the IoU (intersection over union) threshold for detection, defaults to 0.8.
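For readers unfamiliar with IoU, the sketch below computes it for two boxes in the [upper_left_x, upper_left_y, lower_right_x, lower_right_y] layout that this SDK uses for bbox. It only illustrates the metric: this page does not state how the service applies iou_threshold internally, and box_iou is a hypothetical helper, not an SDK function.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero when the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Guard against degenerate (zero-area) boxes.
    return intersection / union if union > 0 else 0.0


# Two 10x10 boxes that overlap on half their width.
print(box_iou([0.0, 0.0, 10.0, 10.0], [5.0, 0.0, 15.0, 10.0]))
```

An IoU of 1.0 means the boxes coincide and 0.0 means they do not overlap at all.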

property result: TaskResult – the formatted TaskResult object.

rle2rgba(mask_obj)

Convert the compressed RLE string of a mask object to a PNG image object.

Parameters: mask_obj (DetectionObjectMask) – the mask object detected by this task
Return type: Image

class DetectionModel(value)

An enumeration.

GDino1 = 'GroundingDino-1'

GDino1_5_Edge = 'GroundingDino-1.5-Edge'

GDino1_5_Pro = 'GroundingDino-1.5-Pro'

GDino1_6_Edge = 'GroundingDino-1.6-Edge'

GDino1_6_Pro = 'GroundingDino-1.6-Pro'

class DetectionTarget(value)

An enumeration.

BBox = 'bbox'

Mask = 'mask'

class TextPrompt(*, text, is_positive=True)

A text prompt.

Parameters:

text (str) – the text content of the prompt
is_positive (bool) – whether the prompt is positive, defaults to True

text: str – the text content of the prompt

is_positive: bool – whether the prompt is positive, defaults to True

property type – the constant string 'text' for TextPrompt.

class TaskResult(*, mask_url=None, objects=[])

The result of a detection task.

Parameters:

mask_url (str | None) – the URL of an image with the masks of all detected objects drawn on it
objects (List[DetectionObject]) – the list of detected objects, each a DetectionObject

class DetectionObject(*, score, category, bbox=None, mask=None)

An object detected by a detection task.

Parameters:

score (float) – the prediction score
bbox (List[float]) – the bounding box, [upper_left_x, upper_left_y, lower_right_x, lower_right_y]
mask (DetectionObjectMask | None) – the detected mask object
category (str) – the category of the object

category: str – the category of the object

bbox: List[float] – the bounding box, [upper_left_x, upper_left_y, lower_right_x, lower_right_y]

mask: DetectionObjectMask | None – the detected mask object
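Since bbox stores two corners rather than a width and height, a small conversion is often useful before drawing or cropping. The sketch below shows one way to do it, using the bbox value from the usage example above. bbox_to_xywh and bbox_to_crop_box are hypothetical helpers, not SDK functions; the crop box follows the (left, upper, right, lower) integer convention that Pillow's Image.crop accepts.

```python
import math


def bbox_to_xywh(bbox):
    """Convert [x1, y1, x2, y2] corners to (x, y, width, height)."""
    x1, y1, x2, y2 = bbox
    return x1, y1, x2 - x1, y2 - y1


def bbox_to_crop_box(bbox):
    """Round outward to an integer (left, upper, right, lower) box."""
    x1, y1, x2, y2 = bbox
    return math.floor(x1), math.floor(y1), math.ceil(x2), math.ceil(y2)


bbox = [635.0, 458.0, 704.0, 508.0]
print(bbox_to_xywh(bbox))      # (635.0, 458.0, 69.0, 50.0)
print(bbox_to_crop_box(bbox))  # (635, 458, 704, 508)
```

Rounding outward keeps the whole detected region inside the crop when the coordinates are fractional.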

class DetectionObjectMask(*, counts, size)

The mask detected by a detection task.
The format is borrowed from COCO, which compresses the mask image array with run-length encoding (RLE).
You can restore it to a PNG image with DetectionTask.rle2rgba.

Parameters:

counts (str) – the compressed mask array in RLE format
size (Tuple[int, int]) – the 2D size of the array, (h, w)

counts: str – the compressed mask array in RLE format

size: Tuple[int, int] – the 2D size of the array, (h, w)
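To make the counts and size fields less opaque, here is a pure-Python sketch of decoding a COCO-style compressed RLE string into a binary mask. It follows the string encoding used by the COCO API reference implementation. Whether rle2rgba works exactly this way internally is an assumption, and decode_coco_rle is a hypothetical name; in real code, prefer task.rle2rgba.

```python
def decode_coco_rle(counts, size):
    """Decode a COCO compressed RLE string into a binary mask (list of rows)."""
    height, width = size

    # Step 1: unpack the string into run lengths. Each character holds 5 value
    # bits plus a continuation flag (0x20); in the final character of a number,
    # the top value bit (0x10) doubles as the sign bit.
    runs = []
    pos = 0
    while pos < len(counts):
        value, shift, more = 0, 0, True
        while more:
            chunk = ord(counts[pos]) - 48
            value |= (chunk & 0x1F) << (5 * shift)
            more = bool(chunk & 0x20)
            pos += 1
            shift += 1
            if not more and (chunk & 0x10):
                value |= -1 << (5 * shift)  # sign-extend a negative delta
        if len(runs) > 2:
            value += runs[-2]  # from the fourth run on, values are stored as deltas
        runs.append(value)

    # Step 2: expand the runs; they alternate 0, 1, 0, ... in column-major order.
    flat = []
    for index, run in enumerate(runs):
        flat.extend([index % 2] * run)
    if len(flat) != height * width:
        raise ValueError("run lengths do not match the mask size")

    # Step 3: reshape the column-major pixel stream into rows.
    return [[flat[col * height + row] for col in range(width)] for row in range(height)]


# "231" encodes the runs [2, 3, 1] for a mask of height 2 and width 3.
for row in decode_coco_rle("231", (2, 3)):
    print(row)
```

The pixel stream is column-major, which is why size is given as (h, w) and the reshape step indexes by column first.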
