Detection - Grounding DINO

Run a detection task guided by text prompts to get bounding boxes, masks, or both.

Supported models:
  • GroundingDino-1

  • GroundingDino-1.5-Edge

  • GroundingDino-1.5-Pro

  • GroundingDino-1.6-Edge

  • GroundingDino-1.6-Pro

Usage Pattern

First, install the SDK with pip:

pip install dds-cloudapi-sdk

Then run the detection through the DetectionTask class:

from dds_cloudapi_sdk import Config
from dds_cloudapi_sdk import Client
from dds_cloudapi_sdk import DetectionTask
from dds_cloudapi_sdk import TextPrompt
from dds_cloudapi_sdk import DetectionModel
from dds_cloudapi_sdk import DetectionTarget

# Step 1: initialize the config
token = "Your API token here"
config = Config(token)

# Step 2: initialize the client
client = Client(config)

# Step 3: run the task by DetectionTask class
image_url = "https://algosplt.oss-cn-shenzhen.aliyuncs.com/test_files/tasks/detection/iron_man.jpg"
# if you are processing a local image file, upload it to the DDS server first to get an image URL
# image_url = client.upload_file("/path/to/your/prompt/image.png")

task = DetectionTask(
    image_url=image_url,
    prompts=[TextPrompt(text="iron man")],
    targets=[DetectionTarget.Mask, DetectionTarget.BBox],  # detect both bbox and mask
    model=DetectionModel.GDino1_5_Pro,  # detect with GroundingDino-1.5-Pro model
)

client.run_task(task)
result = task.result

print(result.mask_url)

objects = result.objects  # the list of detected objects
for idx, obj in enumerate(objects):
    print(obj.score)  # 0.42

    print(obj.category)  # "iron man"

    print(obj.bbox)  # [635.0, 458.0, 704.0, 508.0]

    print(obj.mask.counts)  # RLE-compressed string, e.g. ]o`f08fa14M3L2O2M2O1O1O1O1N2O1N2O1N2N3M2O3L3M3N2M2N3N1N2O...

    # convert the RLE format to RGBA image
    mask_image = task.rle2rgba(obj.mask)
    print(mask_image.size)  # (1600, 1170)

    # save the image to file (the data/ directory must already exist)
    mask_image.save(f"data/mask_{idx}.png")

    # stop after the first object to keep the example short
    break
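Because result.objects is an ordinary list, post-processing such as dropping low-confidence detections or keeping the best hit per category is plain Python. The sketch below is a minimal example under stated assumptions: FakeDetectionObject is a hypothetical stand-in that copies only the documented score, category and bbox fields so the code runs without an API token, and the 0.3 default threshold is an arbitrary illustrative value, not an SDK recommendation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical stand-in for the SDK's DetectionObject: it copies only the
# documented score/category/bbox fields so this sketch runs offline.
@dataclass
class FakeDetectionObject:
    score: float
    category: str
    bbox: Optional[List[float]] = None


def filter_by_score(objects, min_score: float = 0.3) -> list:
    """Keep objects whose score is at least min_score (an illustrative threshold)."""
    return [obj for obj in objects if obj.score >= min_score]


def count_by_category(objects) -> Dict[str, int]:
    """Count the detected objects per category."""
    return dict(Counter(obj.category for obj in objects))


def best_per_category(objects) -> dict:
    """Map each category to its highest-scoring object."""
    best = {}
    for obj in objects:
        if obj.category not in best or obj.score > best[obj.category].score:
            best[obj.category] = obj
    return best
```

With the SDK, you would pass task.result.objects to these functions directly; they only read the score and category attributes.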

API Reference

class DetectionTask(image_url, prompts, targets, model)

Trigger a detection task.

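Parameters:
  • image_url (str) – the URL of the image to run detection on; for a local file, get a URL with client.upload_file

  • prompts (List[TextPrompt]) – the text prompts describing what to detect

  • targets (List[DetectionTarget]) – the outputs to produce: DetectionTarget.BBox, DetectionTarget.Mask, or both

  • model (DetectionModel) – the Grounding DINO model to run
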
property result: TaskResult

Get the formatted TaskResult object. It is available after the task has been run with client.run_task(task).

rle2rgba(mask_obj)

Convert the compressed RLE string of a mask object to a PNG image object.

Parameters:

mask_obj (DetectionObjectMask) – The Mask object detected by this task

Return type:

Image

class DetectionModel(value)

An enumeration of the supported Grounding DINO models.

GDino1 = 'GroundingDino-1'
GDino1_5_Edge = 'GroundingDino-1.5-Edge'
GDino1_5_Pro = 'GroundingDino-1.5-Pro'
GDino1_6_Edge = 'GroundingDino-1.6-Edge'
GDino1_6_Pro = 'GroundingDino-1.6-Pro'
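DetectionModel is documented with the signature DetectionModel(value) and the default Python Enum docstring, which suggests it is a standard Enum. If so, a model can be selected from its string value, for example one read from a config file. The sketch below mirrors the documented members in a local stand-in enum so it runs without the SDK; model_from_name is a hypothetical helper, and with the SDK installed you would import DetectionModel from dds_cloudapi_sdk instead of redefining it.

```python
from enum import Enum


# Local stand-in copying the documented members; with the SDK installed,
# use `from dds_cloudapi_sdk import DetectionModel` instead.
class DetectionModel(Enum):
    GDino1 = 'GroundingDino-1'
    GDino1_5_Edge = 'GroundingDino-1.5-Edge'
    GDino1_5_Pro = 'GroundingDino-1.5-Pro'
    GDino1_6_Edge = 'GroundingDino-1.6-Edge'
    GDino1_6_Pro = 'GroundingDino-1.6-Pro'


def model_from_name(name: str) -> DetectionModel:
    """Look up a model by its string value (hypothetical helper)."""
    try:
        # Enum lookup by value, e.g. DetectionModel('GroundingDino-1.5-Pro')
        return DetectionModel(name)
    except ValueError:
        valid = ", ".join(m.value for m in DetectionModel)
        raise ValueError(f"unknown model {name!r}; expected one of: {valid}") from None
```

Failing early with the list of valid names makes a typo in a config file easier to spot than an error from the server.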

class DetectionTarget(value)

An enumeration of the output types a detection task can return.

BBox = 'bbox'
Mask = 'mask'

class TextPrompt(*, text, is_positive=True)

A text prompt.

Parameters:
  • text (str) – the text content of the prompt

  • is_positive (bool) – whether the prompt is positive; defaults to True

text: str

the text content of the prompt

is_positive: bool

whether the prompt is positive; defaults to True

property type

The constant string ‘text’ for TextPrompt.

class TaskResult(*, mask_url=None, objects=[])

The result of a detection task.

Parameters:
  • mask_url (str | None) – the URL of an image with the masks of all detected objects drawn on it

  • objects (List[DetectionObject]) – the list of detected objects, each a DetectionObject

class DetectionObject(*, score, category, bbox=None, mask=None)

An object detected by a detection task.

Parameters:
  • score (float) – the prediction score

  • bbox (List[float]) – the bounding box, [upper_left_x, upper_left_y, lower_right_x, lower_right_y]

  • mask (DetectionObjectMask | None) – the detected Mask object

  • category (str) – the category of the object

category: str

the category of the object

bbox: List[float]

the bounding box, [upper_left_x, upper_left_y, lower_right_x, lower_right_y]

mask: DetectionObjectMask | None

the detected Mask object
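The corner format of bbox converts easily to the (x, y, width, height) form that many drawing and evaluation tools expect. A minimal sketch, assuming the coordinates are pixels with the origin at the image's upper-left corner; bbox_to_xywh and bbox_area are hypothetical helper names, not SDK functions.

```python
from typing import List, Tuple


def bbox_to_xywh(bbox: List[float]) -> Tuple[float, float, float, float]:
    """Convert [upper_left_x, upper_left_y, lower_right_x, lower_right_y] to (x, y, w, h)."""
    x1, y1, x2, y2 = bbox
    return x1, y1, x2 - x1, y2 - y1


def bbox_area(bbox: List[float]) -> float:
    """Area of the box in square pixels; degenerate boxes yield 0."""
    _, _, w, h = bbox_to_xywh(bbox)
    return max(w, 0.0) * max(h, 0.0)
```

For the sample box [635.0, 458.0, 704.0, 508.0], bbox_to_xywh returns (635.0, 458.0, 69.0, 50.0). Since bbox defaults to None in the DetectionObject signature, check obj.bbox before passing it in.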

class DetectionObjectMask(*, counts, size)

The mask detected by a detection task.

It borrows the COCO format, which compresses the mask image array with run-length encoding (RLE). You can restore it to a PNG image with DetectionTask.rle2rgba.

Parameters:
  • counts (str) – the compressed mask array in RLE format

  • size (Tuple[int, int]) – the 2d size of the array, (h, w)

counts: str

the compressed mask array in RLE format

size: Tuple[int, int]

the 2d size of the array, (h, w)
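If you need the raw 0/1 array rather than an RGBA image, the counts string can be decoded directly. The sketch below assumes the SDK follows COCO's compressed RLE exactly as implemented in pycocotools (maskApi.c): runs alternate between background and foreground starting with background, pixels are traversed in column-major order, each character carries 5 bits of a value offset by 48, and from the fourth run on each value is stored as a delta from the run two positions earlier. All function names are hypothetical; the encoder is included only so the decoder can be checked by round-trip. For production code, DetectionTask.rle2rgba or pycocotools.mask.decode is the safer choice.

```python
from typing import List, Tuple


def rle_counts_from_string(s: str) -> List[int]:
    """Decode a COCO compressed-RLE string into run lengths (assumed format)."""
    counts: List[int] = []
    p = 0
    while p < len(s):
        x, k, more = 0, 0, True
        while more:
            c = ord(s[p]) - 48           # characters are offset by 48 ('0')
            x |= (c & 0x1F) << (5 * k)   # low 5 bits carry data, least significant group first
            more = bool(c & 0x20)        # 0x20 flag: another character follows
            p += 1
            k += 1
            if not more and (c & 0x10):  # 0x10 on the last character: value is negative
                x |= -1 << (5 * k)
        if len(counts) > 2:              # from the fourth run on, values are deltas
            x += counts[-2]              # from the run two positions earlier
        counts.append(x)
    return counts


def rle_counts_to_string(counts: List[int]) -> str:
    """Inverse of rle_counts_from_string; used here only for round-trip checks."""
    chars: List[str] = []
    for i, cnt in enumerate(counts):
        x = cnt - counts[i - 2] if i > 2 else cnt
        more = True
        while more:
            c = x & 0x1F
            x >>= 5                      # arithmetic shift keeps the sign
            more = x != -1 if c & 0x10 else x != 0
            if more:
                c |= 0x20
            chars.append(chr(c + 48))
    return "".join(chars)


def decode_rle_mask(counts: str, size: Tuple[int, int]) -> List[List[int]]:
    """Expand an RLE string into an h x w list of 0/1 rows."""
    h, w = size
    flat: List[int] = []
    value = 0                            # runs start with background (0) pixels
    for run in rle_counts_from_string(counts):
        flat.extend([value] * run)
        value ^= 1
    if len(flat) != h * w:
        raise ValueError(f"RLE covers {len(flat)} pixels, expected {h * w}")
    # runs walk the image column by column (column-major order)
    return [[flat[col * h + row] for col in range(w)] for row in range(h)]


def encode_rle_mask(mask: List[List[int]]) -> Tuple[str, Tuple[int, int]]:
    """Encode an h x w 0/1 mask; used here only for round-trip checks."""
    h, w = len(mask), len(mask[0])
    flat = [1 if mask[row][col] else 0 for col in range(w) for row in range(h)]
    runs: List[int] = []
    current, run = 0, 0
    for v in flat:
        if v != current:
            runs.append(run)
            current, run = v, 0
        run += 1
    runs.append(run)
    return rle_counts_to_string(runs), (h, w)
```

With the SDK, the call would look like decode_rle_mask(obj.mask.counts, obj.mask.size), since size is documented as (h, w).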