Detection - Grounding DINO
Run a detection task with text prompts to get bounding boxes (bbox), masks, or both.
- Supported models:
Grounding-Dino-1
Grounding-Dino-1.5-Edge
Grounding-Dino-1.5-Pro
Grounding-Dino-1.6-Edge
Grounding-Dino-1.6-Pro
Usage Pattern
First, install the SDK with pip:
pip install dds-cloudapi-sdk
Then trigger the algorithm through DetectionTask:
from dds_cloudapi_sdk import Config
from dds_cloudapi_sdk import Client
from dds_cloudapi_sdk import DetectionTask
from dds_cloudapi_sdk import TextPrompt
from dds_cloudapi_sdk import DetectionModel
from dds_cloudapi_sdk import DetectionTarget
# Step 1: initialize the config
token = "Your API token here"
config = Config(token)
# Step 2: initialize the client
client = Client(config)
# Step 3: run the task by DetectionTask class
image_url = "https://algosplt.oss-cn-shenzhen.aliyuncs.com/test_files/tasks/detection/iron_man.jpg"
# to process a local image file, upload it to the DDS server first to get an image url
# image_url = client.upload_file("/path/to/your/prompt/image.png")
task = DetectionTask(
    image_url=image_url,
    prompts=[TextPrompt(text="iron man")],
    targets=[DetectionTarget.Mask, DetectionTarget.BBox],  # detect both bbox and mask
    model=DetectionModel.GDino1_5_Pro,  # detect with the GroundingDino-1.5-Pro model
)
client.run_task(task)
result = task.result
print(result.mask_url)
objects = result.objects # the list of detected objects
for idx, obj in enumerate(objects):
    print(obj.score)  # 0.42
    print(obj.category)  # "iron man"
    print(obj.bbox)  # [635.0, 458.0, 704.0, 508.0]
    print(obj.mask.counts)  # RLE compressed to string, ]o`f08fa14M3L2O2M2O1O1O1O1N2O1N2O1N2N3M2O3L3M3N2M2N3N1N2O...
    # convert the RLE format to an RGBA image
    mask_image = task.rle2rgba(obj.mask)
    print(mask_image.size)  # (1600, 1170)
    # save the image to a file (the data/ directory must already exist)
    mask_image.save(f"data/mask_{idx}.png")
    break  # only the first object is shown in this example
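The loop above stops at the first detected object. When a task returns several objects, a common follow-up is to keep only the ones above a score threshold. The snippet below is a minimal sketch of that idea, not part of the SDK: `FakeObject` and `filter_by_score` are hypothetical names, and `FakeObject` only mirrors the documented `score`, `category` and `bbox` attributes so that the example runs without an API token.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeObject:
    """Hypothetical stand-in mirroring the documented DetectionObject fields."""
    score: float
    category: str
    bbox: List[float]


def filter_by_score(objects, min_score=0.3):
    """Keep objects whose score is at least min_score, highest score first."""
    kept = [obj for obj in objects if obj.score >= min_score]
    return sorted(kept, key=lambda obj: obj.score, reverse=True)


# Made-up sample data; with the SDK this list would be task.result.objects.
sample = [
    FakeObject(0.42, "iron man", [635.0, 458.0, 704.0, 508.0]),
    FakeObject(0.12, "iron man", [10.0, 20.0, 30.0, 40.0]),
    FakeObject(0.77, "iron man", [100.0, 120.0, 300.0, 400.0]),
]

for obj in filter_by_score(sample):
    print(obj.category, obj.score, obj.bbox)
```

The threshold of 0.3 is an arbitrary illustration; a suitable value depends on the model and the prompt.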
API Reference
- class DetectionTask(image_url, prompts, targets, model)
Trigger a detection task.
- Parameters:
image_url (str) – the image url for detection.
prompts (List[TextPrompt]) – list of TextPrompt.
targets (List[DetectionTarget]) – detection targets, list of DetectionTarget.
model (DetectionModel) – the model to be used for detection, supported models are enumerated by DetectionModel.
- property result: TaskResult
Get the formatted TaskResult object.
- rle2rgba(mask_obj)
Convert the compressed RLE string of a mask object to an RGBA image object, which can be saved as a png file.
- Parameters:
mask_obj (DetectionObjectMask) – the Mask object detected by this task
- Return type:
Image
- class DetectionModel(value)
An enumeration of the supported detection models.
- GDino1 = 'GroundingDino-1'
- GDino1_5_Edge = 'GroundingDino-1.5-Edge'
- GDino1_5_Pro = 'GroundingDino-1.5-Pro'
- GDino1_6_Edge = 'GroundingDino-1.6-Edge'
- GDino1_6_Pro = 'GroundingDino-1.6-Pro'
- class TextPrompt(*, text, is_positive=True)
A text prompt.
- Parameters:
text (str) – the string content of the prompt
is_positive (bool) – whether the prompt is positive, defaults to True
- text: str
the string content of the prompt
- is_positive: bool
whether the prompt is positive, defaults to True
- property type
constant string ‘text’ for TextPrompt.
- class TaskResult(*, mask_url=None, objects=[])
The result of a detection task.
- Parameters:
mask_url (str | None) – an image url with all objects’ masks drawn on it
objects (List[DetectionObject]) – a list of detected objects, each a DetectionObject
- class DetectionObject(*, score, category, bbox=None, mask=None)
The object detected by a detection task.
- Parameters:
score (float) – the prediction score
category (str) – the category of the object
bbox (List[float]) – the bounding box, [upper_left_x, upper_left_y, lower_right_x, lower_right_y]
mask (DetectionObjectMask | None) – the detected Mask object
- category: str
the category of the object
- bbox: List[float]
the bounding box, [upper_left_x, upper_left_y, lower_right_x, lower_right_y]
- mask: DetectionObjectMask | None
the detected Mask object
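Because `bbox` is given as corner coordinates, downstream code often needs width, height or area instead. The helpers below are a small sketch of that conversion; `bbox_to_xywh` and `bbox_area` are hypothetical names for illustration and are not provided by the SDK.

```python
from typing import List, Tuple


def bbox_to_xywh(bbox: List[float]) -> Tuple[float, float, float, float]:
    """Convert [upper_left_x, upper_left_y, lower_right_x, lower_right_y]
    to (x, y, width, height)."""
    x1, y1, x2, y2 = bbox
    return x1, y1, x2 - x1, y2 - y1


def bbox_area(bbox: List[float]) -> float:
    """Area of the box in square pixels; degenerate boxes give 0."""
    _, _, width, height = bbox_to_xywh(bbox)
    return max(width, 0.0) * max(height, 0.0)


box = [635.0, 458.0, 704.0, 508.0]  # the bbox printed in the usage example
print(bbox_to_xywh(box))  # (635.0, 458.0, 69.0, 50.0)
print(bbox_area(box))     # 3450.0
```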
- class DetectionObjectMask(*, counts, size)
The mask detected by a detection task. The format is borrowed from COCO: the mask image array is compressed in RLE format. You can restore it to an image with DetectionTask.rle2rgba.
- Parameters:
counts (str) – the compressed mask array in RLE format
size (Tuple[int, int]) – the 2d size of the array, (h, w)
- counts: str
the compressed mask array in RLE format
- size: Tuple[int, int]
the 2d size of the array, (h, w)
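To give a feel for what run-length encoding of a mask means, the sketch below expands an uncompressed COCO-style run list into a binary mask. It is an illustration only and rests on two assumptions: the runs alternate between background (0) and foreground (1) starting with background, and pixels are laid out column by column, as in COCO. It does not decode the compressed string stored in `counts`; for that, DetectionTask.rle2rgba is the documented route. The function name `rle_runs_to_mask` is hypothetical.

```python
from typing import List, Tuple


def rle_runs_to_mask(runs: List[int], size: Tuple[int, int]) -> List[List[int]]:
    """Expand alternating 0/1 run lengths into an h x w binary mask.

    Assumes COCO-style uncompressed RLE: runs start with background (0)
    and pixels are stored in column-major order.
    """
    h, w = size
    if sum(runs) != h * w:
        raise ValueError("run lengths must add up to h * w")

    flat: List[int] = []
    value = 0  # the first run is background
    for run in runs:
        flat.extend([value] * run)
        value = 1 - value  # alternate between 0 and 1

    # flat is column-major: pixel (row, col) sits at index col * h + row
    return [[flat[col * h + row] for col in range(w)] for row in range(h)]


# A 2 x 3 mask: 1 background pixel, 2 foreground pixels, 3 background pixels
mask = rle_runs_to_mask([1, 2, 3], size=(2, 3))
for row in mask:
    print(row)
# [0, 1, 0]
# [1, 0, 0]
```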