IVP - Interactive Visual Prompting

Interactive Visual Prompting (IVP) is an interactive object detection and counting system based on the T-Rex model independently developed by the IDEA CVR team.

It detects and counts objects from visual prompts without any training, so a single visual model can be applied across many scenarios.

It particularly excels in counting objects in dense or overlapping scenes.

The algorithm is available in the DDS CloudAPI SDK through IVPTask.

Usage Pattern

First, make sure the SDK is installed with pip:

pip install dds-cloudapi-sdk

Then trigger the algorithm through IVPTask:

from dds_cloudapi_sdk import Config
from dds_cloudapi_sdk import Client
from dds_cloudapi_sdk import IVPTask
from dds_cloudapi_sdk import RectPrompt
from dds_cloudapi_sdk import LabelTypes

# Step 1: initialize the config
token = "Your API token here"
config = Config(token)

# Step 2: initialize the client
client = Client(config)

# Step 3: run the task with the IVPTask class

prompt_image_url = "https://dds-frontend.oss-cn-shenzhen.aliyuncs.com/static_files/playground/grounded_sam/05.jpg"
# if you are processing a local image file, upload it to the DDS server to get an image url
# prompt_image_url = client.upload_file("/path/to/your/prompt/image.png")

# use the same image for inference
infer_image_url = prompt_image_url

task = IVPTask(
    prompt_image_url=prompt_image_url,
    prompts=[RectPrompt(rect=[475.18413597733706, 550.1983002832861, 548.1019830028329, 599.915014164306])],
    infer_image_url=infer_image_url,
    infer_label_types=[LabelTypes.BBox, LabelTypes.Mask],  # detect both bbox and mask
)

client.run_task(task)
result = task.result

print(result.mask_url)

objects = result.objects  # the list of detected objects
for idx, obj in enumerate(objects):
    print(obj.score)  # 0.42

    print(obj.bbox)  # [635.0, 458.0, 704.0, 508.0]

    print(obj.mask.counts)  # RLE compressed to string, ]o`f08fa14M3L2O2M2O1O1O1O1N2O1N2O1N2N3M2O3L3M3N2M2N3N1N2O...

    # convert the RLE format to RGBA image
    mask_image = task.rle2rgba(obj.mask)
    print(mask_image.size)  # (1600, 1170)

    # save the image to a file (the data/ directory must already exist)
    mask_image.save(f"data/mask_{idx}.png")
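
Since IVP is used for counting, a common follow-up is to count only the detections above a confidence threshold. The sketch below is a minimal illustration, not part of the SDK: count_objects and the 0.3 default threshold are hypothetical, and the stand-in objects only mimic the score attribute of the items in result.objects.

```python
from types import SimpleNamespace


def count_objects(objects, min_score=0.3):
    """Count detections whose score is at least min_score.

    Hypothetical helper: it only relies on each item having a .score attribute.
    """
    return sum(1 for obj in objects if obj.score >= min_score)


# Stand-in objects used instead of a real API call (assumption for illustration).
fake_objects = [SimpleNamespace(score=s) for s in (0.42, 0.18, 0.77, 0.30)]
print(count_objects(fake_objects))  # 3
```

With a real task, the same helper could be called as count_objects(result.objects). A suitable threshold depends on the scene and has to be tuned by hand.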

API Reference

class IVPTask(prompt_image_url, prompts, infer_image_url, infer_label_types)[source]

Trigger the Interactive Visual Prompting algorithm.

Parameters:
  • prompt_image_url (str) – URL of the image the prompts are drawn on.

  • prompts (List[RectPrompt]) – list of RectPrompt objects drawn on the prompt image.

  • infer_image_url (str) – URL of the image to run inference on.

  • infer_label_types (List[LabelTypes]) – list of target LabelTypes to return.

property result: TaskResult

Get the formatted TaskResult object.

rle2rgba(mask_obj)[source]

Convert the compressed RLE string of a mask object into a PNG image object.

Parameters:

mask_obj (IVPObjectMask) – The Mask object detected by this task

Return type:

Image

class RectPrompt(*, rect, is_positive=True)[source]

A rectangle prompt.

Parameters:
  • rect (List[float]) – the rectangle coordinates as [upper_left_x, upper_left_y, lower_right_x, lower_right_y]

  • is_positive (bool) – whether the prompt is positive, defaults to True

rect: List[float]

the rectangle coordinates as [upper_left_x, upper_left_y, lower_right_x, lower_right_y]

is_positive: bool

whether the prompt is positive, defaults to True

property type

constant string ‘rect’ for RectPrompt.
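
When a rectangle comes from an interactive drag, the two corners may arrive in any order, while rect expects the upper-left corner first. The sketch below normalizes two arbitrary corners into that order. rect_from_corners is a hypothetical helper, not part of the SDK, and the rejection of zero-area rectangles is an assumption made here rather than a documented API rule.

```python
def rect_from_corners(x0, y0, x1, y1):
    """Return [upper_left_x, upper_left_y, lower_right_x, lower_right_y].

    Hypothetical helper: accepts the two opposite corners in any order.
    """
    left, right = sorted((float(x0), float(x1)))
    top, bottom = sorted((float(y0), float(y1)))
    # Assumption: a prompt with zero width or height is not meaningful.
    if left == right or top == bottom:
        raise ValueError("rect must have non-zero width and height")
    return [left, top, right, bottom]


# Corners given as lower-right first, then upper-left.
print(rect_from_corners(548.1, 599.9, 475.2, 550.2))  # [475.2, 550.2, 548.1, 599.9]
```

The returned list can then be passed as RectPrompt(rect=...).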

class TaskResult(*, mask_url=None, objects=[])[source]

The result of an IVP task.

Parameters:
  • mask_url (str) – URL of an image with the masks of all objects drawn on it

  • objects (List[IVPObject]) – the list of detected IVPObject instances

class IVPObject(*, score, bbox=None, mask=None)[source]

An object detected by an IVP task.

Parameters:
  • score (float) – the prediction score

  • bbox (List[float]) – the bounding box, [upper_left_x, upper_left_y, lower_right_x, lower_right_y]

  • mask (IVPObjectMask) – the detected Mask object

bbox: List[float]

the bounding box, [upper_left_x, upper_left_y, lower_right_x, lower_right_y]

mask: IVPObjectMask

the detected Mask object
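
Because bbox stores two corners rather than a width and height, the size of a detection has to be derived. The sketch below shows that arithmetic with the bounding box from the usage example. bbox_size is a hypothetical helper, not part of the SDK.

```python
def bbox_size(bbox):
    """Return (width, height) for [upper_left_x, upper_left_y, lower_right_x, lower_right_y]."""
    x1, y1, x2, y2 = bbox
    return x2 - x1, y2 - y1


width, height = bbox_size([635.0, 458.0, 704.0, 508.0])
print(width, height, width * height)  # 69.0 50.0 3450.0
```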

class IVPObjectMask(*, counts, size)[source]

The mask detected by an IVP task.

It uses a format borrowed from COCO, which compresses the mask image array with run-length encoding (RLE).

You can restore it to an image with IVPTask.rle2rgba.

Parameters:
  • counts (str) – the compressed mask array in RLE format

  • size (Tuple[int, int]) – the 2d size of the array, (h, w)

counts: str

the compressed mask array in RLE format

size: Tuple[int, int]

the 2d size of the array, (h, w)
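
To make the counts and size fields more concrete, the sketch below decodes a compressed RLE string into a 2D list of 0/1 values in pure Python. It assumes that counts follows the compressed string encoding used by the COCO API (pycocotools): 5 data bits per character, offset by ASCII 48, column-major runs that start with background. The document only says the format is borrowed from COCO, so treat that as an assumption. decode_coco_rle is a hypothetical name; in practice IVPTask.rle2rgba is the supported way to get an image.

```python
def decode_coco_rle(counts, size):
    """Decode a COCO-style compressed RLE string into rows of 0/1 values.

    Hypothetical helper: counts is the RLE string, size is (h, w).
    """
    h, w = size
    runs = []
    pos = 0
    while pos < len(counts):
        value, shift, more = 0, 0, True
        while more:
            c = ord(counts[pos]) - 48
            value |= (c & 0x1F) << shift   # low 5 bits carry data
            more = bool(c & 0x20)          # bit 5 flags a continuation character
            pos += 1
            shift += 5
            if not more and (c & 0x10):    # sign-extend negative deltas
                value |= -1 << shift
        if len(runs) > 2:
            value += runs[-2]              # later runs are stored as deltas
        runs.append(value)

    # Runs alternate between background (0) and foreground (1), starting with 0.
    flat = []
    for i, run in enumerate(runs):
        flat.extend([i % 2] * run)
    if len(flat) != h * w:
        raise ValueError("run lengths do not match the mask size")

    # The flat array is column-major, so pixel (row, col) sits at col * h + row.
    return [[flat[col * h + row] for col in range(w)] for row in range(h)]


# "2130O" is a hand-encoded toy string for a 3x3 mask, not real API output.
mask = decode_coco_rle("2130O", (3, 3))
print(sum(map(sum, mask)))  # 2
```

The column-major layout is the usual stumbling block: reading the runs row by row produces a transposed mask whenever h and w differ.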