IVP - Interactive Visual Prompt
Interactive Visual Prompting (IVP) is an interactive object detection and counting system built on the T-Rex model, which was developed in-house by the IDEA CVR team.
It detects and counts objects from visual prompts without any training, so a single visual model can serve multiple scenarios.
It performs particularly well when counting objects in dense or overlapping scenes.
The algorithm is available in the DDS CloudAPI SDK through IVPTask.
Usage Pattern
First, make sure the SDK is installed with pip:
pip install dds-cloudapi-sdk
Then trigger the algorithm through IVPTask:
from dds_cloudapi_sdk import Config
from dds_cloudapi_sdk import Client
from dds_cloudapi_sdk import IVPTask
from dds_cloudapi_sdk import RectPrompt
from dds_cloudapi_sdk import LabelTypes
# Step 1: initialize the config
token = "Your API token here"
config = Config(token)
# Step 2: initialize the client
client = Client(config)
# Step 3: run the task by IVPTask class
prompt_image_url = "https://dds-frontend.oss-cn-shenzhen.aliyuncs.com/static_files/playground/grounded_sam/05.jpg"
# if you are processing a local image file, upload it to the DDS server to get the image url
# prompt_image_url = client.upload_file("/path/to/your/prompt/image.png")
# use the same image for inference
infer_image_url = prompt_image_url
task = IVPTask(
    prompt_image_url=prompt_image_url,
    prompts=[RectPrompt(rect=[475.18413597733706, 550.1983002832861, 548.1019830028329, 599.915014164306])],
    infer_image_url=infer_image_url,
    infer_label_types=[LabelTypes.BBox, LabelTypes.Mask],  # detect both bbox and mask
)
client.run_task(task)
result = task.result
print(result.mask_url)
objects = result.objects # the list of detected objects
for idx, obj in enumerate(objects):
    print(obj.score)  # 0.42
    print(obj.bbox)  # [635.0, 458.0, 704.0, 508.0]
    print(obj.mask.counts)  # RLE compressed to string, ]o`f08fa14M3L2O2M2O1O1O1O1N2O1N2O1N2N3M2O3L3M3N2M2N3N1N2O...
    # convert the RLE format to RGBA image
    mask_image = task.rle2rgba(obj.mask)
    print(mask_image.size)  # (1600, 1170)
    # save the image to file (the data/ directory must already exist)
    mask_image.save(f"data/mask_{idx}.png")
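Since IVP is aimed at counting, a common follow-up is to count only the detections above a confidence threshold. The following is a minimal sketch and not part of the SDK: `count_objects` and the `0.3` default threshold are hypothetical, and `Detection` is a stand-in for the items in `result.objects`, assumed only to expose a `score` attribute.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Detection:
    """Stand-in for an IVP detection; only the score attribute is used here."""
    score: float


def count_objects(objects: Iterable, min_score: float = 0.3) -> int:
    """Count detections whose score is at least min_score (hypothetical helper)."""
    return sum(1 for obj in objects if obj.score >= min_score)


# Example scores chosen for illustration only.
detections = [Detection(0.42), Detection(0.12), Detection(0.87)]
print(count_objects(detections))  # 2
```

With a real task, the same helper could be called as `count_objects(task.result.objects)`; a suitable threshold depends on the images and should be tuned by inspection.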
API Reference
- class IVPTask(prompt_image_url, prompts, infer_image_url, infer_label_types)
Trigger the Interactive Visual Prompting algorithm.
- Parameters:
prompt_image_url (str) – the image the prompts are acting on.
prompts (List[RectPrompt]) – list of RectPrompt objects which are drawn on the prompt image.
infer_image_url (str) – the image to be inferred on.
infer_label_types (List[LabelTypes]) – list of target LabelTypes to return.
- property result: TaskResult
Get the formatted TaskResult object.
- rle2rgba(mask_obj)
Convert the compressed RLE string of a mask object to a PNG image object.
- Parameters:
mask_obj (IVPObjectMask) – the Mask object detected by this task
- Return type:
Image
- class RectPrompt(*, rect, is_positive=True)
A rectangle prompt.
- Parameters:
rect (List[float]) – the rect location in [upper_left_x, upper_left_y, lower_right_x, lower_right_y]
is_positive (bool) – whether the prompt is positive, defaults to True
- rect: List[float]
the rect location in [upper_left_x, upper_left_y, lower_right_x, lower_right_y]
- is_positive: bool
whether the prompt is positive, defaults to True
- property type
constant string ‘rect’ for RectPrompt.
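Annotation tools often store boxes as (x, y, width, height) rather than the corner format that `rect` expects. The sketch below shows one way to convert between the two; `xywh_to_rect` is a hypothetical helper, not an SDK function, and the sample numbers are made up.

```python
from typing import List


def xywh_to_rect(x: float, y: float, width: float, height: float) -> List[float]:
    """Convert a top-left/size box to [upper_left_x, upper_left_y, lower_right_x, lower_right_y]."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return [x, y, x + width, y + height]


rect = xywh_to_rect(475.0, 550.0, 73.0, 50.0)
print(rect)  # [475.0, 550.0, 548.0, 600.0]
```

The resulting list can then be passed as `RectPrompt(rect=rect)`.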
- class IVPObject(*, score, bbox=None, mask=None)
The object detected by IVP task.
- Parameters:
score (float) – the prediction score
bbox (List[float]) – the bounding box, [upper_left_x, upper_left_y, lower_right_x, lower_right_y]
mask (IVPObjectMask) – the detected Mask object
- bbox: List[float]
the bounding box, [upper_left_x, upper_left_y, lower_right_x, lower_right_y]
- mask: IVPObjectMask
the detected Mask object
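Because `bbox` is given as two corners, the width, height, and area of a detection follow by subtraction. The sketch below illustrates this with the sample box from the usage example; `bbox_size` is a hypothetical helper, not part of the SDK.

```python
from typing import List, Tuple


def bbox_size(bbox: List[float]) -> Tuple[float, float]:
    """Return (width, height) of a [x1, y1, x2, y2] bounding box."""
    x1, y1, x2, y2 = bbox
    return x2 - x1, y2 - y1


width, height = bbox_size([635.0, 458.0, 704.0, 508.0])
print(width, height, width * height)  # 69.0 50.0 3450.0
```

This can be handy for discarding implausibly small or large boxes before counting.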
- class IVPObjectMask(*, counts, size)
The mask detected by an IVP task. It uses a format borrowed from COCO, which compresses the mask image array with RLE. You can restore it to a PNG image with IVPTask.rle2rgba.
- Parameters:
counts (str) – the compressed mask array in RLE format
size (Tuple[int, int]) – the 2d size of the array, (h, w)
- counts: str
the compressed mask array in RLE format
- size: Tuple[int, int]
the 2d size of the array, (h, w)
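To make the `counts` and `size` fields more concrete, here is a rough pure-Python sketch of decoding a COCO-style compressed RLE string into a binary mask. It assumes the SDK's `counts` string follows the same encoding that pycocotools uses (runs stored column by column, starting with a run of zeros); that is an assumption based on the description above, not something the SDK documentation confirms. `decode_counts` and `runs_to_mask` are hypothetical names, and in practice `IVPTask.rle2rgba` is the supported way to get an image.

```python
from typing import List, Tuple


def decode_counts(encoded: str) -> List[int]:
    """Decode a COCO-style compressed RLE string into a list of run lengths."""
    counts: List[int] = []
    pos = 0
    while pos < len(encoded):
        value = 0
        shift = 0
        more = True
        while more:
            chunk = ord(encoded[pos]) - 48
            value |= (chunk & 0x1F) << (5 * shift)  # low 5 bits carry data
            more = bool(chunk & 0x20)               # bit 5 flags a continuation
            pos += 1
            shift += 1
            # Sign-extend when the final chunk has its top data bit set.
            if not more and (chunk & 0x10):
                value |= -1 << (5 * shift)
        # From the fourth run onward, values are stored as a difference
        # from the run two positions earlier.
        if len(counts) > 2:
            value += counts[-2]
        counts.append(value)
    return counts


def runs_to_mask(counts: List[int], size: Tuple[int, int]) -> List[List[int]]:
    """Expand run lengths into an (h, w) grid of 0/1, reading runs column by column."""
    height, width = size
    flat: List[int] = []
    pixel = 0  # runs alternate, starting with background (0)
    for run in counts:
        flat.extend([pixel] * run)
        pixel = 1 - pixel
    if len(flat) != height * width:
        raise ValueError("run lengths do not match the mask size")
    return [[flat[col * height + row] for col in range(width)] for row in range(height)]


# Tiny hand-made example: a 3x3 mask with two foreground pixels.
mask = runs_to_mask(decode_counts("324"), (3, 3))
for row in mask:
    print(row)
```

Summing the grid gives the foreground pixel count, which is one simple way to compare mask sizes across detections.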