Abstract: Multimodal large language models (MLLMs) have demonstrated strong language understanding and generation capabilities, excelling in visual tasks like referring and grounding. However, due to ...
Abstract: Object detection models might not be as accurate as they could be when trained on particular datasets. Learning from the predictions of multiple object detection models can help improve ...