"Along with the rapid development of our large language model Qwen, we leveraged Qwen’s capabilities and unified multimodal pretraining to address the limitations of multimodal models in generalization, and we opensourced multimodal model Qwen-VL in Sep. 2023. Recently, the Qwen-VL series has undergone a significant upgrade with the launch of two enhanced versions, Qwen-VL-Plus and Qwen-VL-Max. The key technical advancements in these versions include:\nSubstantially boost in image-related reasoning capabilities; Considerable enhancement in recognizing, extracting, and analyzing details within images and texts contained therein; Support for high-definition images with resolutions above one million pixels and images of various aspect ratios." # Description used for search engine.

Read more here: External Link