Gartner, Inc. forecasts that 40% of generative AI (GenAI) solutions will be multimodal—handling text, images, audio, and video—by 2027, a sharp increase from just 1% in 2023. This transition from single-modality to multimodal models is expected to enhance human-AI interaction and differentiate GenAI offerings.
Erick Brethenoux, VP Analyst at Gartner, highlighted the evolving GenAI market, stating, “As models are natively trained on multiple modalities, this will allow AI to capture relationships between various data streams, scaling GenAI’s benefits across applications and data types. It also enables AI to support more human tasks, irrespective of the environment.”
Multimodal GenAI was identified as one of the two key technologies in Gartner’s 2024 Hype Cycle for Generative AI, expected to offer competitive advantages and accelerate time-to-market. Alongside open-source large language models (LLMs), these technologies are predicted to have significant organizational impact over the next five years.
Additionally, Gartner noted two other high-potential technologies set to reach mainstream adoption within a decade—domain-specific GenAI models and autonomous agents.
Arun Chandrasekaran, VP Analyst at Gartner, explained, “The GenAI ecosystem is vast and evolving rapidly, making navigation complex for enterprises. While the technology currently sits in the ‘Trough of Disillusionment,’ real benefits will emerge post-hype, with substantial advances expected in the coming years.”
Multimodal GenAI: Transforming Enterprise Capabilities
Multimodal GenAI will play a transformative role in enterprise applications, introducing new functionalities that would otherwise be unattainable. This technology’s reach extends beyond specific industries or use cases, allowing applications wherever AI interacts with humans. Currently, many models are limited to two or three modalities, but Gartner expects these capabilities to expand rapidly.
“Humans process information through a combination of modalities, such as audio and visual cues,” said Brethenoux. “Multimodal GenAI is crucial because data is typically multimodal. Assembling single-modality models can introduce latency and reduce accuracy, leading to a lower-quality experience,” he added.
Open-Source LLMs: Democratizing AI Development
Open-source LLMs are foundational models that enhance enterprise value by democratizing access to GenAI and enabling developers to optimize models for specific tasks. This fosters innovation across industries, academic institutions, and research sectors.
“Open-source LLMs empower enterprises with more customization, better privacy control, and transparency, while reducing vendor lock-in,” Chandrasekaran noted. “They also help enterprises build smaller, more cost-effective models that are easier to train, supporting core business processes.”
Domain-Specific GenAI Models: Tailored to Industry Needs
Domain-specific GenAI models are designed to cater to specific industry tasks and functions, improving accuracy, security, and privacy. These models offer better contextualized responses and reduce the need for extensive prompt engineering. They are also less prone to hallucinations, thanks to targeted training.
“Organizations can achieve faster time-to-value and enhanced performance with domain-specific models, particularly for industry-specific use cases where general-purpose models fall short,” Chandrasekaran said.
Autonomous Agents: The Future of AI
Autonomous agents—systems capable of achieving defined goals without human intervention—use AI techniques to analyze patterns, make decisions, and generate actions. These agents can learn from their environment, improve over time, and manage complex tasks.
“Autonomous agents mark a significant leap in AI capabilities,” Brethenoux remarked. “Their decision-making abilities and independent operations can streamline business processes, elevate customer experiences, and create innovative products, driving cost savings and competitive advantage.”