The next generation of large language models (LLMs) will be able to take input in any modality and generate content in any modality, according to NExT, a research collaboration between the National University of Singapore and Tsinghua University focused on data analytics and artificial intelligence (AI).
In a blog post published by the initiative, the project's researchers claim that the new model, NExT-GPT, will be able to “perceive inputs and generate outputs in arbitrary combinations of text, images, videos, and audio”.
“By leveraging the existing well-trained highly-performing encoders and decoders, NExT-GPT is tuned with only a small amount of parameter (1%) of certain projection layers, which not only benefits low-cost training and also facilitates convenient expansion to more potential modalities,” the blog states.
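In practice, this means the large pretrained encoders, decoders, and language-model backbone stay frozen, and only thin projection layers between them are trained. The sketch below, written in PyTorch with hypothetical module names and dimensions (none of it taken from the NExT-GPT codebase), illustrates how such a setup keeps the trainable parameter count to a small fraction of the total.

```python
# Illustrative sketch only: module names and sizes are hypothetical,
# not drawn from the NExT-GPT implementation.
import torch.nn as nn

class MultimodalBridge(nn.Module):
    def __init__(self, encoder: nn.Module, llm: nn.Module,
                 enc_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.encoder = encoder          # pretrained modality encoder (e.g. image/audio)
        self.llm = llm                  # pretrained language-model backbone
        # Small trainable projection mapping encoder features into the LLM's space
        self.input_projection = nn.Linear(enc_dim, llm_dim)

        # Freeze the large pretrained components; only the projection is trained
        for p in self.encoder.parameters():
            p.requires_grad = False
        for p in self.llm.parameters():
            p.requires_grad = False

    def forward(self, x):
        feats = self.encoder(x)                 # frozen feature extraction
        tokens = self.input_projection(feats)   # trainable alignment layer
        return self.llm(tokens)                 # frozen generation backbone


def trainable_fraction(model: nn.Module) -> float:
    """Fraction of parameters that will actually receive gradients."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable / total
```

With encoders and a backbone of billions of parameters frozen, the single trainable projection layer accounts for only a tiny share of the model, which is the kind of ratio the roughly 1% figure quoted above refers to.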
Currently popular generative AI tools can produce results in response to only one type of input at a time, such as text, audio, or video. The researchers at NExT, however, are aiming to build an AI tool that understands universal modalities, which they say will make human-computer interaction more natural.
NExT-GPT is not fully operational yet, but demos of its capabilities are available on GitHub.