Multi-Modal AI Is a UX Problem
Transformers and other AI breakthroughs have delivered state-of-the-art performance across a wide range of modalities:
- Text-to-Text (OpenAI ChatGPT)
- Text-to-Image (Stable Diffusion)
- Image-to-Text (OpenAI CLIP; see the sketch after this list)
- Speech-to-Text (OpenAI Whisper)
- Text-to-Speech (Meta’s Massively Multilingual Speech)
- Image-to-Image (img2img or pix2pix)
- Text-to-Audio (Meta MusicGen)
- Text-to-Code (OpenAI Codex / GitHub Copilot)
- Code-to-Text (ChatGPT, etc.)
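As a minimal sketch of one of these modalities in isolation, the snippet below scores an image against a handful of candidate captions with CLIP via the Hugging Face transformers library. The checkpoint name, image URL, and captions are illustrative assumptions, not details from the original post.

```python
# A minimal sketch of CLIP-style image-to-text matching with Hugging Face transformers.
# The checkpoint, image URL, and candidate captions are placeholder assumptions.
from PIL import Image
import requests
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Load any image; this URL is just an example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

captions = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # image-text similarity as probabilities
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{p:.3f}  {caption}")
```

Note that CLIP does not generate captions; it ranks how well each piece of text matches the image, which is what makes it useful as an image-to-text bridge inside larger systems.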
The next frontier in AI is combining these modalities.
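As a rough, hedged illustration of what that combination can look like in code, the sketch below chains speech-to-text, text-to-text, and text-to-speech with the OpenAI Python SDK. The model names, voice, and file paths are assumptions, and the exact method names may differ between SDK versions.

```python
# A hedged sketch of chaining three modalities with the OpenAI Python SDK (v1.x):
# speech -> text (Whisper), text -> text (chat model), text -> speech (TTS).
# Model names, voice, and file paths are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Speech-to-Text: transcribe a spoken question.
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# 2. Text-to-Text: answer the transcribed question.
chat = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; any chat model works
    messages=[{"role": "user", "content": transcript.text}],
)
answer = chat.choices[0].message.content

# 3. Text-to-Speech: speak the answer back.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
with open("answer.mp3", "wb") as f:
    f.write(speech.read())  # raw audio bytes

print(transcript.text)
print(answer)
```

Each hop in a chain like this is a point where the user can lose context, which is exactly where the experience, not the model, tends to break down.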