Multi-Modal AI Is a UX Problem
Transformers and other AI breakthroughs have delivered state-of-the-art performance across a wide range of modalities:
- Text-to-Text (OpenAI ChatGPT)
- Text-to-Image (Stable Diffusion)
- Image-to-Text (OpenAI CLIP; see the sketch after this list)
- Speech-to-Text (OpenAI Whisper)
- Text-to-Speech (Meta’s Massively Multilingual Speech)
- Image-to-Image (img2img or pix2pix)
- Text-to-Audio (Meta MusicGen)
- Text-to-Code (OpenAI Codex / GitHub Copilot)
- Code-to-Text (ChatGPT, etc.)
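As a minimal sketch of one of these modalities in isolation, the snippet below scores an image against a handful of candidate captions with CLIP via the Hugging Face transformers library. The checkpoint name, image URL, and captions are illustrative assumptions, not details from the original post.

```python
# A minimal sketch of CLIP-style image-to-text matching with Hugging Face transformers.
# The checkpoint, image URL, and candidate captions are placeholder assumptions.
from PIL import Image
import requests
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Load any image; this URL is just an example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

captions = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # image-text similarity as probabilities
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{p:.3f}  {caption}")
```

Note that CLIP does not generate captions; it ranks how well each piece of text matches the image, which is what makes it useful as an image-to-text bridge inside larger systems.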
The next frontier in AI is combining these modalities.
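As a rough, hedged illustration of what that combination can look like in code, the sketch below chains speech-to-text, text-to-text, and text-to-speech with the OpenAI Python SDK. The model names, voice, and file paths are assumptions, and the exact method names may differ between SDK versions.

```python
# A hedged sketch of chaining three modalities with the OpenAI Python SDK (v1.x):
# speech -> text (Whisper), text -> text (chat model), text -> speech (TTS).
# Model names, voice, and file paths are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Speech-to-Text: transcribe a spoken question.
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# 2. Text-to-Text: answer the transcribed question.
chat = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; any chat model works
    messages=[{"role": "user", "content": transcript.text}],
)
answer = chat.choices[0].message.content

# 3. Text-to-Speech: speak the answer back.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
with open("answer.mp3", "wb") as f:
    f.write(speech.read())  # raw audio bytes

print(transcript.text)
print(answer)
```

Each hop in a chain like this is a point where the user can lose context, which is exactly where the experience, not the model, tends to break down.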