Multi-Modal AI Is a UX Problem

Transformers and other AI breakthroughs have achieved state-of-the-art performance across a wide range of modalities (a short usage sketch follows the list):

  • Text-to-Text (OpenAI ChatGPT)
  • Text-to-Image (Stable Diffusion)
  • Image-to-Text (OpenAI CLIP)
  • Speech-to-Text (OpenAI Whisper)
  • Text-to-Speech (Meta’s Massively Multilingual Speech)
  • Image-to-Image (img2img or pix2pix)
  • Text-to-Audio (Meta MusicGen)
  • Text-to-Code (OpenAI Codex / GitHub Copilot)
  • Code-to-Text (ChatGPT, etc.)

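Each of these capabilities is now only a few library calls away. As a rough illustration, here is a minimal sketch using the open-source diffusers and transformers libraries; the checkpoint names, prompt, and file names are illustrative placeholders, not recommendations.

```python
# A minimal sketch of two of the modalities above; checkpoints, prompt,
# and file names are placeholders. Runs on CPU (slowly); move the
# diffusion pipeline to GPU with .to("cuda") if one is available.
import torch
from diffusers import StableDiffusionPipeline
from transformers import pipeline

# Text-to-Image: generate a picture from a prompt with Stable Diffusion.
sd = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float32
)
image = sd("a watercolor sketch of a lighthouse at dawn").images[0]
image.save("lighthouse.png")

# Image-to-Text: caption the generated image with a BLIP captioning model.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
caption = captioner(image)[0]["generated_text"]
print(caption)
```
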
The next frontier in AI is combining these modalities, and stitching them together into one coherent experience is as much a UX problem as a modeling one.
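
To make that concrete, the sketch below chains three of the models above into a single voice interaction: speech in, text reasoning in the middle, speech out. It assumes the Hugging Face pipeline API exposes the MMS English checkpoint for text-to-speech, and the checkpoint names and file paths are placeholders. A real assistant would also need streaming, error handling, and turn-taking, which is exactly where the UX work begins.

```python
# A hedged sketch of combining three single-modality models into one
# voice interaction. Model names and file paths are placeholders.
import scipy.io.wavfile as wavfile
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")
llm = pipeline("text2text-generation", model="google/flan-t5-small")
tts = pipeline("text-to-speech", model="facebook/mms-tts-eng")

# 1. Speech-to-Text: turn the user's recording into a text prompt.
question = asr("user_question.wav")["text"]

# 2. Text-to-Text: generate an answer to the transcribed question.
answer = llm(question, max_new_tokens=128)[0]["generated_text"]

# 3. Text-to-Speech: render the answer back as audio and save it.
speech = tts(answer)
wavfile.write(
    "answer.wav",
    rate=speech["sampling_rate"],
    data=speech["audio"].squeeze(),
)
```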
