Microsoft takes on AI rivals with three new foundational models

TechCrunch technology

Key Points:

  • Microsoft AI has launched three foundational multimodal AI models: MAI-Transcribe-1 for speech-to-text, MAI-Voice-1 for audio generation, and MAI-Image-2 for image generation. The releases are meant to strengthen Microsoft’s own AI capabilities and compete with other AI labs.
  • MAI-Transcribe-1 supports transcription in 25 languages and is 2.5 times faster than Microsoft’s previous Azure Fast service. MAI-Voice-1 can generate 60 seconds of audio in one second and allows custom voice creation. MAI-Image-2 was initially available on MAI Playground and is now also available on Microsoft Foundry.
  • The models were developed by Microsoft’s MAI Superintelligence team, led by Microsoft AI CEO Mustafa Suleyman, who emphasized a human-centered approach to AI focused on practical communication and use cases.
  • Microsoft positions these models as cost-effective alternatives to offerings from Google and OpenAI, with pricing starting at $0.36 per hour for transcription, $22 per million characters for voice, and $5 to $33 per million tokens for image-related tasks.
  • Despite developing its own AI models, Microsoft maintains its partnership with OpenAI, with recent renegotiations enabling Microsoft to advance its superintelligence research independently while continuing collaboration.
