Whisper (speech recognition system)

Original author(s): OpenAI[1]
Initial release: September 21, 2022
Repository: https://github.com/openai/whisper

Whisper is a machine learning model for speech recognition and transcription, created by OpenAI and first released as open-source software in September 2022.[2]

It can transcribe speech in English and several other languages,[3] and can also translate speech from several non-English languages into English. OpenAI claims that the combination of different training data used in its development has improved its handling of accents, background noise and jargon compared to previous approaches.[4]
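The two modes described above, same-language transcription and translation into English, can be illustrated with a minimal Python sketch. It assumes the open-source `openai-whisper` package from the repository listed above is installed along with ffmpeg, and uses that package's `whisper.load_model` and `model.transcribe` calls. The helper names `build_options` and `run_whisper`, the file name `speech.mp3`, and the choice of the `base` model are hypothetical and chosen only for illustration.

```python
from typing import Optional

# The two tasks the article describes: same-language transcription,
# and translation of non-English speech into English.
VALID_TASKS = ("transcribe", "translate")


def build_options(task: str = "transcribe", language: Optional[str] = None) -> dict:
    """Build keyword arguments for model.transcribe() (hypothetical helper)."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    options = {"task": task}
    # When no language is given, the option is left out so the model
    # can detect the spoken language itself.
    if language is not None:
        options["language"] = language
    return options


def run_whisper(audio_path: str, task: str = "transcribe",
                language: Optional[str] = None, model_name: str = "base") -> str:
    """Load a model and return the recognized text (hypothetical helper)."""
    # Imported lazily so the rest of this sketch works without the package.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **build_options(task, language))
    return result["text"]


if __name__ == "__main__":
    # Assumed input file; replace with a real recording.
    print(run_whisper("speech.mp3"))                    # transcription
    print(run_whisper("speech.mp3", task="translate"))  # translation into English
```

Keeping the option handling in a separate function makes the task choice easy to check without downloading model weights.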

Whisper is a weakly supervised deep learning acoustic model built on an encoder-decoder transformer architecture.[5]

Whisper V2 was released on December 8, 2022.[6] Whisper V3 was released in November 2023, at OpenAI's DevDay.[7]

  1. Radford, Alec; Kim, Jong Wook; Xu, Tao; Brockman, Greg; McLeavey, Christine; Sutskever, Ilya (2022-12-06). "Robust Speech Recognition via Large-Scale Weak Supervision". arXiv:2212.04356 [eess.AS].
  2. Golla, Ramsri Goutham (2023-03-06). "Here Are Six Practical Use Cases for the New Whisper API". Slator. Archived from the original on 2023-03-25. Retrieved 2023-08-12.
  3. Dickson, Ben (2022-10-03). "How will OpenAI's Whisper model impact AI applications?". VentureBeat. Archived from the original on 2023-03-15. Retrieved 2023-08-12.
  4. Wiggers, Kyle (2022-09-21). "OpenAI open-sources Whisper, a multilingual speech recognition system". TechCrunch. Archived from the original on 2023-02-12. Retrieved 2023-02-12.
  5. Radford, Alec; Kim, Jong Wook; Xu, Tao; Brockman, Greg; McLeavey, Christine; Sutskever, Ilya (2022-12-06). "Robust Speech Recognition via Large-Scale Weak Supervision". p. 3. arXiv:2212.04356 [eess.AS].
  6. "Announcing the large-v2 model · openai/whisper · Discussion #661". GitHub. Retrieved 2024-01-08.
  7. "OpenAI DevDay: Opening Keynote". Retrieved 2024-01-08.

From Wikipedia, the free encyclopedia
