Whisper (speech recognition system)

Original author(s)	OpenAI^[1]
Initial release	September 21, 2022
Repository	https://github.com/openai/whisper
Type	Transcription software; Encoder-decoder transformer; Foundation model; Acoustic model

Whisper is a machine learning model for speech recognition and transcription, created by OpenAI and first released as open-source software in September 2022.^[2]

It can transcribe speech in English and several other languages,^[3] and can also translate several non-English languages into English. OpenAI claims that the combination of different training data used in its development has made it more robust to accents, background noise, and jargon than previous approaches.^[4]
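The two capabilities described above, transcription and translation into English, can be illustrated with a minimal Python sketch. It assumes the open-source openai-whisper package from the repository listed above is installed along with ffmpeg. The helper names build_options and run_whisper, the default model name "base", and the file name used in the usage note are illustrative choices, not part of the source.

```python
from typing import Optional

# The two tasks the article describes: same-language transcription,
# and translation of non-English speech into English.
VALID_TASKS = ("transcribe", "translate")


def build_options(task: str = "transcribe", language: Optional[str] = None) -> dict:
    """Collect keyword arguments for Whisper's transcribe() call (hypothetical helper)."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    options = {"task": task}
    if language is not None:
        # Optional language hint such as "fr"; when omitted, Whisper detects the language.
        options["language"] = language
    return options


def run_whisper(audio_path: str, model_name: str = "base",
                task: str = "transcribe", language: Optional[str] = None) -> str:
    """Run Whisper on one audio file and return the recognized text (hypothetical helper)."""
    # Imported lazily so the helper above can be used without the package installed.
    import whisper  # assumption: installed via `pip install openai-whisper`

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **build_options(task, language))
    return result["text"]
```

Under these assumptions, run_whisper("speech.mp3") would return a transcript in the spoken language, and run_whisper("speech.mp3", task="translate") would return an English translation. The file name is a placeholder.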

Whisper is a weakly supervised deep learning acoustic model built on an encoder-decoder transformer architecture.^[5]

Whisper V2 was released on December 8, 2022.^[6] Whisper V3 was released in November 2023 at OpenAI DevDay.^[7]

[1] Radford, Alec; Kim, Jong Wook; Xu, Tao; Brockman, Greg; McLeavey, Christine; Sutskever, Ilya (2022-12-06). "Robust Speech Recognition via Large-Scale Weak Supervision". arXiv:2212.04356 [eess.AS].

[2] Golla, Ramsri Goutham (2023-03-06). "Here Are Six Practical Use Cases for the New Whisper API". Slator. Archived from the original on 2023-03-25. Retrieved 2023-08-12.

[3] Dickson, Ben (2022-10-03). "How will OpenAI's Whisper model impact AI applications?". VentureBeat. Archived from the original on 2023-03-15. Retrieved 2023-08-12.

[4] Wiggers, Kyle (September 21, 2022). "OpenAI open-sources Whisper, a multilingual speech recognition system". TechCrunch. Archived from the original on February 12, 2023. Retrieved February 12, 2023.

[5] Radford, Alec; Kim, Jong Wook; Xu, Tao; Brockman, Greg; McLeavey, Christine; Sutskever, Ilya (2022-12-06). "Robust Speech Recognition via Large-Scale Weak Supervision". p. 3. arXiv:2212.04356 [eess.AS].

[6] "Announcing the large-v2 model · openai/whisper · Discussion #661". GitHub. Retrieved 2024-01-08.

[7] OpenAI DevDay: Opening Keynote, retrieved 2024-01-08

From Wikipedia, the free encyclopedia