Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I've played quite a lot with all the whisper models up to "medium" size. Mostly via faster-whisper as original OpenAI whisper only seem to care for/optimize performance for GPU.

I would agree that the "tiny" model has a clear drop off in accuracy, not good enough for anything real (even if transcribing your own speech, the error rate means too much editing needed). In my experience, accuracy can be more of a problem on shorter sentences because there is less context to help it.

I think for serious use (on GPU) it would be the "medium" or "large" models only. There is now a "large-turbo" model which is apparently faster than "medium" (on GPU) but more accurate than "medium" - haven't tried it yet.

On CPU for personal use (faster-whisper, CPU) I have found "base" is usable, "small" is good. On a laptop CPU though "small" is slow for real time. "Medium" is more accurate, though mostly just on punctuation, far too slow for CPU. Of course all models will get some uncommon surnames, place names wrong.

Since OpenAI have re-released the "large" models twice and now done a "large-turbo" I hope that they will re-release the smaller models too so that the smallest models become more useful.

These moonshine models are compared to original OpenAI whisper, but really I'd say they need to compare to faster-whisper: multiple projects are faster than original OpenAI whisper.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: