Articles SpeechRecognition Windows 10 API for plain Win32 by Michael Chourdakis

emailx45 · 27 May 2020

SpeechRecognition Windows 10 API for plain Win32 by Michael Chourdakis
Michael Chourdakis - 27/May/2020

[SHOWTOGROUPS=4,20]
A one-function library for easily integrating speech-to-text into your Win32 applications

Code on GitHub: [link hidden by the forum; log in or register to view it]

Introduction
Over the past years, many of us sound engineers have tried to create and improve speech recognition algorithms: lots of training, neural networks, cepstrum, Fourier, wavelets, that sort of life-consuming research. The Windows Speech API tried to implement such algorithms, with minor success.

Now that the internet has grown so much in capacity and speed that it can hold and compare vast amounts of information, all those algorithms have suddenly faded in favor of network-based voice recognition. Instead of being analyzed locally, your voice is transmitted to a server that holds a very large number of samples and can deduce your wording with great accuracy. Google already uses this approach in Android.

In Windows, we have the new SpeechRecognition API [link hidden by the forum; log in or register to view it], which, with a bit of code, can be used in plain Win32 applications. Here is a one-function library that handles the details for you.

I have been using this in my big audio and video sequencer [link hidden by the forum; log in or register to view it].

Using the library
The code exports just one function:

Code:

HRESULT __stdcall SpeechX1(void* ptr, SpeechX2 x2, const wchar_t* langx = L"en-us", int Mode = 0);

With Mode = 2, pass a pointer to a std::vector<std::tuple<std::wstring, std::wstring>> as ptr to get all supported languages:

Code:

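// Receives one (display name, language code) pair per supported language.
std::vector<std::tuple<std::wstring, std::wstring>> sx;
// Mode = 2: enumerate languages; the callback and language arguments are unused.
SpeechX1((void*)&sx, 0, 0, 2);
for (auto& e : sx)
{
    std::wcout << std::get<0>(e) << L" - " << std::get<1>(e) << std::endl;
}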

The first tuple item is the display name of the language; the second is the code that you pass back to the function later to start speech recognition.
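Before starting a session, it can be useful to check that the language you want is actually in that list. The following is a minimal sketch, not part of the library: `LanguageList`, `ToLower`, and `FindLanguageCode` are hypothetical names, and the case-insensitive comparison is an assumption about how codes such as "en-us" may be spelled in the returned list.

```cpp
#include <cassert>
#include <cwctype>
#include <string>
#include <tuple>
#include <vector>

// Same shape as the list filled by SpeechX1 in Mode = 2:
// (display name, language code).
using LanguageList = std::vector<std::tuple<std::wstring, std::wstring>>;

// Lower-case copy of a wide string, used for case-insensitive comparison.
static std::wstring ToLower(std::wstring s)
{
    for (auto& ch : s)
        ch = static_cast<wchar_t>(std::towlower(static_cast<std::wint_t>(ch)));
    return s;
}

// Hypothetical helper: returns the code exactly as the list spells it,
// or an empty string if the wanted language is not listed.
std::wstring FindLanguageCode(const LanguageList& languages,
                              const std::wstring& wanted)
{
    assert(!wanted.empty());
    const std::wstring key = ToLower(wanted);
    for (const auto& entry : languages)
    {
        if (ToLower(std::get<1>(entry)) == key)
            return std::get<1>(entry);
    }
    return std::wstring();
}
```

The returned string (if not empty) is what you would pass as the language argument on the next call.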

Once you have picked a language, call SpeechX1 again with Mode = 0. Here ptr is a custom pointer that is passed through to your callback, and the third parameter is the chosen language code. You also pass a callback with this signature:

Code:

HRESULT __stdcall MyCallback(void* ptr, const wchar_t* reco, int conf);

which is called on three occasions:
• periodically, with reco = nullptr, to confirm the status.
• with conf == -1, while a recognition hypothesis is still pending. reco is the partial text recognized so far.
• with conf >= 0, when the recognition is completed. reco is the final text, and conf ranges from 0 to 3 (the lower, the better) to indicate the accuracy of the recognition.
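
For logging or display, the numeric confidence can be turned into a label. This is only a sketch: `ConfidenceLabel` is a hypothetical helper, and the names assume that the values mirror the WinRT SpeechRecognitionConfidence enumeration (High = 0, Medium = 1, Low = 2, Rejected = 3), which the article does not state explicitly.

```cpp
#include <string>

// Hypothetical helper: maps the callback's conf value to a readable label.
// Assumes 0..3 follow the WinRT SpeechRecognitionConfidence ordering.
std::string ConfidenceLabel(int conf)
{
    switch (conf)
    {
    case -1: return "pending";   // hypothesis, not a final result
    case 0:  return "high";
    case 1:  return "medium";
    case 2:  return "low";
    case 3:  return "rejected";
    default: return "unknown";   // outside the documented range
    }
}
```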

Return S_OK to continue.

If you return an error, SpeechX1 returns and the speech recognition session is ended.
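
The three cases and the return-value rule above can be sketched as a callback like the one below. `Session`, `OnSpeech`, and the `SPEECH_CALL` macro are hypothetical names introduced for illustration, the choice of E_ABORT as the error code is an assumption (the article only says "an error"), and the non-Windows branch contains stand-in definitions so the sketch compiles anywhere.

```cpp
#include <cassert>
#include <string>

#ifdef _WIN32
#include <windows.h>
#define SPEECH_CALL __stdcall
#else
// Stand-ins so the sketch also compiles off Windows (not library code).
typedef long HRESULT;
#define S_OK ((HRESULT)0L)
#define E_ABORT ((HRESULT)0x80004004L)
#define SPEECH_CALL
#endif

// Hypothetical state object; its address is what would be passed as ptr.
struct Session
{
    std::wstring partial;    // latest pending hypothesis
    std::wstring finalText;  // last completed recognition
    int confidence = -1;     // 0 (best) to 3 (worst) once a result is final
    bool stop = false;       // set by the application to end the session
};

HRESULT SPEECH_CALL OnSpeech(void* ptr, const wchar_t* reco, int conf)
{
    assert(ptr != nullptr);
    Session* s = static_cast<Session*>(ptr);
    if (s->stop)
        return E_ABORT;          // any error return ends the session
    if (reco == nullptr)
        return S_OK;             // periodic status call, nothing to store
    if (conf == -1)
    {
        s->partial = reco;       // hypothesis still pending
    }
    else if (conf >= 0)
    {
        s->finalText = reco;     // recognition completed
        s->confidence = conf;
        s->partial.clear();
    }
    return S_OK;                 // keep the session running
}
```

Assuming the second parameter of SpeechX1 accepts a function with this signature, the call would look something like `SpeechX1(&session, OnSpeech, L"en-us", 0)`. If the stop flag is set from a different thread than the one running the callback, a `std::atomic<bool>` would be the safer choice.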

With Mode = 1, the library tests the specified voice recognition engine without returning results (instead, you will hear a playback of the recognized voice from your speakers).

The library is provided as both a DLL and a static library, and a command-line testing tool is included.

Have fun with it!

History
27/5/2020 - First Release

License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

[/SHOWTOGROUPS]
