Want to suggest a wake word? Leave your thoughts here. (AIS-1441) #88

Open
feizi opened this issue Dec 14, 2023 · 124 comments

Comments

@feizi
Collaborator

feizi commented Dec 14, 2023

Hi all,

We're excited to offer the community more free and high-quality wake word models. Everyone has their own unique wake word preferences. Now, we're ready to regularly release some of the most popular wake words. Please let us know the wake words you want! English and Chinese are both welcome.

In the past, collecting high-quality human speech data was expensive. Our team has now developed a cost-effective way to train wake word models using only TTS samples, which reaches 90-95% of the accuracy of models trained on human-recorded samples.

The wake word models and esp-sr have the same license and are free for commercial use. If you want a more accurate and exclusive wake word, please use our wake word customization service.

Currently, we support over 20 wake words, and you can choose any one of them to test. Starting August 1, 2024, a request for a new wake word must meet one of these requirements:

  • If you've got an ongoing project, kindly attach the project link along with a brief overview when submitting your request.
  • Your wake word has been liked or upvoted by more than five people.

We are preparing to upgrade to a new TTS model and generate some wake word models with better performance.

@feizi feizi pinned this issue Dec 14, 2023
@github-actions github-actions bot changed the title Want to suggest a wake word? Leave your thoughts here. Want to suggest a wake word? Leave your thoughts here. (AIS-1441) Dec 14, 2023
@kristiankielhofner

The Willow team and community would love "Hey Willow". It's our domain name because we've been waiting for this.

Thank you very much for offering this option, it's very exciting!

@feizi
Collaborator Author

feizi commented Dec 14, 2023

The Willow team and community would love "Hey Willow". It's our domain name because we've been waiting for this.

Thank you very much for offering this option, it's very exciting!

I'm glad you like this. Since "hey" and "hi" sound pretty similar, people might not always notice the difference. So I was thinking we could support both "hey willow" and "hi willow" for waking up the device. That way, whichever one you say, it will still work. When we release the wake word model, we'll name it something like "wn9_heywillow". What do you think?

@kristiankielhofner

Good idea!

My only concern would be overall reduced accuracy (wake reliability vs false wake). We've noticed quite a bit of false wake with Alexa. From what I've read the automated TTS approach has 90-95% the accuracy of the models trained on human samples. I like "two word" wake words because they tend to improve accuracy, I suspect a 100% "Hey Willow" wake word could result in equivalent or even improved accuracy with the TTS approach vs even human sample trained Alexa?

Of course we could always test this, even starting with a pure "Hey Willow" model, a pure "Hi Willow" model, and a merged model.

Thanks again for offering this!

@feizi
Collaborator Author

feizi commented Dec 14, 2023

Your concern is valid. We will generate both versions and test which model performs better.

@feizi
Collaborator Author

feizi commented Dec 28, 2023

"hey/hi willow" model:
Model name: wn9_heywillow_tts
FAR(False Alarm Rate): 1 time / 8 hours
RAR(Right Alarm Rate): 88%

Test dataset description:
The FAR dataset: This dataset contains a total of 64 hours of audio data, which includes audio collected from the internet and audio recorded using esp32-korvo boards.
The RAR dataset: This dataset is generated by multiple commercial TTS APIs, with a total of approximately 500 samples. These data and models were not used in the training process. However, due to the differences between TTS samples and human samples, please exercise caution when referring to the test results.
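
For readers who want to reproduce these two figures on their own recordings, here is a minimal sketch of how they could be computed. The function names are hypothetical and not part of esp-sr. The counts in the usage lines are only derived from the reported rates (1 time / 8 hours over 64 hours implies about 8 false alarms, and 88% of roughly 500 samples is about 440 detections); the thread does not state them.

```python
def false_alarms_per_hour(false_alarms: int, audio_hours: float) -> float:
    """False alarms per hour of audio that does not contain the wake word."""
    if audio_hours <= 0:
        raise ValueError("audio_hours must be positive")
    return false_alarms / audio_hours


def right_alarm_rate(detected: int, total_samples: int) -> float:
    """Fraction of wake word samples that triggered a detection."""
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    return detected / total_samples


# Illustrative counts derived from the reported rates, not taken from the thread.
far = false_alarms_per_hour(false_alarms=8, audio_hours=64)
rar = right_alarm_rate(detected=440, total_samples=500)
print(f"FAR: 1 time / {1 / far:.0f} hours")
print(f"RAR: {rar:.0%}")
```

Because the RAR samples here are TTS-generated, the same calculation on human recordings may give a different number.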

@AigizK

AigizK commented Dec 29, 2023

Guys, what you are doing is really great. We have created a smart speaker called Homai based on the esp32-s3. We trained the model ourselves, but it is resource-intensive and not so easy to integrate into the pipeline. Could you please add support for our word Homai [ho'mai]? Thank you in advance!

@sun-xiangyu
Contributor

Hi @AigizK ,
"Homai" has only two syllables. It is difficult to reduce the probability of false triggering for one- and two-syllable phrases. We recommend selecting a phrase of 3-5 syllables as the wake word.

@AigizK

AigizK commented Jan 3, 2024

Hi @sun-xiangyu
We have already launched a project with this name, so we can't change it significantly. But can we use the variant "homa ai", where the sound 'A' is pronounced long?

@sun-xiangyu
Contributor

We have already launched a project with this name, so we can't change it significantly. But can we use the variant "homa ai", where the sound 'A' is pronounced long?

I'm sorry that our TTS model cannot specify a syllable to extend its pronunciation at the moment. This means that we cannot generate a large number of accurate “homa ai” phrases.

@PrathamG

PrathamG commented Jan 9, 2024

Hi! Thank you for this awesome solution! We are developing a smart voice assistant called Sophia. Would it be possible to have the wake word "Hi Sophia"? This would help our user experience drastically. Thank you in advance!

@sun-xiangyu
Contributor

Hi @PrathamG , I'm glad you like it. "Sophia" sounds like a wake word that can be used directly. I mean, maybe we don't need an extra prefix "Hi". I suggest we start with just "Sophia". If the performance is not satisfactory, then we can train another one with "hi Sophia". What do you think?

@PrathamG

PrathamG commented Jan 9, 2024

Sure, that sounds like a good plan! We can use only "Sophia" and test the performance first. Thank you

@PrathamG

PrathamG commented Jan 9, 2024

If possible, I also wanted to request the wake word "Little Sophia". We are still unsure about which wake word to use, and having both options will help us determine this via user testing.

@sun-xiangyu
Contributor

sun-xiangyu commented Jan 10, 2024

If possible, I also wanted to request the wake word "Little Sophia". We are still unsure about which wake word to use, and having both options will help us determine this via user testing.

Our computing resources are currently limited, and this project can produce about two wake word models a month, so we will prioritize popular wake words. Of course, if we have some free time, "Little Sophia" is also fine.

@PrathamG

No worries, totally understandable! Looking forward to testing out the "Sophia" wake word

@sun-xiangyu
Contributor

"Sophia" model: wn9_sophia_tts

FAR(False Alarm Rate): 1 time / 8 hours
RAR(Right Alarm Rate): 97%

@xygh

xygh commented Jan 22, 2024

“小美” or “小美同学” would be a perfect choice. It would suit a lot of use cases. We all want a wake word that sounds like a human name.

@sun-xiangyu
Contributor

@xygh, “小美同学” sounds good.

@PrathamG

"Sophia" model: wn9_sophia_tts

FAR(False Alarm Rate): 1 time / 8 hours RAR(Right Alarm Rate): 97%

Thank you! We will test it out and report the results by next week

@xygh

xygh commented Jan 23, 2024

@xygh, “小美同学” sounds good.

BTW, “你好小美” is also a perfect choice.

@Henry586

"小当家" or "Hi 小星" is preferable wake word in our scenario. Thanks a lot!

@sun-xiangyu
Contributor

The second version "Sophia":
model info: wakenet9l_tts1h8v2_Sophia_3_0.647_0.649

Performance:
FAR(False Alarm Rate): 1 time / 8 hours
RAR(Right Alarm Rate): 95%

Improvement:
Add "Sophie" and "Sophy" as hard negatives to reduce false triggers.
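
As an illustration of how hard-negative candidates such as "Sophie" and "Sophy" might be shortlisted, here is a minimal sketch that ranks words by spelling similarity using Python's standard `difflib`. This is an assumption for illustration only: the thread does not say how the hard negatives were chosen, spelling similarity is only a crude proxy for how alike two words sound, and `suggest_hard_negatives` and the sample vocabulary are hypothetical.

```python
from difflib import get_close_matches


def suggest_hard_negatives(wake_word, vocabulary, cutoff=0.7, limit=5):
    """Return words spelled similarly to wake_word, most similar first.

    The wake word itself is excluded, since it is the positive class.
    """
    target = wake_word.lower()
    # Deduplicate, lowercase, and drop the wake word before matching.
    candidates = sorted({word.lower() for word in vocabulary} - {target})
    return get_close_matches(target, candidates, n=limit, cutoff=cutoff)


# Hypothetical vocabulary for illustration.
words = ["Sophie", "Sophy", "sofa", "willow", "Sophia"]
print(suggest_hard_negatives("Sophia", words))
```

A real pipeline would more likely compare phoneme sequences or acoustic features, since words can sound alike while being spelled very differently.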

@sun-xiangyu
Contributor

"小当家" or "Hi 小星" is preferable wake word in our scenario. Thanks a lot!

Both of these words sound good. If you have no preference, we will choose "hi 小星".

@feizi
Collaborator Author

feizi commented Jan 30, 2024

"小美同学"
model info: wakenet9l_tts1h8_小美同学_3_0.633_0.644

FAR(False Alarm Rate): 1 time / 8 hours
RAR(Right Alarm Rate): 95%

@lewardo

lewardo commented Feb 11, 2024

Hello! This is a great opportunity I was hoping would come up, I'm so glad this is now possible! I've seen that the wake-words "Mycroft" and "Hey, Mycroft" are very popular in the community, and it is also the name of my product so would very much improve user experience. Would it be possible to have either of these trained and released for the community? Thank you so much in advance for this!

@sun-xiangyu
Contributor

@lewardo, I'm glad it could help you. Although "Mycroft" is simpler, it seems there are quite a few words that sound similar, so I'll prioritize training with "Hey Mycroft."

@sun-xiangyu
Contributor

Is it possible for esp-sr or esp-skainet or esp-adf to provide an interface or process for custom wake-up word functionality? For example, by users recording the wake-up word and training a model via a training script deployed in the cloud, then updating it to the device.

As far as I know, not yet. Before I knew about esp-adf and esp-sr, I used custom wakeword using a model created by python + tensorflow, then quantized the model and used tflite (tensorflow for microcontrollers). The model ran successfully, but maybe due to optimization issues, it consumed a lot of ram and memory (you can run the model on any microcontroller this way, of course if you have enough resources). I probably would have continued to optimize using tflite until I discovered that ESP provides ESP-DL, which also allows model deployment with hardware support. You can find out from what I say.

Deployment in the cloud is a bit difficult for us and is not in our plans. I think esp-dl might be a solution; if you want to deploy a model of your own, I recommend using it.
Good news: we are refactoring esp-dl so that it can load quantized models directly, much as you would with ONNX or PyTorch.

@mike-2020

It looks like there is already an English wake word, "Hey, Wand". Can we have a Chinese version, e.g. 神奇魔仗?

@chipmunk000

We are using Espressif's ESP32S3 chip to create a small wizard dialogue toy that can provide great emotional value and companionship.
We look forward to your help in training the following wake words.
“Hi,小巫”

@feizi
Collaborator Author

feizi commented Sep 29, 2024

We are using Espressif's ESP32S3 chip to create a small wizard dialogue toy that can provide great emotional value and companionship. We look forward to your help in training the following wake words. “Hi,小巫”

Sounds great, I'm happy to help train a "Hi,小巫" wake word.

@earp123

earp123 commented Oct 2, 2024

We are working on a patient-side voice assistant for the healthcare space. We desperately need help training the English branded wake word, "Hey, Henry". We are currently testing with the ESP-BOX-S3. Many thanks in advance.

@SleepInfinity

sudo or hey sudo would be cool

@sun-xiangyu
Contributor

@chipmunk000

Hi,小巫: wakenet9l_tts2h8_Hi,小巫_3_0.639_0.642

Performance:
FAR(False Alarm Rate): 1 time / 8 hours
RAR(Right Alarm Rate): 97%

This is the first model trained with the TTS V2.0 pipeline.
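
The "model info" strings posted in this thread (for example `wakenet9l_tts2h8_Hi,小巫_3_0.639_0.642`) appear to share an underscore-separated layout. Here is a minimal sketch for splitting such a string into its parts when keeping track of several models. The thread does not document what the individual fields mean, so the field names below (`network`, `tag`, `trailing_fields`) are neutral, hypothetical labels, and the parser only assumes two leading fields and three trailing fields around the wake word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    network: str            # first field, e.g. "wakenet9l"
    tag: str                # second field, e.g. "tts2h8"
    wake_word: str          # everything between the tag and the last three fields
    trailing_fields: tuple  # last three fields, kept as strings (meaning undocumented)


def parse_model_info(name: str) -> ModelInfo:
    """Split a model info string into labeled parts without interpreting them."""
    parts = name.split("_")
    if len(parts) < 6:
        raise ValueError(f"unexpected model info format: {name!r}")
    # Rejoin the middle so a wake word containing underscores stays intact.
    wake_word = "_".join(parts[2:-3])
    return ModelInfo(parts[0], parts[1], wake_word, tuple(parts[-3:]))


info = parse_model_info("wakenet9l_tts2h8_Hi,小巫_3_0.639_0.642")
print(info.network, info.tag, info.wake_word, info.trailing_fields)
```

Shorter names such as `wn9_sophia_tts` follow a different pattern and are rejected by this sketch rather than guessed at.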

@ayuusweetfish

Hello! We are working on a duck-shaped interactive installation, deployed in the snow landscape, responding to pedestrians' voice with a glowing light. The project will be open-source; we have completed most of the work and will be publishing details on GitHub and Hackaday quite soon (I will update this comment with the links then).

We would really appreciate the addition of the wake word "小鸭小鸭" (xiǎo yā xiǎo yā). We are also willing to help if there is anything we can do to bring it to life before snowfall ^ ^

@sun-xiangyu
Contributor

@ayuusweetfish
Your project sounds interesting, I'd be happy to help train a "小鸭小鸭" wake word.

@Spartan859

Spartan859 commented Dec 7, 2024

Hello there. We are working on an open-source cosplay prop called RinaChanBoard that supports showing expressions on a led-pixel-based screen. We've implemented an app with functions like BLE control, video-playing, music-with-expressions, voice-control and so on. However, the voice-control part is not flexible enough, so we're planning to migrate to ESP-SR.
The project has earned quite a few fans, and supporting details are in the links below:

Videos

【天王寺科技】璃奈板的制作过程
璃奈板-声控测试
璃奈板-小程序测试(iOS可用)
璃奈板-音乐同步测试
These videos are uploaded by my friend ZhaTi (also a partner in this project), and he will give a thumbs up for this comment.

Source Code

https://github.com/Spartan859/RinaChanBoard (This is an archived version. As we've encountered plagiarism for commercial usage, the newest version is currently private. I can provide permissions on your request.)

WakeNet Models

We would really appreciate it if you could train the following models:

  • Chinese: 璃奈板
  • Japanese (I'm not sure if your TTS model supports this): りなちゃんボード
  • If Japanese is not available, an English substitute is also OK. In my tests, I found that "LinaJonBordol" best approximates the Japanese pronunciation.

Thanks for all your efforts in publicizing ESP-SR, which gives me hope of making progress on my project.

@sun-xiangyu
Contributor

sun-xiangyu commented Dec 9, 2024

@Spartan859 ,
Yes, our TTS model does not yet support Japanese. It would be difficult for Japanese speakers to wake the device with "LinaJonBordol".
What do you think of the wake word "璃奈酱(りなちゃん/LinaJon)"? This phrase is simple and relatively consistent across Chinese, Japanese, and English.

@Spartan859

Spartan859 commented Dec 9, 2024

@Spartan859 , Yes, our TTS model does not yet support Japanese. It will be difficult for Japanese to wake up device by "LinaJonBordol". What do you think of the wake word "璃奈酱(りなちゃん/LinaJon)"? This phrase is simple and relatively consistent across Chinese, Japanese, and English.

@sun-xiangyu
Thanks for your advice. I would like to confirm one thing: if I speak りなちゃんボード/LinaJonBordol, and the wake word is りなちゃん/LinaJon, will WakeNet be activated? In brief, does WakeNet recognize the substring of a phrase?

If true, then we would happily accept りなちゃん/LinaJon as the wake word. Otherwise, just use 璃奈板 as the wake word, and we would give up using the Japanese one.

We're grateful for your help!

@sun-xiangyu
Contributor

The performance on a sub-phrase is unpredictable, which means the device may be difficult to wake. Given your requirements, it is best to choose 璃奈板 as the wake word.

@Spartan859

The performance of sub-phrase is unpredictable, which means it may be difficult to wake up. According to your requirements, it is best to choose 璃奈板 as the wake word.

@sun-xiangyu
Thanks. Looking forward to the release of the WakeNet model for 璃奈板.

@sun-xiangyu
Contributor

@ayuusweetfish
小鸭小鸭: wakenet9_tts2h12_小鸭小鸭_3_0.595_0.600

Performance:
FAR(False Alarm Rate): 1 time / 12 hours
RAR(Right Alarm Rate): 97%

@ayuusweetfish

@ayuusweetfish 小鸭小鸭: wakenet9_tts2h12_小鸭小鸭_3_0.595_0.600

Performance: FAR(False Alarm Rate): 1 time / 12 hours RAR(Right Alarm Rate): 97%

Thank you! It works wonderfully!

I have made the project repository public: ayuusweetfish/Yun-Ying-Ya. It is still WIP; I will make sure to post more details over the next few weeks ^ ^ Thank you for your quack help!

@PoohWoah

小鸭小鸭: wakenet9_tts2h12_小鸭小鸭_3_0.595_0.600

Performance: FAR(False Alarm Rate): 1 time / 12 hours RAR(Right Alarm Rate): 97%

Where can I download this wake word model?

@ayuusweetfish

@PoohWoah

In this folder at the current head commit. The new version has not been published on the Registry yet, so I temporarily placed these files in the managed_components/espressif__esp-sr/model/wakenet_model/wn9_himiaomiao_tts/ directory under the project root, replacing the existing content, and then selected the himiaomiao wake word in the configuration.

@sun-xiangyu
Contributor

sun-xiangyu commented Dec 16, 2024

@PoohWoah

In this folder at the current head commit. The new version has not been published on the Registry yet, so I temporarily placed these files in the managed_components/espressif__esp-sr/model/wakenet_model/wn9_himiaomiao_tts/ directory under the project root, replacing the existing content, and then selected the himiaomiao wake word in the configuration.

Yes, as @ayuusweetfish mentioned, you can find the wake word model you want in the wakenet_model folder and then overwrite the model you were previously using, and it will be ready to use.
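
The overwrite step described above can be scripted. Here is a minimal sketch, assuming the new model files have already been downloaded to a local folder; `overwrite_model_dir` is a hypothetical helper, not part of esp-sr, and the source path in the usage comment is a placeholder. Only the target path comes from the comments above.

```python
import shutil
from pathlib import Path


def overwrite_model_dir(new_model_dir, target_dir):
    """Replace the contents of target_dir with the files from new_model_dir.

    Returns the sorted names of the entries now present in target_dir.
    """
    src = Path(new_model_dir)
    dst = Path(target_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"model folder not found: {src}")
    # Remove the previously used model so no stale files remain.
    if dst.exists():
        shutil.rmtree(dst)
    shutil.copytree(src, dst)
    return sorted(entry.name for entry in dst.iterdir())


# Example usage (the source path is a placeholder, adjust to your checkout):
# overwrite_model_dir(
#     "path/to/downloaded_model",
#     "managed_components/espressif__esp-sr/model/wakenet_model/wn9_himiaomiao_tts",
# )
```

Since this deletes the existing model folder, it may be worth keeping a copy of the original files, and the build system may restore managed components when dependencies are refreshed.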

@sun-xiangyu
Contributor

@Spartan859
璃奈板: wakenet9l_tts2h12_Linaiban_3_0.635_0.640

Performance:
FAR(False Alarm Rate): 1 time / 12 hours
RAR(Right Alarm Rate): 95%

@PoohWoah

Thank you for your reply.

@PoohWoah

Understood, thank you.

@PoohWoah

Does this wake word model also work as a replacement in ADF? I'm currently using ADF's wake word feature.

@sun-xiangyu
Contributor

Does this wake word model also work as a replacement in ADF? I'm currently using ADF's wake word feature.

Of course. ADF also uses esp-sr for wake word detection.

@l137295

l137295 commented Dec 31, 2024

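We look forward to your help training the following wake word:
“Hi,清风”
清风 ("gentle breeze") comes from the phrase 清风徐来: a light breeze blowing in softly and slowly. It describes a gentle, soothing wind and carries a sense of calm, comfort, and the tender harmony of nature. May every project we develop feel like bathing in a gentle breeze and stay close to that harmony.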

@sun-xiangyu
Contributor

“Hi,春风” is also nice.
When in doubt, ask the spring breeze.

@l137295

l137295 commented Dec 31, 2024

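“Hi,春风” is also nice. When in doubt, ask the spring breeze.

Haha, that reminds me of the novel 《剑来》. Nice, although calling it out in winter would feel a bit odd.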

@caseylai

caseylai commented Jan 9, 2025

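Hello, could you help train a wake word called “小酥肉”? I'm using the ESP32S3 to develop a voice assistant for children and students (adults can use it too). It is nearly finished, and everyone I've asked really likes and looks forward to the name “小酥肉”. Being able to use this name would be a great help to the product. Thank you very much~~ :)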

@sun-xiangyu
Contributor

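Hello, could you help train a wake word called “小酥肉”? I'm using the ESP32S3 to develop a voice assistant for children and students (adults can use it too). It is nearly finished, and everyone I've asked really likes and looks forward to the name “小酥肉”. Being able to use this name would be a great help to the product. Thank you very much~~ :)

Although we have implemented some optimizations, children's voices are still a challenge for our current TTS wake word models.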

@caseylai

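Oh, to add: these are not very young children. The users are mostly fifth- and sixth-graders and middle and high school students, whose speech is about as fluent and accurate as an adult's, so I think adult data would work. Also, from my research so far, the esp-sr solution is the best option; the other solutions all run into limits on compute and power consumption. If possible, please help train one. Looking forward to it~~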
