Speech activity detector and recorder app

hayder78 · August 2, 2013, 8:11pm

Is there any Android or iOS app that can start recording voice automatically using a Voice Activity Detector (VAD), which detects audio energy and triggers recording only when speech is present? There are a lot of apps that can trigger recording when there is sound, but they cannot differentiate between true speech and other, non-speech sounds. I am interested in an app that will trigger recording only when it detects human speech.

Just like this standalone solution:
https://www.dialogic.com/~/media/products/docs/appnotes/10106_CSP_an.pdf

This would be awesome for archiving my everyday life: record just the speech, transcribe it, then store it in the cloud.

Am I asking too much?

I am ready to buy such a solution, even if it is a small standalone device, if there is no handphone app that does this.

Dan_Dascalescu · August 3, 2013, 11:24pm

The new Moto X Android phone has this feature, and someone in the comments to that article mentioned that the Samsung Note II has an app or feature that does the same.

hayder78 · August 4, 2013, 1:27am

Wow! This article was published 2 days ago. You are so up to date!

It is a step toward my goal (and an important one), and it is impressive to put a hardware solution (SoC) inside the handphone to prevent battery drain. However, I wanted something that detects any speech. It seems that this feature needs a phrase to trigger the action: “OK, Google Now”. In Google Glass the phrase is “OK glass, …”. So the hardware is simply trying to match whether this phrase has been spoken or not.

The general solution would be something that is aware of whether there is speech (any speech) and uses that to trigger recording. It would work like the sound-activated recording and silence skipping already available in many recording apps and devices. With speech awareness, instead of silence skipping it would be “non-speaking” skipping.
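To make the “non-speaking” skipping idea concrete, here is a rough sketch of how it differs from plain silence skipping. It is only a crude heuristic, not how any real app or chip does it: a frame is kept when it is loud enough and also has a low zero-crossing rate, as voiced speech tends to. The sample rate, frame length, both thresholds and all function names are my own assumptions, and a low-pitched hum would still fool it.

```python
import math

SAMPLE_RATE = 16000   # assumed sample rate in Hz
FRAME_LEN = 320       # 20 ms frames at 16 kHz (assumption)
ENERGY_MIN = 0.01     # hypothetical RMS threshold, samples scaled to [-1, 1]
ZCR_MAX = 0.25        # hypothetical upper bound on zero-crossing rate


def rms(frame):
    """Root-mean-square level of one frame (what silence skipping looks at)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_speech_like(frame):
    """Loud enough AND low zero-crossing rate; a heuristic, not real speech detection."""
    return rms(frame) >= ENERGY_MIN and zero_crossing_rate(frame) <= ZCR_MAX


def keep_speech_frames(samples):
    """Drop every whole frame that does not look speech-like."""
    kept = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        frame = samples[start:start + FRAME_LEN]
        if is_speech_like(frame):
            kept.extend(frame)
    return kept
```

A silence skipper would only use the `rms` check; the extra zero-crossing test is what rejects loud hiss-like sounds. A real detector would need much better features than these two.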

But this is an important step toward making handphones smarter. I hope there is some flexibility in the hardware design that permits developing software like what I am seeking.

I think that what I am asking for is probably beyond the processing power of current handphones. Hardware that matches whether a single phrase like “OK glass” has been spoken is far easier than recognizing whether any speech has been spoken. I think they designed the hardware so that it tries to match just this phrase, which then triggers a software program to analyze the subsequent spoken command (or even sends the whole subsequent phrase to cloud servers for analysis because of the limited processing power of the handphone).

For now, my best hope is probably to find PC software that can detect which audio segments contain speech and which do not, then cut out the speech segments and feed them into Dragon Dictation for transcription. If I find PC software like this, the handphone (or a portable voice recorder) would record audio 24/7 and the PC would do all the hard work. Maybe this is more reasonable with current technology.
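The “cut those segments” step on the PC side could look roughly like the sketch below. It assumes some detector has already produced one speech/non-speech flag per 20 ms frame, then merges the flags into time ranges that can be cut out and passed to a transcriber. The function name, the frame size and the gap and minimum-length values are all hypothetical choices for illustration.

```python
def speech_segments(flags, frame_ms=20, max_gap_ms=300, min_len_ms=200):
    """Turn per-frame speech flags into a list of (start_ms, end_ms) ranges.

    Pauses no longer than max_gap_ms are bridged so a sentence is not chopped up,
    and segments shorter than min_len_ms are discarded as likely false triggers.
    """
    max_gap = max_gap_ms // frame_ms   # longest pause, in frames, kept inside a segment
    min_len = min_len_ms // frame_ms   # shortest segment, in frames, worth keeping
    segments = []
    start = None        # index of the first frame of the open segment
    last_speech = None  # index of the most recent speech frame
    for i, flag in enumerate(flags):
        if flag:
            if start is None:
                start = i
            last_speech = i
        elif start is not None and i - last_speech > max_gap:
            # The pause is too long: close the segment at the last speech frame.
            segments.append((start, last_speech + 1))
            start = None
    if start is not None:
        segments.append((start, last_speech + 1))
    return [(s * frame_ms, e * frame_ms) for s, e in segments if e - s >= min_len]


# Two bursts of speech split by a short pause, then a long pause and a 60 ms blip.
flags = [False] * 10 + [True] * 20 + [False] * 5 + [True] * 20 + [False] * 50 + [True] * 3
print(speech_segments(flags))  # [(200, 1100)]
```

The two bursts are merged into one range because the pause between them is only 100 ms, and the final blip is dropped for being too short.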

Is anybody aware of PC software like this?

Many thanks for your valuable input, Dan!