Web Technology Experts Notes

Friday 27 July 2018

Google Speech API with Punctuation and Word Timestamps in PHP

Question: What is Speech Recognition?
Google Cloud Speech-to-Text enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes 120 languages and variants.

Question: Does it provide Speech Recognition for real-time streaming?
Yes, it supports both real-time streaming and pre-recorded audio.

Question: Which languages does it support for speech to text?
https://cloud.google.com/speech-to-text/docs/languages

Question: Does it return immediately?
Cloud Speech-to-Text can stream text results, immediately returning text as it’s recognized from streaming audio or as the user is speaking.
For short audio (less than a minute), it returns the results right away.

For longer audio, it takes time (roughly 1 to 20 minutes), depending on the length of the audio file.

Question: What are different Models for Speech to Text?

command_and_search: Best for short queries such as voice commands or voice search.
phone_call: Best for audio that originated from a phone call.
video: Best for audio that originated from video or includes multiple speakers.
default: Best for audio that is not one of the specific audio models.

You can set the model in the following way:

$operation = $speech->beginRecognizeOperation(
   'gs://product_id/file_name.wav',
    array('model'=>'video')
);
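
Since the model is passed as a plain string, a typo is easy to miss. The sketch below is one way to guard against that. The constant SPEECH_MODELS and the helper buildModelOptions() are hypothetical names introduced for illustration, and the list contains only the four models named above (the API may accept others).

```php
<?php
// Models named in the list above; the API may accept others not covered here.
const SPEECH_MODELS = ['command_and_search', 'phone_call', 'video', 'default'];

/**
 * Hypothetical helper: build the options array that is passed as the second
 * argument to beginRecognizeOperation(), rejecting unknown model names.
 */
function buildModelOptions(string $model, array $extra = []): array
{
    if (!in_array($model, SPEECH_MODELS, true)) {
        throw new InvalidArgumentException("Unknown speech model: $model");
    }
    // Keep any extra options and set the model last.
    return array_merge($extra, ['model' => $model]);
}

$options = buildModelOptions('video');
// $operation = $speech->beginRecognizeOperation('gs://product_id/file_name.wav', $options);
```

The call to beginRecognizeOperation() is left commented out so the snippet runs without credentials.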

Question: What is the pricing for speech to text?
https://cloud.google.com/speech-to-text/pricing

Question: How to set private key in the environment?

putenv('GOOGLE_APPLICATION_CREDENTIALS=private_key_json.json');
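
A wrong path here usually surfaces later as an authentication error, so it can help to check the file before setting the variable. The following is a minimal sketch; the helper name useCredentialsFile() is hypothetical, and it only checks that the file is readable JSON, not that the key itself is valid.

```php
<?php
/**
 * Hypothetical helper: point GOOGLE_APPLICATION_CREDENTIALS at a key file,
 * but only after confirming the file is readable and contains JSON.
 */
function useCredentialsFile(string $path): void
{
    if (!is_readable($path)) {
        throw new RuntimeException("Credentials file not readable: $path");
    }
    $decoded = json_decode(file_get_contents($path), true);
    if (!is_array($decoded)) {
        throw new RuntimeException("Credentials file is not valid JSON: $path");
    }
    putenv('GOOGLE_APPLICATION_CREDENTIALS=' . $path);
}

// Usage (the file name is a placeholder for your own key file):
// useCredentialsFile(__DIR__ . '/private_key_json.json');
```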

Question: How to get punctuation in results?
Punctuation marks are the characters used in writing, for example the comma (,), period (.), and question mark (?).
You can get punctuation in the transcription results by setting enableAutomaticPunctuation=true in the config. To also get a timestamp for each word, set enableWordTimeOffsets=true.

For example, you can set enableWordTimeOffsets in the following way:

$operation = $speech->beginRecognizeOperation(
   'gs://product_id/file_name.wav',
    array('enableWordTimeOffsets'=>true,'model'=>'video')
);
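
Once word time offsets are enabled, each alternative carries a list of words with start and end times. The sketch below formats such a list for display. It assumes each word is an array with 'word', 'startTime' and 'endTime' keys and that the times are duration strings such as "1.300s"; verify that shape against the output of your client version. The helper names and the sample data are hypothetical.

```php
<?php
/**
 * Convert a duration string such as "1.300s" to seconds as a float.
 */
function durationToSeconds(string $duration): float
{
    return (float) rtrim($duration, 's');
}

/**
 * Format each word as "[start - end] word".
 * Assumes each entry has 'word', 'startTime' and 'endTime' keys.
 */
function formatWordTimings(array $words): array
{
    $lines = [];
    foreach ($words as $w) {
        // %F is the locale-independent float format.
        $lines[] = sprintf(
            '[%.2F - %.2F] %s',
            durationToSeconds($w['startTime']),
            durationToSeconds($w['endTime']),
            $w['word']
        );
    }
    return $lines;
}

// Hypothetical sample data, not real API output.
$words = [
    ['word' => 'Hello,', 'startTime' => '0s', 'endTime' => '0.400s'],
    ['word' => 'world.', 'startTime' => '0.400s', 'endTime' => '1.100s'],
];
echo implode("\n", formatWordTimings($words)), "\n";
```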

Question: What is the official link for speech to text?
https://cloud.google.com/speech-to-text/

Question: What is the GitHub link for speech to text?
https://github.com/GoogleCloudPlatform/google-cloud-php-speech

Question: How to install the SDK for speech to text?

php composer.phar require google/cloud-speech

Question: How to get the transcription for a small audio file?

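// Composer autoloader (adjust the path to your project layout).
require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\Speech\SpeechClient;

$audioFilePath = 'gs://ar10w12018/1135-19114_720p.wav'; // You can place your path here
$speech = new SpeechClient([
    'languageCode' => 'en-US'
]);

// Recognize the speech in an audio file.
$results = $speech->recognize($audioFilePath);

// Print the most likely transcript for each result.
foreach ($results as $result) {
    echo $result->topAlternative()['transcript'] . "\n";
}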

Question: How to get the transcription for a large audio file?
For large files, make sure you have already uploaded the file to Google Cloud Storage and have its path.

use Google\Cloud\Speech\V1p1beta1;

$audioFilePath = 'gs://ar10w12018/1135-19114_720p.wav'; // You can place your path here
$speech = new V1p1beta1\SpeechClient();

// Language, word timestamps, punctuation, and model are all set on the config.
$config = new V1p1beta1\RecognitionConfig();
$config->setLanguageCode('en-US');
$config->setEnableWordTimeOffsets(true);
$config->setEnableAutomaticPunctuation(true);
$config->setModel('video');

$audio = new V1p1beta1\RecognitionAudio();
$audio->setUri($audioFilePath);

// Start the long-running operation; the call returns before transcription finishes.
$operation = $speech->longRunningRecognize($config, $audio);

$result['operation'] = $operation->getName(); // operation name

You can get the results later by using the operation name.

$operationName = '194790781308823479';
$speech = new SpeechClient([
    'languageCode' => 'en-US'
]);
$operation = $speech->operation($operationName);

// results() is only meaningful once the operation has finished.
$isComplete = $operation->isComplete();
if ($isComplete) {
    $results = $operation->results();
}
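
The check above runs only once, so for longer audio it has to be repeated until the operation finishes. The following is a minimal polling sketch. The helper pollUntilDone() is a hypothetical name, it takes a callable so it does not depend on any particular client class, and the default attempt count and delay are arbitrary choices rather than recommended values.

```php
<?php
/**
 * Hypothetical helper: call $check until it returns true or attempts run out.
 * $check should refresh the operation state and report whether it is done.
 */
function pollUntilDone(callable $check, int $maxAttempts = 60, int $delaySeconds = 20): bool
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        if ($check()) {
            return true;
        }
        // No need to wait after the final attempt.
        if ($attempt < $maxAttempts) {
            sleep($delaySeconds);
        }
    }
    return false;
}

// Usage sketch (assumes $operation comes from $speech->operation($operationName)
// and that reload() refreshes its state; check this against your client version):
// $done = pollUntilDone(function () use ($operation) {
//     $operation->reload();
//     return $operation->isComplete();
// });
// $results = $done ? $operation->results() : [];
```

In a web application it is usually better to store the operation name and run this check from a background job or cron task instead of blocking a request.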