Adding Speech Recognition with the Google Cloud Speech-to-Text API

2025-03-24

Theme: Blog

This article shows how to implement speech recognition by combining the Google Cloud Speech-to-Text API with Google Apps Script (GAS).

Overview of the Google Cloud Speech-to-Text API

The Google Cloud Speech-to-Text API is a machine learning service that converts speech into text.

It supports a wide range of languages and dialects and can recognize speech in real time.

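Speech-to-Text Features

The main features are as follows.

Multilingual support: more than 100 languages and dialects
Real-time recognition: streaming audio can be processed as it arrives
Customization: model adaptation improves recognition accuracy for specific words and phrases
Automatic formatting: punctuation is inserted automatically and spoken numbers are converted to digits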
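As a rough illustration of how the customization and automatic formatting features map onto a request, the sketch below builds a recognition config object. The helper name `buildRecognitionConfig` and its parameters are hypothetical; `enableAutomaticPunctuation` and `speechContexts` are fields of the v1 `RecognitionConfig` as I understand it, so check them against the current API reference before relying on them.

```javascript
// Minimal sketch: build a RecognitionConfig for the v1 REST API.
// buildRecognitionConfig is a hypothetical helper, not part of any library.
function buildRecognitionConfig(languageCode, hintPhrases) {
  const config = {
    languageCode: languageCode,        // e.g. 'ja-JP'
    enableAutomaticPunctuation: true   // ask the API to insert punctuation
  };
  if (hintPhrases && hintPhrases.length > 0) {
    // Model adaptation: bias recognition toward these words and phrases.
    config.speechContexts = [{ phrases: hintPhrases }];
  }
  return config;
}
```

For example, `buildRecognitionConfig('ja-JP', ['eguweb'])` returns a config that can be placed in the `config` field of a recognize request.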

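Advanced Speech-to-Text Features

Speaker diarization: identifies multiple speakers and transcribes each speaker's utterances separately
Noise robustness: handles background noise in a variety of environments
Domain-specific models: models optimized for particular uses such as voice commands, phone calls, and video transcription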
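The sketch below shows how speaker diarization and a domain-specific model might be requested, and how per-word speaker tags in a response could be grouped into turns. The field names (`model`, `diarizationConfig`, `speakerTag`) follow the v1 REST API as I understand it; `groupWordsBySpeaker` is a hypothetical helper, and model availability differs by language, which is why the example uses `en-US`.

```javascript
// Config fragment (assumed v1 field names): phone-call model with two speakers.
const diarizationConfigExample = {
  languageCode: 'en-US',
  model: 'phone_call',
  diarizationConfig: {
    enableSpeakerDiarization: true,
    minSpeakerCount: 2,
    maxSpeakerCount: 2
  }
};

// Hypothetical helper: merge consecutive words from the same speaker into turns.
// `words` is an array of { word, speakerTag } objects taken from a response.
function groupWordsBySpeaker(words, separator) {
  const turns = [];
  for (const w of words) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === w.speakerTag) {
      last.text += separator + w.word;   // same speaker keeps talking
    } else {
      turns.push({ speaker: w.speakerTag, text: w.word }); // new turn
    }
  }
  return turns;
}
```

Pass `' '` as the separator for space-delimited languages and `''` for Japanese.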

How to Use Speech-to-Text

The Google Cloud Speech-to-Text API offers three main ways to run speech recognition.

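1. Synchronous recognition

Suited to audio of one minute or less
You send the audio data and wait until processing completes
The result comes back in the same response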

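2. Asynchronous recognition

Handles audio up to 480 minutes long
Starts a long-running operation, which you poll periodically for the result
Suited to large volumes of audio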
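The polling step of asynchronous recognition can be sketched as follows. The helpers `buildLongRunningRequest`, `waitForOperation`, and `fetchOperationWithUrlFetch` are hypothetical names; the fetch and sleep functions are passed in so the loop does not depend on any particular runtime. The sketch assumes the audio is a FLAC or WAV file already uploaded to Cloud Storage and that an access token is available.

```javascript
// Request body for v1 speech:longrunningrecognize (assumed shape).
// Long audio is referenced by Cloud Storage URI rather than sent inline.
function buildLongRunningRequest(gcsUri, languageCode) {
  return {
    config: { languageCode: languageCode },
    audio: { uri: gcsUri } // e.g. 'gs://your-bucket/your-audio.flac'
  };
}

// Poll an operation until it reports done, or give up after maxAttempts.
// fetchOperation(name) must return the parsed operation JSON;
// sleepFn() should pause between attempts.
function waitForOperation(fetchOperation, operationName, maxAttempts, sleepFn) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const op = fetchOperation(operationName);
    if (op.done) {
      if (op.error) {
        throw new Error('Recognition failed: ' + op.error.message);
      }
      return op.response; // contains the recognition results
    }
    sleepFn();
  }
  throw new Error('Operation did not finish after ' + maxAttempts + ' attempts');
}

// GAS wiring (runs only inside Apps Script): fetch one operation by name.
function fetchOperationWithUrlFetch(operationName, accessToken) {
  const response = UrlFetchApp.fetch(
    'https://speech.googleapis.com/v1/operations/' + operationName,
    { headers: { Authorization: 'Bearer ' + accessToken } }
  );
  return JSON.parse(response.getContentText());
}
```

In GAS, `sleepFn` could be `function () { Utilities.sleep(5000); }`. Keep the total wait within the script execution time limit.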

3. Streaming recognition

Suited to real-time audio input
Processes streaming audio such as microphone input
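To make the choice between the three methods concrete, here is a small sketch that picks one from the audio duration. `chooseRecognitionMethod` and the constant names are hypothetical; the limits are the ones stated above. To my knowledge streaming recognition is offered over gRPC rather than REST, so the sketch returns no REST path for it, which also means it is not a natural fit for GAS.

```javascript
const SYNC_LIMIT_SECONDS = 60;        // "one minute or less"
const ASYNC_LIMIT_SECONDS = 480 * 60; // "up to 480 minutes"

// Hypothetical helper: pick a recognition method for a given input.
function chooseRecognitionMethod(durationSeconds, isLiveStream) {
  if (isLiveStream) {
    // Assumption: streaming has no REST endpoint (gRPC only).
    return { method: 'streaming', restPath: null };
  }
  if (durationSeconds <= SYNC_LIMIT_SECONDS) {
    return { method: 'sync', restPath: 'v1/speech:recognize' };
  }
  if (durationSeconds <= ASYNC_LIMIT_SECONDS) {
    return { method: 'async', restPath: 'v1/speech:longrunningrecognize' };
  }
  throw new Error('Audio is longer than the 480-minute limit; split it first');
}
```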

Integrating with GAS

Set up a project in Google Cloud Platform and enable the Speech-to-Text API
Create a service account and obtain its credentials
In GAS, use an OAuth library to generate a JWT
Implement the code that sends the API request
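The service-account steps above can be sketched as follows. Instead of the OAuth library mentioned in the steps, this sketch signs the JWT with the built-in `Utilities` service. `buildJwtClaims` and `getAccessToken` are hypothetical names, and `serviceAccountKey` is assumed to be the parsed JSON key file with its `client_email` and `private_key` fields. Treat it as an outline under those assumptions, not a hardened implementation.

```javascript
// Build the JWT claim set for Google's OAuth 2.0 token endpoint.
function buildJwtClaims(clientEmail, nowSeconds) {
  return {
    iss: clientEmail,
    scope: 'https://www.googleapis.com/auth/cloud-platform',
    aud: 'https://oauth2.googleapis.com/token',
    iat: nowSeconds,
    exp: nowSeconds + 3600 // valid for one hour
  };
}

// GAS only: sign the JWT and exchange it for an access token.
function getAccessToken(serviceAccountKey) {
  const encode = function (obj) {
    // JWT segments use URL-safe base64 without padding.
    return Utilities.base64EncodeWebSafe(JSON.stringify(obj)).replace(/=+$/, '');
  };
  const header = { alg: 'RS256', typ: 'JWT' };
  const claims = buildJwtClaims(
    serviceAccountKey.client_email,
    Math.floor(Date.now() / 1000)
  );
  const unsigned = encode(header) + '.' + encode(claims);
  const signature = Utilities.computeRsaSha256Signature(
    unsigned,
    serviceAccountKey.private_key
  );
  const jwt = unsigned + '.' +
    Utilities.base64EncodeWebSafe(signature).replace(/=+$/, '');

  const response = UrlFetchApp.fetch('https://oauth2.googleapis.com/token', {
    method: 'post',
    payload: {
      grant_type: 'urn:ietf:params:oauth:grant-type:jwt-bearer',
      assertion: jwt
    }
  });
  return JSON.parse(response.getContentText()).access_token;
}
```

The returned token would go in an `Authorization: Bearer ...` header on the API request. Store the key file contents in script properties rather than in the source.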

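Creating the GAS Project

Create a GAS project and implement code like the following.

For simplicity, this sample authenticates with an API key rather than the service-account flow. Replace the API key and folder ID placeholders with your own values.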

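function transcribeAudio() {
  const folderId = 'YOUR_FOLDER_ID'; // ID of the Drive folder containing the audio files
  const folder = DriveApp.getFolderById(folderId);
  const files = folder.getFiles();

  while (files.hasNext()) {
    const file = files.next();
    // Process only files whose MIME type starts with "audio/"
    if (file.getMimeType().indexOf('audio/') === 0) {
      const audioContent = Utilities.base64Encode(file.getBlob().getBytes());
      const transcript = callSpeechToTextAPI(audioContent);

      // Save the transcript as a new Google Doc
      DocumentApp.create(file.getName() + ' - Transcript')
        .getBody()
        .appendParagraph(transcript);
    }
  }
}

function callSpeechToTextAPI(audioContent) {
  const apiKey = 'YOUR_API_KEY';
  const url = 'https://speech.googleapis.com/v1/speech:recognize?key=' + apiKey;

  const payload = {
    config: {
      // These values must match the audio: 16-bit linear PCM sampled at 16 kHz
      encoding: 'LINEAR16',
      sampleRateHertz: 16000,
      languageCode: 'ja-JP'
    },
    audio: {
      content: audioContent
    }
  };

  const options = {
    method: 'post',
    contentType: 'application/json',
    payload: JSON.stringify(payload),
    muteHttpExceptions: true // check the status code ourselves
  };

  const response = UrlFetchApp.fetch(url, options);
  if (response.getResponseCode() !== 200) {
    throw new Error('Speech-to-Text request failed: ' + response.getContentText());
  }

  const result = JSON.parse(response.getContentText());
  // "results" is absent when no speech was recognized
  if (!result.results) {
    return '';
  }
  // Join the top alternative of every result, not only the first one
  return result.results
    .map(function (r) {
      return r.alternatives && r.alternatives[0] ? r.alternatives[0].transcript : '';
    })
    .join('\n');
}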

When you run the script, each audio file in the specified folder is transcribed and the result is saved as a new Google Doc.

Note: keep the API's usage quotas and billing in mind.
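The script sends every audio file with the same `LINEAR16` / 16 kHz settings, which only suits files that actually have that format. One way to handle this is to pick the config from the file's MIME type, as in the sketch below. `configForMimeType` is a hypothetical helper and the MIME type list is an assumption about what Drive reports. It relies on the API being able to read the encoding and sample rate from WAV and FLAC headers when those fields are omitted.

```javascript
// MIME types assumed to carry a self-describing header (WAV or FLAC).
const HEADER_FORMAT_MIME_TYPES = [
  'audio/wav', 'audio/x-wav', 'audio/flac', 'audio/x-flac'
];

// Hypothetical helper: return a recognition config for a Drive file's
// MIME type, or null when the format needs explicit settings.
function configForMimeType(mimeType, languageCode) {
  if (HEADER_FORMAT_MIME_TYPES.indexOf(mimeType) === -1) {
    return null; // caller should skip the file or supply encoding details
  }
  // encoding and sampleRateHertz are left out on purpose for WAV/FLAC.
  return { languageCode: languageCode };
}
```

Inside the file loop, a `null` result could be used to skip a file instead of sending it with mismatched settings.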

Adding Speech Recognition with Google Apps Script & the Google Cloud Speech-to-Text API | eguweb (eguweb.jp)
