
VoicePipelineAgent doesn't send the transcription automatically to the client #236

Open
cuongpham-1001 opened this issue Dec 30, 2024 · 2 comments

I tried to use VoicePipelineAgent but didn't receive any transcriptions on the client when listening for the RoomEvent.TranscriptionReceived event.
The LiveKit documentation mentions that "VoicePipelineAgent and MultimodalAgent can generate and deliver transcriptions automatically".
I also tried to find the code that implements the transcription forwarding, but it looks like it hasn't been implemented yet.
Here is my code to set up the VoicePipelineAgent:

```ts
import { llm, pipeline, tokenize } from '@livekit/agents';
import * as openai from '@livekit/agents-plugin-openai';
import * as silero from '@livekit/agents-plugin-silero';

// options is passed in by the surrounding setup function
const { logger, voiceSettings } = options;
const initialContext = new llm.ChatContext().append({
  role: llm.ChatRole.SYSTEM,
  text: options.defaultInstruction,
});

const vad = await silero.VAD.load();
const agent = new pipeline.VoicePipelineAgent(
  vad,
  new openai.STT(),
  new openai.LLM(),
  new openai.TTS({
    voice: voiceSettings?.voice,
  }),
  {
    chatCtx: initialContext,
    allowInterruptions: true,
    interruptSpeechDuration: 500,
    interruptMinWords: 0,
    minEndpointingDelay: 500,
    transcription: {
      userTranscription: true,
      agentTranscription: true,
      agentTranscriptionSpeech: 1,
      sentenceTokenizer: new tokenize.basic.SentenceTokenizer(),
      wordTokenizer: new tokenize.basic.WordTokenizer(false),
      hyphenateWord: tokenize.basic.hyphenateWord,
    },
    beforeLLMCallback: (_, ctx) => {
      // log the latest user message before it is sent to the LLM
      const lastMessage = ctx.messages[ctx.messages.length - 1];
      if (lastMessage) {
        logger.info({
          content: lastMessage.content,
        });
      }
    },
    beforeTTSCallback: async (_, source) => {
      // collect the full agent response (a string or an async stream of
      // chunks) before synthesis, so it can be logged
      const messageChunks: string[] = [];
      if (typeof source === 'string') {
        messageChunks.push(source);
      } else {
        for await (const chunk of source) {
          messageChunks.push(chunk);
        }
      }
      const message = messageChunks.join('');
      logger.info({
        content: message,
      });
      return message;
    },
  },
);
```
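
For completeness, the agent is then started from the job entrypoint in the usual way (a sketch following the agents-js examples; `ctx` here is the JobContext passed to the entrypoint):

```ts
// sketch of the surrounding entrypoint, following the agents-js examples
await ctx.connect();
const participant = await ctx.waitForParticipant();
agent.start(ctx.room, participant);
```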

valdrox commented Jan 1, 2025

I left a comment in the docs. I had the same issue. I think I saw a comment somewhere saying they were working on this.

This is what I'm going with until this is added:

```ts
// BasicTranscriptionForwarder ships with @livekit/agents;
// ctx, participant, and agent come from the entrypoint as above
import { BasicTranscriptionForwarder } from '@livekit/agents';

let messageCounter = 0;

const handleSpeechCommitted = async (text: { content?: string }) => {
  if (text.content != undefined) {
    messageCounter++;
    const textForwarder = new BasicTranscriptionForwarder(
      ctx.room,
      participant.identity,
      'trackSid', // placeholder: use the SID of the participant's audio track
      messageCounter.toString(),
    );
    textForwarder.start();
    textForwarder.pushText(text.content);
    textForwarder.markTextComplete();
    textForwarder.close(false);
  }
};

agent.on(pipeline.VPAEvent.USER_SPEECH_COMMITTED, handleSpeechCommitted);
agent.on(pipeline.VPAEvent.AGENT_SPEECH_COMMITTED, handleSpeechCommitted);
```

Then catch it in the front-end with the code in the docs.
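
For reference, the client-side listener looks roughly like this (a sketch using livekit-client, where RoomEvent.TranscriptionReceived delivers an array of TranscriptionSegment):

```ts
import { Room, RoomEvent, TranscriptionSegment } from 'livekit-client';

const room = new Room();
// register before (or after) connecting with room.connect(url, token)
room.on(RoomEvent.TranscriptionReceived, (segments: TranscriptionSegment[]) => {
  for (const segment of segments) {
    // segment.final is false for interim results, true once the segment is complete
    console.log(segment.final ? 'final:' : 'interim:', segment.text);
  }
});
```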

nbsp (Member) commented Jan 3, 2025

Our method of sending transcription events is currently very rudimentary; this is something we're working on for a future release.
