WORD-LEVEL END-TO-END NEURAL SPEAKER DIARIZATION WITH AUXNET

United States of America

APP PUB NO 20250118292A1
SERIAL NO 18891045

Abstract

A method includes obtaining labeled training data including a plurality of spoken terms spoken during a conversation. For each respective spoken term, the method includes generating a corresponding sequence of intermediate audio encodings from a corresponding sequence of acoustic frames, generating a corresponding sequence of final audio encodings from the corresponding sequence of intermediate audio encodings, generating a corresponding speech recognition result, and generating a respective speaker token representing a predicted identity of a speaker for each corresponding speech recognition result. The method also includes training a joint speech recognition and speaker diarization model jointly based on a first loss derived from the generated speech recognition results and the corresponding transcriptions and a second loss derived from the generated speaker tokens and the corresponding speaker labels.
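
The two-loss training objective described in the abstract can be illustrated with a minimal sketch. The abstract states only that a first loss comes from the speech recognition results and transcriptions and a second loss comes from the speaker tokens and speaker labels. Everything else here is an assumption made for illustration: the per-token cross-entropy form of both losses, the weighted-sum combination, and the hypothetical names `cross_entropy`, `mean_loss`, `joint_loss`, and `speaker_weight`.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability assigned to the target index."""
    return -math.log(probs[target])


def mean_loss(prob_rows, targets):
    """Average cross-entropy over one probability row per spoken term."""
    losses = [cross_entropy(p, t) for p, t in zip(prob_rows, targets)]
    return sum(losses) / len(losses)


def joint_loss(asr_probs, transcript_ids, speaker_probs, speaker_ids,
               speaker_weight=1.0):
    """Combine the two losses (assumed form: weighted sum).

    first_loss:  recognition outputs vs. transcription token ids.
    second_loss: predicted speaker tokens vs. speaker label ids.
    speaker_weight is a hypothetical knob, not taken from the source.
    """
    first_loss = mean_loss(asr_probs, transcript_ids)
    second_loss = mean_loss(speaker_probs, speaker_ids)
    total = first_loss + speaker_weight * second_loss
    return total, first_loss, second_loss


# Toy example: two spoken terms, a two-word vocabulary, two speakers.
asr_probs = [[0.5, 0.5], [0.25, 0.75]]      # per-term word distributions
transcript_ids = [0, 1]                     # ground-truth word ids
speaker_probs = [[0.5, 0.5], [0.5, 0.5]]    # per-term speaker distributions
speaker_ids = [1, 0]                        # ground-truth speaker labels

total, first, second = joint_loss(
    asr_probs, transcript_ids, speaker_probs, speaker_ids)
print(f"first={first:.4f} second={second:.4f} total={total:.4f}")
```

In a real system both losses would be backpropagated through shared encoder layers, which is what makes the training joint. The sketch shows only how the two terms might be combined into one scalar.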

Patent Owner(s)

Patent Owner    Address
GOOGLE LLC      1600 AMPHITHEATRE PARKWAY, MOUNTAIN VIEW, CA 94043

Inventor(s)

Inventor Name     Address                  # of filed Patents   Total Citations
Huang, Yiling     Edgewater, US            7                    1
Liao, Hank        New York, US             5                    145
Lu, Han           Redmond, US              46                   229
Wang, Quan        Hoboken, US              221                  1849
Wang, Weiran      Iowa City, US            23                   68
Zhao, Guanlong    Long Island City, US     4                    2
