Statistical Optimization Of Acoustic Models For Large Vocabulary Speech Recognition

Statistical Optimization of Acoustic Models for Large Vocabulary Speech Recognition

Author : Rusheng Hu
Publisher :
ISBN 13 :
Total Pages : pages

Book Synopsis Statistical Optimization of Acoustic Models for Large Vocabulary Speech Recognition by : Rusheng Hu

Statistical Optimization of Acoustic Models for Large Vocabulary Speech Recognition was written by Rusheng Hu and released in 2006. Book excerpt: This dissertation investigates the optimization of acoustic models in speech recognition. Two new optimization methods are proposed for phonetic decision tree (PDT) search and hidden Markov modeling (HMM): the knowledge-based adaptive PDT algorithm and the HMM gradient boosting algorithm. Both methods are applied to improve the word error rate of a state-of-the-art speech recognition system. However, the two methods are developed in a general machine learning setting, and their applications are not limited to speech recognition. The HMM gradient boosting method is based on a function approximation scheme that optimizes in function space rather than parameter space, exploiting the fact that the Gaussian mixture model in each HMM state is an additive model of homogeneous functions (Gaussians). It provides a new scheme that can jointly optimize model structure and parameters. Experiments are conducted on the Wall Street Journal (WSJ) task, and good improvements in word error rate are observed. The knowledge-based adaptive PDT algorithm follows a trend toward knowledge-based systems and aims at optimizing the mapping from contextual phones to articulatory states by maximizing implicit use of the phonological and phonetic information that is presumed to be contained in a large data corpus. A computationally efficient algorithm is developed to incorporate this prior knowledge in PDT construction. This algorithm is evaluated on the Telehealth conversational speech recognition task, and a significant improvement in system performance is achieved.
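
The excerpt's remark that the Gaussian mixture model in each HMM state is an additive model of Gaussians can be illustrated with a small sketch. It only evaluates such a mixture; it is not the dissertation's gradient boosting algorithm. The diagonal covariances, the function names, and the example parameters are all assumptions made for illustration.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, weights, means, variances):
    """log sum_k w_k N(x; mean_k, var_k), computed with the log-sum-exp trick."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

# Hypothetical two-component state model over 2-dimensional features.
weights = [0.6, 0.4]
means = [[0.0, 0.0], [1.0, 1.0]]
variances = [[1.0, 1.0], [0.5, 0.5]]
score = gmm_log_likelihood([0.2, -0.1], weights, means, variances)
```

Because the state density is a weighted sum of components, adding one more Gaussian only appends a term to `terms`, which is what makes a stagewise, function-space view of the model possible.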

Statistical Methods for Speech Recognition

Author : Frederick Jelinek
Publisher : MIT Press
ISBN 13 : 9780262100663
Total Pages : 324 pages

Book Synopsis Statistical Methods for Speech Recognition by : Frederick Jelinek

Statistical Methods for Speech Recognition was written by Frederick Jelinek and published by MIT Press. This book was released on 1998-01-15 with a total of 324 pages. Book excerpt: This book reflects decades of important research on the mathematical foundations of speech recognition. It focuses on underlying statistical techniques such as hidden Markov models, decision trees, the expectation-maximization algorithm, information theoretic goodness criteria, maximum entropy probability estimation, parameter and data clustering, and smoothing of probability distributions. The author's goal is to present these principles clearly in the simplest setting, to show the advantages of self-organization from real data, and to enable the reader to apply the techniques.

Speech Recognition Using Neural Networks

Author : Joe Tebelskis
Publisher :
ISBN 13 :
Total Pages : 180 pages

Book Synopsis Speech Recognition Using Neural Networks by : Joe Tebelskis

Speech Recognition Using Neural Networks was written by Joe Tebelskis. This book was released in 1995 with a total of 180 pages. Book excerpt: Abstract: "This thesis examines how artificial neural networks can benefit a large vocabulary, speaker independent, continuous speech recognition system. Currently, most speech recognition systems are based on hidden Markov models (HMMs), a statistical framework that supports both acoustic and temporal modeling. Despite their state-of-the-art performance, HMMs make a number of suboptimal modeling assumptions that limit their potential effectiveness. Neural networks avoid many of these assumptions, while they can also learn complex functions, generalize effectively, tolerate noise, and support parallelism. While neural networks can readily be applied to acoustic modeling, it is not yet clear how they can be used for temporal modeling. Therefore, we explore a class of systems called NN-HMM hybrids, in which neural networks perform acoustic modeling, and HMMs perform temporal modeling. We argue that a NN-HMM hybrid has several theoretical advantages over a pure HMM system, including better acoustic modeling accuracy, better context sensitivity, more natural discrimination, and a more economical use of parameters. These advantages are confirmed experimentally by a NN-HMM hybrid that we developed, based on context-independent phoneme models, that achieved 90.5% word accuracy on the Resource Management database, in contrast to only 86.0% accuracy achieved by a pure HMM under similar conditions. In the course of developing this system, we explored two different ways to use neural networks for acoustic modeling: prediction and classification. We found that predictive networks yield poor results because of a lack of discrimination, but classification networks gave excellent results.
We verified that, in accordance with theory, the output activations of a classification network form highly accurate estimates of the posterior probabilities P(class|input), and we showed how these can easily be converted to likelihoods P(input|class) for standard HMM recognition algorithms. Finally, this thesis reports how we optimized the accuracy of our system with many natural techniques, such as expanding the input window size, normalizing the inputs, increasing the number of hidden units, converting the network's output activations to log likelihoods, optimizing the learning rate schedule by automatic search, backpropagating error from word level outputs, and using gender dependent networks."
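
The conversion from posteriors to likelihoods mentioned in the abstract follows from Bayes' rule: the likelihood equals the posterior times P(input) divided by the class prior, and P(input) is the same for every class at a given frame. A minimal sketch follows, assuming the class priors are available (for example from training label frequencies, which is an assumption here and not a detail taken from the thesis). The function name and the `floor` guard are hypothetical.

```python
import math

def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Return log(P(class|input) / P(class)) for each class.

    The result differs from log P(input|class) only by log P(input),
    which is constant across classes for a given frame.
    """
    return [
        math.log(max(post, floor)) - math.log(max(prior, floor))
        for post, prior in zip(posteriors, priors)
    ]

# Hypothetical network outputs for one frame and hypothetical class priors.
posteriors = [0.7, 0.2, 0.1]
priors = [0.5, 0.3, 0.2]
log_likes = scaled_log_likelihoods(posteriors, priors)
```

Dropping the constant term does not change which state sequence scores best, so the scaled values can stand in for log likelihoods in the HMM search.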

Dynamic Speech Models

Author : Li Deng
Publisher : Springer Nature
ISBN 13 : 3031025555
Total Pages : 105 pages

Book Synopsis Dynamic Speech Models by : Li Deng

Dynamic Speech Models was written by Li Deng and published by Springer Nature. This book was released on 2022-05-31 with a total of 105 pages. Book excerpt: Speech dynamics refer to the temporal characteristics in all stages of the human speech communication process. This speech “chain” starts with the formation of a linguistic message in a speaker's brain and ends with the arrival of the message in a listener's brain. Given the intricacy of the dynamic speech process and its fundamental importance in human communication, this monograph is intended to provide comprehensive material on mathematical models of speech dynamics and to address the following issues: How do we make sense of the complex speech process in terms of its functional role of speech communication? How do we quantify the special role of speech timing? How do the dynamics relate to the variability of speech that has often been said to seriously hamper automatic speech recognition? How do we put the dynamic process of speech into a quantitative form to enable detailed analyses? And finally, how can we incorporate the knowledge of speech dynamics into computerized speech analysis and recognition algorithms? The answers to all these questions require building and applying computational models for the dynamic speech process. What are the compelling reasons for carrying out dynamic speech modeling? We provide the answer in two related aspects. First, scientific inquiry into the human speech code has been relentlessly pursued for several decades. As an essential carrier of human intelligence and knowledge, speech is the most natural form of human communication. Embedded in the speech code are linguistic (as well as para-linguistic) messages, which are conveyed through four levels of the speech chain. Underlying the robust encoding and transmission of the linguistic messages are the speech dynamics at all four levels.
Mathematical modeling of speech dynamics provides an effective tool in the scientific methods of studying the speech chain. Such scientific studies help understand why humans speak as they do and how humans exploit redundancy and variability by way of multitiered dynamic processes to enhance the efficiency and effectiveness of human speech communication. Second, advancement of human language technology, especially that in automatic recognition of natural-style human speech, is also expected to benefit from comprehensive computational modeling of speech dynamics. The limitations of current speech recognition technology are serious and are well known. A commonly acknowledged and frequently discussed weakness of the statistical model underlying current speech recognition technology is the lack of adequate dynamic modeling schemes to provide correlation structure across the temporal speech observation sequence. Unfortunately, due to a variety of reasons, the majority of current research activities in this area favor only incremental modifications and improvements to the existing HMM-based state-of-the-art. For example, while dynamic and correlation modeling is known to be an important topic, most systems nevertheless employ only an ultra-weak form of speech dynamics, e.g., differential or delta parameters. Strong-form dynamic speech modeling, which is the focus of this monograph, may serve as an ultimate solution to this problem. After the introduction chapter, the main body of this monograph consists of four chapters. They cover various aspects of theory, algorithms, and applications of dynamic speech models, and provide a comprehensive survey of the research work in this area spanning the past 20 years. This monograph is intended as advanced material on speech and signal processing for graduate-level teaching, for professionals and engineering practitioners, as well as for seasoned researchers and engineers specializing in speech processing.
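
To make the "differential or delta parameters" mentioned in the excerpt concrete, here is a minimal sketch of first-order delta features. It assumes the simplest two-point difference with edge frames replicated; practical front ends often use a regression over a wider window, and nothing here is taken from the monograph itself. The function name and sample frames are hypothetical.

```python
def delta_features(frames):
    """First-order deltas: d[t] = (c[t+1] - c[t-1]) / 2, with edge frames replicated."""
    n = len(frames)
    deltas = []
    for t in range(n):
        prev_frame = frames[max(t - 1, 0)]
        next_frame = frames[min(t + 1, n - 1)]
        deltas.append([(b - a) / 2.0 for a, b in zip(prev_frame, next_frame)])
    return deltas

# Three hypothetical 2-dimensional feature frames.
frames = [[1.0, 0.0], [2.0, 0.5], [4.0, 1.0]]
deltas = delta_features(frames)
```

Each delta vector only summarizes the local slope around one frame, which is why the excerpt characterizes this as an ultra-weak form of dynamics compared with an explicit model of the trajectory.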

Efficient Setup of Acoustic Models for Large Vocabulary Continuous Speech Recognition

Author : Christian Gollan
Publisher :
ISBN 13 :
Total Pages : 0 pages

Book Synopsis Efficient Setup of Acoustic Models for Large Vocabulary Continuous Speech Recognition by : Christian Gollan

Efficient Setup of Acoustic Models for Large Vocabulary Continuous Speech Recognition was written by Christian Gollan and released in 2014.

Automatic Speech and Speaker Recognition

Author : Joseph Keshet
Publisher : John Wiley & Sons
ISBN 13 : 9780470742037
Total Pages : 268 pages

Book Synopsis Automatic Speech and Speaker Recognition by : Joseph Keshet

Automatic Speech and Speaker Recognition was written by Joseph Keshet and published by John Wiley & Sons. This book was released on 2009-04-27 with a total of 268 pages. Book excerpt: This book discusses large margin and kernel methods for speech and speaker recognition. Speech and Speaker Recognition: Large Margin and Kernel Methods is a collation of research on recent advances in large margin and kernel methods, as applied to the field of speech and speaker recognition. It presents theoretical and practical foundations of these methods, from support vector machines to large margin methods for structured learning. It also provides examples of large margin based acoustic modelling for continuous speech recognizers, where the grounds for practical large margin sequence learning are set. Large margin methods for discriminative language modelling and text independent speaker verification are also addressed in this book. Key Features: Provides an up-to-date snapshot of the current state of research in this field; covers important aspects of extending the binary support vector machine to speech and speaker recognition applications; discusses large margin and kernel method algorithms for sequence prediction required for acoustic modeling; reviews past and present work on discriminative training of language models, and describes different large margin algorithms for the application of part-of-speech tagging; surveys recent work on the use of kernel approaches to text-independent speaker verification, and introduces the main concepts and algorithms; surveys recent work on kernel approaches to learning a similarity matrix from data. This book will be of interest to researchers, practitioners, engineers, and scientists in speech processing and machine learning fields.

Reducing Development Costs of Large Vocabulary Speech Recognition Systems

Author : Thiago Fraga Da Silva
Publisher :
ISBN 13 :
Total Pages : 0 pages

Book Synopsis Reducing Development Costs of Large Vocabulary Speech Recognition Systems by : Thiago Fraga Da Silva

Reducing Development Costs of Large Vocabulary Speech Recognition Systems was written by Thiago Fraga Da Silva and released in 2014. Book excerpt: One of the outstanding challenges in large vocabulary automatic speech recognition (ASR) is the reduction of development costs required to build a new recognition system or adapt an existing one to a new task, language or dialect. State-of-the-art ASR systems are based on the principles of the statistical learning paradigm, using information provided by two stochastic models, an acoustic model (AM) and a language model (LM). The standard methods used to estimate the parameters of such models are founded on two main assumptions: the training data sets are large enough, and the training data match the target task well. It is well known that a great part of system development costs is due to the construction of corpora that fulfill these requirements. In particular, manually transcribing the audio data is the most expensive and time-consuming endeavor. For some applications, such as the recognition of low-resourced languages or dialects, finding and collecting data is also a hard (and expensive) task. As a means to lower the cost required for ASR system development, this thesis proposes and studies methods that aim to alleviate the need for manually transcribing audio data for a given target task. Two axes of research are explored. First, unsupervised training methods are explored in order to build three of the main components of ASR systems: the acoustic model, the multi-layer perceptron (MLP) used to extract acoustic features, and the language model. The unsupervised training methods aim to estimate the model parameters using a large amount of automatically (and inaccurately) transcribed audio data, obtained with an existing recognition system.
A novel method for unsupervised AM training that copes well with the automatic audio transcripts is proposed: the use of multiple recognition hypotheses (rather than the best one) leads to consistent gains in performance over the standard approach. Unsupervised MLP training is proposed as an alternative to build efficient acoustic models in a fully unsupervised way. Compared to cross-lingual MLPs trained in a supervised manner, the unsupervised MLP leads to competitive performance levels even if trained on only about half of the data amount. Unsupervised LM training approaches are proposed to estimate standard back-off n-gram and neural network language models. It is shown that unsupervised LM training leads to additive gains in performance on top of unsupervised AM training. Second, this thesis proposes the use of model interpolation as a rapid and flexible way to build task-specific acoustic models. In reported experiments, models obtained via interpolation outperform the baseline pooled models and equivalent maximum a posteriori (MAP) adapted models. Interpolation proves to be especially useful for low-resourced dialect ASR. When only a few (2 to 3 hours) or no acoustic data truly matching the target dialect are available for AM training, model interpolation leads to substantial performance gains compared to the standard training methods.

Statistical Pronunciation Modeling for Non-Native Speech Processing

Author : Rainer E. Gruhn
Publisher : Springer Science & Business Media
ISBN 13 : 3642195865
Total Pages : 118 pages

Book Synopsis Statistical Pronunciation Modeling for Non-Native Speech Processing by : Rainer E. Gruhn

Statistical Pronunciation Modeling for Non-Native Speech Processing was written by Rainer E. Gruhn and published by Springer Science & Business Media. This book was released on 2011-05-08 with a total of 118 pages. Book excerpt: In this work, the authors present a fully statistical approach to modeling non-native speakers' pronunciation. Second-language speakers pronounce words in multiple different ways compared to native speakers. Those deviations, be they phoneme substitutions, deletions, or insertions, can be modelled automatically with the new method presented here. The method is based on a discrete hidden Markov model as a word pronunciation model, initialized on a standard pronunciation dictionary. The implementation and functionality of the methodology have been verified with a test set of non-native English in the accent in question. The book is written for researchers with a professional interest in phonetics and automatic speech and speaker recognition.

Incorporating Knowledge Sources into Statistical Speech Recognition

Author : Sakriani Sakti
Publisher : Springer Science & Business Media
ISBN 13 : 038785830X
Total Pages : 207 pages

Book Synopsis Incorporating Knowledge Sources into Statistical Speech Recognition by : Sakriani Sakti

Incorporating Knowledge Sources into Statistical Speech Recognition was written by Sakriani Sakti and published by Springer Science & Business Media. This book was released on 2009-02-27 with a total of 207 pages. Book excerpt: Incorporating Knowledge Sources into Statistical Speech Recognition addresses the problem of developing efficient automatic speech recognition (ASR) systems that balance the use of wide knowledge of speech variability against keeping the training and recognition effort feasible, while improving speech recognition performance. The book provides an efficient general framework to incorporate additional knowledge sources into state-of-the-art statistical ASR systems. It can be applied to many existing ASR problems with their respective model-based likelihood functions in flexible ways.

Statistical Language Modelling for Large Vocabulary Speech Recognition

Author : Michael McGreevy
Publisher :
ISBN 13 :
Total Pages : 75 pages

Book Synopsis Statistical Language Modelling for Large Vocabulary Speech Recognition by : Michael McGreevy

Statistical Language Modelling for Large Vocabulary Speech Recognition was written by Michael McGreevy. This book was released in 2005 with a total of 75 pages.

Statistical Language and Speech Processing

Author : Nathalie Camelin
Publisher : Springer
ISBN 13 : 3319684566
Total Pages : 282 pages

Book Synopsis Statistical Language and Speech Processing by : Nathalie Camelin

Statistical Language and Speech Processing was written by Nathalie Camelin and published by Springer. This book was released on 2017-10-09 with a total of 282 pages. Book excerpt: This book constitutes the refereed proceedings of the 5th International Conference on Statistical Language and Speech Processing, SLSP 2017, held in Le Mans, France, in October 2017. The 21 full papers presented were carefully reviewed and selected from 39 submissions. The papers cover topics such as anaphora and coreference resolution; authorship identification, plagiarism and spam filtering; computer-aided translation; corpora and language resources; data mining and semantic web; information extraction; information retrieval; knowledge representation and ontologies; lexicons and dictionaries; machine translation; multimodal technologies; natural language understanding; neural representation of speech and language; opinion mining and sentiment analysis; parsing; part-of-speech tagging; question and answering systems; semantic role labeling; speaker identification and verification; speech and language generation; speech recognition; speech synthesis; speech transcription; speech correction; spoken dialogue systems; term extraction; text categorization; text summarization; user modeling. They are organized in the following sections: language and information extraction; post-processing and applications of automatic transcriptions; speech paralinguistics and synthesis; speech recognition: modeling and resources.

Hierarchical Connectionist Acoustic Modeling for Domain Adaptive Large Vocabulary Speech Recognition

Author : Jürgen Fritsch
Publisher :
ISBN 13 : 9783826573101
Total Pages : 216 pages

Book Synopsis Hierarchical Connectionist Acoustic Modeling for Domain Adaptive Large Vocabulary Speech Recognition by : Jürgen Fritsch

Hierarchical Connectionist Acoustic Modeling for Domain Adaptive Large Vocabulary Speech Recognition was written by Jürgen Fritsch. This book was released in 2000 with a total of 216 pages.

Natural Statistical Models for Automatic Speech Recognition

Author : Jeff Bilmes
Publisher :
ISBN 13 :
Total Pages : 446 pages

Book Synopsis Natural Statistical Models for Automatic Speech Recognition by : Jeff Bilmes

Natural Statistical Models for Automatic Speech Recognition was written by Jeff Bilmes. This book was released in 1999 with a total of 446 pages.

Discriminative Learning for Speech Recognition

Author : Xiaodong He
Publisher : Springer Nature
ISBN 13 : 3031025571
Total Pages : 112 pages

Book Synopsis Discriminative Learning for Speech Recognition by : Xiaodong He

Discriminative Learning for Speech Recognition was written by Xiaodong He and published by Springer Nature. This book was released on 2022-06-01 with a total of 112 pages. Book excerpt: In this book, we introduce the background and mainstream methods of probabilistic modeling and discriminative parameter optimization for speech recognition. The specific models treated in depth include the widely used exponential-family distributions and the hidden Markov model. A detailed study is presented on unifying the common objective functions for discriminative learning in speech recognition, namely maximum mutual information (MMI), minimum classification error, and minimum phone/word error. The unification is presented, with rigorous mathematical analysis, in a common rational-function form. This common form enables the use of the growth transformation (or extended Baum–Welch) optimization framework in discriminative learning of model parameters. In addition to all the necessary introduction of the background and tutorial material on the subject, we also include technical details on the derivation of the parameter optimization formulas for exponential-family distributions, discrete hidden Markov models (HMMs), and continuous-density HMMs in discriminative learning. Selected experimental results obtained firsthand by the authors are presented to show that discriminative learning can lead to superior speech recognition performance over conventional parameter learning. Details on major algorithmic implementation issues with practical significance are provided to enable practitioners to directly translate the theory in the earlier part of the book into engineering practice.
Table of Contents: Introduction and Background / Statistical Speech Recognition: A Tutorial / Discriminative Learning: A Unified Objective Function / Discriminative Learning Algorithm for Exponential-Family Distributions / Discriminative Learning Algorithm for Hidden Markov Model / Practical Implementation of Discriminative Learning / Selected Experimental Results / Epilogue / Major Symbols Used in the Book and Their Descriptions / Mathematical Notation / Bibliography
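
For orientation, the maximum mutual information (MMI) criterion named in the excerpt is commonly written as below. The notation is an assumption chosen for this sketch and is not taken from the book: X_r is the r-th of R training utterances, S_r is its reference transcription, S ranges over competing transcriptions, and \Lambda denotes the model parameters.

```latex
F_{\mathrm{MMI}}(\Lambda) = \sum_{r=1}^{R} \log
  \frac{p_{\Lambda}(X_r \mid S_r)\, P(S_r)}
       {\sum_{S} p_{\Lambda}(X_r \mid S)\, P(S)}
```

The ratio inside the logarithm is the posterior probability of the reference transcription given the acoustics, so maximizing the sum rewards the correct transcription relative to all competing ones.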

Automatic Speech Recognition

Author : Dong Yu
Publisher : Springer
ISBN 13 : 1447157796
Total Pages : 329 pages

Book Synopsis Automatic Speech Recognition by : Dong Yu

Automatic Speech Recognition was written by Dong Yu and published by Springer. This book was released on 2014-11-11 with a total of 329 pages. Book excerpt: This book provides a comprehensive overview of recent advances in the field of automatic speech recognition, with a focus on deep learning models including deep neural networks and many of their variants. This is the first automatic speech recognition book dedicated to the deep learning approach. In addition to the rigorous mathematical treatment of the subject, the book also presents insights into and the theoretical foundations of a series of highly successful deep learning models.

Language Modeling for Automatic Speech Recognition of Inflective Languages

Author : Gregor Donaj
Publisher : Springer
ISBN 13 : 3319416073
Total Pages : 77 pages

Book Synopsis Language Modeling for Automatic Speech Recognition of Inflective Languages by : Gregor Donaj

Language Modeling for Automatic Speech Recognition of Inflective Languages was written by Gregor Donaj and published by Springer. This book was released on 2016-08-29 with a total of 77 pages. Book excerpt: This book covers language modeling and automatic speech recognition for inflective languages (e.g. Slavic languages), which represent roughly half of the languages spoken in Europe. These languages do not perform as well as English in speech recognition systems and it is therefore harder to develop an application with sufficient quality for the end user. The authors describe the most important language features for the development of a speech recognition system. This is then presented through the analysis of errors in the system and the development of language models and their inclusion in speech recognition systems, which specifically address the errors that are relevant for targeted applications. The error analysis is done with regard to morphological characteristics of the word in the recognized sentences. The book is oriented towards speech recognition with large vocabularies and continuous and even spontaneous speech. Today such applications work with a rather small number of languages compared to the number of spoken languages.

Multi-level Acoustic Modeling for Automatic Speech Recognition

Author : Hung-An Chang (Ph. D.)
Publisher :
ISBN 13 :
Total Pages : 192 pages

Book Synopsis Multi-level Acoustic Modeling for Automatic Speech Recognition by : Hung-An Chang (Ph. D.)

Multi-level Acoustic Modeling for Automatic Speech Recognition was written by Hung-An Chang (Ph. D.). This book was released in 2012 with a total of 192 pages. Book excerpt: Context-dependent acoustic modeling is commonly used in large-vocabulary Automatic Speech Recognition (ASR) systems as a way to model coarticulatory variations that occur during speech production. Typically, the local phoneme context is used as a means to define context-dependent units. Because the number of possible context-dependent units can grow exponentially with the length of the contexts, many units will not have enough training examples to train a robust model, resulting in a data sparsity problem. For nearly two decades, this data sparsity problem has been dealt with by a clustering-based framework which systematically groups different context-dependent units into clusters such that each cluster can have enough data. Although it deals with the data sparsity issue, the clustering-based approach also makes all context-dependent units within a cluster have the same acoustic score, resulting in a quantization effect that can potentially limit the performance of the context-dependent model. In this work, a multi-level acoustic modeling framework is proposed to address both the data sparsity problem and the quantization effect. Under the multi-level framework, each context-dependent unit is associated with classifiers that target multiple levels of contextual resolution, and the outputs of the classifiers are linearly combined for scoring during recognition. By choosing the classifiers judiciously, both the data sparsity problem and the quantization effect can be dealt with.
The proposed multi-level framework can also be integrated into existing large-vocabulary ASR systems, such as FST-based ASR systems, and is compatible with state-of-the-art error reduction techniques for ASR systems, such as discriminative training methods. Multiple sets of experiments have been conducted to compare the performance of the clustering-based acoustic model and the proposed multi-level model. In a phonetic recognition experiment on TIMIT, the multi-level model has about 8% relative improvement in terms of phone error rate, showing that the multi-level framework can help improve phonetic prediction accuracy. In a large-vocabulary transcription task, combining the proposed multi-level modeling framework with discriminative training can provide more than 20% relative improvement over a clustering baseline model in terms of Word Error Rate (WER), showing that the multi-level framework can be integrated into existing large-vocabulary decoding frameworks and that it combines well with discriminative training methods. In a speaker-adaptive transcription task, the multi-level model has about 14% relative WER improvement, showing that the proposed framework can adapt better to new speakers, and potentially to new environments, than the conventional clustering-based approach.
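
Two ideas in this excerpt lend themselves to a short sketch: scoring a unit by linearly combining classifier outputs from several levels of contextual resolution, and reporting gains as relative error reduction. The function names, the weights, and every number below are hypothetical and are not taken from the thesis; how the thesis chooses its classifiers and combination weights is not stated in the excerpt.

```python
def combined_score(classifier_scores, weights):
    """Linearly combine per-level classifier outputs into one acoustic score."""
    if len(classifier_scores) != len(weights):
        raise ValueError("need one weight per classifier score")
    return sum(w * s for w, s in zip(weights, classifier_scores))

def relative_improvement(baseline_error, new_error):
    """Relative error reduction, e.g. 0.20 for a 20% relative improvement."""
    return (baseline_error - new_error) / baseline_error

# Hypothetical log scores at three levels of contextual resolution,
# ordered from coarsest to finest context, with hypothetical weights.
score = combined_score([-4.0, -3.5, -3.0], [0.5, 0.25, 0.25])

# Hypothetical error rates: a drop from 10.0% to 8.0% is a 20% relative gain.
rel = relative_improvement(10.0, 8.0)
```

Because coarse-context classifiers see more training data and fine-context classifiers separate units that a cluster would merge, a weighted sum lets sparsely observed units lean on the coarse levels without every unit in a cluster receiving an identical score.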