Research
My students and I work in a number of areas in speech and language processing, including...
  • Novel statistical methods for speech recognition
  • Prediction of errors in ASR systems
  • Discriminative language/pronunciation models
  • Phonetically-aware speech enhancement
  • Statistical investigations of linguistic phenomena in large corpora
  • Spoken dialogue system design; spoken human-computer interface issues
  • Natural language generation for spoken dialogue systems
  • Information extraction from electronic medical records
Lab Link More about Research
Teaching
I teach a number of the Artificial Intelligence Courses at OSU:
  • Intro to Artificial Intelligence
  • Neural Networks
  • Foundations of Speech and Language Processing
More about Teaching
Publications
Recent publications include
Full List Google Scholar
Recent News

Current Research Students

Ph.D. Graduates

Postdocs

MS/BS Graduates

About the Makhoul Professorship

The Ohio State University Board of Trustees established the John I. Makhoul Professorship in Electrical and Computer Engineering in 2020 in support of signal processing and machine learning at Ohio State; I am the first holder of the professorship (2022-2026). Unusually, while the professorship is in Electrical and Computer Engineering, my tenure home remains in Computer Science and Engineering. I'm very grateful to the College of Engineering, ECE, CSE and particularly John Makhoul for the opportunity to serve as the first Makhoul Professor.

Bio

Eric Fosler-Lussier is the John I. Makhoul Professor and Associate Chair for Academic Administration of Computer Science and Engineering, with courtesy appointments in Linguistics and Biomedical Informatics, at The Ohio State University. After receiving a B.A.S. (Computer and Cognitive Science) and B.A. (Linguistics) from the University of Pennsylvania in 1993, he received his Ph.D. in 1999 from the University of California, Berkeley, performing his dissertation research at the International Computer Science Institute under the tutelage of Prof. Nelson Morgan. He has also been a Member of Technical Staff at Bell Labs, Lucent Technologies, a Visiting Researcher at Columbia University, and a Visiting Professor at the University of Pennsylvania. Awards inlcude NSF Career (2006), Ohio State College of Engineering Lumley Research Award (2010,2021), the IEEE Signal Processing Society Best Paper Award (2011), and the IMIA Yearbook Best Paper award in the Natural Language Processing (2015, 2017).

He has published widely in speech and language processing, and is a Fellow of the International Speech Communication Association and the IEEE, and a member of the Association for Computational Linguistics.

Fosler-Lussier serves as an senior area editor for the IEEE/ACM Transactions on Audio, Speech, and Language Proessing; he has served three terms on the IEEE Speech and Language Technical Committee (Chair, 2019-2020). He also served on the editorial board of the ACM Transactions on Speech and Language Processing, as an action editor for Transactions of the Association for Computational Linguistics, and was co-Program Chair for NAACL 2012.

Appointments

Professional Activities

Introduction

This page gives an overview of and links to recent research papers that describe some of the research of my lab. The commentary for some papers gives links to follow-on work so that the reader can see the trajectories of the different research lines.

My group's current research covers a number of topics in speech and natural language processing. The overall goal of my lab's research is to find meaningful ways to integrate acoustic, phonetic, lexical, and other linguistic insights into the speech recognition process through a combination of statistical modeling and data/error analysis. My goal is to train students to be flexible, independent thinkers who can apply statistical techniques to a range of language-related problems.

Joining the lab

The Speech and Language Technologies Laboratory is a group of dynamic researchers who are interested in mixing aspects of machine learning with speech and language processing.

If you are not an OSU student, but want to apply: see my note on the application process to OSU.

If you are a current OSU student: see the "once you are at OSU" section of my note.

Selected papers (with commentary)
D. Bagchi, P. Plantinga, A. Stiff, and E. Fosler-Lussier, "Spectral feature mapping with mimic loss for robust speech recognition,", ICASSP 2018.

For the task of speech enhancement, local learning objectives are agnostic to phonetic structures helpful for speech recognition. We propose to add a global criterion to ensure de-noised speech is useful for downstream tasks like ASR. We first train a spectral classifier on clean speech to predict senone labels. Then, the spectral classifier is joined with our speech enhancer as a noisy speech recognizer. This model is taught to imitate the output of the spectral classifier alone on clean speech. This mimic loss is combined with the traditional local criterion to train the speech enhancer to produce de-noised speech. Feeding the de-noised speech to an off-the-shelf Kaldi training recipe for the CHiME-2 corpus shows significant improvements in WER.

D. Newman-Griffis, A. Lai, and E. Fosler-Lussier, "Jointly embedding entities and text with distant supervision," Proceedings of the 3rd Workshop on Representation Learning for NLP, 2018.

Learning representations for knowledge base entities and concepts is becoming increasingly important for NLP applications. However, recent entity embedding methods have relied on structured resources that are expensive to create for new domains and corpora. We present a distantly-supervised method for jointly learning embeddings of entities and text from an unnanotated corpus, using only a list of mappings between entities and surface forms. We learn embeddings from open-domain and biomedical corpora, and compare against prior methods that rely on human-annotated text or large knowledge graph structure. Our embeddings capture entity similarity and relatedness better than prior work, both in existing biomedical datasets and a new Wikipedia-based dataset that we release to the community. Results on analogy completion and entity sense disambiguation indicate that entities and words capture complementary information that can be effectively combined for downstream use.

J.K. Kim, Y.B. Kim, R. Sarikaya, and E. Fosler-Lussier "Cross-lingual transfer learning for POS tagging without cross-lingual resources," EMNLP 2017.

Training a POS tagging model with crosslingual transfer learning usually requires linguistic knowledge and resources about the relation between the source language and the target language. In this paper, we introduce a cross-lingual transfer learning model for POS tagging without ancillary resources such as parallel corpora. The proposed cross-lingual model utilizes a common BLSTM that enables knowledge transfer from other languages, and private BLSTMs for language-specific representations. The cross-lingual model is trained with language-adversarial training and bidirectional language modeling as auxiliary objectives to better represent language-general information while not losing the information about a specific target language. Evaluating on POS datasets from 14 languages in the Universal Dependencies corpus, we show that the proposed transfer learning model improves the POS tagging performance of the target languages without exploiting any linguistic knowledge between the source language and the target language.

Y. He and E. Fosler-Lussier. "Segmental Conditional Random Fields with Deep Neural Networks as Acoustic Models for First-Pass Word Recognition," Interspeech 2015, Dresden, Germany, 2015.

In this line of research, my lab engaged in a series of studies to build automatic speech recognition systems using direct discriminative models that can combine correlated evidence of linguistic events. This work is the lastest step in this line of research: it provides a discriminative framework for modeling longer trajectories in speech through segmental models. The innovation in this particular paper is the first one-pass discriminative segmental model for word recognition (building on our previous work in phone recognition). We show that the monophone-based model improves recognition over discriminatively trained monophone-based HMM and Frame-based CRF models for the Wall Street Journal read-speech task, and starts to approach triphone-based performance. Thus, this serves as a good intermediate point in building systems that can start to compete with state-of-the-art systems.

See also:

R. Prabhavalkar, E. Fosler-Lussier, and K. Livescu, "A Factored Conditional Random Field Model for Articulatory Feature Forced Transcription," IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2011.

Recognized as a Spotlight Poster at ASRU 2011 (voted as a top 3 poster in its session by the attendees).

Segmental modeling can be thought of as a type of linguistic structural modeling (integrating linguistic structure over time). Another linguistic-inspired modeling approach that we have experimented with, in conjunction with partners at Toyota Technological Institute at Chicago, explicitly models articulator trajectories over time through a factored model -- unlike phone-based systems, this paradigm allows models of asynchrony which can account for different types of pronunciation variation commonly seen in continuous speech. In this paper, we use factorized Conditional Random Fields in order to learn patterns of asynchrony that can be utilized in providing articulatory feature transcriptions that can be expensive to obtain manually. Our experiments show that the transcriptions can better account for pronunciation variations observed by linguists in the Switchboard corpus. In subsequent papers, we were able to utilize this framework for acoustic-based keyword spotting, showing improvement over a HMM-based baseline.

See also:

P. Jyothi, E. Fosler-Lussier, and K. Livescu, "Discriminatively learning factorized finite state pronunciation models from dynamic Bayesian networks," Interspeech 2012.

Best Student Paper Award, Interspeech 2012

This paper takes a slightly different approach to articulatory modeling than the Prabhavalkar work described above, starting from a previous Dynamic Bayesian Network (DBN) approach and efficiently derives, as well as discriminatively trains, a weighted finite state transducer (WFST) representation for the articulatory feature-based model of pronunciation. We use the conditional independence assumptions imposed by the DBN to efficiently convert it into a sequence of WFSTs (factor FSTs) which, when composed, yield the same model as the DBN. We then introduce a linear model of the arc weights of the factor FSTs and discriminatively learn its weights using the averaged perceptron algorithm. We demonstrate the approach using a lexical access task in which we recognize a word given its surface realization. This work subsequently led to discriminative training approaches for factorized WFSTs that can be used even in standard WFST-based ASR systems.

See also:

W. Hartmann, A. Narayanan, E. Fosler-Lussier, and D. Wang, "A Direct Masking Approach to Robust ASR," IEEE Transactions on Acoustics, Speech, and Language Processing, 21:10, pp 1993-2005, Oct 2013.

One line of research that we have followed is to use some of the discriminative techniques that we have developed in speech recognition in concert with speech separation techniques inspired by (and often in collaboration with) my colleague DeLiang Wang. The paper highlighted here was an outgrowth of this work, in which my student Billy Hartmann and I asked whether it was possible to use speech separation directly on noisy speech data to mask out noise without any reconstruction of the masked componenets in ASR. Previously it was assumed that zero-energy "holes" would cause problems in spectrally-masked speech that was not reconstructed or where the missing components were not marginalized in the probability estimation. The baseline for these latter techniques was usually just the recognition on the non-modified (noisy) speech. In this paper we show that one can use masked speech data directly in recognition, and argue that this should be the "simple" baseline from which other techniques are based.

See also:

P. Raghavan, E. Fosler-Lussier, N. Elhadad, and A. Lai, "Cross-narrative Temporal Ordering of Medical Events," Association for Computational Linguistics Annual Meeting, 2014.

My group has also been active in NLP research, particularly in the domain of electronic health records (EHRs) in collaboration with Albert Lai in Biomedical Informatics. This paper describes the culmination of several pieces of work, where we extract medical events from multiple clinical notes in an EHR, develop a timeline for each note, and then align the events across notes to create an overall summary timeline of the medical history.

See also:

E. Fosler-Lussier, Y. He, P. Jyothi, and R. Prabhavalkar, "Conditional Random Fields in Speech, Audio and Language Processing," Proceedings of the IEEE, 101:5, pp 1054-1075, 2013.

I have also been active in developing review articles to help explain several current topics to wider audiences. This invited paper gives a broad overview of Conditional Random Fields and their use in various processing tasks.

See also:

J. Morris and E. Fosler-Lussier. "Conditional Random Fields for Integrating Local Discriminative Classifiers," IEEE Transactions on Audio, Speech, and Language Processing, 16:3, pp 617-628, March 2008.
Awarded IEEE Signal Processing Society Best Paper Award in 2010.

This paper details a model which can selectively pay attention to some phonological information and ignore other information using a discriminative model known as Conditional Random Fields (CRFs). While CRFs had been used in a few studies prior to this work, the contribution of this paper was to examine their utility as feature combiners, combining posterior estimates of phone classes and phonological feature classes to improve TIMIT phone recognition. We have continued this line of research since this paper, moving towards the first CRF-based word recognition experiments ever done.

See also:

E. Fosler-Lussier, I. Amdal, and H.-K. J. Kuo. "A Framework for Predicting Speech Recognition Errors," Speech Communication issue on Pronunciation Modeling and Lexicon Adaptation, 46:2, pp. 153-170, 2005.

Much of the work above is devoted to methods of modeling the acoustic-phonetic variation inherent in speech, in order to build better speech recognition models. However, a slightly different way of thinking about variation is to consider the variation in patterns of errors made by a speech recognizer due to many factors (for example, errors due to inherent speech variation, errors caused by poor acoustic/lexical models, or search errors). This paper focuses on methods to predict errors made by speech recognition systems, even when we only have a text transcript (i.e., no audio); the proposed framework is flexible enough to allow for different prediction models to characterize system performance. The impact of this technology has allowed us and others to train discriminative language models that directly optimize system error rate (rather than data likelihood) using a large amount of textual data.

See also:

Teaching: Classes taught at Ohio State

Resources from my classes that might be useful

Fun things about me