Eric Fosler-Lussier

John I. Makhoul Professor and Acting Chair
Department of Computer Science & Engineering
The Ohio State University
Professor by Courtesy of Linguistics, Biomedical Informatics
Office: 395K Dreese Lab
Phone: +1 614 292 4890
Email: [first half of last name] @cse.osu.edu
Mailing address:
395 Dreese Lab, 2015 Neil Ave, Columbus, OH 43210
Lab GitHub
GitHub
Google Scholar

Research

My students and I work in a number of areas in speech and language processing, including...

Novel statistical methods for speech recognition
Prediction of errors in ASR systems
Discriminative language/pronunciation models
Phonetically-aware speech enhancement
Statistical investigations of linguistic phenomena in large corpora
Spoken dialogue system design; spoken human-computer interface issues
Natural language generation for spoken dialogue systems
Information extraction from electronic medical records

Lab Link More about Research

Teaching

I teach a number of the Artificial Intelligence Courses at OSU:

Intro to Artificial Intelligence
Neural Networks
Foundations of Speech and Language Processing

More about Teaching

Publications

Recent publications include

Full List Google Scholar

Useful Stuff

Some things we've done

Current Research Students

Ph.D. Graduates

Postdocs

MS/BS Graduates

About the Makhoul Professorship

The Ohio State University Board of Trustees established the John I. Makhoul Professorship in Electrical and Computer Engineering in 2020 in support of signal processing and machine learning at Ohio State; I am the first holder of the professorship (2022-2026). Unusually, while the professorship is in Electrical and Computer Engineering, my tenure home remains in Computer Science and Engineering. I'm very grateful to the College of Engineering, ECE, CSE and particularly John Makhoul for the opportunity to serve as the first Makhoul Professor.

Bio

Eric Fosler-Lussier is the John I. Makhoul Professor and Acting Department Chair of Computer Science and Engineering, with courtesy appointments in Linguistics and Biomedical Informatics, at The Ohio State University. After receiving a B.A.S. (Computer and Cognitive Science) and B.A. (Linguistics) from the University of Pennsylvania in 1993, he received his Ph.D. in 1999 from the University of California, Berkeley, performing his dissertation research at the International Computer Science Institute under the tutelage of Prof. Nelson Morgan. He has also been a Member of Technical Staff at Bell Labs, Lucent Technologies, a Visiting Researcher at Columbia University, and a Visiting Professor at the University of Pennsylvania. Awards include NSF Career (2006), Ohio State College of Engineering Lumley Research Award (2010, 2021), the IEEE Signal Processing Society Best Paper Award (2011), and the IMIA Yearbook Best Paper Award in Natural Language Processing (2015, 2017).

He has published widely in speech and language processing, and is a Fellow of the International Speech Communication Association and the IEEE, and a member of the Association for Computational Linguistics.

Fosler-Lussier served as a senior area editor for the IEEE/ACM Transactions on Audio, Speech, and Language Processing; he has served three terms on the IEEE Speech and Language Technical Committee (Chair, 2019-2020). He also served on the editorial board of the ACM Transactions on Speech and Language Processing, as an action editor for Transactions of the Association for Computational Linguistics, and was co-Program Chair for NAACL 2012.

Appointments

2022 - 2026: John I. Makhoul Professor of Electrical and Computer Engineering
2025: Acting Department Chair, Dept. of Computer Science and Engineering
2021 - present: Associate Chair for Academic Administration, Dept. of Computer Science and Engineering
2020 - 2023: Program co-Director, Foundations of Data Science and Artificial Intelligence, Translational Data Analytics Institute
2016 - present: Professor Dept. of Computer Science and Engineering, and Professor by Courtesy, Departments of Linguistics and Biomedical Informatics, OSU
Jan - May 2019: Visiting Professor Dept. of Computer and Information Science, University of Pennsylvania
2010 - 2016: Associate Professor Dept. of Computer Science and Engineering, and Associate Professor by Courtesy, Departments of Linguistics and Biomedical Informatics (since 2016), OSU
2003 - 2010: Assistant Professor Dept. of Computer Science and Engineering, and Assistant Professor by Courtesy, Department of Linguistics (since 2004), OSU
2003 - present: Member, Center for Cognitive Science, OSU
2003: Visiting Research Scientist, Dept. of Electrical Engineering, Columbia University
2000-2002: Member of Technical Staff, Bell Labs Research, Lucent Technologies
1999-2000: Postdoctoral Researcher, International Computer Science Institute
1994-1999: Graduate Student Researcher, U.C. Berkeley and International Computer Science Institute

Professional Activities

IEEE James L. Flanagan Speech and Audio Processing Award Committee, member 2022-2024, chair 2025-2026, past chair 2027.
ISCA Fellow Selection Committee, 2024-2027.
Member of IEEE SPS Speech and Language Technical Committee, 2006-2008, 2011-2013, 2017-2021.
- Vice Chair, 2018
- Chair, 2019-2020
- Past Chair, 2021
IEEE Signal Processing Society Awards Board, 2021-2023.
Senior Area chair, NAACL HLT 2021
IEEE Signal Processing Society Technical Directions Board, 2019-2020.
General Co-chair, IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Singapore, 2019.
Associate Editor, IEEE/ACM Transactions on Audio, Speech and Language Processing, 2017-2021
Action Editor, Transactions of the Association for Computational Linguistics, 2012-2018
Area chair, NAACL HLT 2018
Executive Committee, Center for Cognitive and Brain Sciences, The Ohio State University, 2011-2014
Tutorials chair, Interspeech 2016
Area chair, NAACL HLT 2016
Program co-chair, North American Association for Computational Linguistics Annual Meeting - Human Language Technologies Conference (NAACL HLT), 2012
Associate Editor, ACM Transactions on Speech and Language Processing, 2011-2013
Finance chair, IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2011)
Panels co-chair, IEEE Spoken Language Technology Workshop (SLT 2010)
Publication chair, 2010 Conference on Empirical Methods in Natural Language Processing
ACL Archivist, 2006-2010
Executive committee, ACL Special Interest Group in Computational Morphology and Phonology (SIGMORPHON), 2006-2007
Publicity chair, IEEE/ACL Workshop on Spoken Language Technology, 2006.
Student Workshop Faculty Co-advisor, HLT/NAACL Conference, 2004.
Publicity chair, IEEE Workshop on Automatic Speech Recognition and Understanding, 2003.
Co-organizer, ISCA Tutorial and Research Workshop on Pronunciation Modeling and Lexicon Adaptation for Spoken Language Technology, 2002.
Reviewer/program committee, Annual Meeting of the Association for Computational Linguistics, Int'l Conference on Acoustics, Speech, and Signal Processing, IEEE Workshop on Automatic Speech Recognition and Understanding, IEEE/ACL Workshop on Spoken Language Technology, Interspeech, Human Language Technologies conference, Neural Information Processing Systems.
Reviewer for the journals Speech Communication; Computer, Speech and Language; Computational Linguistics; IEEE Transactions on Speech and Audio Processing; IEEE Systems, Man and Cybernetics; Machine Learning Journal; Journal of the Acoustical Society of America.

Introduction

This page gives an overview of, and links to, recent papers that describe some of my lab's research. The commentary for some papers links to follow-on work so that the reader can see the trajectories of the different research lines.

My group's current research covers a number of topics in speech and natural language processing. The overall goal of my lab's research is to find meaningful ways to integrate acoustic, phonetic, lexical, and other linguistic insights into the speech recognition process through a combination of statistical modeling and data/error analysis. My goal is to train students to be flexible, independent thinkers who can apply statistical techniques to a range of language-related problems.

Joining the lab

The Speech and Language Technologies Laboratory is a group of dynamic researchers who are interested in mixing aspects of machine learning with speech and language processing.

If you are not an OSU student, but want to apply: see my note on the application process to OSU.

If you are a current OSU student: see the "once you are at OSU" section of my note.

Selected papers (with commentary)

D. Bagchi, P. Plantinga, A. Stiff, and E. Fosler-Lussier, "Spectral feature mapping with mimic loss for robust speech recognition," ICASSP 2018.

For the task of speech enhancement, local learning objectives are agnostic to phonetic structures helpful for speech recognition. We propose to add a global criterion to ensure de-noised speech is useful for downstream tasks like ASR. We first train a spectral classifier on clean speech to predict senone labels. Then, the spectral classifier is joined with our speech enhancer as a noisy speech recognizer. This model is taught to imitate the output of the spectral classifier alone on clean speech. This mimic loss is combined with the traditional local criterion to train the speech enhancer to produce de-noised speech. Feeding the de-noised speech to an off-the-shelf Kaldi training recipe for the CHiME-2 corpus shows significant improvements in WER.
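The way the local and mimic criteria combine can be sketched numerically. This is only an illustration under stated assumptions, not the paper's implementation: the linear-softmax `classifier`, the weight `alpha`, the use of mean squared error for both terms, and the toy array sizes are all hypothetical choices made for the example.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilise before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classifier(features, W):
    """Stand-in for the frozen senone classifier trained on clean speech."""
    return softmax(features @ W)

def combined_loss(enhanced, clean, W, alpha=1.0):
    # Local criterion: frame-level distance between enhanced and clean spectra.
    local = np.mean((enhanced - clean) ** 2)
    # Mimic criterion: the classifier's output on enhanced speech should match
    # its output on the parallel clean speech.
    mimic = np.mean((classifier(enhanced, W) - classifier(clean, W)) ** 2)
    return local + alpha * mimic, local, mimic

rng = np.random.default_rng(0)
clean = rng.normal(size=(5, 8))      # 5 frames, 8 spectral bins (toy sizes)
W = rng.normal(size=(8, 4))          # 4 hypothetical senone classes
enhanced = clean + 0.3 * rng.normal(size=clean.shape)  # imperfect enhancer output

total, local, mimic = combined_loss(enhanced, clean, W, alpha=0.5)
perfect_total, _, _ = combined_loss(clean, clean, W, alpha=0.5)
```

In a real system only the enhancer's parameters would be updated from this loss; the classifier weights stay fixed so that the mimic term keeps reflecting what a clean-speech recognizer finds informative.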

D. Newman-Griffis, A. Lai, and E. Fosler-Lussier, "Jointly embedding entities and text with distant supervision," Proceedings of the 3rd Workshop on Representation Learning for NLP, 2018.

Learning representations for knowledge base entities and concepts is becoming increasingly important for NLP applications. However, recent entity embedding methods have relied on structured resources that are expensive to create for new domains and corpora. We present a distantly-supervised method for jointly learning embeddings of entities and text from an unannotated corpus, using only a list of mappings between entities and surface forms. We learn embeddings from open-domain and biomedical corpora, and compare against prior methods that rely on human-annotated text or large knowledge graph structure. Our embeddings capture entity similarity and relatedness better than prior work, both in existing biomedical datasets and a new Wikipedia-based dataset that we release to the community. Results on analogy completion and entity sense disambiguation indicate that entities and words capture complementary information that can be effectively combined for downstream use.

J.K. Kim, Y.B. Kim, R. Sarikaya, and E. Fosler-Lussier, "Cross-lingual transfer learning for POS tagging without cross-lingual resources," EMNLP 2017.

Training a POS tagging model with crosslingual transfer learning usually requires linguistic knowledge and resources about the relation between the source language and the target language. In this paper, we introduce a cross-lingual transfer learning model for POS tagging without ancillary resources such as parallel corpora. The proposed cross-lingual model utilizes a common BLSTM that enables knowledge transfer from other languages, and private BLSTMs for language-specific representations. The cross-lingual model is trained with language-adversarial training and bidirectional language modeling as auxiliary objectives to better represent language-general information while not losing the information about a specific target language. Evaluating on POS datasets from 14 languages in the Universal Dependencies corpus, we show that the proposed transfer learning model improves the POS tagging performance of the target languages without exploiting any linguistic knowledge between the source language and the target language.

Y. He and E. Fosler-Lussier. "Segmental Conditional Random Fields with Deep Neural Networks as Acoustic Models for First-Pass Word Recognition," Interspeech 2015, Dresden, Germany, 2015.

In this line of research, my lab engaged in a series of studies to build automatic speech recognition systems using direct discriminative models that can combine correlated evidence of linguistic events. This work is the latest step in this line of research: it provides a discriminative framework for modeling longer trajectories in speech through segmental models. The innovation in this particular paper is the first one-pass discriminative segmental model for word recognition (building on our previous work in phone recognition). We show that the monophone-based model improves recognition over discriminatively trained monophone-based HMM and frame-based CRF models for the Wall Street Journal read-speech task, and starts to approach triphone-based performance. Thus, this serves as a good intermediate point in building systems that can start to compete with state-of-the-art systems.

See also:

Y. He and E. Fosler-Lussier, "Efficient Segmental Conditional Random Fields for One-Pass Phone Recognition," Interspeech 2012.

R. Prabhavalkar, E. Fosler-Lussier, and K. Livescu, "A Factored Conditional Random Field Model for Articulatory Feature Forced Transcription," IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2011.

Recognized as a Spotlight Poster at ASRU 2011 (voted as a top 3 poster in its session by the attendees).

Segmental modeling can be thought of as a type of linguistic structural modeling (integrating linguistic structure over time). Another linguistically inspired modeling approach that we have experimented with, in conjunction with partners at Toyota Technological Institute at Chicago, explicitly models articulator trajectories over time through a factored model -- unlike phone-based systems, this paradigm allows models of asynchrony which can account for different types of pronunciation variation commonly seen in continuous speech. In this paper, we use factorized Conditional Random Fields to learn patterns of asynchrony that can be used to provide articulatory feature transcriptions, which can be expensive to obtain manually. Our experiments show that the transcriptions can better account for pronunciation variations observed by linguists in the Switchboard corpus. In subsequent papers, we were able to utilize this framework for acoustic-based keyword spotting, showing improvement over an HMM-based baseline.

See also:

R. Prabhavalkar, J. Keshet, K. Livescu, and E. Fosler-Lussier, "Discriminative Spoken Term Detection with Limited Data," Symposium on Machine Learning in Speech and Language Processing (MLSLP), 2012.
R. Prabhavalkar, K. Livescu, E. Fosler-Lussier, J. Keshet, "Discriminative Articulatory Models for Spoken Term Detection in Low-Resource Conversational Settings," Proceedings of ICASSP, 2013.

P. Jyothi, E. Fosler-Lussier, and K. Livescu, "Discriminatively learning factorized finite state pronunciation models from dynamic Bayesian networks," Interspeech 2012.

Best Student Paper Award, Interspeech 2012

This paper takes a slightly different approach to articulatory modeling than the Prabhavalkar work described above. Starting from a previous Dynamic Bayesian Network (DBN) approach, it efficiently derives, and discriminatively trains, a weighted finite state transducer (WFST) representation for the articulatory feature-based model of pronunciation. We use the conditional independence assumptions imposed by the DBN to efficiently convert it into a sequence of WFSTs (factor FSTs) which, when composed, yield the same model as the DBN. We then introduce a linear model of the arc weights of the factor FSTs and discriminatively learn its weights using the averaged perceptron algorithm. We demonstrate the approach using a lexical access task in which we recognize a word given its surface realization. This work subsequently led to discriminative training approaches for factorized WFSTs that can be used even in standard WFST-based ASR systems.
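The averaged perceptron update used to learn a linear model can be illustrated on a toy candidate-ranking problem. This sketch does not build any WFSTs: each example is simply a hypothetical matrix with one feature row per candidate word, standing in for features that in the paper come from the factor FST arcs. The data and the function name `averaged_perceptron` are assumptions for illustration.

```python
import numpy as np

def averaged_perceptron(examples, n_features, epochs=10):
    """examples: list of (candidate_features, gold_index) pairs, where
    candidate_features holds one row of features per candidate word."""
    w = np.zeros(n_features)
    w_sum = np.zeros(n_features)
    steps = 0
    for _ in range(epochs):
        for feats, gold in examples:
            pred = int(np.argmax(feats @ w))
            if pred != gold:
                # Move towards the gold candidate, away from the wrong guess.
                w += feats[gold] - feats[pred]
            w_sum += w
            steps += 1
    # Return the average of the weight vector over all steps,
    # which damps the effect of late, noisy updates.
    return w_sum / steps

# Toy data: three "surface realizations", two candidate words each.
examples = [
    (np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]), 0),
    (np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 0.0]]), 0),
    (np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]), 1),
]
w = averaged_perceptron(examples, n_features=3)
predictions = [int(np.argmax(feats @ w)) for feats, _ in examples]
```

Returning the averaged weights rather than the final ones is the design choice that distinguishes this from the plain perceptron.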

See also:

P. Jyothi, K. Livescu, and E. Fosler-Lussier, "Lexical access experiments with context-dependent articulatory feature-based model," ICASSP 2011.
P. Jyothi, E. Fosler-Lussier, and K. Livescu, "Discriminative Training of WFST Factors with Application to Pronunciation Modeling," Proceedings of Interspeech, 2013.

W. Hartmann, A. Narayanan, E. Fosler-Lussier, and D. Wang, "A Direct Masking Approach to Robust ASR," IEEE Transactions on Audio, Speech, and Language Processing, 21:10, pp 1993-2005, Oct 2013.

One line of research that we have followed is to use some of the discriminative techniques that we have developed in speech recognition in concert with speech separation techniques inspired by (and often in collaboration with) my colleague DeLiang Wang. The paper highlighted here was an outgrowth of this work, in which my student Billy Hartmann and I asked whether it was possible to use speech separation directly on noisy speech data to mask out noise without any reconstruction of the masked components in ASR. Previously it was assumed that zero-energy "holes" would cause problems in spectrally-masked speech that was not reconstructed or where the missing components were not marginalized in the probability estimation. The baseline for these latter techniques was usually just the recognition on the non-modified (noisy) speech. In this paper we show that one can use masked speech data directly in recognition, and argue that this should be the "simple" baseline from which other techniques are based.
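The idea of masking without reconstruction can be shown on a toy time-frequency grid. This is a sketch under simplifying assumptions: the mask here is an ideal binary mask computed from oracle speech and noise energies (in practice the mask has to be estimated), energies are treated as additive, and the function names and the 0 dB threshold are hypothetical.

```python
import numpy as np

def ideal_binary_mask(speech_energy, noise_energy, threshold_db=0.0):
    """1 where the local SNR exceeds the threshold, 0 elsewhere."""
    snr_db = 10.0 * np.log10(speech_energy / noise_energy)
    return (snr_db > threshold_db).astype(float)

def direct_mask(noisy_energy, mask):
    # No reconstruction or marginalisation: masked units stay at zero energy
    # and the result is passed to the recognizer front end as-is.
    return noisy_energy * mask

speech = np.array([[4.0, 0.5], [0.2, 9.0]])   # toy time-frequency energies
noise = np.array([[1.0, 2.0], [1.0, 1.0]])
noisy = speech + noise                        # additive-energy simplification

mask = ideal_binary_mask(speech, noise)
masked = direct_mask(noisy, mask)
```

The zero entries in `masked` are the "holes" discussed above; the point of the direct approach is that they are left in place rather than filled in.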

See also:

R. Prabhavalkar, Z. Jin, and E. Fosler-Lussier, "Monaural Segregation of Voiced Speech using Discriminative Random Fields," Proceedings of Interspeech, Brighton, UK, 2009.
W. Hartmann and E. Fosler-Lussier, "Improved Model Selection for the ASR-Driven Binary Mask," Interspeech 2012.
W. Hartmann and E. Fosler-Lussier, "ASR-Driven Top-Down Binary Mask Estimation Using Spectral Priors," Proc. ICASSP, 2012.

P. Raghavan, E. Fosler-Lussier, N. Elhadad, and A. Lai, "Cross-narrative Temporal Ordering of Medical Events," Association for Computational Linguistics Annual Meeting, 2014.

My group has also been active in NLP research, particularly in the domain of electronic health records (EHRs) in collaboration with Albert Lai in Biomedical Informatics. This paper describes the culmination of several pieces of work, where we extract medical events from multiple clinical notes in an EHR, develop a timeline for each note, and then align the events across notes to create an overall summary timeline of the medical history.

See also:

P. Raghavan, E. Fosler-Lussier, and A. Lai, "Temporal Classification of Medical Events," BioNLP 2012.
P. Raghavan, E. Fosler-Lussier, and A. Lai, "Exploring Semi-Supervised Coreference Resolution of Medical Concepts using Semantic and Temporal Features," North American Association for Computational Linguistics Annual Meeting - Human Language Technologies Conference (NAACL HLT 2012), 2012.

E. Fosler-Lussier, Y. He, P. Jyothi, and R. Prabhavalkar, "Conditional Random Fields in Speech, Audio and Language Processing," Proceedings of the IEEE, 101:5, pp 1054-1075, 2013.

I have also been active in developing review articles to help explain several current topics to wider audiences. This invited paper gives a broad overview of Conditional Random Fields and their use in various processing tasks.

See also:

M.J.F. Gales, S. Watanabe, and E. Fosler-Lussier, "Structured Discriminative Models for Speech Recognition," Signal Processing Magazine, 29:6, pp 70-81, Nov. 2012.
K. Livescu, E. Fosler-Lussier, and F. Metze, "Subword Modeling for Automatic Speech Recognition: Past, Present, and Emerging Approaches," Signal Processing Magazine, 29:6, pp 44-57, Nov. 2012.

J. Morris and E. Fosler-Lussier. "Conditional Random Fields for Integrating Local Discriminative Classifiers," IEEE Transactions on Audio, Speech, and Language Processing, 16:3, pp 617-628, March 2008.

Awarded IEEE Signal Processing Society Best Paper Award in 2010.

This paper details a model which can selectively pay attention to some phonological information and ignore other information using a discriminative model known as Conditional Random Fields (CRFs). While CRFs had been used in a few studies prior to this work, the contribution of this paper was to examine their utility as feature combiners, combining posterior estimates of phone classes and phonological feature classes to improve TIMIT phone recognition. We have continued this line of research since this paper, moving towards the first CRF-based word recognition experiments ever done.
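To make the "feature combiner" idea concrete, here is a minimal linear-chain CRF sketch in which each frame's input is the concatenation of two posterior streams. It is an illustration under assumptions, not the system from the paper: the posteriors are random stand-ins, the weights `W` and `T` are untrained, and all dimensions are toy values.

```python
import itertools
import numpy as np

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def sequence_score(x, y, W, T):
    """Unnormalised log-score of label sequence y for frame features x."""
    state = sum(x[t] @ W[:, y[t]] for t in range(len(y)))
    trans = sum(T[y[t - 1], y[t]] for t in range(1, len(y)))
    return state + trans

def log_partition(x, W, T):
    """Forward algorithm: log-sum of scores over all label sequences."""
    alpha = x[0] @ W                      # shape: (n_labels,)
    for t in range(1, len(x)):
        alpha = logsumexp(alpha[:, None] + T, axis=0) + x[t] @ W
    return logsumexp(alpha, axis=0)

def log_prob(x, y, W, T):
    return sequence_score(x, y, W, T) - log_partition(x, W, T)

rng = np.random.default_rng(0)
n_frames, n_labels = 4, 3
phone_post = rng.dirichlet(np.ones(3), size=n_frames)    # hypothetical phone posteriors
feature_post = rng.dirichlet(np.ones(2), size=n_frames)  # hypothetical phonological-feature posteriors
x = np.hstack([phone_post, feature_post])                # combined input, 5 dims per frame
W = rng.normal(size=(x.shape[1], n_labels))              # state-feature weights
T = rng.normal(size=(n_labels, n_labels))                # transition weights

# Sanity check: probabilities over every possible label sequence sum to one.
total = sum(np.exp(log_prob(x, y, W, T))
            for y in itertools.product(range(n_labels), repeat=n_frames))
```

Because each label has its own weight on every input dimension, training can weight some posterior streams heavily and others near zero, which is the sense in which the model selectively attends to some information.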

See also:

E. Fosler-Lussier and J. Morris, "CRANDEM systems: Conditional Random Field Acoustic Models for Hidden Markov Models," International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2008), Las Vegas, Nevada, 2008.
J. Morris and E. Fosler-Lussier, "CRANDEM: Conditional Random Fields for Word Recognition," Proceedings of Interspeech, Brighton, UK, 2009.
I. Heintz, E. Fosler-Lussier, and C. Brew. "Discriminative Input Stream Combination for Conditional Random Field Phone Recognition," IEEE Transactions on Audio, Speech, and Language Processing, 18:8, pp 1533-1546, 2009.

E. Fosler-Lussier, I. Amdal, and H.-K. J. Kuo. "A Framework for Predicting Speech Recognition Errors," Speech Communication issue on Pronunciation Modeling and Lexicon Adaptation, 46:2, pp. 153-170, 2005.

Much of the work above is devoted to methods of modeling the acoustic-phonetic variation inherent in speech, in order to build better speech recognition models. However, a slightly different way of thinking about variation is to consider the variation in patterns of errors made by a speech recognizer due to many factors (for example, errors due to inherent speech variation, errors caused by poor acoustic/lexical models, or search errors). This paper focuses on methods to predict errors made by speech recognition systems, even when we only have a text transcript (i.e., no audio); the proposed framework is flexible enough to allow for different prediction models to characterize system performance. The impact of this technology has allowed us and others to train discriminative language models that directly optimize system error rate (rather than data likelihood) using a large amount of textual data.

See also:

P. Jyothi and E. Fosler-Lussier, "A Comparison of Audio-free Speech Recognition Error Prediction Methods," Proceedings of Interspeech, Brighton, UK, 2009.
P. Jyothi and E. Fosler-Lussier, "Discriminative Language Modeling Using Simulated ASR Errors," Proc. Interspeech, 2010.

Recent News

Teaching: Classes taught at Ohio State

Resources from my classes that might be useful

Fun things about me