Speech and Language in Multimedia

ACM Multimedia 2015 Workshop

Speech, Language and Audio in Multimedia

Speech, language and audio meet computer vision

The third workshop on Speech, Language and Audio in Multimedia (SLAM) aims to bring together researchers working in speech, language and audio processing to analyze, index and access multimedia data. Multimedia data are now available in enormous volumes and in a wide variety of formats and qualities, from professional to user-generated content: lectures, meetings, interviews, debates, conversational broadcasts, podcasts, social videos on the Web, etc. Such data, along with the associated use scenarios, raise specific challenges: robustness to high variability in quality; efficiency in handling very large amounts of data; semantics shared across modalities; potentially high error rates in transcription; etc. Worldwide, several national and international research projects are focusing on audio and language analysis of multimedia data. Similarly, various benchmark initiatives have devoted effort to offering tasks related to multimodal multimedia challenges (e.g., TRECVid, CLEF, MediaEval).

Following SLAM 2013 in Marseille, France, and SLAM 2014 in Pinang, Malaysia, both collocated with the Interspeech conference, SLAM 2015 naturally moves to the multimedia community! To make the most of the collocation with ACM Multimedia, the workshop features a dedicated session to highlight work on multimodality and fusion, at the intersection of speech, audio, language and computer vision.

SLAM gathers players from the fields of speech and audio processing and of multimedia to share recent research results, discuss ongoing and future projects, explore potential areas for interdisciplinary collaboration or sharing of ideas, and develop new benchmarking initiatives of mutual interest to multimedia and language researchers. We expect contributions on ongoing research work, project descriptions, evaluation initiatives, demonstrations and applications emphasizing the speech and/or language and/or audio contribution to any type of multimedia technology. Topics of interest include (but are not limited to):

Audio event detection and audio classification
Speech recognition deploying multimedia information sources
Audio-aware genre analysis and classification
Multimodal speaker identification and clustering
Multimodal content retrieval
Speech and audio aware content segmentation and structuring
Audio indexing and fingerprinting
Natural language processing for multimedia
Entity extraction, keyword extraction, etc.
Summarization and hyperlink generation
Multimodal fusion and integration involving audio
Generation of descriptive text for multimedia
Speech and audio multimedia applications and services
Evaluation data and benchmarks
Large scale speech and audio analysis
Navigation in multimedia content with audio or language

As a special focus of SLAM 2015, we particularly welcome contributions on video hyperlinking, as a case study where the speech and language modalities are complemented by audio and vision.

Download the call for papers for more details. Check the author kit and submission procedure.

Important dates

	Paper submission deadline	extended to July 22, 2015
	Notification of acceptance	August 2, 2015
	Camera ready	August 10, 2015
	Workshop	October 26 or 30, 2015

The SLAM workshop series is organized by the Special Interest Group on Speech and Language in Multimedia of the Intl. Speech Communication Association, with support from the IEEE SIG on Audio and Speech Processing in Multimedia.

last updated: July 2015