ACM Multimedia 2015 Workshop

Speech, Language and Audio in Multimedia

Speech, language and audio meet computer vision

The third workshop on Speech, Language and Audio in Multimedia (SLAM) aims to bring together researchers working in speech, language and audio processing to analyze, index and access multimedia data. Multimedia data are now available in enormous volumes and in a wide variety of formats and qualities, from professional content to user-generated content: lectures, meetings, interviews, debates, conversational broadcasts, podcasts, social videos on the Web, etc. Such data, along with the associated use scenarios, raise specific challenges: robustness to high variability in quality; efficiency in handling very large amounts of data; semantics shared across modalities; potentially high error rates in transcription; etc. Worldwide, several national and international research projects are focusing on audio and language analysis of multimedia data. Similarly, various benchmark initiatives have devoted effort to offering tasks related to multimodal multimedia challenges (e.g., TRECVid, CLEF, MediaEval).

Following SLAM 2013 in Marseille, France, and SLAM 2014 in Pinang, Malaysia, both collocated with the Interspeech conference, SLAM 2015 naturally moves to the multimedia community! To make the most of the collocation with ACM Multimedia, the workshop features a dedicated session to highlight work on multimodality and fusion, at the intersection of speech, audio, language and computer vision.

SLAM gathers players from the fields of speech and audio processing and of multimedia to share recent research results, discuss ongoing and future projects, explore potential areas for interdisciplinary collaboration or sharing of ideas, and develop new benchmarking initiatives of mutual interest to multimedia and language researchers. We expect contributions on ongoing research work, project descriptions, evaluation initiatives, demonstrations and applications emphasizing the speech and/or language and/or audio contribution to any type of multimedia technology. Topics of interest include (but are not limited to):

  • Audio event detection and audio classification
  • Speech recognition deploying multimedia information sources 
  • Audio-aware genre analysis and classification
  • Multimodal speaker identification and clustering
  • Multimodal content retrieval
  • Speech and audio aware content segmentation and structuring 
  • Audio indexing and fingerprinting
  • Natural language processing for multimedia
  • Entity extraction, keyword extraction, etc. 
  • Summarization and hyperlink generation
  • Multimodal fusion and integration involving audio 
  • Generation of descriptive text for multimedia
  • Speech and audio multimedia applications and services 
  • Evaluation data and benchmarks
  • Large scale speech and audio analysis
  • Navigation in multimedia content with audio or language

As a special focus of SLAM 2015, we particularly welcome contributions on video hyperlinking, as a case study where the speech and language modalities are complemented by audio and vision.

Download the call for papers for more details. Check the author kit and submission procedure.

Important dates

Paper submission deadline
extended to July 22, 2015

Notification of acceptance
August 2, 2015

Camera ready
August 10, 2015

Workshop
October 26 or 30, 2015


The SLAM workshop series is organized by the Special Interest Group on Speech and Language in Multimedia of the International Speech Communication Association, with support from the IEEE SIG on Audio and Speech Processing in Multimedia.

last updated: July 2015