Speech, Language and Audio in Multimedia
Language and audio meet computer vision
The third workshop on Speech, Language and Audio in Multimedia (SLAM) aims to bring together researchers working in speech, language and audio processing to analyze, index and access multimedia data. Multimedia data are now available in enormous volumes and in a wide variety of formats and qualities, from professional content to user-generated content: lectures, meetings, interviews, debates, conversational broadcasts, podcasts, social videos on the Web, etc. Such data, along with the associated use scenarios, raise specific challenges: robustness to highly variable quality; efficiency in handling very large amounts of data; semantics shared across modalities; potentially high transcription error rates; etc. Worldwide, several national and international research projects focus on audio and language analysis of multimedia data. Similarly, various benchmark initiatives (e.g., TRECVid, CLEF, MediaEval) have devoted effort to offering tasks related to multimodal multimedia challenges.
Following SLAM 2013 in
Marseille, France, and SLAM 2014 in
Pinang, Malaysia, both co-located with the Interspeech
conference, SLAM 2015 naturally moves to the
multimedia community! To make the most of the co-location
with ACM Multimedia,
the workshop features a dedicated
session highlighting work on multimodality and fusion
at the intersection of speech, audio, language and vision.
SLAM gathers players from the fields of
speech and audio processing and of multimedia to share recent research results, discuss
ongoing and future projects, explore potential areas for
interdisciplinary collaboration or the sharing of ideas, and
develop new benchmarking initiatives of mutual
interest to multimedia and language researchers. We expect
contributions on ongoing research work, project
descriptions, evaluation initiatives, demonstrations and
applications emphasizing the speech and/or language and/or
audio contribution to any type of multimedia technology.
Topics of interest include (but are not limited to):
As a special focus of SLAM 2015, we particularly welcome contributions on video hyperlinking, as a case study where the speech and language modalities are complemented by audio and vision.