Predicting music popularity patterns based on musical complexity and early stage popularity
Junghyuk Lee and Jong-Seok Lee
11:15
SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents
Damiano Spina, Johanne R. Trippas, Lawrence Cavedon and Mark Sanderson
11:45
Acoustic adaptation in cross database audio visual SHMM training for phonetic spoken term detection
Shahram Kalantari, David Dean, Sridha Sridharan, Houman Ghaemmaghami and Clinton Fookes
12:15
Evaluation Data, Benchmarks, and Activities for Cascaded Speech Recognition and Extraction of 35 Entities: Content Capturing, Segmentation, and Structuring of Verbal Clinical Handover
Liyuan Zhou, Hanna Suominen and Leif Hanlen
Poster
Score Propagation based on Similarity Shot Graph for Improving Visual Object Retrieval
Juan Manuel Barrios and Jose M. Saavedra
12:45 - 14:00
Lunch break
14:00 - 16:00
Hyperlinking session: Vision meets speech and language
14:00
Convenient Discovery of Archived Video Using Audiovisual Hyperlinking
Roeland Ordelman, Robin Aly, Maria Eskevich, Benoît Huet and Gareth Jones
14:30
Audio Information for Hyperlinking of TV content
Petra Galuščáková and Pavel Pecina
15:00
Hierarchical topic models for language-based video hyperlinking
Anca-Roxana Simon, Guillaume Gravier, Pascale Sébillot, Rémi Bois, Emmanuel Morin and Sien Moens
15:30
Exploring Video Hyperlinking in Broadcast Media
Maria Eskevich, Quoc-Minh Bui, Hoang-An and Benoît Huet
16:00 - 17:00
Round table discussion
Keynote
SAIVT-BNEWS: An Australian broadcast news video dataset for entity extraction, and more
David Dean, Queensland University of Technology, Australia
QUT has recently released a set of annotated broadcast news videos (SAIVT-BNEWS), available from our website. This presentation will outline the dataset itself, which covers around 50 short news clips surrounding a single political event, with many entities appearing in multiple records, and will survey the research that QUT has performed, is currently performing, and is interested in performing on this dataset in the future. The presentation will cover existing published research, including image processing tasks such as face detection, face recognition and face clustering, and speech processing tasks (including the use of visual speech) such as speech detection, speaker recognition and speaker diarisation. We have also begun research on fusing multiple sources of information, including metadata, OCR, faces, speech and scene detection, to improve the performance of many techniques, with a focus on improving the automatic extraction of entities (people, places, companies and organisations) from large volumes of audiovisual data; this work will also be covered. As the dataset is publicly available free of charge to all researchers, QUT hopes that other researchers will be able to make use of, and improve upon, it as well.
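To make the fusion idea above concrete, the sketch below shows one simple way such multimodal fusion could be set up: a weighted score-level (late) fusion of per-entity confidences from independent modalities. This is a minimal illustrative sketch only; the function name, entity names, scores and weights are hypothetical assumptions, not taken from SAIVT-BNEWS or QUT's published method.

from collections import defaultdict

def fuse_entity_scores(modality_scores, weights):
    """Weighted score-level (late) fusion of per-entity confidences.

    modality_scores: {modality: {entity: confidence in [0, 1]}}
    weights: {modality: fusion weight}
    Returns entities ranked by fused score, highest first.
    """
    fused = defaultdict(float)
    for modality, scores in modality_scores.items():
        w = weights.get(modality, 0.0)
        for entity, confidence in scores.items():
            fused[entity] += w * confidence
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical confidences that a face recogniser, a speaker recogniser
# and an OCR name-spotter might assign to candidates in one news clip.
scores = {
    "face":    {"Person A": 0.9, "Person B": 0.4},
    "speaker": {"Person A": 0.7},
    "ocr":     {"Person B": 0.8},
}
weights = {"face": 0.5, "speaker": 0.3, "ocr": 0.2}

for entity, score in fuse_entity_scores(scores, weights):
    print(f"{entity}: {score:.2f}")
# Person A: 0.66
# Person B: 0.36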
Dr David Dean is a Senior Research Fellow at the Queensland University of Technology, with extensive publications across a wide range of audio and visual speech processing areas and a focus on speaker diarisation, verification and keyword spotting across multimedia archives. Since completing his PhD, Synchronous HMMs for Audio-Visual Speech Processing, in 2008, Dr Dean has worked on a wide range of research projects funded by industry, the ARC and CRCs, and has assisted four PhD research programs to completion.
The SLAM workshop series is organized by the Special Interest Group on Speech and Language in Multimedia of the International Speech Communication Association, with support from the IEEE SIG on Audio and Speech Processing in Multimedia.