We present EditIQ, a fully automated framework for
cinematically editing scenes captured with a stationary, large
field-of-view, high-resolution camera. From the static camera feed,
EditIQ initially generates multiple virtual feeds, emulating a team
of cameramen. These virtual camera shots, termed rushes, are
subsequently assembled using an automated editing algorithm, whose
objective is to present the viewer with the most vivid scene content. To
understand key scene elements and guide the editing process, we employ a
two-pronged approach: (1) a large language model (LLM)-based dialogue
understanding module to analyze conversational flow, coupled with (2)
visual saliency prediction to identify meaningful scene elements and
the camera shots that best capture them. We then formulate cinematic video editing as an
energy minimization problem over shot selection, where cinematic
constraints determine shot choices, transitions, and continuity.
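To make the shot-selection formulation concrete, the sketch below casts it as a first-order energy minimized by dynamic programming: a unary term scores each rush at each time step (e.g., from dialogue and saliency cues), and a fixed penalty discourages frequent cuts. The specific cost terms, the uniform switch penalty, and the function names are illustrative assumptions rather than EditIQ's exact energy.

```python
# Illustrative sketch: shot selection as energy minimization via dynamic programming.
# The unary costs and the single switch_cost penalty are assumptions for illustration.
import numpy as np

def select_shots(unary, switch_cost=1.0):
    """unary[t, s]: cost of showing rush s at time step t; returns one shot index per step."""
    T, S = unary.shape
    cost = unary[0].copy()                 # best cost of ending at each shot so far
    back = np.zeros((T, S), dtype=int)     # backpointers for traceback
    for t in range(1, T):
        # staying on the same shot is free; cutting to a different shot pays switch_cost
        trans = cost[:, None] + switch_cost * (1.0 - np.eye(S))
        back[t] = trans.argmin(axis=0)     # best previous shot for each current shot
        cost = trans.min(axis=0) + unary[t]
    # trace back the minimum-energy shot sequence
    shots = [int(cost.argmin())]
    for t in range(T - 1, 0, -1):
        shots.append(int(back[t, shots[-1]]))
    return shots[::-1]

# Hypothetical usage: 5 time steps, 3 rushes, costs derived from negated relevance scores
# edit = select_shots(np.random.rand(5, 3), switch_cost=0.5)
```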
EditIQ synthesizes an aesthetically and visually compelling
representation of the original narrative while maintaining cinematic
coherence and a smooth viewing experience. EditIQ's efficacy against
competing baselines is demonstrated via a psychophysical study
involving twenty participants on the BBC Old School dataset and
eleven theatre performance videos.
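As a small illustration of the rush-generation step described above, the following sketch derives virtual camera feeds by cropping fixed windows from each wide-angle frame; the crop rectangles, output resolution, and OpenCV-based resizing are assumptions for illustration, not EditIQ's actual framing procedure.

```python
# A minimal sketch of rush generation, assuming each virtual camera is a fixed crop
# window (e.g., a per-actor close-up or a two-shot) of the wide-angle frame.
import cv2

def generate_rushes(frame, crop_rects, out_size=(1280, 720)):
    """Return one virtual-camera frame per crop rectangle (x, y, w, h)."""
    rushes = []
    for (x, y, w, h) in crop_rects:
        shot = frame[y:y + h, x:x + w]              # tighter virtual framing
        rushes.append(cv2.resize(shot, out_size))   # rescale to a common output resolution
    return rushes

# Hypothetical example: a wide master shot plus two actor close-ups from a 4K frame
# rushes = generate_rushes(frame, [(0, 0, 3840, 2160), (400, 300, 1280, 720), (2200, 320, 1280, 720)])
```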