Nitin "Nick" Sawhney, David Balcom and Ian Smith
The Georgia Institute of Technology
School of Literature, Communication, and Culture
College of Computing
Atlanta, GA 30332-0165 USA
Paper presented at Hypertext '96: Seventh ACM Conference on Hypertext
Recipient of the first Engelbart Best Paper Award at Hypertext '96 (March 20, 1996)
ABSTRACT

HyperCafe is an experimental hypermedia prototype, developed as an illustration of a general hypervideo system. This program places the user in a virtual cafe, composed primarily of digital video clips of actors involved in fictional conversations; HyperCafe allows the user to follow different conversations, and offers dynamic opportunities for interaction via temporal, spatio-temporal and textual links to present alternative narratives. Textual elements are also present in the form of explanatory text, contradictory subtitles, and intruding narratives. Based on our work with HyperCafe, we discuss the components and a framework for hypervideo structures, along with the underlying aesthetic considerations.
KEYWORDS: Aesthetics, multi-threaded narratives, navigation,
temporal links, digital video.
A VISIT TO HYPERCAFE

You enter the Cafe, and the voices surround you. Pick a table,
make a choice, follow the voices. You're over their shoulders
looking in, listening to what they say-you have to choose,
or the story will go on without you.
Welcome to HyperCafe. The experience, the aesthetic, is choice:
you decide when to listen, when to leave. You decide. The camera
moves from one table to the next, and opportunities (what, in
another context, J. Yellowlees Douglas would call a "narrative
of possibilities" [11]) present themselves to you. Select
a conversation, and the navigation pan fades into a close-up,
two men talking, and one's saying to the other, "in
fact, our words over here don't affect their words over there."
Another table comes into view, another possible conversation,
next to the first. After a few moments, the second table fades
away, if not selected. The link unrealized (which is a choice
in itself), the story continues: "I find that highly questionable,"
the other man says. "What if I yell fire? What then? It seems
my words have a great effect on [motions to table behind him]
these people over here [motions to another table], these people
over here."
Another table comes into view. A man with a thin beard is standing
over a young woman with blond hair. You choose the second table
and the first story fades back. Thin beard begins to speak. "Do
you remember me?" he asks the blond woman. As possible narratives
are realized by your touch, and the story forms around your choices,
the question lingers. Do you remember me?
Another table, another opportunity to move.
"Hypervideo" is digital video and hypertext, offering
to its user and author the richness of multiple narratives, even
multiple means of structuring narrative (or non-narrative),
combining digital video with a polyvocal, linked text. We have
redefined the notion of links for a video-centric medium, where
they become spatial and temporal opportunities in video and text.
In this paper we will discuss HyperCafe as the basis for
a broader discussion of the narrative and aesthetic structures
afforded by hypervideo. These structures and navigational methodologies
later help us develop a conceptual framework for hypervideo. Our
aim is to provide new thinking for a new mode of expression, much
as George Landow did for hypertext: "Hypermedia, which changes
the way texts exist and the way we read them, requires a new rhetoric
and a new stylistics." [21] Now, too, does hypervideo.
Related Work

Our primary influence is the hypertextual framework found within
Storyspace [30] where the relationships among the spatially
organized writing spaces become part of the content, permitting
a duality of the writing space [4]. By constructing our script
in Storyspace (see Figure 8), we were able to exploit the
duality of structure and content to create a representation of
linked "narrative video spaces." Synthesis [28]
based on Storyspace, was used to index and navigate analog
video content associated with text in writing spaces. In Hyperspeech
[2], recorded audio interviews were segmented by topic and
the related comments were linked. Like HyperCafe, the system focused
on "conversational interactions"; one user of the system
felt that she was "creating artificial conversations"
between participants.
Video-to-video linking was earlier demonstrated in the hypermedia
journal Elastic Charles [8], developed at the Interactive
Cinema Group (MIT Media Lab), where Micons (miniaturized movie
loops) would briefly appear to indicate video links. This prototype
relied on analog video and laser disc technology requiring the
use of two screens. Digital video today permits newer design and
aesthetic solutions, such as those considered in the design of the Interactive
Kon-Tiki Museum [22]. Rhythmic and temporal aspects were stressed
to achieve continuous integration in linking from video to text
and video to video, by exchanging basic qualities between the
media types. Time dependence was added to text and spatial simultaneity
to video. Yet, unlike HyperCafe, moving text was not utilized
and video links were represented by pictures of the video on static
buttons. Temporal opportunities in HyperCafe provide only a brief temporal window for navigating links in video and text, as an intentional aesthetic. The nature of the video content in
HyperCafe allows us to consider new conventions for indication
of temporal and spatio-temporal opportunities in hypervideo. Opportunities
exist as dynamic video previews and as links within the frame
of moving video. Control over the video content permitted use
of additional camera techniques to add continuity between video
to video links via "navigational bridges."
Time-based, scenario-oriented hypermedia has been demonstrated
in VideoBook [26][27]. Here multimedia content was specified
within a nodal structure and timer driven links were automatically
activated to present the content, based on the time attributes.
Hardman et al. [15] utilize timing to explicitly state the source
and destination contexts when links are followed. Synchronizing
media elements is both time-consuming and difficult to maintain.
A system called Firefly [9] allowed authors to create hypermedia
documents by manipulating temporal relationships among media elements
at a high level rather than as timings. In HyperCafe, we
deal with the presentation of temporal links within a continuity
based on film aesthetic. We later discuss future work towards
a toolkit for specifying such temporal links in hypervideo. These
temporal links permit the possibility of presenting alternative
(multi-threaded) narratives in hypervideo. Agent Stories [3]
was developed at the Interactive Cinema Group (MIT Media Lab)
as a tool for creating multi-threaded story structures, built
on knowledge representation of characters in the story. A multi-threaded
documentary was demonstrated in CyberBELT [5], where the
documentary evolved with feedback from the user, based on dynamic
weights assigned to video clips. Although user-defined narratives
are utilized in HyperCafe, our focus is on the "presentation"
of aesthetic navigational structures, where intentional chance
(our intention, their chance) and simultaneous narratives can
create new interpretative cinematic experiences.
CONCEPTUAL DESIGN OF HYPERCAFE

This section offers a detailed discussion of the design, navigation,
and linking mechanisms in HyperCafe.
Design Aesthetic

HyperCafe has been envisioned primarily as a cinematic
experience of hyper-linked video scenes. The video is shown in
black and white to produce a film-like grainy quality. In HyperCafe,
the video sequences play out continuously, and at no point can
they be stopped by actions of the user. The user simply navigates
through the flow of the video and links presented. This aesthetic
constraint simulates the feeling of an actual visit to a cafe
where the "real-time video" of the world also plays
out continuously. A minimalist interface is employed, with few explicit visual artifacts on the screen. All the navigation
and interaction is permitted via mouse movement and selection.
For instance, changes in the shape of the cursor depict different
link opportunities and the dynamic status of the video. By minimizing
the traditional computer-based artifacts in the interface and
retaining a filmic metaphor, we hope to provide the user with
a greater immersion in the experience of conversations in the
cafe. Specific instances of this intentional design aesthetic in HyperCafe are discussed below.
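To make the cursor-as-interface idea concrete, the fragment below is a minimal sketch in Python (our illustration, not HyperCafe's implementation; the state names and precedence rules are assumptions) of how a cursor shape might be chosen from the current link context.

    from enum import Enum, auto

    class CursorShape(Enum):
        """Illustrative cursor shapes; the actual HyperCafe artwork is not described here."""
        DEFAULT = auto()         # no link opportunity under the cursor
        TEMPORAL_LINK = auto()   # a temporal preview is currently open for selection
        SPATIAL_LINK = auto()    # the cursor is over a dynamic link region in the frame
        VIDEO_STATUS = auto()    # feedback on the dynamic status of the playing video

    def choose_cursor(over_spatial_link: bool, temporal_preview_open: bool,
                      video_playing: bool) -> CursorShape:
        """Pick a cursor shape from the current interaction context.

        Spatial links take precedence because they are positional; temporal
        previews apply anywhere on screen while their window is open.
        """
        if over_spatial_link:
            return CursorShape.SPATIAL_LINK
        if temporal_preview_open:
            return CursorShape.TEMPORAL_LINK
        return CursorShape.VIDEO_STATUS if video_playing else CursorShape.DEFAULT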
Navigation and Structure

When the user first enters the HyperCafe, an overview shot
of the entire scene is revealed, with a view of all the participants
and tables in the cafe. The low hum of voices can be faintly heard,
suggesting conversation. Moving text at the bottom of the screen
provides subtle instructions. The camera moves to reveal each
table (3 in all), allowing the user 5-10 seconds to select any
conversation (Figure 1). The video of the cafe overview scene
plays continuously, forwards and then backwards, until the user
selects a table (the audio here is distinct from the video, and
remains a constant hum of conversation). Once a choice is made,
the user is placed in a narrative sequence determined by the conversations
in the selected table. The user may be returned to the main cafe
sequence at the end of the conversations (this is but one possibility);
a specific conversation may trigger other related narrative events that the user can choose to follow.

Figure 1: As the camera continually pans across the cafe, many opportunities exist to select a single table of conversation and navigate to the related video narratives.
The user navigates through the logical structure of HyperCafe
via a hierarchy of video scenes that form linked narrative sequences.
At the top level, the main cafe sequence provides access to all
other possible narratives. At the second level, conversational
narratives for each table are available. A single conversation
is chosen randomly if a particular table is selected. The third
level consists of a sequential stream of video scenes (representing
a conversation) with links to other video conversations in the
current or in different tables (i.e., links within or across the
hierarchical nodes). Additional aspects of hypervideo structures
will be comprehensively discussed in the section "Framework
for Hypervideo."
Link Opportunities in HyperCafe

HyperCafe employs several different types of linking mechanisms,
both aesthetically and programmatically.
Temporal Link Opportunities

With HyperCafe, we offer a departure from hypertext as it has been presented thus far in popular literature: a temporal realization of link opportunities. A notable exception, a hypertext that already employs temporal links, is Dreamtime [25] by Stuart Moulthrop. As we mentioned earlier, previous work [26][9] stresses time attributes and temporal schedules for hypermedia rather than aesthetic navigational support of temporal opportunities.
Traditional hypertext presents users with several text or image-based
links simultaneously, and opportunities in any one node are available
concurrently. In narrative situations where we have employed temporal
linking, the story proceeds sequentially and opportunities are
presented temporally in the form of one or more previews of related
video scenes that dynamically fade in, determined by the context
of the events in the current video scene (Figure 2). The user
is given a brief window of time (3-5 seconds) to pursue a different
narrative path. If the user makes a selection, the narrative is
redirected to a labyrinth of paths available within the structure
of HyperCafe, otherwise the predetermined video sequence
continues to play out.
Figure 2: As one conversation is shown (video of man on the
bottom left), two new temporal opportunities briefly appear (on
the top and right) at different points in time. One of the new
conversations can now be selected (within a time-frame) to view
the related narrative, otherwise they will both disappear.
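A temporal link opportunity of this kind can be modeled as a selection window attached to the source scene's timeline. The following Python sketch is our own (only the 3-5 second window comes from the description above; the field names and resolution logic are assumptions), showing how an open window either redirects playback or lets the predetermined sequence continue.

    from dataclasses import dataclass

    @dataclass
    class TemporalOpportunity:
        """A preview of a destination scene, selectable only during a short window."""
        destination_scene: str
        start: float      # seconds into the source scene when the preview fades in
        duration: float   # length of the selection window (roughly 3-5 s in HyperCafe)

        def is_open(self, t: float) -> bool:
            return self.start <= t < self.start + self.duration

    def next_scene(t: float, selected, default_next: str) -> str:
        """Return what plays next: a selected, still-open opportunity, else the default path."""
        if selected is not None and selected.is_open(t):
            return selected.destination_scene
        return default_next

    # Example: two previews appear at different points while the source scene plays.
    opps = [TemporalOpportunity("table2_conversation", start=12.0, duration=4.0),
            TemporalOpportunity("table3_conversation", start=20.0, duration=5.0)]
    print(next_scene(13.5, selected=opps[0], default_next="table1_continued"))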
In some scenes, alternative camera angles can be selected to change
filmic perspective and view conversational reactions. The presence of alternative points of view could be shown via camera icons
that dynamically appear within the frame of the video, and point
in the appropriate direction of view.
Spatial Link Opportunities

In some video scenes, the user can explore the filmic depth of
the scene to reveal spatial link opportunities that can trigger
other video sequences, such as conversations in a specific table
in the background (Figure 3). Such opportunities are found
within the frame itself, where the spatial positioning of the conversants in time recalls or uncovers related interactions when activated. These spatial opportunities are implemented as
dynamically available (transparent) objects associated with specific
aspects in the video frame. With the recent progress in video
segmentation and content analysis techniques, it is conceivable
that objects in the frame of moving video could be automatically
detected and tracked in real-time. For a good discussion of approaches
to parsing and abstraction of visual features in video content,
see the recent work by Zhang et al. [32].
Generally, we feel that HyperCafe requires a more exploratory
form of interaction to effectively utilize spatio-temporal links.
The user could be made aware of the presence of spatial link opportunities
by three potential interface modes: flashing rectangular frames
within the video, changes in the cursor, and/or possible playback
of an audio-only preview of the destination video when the mouse
is moved over the link space. Several large and overlapping rectangular
frames could detract from the aesthetic of the film-like hypervideo
content, yet the use of cursor changes alone requires the user
to continuously navigate around the video space to find links.
The cursor-only solution is similar to that utilized by Apple's
QuickTime VR interface for still images, yet navigation
in hypervideo is complicated by moving video (and hence dynamic
link spaces). One solution is to employ a combination of modes,
where the presence of spatio-temporal links is initially provided
by an audio preview in a specific stereo channel to indicate a general directional cue. Then changes in the cursor prompt the user to
move around the video space, and actual link spaces are shown
with a cursor change, coupled, if necessary, with a brief flash
of a rectangular frame around the link space. Overlapping link
spaces are shown only temporally, and not simultaneously. We are
still evaluating which mode or combination of modes is best suited
for spatio-temporal links.
Figure 3: The main video narrative (on the left) shows a table
with two men in the background. A spatio-temporal opportunity
in the filmic depth of the scene triggers another narrative.
Spatio-temporal links also permit the user to select the background
of some scenes to return to the main cafe sequence. An "exit
space" here allows the user to leave HyperCafe.
It must be noted that the destination of temporal or spatio-temporal
links may either be entire video scenes or particular frames within
a video scene.
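Because link spaces move with the objects they are attached to, following a spatio-temporal link amounts to hit-testing a time-varying region. The sketch below is ours (the keyframe representation and linear interpolation are assumptions, not HyperCafe's implementation) and shows one way such a dynamic link space could be tracked and tested against the cursor.

    from dataclasses import dataclass

    @dataclass
    class Keyframe:
        t: float    # time in the source scene (seconds)
        x: float    # link region at that instant (top-left corner and size)
        y: float
        w: float
        h: float

    @dataclass
    class SpatioTemporalLink:
        """A link region that follows an object in the frame, active only between keyframes."""
        destination: str
        keyframes: list    # Keyframe objects, assumed sorted by time

        def region_at(self, t: float):
            """Linearly interpolate the region at time t, or None if the link is inactive."""
            kfs = self.keyframes
            if not kfs or t < kfs[0].t or t > kfs[-1].t:
                return None
            for a, b in zip(kfs, kfs[1:]):
                if a.t <= t <= b.t:
                    f = (t - a.t) / (b.t - a.t) if b.t > a.t else 0.0
                    lerp = lambda p, q: p + f * (q - p)
                    return Keyframe(t, lerp(a.x, b.x), lerp(a.y, b.y),
                                    lerp(a.w, b.w), lerp(a.h, b.h))
            return kfs[-1]

        def hit(self, t: float, mx: float, my: float) -> bool:
            """True if the cursor (mx, my) is inside the link region at time t."""
            r = self.region_at(t)
            return r is not None and r.x <= mx <= r.x + r.w and r.y <= my <= r.y + r.h

    # Example: a region that drifts right as the background table moves within the shot.
    link = SpatioTemporalLink("background_table_conversation",
                              [Keyframe(0.0, 100, 60, 40, 30), Keyframe(4.0, 140, 60, 40, 30)])
    print(link.hit(2.0, 125, 75))   # True: the interpolated region covers the cursor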
Interpretative Textual Links

Textual narration that is annotated to specific video scenes and links between scenes constitutes interpretative text. Such text appears at specific times while the associated video is being played. Longer lines of text appear scrolling horizontally across the bottom, with the directional movement dynamically controlled by the location of the cursor. The text represents associative links based on related discussions of the conversants among different tables. In some cases the text simply represents random bits of the dialog or even the actual script of the narrative, sharing the same space as the production video. Text intrudes on the video sequences, to offer commentary, to replace or even displace the videotext. Words spoken by the participants are subverted and rewritten by words on the screen, giving rise to a tension between word and image. Traditional hypertext links also permit navigation to related scenes of the videotext.

Figure 4: A video collage or "simultaneity" of multiple colliding narratives that produce other related narratives when two or more video scenes semantically intersect on the screen.
Several aesthetic effects enhance the reading of the videotext.
A video wall of conversations permits users to activate different
segments of the videos, dynamically joining them together into
new conversations. A "collision space" allows
users to drag moving video (Figure 4) or scrolling lines of text.
Previous work at the Interactive Cinema Group (MIT Media Lab)
[12] spatially organized video clips, using the Collage
notepad, as a way of understanding their content and relationships.
We propose a dynamic representation of text and video where the
intersection of the "image" or text would trigger other
video images and textual narratives. This creates a simultaneity
of multiple "closed" or intersecting narratives [29]
along spatio-temporal dimensions. Such interaction between the
user and the content produces an entirely new "videotext."
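The "collision space" can be read as an intersection test over draggable elements with an author-supplied table of semantic pairings. The following Python fragment is a rough sketch under that reading; the element names and the trigger table are hypothetical, not taken from HyperCafe.

    from dataclasses import dataclass

    @dataclass
    class Element:
        """A draggable piece of the videotext: a moving clip or a scrolling line of text."""
        name: str
        x: float
        y: float
        w: float
        h: float

    def intersects(a: Element, b: Element) -> bool:
        """Axis-aligned overlap test between two dragged elements."""
        return (a.x < b.x + b.w and b.x < a.x + a.w and
                a.y < b.y + b.h and b.y < a.y + a.h)

    # Hypothetical trigger table: which pairing "semantically intersects" and what it launches.
    COLLISION_TRIGGERS = {
        frozenset({"clip_do_you_remember", "text_car_crash"}): "scene_crash_retold",
    }

    def check_collisions(elements):
        """Return narratives triggered by currently overlapping element pairs."""
        triggered = []
        for i, a in enumerate(elements):
            for b in elements[i + 1:]:
                key = frozenset({a.name, b.name})
                if intersects(a, b) and key in COLLISION_TRIGGERS:
                    triggered.append(COLLISION_TRIGGERS[key])
        return triggered

    moving = [Element("clip_do_you_remember", 40, 40, 160, 120),
              Element("text_car_crash", 150, 100, 200, 20)]
    print(check_collisions(moving))   # ['scene_crash_retold'] while the two overlap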
Framework for Hypervideo

We first summarize definitions of some useful terms before we
present a broader framework for hypervideo.
Scene: The smallest unit of hypervideo, it consists of
a set of digitized video frames, presented sequentially.
Narrative sequence: A possible path through a set of linked
video scenes, dynamically assembled based on user interaction.
Temporal links: A time-based reference between different
video scenes, where a specific time in the source video triggers
the playback of the destination video scene.
Spatio-temporal links: A reference between different video
scenes, where a specified spatial location in the source video
triggers a different destination video at a specific point in
time.
Temporal link opportunities: Previews of destination video
scenes that are played back for a specified duration, at specified
points in time during the playback of the source video scene.
Spatial link opportunities: A dynamic spatial location
in the source video that can trigger destination videos if selected.
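Taken together, these definitions map naturally onto a handful of types. The sketch below is our consolidation in Python (the names and fields are ours, not a specification); it restates the glossary at the scene/sequence level, with the link and opportunity types sketched earlier hanging off each scene.

    from dataclasses import dataclass, field

    @dataclass
    class Scene:
        """Smallest unit of hypervideo: a set of digitized frames presented sequentially."""
        scene_id: str
        frames: list = field(default_factory=list)                 # frame/clip identifiers
        temporal_opportunities: list = field(default_factory=list) # previews with time windows
        spatial_opportunities: list = field(default_factory=list)  # dynamic regions in the frame
        default_next: str = ""                                     # played if no link is taken

    @dataclass
    class NarrativeSequence:
        """One possible path through linked scenes, assembled from user interaction."""
        scenes: list = field(default_factory=list)

        def extend(self, scene: Scene) -> None:
            """Record the scene the user just reached along this path."""
            self.scenes.append(scene)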
Now we can describe the possible range of structures that can
be formed within a general framework of hypervideo. While the
Amsterdam Hypermedia Model [14] provides a more generalized approach
to the representation and synchronization of media data types,
our framework provides a more specific structure for presenting
narrative sequences using hypervideo. Overall, it is our intent
that a hypervideo structure be able to embody sufficient definition
and abstraction to permit the creation and navigation of a network
of hyper-linked video scenes. The scenes may involve conversations
between multiple participants, distributed within the space or
time of the video narratives. It should be possible for the user
to explore the narrative spaces of the hypervideo while allowing
the developer/author sufficient control of the system to create
desired narrative and aesthetic effects.
At the lowest level of the hypervideo, frames of digitized video are assembled into logical units called scenes. An example of
a scene may be a character in the narrative walking to a door,
opening it and exiting the room. Scenes themselves are assembled
into larger structures which are displayed "end to end"
to form narrative sequences as shown in Figure 5. There are no
restrictions on either the size of scenes or narrative sequences,
or the number of scenes that make up a narrative sequence, except
that there must be at least one scene in each. However, scene
connections can also embody contextual information present in
the narrative, allowing multiple destinations (multivalent
links) as a scene plays out. Such contextual information permits
decisions based on chance (for a random selection of the next
scene within the context), the number of previous visits to that
scene, or whether or not the user has previously visited other
scenes in the "video space." These types of "decision
points" in the narrative are closely related to Zellweger's
programmable paths and variable paths [31].
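A decision point of this kind can be expressed as a small function over the navigation context. The sketch below is illustrative only (the particular rules and their priority are our assumptions), combining the three sources of context mentioned above: chance, visit counts, and whether other scenes have been seen.

    import random

    def choose_next_scene(candidates, visit_counts, visited):
        """Resolve a multivalent scene connection from the current navigation context.

        candidates   : scene ids reachable from the current scene connection
        visit_counts : how often each scene has already been played
        visited      : the set of scenes the user has seen in this session
        """
        unseen = [c for c in candidates if c not in visited]
        if unseen:
            return random.choice(unseen)          # chance, restricted to unseen scenes
        # Otherwise prefer the destination that has been revisited least often.
        return min(candidates, key=lambda c: visit_counts.get(c, 0))

    # Example: scene 7 is shared by two narratives; context decides scene 8 vs. scene 12.
    nxt = choose_next_scene(["scene_8", "scene_12"],
                            visit_counts={"scene_8": 2}, visited={"scene_8"})
    print(nxt)   # "scene_12": it has not been seen yet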
Implicit in this definition of scene connections is that narrative
sequences may "share" scenes as demonstrated in Figure
6. Here, scene 7 is utilized in both narratives and the decision
whether scene 8 or scene 12 follows is based on the context in
which the choice is evaluated. Note that this usage is similar
in some respects to Michael Joyce's hypertext fiction, afternoon,
a story [19]. A single node in afternoon can assume
multiple meanings if encountered in different contexts. While
the text in the node itself appears the same, context has altered
the meaning.
Figure 5: Linear Narrative Sequence of Scenes
Figure 6: Scenes Shared between Narrative Sequences
Throughout the hypervideo, decisions about the composition of
narrative sequences are being made not just by the developer,
but also by the user. As the author defines the sequence of scenes
in the narrative and their structured relationship to each other,
he or she designates some of them as link opportunities for the user.
(In general, the number of link opportunities will be substantially
smaller than the number of links connecting scenes, as most scenes
are fairly brief and are simply linked to the next scene of a
narrative for continuity.)
One can imagine that a hypervideo could be defined simply as a
graph of nodes and links, where the nodes of the video correspond
to our scenes. This would be equivalent to removing the outer
boxes in Figure 6 which represent the narrative sequences. While
this definition might be sufficient, it fails to convey a significant
semantic aspect of the structures. We consider the body of narrative
sequences a "map," overlaid onto the graph structure
of the hypervideo to provide authors with a familiar (and higher-level)
concept. The narrative sequences in our formulation of hypervideo are similar to the writing spaces in Storyspace [30] or
the "overviews" used in Brown University's Intermedia
system [21].
Figure 7: Spatial Map of Narrative Sequences
It may be desirable to define other semantic maps to provide alternative
frameworks for the hypervideo structure. An example might be a
spatial map to conceptually guide authors in relating space to
the structure of scenes (via the spatial link opportunities).
The example in Figure 7 shows how each of the exits of a cafe
could result in particular video sequences. Navigational bridges,
consisting of specific camera pans around the cafe, are utilized
between the "narrative video spaces." This abstract
spatial map allows the author to more easily visualize the fictive
space of his or her creation.
INTERSECTION OF HYPERTEXT AND FILM/VIDEO

In this section, we discuss the influences of film and hypertext on our work, and the implications of this new discursive formation,
hypervideo.
HyperCafe and Hypertext

HyperCafe engages Jay David Bolter's notion of "topographic
writing," that is, "writing with places, spatially
realized topics" [7]. While HyperCafe is not an electronic
writing environment, the spatiality and placement of text and
video on the screen are vital to the user's experience. The
user creates his or her own "videotext" artifact with
HyperCafe. The nature of HyperCafe's video
interaction is, in Michael Joyce's words, exploratory (as
opposed to constructive) [18]; i.e., users cannot add their
video work to the body of HyperCafe. This constraint is
largely imposed by the medium we are working in: digital video takes time and resources to produce, from shooting and digitizing to the disk storage space required. However, in future prototypes we would
like to make provisions for the user to add his or her own text in
selected spaces, and to link it to video clips and textual interactions.
While the Cafe isn't yet a realized constructive hypertext,
it is desirable that the user be able to save their links through
the program, providing for them a memory (or mis-memory) of their
encounters, should they wish to return to the program and recover,
replay, or rewrite those encounters. They are, in Joyce's
words, "versions of what they are becoming, a structure for
what does not yet exist" [18].
As the user moves through conversations and makes choices, the
spatial and narrative contexts necessarily shift: videos play
in different portions of the screen or concurrently, suggesting
relationships between the clips based on proximity, movement,
and absence; text appears and disappears, ghosted annotations
and mock dialogue-revisions. These shifts and events appear based
on user interaction, or are intentionally hidden. The same clip
may play during a "car crash" narrative line as would
play during a "do you remember me?" narrative sequence.
The clip stays the same-the context changes. By recontextualizing
or repositioning identical clips at several points in the program,
we are shifting the meanings of our media, asking the user to
engage in building the text and context, making meaning.
In HyperCafe, there is an inherent determination to make
all chance encounters of the videotext meaningful. The navigation
is thus always "contingent" and the reading is subject
at every moment to "chance alignments and deviations that
exceed the limits of any boundaries that might be called 'context'."
[16]. J. Yellowlees Douglas, in charting the "narrative of
possibilities" of afternoon, a story, describes the
experience of visiting the same space four times and not realizing
the words were the same, that only the context had changed [11].
Douglas uses afternoon and WOE, also by Joyce, as
examples of Umberto Eco's concept of the open work, or a
work whose possibilities even on multiple readings are not exhausted.
When the user's session with HyperCafe ends, contingencies
remain, based on "indeterminabilities operating between the
gaps of the reading" [16], leaving behind the possibility
of an unexhausted, if not inexhaustible, text.
HyperCafe is not entirely textual, or even primarily textual; it
communicates most of its information to the user with digital
video. We have attempted a marriage of film and hypertext, whose
properties (in avant-garde film, at least) are already quite
similar.
HyperCafe and Film

HyperCafe implicates cinematic form as one of its models/modes
of representation. Our digital video clips are composed entirely
of head and body shots of actors, and of movement between them.
Close-ups are used to convey a sense of emotion or urgency, and
panning long shots (establishing shots) are used to set up the speakers, showing them in relation to one another. However,
another process of signification is at work with our choice of
long shots. They allow users to navigate between actors and between
stories. When actors are in view at a point where a link space
exists, the user may choose to move there. Thus, the pan signifies
differently in HyperCafe than it would on a movie screen.
A detached pan becomes an opportunity for action, serving as a
navigational bridge to and between narratives.
Instead of complying with the typical shot/reverse shot style
of representing conversation in film, HyperCafe allows
for a new grammar to be defined. Reverse shots can be answered
by an actor on the other side of the screen, from an entirely
different clip. Textual intrusion in the space between could disrupt
or enable the conversation. The mise-en-scene of the computer
screen, then, can be defined in wholly different terms. Shot composition
is no less important in hypervideo, but the choice of elements
with which to compose are quite different than in film.
Just as film form structures the user's interaction with
the text by the filmmaker's choice of shot, lighting, actor's
delivery of lines, and cinematography, so too does the hypervideo
interface structure a reader's formal interaction with the
videotext. Here, a user's assumptions about computing environments
and existing hypermedia applications will structure his or her
assumptions about interaction, assumptions which will invariably
prove different from that user's assumptions while watching
a movie.
While we are clearly departing from traditional film and video
form, questions must be asked: does HyperCafe reveal power/authority/univocal
traits of traditional cinema? What new (if any) forms of representation
are we enabling? These questions must necessarily be answered
by extended evaluation and analysis by these authors and other
interested theorists. Hypervideo is in its infancy; there exists
a unique window of opportunity today to invent this discipline,
to move away from traditionally confining modes of representation,
to support a more open, collaborative body of work.
CONTENT PRODUCTION AND PROTOTYPE DEVELOPMENT

The content for HyperCafe consists primarily of digital
video, taped with two Super-VHS video cameras (HyperCams 1 and
2). External microphones captured individual conversations and
ambient room noise. The two cameras simultaneously provided two
different perspectives/shot angles for each scene. One camera
generally remained stationary, providing a long shot, while the
other was mobile, providing close-ups and movement within the
shot. Some extreme close-ups (like an actor's lips) provided
the desired dramatic effects for particular conversations. Several
top-level and below-level pans (shooting the actors'
feet) were taken to provide navigational bridges between
tables.
After the video shoot, all the video scenes (over 3 hours) were
edited, manually logged and transcribed into Storyspace
[30]. A linear thread through the Storyspace document was
created, and other interpretative hyper-links were added later (Figure 8). Storyspace served as a powerful tool for assessment
of the video scenes and greatly aided in the editing and selection
of appropriate scenes for use in the digital version. The Storyspace
"hyper-script" was also utilized to create and simulate
(in text only) the multiple narrative sequences through the digital
video. It must be noted that in HyperCafe, we did not utilize
the potentially rich semantic space of the spatial hypertext nodes
[24]; our screen mapping of the conversations closely followed
the general structure of the writing spaces in Storyspace
and the actual layout of the Cafe. It would be interesting to
consider the effect of overloading the arbitrary or intentional
semantics of the writing spaces to influence the architectonic
[20] presentation of the hypervideo.
Figure 8: A portion of the HyperCafe script organized and hyperlinked
using Storyspace
Selected video scenes were captured on a Macintosh PowerPC 8100/80
with Adobe Premiere 4.2 [1] using a 160x120 resolution
at 15 frames per second (fps). A black/white filter was applied
to the video, producing a film-like grainy quality. A total of
25 minutes of video was eventually captured, occupying 300 MB
of disk space. These files were segmented into 48 QuickTime movie
files, and compressed via a Cinepak codec, reducing the combined
size of the files to 150 MB. Macromedia Director 4.04 on
the Macintosh [23] was selected as the development platform for
the initial prototype. Director was utilized due to its ability
to control digital video and for rapid prototyping of the design
concepts. All the QuickTime movies were imported into Director's
multimedia database. Ambient restaurant sounds and interpretative
text were added for use in several narrative sequences. Lingo
scripts were written to provide interactivity and hyper-linking.
The completed Director movie was compiled into a projector (movie
player) that could be played back independently on the Mac or
PC platform.
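As a rough check of the numbers reported above (our arithmetic, not the paper's), the capture and compression figures imply roughly the following sustained data rates:

    # 25 minutes of 160x120, 15 fps video: 300 MB captured, 150 MB after Cinepak.
    captured_mb, compressed_mb, minutes = 300, 150, 25
    seconds = minutes * 60
    print(f"captured:   ~{captured_mb * 1024 / seconds:.0f} KB/s")    # ~205 KB/s
    print(f"compressed: ~{compressed_mb * 1024 / seconds:.0f} KB/s")  # ~102 KB/s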
TOWARDS A HYPERVIDEO TOOL

Having developed a proof-of-concept prototype in Macromedia Director,
we believe that a broader software tool can be developed for hypervideo
authoring and navigation. The tool could function similarly to
Storyspace, in that it would permit placement of (pointers
to) digital video and textual content into a hierarchy of hyper-linked
nodes. Several navigational paths through these video nodes could
be authored. Time attributes should be integrated in the hypervideo
model and manipulated at a higher level [9] such that temporal
links can be synchronized with playing video. In the navigation
mode, the software would dynamically generate link opportunities
and permit multiple paths through the video text.
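A Storyspace-like hypervideo tool of this kind might center on a node type that holds a script fragment before any video exists and a pointer to the captured scene afterwards. The Python sketch below is purely illustrative (the class, fields, and file names are our assumptions, not a proposed API):

    from dataclasses import dataclass, field

    @dataclass
    class HypervideoNode:
        """An authoring node holding a script fragment and, once shot, a pointer to video."""
        node_id: str
        script_text: str = ""        # pre-production: hypertextual script stands in for video
        video_path: str = ""         # post-production: pointer to the captured, edited scene
        children: list = field(default_factory=list)        # nested nodes in the hierarchy
        temporal_links: list = field(default_factory=list)  # (time in seconds, target node id)

        def attach_video(self, path: str) -> None:
            """Swap the script placeholder for the captured video scene."""
            self.video_path = path

        def playable(self) -> bool:
            return bool(self.video_path)

    # Pre-production: author the linked hierarchy with scripts only.
    root = HypervideoNode("cafe_overview", script_text="Camera pans across three tables...")
    table2 = HypervideoNode("table2_conversation", script_text="'Do you remember me?' he asks.")
    root.children.append(table2)
    root.temporal_links.append((12.0, "table2_conversation"))

    # Post-production: drop the captured scenes into the same nodes.
    table2.attach_video("movies/table2_conversation.mov")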
Such a tool should aid both in the pre-production and post-production
of the hypervideo product. In pre-production, when no video has
been shot, the tool could be used to write hypertextual scripts
(in place of the video), and thus build the initial linked hierarchy
of the videotext. The hypertextual scripts would later aid in
editing and selecting appropriate video scenes. In post-production,
after all the video has been shot, edited, and captured, the video
scenes would be simply placed in the appropriate nodes. Additional
interpretative text, temporal, and spatio-temporal links could
be added. It must be noted that temporal links are explicitly
added by the authors, based on their creative intentions about
the videotext. These links later become "opportunities"
that are generated for the user as the videotext is navigated,
based on the state of the user in the system (see MIT's ConTour
system [10], for a more formal approach to dynamic generation
of media elements). The videotext could be navigated or "read"
at any time in the production. In fact, the videotext would never
be complete since each "reader" may be allowed to add
their own interpretative text, voice annotations, and links to
continually expand the hypertext and overall link space. Users
could create several personalized interpretations of the videotext
or collaborate to develop a single multi-user videotext.
HyperCafe demonstrates an application of hypervideo for
the production of fictional narratives. This could be utilized
by writers, film and video producers to develop non-sequential,
hyper-linked video narratives. Layers of textual interpretations
can be added to these narratives enabling richer experiences.
Different characters could be utilized to drive the narrative
along varying paths, such as in Agent Stories [3]. We hope
that this framework sets the stage for new forms of creative expression,
where the interaction and navigation through the videotext itself
creates the aesthetic and personalized interpretations.
A hypervideo tool could also be utilized by filmmakers producing
conventional films. It would offer the filmmaker alternative structures for conceiving and creating a film that would not otherwise be possible. For instance, dynamically previewing alternate
takes and sequences is possible with hypervideo; sequences can
be mapped out and rearranged, much as nonlinear digital video
editing suites enable video editors to do. However, with a built-in
hypertext tool in the package, a hypertext script could be linked
to the video sequences, and a hypervideo tool could be used to
make hotspots in the video frame, or link related text to a particular
point in a video sequence. The entire process, from conceiving a film script to editing a film, even a linear film, could potentially
be reorganized and integrated using a hypervideo tool.
FUTURE WORK

For any hypermedia tool, issues regarding scalability eventually
become important and sometimes problematic. A reasonable number
of links from one video node to another should be supported. If
at a point in time more temporal links exist than can fit on the
screen, thumbnails should be automatically generated or the number
of simultaneous temporal links constrained. Another concern is
the possibility of "dead ends" to the continuously playing
video narrative (what Terry Harpold considers the "moment
of the non-narrative" [17]). One of the aesthetic goals in
HyperCafe was to never permit a moment where the video
would stop and break the cinematic experience of the user. Yet
as the nodal structure of the videotext grows more complex, the
authors must painstakingly ensure that all video sequences lead
to other sequences, and thus all nodes are non-terminating.
A suitable approach needs to be developed to ensure that any terminating node returns the user to a prior node, the next ordered node after the prior node, or a parent node. In addition, specific
"filler" sequences could be shot and played in loops
to fill the "dead-ends" and holes in the narratives.
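One plausible shape for such an approach, purely as a sketch (the data layout and the priority among the options are our assumptions), is a resolver that always yields somewhere for playback to continue:

    def resolve_dead_end(node, history, parent, next_in_order, filler_loops):
        """Pick a continuation for a terminating node so the video never stops.

        The options mirror those discussed above; their priority here, and the data
        layout (history stack, parent map, ordering map), are our assumptions.
        """
        if history:
            prior = history[-1]
            # Prefer the next ordered node after the prior node, else the prior node itself.
            return next_in_order.get(prior, prior)
        if node in parent:
            return parent[node]          # hand control back to the parent sequence
        return filler_loops[0]           # otherwise play a looping "filler" sequence

    # Example (hypothetical node names):
    resume = resolve_dead_end("scene_dead_end", history=["scene_7"],
                              parent={"scene_dead_end": "table2_conversation"},
                              next_in_order={"scene_7": "scene_8"},
                              filler_loops=["cafe_ambience_loop"])
    print(resume)   # "scene_8"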
During the video production, innovative camera techniques aided
in producing "navigational bridges" between some scenes
without breaking the filmic aesthetic. Movement between tables
in the cafe was enabled by triggering specifically shot video
segments between the tables. We can point to one recent attempt
to produce continuous digital video navigation in the CityQuilt
[13] program by Tirtza Even. In CityQuilt, Even employed
digital video techniques in Adobe Premiere to provide panning
within moving video, permitting a user to navigate across an endless
canvas of video scenes of New York city. We believe that further
development of such analog and digital video production techniques
is essential to effectively address the navigational issues in
hypervideo.
Collaborative hypervideo would require distributed video available
over networks with suitable bandwidth. In 1993, the film Wax
or the discovery of television among the bees was multicast via the MBone (multicast backbone) of the Internet to about 450
sites. Later "Waxweb" [6] was developed as an experimental
hypertext groupware project. This suggests that a system permitting
hypervideo (authoring and navigation) over the Internet would
be desirable. If all the video content is made available on CD-ROM,
then it is feasible that multiple versions of the "videotext
structure" (i.e., player files) could be accessed and modified
by users over the Internet (via WWW or MOO), permitting collaborative
authoring of hypervideo. In a multi-user environment, appropriate
locking schemes would ensure data integrity if users are permitted
to add their own links or text annotations. Some form of version
control may allow users to access all previous evolving links
and annotations, and hence "navigate temporally" through
the historic state of the videotext.
The current system provides navigation of hypervideo in a two-dimensional
space. Previously, tools have been developed to present motion
picture time as a three-dimensional block of images flowing away
in distance and time [12]. For hypervideo, a three-dimensional
space would permit representation of multiple (and simultaneous)
video narratives, where the added dimension could signify temporal
or associative properties. Video sequences that have already been
visited could be represented by layers of (faded) video receding
into the background. Additionally, a series of forthcoming links
and prior (unrealized) links could also be shown in the foreground/background
of the three-dimensional space. A 3D authoring and navigation
tool for hypervideo, perhaps called "VideoSpace" (not
unlike Storyspace for hypertext), could be envisioned for
several interesting applications, such as multi-dimensional hypervideo
information spaces.
CONCLUSION

The hypervideo system described in this paper provides a glimpse
into the potential of creating dynamic hyper-linked videotext
narratives. An aesthetic design of navigation and structural representation
permits a new form of videotext expression for authors, and interpretative
experiences for readers. HyperCafe is unique in that it
presents "aesthetic opportunities" for navigating to
linked narratives, primarily on a spatial and temporal basis.
The presence of temporal and spatio-temporal link opportunities
creates a new grammar for hypermedia applications, based on a
cinematic language. In HyperCafe the textual narratives
intersect with dynamic video sequences, producing interpretative
videotexts. The techniques and methodologies for navigation in
the "video space" are experimental and untested, yet
provide an appealing minimalist approach. We hope that other developers
and theorists will consider the definition and navigational structures
we have mapped for hypervideo in constructing their own hypermedia
applications and creative expressions.
ACKNOWLEDGMENTS

HyperCafe would not have been possible without the guidance
and assistance of faculty and several of our colleagues at the
Information Design and Technology program at Georgia Tech. We
must first thank Matthew Causey for his assistance in the video
production, and his continued enthusiasm for the project. Terry
Harpold provided valuable input during prototype design and development.
We wish to thank Melissa House and Mary Anne Stevens for their
camera work; Kelly Allison Johnson, Shawn Elson, Carolyn Cole,
and Andy Deller, for their talent and patience as actors in our
video production. Thanks to Jay David Bolter, Andreas Dieberger,
Terry Harpold, and Stuart Moulthrop for their invaluable comments
on this paper.