1. Rhetorical Convergence
How do you describe a Web
service? Often, the easiest solution is to compare it to
earlier media. In the case of a modern news site, we might
say "it is like a newspaper, but you can watch video there,
and it is constantly updated."
As we recognise a Web site
as similar to several earlier media, it is the patterns and
styles of writing, of photographing, of editing we
recognise. All these different kinds of details may be
covered with the term rhetoric. Earlier, I have
suggested the term rhetorical convergence for Web
sites that in this way are "between" earlier media
("Rhetorical Convergence").
Rhetorical convergence
is the combination in one medium of rhetorical forms or
devices that were earlier only seen in separate media. The
present essay is an attempt to chart the ease or difficulty
with which rhetorics in different media combine. Which forms
may be merged, which have to be altered, and which are
mutually exclusive? I will propose a multidimensional
model of media rhetoric, collected in four "axes" that
enable an overview of the different convergences of media
forms.
Along the way we will
examine a long series of examples, all perceived to be mixing
rhetorics of several established media. Each example shows a
form very similar in one respect to what we are used to seeing
in an earlier medium, but in another respect just like
another earlier medium; just as a daughter may be the
spitting image of her mother, yet no one can deny she has
her father's eyes. To gain an overview, I have collected all
such similarities "in one respect" and "in another respect,"
and tried to reduce the number of "respects" by assembling
them into groups and constructing a theoretical
model.
This is done regularly
within statistics; one may for example measure a set of
objects in many ways with many different instruments. For
the statistician, each reading would then be one dimension.
Using mathematics, the number of dimensions may further be
reduced, for example to three, by projecting the
measurements down to three dimensions. The objects may then
be compared using only these three numbers, a much more
manageable task.
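The statistical procedure referred to here can be illustrated with a minimal sketch. It is not part of the model proposed in this essay, which is qualitative; it only shows what "projecting down" means. The function names and the sample readings are hypothetical, and for brevity the sketch reduces the readings to one dimension rather than three, using power iteration to find the direction of greatest variance (the first principal component).

```python
import math


def first_principal_axis(rows, iterations=200):
    """Return (means, axis): the column means and the unit direction
    along which the centred measurements vary the most."""
    n, dims = len(rows), len(rows[0])
    means = [sum(row[d] for row in rows) / n for d in range(dims)]
    centred = [[row[d] - means[d] for d in range(dims)] for row in rows]
    # Sample covariance matrix of the centred measurements.
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(dims)]
           for i in range(dims)]
    # Power iteration: repeatedly multiplying a vector by the covariance
    # matrix turns it towards the direction of greatest variance.
    axis = [1.0] * dims
    for _ in range(iterations):
        product = [sum(cov[i][j] * axis[j] for j in range(dims))
                   for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0:
            raise ValueError("start vector is orthogonal to all variation")
        axis = [x / norm for x in product]
    return means, axis


def project(rows, means, axis):
    """Reduce each multi-dimensional reading to one number on the axis."""
    return [sum((row[d] - means[d]) * axis[d] for d in range(len(axis)))
            for row in rows]


# Hypothetical readings: four objects, three instruments each.
readings = [(1.0, 2.0, 0.0), (2.0, 4.0, 0.0), (3.0, 6.0, 0.0), (4.0, 8.0, 0.0)]
means, axis = first_principal_axis(readings)
scores = project(readings, means, axis)
```

Each object is now described by a single score instead of three readings. The fixed start vector is a simplification; a more careful implementation would choose it at random and extract further axes in the same way.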
When Web sites are compared
to texts in earlier media, the many aspects that we will
discuss below cannot always be expressed quantitatively. I
am not proposing a mathematical or statistical model; I am
using statistical methods as an analogy for the process of
collecting related aspects into larger
categories.
The approach is not without
precedent in studies of texts either. Barthes' method of
textual analysis, developed over a number of essays and
perfected in S/Z, consisted of dividing the text
into small units, and registering the presence of five
co-present "codes." There are several different sub-types
under each code. Chronology, descriptions of time and
place, and common sayings are, for example, coded as the
"referential code" as they all refer to common knowledge
within a culture. The code of actions consists of all the
diverse actions performed by the characters in a story. To
explain this, Barthes used the analogy of a polyphonic
musical score, where several voices develop relatively
independently in the same piece. Genette, on the other hand,
used the term dimensions in The Architext: An
Introduction when arguing that genres are recognised by
(at least) shared themes, modes of enunciation and "formal
'medium' of imitation (in what language, in what
meter, etc.)" (78), aspects that in their turn are made of
different parts or dimensions, such as different meters or
different enunciative modes (of which all of narratology is
just one part).
I will use the analogy of
"axes" for this way of ordering the large number of
different qualities a text has, so it is easier to get an
overview of all the details. Still, it is not my ambition
to reduce the complexities of architextual
references we here call rhetorical convergence. Rather, I
want to present a perspective that enables us to see the
full range of complex relations, just as Barthes did in
S/Z and Genette in Architext.
In constructing the
following model, I have primarily relied on textual analysis
of Web sites, but earlier literature has been an important
source of inspiration. One cannot read much into the topic
of computer media without noticing all the deliberations on
two themes I will summarise as multimedia and
interactivity. Under the heading multimedia
I put texts about the computer's ability to combine
different sign systems such as writing, images, video and
music. [1]
Although the term interactivity is contested, I use
it here as a heading for literature concerned with how
computer texts respond to user input. [2]
Writers of hypertext theory
and Web design [3]
often comment on the fact that hypertext systems such as the
World Wide Web are simultaneous media that allow for
continuous updates and alterations, and may allow users to
comment or modify texts. This can be treated in relation to
the stress on live broadcasting within radio and television
that is theorised within media studies [4].
A Web site may be live and archive at the same
time.
More practical writers on
Web design [5]
never fail to stress how Web authors are constrained by
technical limitations such as bandwidth, screen resolution
and colour depth. We will see that much of the
"interactivity" in Web sites actually consists of remedies to
compensate for low screen resolution and
bandwidth.
I have chosen to project the
many dimensions of Web rhetoric on four axes. Multimedia,
interaction, "liveness" and technical limitations on the Web
were refined into four axes of rhetorical convergence and
labelled signification, acquisition, distribution
and restrictions. They are called "axes" for
several reasons; the first is that they are co-present.
Texts and genres are characterised by how authors have
chosen to combine these four aspects. They are different
sides of a text, different dimensions. Furthermore, each of
these four aspects is a range of choices, on which texts
will position themselves differently. A broadcast is either
live or not. A text may be written, spoken, or both. When
discussing the four axes below, I will discuss different
modalities, or positions, along each axis, and a text's
position will thus be called its "mode of" distribution,
acquisition, restrictions, and signification.
I propose these four axes as
a projection, a way of getting an overview of the many
dimensions of rhetoric. I think of rhetorical convergence as
taking place in a four-dimensional space (to the extent I am
able to visualise such a construct). When we discuss popular
media what we tend to discuss is really their typical
genres. Examples from television might be the news show, the
serial, the soap opera and the talk show, from newspapers,
the report, the feature, and the op-ed. Each of these
media-specific realisations of different genres would occupy
a unique space in my four-dimensional space if their
modalities along the four axes were mapped. Media texts that
combine aspects of different genres would then position
themselves between the positions of established media,
similar to one genre along one axis, similar to another
genre along another axis.
We may thus rephrase
rhetorical convergence as being a form from one medium or
genre that is copied in another medium, but where the copy
differs from the original by substituting its mode on one or
more of the axes with the mode of a form we recognise from a
different medium's rhetoric.
Before the computer, the
mode of distribution, mode of acquisition, mode of
restrictions, and mode of signification were
usually given by technological constraints or conventions of
the medium. What makes rhetorical convergence an interesting
study object is that in a computer medium, few of these
aspects of rhetoric are given. Instead they are ranges of
choices to the author.
2.1. Mode of Distribution: Bandwidth
Let us use the terms
creation and distribution to describe the
process of getting the messages from author to audience, a
process that takes time. Authors are aware of the time it
takes, and this influences media rhetoric. Newspapers are
printed during the night and distributed to subscribers and
newsstands in the early morning. It is a carefully timed
process, and all journalists have to deliver by the deadline
or the newspaper is delayed. In broadcast media, the
distribution from sender to receiver is almost
instantaneous, but the creation of the broadcast takes a lot
more time. Recording and editing sound or video may take a
lot of time, and the raw material has to be transported to
the editing facility, and then to the sender facility. For
fast evolving news, the first reports are likely to be read
from studio or reported on telephone from a reporter in the
field, and only in later newscasts does edited video
material appear. That creation and distribution times
determine the rhetoric is true for all media. But the Web is
fundamentally different from earlier media as its
distribution time is variable.
A Web page is not loaded
before the reader asks for it, by typing in a Web address (a
URI) or following a link. Following Jan L. Bordewijk and Ben
van Kaam's typology from the article "Towards a New
Classification of Tele-Information Services," the Web's
traffic pattern is consultation, as it is a
relation with one sender (the server of the page being read)
to (potentially) many receivers, and the receivers initiate
the transaction. In contrast, the traffic pattern of earlier
media is transmission in this model. In order to
see how this difference makes Web rhetoric different, it is
useful to look at three different aspects of distribution
which we will call bandwidth, latency, and
permanence.
As the Internet is a
packet-switching network of many different networks with
different standards, the capacity of transmission, known as
bandwidth, varies enormously. If the files that are
transported over the net are small, however, the user may
not necessarily notice the bandwidth difference; research in
Human-Computer Interaction has shown that we experience
everything that takes less than about a second as
instantaneous (Nielsen 42-44). For static pages (without
sound or moving parts), Nielsen's studies conclude that we
normally do not care much about differences of a couple of
seconds either (44). But some sign systems are slower to
distribute than others. When different sign systems are
coded in computers, the file formats take up different
amounts of space. Text and numbers are coded extremely
economically. Images are coded less economically, sound
takes up more space than images, and video takes the most
space. Images, sound, and video may be compressed to be less
voluminous, at the expense of image or sound quality. Heavy
compression also requires more computer power, both to "wrap
for shipping" through the net and to "unwrap" or "deflate"
at the receiving end. A slow computer will thus slow down
the display of compressed media, but again, as long as the
decompression takes less than a second, the reader won't
notice.
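The relation between file size, bandwidth and the one-second limit can be sketched in a few lines. This is a rough illustration only: the file sizes and the 56 kbit/s modem line are hypothetical figures chosen for the example, the names are invented here, and the calculation ignores compression time and network overhead.

```python
# Nielsen's rule of thumb as cited above: anything that takes less than
# about a second is experienced as instantaneous.
INSTANT_LIMIT_SECONDS = 1.0


def transfer_seconds(size_bytes, bandwidth_bits_per_second):
    """Time needed to move a file of the given size over the line."""
    return size_bytes * 8 / bandwidth_bits_per_second


def feels_instant(size_bytes, bandwidth_bits_per_second):
    """True if the transfer stays under the one-second limit."""
    seconds = transfer_seconds(size_bytes, bandwidth_bits_per_second)
    return seconds < INSTANT_LIMIT_SECONDS


# Hypothetical figures, for illustration only.
MODEM = 56_000  # bits per second
EXAMPLES = {
    "page of writing": 5_000,       # bytes
    "colour photograph": 60_000,    # bytes
    "short video clip": 2_000_000,  # bytes
}

for label, size in EXAMPLES.items():
    seconds = transfer_seconds(size, MODEM)
    print(f"{label}: {seconds:.1f} s, instant: {feels_instant(size, MODEM)}")
```

Under these assumed figures only the page of writing stays below the limit, which is the ordering of sign systems described above: writing is cheapest, video most expensive.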
A Web designer who is aware
of this fact will thus face choices that are unique to
computer communication. Many pictures in a magazine or
quality footage in television may slow down the production
process and bring up production cost, but the images always
arrive in the newsstand with the rest of the magazine.
Television would be a very different experience indeed if we
had to wait for a video segment in television news, while a
talking head was instantaneous. In Web television, this may
very well be the case, so in Web design, "heavy" or "slow"
elements such as big colour pictures or video have to be
used sparingly if the pages are meant to be appreciated by
any Web surfer.
The "interactive feature"
"Sights
and Sounds of the Way West"
in Nationalgeographic.com provides an example of an
aesthetic grown out of bandwidth concerns. "The Way West" is
a movie created solely of still photographs that are "made
moving" as if filmed by a movie camera that moved over each
image, zooming in or out. The projection of the show is
timed to a soundtrack with music, sound effects and
narration. Animating still images in this way would only be
used in mainstream television when there is no video footage
available, as often is the case in history documentaries. In
"The Way West", on the other hand, the whole feature is made
in this manner. In the introduction, for example,
photographer Jim Richardson, who is also the narrator,
introduces himself in voice-over, illustrated with two shots
of him doing field work. In television, it would be rare
indeed not to show the moving image of Richardson speaking.
On the Web, on the other hand, it makes more sense to do it
in this way, as the still images made moving can be large,
fine-grained and have rich colour while keeping the movie
files relatively small. Video would take up much more
computer memory, and thus require more bandwidth (or
patience during download) to be played.
The introduction sequence of
"The Way West" shows the influence of bandwidth concerns on
rhetoric in more detail. "The Way West" is made in Flash,
and requires that a fairly large file be downloaded before
one can start watching. To cover the download time, the
authors have taken advantage of the fact that writing loads
faster than recorded sound and colour photographs. While
download still is in progress, an animated title sequence
appears.

http://www.natinalgeographic.com/ngm/0009/feature2/media2.html
The title
screen shown above appears one word at a time. After a while
(how long depends on the download), the letters are animated
as "blowing away," accompanied by wind sounds.

A written
introduction to the sequence then appears three lines at a
time. The appearance is timed to the download, and on a fast
connection, the next three lines will appear a little before
one is finished reading the previous at normal reading
speed.


A little slower than a
television introduction, this "splash sequence" as it is
called in industry lingo cleverly fills the download time
with interest, and at the same time makes a smooth
transition to the moving images that are to
follow.
This is just one example of
how slide-motion films such as "The Way West" stand midway
between television and print rhetoric. The introduction
contains more writing than is normal on television, but it
is dynamic, it appears over time. In the body of "The Way
West", narration resembles radio (as it with few exceptions
is totally comprehensible without the images), but is
illustrated; the illustrations are not video but still
images made for print; yet they are not completely still, as
they are viewed through a moving frame. "The Way West" is
not exactly print, radio, or television, but borrows
a little from each; and it makes a reasonable compromise of
bandwidth concern and television rhetoric.
The concern for bandwidth is
principally about reducing the time readers have to wait for
a page to load. I call this waiting time latency,
and it is worth considering in itself.
2.2. Mode of Distribution: Latency
Radio and television - the
broadcast media - eliminated distribution time when they
were introduced. Radio or television signals travel from
sender to receiver in almost no time, and in the years
before storage media such as magnetic tape were introduced
to broadcasting production, there was no delay between
creation and broadcast either. Nowadays, most television
programs are broadcast from an edited tape, but the
possibility (and the history) of broadcasting live is still
what gives television both its particular aesthetic, and its
particular place in our lives, according to Raymond
Williams, John Ellis, and Umberto Eco (Kant and the
Platypus), to mention just a few.
We saw in 2.1 that bandwidth
is a concern to all Web designers, as the Web is slow for
many users. Still, live broadcast is an absolute possibility
on the Web. In order to see the differences and similarities
between broadcasting and Internet media, we have to consider
two different measures of network speed. Computer scientists
speak of bandwidth and latency as two
related, but not identical features. Bandwidth, as
we saw above, is a line's capacity to carry large amounts of
information. Latency is the time it takes for a
single packet to be transmitted over a certain line. A
simplified explanation could be that latency
measures the time it takes before the user gets the
very first part of a message, while bandwidth
measures the time it takes before the user gets the
very last part, and the message is complete. A physical line
has the same bandwidth all the way, while latency increases
the longer the line is [6].
An example may clarify the distinction: With popular
"streaming media" formats such as Real Media, Windows Media
or QuickTime, it is possible to broadcast live video even
over narrowband modem connections. The image will be smaller
and with less detail than what would be possible on a
broadband connection, but the latency is short enough that
it is reasonable to call it live.
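The simplified explanation above, where latency governs when the first part arrives and bandwidth governs when the last part arrives, can be sketched as follows. The class, its method names and the figures for the modem line are assumptions made for this illustration; real networks add protocol overhead and variation that the sketch leaves out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    latency_seconds: float            # delay before the first part arrives
    bandwidth_bits_per_second: float  # capacity once data is flowing

    def first_part_arrives(self):
        """Waiting time before anything shows up at the receiving end."""
        return self.latency_seconds

    def last_part_arrives(self, size_bytes):
        """Waiting time before a file of the given size is complete."""
        transfer = size_bytes * 8 / self.bandwidth_bits_per_second
        return self.latency_seconds + transfer

    def can_stream(self, stream_bits_per_second):
        """A stream keeps up if each second of material fits into
        one second of the line's capacity."""
        return stream_bits_per_second <= self.bandwidth_bits_per_second


# Hypothetical narrowband connection.
modem = Line(latency_seconds=0.2, bandwidth_bits_per_second=56_000)
print(modem.first_part_arrives())      # short wait for the first part
print(modem.last_part_arrives(7_000))  # longer wait for the whole file
print(modem.can_stream(40_000))        # a small, low-detail video stream
print(modem.can_stream(300_000))       # a larger, more detailed stream
```

In this sketch a live stream over a modem is possible as long as its data rate is kept below the line's capacity, which is why the image has to be smaller and less detailed while the delay stays short.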
Both latency and bandwidth
may determine the rhetoric chosen, as authors strive to
match their texts to the expected bandwidth and the latency
they wish to obtain. The "Web TV" made by Yahoo!
FinanceVision is an excellent example of
this.
Until 2001,
FinanceVision was a live "Webcast". It was a page
where a video pane showed live streaming video of a talk
show about finance and stock markets in fairly traditional
television style, while stock quotes and other investor
information appeared in another pane whenever a company was
mentioned. FinanceVision's creators chose to make
it a live show, maybe to capture a fast-moving market, maybe
to obtain connotations of "being there when it happens,"
maybe both. The visual style and the semi-scripted
conversation closely resembled what can be seen on
television every day in a lot of countries. At the same
time, keeping the clothes of the presenters in few, clear
colours and having a similarly clean and monochrome studio
made video compression more effective, as there was less
information in the image. Few camera movements and edits,
and programme hosts sitting fairly still added to this. By
ensuring there is as little difference as possible from one
frame of the video to another, compression algorithms work
more effectively to get high quality video over low
bandwidth, while ensuring a short enough latency to call it
live.
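Why small differences between frames help can be shown with a toy scheme that stores the first frame in full and afterwards only the pixels that changed. This is a deliberately naive sketch, not the method used by Real Media, Windows Media, QuickTime or FinanceVision, whose codecs are far more elaborate; the function names and the tiny 16-pixel "frames" are invented for the illustration.

```python
def frame_delta(previous, current):
    """Return (index, new_value) for every pixel that changed."""
    return [(i, new)
            for i, (old, new) in enumerate(zip(previous, current))
            if old != new]


def apply_delta(previous, delta):
    """Rebuild the next frame from the previous one and its delta."""
    frame = list(previous)
    for index, value in delta:
        frame[index] = value
    return frame


def encoded_size(frames):
    """Number of values stored: the whole first frame, then two values
    (position and new value) per changed pixel in each later frame."""
    total = len(frames[0])
    for previous, current in zip(frames, frames[1:]):
        total += 2 * len(frame_delta(previous, current))
    return total


# A calm studio shot: one pixel changes, then nothing moves.
calm = [[0] * 16, [0] * 15 + [1], [0] * 15 + [1]]
# A busy shot: every pixel changes in every frame.
busy = [[0] * 16, [1] * 16, [2] * 16]

print(encoded_size(calm), encoded_size(busy))
```

The calm sequence needs far fewer stored values than the busy one, which is the effect that monochrome studios, still presenters and few edits exploit.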
Stig Hjarvard defines
live as the near simultaneous events of media
creation, broadcasting and consumption. Near simultaneous,
for in fact, there will always be some latency - for
satellite broadcasts, it may approach a second. Following
this definition, Web cameras are also "live". Web cameras
take snapshots of a scene at regular intervals, for example,
every minute, and the latest image is always available from
the server connected to the camera. Many Web camera pages
have text next to the camera pane reading "watch place x
now!", and the parallel to television cameras used for
surveillance is easily drawn, as Bolter and Grusin have
treated in depth. Bolter and Grusin also point out how
similar the promises of Web cameras are to television's aura
of being live.
But if a Web camera is
"live", then any Web page is live, as it is available from
the moment it is put on a server. During the high-profile
Orderud murder trial in Norway in 2001, tape recorders were
prohibited in the courtroom. Bypassing the ban, a journalist
of the Web newspaper VG Nett did his best to type
everything that was said throughout the trial. Every time he
finished a couple of sentences, he pushed a button, and the
new lines were sent via a wireless Internet connection and
added to the transcript on VG Nett's Web page.
By Hjarvard's definition, this is live, as creation,
distribution and consumption may be almost simultaneous.
What differs is that the events and what is said are mediated
by a journalist, and written by him on a keyboard. This is a
slower and more filtering process than a video camera, and
outside what is normally thought of as live.
These examples show that
latency is a variable on an ungraded continuum. The shortest
latencies will be felt as "live", but what the border values
are cannot be pinpointed exactly. Towards the other end of
the continuum, we find delayed release. Up to this point we
have written as if media texts are always distributed as
soon as they are ready, which of course is not always the
case. Most earlier media are periodic, and a story has to
wait to be published in one "issue", whether it is a
printing or a special segment in a broadcasting schedule. On
the Web, a story that is kept from immediate publication can
be released at any point in time; whether to publish at the
same time every day or week, whenever material is ready, or
even "live" is up to the author to choose.
Latency is thus the sum of
two parts: the time from event to broadcast, which in the
FinanceVision example of a live show was (almost) none; and the
time from when the user enters the page until the video appears
in the video pane. The first, which we may call creation latency,
is a rhetorical choice the authors made.
Transmission latency, the other part, is a physical
limit of the line, the file sizes, and the speed of the
computers involved.
Like bandwidth, the
transmission latency needs to be considered by Web authors
if they want to achieve a low latency overall, such as live
or almost live distribution of an event. The (rhetorical)
choice of whether to transmit live or not may also be a
consequence of the third aspect of distribution:
permanence.
2.3. Mode of Distribution: Permanence
A certain kind of live
broadcast that is very widespread on television becomes very
peculiar on the Web: the practice of interviewing someone
live during a newscast. What is interesting for our present
discussion is not the interview as such, but the fact that
it is conducted live during the newscast, while all the
information normally has been known for several hours. Very
little news happens to take place in front of a
camera team during the newscast's program slot.
Live interviews in television newscasts appear to be very
current, bringing the very latest information, but in fact
they only bring information that has been known for quite a
while, at least by the news desk. More important than the
actual freshness of what is said is the appearance
of freshness. Live interviews project an image of the
television news desk as "standing in the middle of the
stream of current affairs," as Hjarvard puts it.
A live broadcast needs an
audience to be worthwhile, however. It does not seem likely
that a Web news site would put up a Web camera and post a
journalist in front of it, so he could be interviewed live
whenever someone logged on to the server. Live streaming on
the Web is thus normally reserved for special, heavily
marketed events such as sports events or important speeches.
Unlike television news, live Web events have to be
advertised over a time period in order to attract an
audience. On television, the audience shows up every night
to watch the news without being asked. The evening news is a
performance, a spectacle of announcing the events of the
day, heightened by live reports from studio and locations
around the world, performed for the audience that has
gathered. This audience has gathered because it has little
choice; if they turn on the television half an hour later,
the news is over.
It is this lack of
permanence that is the source of television's
peculiar rhetoric of live interviews. After the interview is
finished, it is too late. The live interview is a way to
make up for the lack of permanence by reducing latency. But
this is a trick, a sliding in story time. What actually
happened earlier is turned into an event that happens now;
or rather, an event is staged, whose sole content is an
earlier event. On the Web the recording of an interview may
be stored and offered as video-on-demand to readers. It may
become a (relatively) permanent offer. To stage a live
conversation at a point in time with only a random relation
to the time of the actual event makes no sense in a
technology allowing for permanence.
What technologies allow for
permanence, then? Print is in principle permanent, at least
until the paper withers away. In practice, however, only
libraries are able to keep complete collections of periodic
publications. It seems that in earlier media, permanence
goes hand in hand with latency, both being results of the
distribution technology. The shorter the latency from sender
to receiver is, the shorter the permanence. A live broadcast
is gone when the broadcast is over, while books are kept in
the bookshelves. For print products, the user decides
whether an item should be saved or thrown away; but few
people collect magazines, even fewer save newspapers longer
than about a week, while bound books rarely are thrown
away.
In Web sites, permanence is
a choice of the author. A Web server is a public repository
of files, and any new file may be archived forever if the
author wants it. There are no longer connections between
latency and permanence, as a live video feed may be
available from an archive immediately after the live event
has ended. Most periodic Web sites make use of this
possibility and keep extensive archives of older
material.
When the decision to keep an
archive becomes a rhetorical choice, a rhetoric of the
archive becomes a possibility, although not all sites take
the opportunity. In newspapers, few news reports stand very
well by themselves, as most news stories are follow-ups of
earlier stories, bringing new developments, comments or
reactions to issues already reported on. When an old news
article is pulled from an archive, the original context is
often lost, and it requires more thinking and guessing from
the reader to understand what is reported. The requirement
of background knowledge is well known within journalism,
however, so many stories will contain a summary of earlier
events, typically at the end of the article. Thus, for
readers of archived reports, the problem of lack of
background may be reversed: it is common to find several
articles with almost identical summaries of the background.
What is regarded as significant also changes over time. Not
everything deemed newsworthy in the day-to-day running of a
news desk stands the test of time, and in hindsight, it is
easy to see that some developments are more important than
others. Writing history means to summarise, and in an
archive, the lines of developments may be lost in the
details. (Not to mention the many "related articles" that
are found in some automated archives that turn out not to be
related at all.)
In contrast stand those
periodic Web sites that keep background material, overview
articles, biographies and timelines in one place, and
consistently link to this material whenever these issues are
reported on. MSNBC's "interactive" on Palestine,
titled "Searching
for Peace" is an
example of this. From a central timeline, a large number of
maps and overview articles about Israel and Palestine are
available, as well as biographies of key players in
international Middle East politics. Another example that
stands out (despite being old by Web standards) is
Nationalgeographic.com's
1996 feature "Gaza",
where an article about Gaza is linked to parts of earlier
articles from several years of National Geographic
Magazine with great effect. What makes both these sites
different from mere archives of earlier articles is that
they are edited as overview sites, and care is taken to
provide overview and understanding and to avoid redundancy
or excessive detail.
These few examples
demonstrate the whole new range of rhetorical possibilities
that opens with a technology that enables the author to
choose how long a text should be available. Permanence,
together with bandwidth and latency, forms the axis of
distribution.
2.4. Distribution and Rhetorical Convergence
A Web form is perceived to
be the result of a rhetorical convergence if it blends
rhetorical properties of two or more genres and media. The
axis of distribution is one set of rhetorical properties: it
describes how a text is transported from author to reader.
It has two related dimensions which find a balance in each
text: the kind and amount of material to be transported; and
the time between writing and reading. (In the case of
reports of real events a third dimension is added: the time
between event and writing.) The axis of distribution is
useful for describing certain kinds of rhetorical
convergence, as may be expressed in this way:
A Web
form with a balance of amount of material and time between
authoring and reading similar to one genre in one medium,
but in other respects similar to another genre in another
medium is perceived as a result of rhetorical convergence
between the two.
We see this in many popular
Web media: continually updated Web newspapers do not change
all articles from one edition to the other, but add new
stories whenever they are ready, in a way we recognise from
news channels on TV or radio. When recorded newscasts are
offered as "video-on-demand", they appear more like print
newspapers, as they may be "read" at any time. Bandwidth
concerns tend to find compromises between video and print
practices, as when moving images are replaced by stills in a
documentary, or when a small video pane is placed on a page
surrounded by paragraphs of writing, as an illustrative
image would be.
Genre theorists will tell us
that any text is understood or recognised via its affinities
with texts we have encountered before. Rhetorical
convergence is when we recognise a single text as being
similar to other typical texts of two or more different
media. A Web newspaper is an example of rhetorical
convergence, as both writing and lay-out are very similar to
print, but its mode of distribution is broadcasting-like.
News in newspapers and television is compared to Web
newspapers and video-on-demand in the table below. By
registering aspects from the axes of distribution and
signification, we see clearly how Web newspapers and
video-on-demand may be seen as results of rhetorical
convergences of print and television news [7].
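|                 | Mode of Distribution:  | Mode of Distribution:  | Mode of Signification |
|                 | latency                | permanence             |                       |
| Print newspaper | 8 hours                | approx. 1 week         | writing, still images |
| Web newspaper   | ≈ 0                    | 3 years                | writing, still images |
| Video-on-demand | ≈ 0                    | 3 months               | moving images, speech |
| Television news | 0                      | 0                      | moving images, speech |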
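The comparison in the table can also be written out as a small data structure, as a sketch of how a form may resemble one established medium on one axis and another medium on a different axis. Several things here are assumptions rather than part of the model: the class and function names, the assignment of the table's rows to the four news forms (inferred from the surrounding text), the conversion of the table's values to hours and days, and above all the coarse cut-off points used to decide when two forms share a mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewsForm:
    name: str
    latency_hours: float
    permanence_days: float
    signification: str

    def modes(self):
        """Coarse mode on each axis; the cut-off points are assumptions."""
        return {
            "latency": "near-instant" if self.latency_hours < 1 else "delayed",
            "permanence": "kept" if self.permanence_days > 0 else "gone",
            "signification": self.signification,
        }


# Values taken from the table; row labels inferred from the text.
ESTABLISHED = [
    NewsForm("print newspaper", 8, 7, "writing, still images"),
    NewsForm("television news", 0, 0, "moving images, speech"),
]
WEB_NEWSPAPER = NewsForm("Web newspaper", 0, 3 * 365, "writing, still images")
VIDEO_ON_DEMAND = NewsForm("video-on-demand", 0, 90, "moving images, speech")


def resemblances(form):
    """For each axis, name the established forms sharing the form's mode."""
    return {axis: [e.name for e in ESTABLISHED if e.modes()[axis] == mode]
            for axis, mode in form.modes().items()}


def is_convergent(form):
    """True if the form resembles at least two different established forms."""
    matched = {name
               for names in resemblances(form).values()
               for name in names}
    return len(matched) >= 2


print(resemblances(WEB_NEWSPAPER))
print(is_convergent(WEB_NEWSPAPER), is_convergent(VIDEO_ON_DEMAND))
```

With these assumed cut-offs, the Web newspaper matches print in signification and permanence but television in latency, while video-on-demand matches television in signification and latency but print in permanence.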
3.1. Mode of Restrictions: Canvas
The second of the four axes
is the mode of restrictions, the limits of a text's
"canvas." We will use the term canvas for the
material that the signifiers are made of, or in the words of
Umberto Eco's Theory of Semiotics, the continuum
within which the signs are shaped (217). Let us initially
use the term for the paper in print, the silver screen in
cinema, the loudspeakers of a radio, or the screens of
television sets and computers. When these membranes are
written upon or vibrated to produce sound, they have
different physical qualities. Wide-screen cinema has far
better image resolution and colour range than television.
This has implications for the rhetoric; cinematographers may
photograph subtle images that would be wasted on TV, as the
subtlety would be invisible.
The properties of the canvas
are based in technical limitations and possibilities. Why
introduce a special term, canvas, for what seems to
be merely technology? The answer is that the canvas is not
just the technology in use. In using a painter's canvas as a
metaphor, I want to direct attention both to the physical
properties of a technology, as in the way painting on canvas
is different from painting on paper or a dry wall, and to
the choices the author makes, as when a painter cuts the
canvas to the desired proportions before painting, thus
setting the maximum size of the image.
These choices of the author
are restrictions. Just as a painter's canvas is a reduction
from the original piece of canvas, any text is realised
within a subset of the possibilities the technology offers.
Each medium has some fixed standards, some flexible ones.
The frame in cinema is normally limited to a few standard
ratios of height and width (American, wide-screen, et
cetera), while the length of a feature is much less
standardised; most feature films are roughly between 90 and
150 minutes long, but there are many exceptions. Television
screen ratios are even more fixed than cinema's, while books
can be of virtually any size a person can handle without
special tools.
Thus, the author sets many
restrictions for the text. It appears to me that we form an
overview of the many properties of the physical appearance
of a text by seeing it as existing within a limited range
and having a limited number of details. The limits of
range and detail operate both in space and
time. Range is the distance between the outer
limits of a continuum for sign-production. A visible text is
restricted within a vertical and a horizontal range. In
addition, it has a range of contrast, the span between the
colours that contrast the most. Sound has little spatial
existence, but is nonetheless restricted at any point in
time by a frequency range (the span between the highest and
lowest frequencies possible) and a dynamic range (the span
between the softest and the loudest sounds).
The range used in a text is
often less than the largest range technology allows. In most
Web sites, the area used in any page is significantly
smaller than my computer screen. This is normal practice, in
order not to annoy those readers who have small screens.
Just as Web designers need to design for different
bandwidths, they also need to allow for the different
screens used by their target audience.
Within the range(s), there
is a lowest level of detail, that is, there is a
lower limit to how small a part can be, which also sets an
upper limit to the number of physical parts. In computer
images, these are called resolution (the largest
possible number of details within a range) and colour
depth (the number of possible colours). Also in
painting - which does not have any graded differences in the
same manner as computer graphics - there is a limit to how
fine a line is, set by the paint, the surface, the brushes
and the painter's technique. These limits are also
rhetorical choices: if the painting is a roof decoration,
for example, the smaller details are not called for, as all
viewers will stand several meters away from the
artwork.
Computer screens have very
low resolution compared to cinema and print [8],
and this puts a limit on the use of detailed graphics. Much
of what is called "interactivity" consists of rhetorical
devices designed to reduce or bypass this disadvantage. One
example is the "Congo Trek" site made by
Nationalgeographic.com. The site is an archive of
more than seventy-five letters from conservationist Michael
Fay, who spent fifteen months walking across the rainforest
of central Africa. A map of the route Fay followed contains
links to all the seventy-five letters, but the necessary
details of the map are not visible on a computer monitor. To
remedy this problem, the authors programmed a "magnifying
glass" that the reader can direct over the map to reveal the
details.

http://www.nationalgeographic.com/congotrek
Range and detail also have a
temporal element. A dynamic text has a range (time span),
and a number of possible changes within this range (the
frame rate in film, sample rate in recording, screen refresh
rate for computer monitors, polygons per second in graphic
computer games).
The temporal range is often
a clear indicator of genre. A feature film or novel is
supposed to be of a certain length, while an American
television sitcom like Friends is adapted to a
half-hour playing time, made to be interrupted by commercial
messages. In computer media, temporal detail - frame rate,
for example - is often a trade-off between technical quality
and bandwidth concerns, a part of what we called mode of
distribution above. A high frame rate is necessary to
show fluid motion, but it also drives up the file size. The
choice of frame rate is thus a necessary decision for a Web
author, and one that will
have rhetorical consequences. The table below places some
technical terms that may describe a text's mode of
restrictions in relation to range and detail in space and
time.
| | Range | Detail |
| Space | Size, Aspect ratio, Contrast range | Resolution, Colour Depth, Sample depth |
| Time | Duration | Frame rate, Polygons/sec, Refresh rate, Sample rate |
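The trade-off between temporal detail and file size can be made concrete with a little arithmetic. The following is a minimal sketch, assuming uncompressed video in which every frame stores one colour value per pixel. The function name and the example figures (a 320x240 canvas, 24-bit colour, ten seconds) are illustrative assumptions, not measurements from any of the sites discussed.

```python
def raw_video_bytes(width, height, colour_depth_bits, frame_rate, duration_s):
    """Size in bytes of uncompressed video.

    width, height     -- spatial detail (pixels)
    colour_depth_bits -- bits stored per pixel
    frame_rate        -- temporal detail (frames per second)
    duration_s        -- temporal range (seconds)
    """
    bits_per_frame = width * height * colour_depth_bits
    total_bits = bits_per_frame * frame_rate * duration_s
    return total_bits // 8


# Hypothetical ten-second clip on a small canvas, at two frame rates.
fluid = raw_video_bytes(320, 240, 24, 25, 10)
jerky = raw_video_bytes(320, 240, 24, 5, 10)

print(fluid)           # bytes at 25 frames per second
print(jerky)           # bytes at 5 frames per second
print(fluid // jerky)  # how many times larger the fluid version is
```

Real Web video is compressed, so actual files are far smaller than these figures, but the direction of the trade-off is the same: more detail in space or time means more data to distribute.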
3.2. Canvas and Rhetorical Convergence
A Web form is perceived to
be the result of a rhetorical convergence if it blends
rhetorical properties of two or more genres and media. The
axis of restrictions is one set of rhetorical properties. It
describes the canvas: the range and detail in time and space
available for sign production in a text. Rhetorical
convergence may be perceived by a shared canvas:
A Web
form with a canvas similar to one genre in one medium, but
in other respects similar to a genre in another medium is
perceived as a result of rhetorical convergence between the
two.
A unique feature of computer
texts is that they are flexible, a fact that has been
deliberated at some length by, for example, J. David Bolter
in Writing Space and Nicholas Negroponte in
Being Digital. A consequence of this is that a text
may have a flexible canvas, it may vary between different
modes of restrictions. Most file formats have not set all
the restrictions we have discussed, but let the author
specify them for each new text. While, for example, the
frame rate of television is a given, it may be different in
two subsequent computer videos, or even change during the
course of a film. Some restrictions may be left for the
reader to decide; so-called "fluid" designs will fit the
computer window whatever width or height the user has set it
to. The possibilities of combining modes of restrictions
from different media's rhetorics thus seem virtually
unlimited. A few examples are given here only to demonstrate
rhetorical convergence along the axis of
restrictions.
In 2003, the visual quality
of computer displays is so poor compared to paper that there
are very few visual forms from these earlier media that can
be reused on the Web without changes. The only typography
from paper that can be recreated on the Web is that of small
formats: newspaper layout would be impractical on the Web,
as it would require a lot of scrolling both horizontally and
vertically [9].
Almost any Web form derived from print would be perceived as
converging with computer rhetoric (or computer interface
conventions, which is an alternative term) because of the
difference in canvas. The "magnifying glass" from "Congo
Trek" discussed above is clearly related to the "locator"
interface in the popular image editor application Adobe
Photoshop.
"Congo Trek" is the Web
version of the story about Michael Fay's expedition. The
same year, Fay's story was also told in National
Geographic Magazine and in a television documentary in
the National Geographic Explorer series. The table
below renders aspects of three of the four axes when
comparing the map in the "Congo Trek" Web site with the
similar maps in the National Geographic Magazine
and the television program. Although the Web site has a
canvas with even less detail than a television screen, it
uses a map almost as detailed as the one in print. To make
the map readable, a mode of acquisition ("display control",
that is, the magnifying glass) from computer media is used,
exemplified in the table with Acrobat Reader, an application
used to view on a computer documents designed for print.
Like Photoshop, Acrobat Reader uses a "magnifying glass" as
an interface for moving around in images larger than the screen.
"Congo Trek" is thus a convergence of three different forms.
The flexibility of computer technology opens up a multitude
of positions between established media's modes of
restrictions. Web camera pages are a good example, as they
are midway between films and still images, technologically
speaking. A Web camera is in fact a video camera, but it is
connected to a computer that is programmed to copy the
camera's image at a much lower temporal detail (frequency or
frame rate) than video has, so it will not be perceived as
moving. Still, it is a dynamic image, it does change over
time. All earlier media are either dynamic to the degree
that they have enough temporal detail to make us perceive
continuous movement or sound, or else they are still. The
computer has opened up a whole range of rhetorical
possibilities between these two perceptual poles.
| | 720x576 pixels | Small maps, few details, animation | Watching/listening |
| | 351x280 pixels | Large map, many details; writing | Reading + Display control |
| | 12000x16200 dots [10] | Large map, many details; writing | Reading |
| | 1280x1024 pixels | Large map, many details, writing | Reading + Display control |
4.1. Mode of Acquisition
Among the established media,
there are marked differences in how they are read. Cinema is
viewed collectively in one sitting from beginning to end, in
a darkened, public theatre. A newspaper is read privately,
can be taken anywhere, put down and taken up again, flipped
through and read in any sequence. Few readers read the whole
paper; instead, most readers rely on headlines, leads, images
and captions to decide what to read. We will discuss such
differences here under the heading mode of
acquisition.
Mode of acquisition is how
the reader accesses the signs of a text. Again, this notion
would often seem superfluous in earlier media. In computer
media, on the other hand, the kind of reading process the
text enables and encourages is a rhetorical choice, and
encompasses the different devices often contained under the
label interactivity.
While most films and novels
are told in one sequence which the reader is expected to
follow, Web sites rarely are. Most Web sites offer a partial
list of their contents as a menu of choices for the reader.
The reader chooses what links to follow, and only when a
link is activated (typically clicked on) by a user does a new
page appear.
The reader's
choose-and-click reading activity is named interaction
by some writers, while others reserve this term for
other kinds of reading and writing activities, with
computers or between human beings only [11].
A less disputed term for the reading activity involved in
Web reading would be ergodic, coined by Espen
Aarseth in Cybertext. Aarseth defines ergodic
literature as literature in which "nontrivial effort is
required to allow the reader to traverse the text," and
having made this effort, the reader "will have effectuated a
semiotic sequence" (1). Reading traditional novels, watching
films or television of course requires cognitive effort, but
apart from interpretation [12]
the effort is considered trivial, as the reader only has to
turn pages, the viewer to watch. In hypertext or computer
games, however, the reader/player constantly has to make
decisions as to what to do with the text [13].
These "non-trivial efforts"
are manipulations of the text as a material and mechanical
object. Manipulations may be throwing dice or coins,
ordering pieces of paper, or controlling a computer
interface with mouse, keyboard or other input devices. To
focus in this way on the material structure of a text and
the ergodic effort involved in reading is what Aarseth terms
a cybertextual perspective (22). To view a text as
a cybertext is to view it as a textual machine "for the
production of a variety of expression" (3), a machine that
according to some principle combines pieces of text into the
text the reader reads. The machine thus has three important
sets of parts: the textons, the pieces of text that
may be combined; the scriptons, which are the
pieces of text the reader is expected to read; and the
traversal function, "the mechanism by which
scriptons are revealed or generated from textons and
presented to the user of the text" (62). Examples of
traversal functions are the links in a hypertext, the
simulation and representation engines in an adventure game,
or throwing coins in the case of I
Ching.
We can in principle treat
any text from this cybertextual perspective. In a
traditional printed novel the textons and the scriptons
would then be identical, and the traversal function nothing.
By stating that, we have said very little about the novel,
however; the cybertextual perspective is only interesting
when studying ergodic texts. In a static Web site, the pages
stored at the server are the textons, the links between them
the traversal function, and the pages displayed to the
reader when links are followed are the scriptons. When
developing his typology of ergodic texts, however, Aarseth
also uses cybertext as the name of a genre.
Understood in this way, a cybertext (perhaps we might call
it a cybertext proper) is a text where the traversal
function involves some principle of calculation (75), so the
text will not look the same in two readings. In this
typology the static Web site would not be a cybertext
proper, while a computer game would.
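The description of a static Web site as a textual machine can be sketched in a few lines of code. This is only an illustration under stated assumptions: the page names, their contents and the function name traverse are invented for the example and are not taken from Aarseth or from any actual site.

```python
# Textons: the pages stored at the server (hypothetical names and contents).
textons = {
    "home": "Welcome.",
    "letters": "Letters from the expedition.",
    "map": "Map of the route.",
}

# Traversal function: the fixed links between the pages.
links = {
    "home": ["letters", "map"],
    "letters": ["home", "map"],
    "map": ["home"],
}


def traverse(start, choices):
    """Return the scriptons shown to a reader who follows the chosen links."""
    current = start
    scriptons = [textons[current]]
    for choice in choices:
        if choice not in links[current]:
            raise ValueError(f"no link from {current!r} to {choice!r}")
        current = choice
        scriptons.append(textons[current])
    return scriptons


# One possible reading: the same choices always produce the same scriptons.
for scripton in traverse("home", ["map", "home", "letters"]):
    print(scripton)
```

Because this traversal function involves no calculation, two readings that make the same choices are identical, which is why such a site would not count as a cybertext proper in the typology described above.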
Viewed from the cybertextual
perspective, most Web media are constructed differently from
earlier media. The Web technology was initially designed to
support distributed hypertext, and the basic structure of
saving Web pages as separate files makes it a manifestation
of the cybertext model. Links, and more advanced techniques
such as JavaScript and server-side scripting, allow for the
construction of a wide range of traversal functions, opening
up an even wider range of modes of acquisition. It is rare
to see forms from other media being adapted to Web media
without adding cybertextual features such as links and
search routines.
What I call mode of
acquisition should then be understood as the reading
process that results from the mechanical (cybertextual)
construction. Modes of acquisition in computer media have
been classified by different theories.
Based on a text's mechanical
(or cybertextual) construction, Espen Aarseth is able to
classify it as inviting the user to perform one of four
user functions. The interpretative
function is to read and comprehend; the explorative
also requires the user to decide where to read next.
These two user-functions are required by unicursal
and multicursal texts respectively
[14].
Dynamic texts, that
is, texts where the number of scriptons is not constant,
open up two other user functions. The configurative
function is to let the reader "configure their
scriptons by rearranging textons or changing variables"
(64), while allowing the reader to add textons is a
textonic user function.
The mode of acquisition is
also related to the sign system(s) used. Sound and moving
images exist over time, and have to constrain the reader at
least for a moment in order to exist as texts at all, while all
writing in principle is open to be read in any sequence.
Still, a looping video of wind in trees, or a long sound
recording of waves may not be very different from a still
image in the way it is read.
Following Kant's distinction
of objective and subjective sequences, Gunnar Liestøl
has distinguished between different kinds of activity in reading
texts. "With the consumption of dynamic information
(audio/video) the dominant activity is located in the
textual object itself, as object-action; with the
consumption of static information, however, the dominant
activity is located with the user-subject, as
subject-action" (45). Of subject-action, there are
two kinds, intervening and non-intervening.
Non-intervening subject-action is the activity of
reading a static text, while intervening subject-action
is to actively choose where to read on. Reading
hypermedia is to alternate between parts of the text
dominated by object-action and the different kinds of
subject-action, according to Liestøl.
Yet another way of
distinguishing between different modes of acquisition is
proposed by Jens F. Jensen in "Interactivity: Media Studies'
Blind Spot?" Jensen proposes a typology of twelve different
media positions, depending on whether the media allow for
registrational interactivity and conversational
interactivity, and which of three kinds of
selective interactivity is offered (none,
transmissional or consultational). Based on
Bordewijk and van Kaam's typology of "traffic patterns,"
Jensen's typology orders media after the power relations
they set up between providers and consumers.
Registrational interactivity is "a measure of a
medium's potential ability to register information from and
thereby also adapt and or respond to a given user's needs
and actions [...]" (60), or, in other words, a measure of the
flow of information from consumer to provider.
Conversational interactivity is information
exchange between consumers "in a two-way media system" (60).
Selective interactivity is the possibility for
consumers to select between different available programs or
texts made by providers. If the providers control the
distribution and consumers only select what to read, it is
of the transmissional kind; if consumers initiate
distribution, it is of the consultational kind.
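The number twelve follows from combining the three dimensions: two possibilities for registrational interactivity, two for conversational interactivity, and three for selective interactivity. As a minimal sketch (the variable names are merely illustrative labels, not Jensen's own notation), the positions can be enumerated:

```python
from itertools import product

# Each dimension of the typology and the values it can take.
registrational = (False, True)
conversational = (False, True)
selective = ("none", "transmissional", "consultational")

# Every combination of the three dimensions is one media position.
positions = list(product(registrational, conversational, selective))

for reg, conv, sel in positions:
    print(f"registrational={reg}, conversational={conv}, selective={sel}")

print(len(positions))
```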
Aarseth, Liestøl and
Jensen all provide perspectives from which the modes of
acquisition may be described and assessed; perspectives of
material construction, of sign system, and of power
relations respectively; perspectives that are not
necessarily contradictory, although they divide the range of
texts differently.
In my earlier paper
"Linearity and Multicursality," I have argued that the
selection of links by readers is governed very much by how
the links are signified. I am thus able to see a range of
different kinds of what we may call "explorative user
functions", "intervening subject-actions" or "consultational
selective interactivity" in the different vocabularies
reproduced above by identifying some ideal types of
acquisition mode.
Imagine a continuum of how
much influence and control a reader has over the sequence of
parts in a Web site. At one end we find movies: videos,
animations and narrated slide shows. The most television- or
film-like of these run in one sequence by themselves; I have
called this cinematic mode. Most movie streams,
however, have a pause button and perhaps VCR-style controls
for fast forward and "rewind." A similar control is handed
over to the user when Web pages are linked in a chain with a
"next" link on every page. Similar to reading a novel, there
is only one next page, but the reader chooses when to go to
it. Although these two modes have different dominant actions
(static Web pages with "next" links are dominated by
subject-action, while VCR-controls introduce subject-action
to texts dominated by object-action), I have chosen to group
them under the term progress control. All these
texts would be what we call linear, sequential, or
unicursal.
At the other side of the
continuum's middle are the multicursal sites; sites where
most pages have more than one link. In several of them,
there are links, but either there is so little information
about what will be at the other end of each link, or the text
of each page depends so much on what was written on a
previous one, that one sequence is strongly prioritised
over all others. I have called this default sequence,
using a term inherited from Jane Yellowlees Douglas'
and Jill Walker's discussions of the novel Afternoon
by Michael Joyce. Further along the continuum, we find
texts where there are more possible or probable sequences,
but the reader still has limited control. These are sites
where links load random pages, and sites without a
prioritised sequence, but also so little information about
the links' destination that the reader is navigating
blindfolded through a labyrinth. In my paper "Linearity and
Multicursality", I called this oblique linking. An
example of this kind of text would be Michael Joyce's
canonical hypertext novel Afternoon. At the end of
the continuum furthest removed from films are Web sites
where there are not only many links, but where the structure
of the linking is made so explicit that the reader can
navigate freely, as in a building she knows well. I will
argue that only this mode of acquisition deserves the name
multicursal in a strict sense. While other forms of
linking may open the possibility of many courses through the
text, only an explicit linking practice makes it likely that
the reader experiences the text as making multiple courses
possible.
In the figure below, my five
modes of acquisition are related to Espen Aarseth's user
functions and textual positions, and to Gunnar
Liestøl's kinds of activity. As these five modes only
describe texts with static textons, the figure only maps
what Jens F. Jensen calls the dimension of "selective
interactivity", and is not able to grasp the
"conversational" and "registrational" dimensions, or
Aarseth's "configurative" or "textonic" user
functions.
| Object-action | Non-intervening subject-action | Intervening subject-action |
| Unicursal/Interpretative | Multicursal/Explorative |
| Cinematic mode | Progress Control | Default Sequence | Oblique linking | Multicursality |
| Less user influence | More user influence |
This way of mapping modes of
acquisition implies that hypertext or computer rhetoric has
two levels: a cybertextual level and a rhetorical level.
Default sequence, oblique linking and
multicursality are all results of similar
cybertextual constructions; it is the writing on the pages
and links that separates them. Clearly, link-node hypertext
is a certain figure of cybertextual construction that may
give rise to many different rhetorical figures.
Espen Aarseth touched upon
this separation in his essay "Nonlinearity and Literary
Theory", and later in Cybertext. Following Pierre
Fontanier's nineteenth-century rhetoric, he distinguishes
between tropes and "les figures non-tropes", or semantical
and syntactical figures respectively, as Aarseth terms the two kinds
(Cybertext, 91). In "Nonlinearity" Aarseth lists
some "figures of nonlinearity" as syntactical figures, or
figures; static Web sites and other link-node hypertexts
typically make use of the figure linking/jumping. In
Cybertext (91), he addresses the other side of the
pair, the "tropes" or "semantic figures", in the analysis of
Michael Joyce's Afternoon.
As this particular use of
the traditional concept pair trope and figure
for semantic and syntactic figures respectively is far
from universal in the rhetorical tradition, I will prefer
the more common terms figure of thought and
figure of diction, leaving trope for
figures that transform the meaning of words or expressions
such as metaphors and metonymies (compare Plett 309). But
Aarseth's point, that the various cybertextual mechanisms
for textual production are different figures of diction
rather than figures of thought is an important
one.
Many cybertextual
constructions are neutral techniques that can be used with
different effects in different texts. Static Web sites are
examples of "link-node hypertext", a cybertextual figure
where stable scriptons are bound together with stable links,
so the scriptons in the text always have the same relation
to each other. I call this structure a figure of diction, a
way of constructing a text. As a structure of scriptons and
their relations, which may be filled with any semantic
content or mere gibberish, the structure will remain the
same.
Aarseth's two hypertextual
figures of thought, aporia and epiphany
describe the user's frustration when lost in the textual
labyrinth of Michael Joyce's Afternoon, and then
the bliss when a new direction of reading is found. These
two figures of thought are put on top of a certain
structure, a use of the link-node figure and a figure of
restricted access, both figures of diction. The very same
structure, with the same conditional links could have been
made explicit, with every link and block explained to the
reader. In that case, the figures of thought would have been
different.
As with all other rhetorical
figures, the hypertextual figures we have discussed here,
both figures of diction and figures of thought, may be
combined with most other rhetorical or poetic devices. A
carefully worked-out system of navigation links, for example, is
usually associated with professional business sites, while
hypertext novels of the "Eastgate school" usually employ
relation links, but this need not always be the case. Bobby
Rabyd's Web novel Sunshine '69 is an
example of a serious hypertext fiction using a clear and
understandable set of navigation links, resulting in a truly
multicursal work of fiction. In Hamlet on the Holodeck,
Janet Murray lists numerous experiments in immersive
computer fiction, among them multiform stories, stories that
present "a single situation or plotline in multiple
versions, versions that would be mutually exclusive in our
ordinary experience" (30). When she gives multiform stories
as a writing assignment, many of her students respond with a
"violence hub" story, Murray reports (135). These are
stories where something violent or traumatic has taken
place, and the reader is invited to follow links to explore
how a number of characters react to the event. In these
texts, many rhetorical figures are at work simultaneously.
The "violence hub" is a certain realisation of the more
general multiform story. None of these need to be
multicursal works; in fact, most of Murray's examples of
multiform stories are Hollywood movies. But some are
multicursal computer texts, and these need to communicate
this structure to the reader, using certain linking figures.
All these devices, multiform, violence hub, and linking
figures, are figures of thought, used within a certain
figure of diction, the link-node structure. Similarly, the
decision to posit the reader as a character in the diegesis
as in many computer games, is a figure of thought not
concerning the cybertextual construction of a text, a figure
that also has been used in codex novels, such as Italo
Calvino's If on a Winter's Night a
Traveler.
Mode of acquisition should
thus be understood as the reading effort and experience the
text invites and expects from its readers. It consists of
the required handling of the text's mechanical
structure, how this requirement is communicated to the
reader, and how it is aligned with the text's
message.
4.2. Mode of Acquisition and Rhetorical Convergence
As mentioned, a Web form is
perceived to be the result of a rhetorical convergence if it
blends rhetorical properties of two or more genres and
media. The mode of acquisition is one set of rhetorical
properties: it describes the reading process required of the
reader to read a text.
A Web
form with a required reading process similar to one genre in
one medium, but in other respects similar to a genre in
another medium is perceived as a result of rhetorical
convergence of the two.
The mode of acquisition is
an integral part of all earlier media, something which is
witnessed by the large literature on the difference between
the "linearity" of print and the "nonlinearity" of
hypertext. As with mode of distribution and mode of
restrictions, a change in the mode of acquisition will be
perceived as changing the rhetoric towards the rhetoric of a
different medium. It further seems rare to recreate any
rhetoric on the Web without introducing some linking beyond
the imitation of page turning or the controls of a VCR.
Introducing a cybertextual figure is a convergence towards
computer rhetoric, but as we have seen, this may result in a
wide range of different rhetorical forms. A simple example
may show how the convergence of video and ergodic texts may
take place:
Many television networks
offer video recordings of their news on the Web. Many such
newscasts are chaptered, so the individual reports can be
selected from a list, and viewed out of sequence. Viewing
television in this form becomes more like reading a
newspaper. The reader may select what to view when, and is
not likely to have the patience or interest in following the
original sequence of the newscast, a sequence that probably
was made with care. Following her own priorities, the reader
will continually consider whether a story is worth her
attention, and be ready to break it off at any point by
selecting another item from the menu. And it is likely that
she will skip some items entirely, as they do not interest
her. Changing the mode of acquisition in this way is a
profound change from the kind of ceremony the evening news
is on broadcast television. The next table renders the
rhetorical convergence along two axes:
| | Multicursal/progress control | Writing, still images |
| | Multicursal/progress control | Moving images, speech |
| | Cinematic | Moving images, speech |
5.1. Mode of Signification: Sign systems
What I will here call
mode of signification is not difficult to grasp: it
is simply the sign systems used in a text; for example,
video, writing, spoken language, or still images. The
similarities and differences between the sign systems are
complex, however. As with the other three axes of rhetorical
convergence, the mode of signification encompasses many
modalities a text can occupy. And as with some of the other
axes, the modalities are intertwined.
Any printed matter, sound
recording, video, film or broadcast may be reproduced, or
re-represented, or copied, in the computer. It will be
stored as one of four classes of file formats: text (mainly
ASCII, but other formats exist), images (formats such as
JPEG, GIF, TIFF, PICT, BMP, PNG), sound (formats such as
AIFF, WAV, MPEG, AAC), and video (formats such as AVI,
QuickTime, Real, Windows Media, MPEG). To the human ear and
eye, however, the distinctions blur. A photograph of a
poster would be stored in the computer as an image, but we
read it as writing nevertheless. Similarly, we may film a
still image, or draw an image with ASCII signs. Audio may
contain recognisable sounds of all kinds, including music
and spoken language. To form an understanding of multimedia
as communication between humans, we must instead distinguish
between the different kinds of signification these formats
store.
There are many ways of
relating the four classes of sign systems listed above. We
will in the following list several distinctions between
characteristics of sign systems, in order to show
similarities and differences between different modes of
signification.
Philosophers and scholars of
rhetoric, poetics, aesthetics, linguistics, and semiotics
have discussed at length the differences between different
communication systems such as language and image, spoken and
written language, the eye and the ear, the spatial and the
temporal. These dichotomies are all intertwined, as we may
illustrate by putting the four sign systems into a
table:
| Writing | Speech |
| Pictures | Video |
Of the sign systems in this
table, only speech is perceived with the ear, the other
three by the eye. The top two, writing and speech are based
on natural language, the bottom two are kinds of imagery.
The right half, speech and video exist in time, they are
dynamic or temporal, while the left half are fixed, spatial.
One way of labelling the rows and columns is
thus:
| | Fixed/Spatial: | Dynamic/Temporal: |
| Language: | Writing | Speech |
| Image: | Pictures | Video |
Many of the differences
between popular genres in different media can be sorted
along these two dichotomies. Alphabetic writing is abstract,
for example, and the length of lines and pages does not matter
in many writing styles, while alterations of proportions or
size in images do change the impact of the image. When
writing and images are combined, writing will have to give
up some flexibility to fit with the pictures. Adding moving
images to explanatory graphics is another example that makes
the difference stand out. Using animation, processes and
time relations may be rendered more effectively, but at the
same time the ability a reader has to scan and compare the
different parts of a still image is taken away.
Within each row in the
table, the differences are also much discussed in the
literature. The difference between spoken and written
language has been a topic for language philosophy since
Socrates, renewed by present-day linguistics. The film
theories of, for example, Mitry or Deleuze are concerned
with the specificities of the moving image as opposed to the
still.
What is lost in our
labelling of the categories in this manner is the
distinction between eye and ear mentioned above. This is an
important distinction, however, when discussing the
combination of forms. It is often easier to comprehend
combinations of sign systems involving two different senses,
a point we will return to in 5.2 below.
Sound is always temporal,
always dynamic [15].
But not all sound is speech. Music and sound effects are
obvious examples. To fit this distinction in, we might have
to reduce the language/image dichotomy to language and
non-linguistic signs. Thus:
| | Fixed/Spatial: | Dynamic/Temporal: | Dynamic/Temporal: |
| | Eye: | Eye: | Ear: |
| Linguistic: | Writing | Animated writing | Speech |
| Non-linguistic: | Pictures | Video | Music, Sound effects |
This division also makes us
aware of animated writing as a distinct form. It occupies a
middle position between writing on one hand and video and
sound on the other, and Gunnar Liestøl has
demonstrated in "Aesthetic and Rhetorical Aspects" how
animated writing may smooth the transition from writing to
video (a point we also touched upon while discussing
bandwidth above). Moving writing loses one of the powerful
aspects of alphabetic writing, however: the ability to read
at different speeds. Reading more than a few words of moving
writing is thus likely to annoy many readers.
One might very well suspect
that the "non-linguistic" category is too broad, and this is
brought to the surface if we consider Web pages combining
photographs and diagrams, two very different forms of
images.
In September 2001, the
Spanish Web newspaper El Mundo carried an
"interactive graphic" of the September 11th disaster, titled
"Oleada de atentados en Estados Unidos." In the
feature, schematic drawings and photographs of the World
Trade Center are combined to great effect.

http://www.elmundo.es/elmundo/2001/graficos/septiembre/semana2/atentados/atentados2.html
The drawing explains what is
happening (a logos appeal), while the photograph is a
witness, and much more emotional (a pathos appeal). In
Languages of Art, Nelson Goodman clarifies the
distinction between the two, as well as the distinction
between language and image. In his vocabulary, languages are
differentiated, while images are dense
[16].
As in Saussure's semiology, words are seen as disjunct,
differentiated signs with differentiated meanings in
Goodman's theory [17].
Dense sign systems (or dense notational schemes in
Goodman's vocabulary) do not have differentiated positions.
Thus, between two signs, there is a potentially infinite
number of signs. His example is a mercury thermometer without a
grid marking the temperature scale. In such a thermometer,
any position of the mercury scale would be meaningful. Dense
sign systems may further be either relatively attenuated
or relatively replete. In a photograph or a
painting any aspect of the image is potentially meaningful,
so the image would be a different one if any detail were changed. In
a map, on the other hand, choices of colour or thickness of
line are relatively arbitrary. On a world map, it has little
importance if Zimbabwe is coloured green or pink as long as
its colour is different from the colours of neighbouring
Zambia or South Africa. Maps and diagrams are relatively
more attenuated than photographs, as some dimensions of the
visuals carry meaning while others do not. Below, Goodman's
distinctions are drawn into our diagram.
| | | Fixed/Spatial: | Dynamic/Temporal: | Dynamic/Temporal: |
| | | Eye: | Eye: | Ear: |
| Differentiated (digital): | | Writing | Animated writing | Speech |
| Dense (analog): | Attenuated: | Diagrams, Typography | Moving diagrams | Music |
| Dense (analog): | Replete: | Pictures | Video | Sound effects |
This further subdivision not
only helps us discern diagrams from other images, it also
tempts us to place other sign systems used in Web sites,
such as typography and layout. Music may also be
distinguished from sound effects in this manner.
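The classification in the table above can also be written down as a small data structure, which makes it easy to ask in how many respects two sign systems differ. The following Python sketch is only an illustration: the class `SignSystem`, the field names `density`, `temporality` and `sense`, and the helper functions are assumptions introduced here, not part of the original model. Only the placement of each sign system follows the table.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SignSystem:
    """One cell entry from the table (field names are illustrative)."""
    name: str
    density: str      # "differentiated", "attenuated" or "replete"
    temporality: str  # "fixed" or "dynamic"
    sense: str        # "eye" or "ear"


# Placement of each sign system as given in the table.
SIGN_SYSTEMS = [
    SignSystem("writing", "differentiated", "fixed", "eye"),
    SignSystem("animated writing", "differentiated", "dynamic", "eye"),
    SignSystem("speech", "differentiated", "dynamic", "ear"),
    SignSystem("diagrams", "attenuated", "fixed", "eye"),
    SignSystem("typography", "attenuated", "fixed", "eye"),
    SignSystem("moving diagrams", "attenuated", "dynamic", "eye"),
    SignSystem("music", "attenuated", "dynamic", "ear"),
    SignSystem("pictures", "replete", "fixed", "eye"),
    SignSystem("video", "replete", "dynamic", "eye"),
    SignSystem("sound effects", "replete", "dynamic", "ear"),
]

BY_NAME = {system.name: system for system in SIGN_SYSTEMS}


def select(**criteria):
    """Return the names of all sign systems matching every criterion."""
    return [
        system.name
        for system in SIGN_SYSTEMS
        if all(getattr(system, key) == value for key, value in criteria.items())
    ]


def differences(first, second):
    """Return the set of dimensions on which two sign systems differ."""
    a, b = BY_NAME[first], BY_NAME[second]
    return {
        f.name
        for f in fields(SignSystem)
        if f.name != "name" and getattr(a, f.name) != getattr(b, f.name)
    }


if __name__ == "__main__":
    print(select(sense="ear"))
    print(differences("writing", "video"))
```

A query such as `differences("writing", "video")` reports the dimensions a combination of the two would have to bridge. The sketch inherits the limits of the table itself: it has no place for Peirce's indexical signs or for connotational meanings.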
When we claim that
photographic images of the World Trade Center catastrophe
are "witnesses", the claim is based on the knowledge that
photography is a process of chemistry and optics. In Charles
Sanders Peirce's terms, the photograph is both
iconic, as it resembles its motif, and
indexical, as it is a physical imprint caused by
another physical imprint of the light rays that were
reflected off the motif. Now that we have introduced
Peirce's canonical trichotomy of signs, iconic sign,
index and symbol, we see that in the above
table, all the differentiated sign systems are symbolic,
while all the replete are iconic. The attenuated, however,
will have aspects of both, while there is no separate place
in the table for the indexical. Furthermore, a shot in a
film of a landscape with a column of smoke rising from a
distant hill would be iconic first, in that the image
resembles an actual scene, then indexical second, as the
smoke is a sign that there is a campfire burning. Our table
cannot capture Peirce's typology of signification while
maintaining the differences we have charted so far. To map
all dimensions of signification in one diagram would be
overly complex.
Only a little reflection on
how music communicates will complicate this yet further.
Music may be seen as a system in which some parts (rhythm,
scales, harmonies) are parts of a differentiated system,
while others (timbre, volume, pulse, phrasing) are dense. In
addition, music always carries strong connotational
meanings. A simple example is found in the "Becoming Human:
The Documentary" section of the Becoming
Human Web site,
where (supposedly) Ethiopian music lends connotations of
"Africanness" to a description of an excavation in Ethiopia.
These secondary meanings (in addition to the primary,
denotational meanings) apply to any interpretant of any of
the sign systems involved, and cannot be captured by the
above table either.
5.2. Mode of Signification
and Rhetorical Convergence
We have repeatedly stated
that a Web form is perceived to be the result of a
rhetorical convergence if it blends rhetorical properties of
two or more genres and media. The mode of signification is
one set of rhetorical properties: it describes the
particular combination of sign systems used in a
text.
A Web
form with a combination of sign systems similar to one genre
in one medium, but in other respects similar to a genre in
another medium is perceived as a result of rhetorical
convergence of the two.
Changing the sign system
while keeping the rest of the rhetoric is the most obvious
rhetorical convergence. All the examples used to illustrate
the other three axes used the mode of signification as one
dimension in the comparison. Many of them also
combine several sign systems. The narrated
slide show "Sights
and Sounds of the Way West"
described under 2.1 above combines a dynamic sound track
with still writing and imagery, but both photographs and
written words are made dynamic by moving the frame and
fading words in and out. Yahoo! FinanceVision, the
example from 2.2, put paragraphs of written text next to a
small video pane. The map from "Congo
Trek" discussed
under 3.1 uses the power of diagrams and maps to provide
an overview of a large number of written parts.
When two different modes of
signification with different characteristics in all the
ways listed above have to be aligned, two principles govern
the combinations: the limits of the senses and perception,
and what I call containment.
Our vision cannot read
writing and images simultaneously (an observation elaborated
by many scholars, for example by Michel Foucault in his
analysis of Magritte's aesthetics in This is Not a
Pipe), so we will have to move back and forth between
the two. Thus, if a lengthy text is projected on top of a
video segment for a short while only, it will be very
difficult not to miss either some of the text or some of the
images. Eye and ear may cooperate nicely, however, as when a
voice-over explains images in a documentary film. It
also seems to me that our capacity for language processing is
such that we are not only unable to
comprehend two people speaking at the same time, but also
unable to read and comprehend several paragraphs of writing
while listening to a speech.
Containment is a
word I use to describe the fact that sign systems do not
simply appear next to each other on the Web, but are nested
within one another. In
Web pages, either a video pane is inserted into a text page,
or text is inserted into a video window. Digital video is
always rectangular, and text will normally either be within
or around the rectangle. The fact that a video clip has to
be a separate file from the HTML page makes it even harder
to penetrate the edges of the video rectangle, if the author
should desire to do so. Apart from these technical reasons,
photographic video has always been separated by a frame, a
basis for theories on film by Jean Mitry, Lev Manovich, and
others. To escape the frame, or for text to penetrate it,
the video would have to lose all depth, all sense of
foreground and background. It is imaginable to shoot a video
against a monochrome background, matte it out, and script
text to blend in with parts of it, but I have never seen it
done in an actual Web page. What emerges is a master-servant
or parent-child relationship.
The
parent-child relationship between the containing semiotic
system and the contained can also be treated as a time
relation; one reads the parent before the child. This makes
it possible to treat, for example, video inserted in a text
page in the same way as video that opens in a separate window
from a link in a text page. It is an advantage to do so, as the
two are often used with comparable effect. To classify a page in
one of the two categories, one would ask which signs
reach readers first: those of the text or those of the video.
Again, it is possible to imagine a page designed in a manner
that makes this distinction impossible, so that it is random
what one reads first (which would require that the video
loaded as quickly as the HTML). I have not seen it in
reality, but if it were to be found, it would be a third
category requiring its own analysis.
Let us go through the
different distinctions between modes of signification. We
will identify different examples of rhetorical convergence,
and note the new intermediate forms that appear.
Eye and Ear. In the
opening paragraph from "The Way West," music (harmonica,
acoustic guitar, and double bass) plays as a background
accompaniment, contributing to the Western pioneering mood.
The two sign systems complement each other, each bringing
one part of a combined message. Using music to evoke
connotations in this way is common in film, but here it is
coupled with an imitation of a printed page.

http://www.nationalgeographic.com/ngm/0009/feature2/media2.html
Later on in the same
feature, still photography is coupled with radio - a
different kind of eye/ear combination. The landscape
photographs in the middle of the next page are shown while a
narrator reads the story of the pioneers, and the sounds of
birds and a waterfall are also heard.

(Bird song)
Voice-over: Out onto the plains of Kansas. New
voice: "Now we were out of civilisation and the
influences of civilised society entirely, and cut out from
the rest of the world to take care of ourselves for a
while." Alicia D. Perkins 1849.

(Sound of
waterfall) Voice-over: When they came to Alcove springs
in Kansas, they described it as the most beautiful site on
the whole trail, even though their whole trail experience
had just been a few days.
Rather than bringing
different messages, sound and visuals here have the same
content; the message is doubled. The sense of reality and
immersion is heightened when they align. It is a little more
like standing there, experiencing the landscape, than images
or sound alone would be.
Static and Dynamic.
In these two examples from "The Way West," sound and visuals
also bridge another division: that between static and
dynamic sign systems. As sound also exists over time, it
adds a temporal dimension and dynamic to the scene. In the
still image of the waterfall, one can almost see the water
moving when the sound is added.
Another effective
combination of static and dynamic sign systems is found in
an exercise page from the British Men's
Health
site, which combines written words and print-like
layout with video. The exercise program is explained in
writing, and can be consulted over and over, while the
exercises are demonstrated in video, thus showing the actual
movements.

http://www.menshealth.co.uk/fitness/owen/
Differentiated and
Dense. Not just different in being static and dynamic,
writing and video are also differentiated and dense
respectively. This adds to the effect of rhetorical
convergence in the above example from Men's Health;
as the video images are dense, all details are recorded, and
may be studied by the aspiring weight lifter, including the
finer points not mentioned in the written
instructions.
Another combination of
differentiated and dense sign systems is of course images
and writing, a combination so common, it hardly deserves the
name rhetorical convergence (was there ever a time
when people never drew images on the same surface they were
writing on?). A computer version of the combination, however,
is to let the user make the writing visible at will.
Nationalgeographic.com's "Columbia
River" is a moving
panoramic image of the river. When the reader positions the
mouse over an element in the image, a written label appears,
and when the reader clicks, a smaller pane with more writing
opens. This particular combination of image and writing
makes it possible to combine a large-scale, detailed image
with explanatory labels without cluttering the image with
letters.
In the illustration below, a
fish in the water was clicked, bringing up the pane with
writing and images.

http://www.nationalgeographic.com/earthpulse/columbia/index_flash.html

Attenuated and Replete.
The image in "Columbia River" is a stylised drawing,
which allows the artist to draw attention to the details he
considers to be important. In Goodman's vocabulary, it is
relatively attenuated. When the fish is clicked, however, a
photograph opens, showing what the fish "really" looks like
in all its details. The photograph is replete. In 5.1, we
noticed how effectively "Oleada"
combined photography and drawing to reap the benefits of
both the repleteness of photography and the attenuation of
drawings, providing both overview and detail.

http://www.elmundo.es/elmundo/2001/graficos/septiembre/semana2/atentados/atentados2.html
Iconic, Indexical,
and Symbolic. "Oleada" also demonstrates the
combination of iconic, indexical, and symbolic signs. The
photographs are indexical and iconic, the physical traces of
the catastrophe, while the drawing is iconic but not
indexical.
Similarly, BBC
News regularly
combines the indexical with the symbolic, by linking parts
of radio programs from newspaper-like written news stories.
A quote in writing may thus be backed by the recording of
how the statement fell, and the recording will also reveal
the tone of voice of the speaker.
The ability to combine so
many modes of signification within a Web site was the
starting point for our investigation of rhetorical
convergence on the Web, and its most visible and basic
manifestation. But the obvious multitude of possible
combinations of signifiers with different properties, and of
different kinds of semiosis makes the term convergence
seem a less fitting description for the actual
resulting text. Does not the discussion above indicate a
divergence of rhetorical forms? We will return to
that question towards the end of this essay, but first we
should view the four axes of rhetorical convergence
together.
The table below lists the
four axes and the different terms we have discussed.
(Neither their placement nor the dividing lines have any
significance; the figure is merely meant as a
summary.)

6. Limits
of Technology
We have discussed the four
axes of rhetorical convergence: mode of distribution,
mode of restrictions, mode of acquisition, and mode
of signification. Any rhetoric may be described by
registering the variables listed for each of the axes, including
established, well-known genre rhetorics. Most Web texts will
score so close to a genre rhetoric known from earlier media
that we recognise them as fairly similar. In many cases,
however, some variables are different and
show similarities to a different rhetoric of a
different established medium. It is this simultaneous
resemblance to two or more established rhetorics we call
rhetorical convergence.
What we have left to clarify
is the relation between the four axes. It might be tempting
to align the four axes with other familiar distinctions, for
instance, to say that mode of distribution and
canvas are aspects of technology or even medium,
mode of acquisition is syntax, and mode of
signification is semantics. Such a division would not
stand up to inspection; technology is present as a factor in
all four axes. As well as setting the limits for
distribution speed and canvas, technology also governs which
sign systems may be used, and the possibilities for user
input and influence on the text. The mode of acquisition and
the mode of distribution chosen by the authors will also
influence the semantic content of a text. We have already
seen how a text may be read differently if it is live, or
how bandwidth limits the use of video. Furthermore,
multicursal aspects or merely playback controls open up for a
different reading and understanding of a text.
A perhaps more helpful way
of relating the axes is to view them as a process of
communication.

Then the mode of
distribution would be how the message is brought to the
reading surface, canvas is the properties of that surface,
the mode of acquisition governs how the reader manipulates
the surface to experience the signs, and the signs in turn
are what the reader reads.
Such a highly abstracted
view of a reading process may be helpful to memory, but like
all models, this way of aligning the axes also obscures some
relations as it highlights others. Although reading and
comprehension take place in the meeting of text and reader,
the rhetoric of the text is shaped by all earlier stages of
the communication process. The axes of rhetorical
convergence are not events that take place one after the
other, but simultaneous dimensions that may describe any
given rhetoric.
It is further important to
realise that technology determines all four sets of aspects.
Technology sets the premises for the possible modes of
distribution and acquisition, technology dictates which sign
systems may be used, and the canvas is just a subset of the
possibilities of the technology. As such, the four axes may
also be seen as a set of limitations, of subsets of subsets
of technology's possibilities, as in the following
drawing.

The canvas is a set of
chosen limits within the possibilities of the technology, a
subset that sets the limits for which sign systems can be
used and in which ways. The canvas also determines the
distribution process, as a large and detailed canvas
requires more bandwidth. The mode of signification and the
mode of distribution are relatively independent, however, as
the mode of distribution describes the communication as
process, and not the semiosis itself. It is a way of
expressing the temporal relation between signifier and
signified, between story and discourse.
As modes of acquisition have
to be signified, these depend in turn on the sign systems.
The mode of distribution also determines the mode of
acquisition, as all other acquisition modes than the
cinematic require some permanence.
Such a theoretical model is
not a description of a procedure or design workflow; it is
probably rarely the case that screen size is decided on
first, then writing and editing, and then linking. It
remains a concern, however, that if one is to link from a
video, for example, the presence of a link needs to be
signified.
This perspective also
obscures one important relation: that the distribution
process is a prerequisite for the communication to be
possible at all. Each of these alternative views of the
relations between the axes has its strengths and
weaknesses, which is why it seems justified to view them as
interrelated axes on which a multidimensional space is
projected.
How does this
multidimensional model relate to the rhetorical heritage?
Traditional rhetorical figures reside inside what is
signified, and are thus not visible in this model. This does
not mean that they are not important; on the contrary, I
have argued that choices along the four
axes of rhetorical convergence have an effect on the message.
Still, this effect is probably less in terms of persuasion
than the message conveyed by the signs. In our larger
understanding of human communication, rhetorical convergence
remains a footnote.
7.
Rhetorical Divergence?
In a belt near the equator,
known as the intertropical convergence zone, the
trade winds from the northeast meet the trade winds from
the southeast. It is a phenomenon meteorologists know as
convergence. The converging air is forced upwards, and
spreads out again
in a divergence at a higher altitude.
I have argued that the
perceived convergence of media may be viewed as combinations
of earlier forms, or forms sliding towards each other as
variables change. Some combinations are time-tested and
traditional; others are new and creative. Combinations
adhere to certain restrictions and follow certain patterns,
described in the section above using mode of
distribution, mode of restrictions, mode of acquisition,
and mode of signification.
As each of the four axes
spans many modes and dimensions, their possible combinations
are numerous. Traditional media utilise only a few
combinations of modes; many more are possible on the
computer. What has become possible is a veritable
divergence of rhetorics. I have neither hope nor
ambition of describing all aspects of rhetorical
convergence, but even the small number of perspectives I
have discussed can combine in a vast number of different
forms, as a little math would easily demonstrate. And the
likely finding of just one other rhetorical dimension would
drastically increase the total number of possibilities.
Hence, although I have argued that convergence is a
meaningful term to use, this convergence can only
result in a divergence of forms. Computer media
allow authors to choose many rhetorical modes that
previously were dictated by the various media's
technologies. Media convergence has, as it were, broken
apart the building blocks of genres in earlier media, and
given them all to the digital author to set up new
rhetorical constructions with many combinations of materials
hitherto unheard of. We haven't seen most of them yet.
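The "little math" referred to above can be sketched as follows. If each axis offers a certain number of distinguishable modes, and the choices are assumed to be independent, the number of possible rhetorics is the product of those numbers. The counts in this Python sketch are purely hypothetical placeholders chosen for illustration; they are not figures established in this essay. The assumption of independence is also a simplification, since the axes constrain each other, as discussed in section 6.

```python
from math import prod

# HYPOTHETICAL number of distinguishable modes on each axis.
# These counts are placeholders, not results from the essay.
modes_per_axis = {
    "distribution": 4,
    "restrictions (canvas)": 5,
    "acquisition": 4,
    "signification": 10,
}


def count_combinations(axes):
    """Number of combinations if one mode is chosen freely on each axis."""
    return prod(axes.values())


base = count_combinations(modes_per_axis)

# Adding one further (hypothetical) dimension with three modes
# multiplies the total by three.
extended = count_combinations({**modes_per_axis, "fifth dimension": 3})

if __name__ == "__main__":
    print(base, extended)
```

Whatever the actual counts turn out to be, the total grows multiplicatively, which is why a single additional dimension increases the number of possible forms so sharply.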