One of the potentially confusing aspects of evaluation within computer-supported cooperative work (CSCW) is that there are many activities one might wish to carry out at different times that evaluate socio-technical systems. I identify five ideal types: the effects of a new computer system in an organisation; the formative development of a piece of software; the evaluation of conceptual developments; the evaluation of a cooperative system where factors other than the computers are more interesting; and the determination of which piece of software to buy.
The evaluation of Information Systems (IS) has proved to be hard,
as is evidenced by the ongoing search for appropriate methodologies
in conferences like this one and past conferences. Symons and
Walsham (1988) argued that this difficulty is due to the multiple
perspectives involved, and the difficulties of quantifying benefits.
The evaluation of Computer-Supported Cooperative Work (CSCW) systems,
sometimes called groupware, has proved to be even harder. This
paper will not discuss why this should be in detail, as my colleagues
and I (Ross et al. 1995) have already done so at some length,
as have others in earlier papers (Bannon 1993; Grudin 1988). All
the same reasons why IS evaluation is difficult arise here also,
but they are compounded by the social embeddedness of the computer
systems involved: in IS in general, cooperation takes place using
the computer; in CSCW, that cooperation takes place via
the computer.
This means that any organisational conflicts or clashes in individual
personalities and cultures will not only be more readily apparent
but will directly affect how well the system works. It may well
be the case that a computer system will be designed perfectly,
with all the right sort of software engineering procedures, requirements
analysis, and usability testing; but that the system is introduced
insensitively, or it cuts across the way people have become used
to working, or it changes the power relationships between workers
at different levels of the organisation. All of these have been
well documented in case studies in the CSCW literature (e.g.
Bowers et al. 1995; Grudin 1988; Ramage 1994). In such situations,
it may be the case that the system is not used by sufficient people
to attain the critical mass Grudin points out is necessary for
some systems to be useful; or it may be that people devise work-arounds
that bypass the computer and feed it information later, as in
Bowers' study; or it may be that people grumble about the "Big
Brother" systems but have no choice but to use them (as in
my study).
Morten Kyng has argued (1991) for cooperative design to parallel
the cooperative nature of the work. This is one solution to the
problem - to increase the acceptance of the system by ensuring
it is designed with user involvement. If this works well, it should
both make the system closer to the way people work (hence making
it more useful) and make it more likely to be used, as users will
hopefully feel it is more their system.
This is an excellent approach where it is possible. I wish to
concentrate, however, on evaluation - either with this kind of
design, or with more traditional modes, or as part of an iterative
design solution. The conclusion that my colleagues and I came
to in earlier work (Ross et al. 1995) was that one of the things
that makes CSCW evaluation so fraught is the wide range of different
perspectives that need to be brought to bear: usability, individual
psychology, group dynamics, the efficiency of communications,
the effects of and on organisational structures and cultures,
and so on. All too often, most of these are excluded, and a narrow
disciplinary view is taken (according to the backgrounds and skills
of the evaluators). We argued for the importance of a multiplicity
of theories being brought to bear on the evaluation, to counter
this and ensure that the whole range of experiences with the system
is considered.
We also argued in that paper that a further problem is the dominance
of the views of experts - Scientists in White Coats - over those
of the people who actually use the systems. In fact, both are
necessary: we must be aware of the theoretical, analytic side
to the evaluation (based on scientific analysis of users' experiences)
but also the users' own perception of their experiences. Accordingly,
we proposed a framework based on multiple perspectives - in our
case study, those of users and evaluators - on evaluation, and
methods to allow both to be reflected.
The question of multiple perspectives needs further unpacking, however. The category "user" is one that has become increasingly disputed, as ignoring the reality of people's work and shoving them into a category whose focus is on the computer (Grudin 1993). I have been particularly interested in the different needs of the many groups of people with a stake in the nature and effects of a CSCW system, and in how these different needs change the way they evaluate it. For the print-room staff in the study by Bowers et al. (1995), the workflow system imposed upon them interfered with their work, and made it less efficient and interesting; but for their managers, it provided useful data about how they were working. The question of multiple stakeholders in the evaluation thus arises (Ramage and Matzdorf 1996).
So CSCW evaluation requires questions of multiple theories, methods,
perspectives and stakeholders to be considered. I shall address
in the rest of this paper another kind of multiplicity, albeit
at a different level than the others: that there are many different
kinds of activity that one might consider performing under the
name of 'evaluation' pertinent to CSCW systems.
I have to date identified five kinds of activity. No doubt there
are others - this taxonomy is an open one to be changed, presented
for the provocation of thought. A further caveat is that these
are ideal types (in Weber's sense): I would in practice expect
to see them appear in combination and in modified form. My point
in describing these types, however, is that CSCW evaluation can
get terribly confused (and confusing) unless one realises what
type of evaluation one is conducting. Thus, there is nothing wrong
in purely conducting a usability evaluation, for formative purposes,
but if one believes one has completely evaluated the system and
its potential effects on an organisation in that way, unfortunate
consequences could result...
The five types are presented here in diagrammatic form; each will be described, with examples, later.
Before moving to discuss the five types in detail, one point must be made. I am not just talking about the evaluation of technology, but rather of "cooperative systems" - collections of people, organisations, technological artefacts and environments, with tasks and goals. A full discussion of the implications of this term awaits another paper, but it is interesting to note here the parallels to the use of the concept "socio-technical system" by researchers at the Tavistock Institute. I am similarly arguing for an interweaving of the social ordering of an organisation and the technology introduced into it. The interesting difference is the directions of travel: the Tavistock researchers, being mostly psychodynamic psychologists, took the significance of the social order as a given and sought to bring the perspective of the technical into the debate, as Trist (1992), the founder of that approach, makes clear in an introduction to a volume of papers on the subject. CSCW, on the other hand, is a dialogue which has involved the introduction of concepts of social ordering into what was previously a technical discipline. To put this another way in slogan form: Evaluation is no good if it just considers the computer. The situation is also up for evaluation.
What happens when you introduce a new computer system into an
organisation? How does it change the organisation's work, members
and outcomes (such as profits)? How does the technology change
for the needs of the organisation? These questions are those of
evaluation in its classic form.
They resemble the work on the evaluation of organisational change
(Legge 1984) and of educational and social projects. In these
areas, a big debate has gone on about methodology: the 'paradigm
wars' between qualitative and quantitative researchers.
In CSCW, the starting point has substantially been qualitative;
field studies are carried out using ethnographic methods on the
whole, although guided by a variety of different theoretical frameworks,
including ethnomethodology (Harper et al. 1991), structuration
theory (Orlikowski and Gash 1994), distributed cognition (Rogers
1994) and Bateson's learning types (Star and Ruhleder 1994). This
is in the nature of the shift that CSCW entails, from the usability
trials based on cognitive psychology used by researchers and practitioners
in Human-Computer Interaction (HCI), to considering the social
organisation of work and its effects on technology.
So the typical model of these evaluation studies is this: the
researcher either asks or is asked to come into the organisation;
they 'hang around', watching what is going on, and perform interviews;
they structure these ideas in terms of their preferred theory;
and then they present a conclusion to the members of the organisation.
That is, they conduct a fairly typical kind of qualitative research.
It is usually the case in these evaluations that the researcher/evaluator
is an outsider to an organisation, called in as a consultant or
coming in for their own research agenda. This has all the standard
advantages and disadvantages of outsiders: they can see things
the organisation's members can't, as they are not bound by its
norms and assumptions; they also fail to see things for precisely
the same reason, and there are risks about them becoming the Scientists
in White Coats I previously warned against.
On the more positive side, the presence of evaluators in an organisation
can be valuable of itself. They can bring valuable experience
with computers and with organisational theory, and help members
of the organisation learn about these things in little ways that
enhance the use of the system they're evaluating. Indeed, the
process of doing evaluation may act to provide organisational
learning, by raising issues for the members of the organisation
that they hadn't previously given thought to.
Similarly, members of an organisation conduct their own evaluations of the effects of a computer system: managers are keen to know what is going on, for example, so they can decide whether to keep it in its current form or to change it. Few of these are reported in the literature. Their form would reflect the kind involving outsiders, only with a different scope and more problematic issues about power (employees may be reluctant to tell managers a system is terrible if the instigator of that system is a powerful political operator).
There are those within the CSCW community whose interest is to
build systems (in the narrow, technical sense of the term) and
have them be used, either for commercial purposes or within a
research community. Their need for evaluation is to develop their
systems further, making them more usable and more appropriate
for the intended or actual users - their evaluation is a formative
one.
We can see two main sub-types of this: taking place before the
system has been completed, and taking place once it is finished
and has been 'shipped' to customers or colleagues. Formative evaluation
is also related to the iterative prototyping often advocated within
the Human-Computer Interaction literature as being a way of improving
a system as you go along, as well as empowering users by giving
them a say on what their system will do (Nielsen 1993).
The first kind of formative evaluation often takes place without
it being explicitly recognised. However, among those who do write
about it, key phrases recur, such as "initial experiences"
and "some problems which need to be refined". These
studies tend to be led by the needs of the designers rather than
the users - they may be based on real work for methodological
reasons, but will tend to take place in such settings as usability
labs. It is seldom the case that in practice formative evaluation
gets much beyond usability questions - it may speculate on
the effects on people and work, but is not able to find out much
about these, in limited time.
The second kind, though little represented in the CSCW literature, is nonetheless interesting. One sense of this is the 'beta-testing' procedure followed by many commercial firms, where a semi-finished product is passed around a selection of users to gauge their reactions. Another is the development of the product from one version to another. An example of this is given by Abbott and Sarin (1994), who talk about the development of different versions of a workflow system according to the experience of users with it.
Not all systems development concerns the design or use of products
intended for 'real' organisations. Many pieces of development
are conducted purely for their research interests. The evaluation
of such systems is therefore not at a level where one is examining
the effect of a system in use, or trying to redesign it for future
use. Rather, what one is seeking to evaluate here are the concepts
that underlie the system, and whether those concepts are applicable.
There are four kinds of research one might wish to evaluate in
this way: research projects carried out within commercial research
centres; academic research projects; PhD projects; and externally-funded
research projects.
The studies in commercial research centres are those that in an
earlier paper (Plowman et al. 1995) we identified as semi-situated,
that is neither completely situated in the 'real world' nor completely
artificial and in the laboratory. Thus "real work is still
under study, but it is the real work of researchers rather than
of typical users" (ibid., p.311). These have been particularly
seen within CSCW in centres such as the Xerox Palo Alto Research
Centre (PARC), and its European counterpart EuroPARC; but also
at Hewlett-Packard and IBM. Good examples of these studies are
those concerning 'active badges' (Harper 1992). Methodologically,
they are often qualitative, based on users' experiences within
the research centres over some time (sometimes with specific experiments
added to test particular questions). The general model is one
of "looking at ourselves" (ibid.). Many interesting
results have arisen from such studies, at a theoretical level
- how people cooperate at work, and how we build systems to support
them - rather than a product level.
The academic research project, conducted for its own sake, will
use evaluation as a tool to improve its systems. Many of the instances
of this kind in the literature (Wan and Johnson 1994) do just
this, conducting experiments to find out how their prototype system
is used and what is learned from its use - the subjects being
either one's colleagues (similar to the research centres) or one's
students (similar to the classic psychology experiments). The
results are often of theoretical interest as well as improving
the system - Wan and Johnson comment that "lessons learned
through the design and evaluation of [the system] provide new
insights into both collaborative learning systems and collaborative
learning theories".
Evaluation of a computer system produced as part of a PhD project
is often felt by external examiners to be lacking. This seems
to be for one of two reasons: the system produced may not have
been properly studied (although this would have been appropriate);
or it may have not been clear how the system could be evaluated.
The former case requires more work, and a greater awareness of
the importance of evaluation at many stages of a project; an example
of good evaluation of this kind is Twidale et al. (1994). This
was concerned with a toolkit for building cooperative systems,
the evaluation of which is a particularly difficult issue; this
study solved the problem by building a sample system and evaluating
that, a good solution if there is one main kind of system covered.
For others, an evaluation of the toolkit according to theory and
objectives seems a good option.
Evaluation for external funding bodies usually appears in the form of 'deliverables': weighty documents that show what the project has done, what papers and computer systems it has produced, and how it has met the goals that were specified at the start of the project. This last point is important: these documents are essentially rationalisations, proofs of the worthiness of the work conducted. Thus, they fit in neatly with the 'goal-based' model of evaluation. Another important feature is the evaluator's report - a good example is the evaluation conducted of the British Alvey programme of IT projects (Guy et al. 1991). The subject has been considered at some length as part of the European learning technologies programme DELTA, which has a specific evaluation component (Cullen et al. 1993).
As I have argued above, it is important when looking at CSCW to
consider not just the technology but the whole socio-technical
system. Accordingly, some evaluations that are of cooperative
systems (in the sense of collections of people, technology and
organisations) will have as their main focus issues other than
technology. I must stress that this is not to denigrate the importance
of the technology: the question is rather one of focus.
As an example, I have recently been evaluating the work of a project
team who are looking at organisational learning in the surveying
profession (Ramage and Matzdorf 1996). My main role has been to
look at the team's own organisational learning, how they function
as a group, whether this is different from a standard research
team and if so how this affects their research. As it happens,
the team is split across three sites in Sheffield, and so maintains
communication by phone, email (with ingenious use of attachments),
fax and post, as well as meetings. They are clearly a cooperative
system, and make use of CSCW technology; in that sense my evaluation
of them is an evaluation of a CSCW system. However, my main interest
in the evaluation (and their main interest in my being there)
is not the communications mechanisms they use, but rather their
functioning as a group. My focus is on issues other than
the technological ones.
This same approach can be seen for a number of authors in the CSCW literature. Clement (1990) and Sherry (1995) are two good examples: both present case studies of situations that involve CSCW use (an office automation system and a communications system among Navajo Indians); but both take as their main focus of interest questions of power balance, empowerment and authority. The case study is used as a justification for an academic discussion about a topic deeply important to the authors, on which they had explicitly worked with the research participants for purposes of their empowerment.
The question "should I buy WordPerfect, Microsoft Word or
Lotus WordPro for my company?" is a form of evaluation: it
is an exercise in examining alternatives, weighing up their pros
and cons, and coming to a conclusion. So how does the hard-pressed
IS manager make the decision as to which system to buy?
The answers appear to be three-fold: they go on the prevailing
fashions; they take advice from knowledgeable sources; or they
get information from suppliers. On the first, it was well known
in the past that "no-one ever got sacked for buying IBM"
(now it is probably Microsoft) - trends come and go, and at the
moment no well-dressed company would be seen without its Web pages.
A more rational approach is to get good advice, and this usually
comes either through one's peers (say in other companies) or from
magazines. Rather in the same way that the consumer magazines
assist people in buying cars or cameras, computer magazines give
reviews and analyses of what is best about the current set of
groupware tools, Web servers, word processors or spreadsheets.
For example, a recent issue of Byte magazine lists the
relative advantages of Lotus Notes and the Web (Roberts 1996).
Finally, the software manufacturers and resellers will often provide
information to allow decisions to be made. Lotus provide a "NotesSuite
Evaluators Guide" for potential purchasers of their office
software (Lotus Development Corporation 1996); others do likewise.
This guide contains short paragraphs on each of the components
that make up their software suite, and then ten pages headed "Why
should I buy NotesSuite?", listing its key features, advantages
over comparable packages (Microsoft Office), solutions for businesses
and developers, and so on. Appealing to the profit margins of
companies directly, Lotus also provide a lengthy consultants'
report on the Return on Investment (McCready et al. 1995) to be
gained from buying Notes.
The occasional academic study describes this process going on.
Fanning and Raphael (1986) describe experiences in the mid-1980s
at Hewlett-Packard with computer teleconferencing, and give a
good description of the process by which they selected one system
and the criteria they applied to come to that decision. They list
several systems with the reasons why they might have been appropriate,
and why not. Interestingly, the selection was made by a team of
seven people, "ranged from technician to lab manager",
so perhaps the suggestion above that this is a task purely for
the IS manager is an unfair one.
Indeed, the composition of the purchasing team is much of the concern of Green et al. (1991), who discuss the purchase of a library automation system by a team coming from all levels of staff. They explicitly talk about "systems evaluation techniques" - a combination of demonstrations of available systems by their suppliers, and visits by the team to other libraries where the systems were used (mainly the former). The process of evaluation for purchase "did open up systems selection and development issues to basic-grade women library staff" (p.41), without the usual intermediaries, and led to a much stronger focus on staff and borrower needs, rather than technical requirements, than might have been the case had the purchasing decisions been made by managers. The issues of gender and empowerment that they discuss seem to have had a direct impact on the quality of the decisions made.
Typologies are dangerous things, especially so if they attempt
to classify a wide range of work or people. They may appear to
be prescriptive rather than descriptive; they may cause offence
by placing people into categories they don't feel are appropriate;
and they may restrict debate and action by over-defining an area.
It is therefore vital for one who inflicts a new typology upon
an already highly classified world to explain clearly its purpose.
I therefore restate my earlier remark: the five types in this
paper are not intended to be taken as a set of closed,
complete categories - I recognise that they overlap and may well
be incomplete.
I have presented them for two reasons: self-understanding and
to make a case about the nature of evaluation. First, it seems
to me most useful for an evaluator to understand what kind of
evaluation they are performing. Types 3 and 4 are rather distinct
from types 1 and 2, though they may appear to have similarities.
The kind of evaluation conducted as part of PhD research is qualitatively
different from that conducted in commercial software development,
and it is well to recognise this.
A second reason for my typology is my increasing awareness of
the breadth of different activity that might properly be conducted
under the title of evaluation. It has been a surprise to me to
realise that type 3 (conceptual) is an independent activity, as
has been clear when those in my department developing systems
for their PhD have asked "how should I evaluate this?".
Again, while I have for some time been in favour of socio-technical
evaluation, it has been a surprise to me to realise that a purely
social focus is still a valid form of CSCW evaluation. And finally,
it has surprised me to realise that when an organisation examines
which computer systems to buy, its activity then is also evaluation.
My argument then is for the recognition of those three kinds of evaluation, along with the more traditional effects and formative forms. Doing so will not lessen the complexity of the process of CSCW evaluation - many different issues and stakeholder perspectives still need to be taken into account - but perhaps it may make one aspect of that complexity somewhat clearer.
This paper has benefited from discussions with and comments by Fredrik Ljungberg and Fides Matzdorf. The work was funded by the EPSRC and Digital Equipment Corporation.
Kenneth Abbott and Sunil Sarin (1994). Experiences with Workflow
Management: Issues for the Next Generation. Proceedings of
the Conference on Computer-Supported Cooperative Work (CSCW '94).
Liam Bannon (1993). Use, Design, and Evaluation: Steps towards
an integration. Schärding CSCW Workshop.
John Bowers, Graham Button and Wes Sharrock (1995). Workflow from
Within and Without: Technology and Cooperative Work on the Print
Industry Shopfloor. Proceedings of the Fourth European Conference
on Computer-Supported Cooperative Work (ECSCW 95), pp.
51-66.
Andrew Clement (1990). Cooperative Support for Computer Work:
A Social Perspective on the Empowering of End Users. Proceedings
of the Conference on Computer-Supported Cooperative Work (CSCW'90),
pp. 223-236.
J. Cullen, J. Kelleher and E. Stern (1993). Evaluation in DELTA.
Journal of Computer Assisted Learning, 9: 115-126.
Tony Fanning and Bert Raphael (1986). Computer Teleconferencing:
Experience at Hewlett-Packard. In Irene Greif (Ed.), Proceedings
of the Conference on Computer-Supported Cooperative Work (CSCW
86), pp. 291-306.
Eileen Green, Jenny Owen and Den Pain (1991). Office Systems Development
and Gender: Implications for Computer-Supported Cooperative Work.
Proceedings of the Second European Conference on Computer-Supported
Cooperative Work (ECSCW 91), pp. 33-48.
Jonathan Grudin (1988). Why CSCW Applications Fail: Problems in
the Design and Evaluation of Organisational Interfaces. Proceedings
of the Conference on Computer-Supported Cooperative Work (CSCW
'88).
Jonathan Grudin (1993). Interface. Communications of the ACM,
36 (4): 112-119.
Ken Guy, Paul Quintas, Michael Hobday, Luke Georghiou, Hugh Cameron
and Tim Ray (1991). Evaluation of the Alvey Programme for Advanced
Information Technology. London: HMSO.
Richard Harper (1992). Looking at Ourselves: An Examination of
the Social Organisation of Two Research Laboratories. Proceedings
of the Conference on Computer-Supported Cooperative Work (CSCW
92).
Richard Harper, John Hughes and Dan Shapiro (1991). Harmonious
Working and CSCW: Computer technology and air traffic control.
In John Bowers and Steve Benford (Eds.), Studies in Computer
Supported Cooperative Work: Theory, Practice and Design, pp.
225-234. Amsterdam: North Holland.
Kari Kuutti (1995). Debates in IS and CSCW Research: Anticipating
System Design for Post-Fordist Work. In Wanda Orlikowski et al.
(eds.), Information Technology and Changes in Organizational
Work, pp. 177-196. London: Chapman & Hall.
Morten Kyng (1991). Designing for cooperation: cooperating in
design. Communications of the ACM, 34 (12): 65-73.
Karen Legge (1984). Evaluating planned organizational change.
London: Academic Press.
Lotus Development Corporation (1996). NotesSuite Evaluators Guide.
http://www.lotus.com/notesuit/evlgd.htm.
Scott McCready, Ann Palermo, Gerry Murray and Darby Johnson (1995).
Lotus Notes: Agent of Change; The Financial Impact of Lotus Notes
on Business. http://www.lotus.com/ntsdoc96/roi.htm.
Jakob Nielsen (1993). Usability Engineering. London: Academic
Press.
Wanda Orlikowski and Debra Gash (1994). Technological Frames:
Making Sense of Information Technology in Organizations. ACM
Transactions on Information Systems, 12 (2): 174-207.
Lydia Plowman, Yvonne Rogers and Magnus Ramage (1995). What are
workplace studies for? Proceedings of the Fourth European Conference
on Computer-Supported Cooperative Work (ECSCW 95).
Magnus Ramage (1994). Engineering a smooth flow? A study of workflow
software and its connections with business process reengineering.
MSc Dissertation, University of Sussex, Brighton, England.
Magnus Ramage and Fides Matzdorf (1996). Cui Bono? A stakeholder
approach to CSCW evaluation. Unpublished manuscript. Available
from author.
Bill Roberts (1996). Groupware Strategies: Six key technologies
will tell you if you need Notes OR the Web or Notes AND the Web.
Byte, July 1996.
Yvonne Rogers (1994). Exploring obstacles: integrating CSCW in
evolving organisations. Proceedings of the Conference on Computer-Supported
Cooperative Work (CSCW 94).
Susi Ross, Magnus Ramage and Yvonne Rogers (1995). PETRA: Participatory
Evaluation Through Redesign And Analysis. Interacting With
Computers, 7 (4): 335-360.
John Sherry (1995). Cooperation and Power. Proceedings of the
Fourth European Conference on Computer-Supported Cooperative Work
(ECSCW 95), pp. 67-82.
Susan Leigh Star and Karen Ruhleder (1994). Steps towards an ecology
of infrastructure: complex problems in design and access for large-scale
collaborative systems. Proceedings of the Conference on Computer-Supported
Cooperative Work (CSCW 94).
Veronica Symons and Geoff Walsham (1988). The evaluation of information
systems: a critique. Journal of Applied Systems Analysis 15:119-132.
Eric Trist (1992). Introduction to Volume II. In Eric Trist, Hugh
Murray and Beulah Trist (Eds.), The Social Engagement of Social
Science: A Tavistock Anthology - Volume II: The Socio-Technical
Perspective, pp. 36-60. Philadelphia: University
of Pennsylvania Press.
Michael Twidale, David Randall and Richard Bentley (1994). Situated
evaluation for cooperative systems. Proceedings of the Conference
on Computer-Supported Cooperative Work (CSCW 94).
Dadong Wan and Philip Johnson (1994). Computer Supported Collaborative Learning Using CLARE: The Approach and Experimental Findings. Proceedings of the Conference on Computer-Supported Cooperative Work (CSCW 94).