CSCW Evaluation in Five Types

Magnus Ramage
Computing Department
Lancaster University
Lancaster, LA1 4YR, UK
magnus@comp.lancs.ac.uk

Abstract

One of the potentially confusing aspects of evaluation within computer-supported cooperative work (CSCW) is that there are many activities one might wish to carry out at different times that evaluate socio-technical systems. I identify five ideal types: the effects of a new computer system in an organisation; the formative development of a piece of software; the evaluation of conceptual developments; the evaluation of a cooperative system where factors other than the computers are more interesting; and the determination of which piece of software to buy.

On multiplicity

The evaluation of Information Systems (IS) has proved to be hard, as is evidenced by the ongoing search for appropriate methodologies in conferences like this one and past conferences. Symons and Walsham (1988) argued that this difficulty is due to the multiple perspectives involved, and the difficulties of quantifying benefits.

The evaluation of Computer-Supported Cooperative Work (CSCW) systems, sometimes called groupware, has proved to be even harder. This paper will not discuss why this should be in detail, as my colleagues and I (Ross et al. 1995) have already done so at some length, as have others in earlier papers (Bannon 1993; Grudin 1988). All the same reasons why IS evaluation is difficult arise here also, but they are compounded by the social embeddedness of the computer systems involved: in IS in general, cooperation takes place using the computer; in CSCW, that cooperation takes place via the computer.

This means that any organisational conflicts or clashes in individual personalities and cultures will not only be more readily apparent but will directly affect how well the system works. It may well be the case that a computer system will be designed perfectly, with all the right sort of software engineering procedures, requirements analysis, and usability testing; but that the system is introduced insensitively, or it cuts across the way people have become used to working, or it changes the power relationships between workers at different levels of the organisation. All of these have been well documented in case studies in the CSCW literature (e.g. Bowers et al. 1995; Grudin 1988; Ramage 1994). In such situations, it may be the case that the system is not used by sufficient people to attain the critical mass Grudin points out is necessary for some systems to be useful; or it may be that people devise work-arounds that bypass the computer and feed it information later, as in Bowers' study; or it may be that people grumble about the "Big Brother" systems but have no choice but to use them (as in my study).

Morten Kyng has argued (1991) for cooperative design to parallel the cooperative nature of the work. This is one solution to the problem - to increase the acceptance of the system by ensuring it is designed with user involvement. If this works well, it should both make the system closer to the way people work (hence making it more useful), and make it more likely to be used, as users will hopefully feel it is more their system.

This is an excellent approach where it is possible. I wish to concentrate, however, on evaluation - either with this kind of design, or with more traditional modes, or as part of an iterative design solution. The conclusion that my colleagues and I came to in earlier work (Ross et al. 1995) was that one of the things that makes CSCW evaluation so fraught is the wide range of different perspectives that need to be brought to bear: usability, individual psychology, group dynamics, the efficiency of communications, the effects of and on organisational structures and cultures, and so on. All too often, most of these are excluded, and a narrow disciplinary view is taken (according to the backgrounds and skills of the evaluators). We argued for the importance of multiplicity of theories brought to bear on the evaluation, to counter this and ensure the whole range of experiences with the system be considered.

We also argued in that paper that a further problem is the dominance of the views of experts - Scientists in White Coats - over those of the people who actually use the systems. In fact, both are necessary: we must be aware of the theoretical, analytic side to the evaluation (based on scientific analysis of users' experiences) but also the users' own perception of their experiences. Accordingly, we proposed a framework based on multiple perspectives - in our case study, those of users and evaluators - on evaluation, and methods to allow both to be reflected.

The question of multiple perspectives needs further unpacking, however. The category "user" is one that has become increasingly disputed, as ignoring the reality of people's work and shoving them into a category whose focus is on the computer (Grudin 1993). I have been particularly interested in the different needs of the many groups of people with a stake in the nature and effects of a CSCW system; and how these different needs change the way they evaluate it. For the print-room staff in the study by Bowers et al. (1995), the workflow system imposed upon them interfered with their work, and made it less efficient and interesting; but for their managers, it provided useful data about how they were working. The question of multiple stakeholders in the evaluation thus arises (Ramage and Matzdorf 1996).

Multiple types of CSCW evaluation

So CSCW evaluation requires questions of multiple theories, methods, perspectives and stakeholders to be considered. I shall address in the rest of this paper another kind of multiplicity, albeit at a different level than the others: that there are many different kinds of activity that one might consider performing under the name of 'evaluation' pertinent to CSCW systems.

I have to date identified five kinds of activity. No doubt there are others - this taxonomy is an open one to be changed, presented for the provocation of thought. A further caveat is that these are ideal types (in Weber's sense): I would in practice expect to see them appear in combination and in modified form. My point in describing these types, however, is that CSCW evaluation can get terribly confused (and confusing) unless one realises what type of evaluation one is conducting. Thus, there is nothing wrong in purely conducting a usability evaluation, for formative purposes, but if one believes one has completely evaluated the system and its potential effects on an organisation in that way, unfortunate consequences could result...

The five types are presented here in diagrammatic form; each will be described, with examples, later.

Figure 1: The Five Types of CSCW Evaluation

Before moving to discuss the five types in detail, one point must be made. I am not just talking about the evaluation of technology, but rather of "cooperative systems" - collections of people, organisations, technological artefacts and environments, with tasks and goals. A full discussion of the implications of this term awaits another paper, but it is interesting to note here the parallels to the use of the concept "socio-technical system" by researchers at the Tavistock Institute. I am similarly arguing for an interweaving of the social ordering of an organisation and the technology introduced into it. The interesting difference is the directions of travel: the Tavistock researchers, being mostly psychodynamic psychologists, took the significance of the social order as a given and sought to bring the perspective of the technical into the debate, as Trist (1992), the founder of that approach, makes clear in an introduction to a volume of papers on the subject. CSCW, on the other hand, is a dialogue which has involved the introduction of concepts of social ordering into what was previously a technical discipline. To put this another way in slogan form: Evaluation is no good if it just considers the computer. The situation is also up for evaluation.

Type 1: Effects

What happens when you introduce a new computer system into an organisation? How does it change the organisation's work, members and outcomes (such as profits)? How does the technology change to meet the needs of the organisation? These questions are those of evaluation in its classic form.

They resemble the work on the evaluation of organisational change (Legge 1984) and of educational and social projects. In these areas, a big debate has gone on about methodology: the 'paradigm wars' between qualitative and quantitative researchers.

In CSCW, the starting point has substantially been qualitative; field studies are carried out using ethnographic methods on the whole, although guided by a variety of different theoretical frameworks, including ethnomethodology (Harper et al. 1991), structuration theory (Orlikowski and Gash 1994), distributed cognition (Rogers 1994) and Bateson's learning types (Star and Ruhleder 1994). This is in the nature of the shift that CSCW entails, from the usability trials based on cognitive psychology used by researchers and practitioners in Human-Computer Interaction (HCI), to considering the social organisation of work and its effects on technology.

So the typical model of these evaluation studies is this: the researcher either asks or is asked to come into the organisation; they 'hang around', watching what is going on, and perform interviews; they structure these ideas in terms of their preferred theory; and then they present a conclusion to the members of the organisation. That is, they conduct a fairly typical kind of qualitative research.

It is usually the case in these evaluations that the researcher/evaluator is an outsider to an organisation, called in as a consultant or coming in for their own research agenda. This has all the standard advantages and disadvantages of outsiders: they can see things the organisation's members can't, as they are not bound by its norms and assumptions; they also fail to see things for precisely the same reason, and there is a risk of their becoming the Scientists in White Coats I previously warned against.

On the more positive side, the presence of evaluators in an organisation can be valuable of itself. They can bring valuable experience with computers and with organisational theory, and help members of the organisation learn about these things in little ways that enhance the use of the system they're evaluating. Indeed, the process of doing evaluation may act to provide organisational learning, by raising issues for the members of the organisation that they hadn't previously given thought to.

Similarly, members of an organisation conduct their own evaluations of the effects of a computer system: managers are keen to know what is going on, for example, so they can decide whether to keep it in its current form or to change it. Few of these are reported in the literature. Their form would reflect the kind involving outsiders, only with a different scope and more problematic issues about power (employees may be reluctant to tell managers a system is terrible if the instigator of that system is a powerful political operator).

Type 2: Formative

There are those within the CSCW community whose interest is to build systems (in the narrow, technical sense of the term) and have them be used, either for commercial purposes or within a research community. Their need for evaluation is to develop their systems further, making them more usable and more appropriate for the intended or actual users - their evaluation is a formative one.

We can see two main sub-types of this: taking place before the system has been completed, and taking place once it is finished and has been 'shipped' to customers or colleagues. Formative evaluation is also related to the iterative prototyping often advocated within the Human-Computer Interaction literature as being a way of improving a system as you go along, as well as empowering users by giving them a say on what their system will do (Nielsen 1993).

The first kind of formative evaluation often takes place without it being explicitly recognised. However, among those who do write about it, key phrases recur, such as "initial experiences" and "some problems which need to be refined". These studies tend to be led by the needs of the designers rather than the users - they may be based on real work for methodological reasons, but will tend to take place in such settings as usability labs. In practice, formative evaluation seldom gets much beyond usability questions - it may speculate on the effects on people and work, but is not able to find out much about these in limited time.

The second kind, though little represented in the CSCW literature, is nonetheless interesting. One sense of this is the 'beta-testing' procedure followed by many commercial firms, where a semi-finished product is passed around a selection of users to gauge their reactions. Another is the development of the product from one version to another. An example of this is given by Abbott and Sarin (1994), who talk about the development of different versions of a workflow system according to the experience of users with it.

Type 3: Conceptual

Not all systems development concerns the design or use of products intended for 'real' organisations. Many pieces of development are conducted purely for their research interests. The evaluation of such systems is therefore not at a level where one is examining the effect of a system in use, or trying to redesign it for future use. Rather, what one is seeking to evaluate here are the concepts that underlie the system, and whether those concepts are applicable.

There are four kinds of research one might wish to evaluate in this way: research projects carried out within commercial research centres; academic research projects; PhD projects; and externally-funded research projects.

The studies in commercial research centres are those that in an earlier paper (Plowman et al. 1995) we identified as semi-situated, that is neither completely situated in the 'real world' nor completely artificial and in the laboratory. Thus "real work is still under study, but it is the real work of researchers rather than of typical users" (ibid., p.311). These have been particularly seen within CSCW in centres such as the Xerox Palo Alto Research Centre (PARC), and its European counterpart EuroPARC; but also at Hewlett-Packard and IBM. Good examples of these studies are those concerning 'active badges' (Harper 1992). Methodologically, they are often qualitative, based on users' experiences within the research centres over some time (sometimes with specific experiments added to test particular questions). The general model is one of "looking at ourselves" (ibid.). Many interesting results have arisen from such studies, at a theoretical level - how people cooperate at work, and how we build systems to support them - rather than a product level.

The academic research project, conducted for its own sake, will use evaluation as a tool to improve its systems. Many of the instances of this kind in the literature (Wan and Johnson 1994) do just this, conducting experiments to find out how their prototype system is used and what is learned from its use - the subjects being either one's colleagues (similar to the research centres) or one's students (similar to the classic psychology experiments). The results are often of theoretical interest as well as improving the system - Wan and Johnson comment that "lessons learned through the design and evaluation of [the system] provide new insights into both collaborative learning systems and collaborative learning theories".

Evaluation of a computer system produced as part of a PhD project is often felt by external examiners to be lacking. This seems to be for one of two reasons: the system produced may not have been properly studied (although this would have been appropriate); or it may not have been clear how the system could be evaluated. The former case requires more work, and a greater awareness of the importance of evaluation at many stages of a project; an example of good evaluation of this kind is Twidale et al. (1994). This was concerned with a toolkit for building cooperative systems, the evaluation of which is a particularly difficult issue; this study solved the problem by building a sample system and evaluating that, a good solution if there is one main kind of system covered. For others, an evaluation of the toolkit according to theory and objectives seems a good option.

Evaluation for external funding bodies usually appears in the form of 'deliverables': weighty documents that show what the project has done, what papers and computer systems it has produced, and how it has met the goals that were specified at the start of the project. This last point is important: these documents are essentially rationalisations, proofs of the worthiness of the work conducted. Thus, they fit in neatly with the 'goal-based' model of evaluation. Another important feature is the evaluator's report - a good example is the evaluation conducted of the British Alvey programme of IT projects (Guy et al. 1991). The subject has been considered at some length as part of the European learning technologies programme DELTA, which has a specific evaluation component (Cullen et al. 1993).

Type 4: People-Focus

As I have argued above, it is important when looking at CSCW to consider not just the technology but the whole socio-technical system. Accordingly, some evaluations that are of cooperative systems (in the sense of collections of people, technology and organisations) will have as their main focus issues other than technology. I must stress that this is not to denigrate the importance of the technology: the question is rather one of focus.

As an example, I have recently been evaluating the work of a project team who are looking at organisational learning in the surveying profession (Ramage and Matzdorf 1996). My main role has been to look at the team's own organisational learning, how they function as a group, whether this is different from a standard research team and if so how this affects their research. As it happens, the team is split across three sites in Sheffield, and so maintain communication by phone, email (with ingenious use of attachments), fax and post, as well as meetings. They are clearly a cooperative system, and make use of CSCW technology; in that sense my evaluation of them is an evaluation of a CSCW system. However, my main interest in the evaluation (and their main interest in my being there) is not the communications mechanisms they use, but rather their functioning as a group. My focus is on issues other than the technological ones.

This same approach can be seen in the work of a number of authors in the CSCW literature. Clement (1990) and Sherry (1995) are two good examples: both present case studies of situations that involve CSCW use (an office automation system and a communications system among Navajo Indians); but both take as their main focus of interest questions of power balance, empowerment and authority. The case study is used as a justification for an academic discussion about a topic deeply important to the authors, on which they had explicitly worked with the research participants for purposes of their empowerment.

Type 5: Buying

The question "should I buy WordPerfect, Microsoft Word or Lotus WordPro for my company?" is a form of evaluation: it is an exercise in examining alternatives, weighing up their pros and cons, and coming to a conclusion. So how does the hard-pressed IS manager make the decision as to which system to buy?

The answers appear to be three-fold: they go on the prevailing fashions; they take advice from knowledgeable sources; or they get information from suppliers. On the first, it was well known in the past that "no-one ever got sacked for buying IBM" (now it is probably Microsoft) - trends come and go, and at the moment no well-dressed company would be seen without its Web pages.

A more rational approach is to get good advice, and this usually comes either through one's peers (say in other companies) or from magazines. Rather in the same way that the consumer magazines assist people in buying cars or cameras, computer magazines give reviews and analyses of what is best about the current set of groupware tools, Web servers, word processors or spreadsheets. For example, a recent issue of Byte magazine lists the relative advantages of Lotus Notes and the Web (Roberts 1996).

Finally, the software manufacturers and resellers will often provide information to allow decisions to be made. Lotus provide a "NotesSuite Evaluators Guide" for potential purchasers of their office software (Lotus Development Corporation 1996); others do likewise. This guide contains short paragraphs on each of the components that make up their software suite, and then ten pages headed "Why should I buy NotesSuite?", listing its key features, advantages over comparable packages (Microsoft Office), solutions for businesses and developers, and so on. Appealing to the profit margins of companies directly, Lotus also provide a lengthy consultants' report on the Return on Investment (McCready et al. 1995) to be gained from buying Notes.

The occasional academic study describes this process going on. Fanning and Raphael (1986) describe experiences in the mid-1980s at Hewlett-Packard with computer teleconferencing, and give a good description of the process by which they selected one system and the criteria they applied to come to that decision. They list several systems with the reasons why they might have been appropriate, and why not. Interestingly, the selection was made by a team of seven people, "ranged from technician to lab manager", so perhaps the suggestion above that this is a task purely for the IS manager is an unfair one.

Indeed, the composition of the purchasing team is much of the concern of Green et al. (1991), who discuss the purchase of a library automation system by a team coming from all levels of staff. They explicitly talk about "systems evaluation techniques" - a combination of demonstrations of available systems by their suppliers, and visits by the team to other libraries where the systems were used (mainly the former). The process of evaluation for purchase "did open up systems selection and development issues to basic-grade women library staff" (p.41), without the usual intermediaries, and led to a much stronger focus on staff and borrower needs, rather than technical requirements, than might have been the case had the purchasing decisions been made by managers. The issues of gender and empowerment that they discuss seem to have had a direct impact on the quality of the decisions made.

Conclusions

Typologies are dangerous things, especially so if they attempt to classify a wide range of work or people. They may appear to be prescriptive rather than descriptive; they may cause offence by placing people into categories they don't feel are appropriate; and they may restrict debate and action by over-defining an area. It is therefore vital for one who inflicts a new typology upon an already highly classified world to explain clearly its purpose.

I therefore restate my earlier remark: the five types in this paper are not intended to be taken as a set of closed, complete categories - I recognise that they overlap and may well be incomplete.

I have presented them for two reasons: self-understanding and to make a case about the nature of evaluation. First, it seems to me most useful for an evaluator to understand what kind of evaluation they are performing. Types 3 and 4 are rather distinct from types 1 and 2, though they may appear to have similarities. The kind of evaluation conducted as part of PhD research is qualitatively different from that conducted in commercial software development, and it is well to recognise this.

A second reason for my typology is my increasing awareness of the breadth of different activity that might properly be conducted under the title of evaluation. It has been a surprise to me to realise that type 3 (conceptual) is an independent activity, as has been clear when those in my department developing systems for their PhD have asked "how should I evaluate this?". Again, while I have for some time been in favour of socio-technical evaluation, it has been a surprise to me to realise that a purely social focus is still a valid form of CSCW evaluation. And finally, it has surprised me to realise that when an organisation examines which computer systems to buy, its activity then is also evaluation.

My argument then is for the recognition of those three kinds of evaluation, along with the more traditional effects and formative forms. Doing so will not lessen the complexity of the process of CSCW evaluation - many different issues and stakeholder perspectives still need to be taken into account - but perhaps it may make one aspect of that complexity somewhat clearer.

Acknowledgements

This paper has benefited from discussions with and comments by Fredrik Ljungberg and Fides Matzdorf. The work was funded by the EPSRC and Digital Equipment Corporation.

References

Kenneth Abbott and Sunil Sarin (1994). Experiences with Workflow Management: Issues for the Next Generation. Proceedings of the Conference on Computer-Supported Cooperative Work (CSCW '94).

Liam Bannon (1993). Use, Design, and Evaluation: Steps towards an integration. Shaerding CSCW Workshop.

John Bowers, Graham Button and Wes Sharrock (1995). Workflow from Within and Without: Technology and Cooperative Work on the Print Industry Shopfloor. Proceedings of the Fourth European Conference on Computer-Supported Cooperative Work (ECSCW 95), pp. 51-66.

Andrew Clement (1990). Cooperative Support for Computer Work: A Social Perspective on the Empowering of End Users. Proceedings of the Conference on Computer-Supported Cooperative Work (CSCW'90), pp. 223-236.

J. Cullen, J. Kelleher and E. Stern (1993). Evaluation in DELTA. Journal of Computer Assisted Learning, 9: 115-126.

Tony Fanning and Bert Raphael (1986). Computer Teleconferencing: Experience at Hewlett-Packard. In Irene Greif (Ed.), Proceedings of the Conference on Computer-Supported Cooperative Work (CSCW 86), pp. 291-306.

Eileen Green, Jenny Owen and Den Pain (1991). Office Systems Development and Gender: Implications for Computer-Supported Cooperative Work. Proceedings of the Second European Conference on Computer-Supported Cooperative Work (ECSCW 91), pp. 33-48.

Jonathan Grudin (1988). Why CSCW Applications Fail: Problems in the Design and Evaluation of Organisational Interfaces. Proceedings of the Conference on Computer-Supported Cooperative Work (CSCW '88).

Jonathan Grudin (1993). Interface. Communications of the ACM, 36 (4): 112-119.

Ken Guy, Paul Quintas, Michael Hobday, Luke Georghiou, Hugh Cameron and Tim Ray (1991). Evaluation of the Alvey Programme for Advanced Information Technology. London: HMSO.

Richard Harper (1992). Looking at Ourselves: An Examination of the Social Organisation of Two Research Laboratories. Proceedings of the Conference on Computer-Supported Cooperative Work (CSCW 92).

Richard Harper, John Hughes and Dan Shapiro (1991). Harmonious Working and CSCW: Computer technology and air traffic control. In John Bowers and Steve Benford (Eds.), Studies in Computer Supported Cooperative Work: Theory, Practice and Design, pp. 225-234. Amsterdam: North Holland.

Kari Kuutti (1995). Debates in IS and CSCW Research: Anticipating System Design for Post-Fordist Work. In Wanda Orlikowski et al. (eds.), Information Technology and Changes in Organizational Work, pp. 177-196. London: Chapman & Hall.

Morten Kyng (1991). Designing for cooperation: cooperating in design. Communications of the ACM, 34 (12): 65-73.

Karen Legge (1984). Evaluating planned organizational change. London: Academic Press.

Lotus Development Corporation (1996). NotesSuite Evaluators Guide. http://www.lotus.com/notesuit/evlgd.htm.

Scott McCready, Ann Palermo, Gerry Murray and Darby Johnson (1995). Lotus Notes: Agent of Change; The Financial Impact of Lotus Notes on Business. http://www.lotus.com/ntsdoc96/roi.htm.

Jakob Nielsen (1993). Usability Engineering. London: Academic Press.

Wanda Orlikowski and Debra Gash (1994). Technological Frames: Making Sense of Information Technology in Organizations. ACM Transactions on Information Systems, 12 (2): 174-207.

Lydia Plowman, Yvonne Rogers and Magnus Ramage (1995). What are workplace studies for? Proceedings of the Fourth European Conference on Computer-Supported Cooperative Work (ECSCW 95).

Magnus Ramage (1994). Engineering a smooth flow? A study of workflow software and its connections with business process reengineering. MSc Dissertation, University of Sussex, Brighton, England.

Magnus Ramage and Fides Matzdorf (1996). Cui Bono? A stakeholder approach to CSCW evaluation. Unpublished manuscript. Available from author.

Bill Roberts (1996). Groupware Strategies: Six key technologies will tell you if you need Notes OR the Web or Notes AND the Web. Byte, July 1996.

Yvonne Rogers (1994). Exploring obstacles: integrating CSCW in evolving organisations. Proceedings of the Conference on Computer-Supported Cooperative Work (CSCW 94).

Susi Ross, Magnus Ramage and Yvonne Rogers (1995). PETRA: Participatory Evaluation Through Redesign And Analysis. Interacting With Computers, 7 (4): 335-360.

John Sherry (1995). Cooperation and Power. Proceedings of the Fourth European Conference on Computer-Supported Cooperative Work (ECSCW 95), pp. 67-82.

Susan Leigh Star and Karen Ruhleder (1994). Steps towards an ecology of infrastructure: complex problems in design and access for large-scale collaborative systems. Proceedings of the Conference on Computer-Supported Cooperative Work (CSCW 94).

Veronica Symons and Geoff Walsham (1988). The evaluation of information systems: a critique. Journal of Applied Systems Analysis, 15: 119-132.

Eric Trist (1992). Introduction to Volume II. In Eric Trist, Hugh Murray and Beulah Trist (Eds.), The Social Engagement of Social Science: A Tavistock Anthology - Volume II: The Socio-Technical Perspective, pp. 36-60. Philadelphia: University of Pennsylvania Press.

Michael Twidale, David Randall and Richard Bentley (1994). Situated evaluation for cooperative systems. Proceedings of the Conference on Computer-Supported Cooperative Work (CSCW 94).

Dadong Wan and Philip Johnson (1994). Computer Supported Collaborative Learning Using CLARE: The Approach and Experimental Findings. Proceedings of the Conference on Computer-Supported Cooperative Work (CSCW 94).