Web Collaboration (Archived)

3 Web Collaboration Software

This is an archived version. The new version is: 3.00 Web content management


This research concentrates on giving people the ability to collaborate on documents, allowing group creation and editing, and specifically the co-operative authoring of documents. One of the driving forces behind using the Web is that it is a technology that is already widely available and accessible by anyone, anywhere, at any time.

3.1 Co-operative authoring and Collaboration

To understand why this research has been called "Co-operative Authoring and Collaboration over the World Wide Web", one must look at the definitions of the words, in particular the phrase "Co-operative Authoring and Collaboration". Although at first sight the phrase "Co-operative Authoring" and the word "Collaboration" may seem to mean the same thing, they in fact have different meanings, and the definitions also differ in scope. The following definitions are from the Merriam-Webster dictionary [Merriam Webster]. Where a word is defined in terms of a similar word, that similar word has also been defined.

Co-operative: a) marked by co-operation <co-operative efforts>. b) marked by a willingness and ability to work with others <co-operative neighbours>

Co-operation: the action of co-operating: common effort

Co-operate: to act or work with another or others: act together

Authoring: to be the author of

Author: a) one that originates or creates. b) the writer of a literary work (as a book)

Collaboration: to work jointly with others or together especially in an intellectual endeavour

By stringing the definitions for the terms together, it is easy to see that the dictionary definition for "Co-operative Authoring" is "marked by a willingness and ability to work with others to be a writer (or creator) of a literary work". In terms of this research the literary work is a web page, but using the YEdit engine, it could be any kind of document.

Collaboration is easier, as it is not defined using similar terms. Collaboration means "to work jointly with others or together especially in an intellectual endeavour".

As these definitions show, Co-operative Authoring is the group creation of a literary work (which in terms of this research means a single web page or a group of web pages). Collaboration means to work with others, which covers a much wider group of activities. These may (and in this research will) include Co-operative Authoring, but they also cover a diverse set of other collaboration activities.

Both the terms co-operative authoring and collaboration are required to describe the content of this thesis, because although the main focus is on the co-operative authoring of individual documents, there is also the aspect of collaborating with others to create the whole site (or portion thereof).

In terms of this research, co-operative authoring and collaboration are achieved by using the Web as the interface between the user and the system. Initially this relies on the ability of web pages to display forms for the input of data. A form displays the current source of the web page as pure HTML, allows the user to edit it, and then submits the changes to the web server. Although editing raw HTML is more difficult for the casual editor, those who know HTML should be able to do so with little trouble. It would have been possible to define a subset of HTML and format it appropriately, but one of the ideas behind this thesis is to allow anyone to edit the pages as full web pages. This means that no artificial limits are placed on the content of the web pages (other than for security, by removing potentially malicious code): anything that can be inserted into a web page can be used in this system, rather than only a particular subset or variant of HTML.

In the future other methods of editing the web pages will be available, such as the editors built into many web browsers. These would be supported using the "PUT" operation available in HTTP (HyperText Transfer Protocol), which allows content to be uploaded to a web server. Other options include dedicated web page editors, other methods such as WebDAV, and other non-web access methods.
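As a rough illustration of the HTTP "PUT" operation mentioned above, the following Python sketch builds (but does not send) a request that would upload an edited page to a web server. The URL, the function name and the page content are hypothetical and are not part of the YEdit engine; a real server would also need to be configured to accept PUT and to check who is allowed to upload.

```python
import urllib.request


def build_put_request(url: str, html: str) -> urllib.request.Request:
    """Build an HTTP PUT request carrying an edited web page.

    Hypothetical helper for illustration only; it is not taken from YEdit.
    """
    body = html.encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "text/html; charset=utf-8"},
    )


# Assumed example address and content; nothing is sent over the network here.
edited_page = "<html><body><p>Edited text</p></body></html>"
request = build_put_request("http://example.org/pages/draft.html", edited_page)

# Passing the request to urllib.request.urlopen(request) would send it.
print(request.get_method(), request.full_url, len(request.data), "bytes")
```

Building the request separately from sending it keeps the sketch runnable without a server, and shows that the only difference from a normal page fetch is the method name and the attached body.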

One of the reasons why this research should look at co-operative authoring and collaboration is that it allows the Web to be used as more than an information distribution medium. It allows for a free-flowing web of information in all directions: person to person (through the web site), person to web site operator, and so on, rather than only the normal flow from web site creator to web surfer. This should enhance the way the Web is currently used and allow greater use of the Web for collaboration. The approach is also not restricted to the Web, and so should be usable for any co-operative authoring and collaboration.

3.2 Co-operative authoring and the Web

Since the Web was created in 1989, it has been growing at an astounding rate (the beginning of the trend is detailed in the Post Web creation history (1989-1995) timeline displayed earlier in section 2.1). It is becoming (or already has become) an essential part of many people's lives as a lot of people either have access to the Internet, or will have access in the near future through a variety of devices (such as computers, TVs, cell phones, etc). It is important to use a technology such as the Web that allows a large portion of the potential users to be able to access it, and to communicate with one another. Communication between people is essential for collaboration between them to take place.

A lot of people assume that the Web is all there is to the Internet, because the Web gets so much attention these days. In reality, the Web is just one of the more visible applications that run over the Internet, along with a couple of others such as Email.

Technically the application that is being proposed will run over the Internet, rather than just as an application on the Web. The research starts with a Web application because the Web gives the majority of people quick and easy access to it. The Web version will be wholly server-side, so practically no installation or set-up is required by the user, which makes it readily available to a large number of people. However, the proposed system is not inherently Web-oriented. Later versions, running a more specialised protocol over the Internet, could work beside the Web version and support more advanced features, though at the cost of client-side complexity: the user would need to install an application on every computer that they use to access the system, and they may not be comfortable installing (or allowed to install) new software on those computers.

3.2.1 The future of the Web

The original vision of the Web was for a much more interactive medium than we currently have. Currently there is a reasonably good system for one-way communication over the Web, and limited two-way communication (usually limited to email responses or specific local responses). This research looks at what can be done to address the current lack of widely usable co-operative authoring on the Web.

One direction that could be taken would be to follow current trends and extrapolate from there. This would involve using the technology that we have at present and starting only from the point that we are currently at with regard to the possible future of the Web, without looking back at what has and has not worked in the past.

"Those who cannot remember the past, are condemned to repeat it." - George Santayana [Santayana 1905].

If we use the current technologies, but at the same time look back to the past to see what has and has not been done, we are much more likely to avoid repeating the past that we wish to avoid, and more likely to repeat successes that have happened in the past. This should be able to be achieved by using current technologies, such as web browsers and editors, and following the trends that made the Web successful in the first place, such as integration of existing data, open or public standards, being cross platform, etc. We could take the current state of the art, both of web browsers/editors and other document-editing applications, and make sure that we take into consideration the original intentions and ideals, for example the collaboration and ease of access intentions, while adding the best of new ideas. By using and pushing the limits of our current technology, while we make certain that everyone can use it, we should be able to lead the Web to a long and fruitful future that surpasses our current expectations.

How the Web became successful

To take a look at how the Web became the success that it is today, one needs to look back at the traits that the Web had at the time, to see what possibilities worked. By looking back at how it became successful, there are some traits that would be useful to follow for any future endeavours, such as this research.

The success of the Web has come from some basic traits that have allowed it to expand at a tremendous rate, such as:

* Having a core initial user group (at CERN)

* Integration of existing data such as diverse databases and systems that required specific applications to access.

* Using standards that were either already in use, or that were easily used (both simple and public, or open)

* Being cross platform

* Working across organisations

Success on the Web today

Just looking back at what made the Web successful to begin with is not enough. To be successful with any new endeavour, one must also know what is successful today. This allows the original reasons for success to be combined with the current ways that the Web succeeds in order to emulate those successes.

The main area that is a resounding success on the Web is the dissemination of information. One of the main problems with this is that the sheer amount of information that is available can at times be a problem. There were 21,166,912 web hosts when Netcraft conducted a web server survey in September 2000 [Netcraft 2000]. This can present a problem to people who are searching for information, as it can sometimes be difficult to locate the information that you require, even when it is out there.

Other areas that are quite successful, in a well designed web site, include:

* Using the Web for general reference material (provided that appropriate precautions are taken for screening out inappropriate material, such as XXX or sex sites, from minors)

* Support mechanisms for collaborative design of other things (e.g. many Open Source [Comerford 1999], [Hecker 1999] programs. Open Source is a software development model that harnesses the many programmers spread throughout the Internet, rather than the traditional model of harnessing a few programmers in a specific location. See http://www.opensource.org/ for more information).

* User support (The ability for users to search for and find information that will help them with any queries or problems)

* Sharing information that you possess

* Allowing universal readership (although it is a pity that many webmasters don't take note of this one, e.g. by creating a site that requires graphics or scripting)

* Format negotiation - the ability to receive information in the most appropriate format (such as your native language, provided the web site has a translation available).

* Searching, provided that you know what you are searching for, and how to express it in a non-ambiguous way (not always as easy as it sounds)
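The format negotiation item in the list above can be sketched in a few lines of Python. This is only a simplified illustration of choosing a language from an HTTP "Accept-Language" header: the function names and the sample headers are assumptions made for this example, wildcards are not handled, and a real web server would follow the full rules of the HTTP specification.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Split an Accept-Language header into (language tag, quality) pairs."""
    preferences = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag with no q-value counts as fully preferred
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat an unreadable q-value as "not wanted"
        preferences.append((tag.strip().lower(), quality))
    # Highest quality first; the sort is stable, so ties keep header order.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


def choose_language(header: str, available: set[str], default: str) -> str:
    """Pick the best available translation for the reader's preferences."""
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # e.g. "fr-ch" falls back to "fr"
        if primary in available:
            return primary
    return default


# Hypothetical site with English and German translations only.
print(choose_language("fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7", {"en", "de"}, "en"))
```

The fallback from a regional tag to its primary language is a design choice for this sketch; it means a reader asking for Swiss French would still be offered a plain French page if one existed.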

Some common misconceptions about collaboration and conferencing

There are some common misconceptions about what makes an application a group collaboration tool or a conferencing tool, and these are listed below. As some of the misconceptions show, it may not be clear to people just what a group collaboration tool is and what a conferencing tool is, and the overlap between the two can add to the confusion. Conferencing is mentioned here because some people are not sure whether there are any differences between conferencing and collaboration. Collaboration can include conferencing as part of the collaboration process, just as co-operative authoring can be part of the collaboration process, but conferencing and co-operative authoring are different. Conferencing is more for getting a group of people together at the same time to discuss something, whether the conference is in person, teleconferenced, video-conferenced, or network-conferenced (with text only, sound only, or video and sound, and possibly shared screens and the like). Co-operative authoring, on the other hand, is the creation of documents by a group of people who are not constrained by time or place, and who can work on the documents with little change to their normal methods of writing.

Some common misconceptions about web collaboration and conferencing are:

* that they only refer to threaded discussions on web pages, or real time white-boards

* that they only entail group discussions

* that they are not real time

* that they are real time

* that they must use a web browser (even if it is not the natural way of communicating that information)

* that they must be text messages

* that they must be full multi-media

Original and current Web collaboration

If you take a very weak definition of collaboration (in that most of the information flow is one way, with just comments and the like coming back) then the Web is already a very successful collaborative environment. If you take a view that is more in the spirit of W3 with a slightly tighter perspective of collaboration, then the Web has some way to go from its current domain of information publication, to a free flowing web of information.

One of the main concepts in the creation of the Web was collaboration [Berners-Lee 1990]. As detailed above, the Web has had major growth, but the opportunities for collaboration do not appear to have been utilised. If groups do exist that use the full potential of the Web, then they seem to be in isolated pockets with their focus inwards, such that those who use it know about it, but no one else does. If this is the case, then these groups do not seem to be providing much to enhance the potential of the Web for everyone else. The Web does allow collaboration that was not possible before, but much more can be achieved.

Previous attempts at collaboration (except maybe for Berners-Lee's original ideas, such as the original 'WorldWideWeb' browser/editor for NeXT [Berners-Lee WtW, Chap 4]) have tended to reuse collaboration that is already available in some current easy form. The two main forms at present are real time chat and video calls, which are based on the concepts of the original 'Talk', IRC [RFC2810-3] and MUD applications/protocols (protocols that are still widely used), and the conversion of Usenet/Mailing lists to web form. When set up correctly, the Web does a good job of archiving Usenet and Mailing lists. In its current form, however, using the Web to access Usenet or Mailing lists is very "clunky", and difficult compared with using email and Usenet programs in the first place. One reason why the Web is not as quick and easy to use as the normal programs is that every command must be sent to the web server to have any effect. This has a great effect on slow connections like dial-up, and still has an effect on fast connections. Because all the information is stored only on the web server, it is not possible to download the new email/newsgroups and read them offline. You are also limited to the actions that have been implemented for the web interface, rather than the variety of options that are available through the range of email and Usenet programs.

When used for comments on articles, the Web does better, but can still be improved on. An example of a web site that allows comments on the articles that are posted is Slashdot (http://slashdot.org/). Email and Usenet should be used the way they were intended, and the results can then be archived to the Web for storage and searching. This gives the advantages of both, without many of the disadvantages of either. These points also affect the YEdit engine, which is one of the reasons that the Web user interface will not be the only user interface.

A new protocol called WebDAV [RFC2518] (an open distributed authoring and versioning protocol), which extends HTTP [RFC2616] for distributed authoring, will help collaboration efforts over the Web, as it adds abilities to HTTP that help with collaboration, versioning and related actions. This new protocol fits in well with this research, as this research can be extended to work with it, which will ease actions like locking, versioning and transfers of files.

Standards should be employed wherever possible, unless there is a compelling reason to use something else, as they are generally easier to use than methods that just seek to replicate the standards. Also by utilising standards, compatibility and interoperability are enhanced. Standards such as HTTP access will allow any standards-compatible web browser/editor to access, author and publish information over the Web. In the future more advanced methods will be used, such as the WebDAV protocol, and/or other standard protocols for non-web access.

Areas where the Web has not been as successful

Today, if you look at the success of the Web, it would be easy to assume that the Web is a complete success. If on the other hand, you take a more in-depth look at the original vision for the Web, part of which involves collaboration, a different picture emerges.

The collaborative features contained in the original intentions for the Web form part of the basis of the idea of a web through which information flows freely in all directions. This allows for information to flow not only from web sites to users, but also from web site to web site, user to web site, and user to user (through the web site). This is achieved by allowing the web site itself to be the medium for the communication of ideas. This is quite different from the main information distribution model that is currently seen on the Web, that of the web site being controlled by one person or group, with them providing all the content, other than limited areas such as web boards, comments and such.

The collaborative features of the Web have not been as great a success as other aspects of the Web. If you don't look back in history, both to the original design goals for the Web and to the other visionaries who envisioned a future with free and easy access to information in an easy and connected manner, it would be very easy to get a slanted view of whether the Web has lived up to its full potential. There has been some work in the collaboration area with CSCW and BSCW, although these focus on the processes that surround the creation of a document, rather than the creation of the document itself. For example, they have the ability to track external notes, dates and agendas, but the document itself is generally in some other format and is processed in a file-sharing manner, similar to FTP, file shares, or email transfer.

The free flowing Web of Information

The goals of implementing this research (Co-operative authoring and collaboration) are to provide options for co-operative authoring and collaboration over the Web that will enhance the potential of the Web. This will allow much more information sharing than is currently easily available, and would allow enhancement of the Web in the areas that it is currently not fulfilling its potential. This collaboration would include the documents that are being created, along with the collaboration surrounding that work. As the web pages could have the ability to be edited by anyone (or a set group), a larger number of people would be able to provide information, especially those that may just have a few words to contribute, which would not be worth setting up a full web site for. This would also allow people to work in the environment where they feel most comfortable, and would allow them to be more productive. This environment might not be a computer, as we currently know it.

3.3 Collaboration today

Currently there are many different forms of collaboration and co-operative authoring, and all of the computer-based forms of collaboration have a basis in manual pen/pencil and paper-based collaboration. There are some products, such as Microsoft Word 2000, that provide some web integration within the word processor. The problem with these is that most of the useful features are limited to working with the vendor's own products. I have yet to see any that go beyond simple group collaboration (with the occasional ability to publish limited sets of information automatically) and make it possible to work fully on the Web, without requiring specific proprietary programs to be installed on the computers accessing the information.

Currently there are two common methods for creation of a group document (word processor file, source code, etc). Other methods of group document creation will be looked at shortly. The first common method is to write a portion of the document and pass it along by email (this requires everyone to have the email addresses of everyone else). Each person adds to the document when they have it, or they make changes to an original document and then an editor combines all the changes. The second common method is to use a shared directory that everyone in the authoring group can access to edit the document. This requires everyone who is to collaborate on the document to use the same program to edit it, and requires them to turn on the change tracking mechanism.

One problem with this approach is that each of the applications used to edit the document must be capable of reading and writing the file formats produced by all of the other applications. Even if all the authors use the same editing application, different versions of the same program can cause incompatibility problems. Often problems such as these only show up after continued use.

3.3.1 Collaborative editing and version tracking packages

Some packages have version tracking mechanisms, but the user can be limited to using one program. (Sometimes other programs may be compatible, but unless they are completely compatible, problems may not show up until it is too late to solve them.) Generally different versions of the same program will have compatible versioning information, for example Microsoft Word 6 and 97.

In an environment that enforces the use of only one particular software package, it may be possible to just use the in-built abilities of that package.

On the other hand, when an organisation needs to collaborate with people outside of that organisation, or they allow their staff to use different applications for the same purpose, then there are more factors that need to be taken into consideration. The people that the organisation wants to collaborate with may not have a package that supports collaboration in the same way as their package, and even if it does, there can be problems when information is saved in different incompatible formats. This could cause delays in the collaboration and cause major problems if there are time sensitive issues associated with the collaboration.

3.3.2 Version tracking information

When people need to work with others who use different packages, or with others outside the organisation, it is tempting to avoid any question of version tracking. This is achieved by keeping very little in the way of version information: perhaps a copy each time a change is made, assuming of course that backup copies of the latest version are available at all times. Often this may be enough information, but in a complex document someone may need to know who changed what and when, or what changes a particular person made, or other specific change information.

If all that is needed in a package is the ability to save a version occasionally as a backup (for example, a copy of the document as at the end of each week), then any package will work much the same as any other. Unfortunately this will result in the loss of information about who changed what, although when the change was made can be narrowed down by the length of time between the backups.

Simple backups may not be enough if the document being created is large, complex or involves many people.
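To make the difference between simple backups and version tracking more concrete, the following Python sketch stores each saved revision together with who saved it and when, so that the questions "who changed what and when" and "what changes did a particular person make" can be answered. The class and method names are invented for this illustration and do not describe any particular package.

```python
import difflib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    number: int
    author: str
    saved_at: datetime
    text: str


class DocumentHistory:
    """Keeps every saved revision of one document (illustrative only)."""

    def __init__(self) -> None:
        self._revisions: list[Revision] = []

    def save(self, author: str, text: str) -> Revision:
        """Record a new revision with its author and the current time."""
        revision = Revision(
            number=len(self._revisions) + 1,
            author=author,
            saved_at=datetime.now(timezone.utc),
            text=text,
        )
        self._revisions.append(revision)
        return revision

    def by_author(self, author: str) -> list[Revision]:
        """All revisions saved by one person."""
        return [r for r in self._revisions if r.author == author]

    def changes(self, number: int) -> list[str]:
        """Lines added ("+ ") or removed ("- ") by the given revision."""
        new = self._revisions[number - 1].text.splitlines()
        old = self._revisions[number - 2].text.splitlines() if number > 1 else []
        return [
            line
            for line in difflib.ndiff(old, new)
            if line.startswith(("+ ", "- "))
        ]


history = DocumentHistory()
history.save("alice", "Title\nFirst draft")
history.save("bob", "Title\nSecond draft")
for line in history.changes(2):
    print(line)
```

A weekly backup would keep only the text of each copy. Here the author and time are stored with every change, which is the information that is lost when simple backups are used.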

3.3.3 Currently available options

For documents such as source code, commonly used programs have a 'check in and out' facility to allow for versioning. This stores each change in the file along with information on when it was made, and by whom. This works well for those who understand the complex issues and programs that they have to drive. Some examples of these programs are CVS [CVS], [Fogel 2000], PRCS [PRCS], RCS [RCS], and SourceSafe [SourceSafe]. These can be quite powerful and complex programs to use to control source code and the user interfaces are improving all the time, making it easier for people to use them.

CVS (Concurrent Versions System) [CVS], [Fogel 2000] is an example of a very commonly used source code revision-tracking tool. CVS is a "Source control" or "Revision control" tool to keep track of changes to source code files made by multiple developers. This allows multiple developers to work on the same source code at the same time. CVS keeps a history of all the changes made to a set of files in one central location, so that multiple developers can check in and out the code that they are working on. Another ability is to be able to branch versions, so that there can be multiple branches for different objectives, for example one branch for vendor code, one for the current version's bug fixes, and one for the next version. Because all the history of the files is kept, any changes that cause problems can be compared against an earlier version that may not have had those problems. CVS also has the ability to automatically merge files that have been changed by multiple people at the same time, as long as the changes have not been made to the same location in the same file. If changes have been made in the same location in a file, that file needs to be manually merged. CVS is also available on a variety of computing platforms, and can be used in a client/server mode to allow developers to access it from any location.
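The merging rule described above (changes in different locations can be combined automatically, changes in the same location need a manual merge) can be shown with a deliberately simplified Python sketch. It is not the algorithm that CVS uses: it assumes, for the sake of a short example, that both developers only edit existing lines in place, so that the original file and both edited copies have the same number of lines. The function name is hypothetical.

```python
def merge_in_place_edits(
    base: list[str], mine: list[str], theirs: list[str]
) -> tuple[list[str], list[int]]:
    """Merge two edited copies of a file against their common original.

    Simplifying assumption: no lines were inserted or deleted, so the three
    versions can be compared line by line. Returns the merged lines and the
    indexes of lines that both people changed differently (conflicts).
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")

    merged: list[str] = []
    conflicts: list[int] = []
    for index, (original, my_line, their_line) in enumerate(zip(base, mine, theirs)):
        if my_line == their_line:
            merged.append(my_line)      # unchanged, or both made the same change
        elif my_line == original:
            merged.append(their_line)   # only the other person changed this line
        elif their_line == original:
            merged.append(my_line)      # only I changed this line
        else:
            conflicts.append(index)     # same location changed in two ways
            merged.append(original)     # keep the original until merged by hand
    return merged, conflicts


base = ["int total = 0;", "int count = 0;", "return total;"]
mine = ["long total = 0;", "int count = 0;", "return total;"]
theirs = ["int total = 0;", "int count = 1;", "return total;"]
print(merge_in_place_edits(base, mine, theirs))
```

In this example each person changed a different line, so both changes are kept and no conflict is reported. If both had changed the first line in different ways, its index would be returned in the conflict list for a manual merge.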

There are other version tracking tools. Another of the tools is RCS (Revision Control System) [RCS], which is used by CVS to do all of its underlying work, as RCS is for tracking individual files, rather than multiple files like CVS. PRCS (Project Revision Control System) [PRCS] is a similar revision control system to CVS but it lacks multi-platform support, and currently does not support a distributed client/server model. Another version control system is SourceSafe [SourceSafe], which is primarily aimed at only the Windows platform, although there are a couple of other companies that provide compatible programs for other platforms.

Each of the major word processing suites has some version tracking included that allows some group collaboration. The version tracking that is included in these suites is generally simple tracking that tracks the changes that have been made in a single document, and stores who made the change and when. This tracking ability is limited when compared with the abilities of the above group of source code revision systems. Examples of these word processing suites are Microsoft Office (Microsoft http://www.microsoft.com/office/), WordPerfect Office Suite (Corel http://www.officecommunity.com/), and StarOffice (Sun http://www.sun.com/products/staroffice/).

In the process of creating other approaches to collaboration, some try only to replicate the features of an earlier technology (such as email or Usenet), without looking at the original inspiration and current implementations for that technology. In the process, they sometimes implement features that are not really needed, and don't implement the features that really are required. As an example, web based BBSs (web based bulletin board systems, not to be confused with the dial-in BBSs that were a lot more popular before the Internet took over) try to replicate Email (SMTP [RFC821 / STD 10] and POP3 [RFC1939 / STD53]) and Usenet (NNTP [RFC977/RFC1036]). Web based bulletin board systems typically provide web pages that allow you to add a comment to a web page. Comments are sometimes simply added to the end of the page, or in more sophisticated systems they can be threaded in a tree-like structure. Often these systems are trying to achieve a discussion forum by replicating Usenet on a web page. Usenet is the discussion forum that has been around since a discussion forum was needed on the early Internet. It provides a place where messages can be publicly posted and replied to, and it has been in use for many years (since 1979). Email and Usenet model the conventional mail and notice board systems online, on the Internet.

The web based BBSs try to constrain Email and Usenet to a web based interface. These web based interfaces do not contain the features and benefits that many people have come to expect from an Email or Usenet program. One example is offline use, that is, the ability to download the messages and then read and reply to them at your leisure, which also cuts down on the cost, as in many places people have to pay for an Internet connection by the minute. The ease of use of a program that is dedicated to reading Usenet or Email can be a great advantage, as it was designed to work with Usenet or Email rather than just being an add-on. Finally, the speed of use is usually much better, because the dedicated programs usually have advanced features for reading and writing, such as downloading all the messages to read, rather than having to fetch each one separately when you want to read it.
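The threaded, tree-like display of comments mentioned above can be sketched as follows. This is a minimal Python illustration, assuming each comment is stored with an identifier and the identifier of the comment it replies to (or None for a top-level comment); the data layout and function name are made up for the example and are not taken from any particular bulletin board system.

```python
from collections import defaultdict
from typing import Optional

# (comment id, id of the comment it replies to or None, text)
Comment = tuple[int, Optional[int], str]


def render_thread(comments: list[Comment]) -> list[str]:
    """Return the comments as indented lines, replies under their parents."""
    children: dict[Optional[int], list[tuple[int, str]]] = defaultdict(list)
    for comment_id, parent_id, text in comments:
        children[parent_id].append((comment_id, text))

    lines: list[str] = []

    def walk(parent_id: Optional[int], depth: int) -> None:
        # Visit replies in the order they were posted, one level deeper each time.
        for comment_id, text in children.get(parent_id, []):
            lines.append("  " * depth + text)
            walk(comment_id, depth + 1)

    walk(None, 0)
    return lines


comments = [
    (1, None, "First post"),
    (2, 1, "Reply to first"),
    (3, None, "Second post"),
    (4, 2, "Nested reply"),
]
print("\n".join(render_thread(comments)))
```

A system that simply adds comments to the end of the page would display them in posting order. Grouping them by parent first is what produces the threaded view.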

Following are some examples of how technology has impacted on the methods used for collaboration:

Face to face meetings

Face to face meetings, both formal and informal, are an important part of life, whether it is a one-on-one, for example talking things over with a work mate, or a group meeting such as a presentation, a brainstorming session, or another type of meeting.

These are important for sharing information and communication about projects and the like. They are also important for collaboration, as it allows you to talk with others about both your ideas and theirs, and how the ideas interact with each other.

Video/Phone conferencing

Video and phone links allow you to talk with people in an office down the hall, or on the other side of the world.

This has the advantage that you are not constrained to only talking with people who are at your present location, but it has the disadvantage that you lose out on a lot of the non-verbal clues (video conferencing helps, but is still limited).

Instant messaging / Internet chat (IRC, Talk, ICQ, etc)

Instant messaging and Internet chat (such as Talk, IRC, ICQ, etc) have the advantage over video and phone connections that they can be used to connect many people together instantly, at low cost, while they are using their computers. Some of these can store messages so that if the recipient is not at their computer at the time, they can pick up the message later. You can also be talking to a number of people at once, and not all of the conversations need to be accessible to the others that you are talking with. Although the ease of use is much higher than that of face-to-face meetings and video/phone based communications, especially for communicating with people who are not in the same location, there are some disadvantages by comparison. Not only do you lose the visual cues as to what the person is saying, you also lose the audio cues, the inflections in the voice.

Interactive Chat

Other options are interactive authoring of documents with whiteboard / chat / screen sharing tools. The major disadvantages of tools such as these are that all the people who are to work on a document must be coordinated to connect at the same time using the same software, and then they all work on the same document at the same time. This can be useful for demonstrations or for brainstorming, but it can be tricky to actually write content for the document in a session such as this, unless there are just a few people.

Email, Usenet, mailing list

Email and Usenet are similar, in that they are both intended for sending text messages from one person to another. Email is primarily for one to one (or a few), and Usenet is primarily for one (or many) to many. There is some middle ground, where a group of people want to talk with each other without opening the conversation up to everyone, and they commonly use either individual Usenet news servers, or mailing lists that contain the people that they want to talk with. These methods add to the ease of use of Internet chat, but they are designed for conversations over time, not for real time chatting, although at times this type of communication can get very close to it. This has the advantage of not requiring everyone to be present when you communicate with them; they will get the information when they next check their email or news. This is useful for general communication, but has the same limitation as instant messaging and Internet chat, in that there are no visual or audio cues to work from.

Word processed documents marked up electronically or via printout

Often people will create a document in a word processor which can be set up so that people can see the alterations that others have made to a shared document, and then pass that electronic document on to someone else to work on. This can be done either using the inbuilt methods for tracking changes that some word processors have (such as using underline for additions, strikethrough for removals, and different colours for different authors), or it can be achieved by printing out the document and then marking it up manually. Even when marked up manually on a print out, this still has the advantage that the original electronic version can be easily edited. When changes are made electronically the editor can either accept or reject the changes to the document with much more ease, but it requires that all users use the same word processor. These are in stark contrast to the completely manual method discussed next that requires that the document be completely rewritten/re-typed for every change (as paper does not have the ability to easily change text, move it around, add text, or remove it).
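The electronic mark-up described above can be illustrated with a small sketch. It is not how any particular word processor tracks changes; it simply uses Python's standard difflib module to compare two versions word by word, and the function name mark_changes and the [-removed-] / {+added+} markers are assumptions made for this example (standing in for strikethrough and underline).

```python
import difflib


def mark_changes(old, new):
    """Return the new text with removals and additions marked inline."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    marked = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            marked.append(" ".join(old_words[i1:i2]))
        if tag in ("delete", "replace"):
            # Words that only exist in the old version (the "strikethrough").
            marked.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            # Words that only exist in the new version (the "underline").
            marked.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(marked)


print(mark_changes("the quick fox", "the slow fox"))
```

An editor reviewing the marked text can then accept or reject each change, which is the step that is so laborious on paper.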

Typewriter/Hand-written manual mark-up

Collaboration via typewritten or hand-written documents (in other words, no electronic version to re-edit) is very much in the past in most places today. If you look at old documents, especially if you can get your hands on documents that were in the process of being created, you will have the chance to see how collaboration on documents was achieved back in the days before computers were readily available. The document would be created and then passed on to the reviewers, collaborators, etc, and each of them would have a coloured pencil or pen. They would then edit the document, using the coloured marks to show who changed what and when they changed it. One of the major advantages that this has (even over current day collaboration) is that anyone could view the document, anyone could edit it, and it was easy to keep a record of those changes, a physical paper trail. The major disadvantages of this method were that each time changes were made, the document had to be rewritten, and that you had to have physical access to the document to make any changes. This meant that it could take a while to post the document to someone to edit, and then to post it back, and repeat.

This has had a major influence on my ideas for co-operative authoring and collaboration, and this is one of the reasons that the application is not limited to one front or back-end. The approach taken in this research is to use the good ideas that have been used in the past, such as the ideas from hand/type-written paper-based collaboration and editing, and to move along similar lines. It will be usable over the Web, locally, or using other access methods. This mirrors the flexibility of paper, in that it can be viewed by anyone, at any time, anywhere, but it will go beyond the limits of paper, such as the limits of physical transfer (both of the information and the transport of that information) and of reproduction, and will enhance the possibilities of use. It also demonstrates, for those who may not realise it, that co-operative authoring, editing, and collaboration are a part of our history, and that by looking back at the systems that were used for paper, it is possible to use some of the properties of this medium to enhance the Web based version.

3.3.4 Web collaboration today

The major problem with the majority of these tools is that they are limited to people who share the same working environment, so people are forced to use software that they are not familiar with and are less productive with. Some of the tools only work interactively, which suits meetings and demonstrations but means that everyone has to be organised to be available at a specific time for any work to be done. There are some experiments underway with regard to co-operative authoring, such as WikiWiki, and web collaboration, such as BSCW.

WikiWiki

WikiWiki [Cunningham] (http://c2.com/cgi-bin/wiki) is a web site created by W. Cunningham that allows anyone to edit the pages, with certain restrictions. This is the first Wiki; others have appeared, as people have set up their own Wiki clones. WikiWiki was written in Perl [Dominus 1998], and some of the newer Wiki clones have been written in other languages, but they all follow the same general guidelines. A couple of the general guidelines that the Wiki clones follow are that anyone can edit and add pages, and that they use a simplified mark-up language rather than the full HTML standard. As the Wiki clones evolve, these guidelines are subject to change as the majority of users of Wiki decide.

WikiWiki can be described in many ways, such as this quote from its home page (current October 2000) "(It is) ... a fun way of communicating asynchronously across the network", or as a set of web pages that are open and free for anyone to edit as they wish. Wiki is not real-time; therefore people have time to think before they follow up a web page, often days or weeks, so they have time to consider what they write. It was created for the discussion of People, Projects, and Patterns (a pattern is a recurring solution to a common problem in a given context and system of forces [Alexander 1977] [Alexander 1979]).

BSCW

Quote from BSCW's [BSCW] web page at http://bscw.gmd.de/:

"BSCW (Basic Support for Cooperative Work) enables collaboration over the Web. BSCW is a 'shared workspace' system which supports document upload, event notification, group management and much more. To access a workspace you only need a standard Web browser."

BSCW supports CSCW (Computer Supported Cooperative Work) over the Web [Bentley Horstmann Trevor 1997], [Appelt 1999]. This increases the reach of CSCW, as it allows anyone with a web browser to access the information.

As the titles for both BSCW and CSCW imply, they provide support for co-operative work (which is what the "SCW" in their titles stands for). When looked at beside the free flowing web of information that I am proposing, they are complementary, as the free flowing web of information is concerned with the co-operative creation and editing of documents, whereas BSCW and CSCW provide support for that co-operative creation and editing. The support that they provide varies depending on the implementation, but they generally support the work around a document, rather than the work specifically on the document; often this is left to the tools (word processors and the like) that the members of the group have.

What is needed is a form of co-operative authoring and collaboration that anyone can use and access at anytime from anywhere using the system that they are most familiar with and productive in. The other advances such as WebDAV (which is discussed later in the next chapter), and BSCW are important, as they provide the support for areas that are not directly covered by the free flowing web of information, in that they specify the protocol to be used over the Web, and support surrounding the co-operative authoring respectively.

3.3.5 Lost Versions

The free flowing web of information can also solve the problems of old/new documents and of revisions (both intended and unintended) causing lost information.

Thankfully the World Wide Web Consortium (W3C, http://www.w3.org/) has kept historical information about the development of the Web, and left some of the web pages as they were written, and they have refrained from updating them as new information has become available (which means that all the historical information is still there). This is important because one overlooked problem in updating documents and keeping them current is that you can unintentionally lose valuable and important information. This commonly happens when documents are rewritten to make them clearer (as the original emphasis can easily be lost), and seemingly unimportant information can be dropped, as it just seems to add clutter to the document. Old information may also be replaced in a document by new information, which can be very good for the immediate purposes of the document, but the information can be permanently lost if the document is its only repository.

All of the above, along with restructures (either company, web site, or other), can easily and quickly change documents, and unless a full history of the document (not just changes) is kept somewhere, a lot of good information can be lost until somebody else comes along and reinvents the same things. Often they do so without realising that there was a good information base available in the past that would greatly ease any trouble they have reinventing it. Occasionally when this seemingly irrelevant old information is stumbled on by somebody who is interested in that particular area (especially when they are looking for the old, original documents, ideas and objectives), the results can be significant, once the information has been processed into a form that they want.

This brings to light another point that is important when it comes to old documents (not only web, but also other electronic forms, and even hardcopy).

Information may not continue to be available over time. A document that was written yesterday may be easily available, but then again it may not. There are many causes of information becoming unavailable, and this can happen at any time, potentially even immediately after the information has become available. For example, a document may be created and the only copy (in other words, no backup) stored on a floppy disk or a web site; the floppy disk then becomes damaged, or the web site crashes, and the information is lost. There are many other ways for information to be lost, even within hours of it being created or made available. A couple of the longer term access problems, where the information is still there but there is no way to access it, are discussed next.

With the current rate of change in the computer industry, and the associated changes in storage formats, documents that we now rely on may not remain accessible. As an example, if you were to try to access information that was stored on an 8-inch CP/M floppy, or tried to read an old word-processing format, you may find that they are no longer supported, and you may not be able to retrieve the information.

Of course incompatibility between old document formats and new applications is not the only side to the continuity-of-information coin. Another problem that occurs is when a document has been created in the latest version of a program; a previous version may not be able to read the file. This happens when people with an older or reasonably current word processor/web browser try to read something that was created in the latest version, and saved with no thought about those with current or older software/hardware, or even those that don't have everything enabled by default. The best case is that the person (who in the case of web pages is surfing the Web) that is trying to access the information realises that something is wrong, and they try to do something about it. If it is too much trouble, they may not bother (which occurs all too often), or even worse, they may see nothing, and so assume that there is nothing there, and they never come back.

One example of this problem is when someone using a particular version of a word processor suddenly finds that they cannot read a document, because one of the other members of the group upgraded to a later version of the word processor. A second example is when you visit the web site of a company, and all you get is a blank page (because they assume that you have images and/or JavaScript/VBScript turned on). Some people have images turned off because they take a long time to download over a modem. Also, some people do not have JavaScript/VBScript turned on, as it increases the complexity of rendering the web page in the browser and can lead to some instability.

This demonstrates the need to keep old versions of documents, even versions that seem at the time to have no significance. It shows the importance of keeping at least all of the major revisions of documents, in addition to at least the most recent set of changes so that any changes that have unforeseen consequences can be rolled back and any problems solved with ease.
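As a minimal sketch of this idea (it is not the YEdit implementation, and the class and method names are hypothetical), a document store can keep every revision rather than only the latest one, so that a change with unforeseen consequences can be rolled back without discarding anything:

```python
from dataclasses import dataclass, field


@dataclass
class VersionedDocument:
    """Keeps every revision of a document; old text is never overwritten."""
    revisions: list = field(default_factory=list)

    def save(self, text, author):
        # Append a new revision; earlier revisions are left untouched.
        self.revisions.append({"author": author, "text": text})
        return len(self.revisions)  # 1-based revision number

    def get(self, number):
        return self.revisions[number - 1]["text"]

    def current(self):
        return self.revisions[-1]["text"]

    def roll_back(self, number, author):
        # A roll-back is recorded as a new revision, so the unwanted
        # change also stays in the history.
        return self.save(self.get(number), author)


doc = VersionedDocument()
doc.save("Original objectives and background.", "alice")
doc.save("Shorter rewrite.", "bob")
doc.roll_back(1, "alice")
print(doc.current())       # the text of revision 1 again
print(len(doc.revisions))  # all three revisions are still kept
```

Because the roll-back adds a revision instead of deleting one, the "Shorter rewrite" is still available to anyone who later wants to see what was tried.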

3.4 Types of Web publishing

Creating the web sites as multi-tiered, that is, having different areas (that may be located in different places) that support the main activity of the user, should ease the use and stability of the web sites that implement the free flowing web of information. Showing the full co-operative authoring interface to all users is probably not ideal, as it would increase the complexity for both first time users, and those that only want to read the web site. The advantage of a multi-tier web site is that it can combine different types of web pages in a layered fashion. That is, the pages can be dynamically created, then stored as static pages, which allows people to access either set of pages, depending on the functionality that they want. For example a web site that allows pages to be edited could have the dynamic site available for both read and write access, while keeping an up to date set of static web pages, which most people access for reading only. This allows people read access at all times, even when the dynamic site is having problems. This is applicable to all web sites that use dynamic generation for web pages and can be very useful to ensure reliability. This was not included in earlier versions of the YEdit system, but is now included and can be used with little extra set up required. The differences between the different types of web page are discussed below.

Static web pages

There are several different types of web pages. The easiest and probably the most common is the static web page (a web page that is written and then served to the user as is, with no other information added to it); all the web server does is deliver it. Often these types of pages will be the basis for an interactive site, as static pages load faster and, because they don't contain any changing information, there is less to go wrong in them and you know how they will be displayed.

* Advantages of static web pages are that they are fast, reliable, simple and easy to create.

* Disadvantages of static web pages are that they can be time consuming to update, and it is easy to have broken links and associated problems because a web page or link was missed when they were updated.

Static web pages with Dynamic content

This type of web page is written as a static page, but has statements in the page that the web server will execute. Examples of this type of page include pages that call SSI (Server Side Include) functions, embedded Java Servlets, and a few others. These are commonly used to include dynamic information into an otherwise static web page. This can also include having the static web page as a template, and using the dynamic content to fill that template. There are many options and different ways of combining static web pages with dynamic content. There is a risk that, if the dynamic content is not available when the web page is requested, the page may not display properly, or at all.

* Advantages of static web pages with dynamic content are that they are much more flexible than a static web page, it is possible to update the content without changing the web page, and they are somewhat customisable to visitors.

* Disadvantages of static web pages with dynamic content are that some web pages may be slow to load, while some are fast. This variation in behaviour makes the web site seem inconsistent, so the web surfer does not know what to expect next. The dynamic content can potentially have quite complex interactions, which can cause problems for both the web master and the web surfer. Also static web pages with dynamic content are less flexible than a fully dynamic page.
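The template approach mentioned above can be sketched as follows. This is an illustration only: it uses Python's standard string.Template as a stand-in rather than real SSI syntax, and the page text, the placeholder names and the render function are assumptions made for the example.

```python
from datetime import date
from string import Template

# A static page acting as a template; $visitor and $updated are placeholders.
PAGE = Template(
    "<html><body>\n"
    "<h1>Project notes</h1>\n"
    "<p>Welcome, $visitor. Last updated: $updated.</p>\n"
    "</body></html>\n"
)


def render(visitor, updated=None):
    """Fill the static template with whatever dynamic values are available."""
    values = {"visitor": visitor}
    if updated is not None:
        values["updated"] = updated.isoformat()
    # safe_substitute leaves a placeholder untouched when its value is
    # missing, so the page still displays (imperfectly) instead of failing.
    return PAGE.safe_substitute(values)


print(render("reader", date(2000, 10, 1)))
```

Calling render("reader") without a date leaves the literal "$updated" in the page, which is a small version of the risk described above: when the dynamic content is unavailable, the page may not display properly.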

Dynamic web pages

Dynamic web pages are fully generated on the web server, commonly using CGI (the Common Gateway Interface). CGI is the main method of calling executable programs from the web server, and it allows you to call almost any type of executable. Often the programs are Perl scripts (Perl is a scripting language), compiled executables, or shell scripts (such as 'bash' scripts on Unix or batch files on Windows), and most are located in a "/cgi-bin/" directory on the web site. There are exceptions to this; for example some CGI programs are run from individual directories, and some are called via other means, such as Java Servlets that are commonly located in the '/servlets/' directory. Of course they can all be embedded in a web page for use in a static web page with dynamic content, or located elsewhere on the server, provided that the server is set up to expect that. With dynamic web pages there is an even greater risk that pages will not display properly, or at all, if there are any problems with the programs retrieving the content, or the content itself.

* Advantages of dynamic web pages are that all information (including the web page itself) can be updated in real time, and the whole site can be custom-adjusted to visitors.

* Disadvantages of dynamic web pages are that they are slow (compared to static pages), and so require a more powerful web server. They are less reliable than a static web page, as there are more places that a problem can occur, and they are much more complex than static web pages (even ones with dynamic content).
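A CGI program of the kind described above can be very small. The sketch below is written in Python purely for illustration (the text mentions Perl and shell scripts as the common choices); the "name" parameter and the greeting are assumptions made for the example. It relies only on the CGI conventions that the server passes the query string in the QUERY_STRING environment variable and that the program writes headers, a blank line, and then the page to standard output.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def build_response(query_string):
    """Return a complete CGI response: headers, a blank line, then the body."""
    params = parse_qs(query_string)
    name = params.get("name", ["visitor"])[0]
    # Escape the value so that user input cannot inject mark-up into the page.
    body = "<html><body><p>Hello, %s.</p></body></html>\n" % escape(name)
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The web server sets QUERY_STRING before it runs the program.
    sys.stdout.write(build_response(os.environ.get("QUERY_STRING", "")))
```

Every request runs the program again, which is where both the flexibility and the extra cost of a fully dynamic page come from.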

Of course in the real world, there are many pages that straddle the boundaries between those definitions. One example is that a web site might have an index that is updated once a day, but is stored in a static file for the rest of the day, and could be served either straight from the static file, or through a dynamic page.

Multi-tiered web sites

A multi-tiered web site is one in which different parts of the site are located in different places. This can be as simple as having static pages on one site, and dynamic pages on another, through to having separate web servers each with a specific function. For example, there may be some to serve static pages, some to serve images, some to serve the dynamic pages, and some to store the dynamic information (commonly in a database).

One of the major advantages that a multi-tier web site has over one that combines everything is that any problems that occur should only affect one portion of the web site. A multi-tier web site commonly has the best advantages of each type of web page, while reducing the risk that a problem with any one part of the system will bring the whole lot crashing down.

The advantages of static pages are their speed and reliability, and the advantage of dynamic pages is their flexibility. By combining them in a multi-tier web site, the advantages of both can be obtained, such as the reliability of static pages and the flexibility of dynamic pages. The disadvantages of each should also be mitigated to some extent (that is, the lack of flexibility of static pages, and the lower speed and reliability of dynamic pages). Unlike sites that combine static web pages with dynamic content, multi-tier web sites split the two by placing them on different servers or at different locations, so if one fails, the other can still keep serving web pages. They would not be quite as reliable as static pages, or quite as flexible as dynamic pages, but they would be a lot more useful. This approach is similar to caching the output of the dynamic pages and using the cached copy for all other requests for the page until the dynamic page is updated, at which point the cached page is automatically updated as well. The difference from caching is simply that the dynamic page updates a static page (that need not be on the same server), and the web server then serves this page (which was written by the dynamic page) as if it were any other static page.
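The update step described above, in which the dynamic side writes a static page that the web server then serves, might look something like the following sketch. It is not taken from YEdit: the function names are hypothetical, and for simplicity the static site is assumed to be a directory on the same machine (on a separate server, the final step would be a transfer to that server instead).

```python
import os
import tempfile
from html import escape
from pathlib import Path


def publish_static(static_root, name, html):
    """Write a rendered page into the directory served as the static site."""
    root = Path(static_root)
    root.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first and then rename it into place, so a
    # reader of the static site never sees a half-written page.
    fd, tmp_name = tempfile.mkstemp(dir=str(root), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(html)
    target = root / name
    os.replace(tmp_name, target)
    return target


def save_edit(static_root, name, text):
    """The dynamic tier: accept an edit, render it, refresh the static copy."""
    html = "<html><body><p>%s</p></body></html>\n" % escape(text)
    return publish_static(static_root, name, html)


with tempfile.TemporaryDirectory() as static_root:
    save_edit(static_root, "index.html", "First draft")
    page = save_edit(static_root, "index.html", "Second draft")
    print(page.read_text(encoding="utf-8"))
```

If the dynamic tier stops working, the last published file simply stays where it is, which is why readers of the static site are unaffected.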

This splits the dynamic content pages from the static pages, and would allow all the static pages to be on one web server, which is available for anyone to read, while the dynamic web server is available for those who are updating the documents. This means that even if the web server that controls the dynamic pages crashes, the only people who will know (or possibly notice) are those who are co-authoring the documents. Even then, they can still read the documents. This means that the web site is much more robust from the general surfer's point of view, and also means that the web master of the site has less to worry about.

One of the advantages of a separate multi-tiered web site (which has static read-only copies of the web pages on one server, and the same documents, but editable, on another) is that it gives the advantages of a static web site (that is, the speed, reliability and simplicity) without the disadvantages, by having the static site automatically updated, as required, from the interactive editable site.

Disadvantages of a separate multi-tiered web site are that it is a little more complex to set up, and for proper redundancy you need at least two servers, preferably in different physical locations, with different bandwidth providers. If a less robust system is acceptable, then it could be set up to work from the same location, or if really necessary, on the same web server. This would be much less robust, but would still give better robustness than no multi-tier arrangement.

3.5 The future of the Web

How to make co-operative authoring successful

Some ways of ensuring the success of co-operative authoring over the Web are to require that people can use the programs that they are currently used to using and to make everything as easy to use as is possible, as well as incorporating some of the methods that made the Internet and the Web as successful as they have been.

Some of the ways to achieve these goals are to use tools on the server for as much of the work as possible, because they allow users to do their work wherever they happen to be, even if they do not have their favourite programs with them. Another is to limit any requirement for client side tools to those that need little or no user intervention to work (such as Java applets in browsers), and to take advantage of the programs that the user already has (such as web browsers), rather than requiring them to get more programs to clutter up their computer. The tools must be cross platform, supporting Unix, Windows, and Mac as the bare minimum. Tools that are already available, such as online chat (IRC, Talk, ICQ, etc), email, and Usenet, should be used (don't reinvent the wheel). Enhance them with new abilities where these don't already exist, but don't replicate existing features just to put them on the Web.

The future of the Web

Web sites of the future will be much more advanced than we are used to at the moment. As technology moves forward it will make it possible to do things that we can only dream about at the moment. One of the ways to enhance the Web is to bring about a resurgence of more of the original ideas that led to the creation of the Web in the first place, ideas such as the free flowing web of information that allows co-operative authoring of documents, along with collaboration to work on those documents. The ideas of co-operative authoring have been around since even before the Web, and have been used to some extent, but the ideas that people have had in the past have not caught on, possibly because other major things were happening at the time, such as when Berners-Lee created the 'WorldWideWeb' browser/editor on the NeXT.

By utilising ideas for new web sites with ideas for co-operative authoring and collaboration, we can combine them in new ways, with our current technology. In the future web sites will be as easy to create and to edit as conventional documents are to create and edit with word processors, and both types of information representation will be used as often as each other (and in conjunction with each other).

Web sites based on the multi-tier model above will allow webmasters to set up sites for people to use co-operative authoring and collaboration just as easily as they can create documents in word processors today. The collaboration will be available without added complexity and will allow a variety of clients to edit, update and view information on the site (with full version history, and security).

To achieve this, the web site will need to combine different technologies into one coherent bundle. The web sites will also need to be robust and well designed to keep the complexity of the process hidden from the user, and to present a simplified model of use. This will allow all web sites to incorporate the full potential of co-operatively authored web sites, while keeping the whole process simple enough so that anyone can use that potential. The site designers may not use all of the potential for every web site, but will be able to pick and choose the parts that are most appropriate for them.

A multi-tier web site such as this has a dynamic back end (which may be on a different server, for reliability) that updates a static front end whenever a change is made on the back end. This combines the good points of each, the reliability and speed of a static site and the flexibility of a dynamic site, while mitigating the bad points. It gives the reliability and speed of a static web site because it is a static web site (that is updated periodically by the dynamic site), and the flexibility of a dynamic site because the dynamic site can update the static site at any time (generally whenever something changes). So even if the dynamic site crashes, the static site will still be available and contain the information up to the point that the dynamic site crashed.

In the past, few applications other than web browsers and editors could understand HTML, and so they could not be used to edit web sites (and in some cases, even though they can now read and write HTML, you would not want to use them to do so). Applications, and the protocols used to access web servers, are advancing such that many more applications, which may or may not be web browsers as we currently recognise them, are able to interact with web sites and support the standardised (W3C) web features (in other words, no OS or browser specific HTML that could break the display of the page on other systems). For advanced document creation with YEdit, additional software or a reasonably recent web browser/editor (which will still be platform-independent) may be required for extra features, although these features should be at least partially available in any web browser/editor. These extra features are only for advanced web page creation (in other words, anyone can still view the result easily).

All web sites that use the YEdit engine will have the ability to include full co-operative authoring and collaboration with no additional software required, on either end.

In addition to browser based or advanced editing, the YEdit system will in the future allow other software to hook into it, allowing users to use the software that they are used to using. This will allow for features such as uploading and downloading of Word or WordPerfect files for editing. Software should be the user's choice; the server will convert to and from its internal representation as needed. This means that the user will be able to use anything from EdLin (the infamous text file editor that was included with early versions of MS-DOS), through to the current web browsers and anything in the future, to edit the documents.

3.5.1 Increasing the success of the Web

There is a range of currently available technology (such as email and Usenet) that is being used for the purposes of communication, and quite often the same technology is reinvented again and again, sometimes using other techniques. Occasionally this brings out new abilities that were not available before; unfortunately, most of the time it only introduces added complexity to areas that can already be complex and difficult enough for the average user.

Letting people use the programs that they are used to and understand (rather than asking them to learn new programs often) will increase the ease with which people are able to use their computer to do what they want. It makes sense to let people use the programs that they understand, because when they understand how to do things, they are much happier and more productive (and a happy and productive user is MUCH less of a burden on a company). A user who is forced into upgrades, changes, and having to learn the software over and over again, when their current software works perfectly well for them, is often a burden on resources and support lines. On the other hand, if they are given a good reason to upgrade, they will, quite happily, with few problems; force them, and watch the support issues blossom.

3.6 One possible answer

The YEdit system for co-operative authoring and collaboration (for which a prototype has been created, and is available for public use at http://www.YEdit.com/) is based on the free flowing web of information, which includes some of the original intentions for the Web; that is, to use the Web for both reading and writing. YEdit will provide a possible answer to the problem of many people working together on documents with tools that do not support easy co-operative authoring and collaboration without extra work, such as remembering to email the latest version on to others.

In searching for the very information that I needed to work on this problem, I came up against one of the problems that I am trying to solve: as information is revised, it is very easy to lose important information unintentionally. This problem is not so evident in print media, because once information is printed it can be stored that way, and is very seldom rewritten and replaced. Unfortunately, on the Web, things change so quickly that information that is present one day may not be present the next. To keep pages updated, authors often add, replace or remove information. Sometimes this is a deliberate change, but more often the information that was present simply seems less relevant to the author now. The problem is that the very piece of information that one person thinks is no longer relevant, and years out of date, may be exactly what someone else is looking for, as I found in my search. Luckily W3C (The World Wide Web Consortium, http://www.w3.org/) did keep the old outdated pages, and some digging around their site revealed them, even if it took a little work.

While looking through these documents I realised that one of Berners-Lee's original visions for the Web was along similar lines to the direction that I thought the Web should take. My idea was for an engine that could be used by almost any front end (be it web browser, word processor, web site, etc.) and that would allow interaction with full version tracking information. This would allow web sites (or any server or application) to keep the full revision history for documents of any kind (text, pictures, sound, etc.). It would also allow any group that has permission to edit, update and change any available document.

The engine that I am creating works as an extension of the web server. It can also work standalone for applications that require co-operative authoring through interfaces other than the Web. The engine abstracts both the user interface and the file system interface, allowing other methods of interaction to be plugged in and work seamlessly. This allows the engine to be used through a normal web browser and via other connection methods at the same time, using the same base of files. It also works the other way around, allowing any base of files that it has an interface for to be used by any of the access methods (refer to Figure 4: Interaction of the main components of YEdit for a graphical overview). This flexibility means that the engine can support co-operative authoring well beyond the Web or any particular file system. For example, it could be used via a command line interface or directly from word processors, and could store information in databases, version controlled repositories, etc. This will also allow it to be used where legacy systems exist, as the front end, the back end, or both, allowing disparate legacy systems to interact when under normal circumstances they would not be able to.
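The separation described above can be illustrated with a minimal sketch. It is not the YEdit implementation; the names Storage, MemoryStorage, Engine, web_front_end and cli_front_end are hypothetical and chosen only to show how access methods and bases of files might be plugged in independently of each other.

```python
from abc import ABC, abstractmethod


class Storage(ABC):
    """Hypothetical back-end interface: any base of files the engine can use."""

    @abstractmethod
    def read(self, name):
        """Return the text of the named document."""

    @abstractmethod
    def write(self, name, text):
        """Store the text of the named document."""


class MemoryStorage(Storage):
    """Stand-in for a file system, database or version controlled repository."""

    def __init__(self):
        self._docs = {}

    def read(self, name):
        return self._docs[name]

    def write(self, name, text):
        self._docs[name] = text


class Engine:
    """Talks only to the Storage interface, never to a concrete store."""

    def __init__(self, storage):
        self._storage = storage

    def load(self, name):
        return self._storage.read(name)

    def save(self, name, text):
        self._storage.write(name, text)


def web_front_end(engine, name):
    """Hypothetical web access method: wraps the document in minimal HTML."""
    return "<html><body>" + engine.load(name) + "</body></html>"


def cli_front_end(engine, name):
    """Hypothetical command line access method: returns the raw text."""
    return engine.load(name)


# Both access methods share one engine and therefore one base of files.
engine = Engine(MemoryStorage())
engine.save("index", "Hello")
```

Swapping MemoryStorage for another Storage subclass would leave both front ends unchanged, which is the property the paragraph above relies on.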

The engine promotes co-operative authoring and collaboration by providing an interface that is built specifically for co-operative authoring, with full version histories of all documents, by default.

The Web interface to YEdit can be used to read different versions of documents in the same way as you normally read documents over the Web. This means that anyone can access the documents without needing to learn how to run a new application. The Web interface also supports modes for browsing the documents with co-operative authoring in mind (that is, displaying information about the document that is appropriate to the user, for example the version number, last author, last date of change, and previous versions). This part of the interface also allows you to edit the document, and at present has a simple locking scheme to prevent changes being overwritten. Because all of the changes are kept in different versions, it is possible to keep track of each change, who made it and when. This allows certainty about the history of the document. The process of reading the documents in the system, browsing them with a view to editing them, or looking for information about changes has been implemented so that as many of the abilities as possible are implicit in the display. Therefore anyone who knows how to surf the Web should be able to surf through the documents stored in this system, and need not know the inner workings to retrieve the information that they are looking for. The reading part of the interface simply presents the documents as they would appear on a normal web server, and they may in fact be served as normal static documents under a multi-tier web site design. Even the browsing part of the interface is designed so that the user is as unaware of the complexities of the system as possible. This is achieved by providing a simple menu above the document, which provides information about the document, such as the last author and version number (with a link to view versions), and the option to edit the document. Only if the user decides to edit the document does any of the complexity of the system show itself, and that will decrease as user suggestions (such as a simpler form for editing a page) are incorporated into the design.
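As an illustration of how a simple locking scheme and a per-document version history might fit together, the following sketch keeps every saved version along with its author and time of change. It is an assumption-laden example rather than the scheme YEdit uses; VersionStore, begin_edit, save and history are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Version:
    number: int       # 1 for the first saved version
    author: str       # who made the change
    when: datetime    # when the change was saved
    text: str         # full text of this version


@dataclass
class Document:
    versions: List[Version] = field(default_factory=list)
    locked_by: Optional[str] = None


class VersionStore:
    """Hypothetical store: one lock per document, every version retained."""

    def __init__(self) -> None:
        self._docs: Dict[str, Document] = {}

    def begin_edit(self, name: str, user: str) -> None:
        """Take the lock, or fail if another user already holds it."""
        doc = self._docs.setdefault(name, Document())
        if doc.locked_by not in (None, user):
            raise PermissionError(name + " is locked by " + doc.locked_by)
        doc.locked_by = user

    def save(self, name: str, user: str, text: str) -> Version:
        """Append a new version and release the lock."""
        doc = self._docs.get(name)
        if doc is None or doc.locked_by != user:
            raise PermissionError(user + " does not hold the lock on " + name)
        version = Version(len(doc.versions) + 1, user,
                          datetime.now(timezone.utc), text)
        doc.versions.append(version)
        doc.locked_by = None
        return version

    def history(self, name: str) -> List[Version]:
        """Return all versions, oldest first."""
        return list(self._docs[name].versions)


store = VersionStore()
store.begin_edit("index", "alice")
try:
    store.begin_edit("index", "bob")   # alice still holds the lock
    bob_blocked = False
except PermissionError:
    bob_blocked = True
store.save("index", "alice", "First draft")
store.begin_edit("index", "bob")       # lock was released by alice's save
store.save("index", "bob", "Second draft")
```

Because save only appends, earlier versions are never overwritten, and the menu information described above (last author, version number, date of change) can be read from the last entry returned by history.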

This engine will allow web sites to be fully interactive, while keeping the version information that is important to ensure that the information is correct, and will even allow the whole site to be rolled back to a previous known working configuration, should disaster strike.
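One way a whole-site roll back could work is sketched below, assuming an append-only log in which every save receives a site-wide revision number. The Site class and its methods are hypothetical and are not taken from YEdit. Rolling back is itself recorded as new saves, so the broken state remains in the history.

```python
class Site:
    """Hypothetical site with an append-only log of (name, text) saves."""

    def __init__(self):
        self._log = []  # entry i (0-based) is site revision i + 1

    def save(self, name, text):
        """Record a save and return the new site revision number.

        A text of None marks the document as removed.
        """
        self._log.append((name, text))
        return len(self._log)

    def state_at(self, revision):
        """Rebuild the site as it was after the given revision."""
        state = {}
        for name, text in self._log[:revision]:
            if text is None:
                state.pop(name, None)
            else:
                state[name] = text
        return state

    def current(self):
        return self.state_at(len(self._log))

    def roll_back(self, revision):
        """Restore an earlier state by appending corrective saves."""
        target = self.state_at(revision)
        now = self.current()
        for name in sorted(set(now) | set(target)):
            if now.get(name) != target.get(name):
                self.save(name, target.get(name))
        return len(self._log)


site = Site()
site.save("index", "Welcome")
known_good = site.save("news", "Launch")
site.save("index", "Broken")
site.save("extra", "Oops")
site.roll_back(known_good)
```

After the roll back, site.current() matches the known good state, while site.state_at(4) still reproduces the broken one for later inspection.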

Because the full version history is kept, along with the ability to go back and check previous versions, it will be easier to trust the information that you receive. If you have a query, you can easily check who changed what and when, especially if the site uses some form of write-once media either to store the information or to back up new information.
