file, and save it and subsequent pages that I have prepared, to build up the whole book. After I have proofed the OCR result, I paste the finished text into a Microsoft Word document, setting the font at Courier New size 10. This sets the lines at the right length for Gutenberg. When I have finished the whole book in Word, I save it as text-with-line-breaks, to get the final text file, which I send to be posted on the Gutenberg site. I proof my work two or three times, depending on the quality of the OCR result, and do a final spelling check with MS Word. I don't ask other people to proof my texts, because Miss Yonge's idiosyncrasies are liable to get edited out, unless the proofer has the book to hand.
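The line-length step described above can also be sketched in code. The following is a minimal illustration using Python's standard `textwrap` module, not the Word-based method described here. The width of 70 characters and the function name `rewrap` are assumptions made for the example, since the exact line length is not stated in the text.

```python
import textwrap

# Assumption: 70 characters per line. The text above only says the lines
# come out "at the right length", without giving a figure.
LINE_WIDTH = 70


def rewrap(text: str, width: int = LINE_WIDTH) -> str:
    """Re-wrap each paragraph of `text` to at most `width` characters.

    Paragraphs are taken to be separated by one blank line; line breaks
    and repeated spaces inside a paragraph are collapsed before wrapping.
    """
    paragraphs = text.split("\n\n")
    wrapped = [
        textwrap.fill(" ".join(paragraph.split()), width=width)
        for paragraph in paragraphs
    ]
    return "\n\n".join(wrapped)


if __name__ == "__main__":
    sample = (
        "It took me 6 months to prepare my first text, but I can do "
        "10 pages an hour now.\n\n"
        "I proof my work two or three times, depending on the quality "
        "of the OCR result."
    )
    print(rewrap(sample))
```

Wrapping paragraph by paragraph keeps the blank lines between paragraphs intact, which is roughly what saving as text-with-line-breaks achieves.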

It took me 6 months to prepare my first text, The Heir of Redclyffe, but I can do 10 pages an hour now.

In my Gutenberg folder, I have other useful files for reference, mostly downloaded Gutenberg Instructions files. So if I need to find something out, I can look in these files--it is much easier than searching on the Internet. If I need to know something I can't find in these files, I may ask a question on the Volunteers WWW Board, although I try not to, because the answers are nearly always in the files.

I try to process 2 sheets of 16 octavo pages a day, taking about 3 or 4 hours. I do my housework & gardening in the morning, then settle down to an afternoon's happy Gutenberging :-).

WHY DO I GUTENBERG?

When I became semi-retired, I wanted to do some voluntary work on the Internet. Coincidentally I began reading the works of Charlotte M Yonge, and discovered that most of her works are out of print now. I felt that they deserved a much wider audience, so I decided that my voluntary job would be to do just that. Miss Yonge lived in a village only a couple of miles away from me, so I had a local interest, too. On my web page, http://www.menorot.com/cmyonge.htm, you will find out a little about her, and Otterbourne, the village she lived in all her life, and find links to other web sites about her.

I discovered the Charlotte M Yonge Fellowship http://www.cmyf.org.uk/ and am now in contact with other people who appreciate her work, including academics who write clever things about her. Her books are about families, their interactions with each other, and how they, in Christian terms, grow in grace. I don't think there is another writer who can write so well about families. She was a Tractarian, a Christian who, in the nineteenth century, believed that people could be influenced for good by what they read. For this reason, 20th century people found her characters too moralistic, and her prose too turgid. I think her novels are delightful, her characters lovable, and her prose is minutely descriptive. It was said about her that she was 'able to make goodness exciting'. This is a rare talent, perhaps only found in other Christian writers like John Bunyan or Charles Kingsley.

Through the Gutenberg site, Miss Yonge's works are more easily available than ever. She originally wrote for upper and middle class young women. Even though I live a century and a half later, I can recognise her characters in their 'descendants' who live around me, but I sometimes wonder what Chinese, African, or even modern American readers think of her, their own backgrounds being so different from those of the English Victorians.

I enjoy making Gutenberg texts; the work is simple once you know how. I would prefer, however, to see them presented in HTML. The modern ebooks all need to be in HTML format to present nicely on their tiny pages. I believe Gutenberg is going to publish HTML files, and I would like to learn how to do it. Eventually, I think Gutenberg files will be available in a format that will work on all PCs, handhelds, palms, and ebooks, but I don't know what that format is yet; I don't think standards have even been worked out among the ebook publishers.

Finally, yes, I do find mistakes in my published texts. When I have finished all 200+ of Miss Yonge's books, I am going to go through them all for the second time, and remove the mistakes. So, my work is cut out for many years to come. . . .

Suzanne Shell

Over the past several years, I visited the Project Gutenberg website occasionally, looked at what was involved in making a significant contribution to the effort, and left after downloading a few books--PG was a project that would need to wait until I retired.

In the summer and fall of 2002, I was doing research on e-books (sources, devices, costs) for my library, and ran across Distributed Proofreaders. I discovered Blackmask.com at about this time, and also followed a link from there to Distributed Proofreaders. Serendipity! After backing away a few times, I took the plunge and registered on November 5, then began proofing. The however-many-pages-I-wanted-to-proof commitment was just right for letting me get a feel for the process, and to start me thinking of the ways I could exploit all this free labor to get the books _I_ wanted into PG.

I was feeling quite virtuous about proofing my 10-20 pages per day, when I visited the site on November 8, and NONE of the books I was working on were available. Also there was this perfectly absurd number listed for number of proofers having proofed at least one page (it had roughly quadrupled). I KNEW the site had been hacked. Actually the site had been slash dotted. The DP discussion forums were so active, it was hard to find time to read all the messages, questions, suggestions, and complaints; these rapidly led to new documentation and more detailed proofing guidelines. Books moved through the site so rapidly that they brought out the "hard stuff" from the bottom of the to-do stack, and were STILL desperate for content. I was a relative "veteran" after just a few days, and helped out a little by answering questions, but I was still a beginner. I had some PG dreams that DP could make reality, but I needed to learn the ropes first.

Some of my ambitions revolved around professional goals--there are some public domain titles, which, if available in electronic form, would be extremely useful to my library's patrons. There are also some standard reference books and indexes--Granger's Index to Poetry is one example--that have pre-1923 editions that could still be important resources. In order to learn what I needed to know about providing content, though, I decided to start with something less overwhelming (wanting to read it on my e-book reader was just a coincidence). I went to my bookshelves and pulled out my P. G. Wodehouse reprints. I downloaded and read the scanning and submitting FAQ from the DP site, requested and received clearance for the first book (_Uneasy Money_) in late December, and got to work mastering my scanner. I tried Omnipage Pro first, but decided that ABBYY Finereader Pro did a significantly better job of the OCR. I offered to be a "behind the scenes" manager for the book while it worked its way through the site, but was made an official "Project Manager" instead. Although the first frenzy following the slash dot invasion had calmed down, DP was still feeling a need for more content and more hands to manage projects.

On January 5, _Uneasy Money_ started proofing; it went through 2 rounds of proofing in less than 20 hours. I felt like a hick marveling at a traffic light changing colors, but I sat at my PC and watched the page count go down. By this time, I had also scanned and OCR'd a couple more Wodehouse reprints and a short book of poetry. I was hooked! Juliet Sutherland and the other admins had recruited some experienced DP'ers to help train new post-processors in the job of preparing final PG texts. I was handed over to one of them. After several projects, I "graduated" and was given permission to upload my own projects. My intent was to do 3 or 4 projects a month, no more than I could handle post-processing by myself. I planned to process an occasional reference book in addition to all the Wodehouse I could get my hands on. So much for plans...

One ongoing concern of many Distributed Proofreaders was how to train new volunteers in the DP style of proofreading. (It is somewhat idiosyncratic because of the distributed nature of the process.) We were still coping with the aftereffects of the massive influx of slash dotters--quantity benefited, but quality suffered. Super7, one of the highest volume proofreaders, suggested setting aside a project without complex formatting for "Beginners" and asking that the second round proofers (all of whom should be veterans) send feedback and encouragement to the newcomers. This was tried successfully, and with a couple of variations. Since I had been planning to start running a variety of genre fiction through the site, I then volunteered to manage these as beginners' projects for as long as the supply held out. All of a sudden, starting in February 2003, the amount of time I needed to spend locating, scanning, OCR'ing and managing books increased drastically, and the amount of time I could devote to post-processing decreased. Luckily, "veterans" stepped in to answer newcomers' questions, and to serve as "Mentors" in the second round of proofing. Recently, others have provided "beginners' projects", to help keep up with the demand of a steadily increasing flow of new volunteers. These projects are also useful for helping new post-processors learn the job.

I still have some ambitious projects planned; Granger's _Index to Poetry_, the unabridged edition of _The Golden Bough_, Curtis' _The North American Indian_, and the _Book Review Digest_ (volumes for 1905-1921). A couple of volumes are already waiting to be proofed, others are waiting to be scanned on the PG tabloid scanner. But, in the meantime, there are 23 new Wodehouse books in PG thanks to Distributed Proofreaders, not to mention such remnants of early 20th century popular culture as _The Sheik_.

I believe that a major accomplishment of Distributed Proofreaders has been the creation of a way to provide on-the-job training for PG volunteers. Steady improvement in the quantity and quality of training techniques and documentation, enhancements to the user-friendliness of the site, and ready access to the collective experience and advice of a wide range of volunteers in the Forums have resulted in a growing core of active and experienced volunteers in all the facets of e-book production. I'm sure that I could not have progressed from a total newbie to a regular PG contributor within a 5-month period without this support structure. Regular communication and collaboration with book-lovers from around the world has enriched my life. The fact that it is easier to get leave from my job than from DP is perhaps beside the point...

Tony Adam

How did you learn about PG?

It's been so long, I don't really remember! I probably read about it on a library listserv (I'm a librarian), and since making old texts accessible has always been a concern of mine, I jumped right in.

What was your first contact like?

Great! Mike Hart has always been easy to deal with via e-mail, although we've never talked. He and the "crew du jour" directed me to the FAQ and I took it from there.

What was the first PG job you did? How did it go?

My first job might have been Henry James' _Turn of the Screw_ (I just found a note from September 1993 on copyright clearance for it). Since in a former incarnation I was editorial assistant for the _Henry James Review_, I thought that would be a good start. I've always typed the files (I'm a fast typist), and I think we had few problems along the way.

How did you develop your PG experience from there?

Helter-skelter, much like my reading habits. I work at a historically black university, so getting 19th C African-American works posted is a central concern. I've done _Clotelle_ (the first African-American novel) and the autobiography of Henry O. Flipper, the West Point cadet, and I'm always looking for something new in that area. Somewhere along the way I got sidetracked into essays by Whittier and other U.S. poets, and I've collaborated on early American historical documents and Sir Walter Scott with a fellow PGer up in Ohio and Chinese documents with another contact in Japan. A couple of years ago, I saw that someone in San Francisco needed help with the Shakespeare Apocrypha, and that has occupied my time on and off since. It's always something!

Can you tell us about the first text you produced?

I think it was _The Turn of the Screw_, which was a good starting point--not too long, a good read, etc. Just plugging away at the text a few pages a day made the process go quickly.

Why do you spend your hours contributing to PG?

I love the idea of making all of this print knowledge available to anyone anywhere. Working in a library that has suffered budget problems over the years opened my eyes to the need for acquisition of as much free stuff as possible for our students and faculty. Besides, in a perverse way, it's fun!

Do you specialize in any particular kind of work? of texts?

I've probably focused more on plays, historical documents, and 19th C U.S. works than anything else.

What do you like about making a PG text?

Having a project come to fruition--finally seeing an almost forgotten text come to life again.

What do you dislike about making a PG text?

The work can be tedious at times, depending on the author. But sometimes you have to plow through to get something significant processed. For example, we probably should have more philosophers represented, but what a horrible thing it would be to scan Kant!

Where do you get your eligible books?

Mostly from my library's collection, although I finally purchased my own copy of the Shakespeare Apocrypha (it's very hard to find, which makes it very suitable for posting). I've interlibrary loaned some items, but that's also been unusual.

Do you type or scan? What Scanner / OCR / Editor / WP do you prefer?

I still type everything--it's easier when working with a play, I've discovered. But I'm purchasing a scanner in the very near future and will do more with that.

How do you check your text? Any special tools? spellchecker? Do you print it out and read it? Put it on your PDA and read it? Have a voice synthesis program read it aloud to you from your PC?

I usually run it through the spellchecker, although depending on the work, I read it line by line a second time.

Do you have any tips'n'tricks or special routines you go through when preparing a text?

The best thing to do is put yourself on a schedule--do a set amount of pages every day, and you'll be surprised how quickly you get to the end. I also make a pencil mark in the book at a stopping point and even read back a paragraph to double check what I last entered.

How long does it take you to make a text?

Depends on my work schedule, other assignments, time of year, etc. A play might take a couple of weeks, but a Walter Scott novel could take six months. I think my record is probably one day for an essay, but that's unusual.

Do you work alone, or do you share the work of each text? Does anyone regularly help you proof the text?

I've worked alone and on teams, depending on the text. No one regularly helps to proof the text, but occasionally someone else does.

Do you do some PG work regularly, or drift in and out as opportunity permits, or when you feel like it?

I consider myself a regular, as time permits. In other words, I haven't dropped out of the picture, but sometimes I might not enter anything for up to a month.

How many different kinds of work, or different books, have you done?

Not sure how many different books I've done, but it's been a wide variety: James' and Scott's novels, Whittier's essays, a whole collection of early American documents (mostly New Netherlands), Shakespeare (accepted canon and the apocryphal works), some odd works (_The Psychology of Beauty_ comes to mind)--the list goes on and on. I've even forgotten that I've done some titles!

What do you like about the PG process?

That it's open-ended--if I think I have something that should be posted, I don't have to jump through hoops and ladders to get permission (other than copyright clearance).

What do you dislike about the PG process?

Can't think of anything offhand.

Is there anything you'd like to see PG doing differently?

I know it's a bone of contention, but we probably need to explore moving away from ASCII.

If one of your friends approached you to ask advice about how to get started contributing to PG, what would you tell them?

Start with something fun, that's close to your heart, and keep plugging away a little bit at a time.

What do you expect Project Gutenberg to be like in 5 years? 10 years?

We'll probably be a whole lot bigger (texts and personnel), with a different look to the texts. Maybe we'll even have more audio versions of texts, using some of the new software that's coming out.

Tonya Allen

I discovered Project Gutenberg in about 1997. After several years of enjoying PG's texts, in June of 2002 I decided it was time to start contributing. Via the PG web site I learned that the easiest way to do this would be to help out with proofreading via Charles Franks' Distributed Proofreaders web site. The day I signed on I proofed nine whole pages of a children's book called _Curly and Floppy Twistytail_ and felt very proud to be contributing.

At that time, there were probably only about 40 active volunteers on the site each day. Often I proofed an entire book almost all by myself over the course of a week or so. Things moved at a leisurely pace; guidelines were few and simple; and I had fun reading old books and discovering new authors.

After a few months a request was made for volunteers to post-process texts in French. I volunteered to help with this, and that was how I became a post-processor (PPer). Shortly afterwards, the web page listing texts available for post-processing and sign-out was unveiled. I remember several times checking and being disappointed because there was nothing currently available (hard to imagine now when there are always at least 40 texts waiting).

One day in November, I picked out a likely-looking text from the proofing page, and settled down for an hour of reading. As I recall, it was _The Greek View of Life_, a sizeable text of which only a few pages had been proofed so far, and which I thought would last for several days at least. At about that time, someone emailed me to say that DP had been "/.ed." "What does that mean?" I replied. I soon found out.

I had been proofing away peacefully for a while when suddenly, instead of the next page, I got a page about twenty pages further on. The same thing happened again and again, and suddenly all the pages were gone; the whole text had been completed. DP had indeed been slashdotted.

Since then, a lot of amazing things have happened. The number of active volunteers per day has increased almost 1000%. The number of texts that go through the site has increased exponentially. All kinds of proofing and processing tools have been developed. I now spend most of my time checking texts that others have PPed, and submitting them to PG, at an average rate of one to four per day--quite a leap from nine pages of _Curly and Floppy Twistytail_. And I'm looking forward to everything that lies ahead as DP continues to evolve.

Walter Debeuf

Quite by chance I became aware of PG when I was surfing and looking for interesting sites. I vaguely knew the name because I had heard of the Project a long time ago. After reading the "History and Philosophy of PG", I immediately became wildly enthusiastic about it. This was what I had been looking for for years, a meaningful use of my PC, and because I am a fervent lover of good literature, I didn't hesitate to contact the founders of the Project. I made a suggestion that I should work on French and Dutch e-texts. The very same day I received an answer from PG in which they told me they were very pleased with my contribution but that I had to keep in mind that all books must be free of copyright and published before 1923.

This wasn't so great. . . . After I browsed in the "Help And FAQ" of the PG site, I read that I didn't have to worry about all that, because they are willing to do all the clearance!

On my own bookshelf I found an old book by Jules Renard, "Poil de Carotte". It seemed old enough to me, but I couldn't find any copyright notations. So I mailed Mr. Hart all the information I found on the title page and the verso, and asked him what he thought about it. The next day I received his answer. He wrote: "We still have to prove this edition was pre-1923, so I am forwarding to our authority on such copyright research." This authority is Ms. Dianne Bean, who very pleasantly mailed me a few days later to say that I could start typing, because the copyright issues had been resolved. She asked me to send a "TP&V" (a photocopy of the title page and verso) of the book to Mr. Hart, because they need that for legal reasons.

But something wasn't very clear to me concerning the format I had to use. In the "FAQ" they spoke about "plain vanilla ASCII", something I had never heard of in my life! In "How to Volunteer, PG Volunteers' Board" Mr. Jim Tinsley answered all kinds of questions about all kinds of problems people have when they start volunteering. So I did the same and sent him my question. I received an extensive answer about all kinds of formats in the "ISO 8859 Alphabet Soup" and he recommended that I use "Codepage 1252", which is very common in Windows. Here are the addresses which Jim sent to me:

"If you are interested in the differences, I recommend the excellent web page

http://czyborra.com/charsets/codepages.html

in the excellent reference site http://czyborra.com"
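To make the difference between these character sets concrete, here is a minimal Python sketch. It relies only on the standard `iso-8859-1` and `cp1252` codecs. The helper name `encodable` and the sample French words are assumptions chosen for illustration and are not part of Jim's answer.

```python
def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Accented letters such as "ê" exist in both sets, at the same byte value.
    print("ê".encode("iso-8859-1"))  # b'\xea'
    print("ê".encode("cp1252"))      # b'\xea'

    # The French ligature "œ" exists in Codepage 1252 but not in ISO 8859-1.
    print(encodable("œuvre", "cp1252"))      # True
    print(encodable("œuvre", "iso-8859-1"))  # False
```

For a French text the practical difference is small but real: most accented letters are identical in the two sets, while a few characters such as the "œ" ligature only have a place in Codepage 1252.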

I chose a French book, first because I had it already on my bookshelf, and secondly because I wanted to perfect my knowledge of the French language and typing seemed the right way to do it. When copying an author's text, you are very close to it. You also have to pay full attention to the spelling of the words. Gradually you come under the spell of the story and you forget that you are typing . . . Nevertheless, it is hard work, especially when it is not your native language, and therefore you shouldn't try to rush it. At first I started with two or three pages a day, which means that you would need about two months typing for an average book. But good typists can do it more quickly.

I can only applaud the aim of PG: to make as many books as possible available on the net, without cost, for everyone in the whole world. I love to co-operate with it.

In the meantime there are thousands and thousands of books in the PG-collection, and that makes it a little difficult to find other examples which are free of copyright, because they must be from before 1923. Since I've got the "PG-bug" it's a challenge for me to find suitable copies, and I look for them high and low. I can buy a few books for a song and I take them home as a trophy, looking forward to the work which is waiting for me . . .

In libraries you can find old publications which you can find nowhere else.

It's amazing how fascinating old books can be and how much you can learn from them. For the moment I'm working on "Pecheur d'Islande" by Pierre Loti, in which I get acquainted with an old tradition of fishermen, very interesting. Without PG I would probably never have read this. There must be still a lot of little treasures in some old and dusty attics, waiting to be born again by the magic touch of a PG-volunteer.

If you do it, no compensation or payment is waiting, but . . . doing something disinterested and unselfish gives you a good feeling.

## Bookmarks:

B.1. Project Gutenberg:

Home Page and Search <https://www.gutenberg.org/>
Contact Information <https://www.gutenberg.org/contactinfo.html>
Donations <https://www.gutenberg.org/donation.html>
List of FTP sites <https://www.gutenberg.org/list.html>
Web Browse to texts <http://www.ibiblio.org/pub/docs/books/gutenberg/>
Mailing Lists <https://www.gutenberg.org/subs.html>
Volunteers' Board <https://www.gutenberg.org/vol/wwwboard/>
Copyright Rules <https://www.gutenberg.org/vol/pd.html>
Books In Progress <http://www.dprice48.freeserve.co.uk/GutIP.html> (The InProg List)
Greek Transliteration <https://www.gutenberg.org/vol/greek.html>
Music <http://www.ibiblio.org/gutenberg/music/music_helpex.html#what-software>
GUTINDEX.ALL <ftp://ibiblio.org/pub/docs/books/gutenberg/GUTINDEX.ALL> (Complete list of posted eBooks)

B.2. Distributed Proofing Sites:

Charles Franks <https://www.pgdp.net/>
JC Byers <http://www.wollamshram.ca/1001/index.htm>
Dewayne Cushman <http://www.metalbox.net/dcushman/pgroot.htm>

B.3. Other On-Line eBook Pages:

The On-Line Books Page <http://onlinebooks.library.upenn.edu/>
/In Progress List <http://onlinebooks.library.upenn.edu/in-progress.html>
Internet Public Library <http://www.ipl.org/>

B.4. Lists of Suggested Books to Transcribe:

PG Books In Progress <http://www.dprice48.freeserve.co.uk/GutIP.html>
On-Line Requested List <http://onlinebooks.library.upenn.edu/in-progress.html#requests>
Steve Harris' "To-do"s <http://www.steveharris.net/PGList.htm>

B.5. Finding Paper Books On-Line:

Advanced Book Exchange <http://www.abebooks.com>
Alibris <http://www.alibris.com>
Trussel BookSearch <http://www.trussel.com/f_books.htm>
Library of Congress Catalog <http://catalog.loc.gov>

B.6. Character Sets:

Overviews <http://czyborra.com> <http://www.cs.tut.fi/~jkorpela/chars/index.html>
ISO-8859 <http://czyborra.com/charsets/iso8859.html>
Microsoft & Other Codepages <http://czyborra.com/charsets/codepages.html>
Unicode <http://www.unicode.org>