Thursday, September 18, 2008

Common text formats for scanlation groups

I'm assisting aeriandria in finding new proofreaders. We send qualified applicants a test, with instructions to proofread it and send it back. Some of the tests I've gotten back were in .DOCX format. After some research on the web, I discovered it's good ol' Microsoft trying to make an open format proprietary; .DOCX is MS-Word's XML-based format, introduced with Word 2007. In practical terms, it meant I couldn't open the files with either MS-Word XP or OpenOffice. FAIL.
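For the curious, here's why older software chokes on these files: a .DOCX file is really a ZIP archive of XML parts, and the body text lives in a part named word/document.xml. Here is a minimal Python sketch (standard library only) that pulls the plain text out of one. The function name docx_text is my own invention, and it deliberately ignores formatting, headers, footnotes and comments (text deleted under Track Changes is stored in a different element and is skipped), so treat it as an emergency reader, not a converter.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the WordprocessingML elements inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(source):
    """Return the plain text of a .docx, one paragraph per line.

    `source` may be a filename or a binary file-like object.
    Hypothetical helper: formatting, headers, footnotes and comments are ignored.
    """
    # A .docx is a ZIP archive; the main document body is one XML part inside it
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each paragraph (w:p) holds runs whose text sits in w:t elements
        runs = [t.text or "" for t in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Usage would be something like print(docx_text("proof_test.docx")), where the filename is just a placeholder.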

Why fail? If I can't open the files, how can I review them? My job isn't to figure out how to open an obscure file format; it's to review the tests. I can say with confidence that most editors in scan groups will immediately email back, "Hey, resubmit this in .DOC or .RTF format, okay?"

In my experience, scan groups work with three text-file formats: MS-Word .DOC, the widely supported .RTF (Rich Text Format), or plain .TXT. Some groups prefer proofreading in .DOC so they can take advantage of MS-Word's "Track Changes" feature. Others prefer .RTF for its ease, simplicity and cross-platform support. In .RTF you can color the text to mark corrections and changes.
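To show what that colored-correction trick looks like inside an .RTF file, here is a minimal Python sketch: a color table declares red as color 1, and each correction is wrapped in a {\cf1 ...} group. The function names, the Arial font and the choice of red are my own illustrative assumptions, not any group's standard. Non-ASCII characters (handy for Japanese names) are written with RTF's \uN escape, which takes a signed 16-bit value.

```python
def rtf_escape(text):
    """Escape plain text for inclusion in an RTF document."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF syntax, so they must be escaped
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\par\n")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit UTF-16 code unit; characters outside
            # the BMP become a surrogate pair. "?" is the fallback for old readers.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append(f"\\u{unit}?")
    return "".join(out)


def proofed_rtf(segments):
    """Build an RTF document from (text, is_correction) pairs.

    Corrections are shown in red (color 1 in the color table).
    """
    body = []
    for text, corrected in segments:
        escaped = rtf_escape(text)
        body.append("{\\cf1 " + escaped + "}" if corrected else escaped)
    return ("{\\rtf1\\ansi\\deff0"
            "{\\fonttbl{\\f0 Arial;}}"
            # Entry 0 is the default color; entry 1 is red
            "{\\colortbl ;\\red255\\green0\\blue0;}"
            "\\f0 " + "".join(body) + "\\par\n}")
```

For example, proofed_rtf([("He said ", False), ("hello", True), (".", False)]) produces a document where only "hello" shows up in red.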

Often the initial translated scripts are in .TXT format, and the proofreader uses an editor to convert them to .DOC or .RTF. Editors also use the .TXT format when they are typesetting: the finished script is handed to them as .TXT so they can copy and paste easily. Note: due to the multinational nature of scanlation, .TXT files are often UTF-8 encoded.
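If you ever script your own handling of these UTF-8 files, the main gotchas are the byte-order mark that some Windows editors put at the start of the file and Windows-style line endings. A minimal Python sketch, assuming the script files are UTF-8 (the function names are hypothetical): Python's "utf-8-sig" codec drops a leading BOM if there is one, and text mode turns \r\n into \n on reading.

```python
def read_script(path):
    """Read a UTF-8 .TXT script, dropping a leading BOM if present."""
    # "utf-8-sig" decodes UTF-8 and strips a byte-order mark at the start;
    # text mode also translates \r\n line endings to \n.
    with open(path, encoding="utf-8-sig") as f:
        return f.read()


def write_script(path, text):
    """Write a script as plain UTF-8 (no BOM) with \n line endings."""
    # newline="\n" stops Python from converting \n to the platform's line ending
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)
```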

Here is an excellent (free) text editor named "Rough Draft" designed to work with the .RTF format:
Here is a (free) text editor, "EditPad Lite," that works with many encoding formats including UTF-8:

