Introduction
CopiaFacts applications are not yet fully Unicode-enabled and cannot, in general, display Unicode on the screen or accept Unicode input. However, it is now possible to use UTF-8 encoding in CopiaFacts command files. UTF-8 can be used in:
• assignments to variables for use in e-mails and graphical cover sheets
• e-mail content such as subject and body text lines
• e-mail body text files
• broadcast lists
A simple Unicode editor, COPIAUNIEDIT, is provided to edit command files; this program is fully Unicode-enabled and can display the content correctly. You can also use this editor to enter Unicode text such as Japanese with the normal Windows IME tools.
Pre-requisites
You require Unicode font sets on the machine(s) you will be using to process Unicode. If you do not have these files installed, the procedure is as follows: open the Control Panel, double-click Regional and Language Options, go to the "Languages" tab, and under "Supplemental language support" select either or both checkboxes. Check the first one for Arabic, Armenian, Georgian, Hebrew, Indic, Thai, and Vietnamese, and the second one for Chinese, Japanese, and Korean. You may be asked to insert your Windows operating system media.
Choosing between Unicode or Windows default encoding
In the initial implementation, the byte-order mark (BOM) in command files is stripped and ignored. UTF-8 files are therefore handled in the same way as files in Windows default encoding, and how the content is interpreted depends on the use you make of the values embedded in the files. If you are using CopiaFacts in a Western environment, working with names which may have accented characters in the Windows default code page, you can continue to operate normally.
On the other hand, if you need to process Japanese or other Unicode text, you can include UTF-8 encoding in your command files, and specify that you are using Unicode by adding $email_charset commands for UTF-8 and adding the UTF-8 attribute to graphical cover annotations. However, you cannot combine this with Windows default encoding in the same system, and any accented characters in command files must also be entered in UTF-8 encoding. The most likely place where Windows default encoding might appear is in FS files derived from a broadcast list, so we provide an option to convert broadcast lists to Unicode to avoid this problem.
A future CopiaFacts release will use only Unicode internally, and will maintain command files with and without byte-order marks, thus allowing command files containing Windows default encoding to be used alongside Unicode command files.
Unicode Encoding
Only the UTF-8 encoding is currently supported in CopiaFacts Unicode command files. Files may optionally start with the standard byte-order mark (BOM) bytes for UTF-8 (EF BB BF hexadecimal). Files with a BOM can only be processed by CopiaFacts builds later than 10/10/2008. Files containing UTF-8 encoding without a BOM can be processed by all version 7.x CopiaFacts releases.
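As an illustration of the BOM bytes described above, the following Python sketch checks whether the raw bytes of a file begin with EF BB BF and separates the marker from the rest of the content. It is not CopiaFacts code: the function name split_bom and the sample content are assumptions made only for this example.

```python
import codecs

def split_bom(data: bytes):
    """Return (has_bom, payload) for the raw bytes of a command file.

    Hypothetical helper for illustration; not part of CopiaFacts.
    """
    # codecs.BOM_UTF8 is the three bytes EF BB BF.
    if data.startswith(codecs.BOM_UTF8):
        return True, data[len(codecs.BOM_UTF8):]
    return False, data

# Sample content: a BOM followed by one command line.
sample = codecs.BOM_UTF8 + "$email_charset utf-8\n".encode("utf-8")
has_bom, payload = split_bom(sample)
print(has_bom, payload)
```

In Python, decoding with the "utf-8-sig" codec gives the same result in one step, because it accepts input with or without the BOM.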
Syntax elements in UTF-8 command files, such as space, double-quote, comma, pipe-symbol, semi-colon, etc. must be written as the standard ASCII characters, not using non-ASCII Unicode code points. These ASCII characters do not form part of any UTF-8 multi-byte sequence.
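The reason ASCII syntax characters are safe can be shown with a short Python sketch. Every byte of a multi-byte UTF-8 sequence has a value of 0x80 or above, so a byte below 0x80 is always a complete ASCII character and can be searched for directly in the raw bytes. The pipe-separated record below is made-up sample data, not taken from a real CopiaFacts file.

```python
# Made-up record with a pipe as the (ASCII) field separator.
record = "山田太郎|東京|0312345678"
encoded = record.encode("utf-8")

# No byte of a non-ASCII character falls in the ASCII range,
# so none of them can be mistaken for the separator.
for ch in "山田太郎東京":
    assert all(b >= 0x80 for b in ch.encode("utf-8"))

# Splitting the raw bytes on the ASCII separator is therefore safe.
fields = [part.decode("utf-8") for part in encoded.split(b"|")]
print(fields)
```

The same property does not hold for encodings such as SJIS, where the second byte of a two-byte character can fall in the ASCII range.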
As an example of Unicode encoding, if you need to display the destination fax number in Japanese in a confirmation e-mail, you could write in a post-process infobox:
$email_text "ファクス番号 `RCVRFAX"
This command can be entered directly in the COPIAUNIEDIT editor: it will not display correctly in COPIAEDIT even though the output will still be correct in an e-mail for which $email_charset utf-8 has been specified.
When CopiaFacts programs write command files, for example when an FS file has been processed and written to SENT or FAIL, they preserve the encoding of the commands but do not yet write the BOM. Such files can be processed in all version 7.x CopiaFacts releases.
Although UTF-8 files can be edited in Word, we do not recommend this, because it is easy to save the file with a non-supported Unicode encoding and difficult to avoid Word appending an extra .TXT extension when saving the file.
Graphical Cover Sheets
F7GCOVER can process UTF-8 Unicode annotations, and FFVIEWER has a new annotation property, UTF-8, which is set as usual by right-clicking the annotation title. You also need to specify an annotation font which includes the Unicode characters you use. To process text files for faxing which are in UTF-8 encoding, set the UTF-8 attribute on the large annotation(s) in your ASCII_TEMPLATE GTT file.
Because FFVIEWER cannot currently display Unicode, we recommend first creating a watermark file using Word and FFSAVE, and then creating annotations with the UTF-8 attribute containing variables to be expanded. The HTML annotation attribute does not yet support UTF-8 encoding.
FFVIEWER and F7GCOVER can also process SJIS-encoded Japanese files if the font is an SJIS font, the character set is specified as Japanese, and the UTF-8 attribute does not appear on the annotation. SJIS was supported in an early FaxFacts release, but this feature is now the only remaining SJIS support. It is recommended that you only use an SJIS font for the large annotation(s) in an ASCII_TEMPLATE GTT file, to process SJIS-encoded text files, and that all other Japanese text should be encoded UTF-8.
Broadcast Lists
CopiaFacts has supported Unicode broadcast lists since FFBC release 7.245 and F7JOBADM release 7.327, but only if the list content was convertible to characters in the active code page.
From release 7.355 of F7JOBADM.DLL, 7.302 of JOBADMIN and 7.257 of FFBC, the job variable UTF8_LISTS (for FFBC, an environment variable) causes text-file lists for a job to be loaded with UTF-8 encoding from any type of Unicode file. Lists without a BOM are assumed to contain Windows default encoding, and their content is converted to UTF-8. As in command files, the specified field separator character must be an ASCII character. This setting also affects XLS lists loaded with USE_EXCEL, but not yet those loaded by the default internal XLS reader, nor lists loaded from DBF files.
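The conversion rule described above (use the BOM when one is present, otherwise assume Windows default encoding) can be sketched in Python as follows. This is only an illustration of the rule, not the code used by FFBC or F7JOBADM. The function name is hypothetical, only UTF-8 and UTF-16 BOMs are recognized, and code page 1252 is assumed as the Windows default, which is true only of Western systems.

```python
import codecs

# BOMs recognized by this sketch (assumption: UTF-8 and UTF-16 only).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def list_bytes_to_utf8(raw: bytes, default_encoding: str = "cp1252") -> bytes:
    """Return list content as UTF-8 bytes without a BOM.

    Hypothetical helper; default_encoding stands in for the
    Windows default code page of the machine.
    """
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            # A BOM identifies the encoding; drop it and decode the rest.
            text = raw[len(bom):].decode(encoding)
            break
    else:
        # No BOM: treat the content as Windows default encoding.
        text = raw.decode(default_encoding)
    return text.encode("utf-8")

# A list line in code page 1252 with an accented character, no BOM.
western = "José,5551234".encode("cp1252")
# A list line saved as UTF-16 little-endian with a BOM.
japanese = codecs.BOM_UTF16_LE + "山田,0312345678".encode("utf-16-le")

print(list_bytes_to_utf8(western))
print(list_bytes_to_utf8(japanese))
```

Note that under this rule a UTF-8 list saved without a BOM would be read as Windows default encoding, so its non-ASCII characters would be converted incorrectly.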
This feature allows Unicode strings from a list to be placed on graphical cover sheets, substituted in HTML or WordMerge documents, or used in e-mail parameters. In a future release when Unicode is fully supported, it will become the broadcast list default.
Topic url: http://www.copia.com/support/refmanual/index.html?using_unicode.htm