PMDF System Manager's Guide



2.3.4.51 Automatic character set labelling (charset7, charset8, charsetesc)

The MIME specification provides a mechanism to label the character set used in a plain text message. Specifically, a "charset=" parameter can be specified as part of the Content-type: header line. Various character set names are defined in MIME, including US-ASCII (the default), ISO-8859-1, ISO-8859-2, and so on. Additional character set names will undoubtedly be added to the list in the future.

Most existing systems and user agents, however, do not provide any mechanism for generating these character set labels. In particular, plain text messages sent from VMS MAIL are not properly labelled. The charset7, charset8, and charsetesc channel keywords provide a per-channel mechanism to specify character set names to be inserted into message headers. Each keyword requires a single argument giving the character set name. The names are not checked for validity. Note, however, that character set conversion can only be done on character sets specified in the PMDF character set definition file charsets.txt found in the PMDF table directory (i.e., PMDF_TABLE:charsets.txt on OpenVMS or /pmdf/table/charsets.txt on UNIX). The names defined in this file should be used if possible.

The charset7 character set name is used if the message contains only seven bit characters; the charset8 name is used if eight bit data is found in the message; the charsetesc name is used if a message containing only seven bit data happens to contain the escape character. If the appropriate keyword is not specified, no character set name is inserted into the Content-type: header lines.
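
The selection rule described above can be sketched in Python. This is only an illustrative model of the documented behavior, not PMDF's implementation: the function name pick_charset_label and its parameters are hypothetical, and the sketch assumes that a missing keyword never falls back to one of the other keywords.

```python
from typing import Optional

ESC = 0x1B  # the ASCII escape character


def pick_charset_label(
    body: bytes,
    charset7: Optional[str] = None,
    charset8: Optional[str] = None,
    charsetesc: Optional[str] = None,
) -> Optional[str]:
    """Return the charset name to insert, or None to insert nothing.

    Hypothetical helper modelling the rule in the text:
    eight bit data selects charset8, seven bit data containing
    ESC selects charsetesc, and plain seven bit data selects charset7.
    """
    if any(byte >= 0x80 for byte in body):
        return charset8
    if ESC in body:
        return charsetesc
    return charset7
```

With charset7 set to US-ASCII and charset8 set to ISO-8859-1 but no charsetesc, this sketch returns None for a seven bit message that contains an escape character, which corresponds to no label being inserted.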

These character set specifications never override existing labels; that is, they have no effect if a message already has a character set label or is of a type other than text.
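
The "never override" rule can be modelled with Python's standard email package. This is a minimal sketch under stated assumptions: apply_label is a hypothetical name, and the code illustrates the documented behavior rather than how PMDF itself processes headers.

```python
from email.message import Message
from typing import Optional


def apply_label(msg: Message, label: Optional[str]) -> bool:
    """Add charset=label to msg only when the rule in the text allows it.

    Hypothetical helper. Returns True if the Content-Type header
    was changed, False if the message was left untouched.
    """
    if label is None:
        return False  # no keyword configured for this case
    if msg.get_content_maintype() != "text":
        return False  # non-text parts are never labelled
    if msg.get_param("charset") is not None:
        return False  # an existing label is never overridden
    msg.set_param("charset", label)
    return True
```

A message that already carries, say, charset=KOI8-R passes through unchanged, as does an image/gif part; only an unlabelled text message receives the channel's label.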

It is usually appropriate to label the PMDF local channel as follows:


l ... charset7 US-ASCII charset8 ISO-8859-1 ... 
official-host-name

OpenVMS systems actually use the DEC Multinational Character Set (DEC-MCS). The character set is very close to ISO-8859-1, however, so this labelling will work well enough in most cases. If absolute accuracy is an issue, the local channel can be marked as using DEC-MCS:


l ... charset7 US-ASCII charset8 DEC-MCS ... 
official-host-name

and an appropriate character set conversion can be set up to convert DEC-MCS to ISO-8859-1 as needed. See Chapter 6 for details on how to set up such conversions.

The charsetesc keyword tends to be particularly useful on channels that receive unlabelled messages using Japanese or Korean character sets that contain the escape character.
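
As an illustration of why such messages need their own keyword, the following Python fragment encodes a short Japanese string with ISO-2022-JP (chosen here only as a familiar example of an escape-based encoding; the text does not name a specific one) and checks that the result is entirely seven bit yet contains the escape character.

```python
ESC = 0x1B  # the ASCII escape character

# Encode a Japanese greeting using Python's built-in ISO-2022-JP codec.
encoded = "こんにちは".encode("iso2022_jp")

# Every byte fits in seven bits, so a check for eight bit data finds nothing.
is_seven_bit = all(byte < 0x80 for byte in encoded)

# The encoding switches character sets with escape sequences.
has_escape = ESC in encoded

print(is_seven_bit, has_escape)
```

Both checks hold for this string, so a channel that looked only for eight bit data would treat the message as plain seven bit text.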

