Son-of-RFC 1036
News Article Format and Transmission
A. Archeological Notes
A.1. A-News Article Format
The obsolete "A News" article format consisted of exactly
five lines of header information, followed by the body. For
example:
Aeagle.642
news.misc
cbosgd!mhuxj!mhuxt!eagle!jerry
Fri Nov 19 16:14:55 1982
Usenet Etiquette - Please Read
body
body
body
The first line consisted of an "A" followed by an article ID
(analogous to a message ID and used for similar purposes).
The second line was the list of newsgroups. The third line
was the path. The fourth was the date, in the format above
(all fields fixed width), resembling an Internet date but
not quite the same. The fifth was the subject.
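The A News layout is purely positional, so reading an archived article of this kind is mostly a matter of counting lines. The following Python sketch is intended only for reading old archives. It splits an article into its five header lines and its body. The ANewsArticle and parse_a_news names are hypothetical, and the sketch assumes the date line always follows the fixed layout shown above, with no time zone.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ANewsArticle:
    article_id: str   # e.g. "eagle.642" (the leading "A" is stripped)
    newsgroups: str
    path: str
    date: datetime
    subject: str
    body: str


def parse_a_news(text: str) -> ANewsArticle:
    """Parse an obsolete A News article: five fixed header lines, then the body.

    Assumes the date line uses the fixed-width layout shown above,
    e.g. "Fri Nov 19 16:14:55 1982", with no time zone.
    """
    lines = text.split("\n")
    if len(lines) < 5 or not lines[0].startswith("A"):
        raise ValueError("not an A News article")
    id_line, groups, path, date_line, subject = lines[:5]
    return ANewsArticle(
        article_id=id_line[1:],
        newsgroups=groups,
        path=path,
        date=datetime.strptime(date_line, "%a %b %d %H:%M:%S %Y"),
        subject=subject,
        body="\n".join(lines[5:]),
    )
```

Because every field is identified only by its position, a missing or extra line shifts all later fields. That is one reason the format was abandoned in favour of named headers.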
This format is documented for archeological purposes only.
Do not generate articles in this format.
A.2. Early B-News Article Format
The obsolete pseudo-Internet article format, used briefly
during the transition between the A News format and the
modern format, followed the general outline of a MAIL message
but with some non-standard headers. For example:
From: cbosgd!mhuxj!mhuxt!eagle!jerry (Jerry Schwarz)
Newsgroups: news.misc
Title: Usenet Etiquette -- Please Read
Article-I.D.: eagle.642
Posted: Fri Nov 19 16:14:55 1982
Received: Fri Nov 19 16:59:30 1982
Expires: Mon Jan 1 00:00:00 1990
body
body
body
The From header contained the information now found in the
Path header, plus possibly the full name now typically found
in the From header. The Title header contained what is now
the Subject content. The Posted header contained what is
now the Date content. The Article-I.D. header contained an
article ID, analogous to a message ID and used for similar
purposes. The Newsgroups and Expires headers were
approximately as now. The Received header contained the date when
the latest relayer to process the article first saw it. All
dates were in the above format, with all fields fixed width,
resembling an Internet date but not quite the same.
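An archive converter can express the correspondences described above as a small mapping. This Python sketch illustrates that idea under stated assumptions and is not a prescribed conversion. The function name modernize and the result keys full_name and article_id are hypothetical. Dates are left as naive datetime values because the old format carried no time zone. The relayer-local Received header is simply dropped.

```python
from datetime import datetime

# Layout of early B News dates, e.g. "Fri Nov 19 16:14:55 1982".
OLD_DATE = "%a %b %d %H:%M:%S %Y"


def modernize(old: dict[str, str]) -> dict[str, object]:
    """Map early B News headers onto their modern counterparts.

    Hypothetical result keys: "full_name" (belongs in a modern From
    header) and "article_id" (analogous to, but not, a Message-ID).
    """
    new: dict[str, object] = {}
    # From held the bang path, optionally followed by "(Full Name)".
    path, _, comment = old["From"].partition(" ")
    new["Path"] = path
    comment = comment.strip()
    if comment.startswith("(") and comment.endswith(")"):
        new["full_name"] = comment[1:-1]
    new["Subject"] = old["Title"]          # Title -> Subject
    new["Newsgroups"] = old["Newsgroups"]  # approximately unchanged
    new["article_id"] = old["Article-I.D."]
    new["Date"] = datetime.strptime(old["Posted"], OLD_DATE)  # Posted -> Date
    if "Expires" in old:
        new["Expires"] = datetime.strptime(old["Expires"], OLD_DATE)
    # Received (when the latest relayer first saw the article) is dropped.
    return new
```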
This format is documented for archeological purposes only.
Do not generate articles in this format.
A.3. Obsolete Headers
Early versions of news software following the modern format
sometimes generated headers like the following:
Relay-Version: version B 2.10 2/13/83; site cbosgd.UUCP
Posting-Version: version B 2.10 2/13/83; site eagle.UUCP
Date-Received: Friday, 19-Nov-82 16:59:30 EST
Relay-Version contained version information about the
relayer that last processed the article. Posting-Version
contained version information about the posting agent that
posted the article. Date-Received contained the date when
the last relayer to process the article first saw it (in a
slightly nonstandard format).
These headers are documented for archeological purposes
only. Do not generate articles using them.
A.4. Obsolete Control Messages
There once was a senduuname control message, resembling
sendsys but requesting transmission of the list of hosts
that the receiving host had UUCP connections to. This
rapidly ceased to be of much use, and many organizations
consider information about their internal connectivity to be
confidential.
Historically, a checkgroups body consisting of one or two
lines, the first of the form "-n newsgroup", caused
checkgroups to apply to only that single newsgroup. This form is
documented for archeological purposes only; do not use it.
Historically, an article posted to a newsgroup whose name
had exactly three components, of which the third was "ctl",
signified that the article was to be taken as a control message.
The Subject header specified the actions, in the same way
the Control header does now. This form is documented for
archeological purposes only; do not use it; do not implement
it.
B. A Quick Tour Of MIME
(The editor wishes to thank Luc Rooijakkers; most of this
appendix is a lightly-edited version of a summary he kindly
supplied.)
MIME (Multipurpose Internet Mail Extensions) is an
upward-compatible set of extensions to RFC 822, currently
documented in RFCs 1341 and 1342. This appendix summarizes
these documents. See the MIME RFCs for more information;
they are very readable.
UNRESOLVED ISSUE: These RFC numbers (here and
elsewhere in this Draft) need updating when the
new MIME RFCs come out.
MIME defines the following new headers:
MIME-Version
Content-Type
Content-Transfer-Encoding
Content-ID
Content-Description
The MIME-Version header is mandatory for all messages
conforming to the MIME specification and carries the version
number of the MIME specification. Example:
MIME-Version: 1.0
The Content-Type header indicates the content type of the
message. Content types are split into a top-level type and
a subtype, separated by a slash. Auxiliary information can
also be supplied, using an attribute-value notation.
Example:
Content-Type: text/plain; charset=us-ascii
(This is in fact the default content type, assumed in the
absence of a Content-Type header.)
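Python's standard email package applies the same default when a message has no Content-Type header. The illustrative sketch below shows that both messages report text/plain. Note that get_content_charset() does not supply the us-ascii default on its own, so a caller has to pass it as the fallback value.

```python
from email import message_from_string

# One message declares its type explicitly; the other has no Content-Type.
explicit = message_from_string("Content-Type: text/plain; charset=us-ascii\n\nHello.\n")
implicit = message_from_string("Subject: test\n\nHello.\n")

for msg in (explicit, implicit):
    # Both report text/plain: the default applies when the header is absent.
    print(msg.get_content_type())

# The charset parameter is not defaulted by the library; pass us-ascii
# as the fallback to obtain the MIME default.
print(explicit.get_content_charset("us-ascii"), implicit.get_content_charset("us-ascii"))
```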
Important type/subtype combinations are:
- text/plain: Plain text, possibly in a non-ASCII character
  set.
- text/enriched: A very simple wordprocessor-like language
  supporting character attributes (e.g., underlining),
  justification control, and multiple character sets. (This
  proposal has gone through several iterations and has
  recently split off from the main MIME RFCs into a separate
  document.)
- message/rfc822: A mail message conforming to a
  slightly-relaxed version of RFC 822.
- message/partial: Part of a message (supporting the
  transparent splitting and joining of messages when they are
  too large to be handled by some transport agent).
- message/external-body: A message whose body is external.
  Possible access methods include mail, FTP, local files,
  etc.
- multipart/mixed: A message whose body consists of multiple
  parts, possibly of different types, intended to be viewed in
  serial order. Each part looks like an RFC 822 message,
  consisting of headers and a body. Most of the RFC 822
  headers have no defined semantics for body parts.
- multipart/parallel: Likewise, except that the parts are
  intended to be viewed in parallel (on user agents that
  support it).
- multipart/alternative: Likewise, except that the parts are
  intended to be semantically equivalent; the part that best
  matches the capabilities of the environment should be
  displayed. For example, a message may include plain-text,
  enriched-text, and PostScript versions of some document.
- multipart/digest: A variant of multipart/mixed especially
  intended for message digests (the default type of the parts
  is message/rfc822 instead of text/plain, saving on the
  number of headers for the parts).
- application/postscript: A PostScript document. (PostScript
  is a trademark of Adobe.)
Other top-level types exist for still images, audio, and
video samples.
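The Python sketch below illustrates multipart/alternative. It builds a two-part message with the standard email package and then picks the part to display. The selection rule follows the MIME RFCs' convention that alternatives are ordered from simplest to richest, so the last part of a supported type is preferred. The helper pick_alternative and the sample bodies are hypothetical.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Build a multipart/alternative message, simplest rendering first.
alt = MIMEMultipart("alternative")
alt.attach(MIMEText("Hello, world.\n", "plain", "us-ascii"))
alt.attach(MIMEText("<bold>Hello</bold>, world.\n", "enriched", "us-ascii"))
parsed = message_from_string(alt.as_string())


def pick_alternative(msg, displayable):
    """Return the last part whose content type the reader can display,
    or None if no part is displayable."""
    chosen = None
    for part in msg.get_payload():
        if part.get_content_type() in displayable:
            chosen = part
    return chosen


# A reader that only understands plain text gets the text/plain part;
# one that also handles text/enriched gets the richer version.
print(pick_alternative(parsed, {"text/plain"}).get_content_type())
print(pick_alternative(parsed, {"text/plain", "text/enriched"}).get_content_type())
```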
Some of the above types require the ability to transport
binary data. Since the existing message systems usually do
not support this, MIME provides a Content-Transfer-Encoding
header to indicate the kind of encoding used. The possible
encodings are:
- 7bit: No encoding; the data consists of short (less than
  1000 characters) lines of 7-bit ASCII data, delimited by
  EOL sequences. This is the default encoding.
- 8bit: Like 7bit, except that bytes with the high-order bit
  set may be present. Many transmission paths are incapable
  of carrying messages which use this encoding.
- binary: No encoding; any sequence of bytes may be present.
  Many transmission paths are incapable of carrying messages
  which use this encoding.
- base64: The data is encoded by representing every group of
  3 bytes as 4 characters from the alphabet "A-Za-z0-9+/",
  which was chosen for its high robustness through mail
  gateways (the alphabet used by uuencode does not survive
  ASCII-EBCDIC-ASCII translations). In the final group of 4
  characters, "=" is used for those characters not
  representing data bytes. Line length is limited, and EOLs
  in the encoded form are ignored.
- quoted-printable: Any byte can be represented by a
  three-character "=XX" sequence, where the X's are
  upper-case hexadecimal digits. Bytes representing printable
  7-bit US-ASCII characters other than "=" may be represented
  literally. Tabs and blanks may be represented literally if
  not at the end of a line. Line length is limited; an EOL
  preceded by "=" (a "soft" line break) is inserted for this
  purpose and is not present in the original.
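The base64 and quoted-printable rules above can be checked against Python's standard base64 and quopri modules. This sketch only illustrates the behaviour described (padding, ignored EOLs, "=XX" escapes, and soft line breaks). It is not a complete MIME encoder.

```python
import base64
import quopri

# base64: each group of 3 bytes becomes 4 characters from A-Za-z0-9+/,
# and "=" fills the positions of the final group that carry no data.
assert base64.b64encode(b"Man") == b"TWFu"   # 3 bytes, no padding
assert base64.b64encode(b"Ma") == b"TWE="    # 2 bytes, one "="
assert base64.b64encode(b"M") == b"TQ=="     # 1 byte, two "="

# EOLs inside base64 text are ignored: the default (non-validating)
# decoder discards characters outside the alphabet.
assert base64.b64decode(b"TWFu\r\nTWE=") == b"ManMa"

# quoted-printable: "=" and the high-bit byte 0xE9 become "=XX" escapes
# with upper-case hex digits; the other characters pass through as-is.
qp = quopri.encodestring(b"caf\xe9 = ok\n")
print(qp)  # b'caf=E9 =3D ok\n'
assert quopri.decodestring(qp) == b"caf\xe9 = ok\n"

# Over-long lines are split with soft line breaks ("=" before the EOL),
# which the decoder removes again.
long_line = b"x" * 100 + b"\n"
wrapped = quopri.encodestring(long_line)
assert b"=\n" in wrapped
assert quopri.decodestring(wrapped) == long_line
```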
The base64 and quoted-printable encodings are applied to
data in Internet canonical form. This means that any EOL
carried within the encoded data (rather than appearing as a
literal EOL in the encoded form) must be an Internet
canonical EOL: CR followed by LF.
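When a text body is sent as base64, its EOLs travel inside the encoded data, so canonicalization has to happen before encoding. The following is a minimal Python sketch. It assumes the body is held as a str and that the declared charset is known. The function names are hypothetical.

```python
import base64
import re


def canonical_text(text: str, charset: str = "us-ascii") -> bytes:
    # Normalize every local EOL convention (LF, lone CR, CR LF) to CR LF,
    # then convert the characters to bytes in the declared charset.
    return re.sub(r"\r\n|\r|\n", "\r\n", text).encode(charset)


def base64_text_body(text: str, charset: str = "us-ascii") -> str:
    # Canonicalize first, then encode; encodebytes() wraps its output
    # into lines of at most 76 characters.
    return base64.encodebytes(canonical_text(text, charset)).decode("ascii")
```

A receiver decodes the base64 first and then converts each CR LF back to its local EOL convention.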
The Content-Description header allows further description of
a body part, analogous to the use of Subject for messages.
Finally, the Content-ID header can be used to assign an
identification to body parts, analogous to the assignment of
identifications to messages by Message-ID.
Note that most of these headers are structured header
fields, as defined in RFC 822. Consequently, comments are
allowed in their values. The following is a legal MIME
header:
Content-Type: (a comment) text (yeah) /
    plain (and now some params:) ; charset= (guess what)
    iso-8859-1 (we don't have iso-10646 yet, pity)
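A MIME reader that accepts such headers must discard comments before interpreting the value. The Python sketch below unfolds continuation lines, strips comments (possibly nested) outside quoted strings, and splits out the type, subtype, and parameters. It is a simplified illustration, not a full RFC 822 tokenizer. The function names are hypothetical, quoted-pairs inside quoted strings are kept escaped, and a quoted parameter value containing ";" would be split incorrectly.

```python
import re


def strip_comments(value: str) -> str:
    """Replace each RFC 822 comment (parentheses, possibly nested) with a
    space, leaving quoted strings intact."""
    out = []
    depth = 0          # current comment nesting level
    in_quote = False   # inside a "quoted string"?
    i = 0
    while i < len(value):
        c = value[i]
        if c == "\\" and (in_quote or depth > 0):
            # quoted-pair: keep it inside a quoted string, drop it in a comment
            if in_quote:
                out.append(value[i:i + 2])
            i += 2
            continue
        if in_quote:
            out.append(c)
            if c == '"':
                in_quote = False
        elif depth > 0:
            if c == "(":
                depth += 1
            elif c == ")":
                depth -= 1
                if depth == 0:
                    out.append(" ")   # a comment acts like whitespace
        elif c == "(":
            depth = 1
        elif c == '"':
            in_quote = True
            out.append(c)
        else:
            out.append(c)
        i += 1
    return "".join(out)


def parse_content_type(value: str) -> tuple[str, str, dict[str, str]]:
    """Split a Content-Type value into (type, subtype, parameters)."""
    value = re.sub(r"\r?\n(?=[ \t])", "", value)   # unfold continuation lines
    first, *params = strip_comments(value).split(";")
    mtype, _, subtype = first.partition("/")
    result: dict[str, str] = {}
    for param in params:
        name, _, val = param.partition("=")
        val = val.strip()
        if len(val) >= 2 and val[0] == val[-1] == '"':
            val = val[1:-1]
        result[name.strip().lower()] = val
    return mtype.strip().lower(), subtype.strip().lower(), result
```

Applied to the header value above, this yields the type "text", the subtype "plain", and a charset parameter of "iso-8859-1".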
NOTE: Although the MIME specification was developed for
mail, there is nothing precluding its use for news as well.
While it might simplify implementation to restrict the MIME
headers somewhat, in the same way that other news headers
(e.g. From) are restricted subsets of the RFC-822 originals,
this would add yet another divergence between two formats
that ought to be as compatible as possible. In the case of
the MIME headers, there is no body of existing code posing
compatibility concerns. A full-featured MIME reading agent
needs a full RFC-822 parser anyway, to properly handle body
parts of types like message/rfc822, so there is little gain
from restricting MIME headers. Adopting the MIME
specification unchanged seems best. However, article-level
MIME headers must still comply with the overall news header
syntax given in section 4, so that news software which is
NOT interested in MIME need not contain a full RFC-822
parser.
The second part of MIME, RFC 1342 (Representation of
Non-ASCII Text in Internet Message Headers), addresses the
problem of non-ASCII characters in headers. An example of a
header using the RFC 1342 mechanism is
From: =?ISO-8859-1?Q?Andr=E9_?= Pirard <PIRARD@vm1.ulg.ac.be>
Such encodings are allowed in selected headers, subject to
the restrictions listed in RFC 1342.
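An encoded-word has the form =?charset?encoding?encoded-text?=. The encoding is either Q, a quoted-printable variant in which "_" stands for a space, or B, which is base64. The Python sketch below decodes both forms. The function names are hypothetical. The sketch deliberately omits the rule that whitespace between two adjacent encoded-words is ignored, and it does not enforce the restrictions on which headers may carry encoded-words.

```python
import base64
import re

# =?charset?encoding?encoded-text?=  (an RFC 1342 "encoded-word")
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([QqBb])\?([^?]*)\?=")


def _decode_word(match: re.Match) -> str:
    charset, encoding, text = match.groups()
    if encoding in "Qq":
        # Q encoding: "_" is a space, "=XX" is a byte given in hex.
        raw = re.sub(
            rb"=([0-9A-Fa-f]{2})",
            lambda m: bytes([int(m.group(1), 16)]),
            text.replace("_", " ").encode("ascii"),
        )
    else:
        raw = base64.b64decode(text)
    return raw.decode(charset)


def decode_encoded_words(header: str) -> str:
    """Replace each encoded-word in a header value with its decoded text."""
    return ENCODED_WORD.sub(_decode_word, header)
```

Applied to the From header above, decode_encoded_words turns =?ISO-8859-1?Q?Andr=E9_?= into "André " and leaves the rest of the value unchanged.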
The MIME effort has also produced an RFC defining a
Content-MD5 header [rrr 1544], containing an MD5-based
"checksum" of the contents of an article or body part, giving
high confidence of detecting accidental modifications to the
contents.
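A Content-MD5 value is the MD5 digest of the content, written in base64. Below is a minimal Python sketch using the standard hashlib and base64 modules. As we understand RFC 1544, the digest is computed over the canonical form of the content, before any content-transfer-encoding is applied. The helper names are hypothetical.

```python
import base64
import hashlib


def content_md5(canonical_content: bytes) -> str:
    """Content-MD5 value: base64 of the 16-byte MD5 digest of the content,
    taken over its canonical form, before content-transfer-encoding."""
    return base64.b64encode(hashlib.md5(canonical_content).digest()).decode("ascii")


def content_md5_matches(canonical_content: bytes, header_value: str) -> bool:
    """True if the received content still matches the Content-MD5 header."""
    return content_md5(canonical_content) == header_value.strip()
```

A mismatch indicates accidental damage in transit. As the text notes, this check offers no protection against deliberate modification.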
The "metamail" software package [rrr] helps provide MIME
support with minimal changes to mailers, and may also be
relevant to news reading agents.
The PEM (Privacy Enhanced Mail) effort is pursuing analogous
facilities to offer stronger guarantees against malicious
modifications, unauthorized eavesdropping, and forgery.
This work too may be applicable to news, once it is
reconciled with MIME (by efforts now underway).
C. Summary of Changes Since RFC 1036
This Draft is much longer than RFC 1036, so there is
obviously much change in content. Much of this is just
increased precision and rigor. Noteworthy changes and
additions include:
- section 4.3's restrictions on article bodies
- all references to MIME facilities
- size limits on articles
- precise specification of Date-content syntax
- message IDs must never be re-used, ever
- "!" is the only Path delimiter
- multiple moderators in the Approved header
- rules on References trimming, and the _-_ mechanism
- generalization of the Xref rules
- multiple message IDs in Cancel and Supersedes
- Also-Control
- See-Also
- Article-Names
- Article-Replacing
- more precise rules for cancellation
- cancellation authorization based on From, not Sender
- "unmoderated" and descriptors in newgroup messages
- restrictive rules on handling of sendsys and version messages
- the whogets control message
- precise specification of checkgroups messages
- compression type preferably specified out-of-band
- rules for encapsulating news in MIME mail
- tighter specification of relayer functioning (section 9.1)
- the "newsmaster" contact address
- rules for gatewaying (section 10)
- discussion of security issues (section 11)
D. Summary of Completely New Features
Most of this Draft merely documents existing practice, but
there are a few attempts to extend it. These are:
TBW
E. Summary of Differences From RFC 822+1123
The following are noteworthy differences between this
Draft's articles and MAIL messages:
- generally less-permissive header syntax
- notably, limited From syntax
- MAIL header comments allowed in only a few contexts
- slightly more restricted message-ID syntax
- several more mandatory headers
- duplicate headers forbidden
- References/See-Also versus In-Reply-To/References (section 6.5)
- case sensitivity in some contexts
- point-to-point headers, e.g. To and Cc, forbidden (section 6)
- several new headers
References
[Sanderson] "Smileys", David Sanderson, O'Reilly & Associates Ltd., 1993.
TBW
Security Considerations
Section 11 discusses security considerations in detail.
Author's Address
Henry Spencer
henry@zoo.toronto.edu
SP Systems
Box 280 Stn. A
Toronto, Ont. M5W1B2 Canada