Son-of-RFC 1036
News Article Format and Transmission
1. Introduction
Network news articles resemble mail messages but are broad-
cast to potentially-large audiences, using a flooding algo-
rithm that propagates one copy to each interested host (or
groups thereof), typically stores only one copy per host,
and does not require any central administration or system-
atic registration of interested users. Network news origi-
nated as the medium of communication for Usenet, circa 1980.
Since then Usenet has grown explosively, and many Internet
sites participate in it. In addition, the news technology
is now in widespread use for other purposes, on the Internet
and elsewhere.
The earliest news interchange used the so-called "A News"
article format. Shortly thereafter, an article format
vaguely resembling Internet mail was devised and used
briefly. Both of those formats are completely obsolete;
they are documented in appendix A for historical reasons
only. With publication of RFC 850 [rrr] in 1983, news arti-
cles came to closely resemble Internet mail messages, with
some restrictions and some additional headers. RFC 1036
[rrr] in 1987 updated RFC 850 without making major changes.
In the intervening five years, the RFC 1036 article format
has proven quite satisfactory, although minor extensions
appear desirable to match recent developments in areas such
as multi-media mail. RFC 1036 itself has not proven quite
so satisfactory. It is often rather vague and does not
address some issues at all; this has caused significant
interoperability problems at times, and implementations have
diverged somewhat. Worse, although it was intended primar-
ily to document existing practice, it did not precisely
match existing practice even at the time it was published,
and the deviations have grown since.
This Draft attempts to specify the format of articles, and
the procedures used to exchange them and process them, in
sufficient detail to allow full interoperability. In addi-
tion, some tentative suggestions are made about directions
for future development, in an attempt to avert unnecessary
divergence and consequent loss of interoperability. Major
extensions (e.g. cryptographic authentication) that need
significant development effort are left to be undertaken as
independent efforts.
NOTE: One question this all may raise is: why is
there no News-Version header, analogous to
MIME-Version, specifying a version number corresponding
to this specification? The answer is: it doesn't
appear to be useful, given news's backward-
compatibility constraints. The major use of a
version number is indicating which of several
INCOMPATIBLE interpretations is relevant. The
impossibility of orchestrating any sort of simul-
taneous change over news's installed base makes it
necessary to avoid such incompatible changes (as
opposed to extensions) entirely. MIME has a ver-
sion number mostly because it introduced incompat-
ible changes to the interpretation of several
"Content-" headers. This Draft attempts no
changes in interpretation and it appears doubtful
that future Drafts will find it feasible to intro-
duce any.
UNRESOLVED ISSUE: Should this be reconsidered?
Only if the header has SPECIFIC IDENTIFIABLE uses
today. Otherwise it's just useless added bulk.
As in this Draft's predecessors, the exact means used to
transmit articles from one host to another is not specified.
NNTP [rrr] is probably the most common transmission method
on the Internet, but a number of others are known to be in
use, including the UUCP protocol [rrr] extensively used in
the early days of Usenet and still much used on its fringes
today.
Several of the mechanisms described in this Draft may seem
somewhat strange or even bizarre at first reading. As with
Internet mail, there is no reasonable possibility of updat-
ing the entire installed base of news software promptly, so
interoperability with old software is crucial and will
remain so. Compatibility with existing practice and robust-
ness in an imperfect world necessarily take priority over
elegance.
2. Definitions, Notations, and Conventions
2.1. Textual Notations
Throughout this Draft, "MAIL" is short for "RFC 822 [rrr] as
amended by RFC 1123 [rrr]". (RFC 1123's amendments are
mostly relatively small, but they are not insignificant.)
See also the discussion in section 3 about this Draft's
relationship to MAIL. "MIME" is short for "RFCs 1341 and
1342" (or their updated replacements).
UNRESOLVED ISSUE: Update these numbers.
"ASCII" is short for "the ANSI X3.4 character set" [rrr].
While "ASCII" is often misused to refer to various character
sets somewhat similar to X3.4, in this Draft, "ASCII" means
X3.4 and only X3.4.
NOTE: The name is traditional (to the point where
the ANSI standard sanctions it) even though it is
no longer an acronym for the name of the standard.
NOTE: ASCII, X3.4, contains 128 characters, not
all of them printable. Character sets with more
characters are not ASCII, although they may
include it as a subset.
Certain words used to define the significance of individual
requirements are capitalized. "MUST" means that the item is
an absolute requirement of the specification. "SHOULD"
means that the item is a strong recommendation: there may be
valid reasons to ignore it in unusual circumstances, but
this should be done only after careful study of the full
implications and a firm conclusion that it is necessary,
because there are serious disadvantages to doing so. "MAY"
means that the item is truly optional, and implementors and
users are warned that conformance is possible but not to be
relied on.
The term "compliant", applied to implementations etc., indi-
cates satisfaction of all relevant "MUST" and "SHOULD"
requirements. The term "conditionally compliant" indicates
satisfaction of all relevant "MUST" requirements but viola-
tion of at least one relevant "SHOULD" requirement.
This Draft contains explanatory notes using the following
format. These may be skipped by persons interested solely
in the content of the specification. The purpose of the
notes is to explain why choices were made, to place them in
context, or to suggest possible implementation techniques.
NOTE: While such explanatory notes may seem super-
fluous in principle, they often help the less-
than-omniscient reader grasp the purpose of the
specification and the constraints involved. Given
the limitations of natural language for descrip-
tive purposes, this improves the probability that
implementors and users will understand the true
intent of the specification in cases where the
wording is not entirely clear.
All numeric values are given in decimal unless otherwise
indicated. Octets are assumed to be unsigned values for
this purpose. Large numbers are written using the North
American convention, in which "," separates groups of three
digits but otherwise has no significance.
2.2. Syntax Notation
Although the mechanisms specified in this Draft are all
described in prose, most are also described formally in the
modified BNF notation of RFC 822. Implementors will need to
be familiar with this notation to fully understand this
specification, and are referred to RFC 822 for a complete
explanation of the modified BNF notation. Here is a brief
illustrative example:
     sentence  = clause *( punct clause ) "."
     punct     = ":" / ";"
     clause    = 1*word [ "(" clause ")" / "," 1*word ]
     word      = <any English word>
This defines a sentence as some clauses separated by puncts
and ended by a period, a punct as a colon or semicolon, a
clause as at least one <word> optionally followed by either
a parenthesized clause or a comma and at least one more
<word>, and a <word> as (informally) any English word. <>
are used to enclose names when (and only when) distinguish-
ing them from surrounding text is useful. The full form of
the repetition notation is <m>"*"<n><thing>, denoting <m>
through <n> repetitions of <thing>; <m> defaults to zero,
<n> to infinity, and the "*" and <n> can be omitted if <m>
and <n> are equal, so 1*word is one or more words, 1*5word
is one through five words, and 2word is exactly two words.
The character "\" is not special in any way in this nota-
tion.
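As an informal illustration of the repetition notation only (it is
not part of this Draft's specification), the following Python sketch
splits a token such as 1*5word into its lower bound, upper bound, and
repeated element. The name parse_repetition, the use of None to stand
for "infinity", and the treatment of a bare name as "exactly one" are
assumptions made for this example.

```python
import re

# <m>*<n><thing>, where <m>, "*" and <n> may each be absent.
_REPETITION = re.compile(r"(\d*)(\*?)(\d*)([A-Za-z][A-Za-z0-9-]*)")


def parse_repetition(token):
    """Return (minimum, maximum, thing); maximum None means no upper bound."""
    match = _REPETITION.fullmatch(token)
    if match is None:
        raise ValueError("not a repetition token: %r" % token)
    low, star, high, thing = match.groups()
    if star:
        minimum = int(low) if low else 0       # <m> defaults to zero
        maximum = int(high) if high else None  # <n> defaults to infinity
    else:
        # No "*": the count is exact (assumed to be one for a bare name).
        minimum = maximum = int(low) if low else 1
    return minimum, maximum, thing
```

With these assumptions, parse_repetition("1*word") yields a lower
bound of 1 and no upper bound, while parse_repetition("2word") yields
2 for both bounds.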
This Draft is intended to be self-contained; all syntax
rules used in it are defined within it, and a rule with the
same name as one found in MAIL does not necessarily have the
same definition. The lexical layer of MAIL is NOT, repeat
NOT, used in this Draft, and its presence must not be
assumed; notably, this Draft spells out all places where
white space is permitted/required and all places where con-
structs resembling MAIL comments can occur.
NOTE: News parsers historically have been much
less permissive than MAIL parsers.
2.3. Definitions
The term "character set", wherever it is used in this Draft,
refers to a coded character set, in the sense of ISO charac-
ter set standardization work, and must not be misinterpreted
as meaning merely "a set of characters".
In this Draft, ASCII character 32 is referred to as "blank";
the word "space" has a more generic meaning (in the grammar
of section 4.1, a run of one or more blanks and/or tabs).
An "article" is the unit of news, analogous to a MAIL "mes-
sage".
A "poster" is a human being (or software equivalent) submit-
ting a possibly-compliant article to be "posted": made
available for reading on all relevant hosts. A "posting
agent" is software that assists posters to prepare articles,
including determining whether the final article is compli-
ant, passing it on to a relayer for posting if so, and
returning it to the poster with an explanation if not. A
"relayer" is software which receives allegedly-compliant
articles from posting agents and/or other relayers, files
copies in a "news database", and possibly passes copies on
to other relayers.
NOTE: While the same software may well function
both as a relayer and as part of a posting agent,
the two functions are distinct and should not be
confused. The posting agent's purpose is (in
part) to validate an article, supply header infor-
mation that can or should be supplied automati-
cally, and generally take reasonable actions in an
attempt to transform the poster's submission into
a compliant article. The relayer's purpose is to
move already-compliant articles around efficiently
without damaging them.
A "reader" is a human being reading news articles. A "read-
ing agent" is software which presents articles to a reader.
NOTE: Informal usage often uses "reader" for both
these meanings, but this introduces considerable
potential for confusion and misunderstanding, so
this Draft takes care to make the distinction.
A "newsgroup" is a single news forum, a logical bulletin
board, having a name and nominally intended for articles on
a specific topic. An article is "posted to" a single news-
group or several newsgroups. When an article is posted to
more than one newsgroup, it is said to be "cross-posted";
note that this differs from posting the same text as part of
each of several articles, one per newsgroup. A "hierarchy"
is the set of all newsgroups whose names share a first com-
ponent (see the name syntax in section 5.5).
A newsgroup may be "moderated", in which case submissions
are not posted directly, but mailed to a "moderator" for
consideration and possible posting. Moderators are typi-
cally human but may be implemented partially or entirely in
software.
A "followup" is an article containing a response to the con-
tents of an earlier article (the followup's "precursor"). A
"followup agent" is a combination of reading agent and post-
ing agent that aids in the preparation and posting of a fol-
lowup.
Text comparisons are "case-sensitive" if they consider
uppercase letters (e.g. "A") different from lowercase let-
ters (e.g. "a"), and "case-insensitive" if letters differing
only in case (e.g. "A" and "a") are considered identical.
Categories of text are said to be case-(in)sensitive if com-
parisons of such texts to others are case-(in)sensitive.
A "cooperating subnet" is a set of news-exchanging hosts
which is sufficiently well-coordinated (typically via a cen-
tral administration of some sort) that stronger assumptions
can be made about hosts in the set than about news hosts in
general. The concept is typically used to relax restrictions which
are otherwise required for worst-case interoperability; mem-
bers of a cooperating subnet MAY interchange articles that
do not conform to this Draft's specifications, provided all
members have agreed to this and provided the articles are
not permitted to leak out of the subnet. The word "subnet"
is used to emphasize that a cooperating subnet is typically
not an isolated universe; care must be taken that traffic
leaving the subnet complies with the restrictions of the
larger net, not just those of the cooperating subnet.
A "message ID" is a unique identifier for an article, usu-
ally supplied by the posting agent which posted it. It dis-
tinguishes the article from every other article ever posted
anywhere (in theory). Articles with the same message ID are
treated as identical copies of the same article even if they
are not in fact identical.
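As a hedged illustration of what this implies for software, the
following Python sketch shows one way duplicate detection by message
ID might look. The class name MessageIdHistory and its in-memory set
are assumptions made for this example; this Draft does not prescribe
any particular data structure.

```python
class MessageIdHistory:
    """Remembers which message IDs have already been seen."""

    def __init__(self):
        self._seen = set()

    def is_new(self, message_id):
        """Record message_id; return True only the first time it is offered."""
        if message_id in self._seen:
            # Same message ID: treated as a copy of an article already seen,
            # whether or not the contents are in fact identical.
            return False
        self._seen.add(message_id)
        return True
```

Note that the comparison is an exact string comparison, in keeping
with the case-sensitivity default of section 2.5.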
A "gateway" is software which receives news articles and
converts them to messages of some other kind (e.g. mail to a
mailing list), or vice-versa; in essence it is a translating
relayer that straddles boundaries between different methods
of message exchange. The most common type of gateway
connects newsgroup(s) to mailing list(s), either unidirec-
tionally or bidirectionally, but there are also gateways
between news networks using this Draft's news format and
those using other formats.
A "control message" is an article which is marked as con-
taining control information; a relayer receiving such an
article will (subject to permissions etc.) take actions
beyond just filing and passing on the article.
NOTE: "Control article" would be more consistent
terminology, but "control message" is already well
established.
An article's "reply address" is the address to which mailed
replies should be sent. This is the address specified in
the article's From header (see section 5.2), unless it also
has a Reply-To header (see section 6.3).
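The rule above can be sketched in Python as follows. This is an
illustration under stated assumptions, not a definitive
implementation: the function name reply_address is hypothetical, and
the headers are assumed to be supplied as a dictionary keyed by
lowercased header name, with the header content as the value.

```python
def reply_address(headers):
    """Return the content of Reply-To if present, else that of From.

    `headers` maps lowercased header names to header contents
    (an assumption of this sketch).
    """
    if "reply-to" in headers:
        return headers["reply-to"]
    return headers["from"]
```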
The notation (e.g.) "(ASCII 17)" following a name means
"this name refers to the ASCII character having value 17".
An "ASCII printable character" is an ASCII character in the
range 33-126. An "ASCII control character" is an ASCII
character in the range 0-31, or the character DEL (ASCII
127). A "non-ASCII character" is a character having a value
exceeding 127.
NOTE: Blank is neither an "ASCII printable charac-
ter" nor an "ASCII control character".
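For illustration, the character classes just defined can be expressed
as a small Python function. The function name classify_character and
the returned strings are assumptions made for this sketch; the
numeric ranges are those given above.

```python
def classify_character(value):
    """Classify a character value according to the definitions above."""
    if value < 0:
        raise ValueError("character values are non-negative")
    if value == 32:
        return "blank"      # neither printable nor control
    if 33 <= value <= 126:
        return "printable"
    if value <= 31 or value == 127:
        return "control"    # 0-31, or DEL (ASCII 127)
    return "non-ASCII"      # any value exceeding 127
```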
2.4. End Of Line
How the end of a text line is represented depends on the
context and the implementation. For Internet transmission
via protocols such as SMTP [rrr], an end-of-line is a CR
(ASCII 13) followed by an LF (ASCII 10). ISO C [rrr] and
many modern operating systems indicate end-of-line with a
single character, typically ASCII LF (aka "newline"), and
this is the normal convention when news is transmitted via
UUCP. A variety of other methods are in use, including out-
of-band methods in which there is no specific character that
means end-of-line.
This Draft does not constrain how end-of-line is represented
in news, except that characters other than CR and LF MUST
not be usurped for use in end-of-line representations.
Also, obviously, all software dealing with a particular copy
of an article must agree on the convention to be used.
"EOL" is used to mean "whatever end-of-line representation
is appropriate"; it is not necessarily a character or
sequence of characters.
NOTE: If faced with picking an EOL representation
in the absence of other constraints, use of a sin-
gle character simplifies processing, and the ASCII
standard [rrr] specifies that if one character is
to be used for this purpose, it should be LF
(ASCII 10).
NOTE: Inside MIME encodings, use of the Internet
canonical EOL representation (CR followed by LF)
is mandatory. See [rrr].
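As a sketch of what a change of EOL representation can involve, the
following Python fragment converts between a local single-LF
convention and the CR LF convention used for Internet transmission.
The function names to_wire and from_wire are hypothetical, and the
choice of LF as the local convention is an assumption of this
example; other local conventions would need different code.

```python
def to_wire(data: bytes) -> bytes:
    """Convert LF line endings to CR LF, leaving existing CR LF intact."""
    # Normalize first so that an existing CR LF is not turned into CR CR LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def from_wire(data: bytes) -> bytes:
    """Convert CR LF line endings to the local single-LF convention."""
    return data.replace(b"\r\n", b"\n")
```

Only CR and LF are touched; no other characters are given any role in
end-of-line representation.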
2.5. Case-Sensitivity
Text in newsgroup names, header parameters, etc. is case-
sensitive unless stated otherwise.
NOTE: This is at variance with MAIL, which is
case-insensitive unless stated otherwise, but is
consistent with historical news practice and
existing news software. See the comments on back-
ward compatibility in section 1.
2.6. Language
Various constant strings in this Draft, such as header names
and month names, are derived from English words. Despite
their derivation, these words do NOT change when the poster
or reader employing them is interacting in a language other
than English. Posting and reading agents SHOULD translate
as appropriate in their interaction with the poster or
reader, but the forms that actually appear in articles are
always the English-derived ones defined in this Draft.
3. Relation To MAIL (RFC 822 etc.)
The primary intent of this Draft is to completely describe
the news article format as a subset of MAIL's message format
augmented by some new headers. Unless explicitly noted oth-
erwise, the intent throughout is that an article MUST also
be a valid MAIL message.
NOTE: Despite obvious similarities between news
and mail, opinions vary on whether it is possible
or desirable to unify them into a single service.
However, it is unquestionably both possible and
useful to employ some of the same tools for manip-
ulating both mail messages and news articles, so
there is specific advantage to be had in defining
them compatibly. Furthermore, there is no appar-
ent need to re-invent the wheel when slight exten-
sions to an existing definition will suffice.
Given that this Draft attempts to be self-contained, it
inevitably contains considerable repetition of information
found in MAIL. This raises the possibility of unintentional
conflicts. Unless specifically noted otherwise, any wording
in this Draft which permits behavior that is not MAIL-
compliant is erroneous and should be followed only to the
extent that the result remains compliant with MAIL.
NOTE: RFC 1036 said "where this standard conflicts
with [RFC 822], RFC-822 should be considered cor-
rect and this standard in error". Taken liter-
ally, this was obviously incorrect, since RFC 1036
imposed a number of restrictions not found in RFC
822. The intent, however, was reasonable: to
indicate that UNINTENTIONAL differences were
errors in RFC 1036.
Implementors and users should note that MAIL is deliberately
an extensible standard, and most extensions devised for mail
are also relevant to (and compatible with) news. Note par-
ticularly MIME [rrr], summarized briefly in appendix B,
which extends MAIL in a number of useful ways that are defi-
nitely relevant to news. Also of note is the work in
progress on reconciling PEM (Privacy Enhanced Mail, which
defines extensions for authentication and security) with
MIME, after which PEM may also be relevant to news.
UNRESOLVED ISSUE: Update the MIME/PEM information.
Similarly, descriptions here of MIME facilities should be
considered correct only to the extent that they do not
require or legitimize practices that would violate those
RFCs. (Note that this Draft does extend the application of
some MIME facilities, but this is an extension rather than
an alteration.)
4. Basic Format
4.1. Overall Syntax
The overall syntax of a news article is:
     article         = 1*header separator body
     header          = start-line *continuation
     start-line      = header-name ":" space [ nonblank-text ] eol
     continuation    = space nonblank-text eol
     header-name     = 1*name-character *( "-" 1*name-character )
     name-character  = letter / digit
     letter          = <ASCII letter A-Z or a-z>
     digit           = <ASCII digit 0-9>
     separator       = eol
     body            = *( [ nonblank-text / space ] eol )
     eol             = <EOL>
     nonblank-text   = [ space ] text-character *( space-or-text )
     text-character  = <any ASCII character except NUL (ASCII 0),
                         HT (ASCII 9), LF (ASCII 10), CR (ASCII 13),
                         or blank (ASCII 32)>
     space           = 1*( <HT (ASCII 9)> / <blank (ASCII 32)> )
     space-or-text   = space / text-character
An article consists of some headers followed by a body. An
empty line separates the two. The headers contain struc-
tured information about the article and its transmission. A
header begins with a header name identifying it, and can be
continued onto subsequent lines by beginning the continua-
tion line(s) with white space. (Note that section 4.2.3
adds some restrictions to the header syntax indicated here.)
The body is largely-unstructured text significant only to
the poster and the readers.
NOTE: Terminology here follows the current custom
in the news community, rather than the MAIL con-
vention of (sometimes) referring to what is here
called a "header" as a "header field" or "field".
Note that the separator line must be truly empty, not just a
line containing white space. Further empty lines following
it are part of the body, as are empty lines at the end of
the article.
NOTE: Some systems make no distinction between
empty lines and lines consisting entirely of white
space; indeed, some systems cannot represent
entirely empty lines. The grammar's requirement
that header continuation lines contain some print-
able text is meant to ensure that the empty/space
distinction cannot confuse identification of the
separator line.
NOTE: It is tempting to authorize posting agents
to strip empty lines at the beginning and end of
the body, but such empty lines could possibly be
part of a preformatted document.
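The separator rule can be illustrated with a short Python sketch. It
assumes that the article is held as a string using a single LF as the
EOL representation and that it ends with an EOL; the function name
split_article is hypothetical. Only a truly empty line is taken as
the separator, and every line after it is kept as part of the body.

```python
def split_article(text):
    """Split an article into (header_lines, body_lines), EOLs removed."""
    lines = text.split("\n")
    # The article ends with an EOL, so the final split element is empty
    # and does not represent a line of its own.
    if lines and lines[-1] == "":
        lines.pop()
    # The separator is the first truly empty line; a line containing
    # only white space does not qualify.  ValueError if there is none.
    separator = lines.index("")
    return lines[:separator], lines[separator + 1:]
```

For example, in an article whose separator is followed by another
empty line, that second empty line appears as the first element of
the returned body.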
Implementors are warned that trailing white space, whether
alone on the line or not, MAY be significant in the body,
notably in early versions of the "uuencode" encoding for
binary data. Trailing white space MUST be preserved unless
the article is known to have originated within a cooperating
subnet that avoids using significant trailing white space,
and SHOULD be preserved regardless. Posters SHOULD avoid
using conventions or encodings which make trailing white
space significant; for encoding of binary data, MIME's
"base64" encoding is recommended. Implementors are warned
that ISO C implementations are not required to preserve
trailing white space, and special precautions may be neces-
sary in implementations which do not.
NOTE: Unfortunately, the signature-delimiter con-
vention (described in section 4.3.2) does use sig-
nificant trailing white space. It's too late to
fix this; there is work underway on defining an
organized signature convention as part of MIME,
which is a preferable solution in the long run.
Posters are warned that some very old relayer software mis-
behaves when the first non-empty line of an article body
begins with white space.
4.2. Headers
4.2.1. Names and Contents
Despite the restrictions on header-name syntax imposed by
the grammar, relayers and reading agents SHOULD tolerate
header names containing any ASCII printable character other
than colon (":", ASCII 58).
NOTE: MAIL header names can contain any ASCII
printable character (other than colon) in theory,
but in practice, arbitrary header names are known
to cause trouble for some news software. Section
4.1's restriction to alphanumeric sequences sepa-
rated by hyphens is believed to permit all widely-
used header names without causing problems for any
widely-used software. Software is nevertheless
encouraged to cope correctly with the full range
of possibilities, since aberrations are known to
occur.
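One way to picture the difference between the strict grammar and the
recommended tolerance is as a pair of regular expressions, sketched
here in Python. The names STRICT_NAME and TOLERANT_NAME and the
helper functions are assumptions of this example; the character
ranges follow section 4.1 and the definition of "ASCII printable
character" in section 2.3.

```python
import re

# Section 4.1: alphanumeric sequences separated by single hyphens.
STRICT_NAME = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")

# Any ASCII printable character (33-126) other than colon (58).
TOLERANT_NAME = re.compile(r"[!-9;-~]+")


def is_strict_name(name):
    """True if a posting agent could emit this header name."""
    return STRICT_NAME.fullmatch(name) is not None


def is_tolerable_name(name):
    """True if a relayer or reading agent ought to cope with this name."""
    return TOLERANT_NAME.fullmatch(name) is not None
```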
Relayers MUST disregard headers not described in this Draft
(that is, with header names not mentioned in this Draft),
and pass them on unaltered.
Posters wishing to convey non-standard information in head-
ers SHOULD use header names beginning with "X-". No stan-
dard header name will ever be of this form. Reading agents
SHOULD ignore "X-" headers, or at least treat them with
great care.
The order of headers in an article is not significant. How-
ever, posting agents are encouraged to put mandatory headers
(see section 5) first, followed by optional headers (see
section 6), followed by headers not defined in this Draft.
NOTE: While relayers and reading agents must be
prepared to handle any order, having the signifi-
cant headers (the precise definition of "signifi-
cant" depends on context) first can noticeably
improve efficiency, especially in memory-limited
environments where it is difficult to buffer up an
arbitrary quantity of headers while searching for
the few that matter.
Header names are case-insensitive. There is a preferred
case convention, which posters and posting agents SHOULD
use: each hyphen-separated "word" has its initial letter (if
any) in uppercase and the rest in lowercase, except that
some abbreviations have all letters uppercase (e.g.
"Message-ID" and "MIME-Version"). The forms used in this Draft
are the preferred forms for the headers described herein.
Relayers and reading agents are warned that articles might
not obey this convention.
NOTE: Although software must be prepared for the
possibility of random use of case in header names
(and other case-independent text), establishing a
preferred convention reduces pointless diversity,
and may permit optimized software that looks for
the preferred forms before resorting to less-
efficient case-insensitive searches.
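A minimal Python sketch of the preferred case convention follows. The
function names are hypothetical, and the table of all-uppercase
abbreviations contains only the two taken from the examples above
("ID" and "MIME"); a real implementation would need whatever
abbreviations occur in the header names it handles.

```python
# Abbreviations written entirely in uppercase (assumed list, drawn
# from the examples "Message-ID" and "MIME-Version").
ALL_UPPERCASE = {"id": "ID", "mime": "MIME"}


def preferred_case(name):
    """Return a header name in the preferred case convention."""
    words = name.split("-")
    return "-".join(
        ALL_UPPERCASE.get(word.lower(), word.capitalize()) for word in words
    )


def same_header_name(a, b):
    """Header names are case-insensitive."""
    return a.lower() == b.lower()
```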
In general, a header can consist of several lines, with each
continuation line beginning with white space. The EOLs pre-
ceding continuation lines are ignored when processing such a
header, effectively combining the start-line and the contin-
uations into a single logical line. The logical line, less
the header name, colon, and any white space following the
colon, is the "header content".
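The combination of start-line and continuations can be sketched in
Python as follows. The sketch assumes that the header lines have
already been separated from the body and stripped of their EOLs; the
function names unfold and parse_headers are hypothetical.

```python
def unfold(header_lines):
    """Turn one header's lines into a (name, content) pair."""
    # Ignoring the EOLs amounts to simple concatenation; the white space
    # beginning each continuation line stays in the logical line.
    logical = "".join(header_lines)
    name, _, rest = logical.partition(":")
    # Header content excludes the name, the colon, and the white space
    # following the colon.
    return name, rest.lstrip(" \t")


def parse_headers(lines):
    """Group lines into headers, then unfold each one."""
    grouped = []
    for line in lines:
        if line[:1] in (" ", "\t") and grouped:
            grouped[-1].append(line)   # continuation of the previous header
        else:
            grouped.append([line])     # a new start-line
    return [unfold(group) for group in grouped]
```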
4.2.2. Undesirable Headers
A header whose content is empty is said to be an empty
header. Relayers and reading agents SHOULD not consider
presence or absence of an empty header to alter the seman-
tics of an article (although syntactic rules, such as
requirements that certain header names appear at most once
in an article, MUST still be satisfied). Posting agents
SHOULD delete empty headers from articles before posting
them.
Headers that merely state defaults explicitly (e.g., a
Followup-To header with the same content as the Newsgroups
header, or a MIME Content-Type header with contents
"text/plain; charset=us-ascii") or state information that
reading agents can typically determine easily themselves
(e.g. the length of the body in octets) are redundant, con-
veying no information whatsoever. Headers that state infor-
mation which cannot possibly be of use to a significant num-
ber of relayers, reading agents, or readers (e.g., the name
of the software package used as the posting agent) are use-
less and pointless. Posters and posting agents SHOULD avoid
including redundant or useless headers in articles.
NOTE: Information that someone, somewhere, might
someday find useful is best omitted from headers.
(There's quite enough of it in article bodies.)
Headers should contain information of known util-
ity only. This is not meant to preclude inclusion
of information primarily meant for news-software
debugging, but such information should be included
only if there is real reason, preferably based on
experience, to suspect that it may be genuinely
useful. Articles passing through gateways are the
only obvious case where inclusion of debugging
information appears clearly legitimate. (See sec-
tion 10.1.)
NOTE: A useful rule of thumb for software imple-
mentors is: "if I had to pay a dollar a day for
the transmission of this header, would I still
think it worthwhile?".
4.2.3. White Space and Continuations
The colon following the header name on the start-line MUST
be followed by white space, even if the header is empty. If
the header is not empty, at least some of the content MUST
appear on the start-line. Posting agents MUST enforce these
restrictions, but relayers (etc.) SHOULD accept even arti-
cles that violate them.
NOTE: MAIL does not require white space after the
colon, but it is usual. RFC 1036 required the
white space, even in empty headers, and some
existing software demands it. In MAIL, and
arguably in RFC 1036 (although the wording is
vague), it is technically legitimate for the white
space to be part of a continuation line rather
than the start-line, but not all existing software
will accept this. Deleting empty headers and
placing some content on the start-line avoids this
issue... which is desirable because trailing
blanks, easily deleted by accident, are best not
made significant in headers.
In general, posters and posting agents SHOULD use blank
(ASCII 32), not tab (ASCII 9), where white space is desired
in headers. Existing software does not consistently accept
tab as synonymous with blank in all contexts. In particu-
lar, RFC 1036 appeared to specify that the character immedi-
ately following the colon after a header name was required
to be a blank, and some news software insists on that, so
this character MUST be a blank. Again, posting agents MUST
enforce these restrictions but relayers SHOULD be more tol-
erant.
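The restrictions that posting agents are asked to enforce might be
checked along the following lines. This Python sketch is illustrative
only; the function name check_start_line and the wording of the
problem reports are assumptions, and a relayer would normally be more
tolerant than this.

```python
import re

# header-name per section 4.1, the colon, an optional blank, the rest.
_START_LINE = re.compile(r"^([A-Za-z0-9]+(?:-[A-Za-z0-9]+)*):( ?)(.*)$")


def check_start_line(start_line, continuations=()):
    """Return a list of problems found (empty if none)."""
    match = _START_LINE.match(start_line)
    if match is None:
        return ["not a header start-line"]
    _name, blank, rest = match.groups()
    problems = []
    if blank != " ":
        # The character immediately after the colon must be a blank
        # (ASCII 32), even in an empty header; a tab does not qualify.
        problems.append("colon not followed by a blank")
    has_more = any(line.strip(" \t") for line in continuations)
    if rest.strip(" \t") == "" and has_more:
        # A non-empty header must have some content on the start-line.
        problems.append("no content on the start-line")
    return problems
```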
Since the white space beginning a continuation line remains
a part of the logical line, headers can be "broken" into
multiple lines only at white space. Posting agents SHOULD
not break headers unnecessarily. Relayers SHOULD preserve
existing header breaks, and SHOULD not introduce new breaks.
Breaking headers SHOULD be a last resort; relayers and read-
ing agents SHOULD handle long header lines gracefully. (See
the discussion of size limits in section 4.6.)
4.3. Body
Although the article body is unstructured for most of the
purposes of this Draft, structure MAY be imposed on it by
other means, notably MIME headers (see appendix B).
4.3.1. Body Format Issues
The body of an article MAY be empty, although posting agents
SHOULD consider this an error condition (meriting returning
the article to the poster for revision). A posting agent
which does not reject such an article SHOULD issue a warning
message to the poster and supply a non-empty body. Note
that the separator line MUST be present even if the body is
empty.
NOTE: An empty body is probably a poster error
except, arguably, for some control messages... and
even they really ought to have a body explaining
the reason for the control message. Some old
reading agents are known to generate empty bodies
for "cancel" control messages, so posting agents
might opt not to reject body-less articles in such
cases (although it would be better to fix the
reading agents to request a body). However, some
existing news software is known to react badly to
body-less articles, hence the request for posting
agents to insert a body in such cases.
NOTE: A possible posting-agent-supplied body text
(already used by one widespread posting agent) is
"This article was probably generated by a buggy
news reader.". (The use of "reader" to refer to
the reading agent is traditional, although this
Draft uses more precise terminology.)
NOTE: The requirement for the separator line even
in a bodyless article is inherited from MAIL, and
also distinguishes legitimately-bodyless articles
from articles accidentally truncated in the middle
of the headers.
Note that an article body is a sequence of lines terminated
by EOLs, not arbitrary binary data, and in particular it
MUST end with an EOL. However, relayers SHOULD treat the
body of an article as an uninterpreted sequence of octets
(except as mandated by changes of EOL representation and by
control-message processing) and SHOULD avoid imposing con-
straints on it. See also section 4.6.
4.3.2. Body Conventions
Although body lines can in principle be very long (see sec-
tion 4.6 for some discussion of length limits), posters
SHOULD restrict body line lengths to circa 70-75 characters.
On systems where text is conventionally stored with EOLs
only at paragraph breaks and other "hard return" points,
with software breaking lines as appropriate for display or
manipulation, posting agents SHOULD insert EOLs as necessary
so that posted articles comply with this restriction.
NOTE: News originated in environments where line
breaks in plain text files were supplied by the
user, not the software. Be this good or bad, much
reading-agent and posting-agent software assumes
that news articles follow this convention, so it
is often inconvenient to read or respond to arti-
cles which violate it. The "70-75" number comes
from the widespread use of display devices which
are 80 columns wide, and the desire to leave a bit
of margin for quoting etc. (see below).
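As an illustration only, a posting agent on a "hard return"
system might insert EOLs along the following lines. This is
a minimal sketch, not part of this Draft: the function name
wrap_body and the default width of 72 are assumptions, each
input line is assumed to be one whole paragraph, and words
longer than the width (e.g. long addresses) are left
unbroken and so may still exceed it.

```python
import textwrap


def wrap_body(text, width=72):
    """Insert EOLs so that ordinary prose lines stay within `width` characters.

    Each input line is assumed to be one whole paragraph; empty lines
    (paragraph separators) are preserved. Runs of white space inside a
    paragraph are collapsed by textwrap, which is acceptable for prose
    but not for preformatted text.
    """
    out = []
    for paragraph in text.rstrip("\n").split("\n"):
        if paragraph.strip() == "":
            out.append("")
        else:
            out.extend(
                textwrap.wrap(
                    paragraph,
                    width=width,
                    break_long_words=False,
                    break_on_hyphens=False,
                )
            )
    # An article body must end with an EOL.
    return "\n".join(out) + "\n"
```

A width a little below 75 leaves room for one or two levels
of quoting before lines reach 80 columns.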
Reading agents confronted with body lines much longer than
the available output-device width SHOULD break lines as
appropriate. Posters are warned that such breaks may not
occur exactly where the poster intends.
NOTE: "As appropriate" would typically include
breaking lines when supplying the text of an arti-
cle to be quoted in a reply or followup, something
that line-breaking reading agents often neglect to
do now.
Although styles vary widely, for plain text it is usual to
use no left margin, leave the right edge ragged, use a sin-
gle empty line to separate paragraphs, and employ normal
natural-language usage on matters such as upper/lowercase.
(In particular, articles SHOULD not be written entirely in
uppercase. In environments where posters have access only
to uppercase, posting agents SHOULD translate it to lower-
case.)
NOTE: Most people find substantial bodies of text
entirely in uppercase relatively hard to read,
while all-lowercase text merely looks slightly
odd. The common association of uppercase with
strong emphasis adds to this.
Tone of voice does not carry well in written text, and mis-
understandings are common when sarcasm, parody, or exaggera-
tion for humorous effect is attempted without explicit warn-
ing. It has become conventional to use the sequence ":-)",
which (on most output devices) resembles a rotated "smiley
face" symbol, as a marker for text not meant to be taken
literally, especially when humor is intended. This practice
aids communication and averts unintended ill-will; posters
are urged to use it. A variety of analogous sequences are
used with less-standardized meanings [Sanderson].
The order of arrival of news articles at a particular host
depends somewhat on transmission paths, and occasionally
articles are lost for various reasons. When responding to a
previous article, posters SHOULD not assume that all readers
understand the exact context. It is common to quote some of
the previous article to establish context. This SHOULD be
done by prefacing each quoted line (even if it is empty)
with the character ">". This will result in multiple levels
of ">" when quoted context itself contains quoted context.
NOTE: It may seem superfluous to put a prefix on
empty lines, but it simplifies implementation of
functions such as "skip all quoted text" in read-
ing agents.
Readability is enhanced if quoted text and new text are sep-
arated by an empty line.
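The quoting convention above can be sketched as follows.
This is a hedged illustration, not a required
implementation; the names quote_body and unquoted_lines are
assumptions. It shows why prefixing empty lines too makes
"skip all quoted text" a simple test on the first character
of each line.

```python
def _body_lines(body):
    """Split a body into lines, dropping the empty piece after the final EOL."""
    lines = body.split("\n")
    if lines and lines[-1] == "":
        lines.pop()
    return lines


def quote_body(body, prefix=">"):
    """Preface every line, including empty ones, with the quote character."""
    return "".join(prefix + line + "\n" for line in _body_lines(body))


def unquoted_lines(body, prefix=">"):
    """Return only the lines that are not quoted context."""
    return [line for line in _body_lines(body) if not line.startswith(prefix)]
```

Quoting an already-quoted body simply adds another level of
">", as described above.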
Posters SHOULD edit quoted context to trim it down to the
minimum necessary. However, posting agents SHOULD not
attempt to enforce this by imposing overly-simplistic rules
like "no more than 50% of the lines should be quotes".
NOTE: While encouraging trimming is desirable, the
50% rule imposed by some old posting agents is
both inadequate and counterproductive. Posters do
not respond to it by being more selective about
quoting; they respond by padding short responses,
or by using different quoting styles to defeat
automatic analysis. The former adds unnecessary
noise and volume, while the latter also defeats
more useful forms of automatic analysis that read-
ing agents might wish to do.
NOTE: At the very least, if a minimum-unquoted
quota is being set, article bodies shorter than
(say) 20 lines, or perhaps articles which exceed
the quota by only a few lines, should be exempt.
This avoids the ridiculous situation of complain-
ing about a 5-line response to a 6-line quote.
NOTE: A more subtle posting-agent rule, suggested
for experimental use, is to reject articles that
appear to contain quoted signatures (see below).
A quoted signature is almost certainly the result of a
careless poster not bothering to trim down quoted context.
Also, if a posting agent or followup agent pre-
sents an article template to the poster for edit-
ing, it really should take note of whether the
poster actually made any changes, and refrain from
posting an unmodified template.
Some followup agents supply "attribution" lines for quoted
context, indicating where it first appeared and under whose
name. When multiple levels of quoting are present and
quoted context is edited for brevity, "inner" attribution
lines are not always retained. The editing process is also
somewhat error-prone. Reading agents (and readers) are
warned not to assume that attributions are accurate.
UNRESOLVED ISSUE: Should a standard format for
attribution lines be defined? There is already
considerable diversity... but automatic news anal-
ysis would be substantially aided by a standard
convention.
Early difficulties in inferring return addresses from arti-
cle headers led to "signatures": short closing texts, auto-
matically added to the end of articles by posting agents,
identifying the poster and giving his network addresses etc.
If a poster or posting agent does append a signature to an
article, the signature SHOULD be preceded with a delimiter
line containing (only) two hyphens (ASCII 45) followed by
one blank (ASCII 32). Posting agents SHOULD limit the
length of signatures, since verbose excess bordering on
abuse is common if no restraint is imposed; 4 lines is a
common limit.
NOTE: While signatures are arguably a blemish,
they are a well-understood convention, and convey-
ing the same information in headers exposes it to
mangling and makes it rather less conspicuous. A
standard delimiter line makes it possible for
reading agents to handle signatures specially if
desired. (This is unfortunately hampered by
extensive misunderstanding of, and misuse of, the
delimiter.)
NOTE: The choice of delimiter is somewhat unfortu-
nate, since it relies on preservation of trailing
white space, but it is too well-established to
change. There is work underway to define a more
sophisticated signature scheme as part of MIME,
and this will presumably supersede the current
convention in due time.
NOTE: Four 75-column lines of signature text come
to 300 characters, which is ample to convey name and
mail-address information in all but the most
bizarre situations.
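A reading agent that wishes to handle signatures specially
might locate the delimiter as sketched below. This is an
illustration under stated assumptions: the names
SIG_DELIMITER, split_signature and signature_too_long are
hypothetical, the last delimiter line in the body is taken
to start the signature, and the 4-line limit is the common
one mentioned above rather than a requirement.

```python
SIG_DELIMITER = "-- "  # two hyphens (ASCII 45) followed by one blank (ASCII 32)


def split_signature(body):
    """Return (text_lines, signature_lines) for an article body.

    The delimiter must match exactly, trailing blank included; a line
    containing only "--" is not treated as a delimiter.
    """
    lines = body.split("\n")
    if lines and lines[-1] == "":
        lines.pop()
    for i in range(len(lines) - 1, -1, -1):
        if lines[i] == SIG_DELIMITER:
            return lines[:i], lines[i + 1:]
    return lines, []


def signature_too_long(body, limit=4):
    """True if the signature has more lines than the posting agent allows."""
    _, signature = split_signature(body)
    return len(signature) > limit
```

Because the delimiter depends on a trailing blank, software
that strips trailing white space will defeat this test.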
4.4. Characters And Character Sets
Header and body lines MAY contain any ASCII characters other
than CR (ASCII 13), LF (ASCII 10), and NUL (ASCII 0).
NOTE: CR and LF are excluded because they clash
with common EOL conventions. NUL is excluded
because it clashes with the C end-of-string con-
vention, which is significant to most existing
news software. These three characters are
unlikely to be transmitted successfully.
However, posters SHOULD avoid using ASCII control characters
except for tab (ASCII 9), formfeed (ASCII 12), and backspace
(ASCII 8). Tab signifies sufficient horizontal white space
to reach the next of a set of fixed positions; posters are
warned that there is no standard set of positions, so tabs
should be avoided if precise spacing is essential. Formfeed
signifies a point at which a reading agent SHOULD pause and
await reader interaction before displaying further text.
Backspace SHOULD be used only for underlining, done by a
sequence of underscores (ASCII 95) followed by an equal num-
ber of backspaces, signifying that the same number of text
characters following are to be underlined. Posters are
warned that underlining is not available on all output
devices and is best not relied on for essential meaning.
Reading agents SHOULD recognize underlining and translate it
to the appropriate commands for devices that support it.
NOTE: Interpretation of almost all control charac-
ters is device-specific to some degree, and
devices differ. Tabs and underlining are sup-
ported, to some extent, by most modern devices and
reading agents, hence the cautious exemptions for
them. The underlining method is specified because
the inverse method, text and then underscores, is
tempting to the naive... but if sent unaltered to
a device that shows only the most recent of sev-
eral overstruck characters rather than a compos-
ite, the result can be utterly unreadable.
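A reading agent might recognize the underlining sequence
roughly as follows. This is a sketch only; the function name
decode_underlining and the (text, flags) return shape are
assumptions, and runs in which the underscore and backspace
counts differ are not treated as a single underlining
request.

```python
import re

# N underscores followed by N backspaces (ASCII 8) mark the next N
# characters as underlined.
_UNDERLINE = re.compile(r"(_+)(\x08+)")


def decode_underlining(line):
    """Return (plain_text, flags); flags[i] is True if plain_text[i] is underlined."""
    plain = []
    flags = []
    pending = 0  # characters still to be marked as underlined
    pos = 0
    while pos < len(line):
        m = _UNDERLINE.match(line, pos)
        if m and len(m.group(1)) == len(m.group(2)):
            pending = len(m.group(1))
            pos = m.end()
            continue
        plain.append(line[pos])
        flags.append(pending > 0)
        if pending:
            pending -= 1
        pos += 1
    return "".join(plain), flags
```

The caller can then translate the flags into whatever
underlining commands the output device supports, or ignore
them on devices that have none.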
NOTE: A common interpretation of tab is that it is
a request to space forward to the next position
whose number is one more than a multiple of 8,
with positions numbered sequentially starting at
1. (So tab positions are 9, 17, 25, ...) Reading
agents not constrained by existing system conven-
tions might wish to use this interpretation.
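The common tab interpretation described in the preceding
NOTE can be sketched as below. The function name expand_tabs
is an assumption; the code counts columns from 0, so a
column that is a multiple of 8 corresponds to a position
numbered 9, 17, 25, ... when counting from 1.

```python
def expand_tabs(line, interval=8):
    """Replace each tab with spaces up to the next tab position."""
    out = []
    for ch in line:
        if ch == "\t":
            # len(out) is the current 0-based column.
            out.extend(" " * (interval - len(out) % interval))
        else:
            out.append(ch)
    return "".join(out)
```

For text without other control characters this matches the
behavior of Python's built-in str.expandtabs(8).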
NOTE: It will typically be necessary for a reading
agent to catch and interpret formfeed, not just
send it to the output device. The actions per-
formed by typical output devices on receiving a
formfeed are neither adequate for nor appropriate
to the pause-for-interaction meaning.
Cooperating subnets which wish to employ non-ASCII character
sets by using escape sequences (employing, e.g., ESC (ASCII
27), SO (ASCII 14), and SI (ASCII 15)) to alter the meaning
of superficially-ASCII characters MAY do so, but MUST use
MIME headers to alert reading agents to the particular char-
acter set(s) and escape sequences in use. A reading agent
SHOULD not pass such an escape sequence through, unaltered,
to the output device unless the agent confirms that the
sequence is one used to affect character sets and has reason
to believe that the device is capable of interpreting that
particular sequence properly.
NOTE: Cooperating-subnet organizers are warned
that some very old relayers strip certain control
characters out of articles they pass along. ESC
is known to be among the affected characters.
NOTE: There are now standard Internet encodings
for Japanese [rrr] and Vietnamese [rrr] in partic-
ular.
Articles MUST not contain any octet with value exceeding
127, i.e. any octet that is not an ASCII character.
NOTE: This rule, like others, may be relaxed by
unanimous consent of the members of a cooperating
subnet, provided suitable precautions are taken to
ensure that rule-violating articles do not leak
out of the subnet. (This has already been done in
many areas where ASCII is not adequate for the
local language(s).) Beware that articles contain-
ing non-ASCII octets in headers are a violation of
the MAIL specifications and are not valid MAIL
messages. MIME offers a way to encode non-ASCII
characters in ASCII for use in headers; see sec-
tion 4.5.
NOTE: While there is great interest in using 8-bit
character sets, not all software can yet handle
them correctly. Hence the restriction to cooper-
ating subnets. MIME encodings can be used to
transmit such characters while remaining within
the octet restriction.
In anticipation of the day when it is possible to use non-
ASCII characters safely anywhere, and to provide for the
(substantial) cooperating subnets that are already using
them, transmission paths SHOULD treat news articles as unin-
terpreted sequences of octets (except perhaps for transfor-
mations between EOL representations) and relayers SHOULD
treat non-ASCII characters in articles as ordinary charac-
ters.
NOTE: 8-bit enthusiasts are warned that not all
software conforms to these recommendations yet.
In particular, standard NNTP [rrr] is a 7-bit pro-
tocol, and there may be implementations which
enforce this rule. Be warned, also, that it will
never be safe to send raw binary data in the body
of news articles, because changes of EOL represen-
tation may (will!) corrupt it.
Except where cooperating subnets permit more direct
approaches, MIME [rrr] headers and encodings SHOULD be used
to transmit non-ASCII content using ASCII characters; see
section 4.5, appendix B, and the MIME RFCs for details. If
article content can be expressed in ASCII, it SHOULD be.
Failing that, the order of preference for character sets is
that described in MIME [rrr].
NOTE: Using the MIME facilities, it is possible to
transmit ANY character set, and ANY form of binary
data, using only ASCII characters. Equally impor-
tant, such articles are self-describing and the
reading agent can tell which octet-to-symbol map-
ping is intended! Designation of some preferred
character sets is intended to minimize the number
of character sets that a reading agent must under-
stand in order to display most articles properly.
Articles containing non-ASCII characters, articles using
ASCII characters (values 0 through 127) to refer to non-
ASCII symbols, and articles using escape sequences to shift
character sets SHOULD include MIME headers indicating which
character set(s) and conventions are being used, and MUST do
so unless such articles are strictly confined to a
cooperating subnet which has its own pre-agreed conventions.
MIME encodings are preferred over all these techniques. If
it comes to a relayer's attention that it is being asked to
pass an article using such techniques outward across what it
knows to be the boundary of such a cooperating subnet, it
MUST report this error to its administrator, and MAY refuse
to pass the article beyond the subnet boundary. If it does
pass the article, it MUST re-encode it with MIME encodings
to make it conform to this Draft.
NOTE: Such re-encoding is a non-trivial task, due
to MIME rules such as the prohibition of nested
encodings. It's not just a matter of pouring the
body through a simple filter.
Reading agents SHOULD note MIME headers and attempt to show
the reader the closest possible approximation to the
intended content. They SHOULD not just send the octets of
the article to the output device unaltered, unless there is
reason to believe that the output device will indeed inter-
pret them correctly. Reading agents MUST not pass ASCII
control characters or escape sequences, other than as dis-
cussed above, unaltered to the output device; only by chance
would the result be the desired one, and there is serious
potential for harmful side effects, either accidental or
malicious.
NOTE: Exactly what to do with unwanted control
characters/sequences depends on the philosophy of
the reading agent, but passing them straight to
the output device is almost always wrong. If the
reading agent wants to mark the presence of such a
character/sequence in circumstances where only
ASCII printable characters are available, trans-
lating it to "#" might be a suitable method; "#"
is a conspicuous character seldom used in normal
text.
NOTE: Reading agents should be aware that many old
output devices (or the transmission paths to them)
zero out the top bit of octets sent to them. This
can transform non-ASCII characters into ASCII con-
trol characters.
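One possible way for a reading agent to mark unwanted
control characters, following the "#" suggestion above, is
sketched here. The name mark_control_chars is hypothetical,
and the sketch assumes that tab, backspace and formfeed are
interpreted elsewhere by the reading agent and that escape
sequences used for character-set shifts have already been
dealt with.

```python
# Control characters with meanings discussed earlier in this section;
# the sketch assumes the reading agent interprets them separately.
_ALLOWED_CONTROLS = {"\t", "\x08", "\x0c"}


def mark_control_chars(line, marker="#"):
    """Replace other ASCII control characters (0-31 and 127) with a marker."""
    out = []
    for ch in line:
        code = ord(ch)
        is_control = code < 32 or code == 127
        if is_control and ch not in _ALLOWED_CONTROLS:
            out.append(marker)
        else:
            out.append(ch)
    return "".join(out)
```

Note that only the ESC octet itself is replaced; the
printable remainder of an escape sequence is shown as
ordinary text, which is harmless if unsightly.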
Followup agents MUST be careful to apply appropriate trans-
formations of representation to the outbound followup as
well as the inbound precursor. A followup to an article
containing non-ASCII material is very likely to contain non-
ASCII material itself.
4.5. Non-ASCII Characters In Headers
All octets found in headers MUST be ASCII characters. How-
ever, it is desirable to have a way of encoding non-ASCII
characters, especially in "human-readable" headers such as
Subject. MIME [rrr] provides a way to do this. Full
details may be found in the MIME specifications; herewith a
quick summary to alert software authors to the issues...
encoded-word = "=?" charset "?" encoding "?" codes "?="
charset = 1*tag-char
encoding = 1*tag-char
tag-char = <ASCII printable character except !()<>@,;:\"[]/?=>
codes = 1*code-char
code-char = <ASCII printable character except ?>
An encoded word is a sequence of ASCII printable characters
that specifies the character set, encoding method, and bits
of (potentially) non-ASCII characters. Encoded words are
allowed only in certain positions in certain headers. Spe-
cific headers impose restrictions on the content of encoded
words beyond that specified in this section. Posting agents
MUST ensure that any material resembling an encoded word
(complete with all delimiters), in a context where encoded
words may appear, really is an encoded word.
NOTE: The syntax is a bit ugly, but it was
designed to minimize chances of confusion with
legitimate header contents, and to satisfy diffi-
cult constraints on use within existing headers.
An encoded word MUST not be more than 75 octets long. Each
line of a header containing encoded word(s) MUST be at most
76 octets long, not counting the EOL.
NOTE: These limits are meant to bound the looka-
head needed to determine whether text that begins
"=?" is really an encoded word.
The details of charsets and encodings are defined by MIME
[rrr]; the sequence of preferred character sets is the same
as MIME's. Encoded words SHOULD not be used for content
expressible in ASCII.
When an encoded word is used, other than in a newsgroup name
(see section 5.5), it MUST be separated from any adjacent
non-space characters (including other encoded words) by
white space. Reading agents displaying the contents of
encoded words (as opposed to their encoded form) should
ignore white space adjacent to encoded words.
UNRESOLVED ISSUE: Should this section be deleted
entirely, or made much more terse? The material
is relevant, but too complex to discuss fully.
NOTE: The deletion of intervening white space per-
mits using multiple encoded words, implicitly con-
catenated by the deletion, to encode text that
will not fit within a single 75-character encoded
word.
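To make the encoded-word rules concrete, here is a hedged
sketch of how a reading agent might decode them for display.
It is not a complete MIME implementation: the names
ENCODED_WORD and decode_header_value are assumptions, the
pattern is looser than the tag-char grammar above, only the
"B" and "Q" encodings defined by MIME are handled, and white
space is dropped only between adjacent encoded words.

```python
import base64
import binascii
import re

# Looser than the grammar above: any run of non-"?" non-space characters
# is accepted for charset, encoding and codes.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([^?\s]+)\?([^?\s]+)\?=")


def _decode_word(match):
    """Decode one encoded word, or return it unchanged if that fails."""
    word = match.group(0)
    charset, encoding, codes = match.groups()
    if len(word) > 75:  # over the length limit, so not an encoded word
        return word
    try:
        if encoding.upper() == "B":
            raw = base64.b64decode(codes)
        elif encoding.upper() == "Q":
            raw = binascii.a2b_qp(codes.encode("ascii"), header=True)
        else:
            return word
        return raw.decode(charset)
    except (LookupError, ValueError):
        return word


def decode_header_value(value):
    """Decode encoded words, dropping white space between adjacent ones."""
    out = []
    pos = 0
    prev_was_encoded = False
    for m in ENCODED_WORD.finditer(value):
        gap = value[pos:m.start()]
        if not (prev_was_encoded and gap.strip() == ""):
            out.append(gap)
        out.append(_decode_word(m))
        pos = m.end()
        prev_was_encoded = True
    out.append(value[pos:])
    return "".join(out)
```

The 75-octet check mirrors the limit stated above: text that
begins "=?" but runs longer is left as ordinary header
content.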
Reading-agent implementors are warned that although this
Draft completely specifies where encoded words may appear in
the headers it defines, there are other headers (e.g. the
MIME Content-Description header) that MAY contain them.
4.6. Size Limits
Implementations SHOULD avoid fixed constraints on the sizes
of lines within an article and on the size of the entire
article.
Relayers SHOULD treat the body of an article as an uninter-
preted sequence of octets (except as mandated by changes of
EOL representation and processing of control messages), not
to be altered or constrained in any way.
If it is absolutely necessary for an implementation to
impose a limit on the length of header lines, body lines, or
header logical lines, that limit shall be at least 1000
octets, including EOL representations. Relayers and trans-
mission paths confronted with lines beyond their internal
limits (if any) MUST not simply inject EOLs at random
places; they MAY break headers (as described in section 4.2.3) as a
last resort, and otherwise they MUST either pass the long
lines through unaltered, or refuse to pass the article at
all (see section 9.1 for further discussion).
NOTE: The limit here is essentially the same mini-
mum as that specified for SMTP mail in RFC 821
[rrr]. Implementors are warned that Path (see
section 5.6) and References (see section 6.5)
headers, in particular, often become several hun-
dred characters long, so 1000 is not an overly
generous limit.
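For an implementation that does have an internal line limit,
the pass-or-refuse rule above might be sketched as follows.
The names over_limit_lines and relay_decision are
hypothetical, and the sketch ignores the last-resort option
of breaking headers; the point is only that over-long lines
lead to refusal, never to EOLs injected at random.

```python
def over_limit_lines(article, limit=1000):
    """Return 1-based numbers of lines longer than `limit` octets, EOL included.

    `article` is a bytes object holding the whole article.
    """
    bad = []
    for number, line in enumerate(article.splitlines(keepends=True), start=1):
        if len(line) > limit:
            bad.append(number)
    return bad


def relay_decision(article, limit=1000):
    """Return "pass" if every line fits the internal limit, else "refuse"."""
    return "refuse" if over_limit_lines(article, limit) else "pass"
```

An implementation without fixed limits needs no such check
and simply passes the article through unaltered.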
All implementations MUST be able to handle an article
totalling at least 65,000 octets, including headers and EOL
representations, gracefully and efficiently. All implemen-
tations SHOULD be able to handle an article totalling at
least 1,000,000 (one million) octets, including headers and
EOL representations, gracefully and efficiently. "Grace-
fully and efficiently" is intended to preclude not only
failures, but also major loss of performance, serious prob-
lems in error recovery, or resource consumption beyond what
is reasonably necessary.
NOTE: The intent here is to prohibit lowering the
existing de-facto limit any further, while
strongly encouraging movement towards a higher
one. Actually, although improvements are desir-
able in some cases, much news software copes rea-
sonably well with very large articles. The same
cannot be said of the communications software and
protocols used to transmit news from one host to
another, especially when slow communications links
are involved. Occasional huge articles that
appear now (by accident or through ignorance) typ-
ically leave trails of failing software, system
problems, and irate administrators in their wake.
NOTE: It is intended that the successor to this
Draft will raise the "MUST" limit to 1,000,000 and
the "SHOULD" limit still further.
Posters SHOULD limit posted articles to at most 60,000
octets, including headers and EOL representations, unless
the articles are being posted only within a cooperating sub-
net which is known to be capable of handling larger articles
gracefully. Posting agents presented with a large article
SHOULD warn the poster and request confirmation.
NOTE: The difference between this and the earlier
"MUST" limit is margin for header growth, differ-
ing EOL representations, and transmission over-
heads.
NOTE: Disagreeable though these limits are, it is
a fact that in current networks, an article larger
than 64K (after header growth etc.) simply is not
transmitted reliably. Note also the comments
above on the trauma caused by single extremely-
large articles now; the problems are real and cur-
rent. These problems arguably should be fixed,
but this will not happen network-wide in the imme-
diate future. Hence the restriction of larger
articles to cooperating subnets, for now.
Posters using non-ASCII characters in their text MUST take
into account the overhead involved in MIME encoding, unless
the article's propagation will be entirely limited to a
cooperating subnet which does not use MIME encodings for
non-ASCII characters. For example, MIME base64 encoding
involves growth by a factor of approximately 4/3, so an
article which would likely have to use this encoding should
be at most about 45,000 octets before encoding.
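The base64 arithmetic can be checked with a short
calculation. This is a sketch; the function name base64_size
is an assumption, and it counts the EOLs that MIME requires
at least every 76 encoded characters, which the bare 4/3
factor leaves out.

```python
def base64_size(raw_octets, line_chars=76, eol_octets=1):
    """Size in octets of base64 output, including one EOL per output line."""
    # Every 3 input octets (or final fragment) become 4 output characters.
    chars = 4 * ((raw_octets + 2) // 3)
    # Output is broken into lines of at most `line_chars` characters.
    lines = (chars + line_chars - 1) // line_chars
    return chars + lines * eol_octets
```

With these assumptions, 45,000 octets of input become 60,000
encoded characters plus 790 EOLs, that is 60,790 octets with
one-octet EOLs or 61,580 with two-octet EOLs, before headers
are counted. This is why the figure above is only
approximate and some further margin is prudent.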
Posters SHOULD use MIME "message/partial" conventions to
facilitate automatic reassembly of a large document split
into smaller pieces for posting. It is recommended that the
content identifier used should be a message ID, generated by
the same means as article message IDs (see section 5.3), and
that all parts should have a See-Also header (see section
6.16) giving the message IDs of at least the previous parts
and preferably all the parts.
NOTE: See-Also is more correct for this purpose
than References, although References is in common
use today (with less-formal reassembly arrange-
ments). MIME reassemblers should probably examine
articles suggested by References headers if See-
Also headers are not present to indicate the
whereabouts of the other parts of "mes-
sage/partial" articles.
To repeat: implementations SHOULD avoid fixed constraints on
the sizes of lines within an article and on the size of the
entire article.
4.7. Example
Here is a sample article:
From: jerry@eagle.ATT.COM (Jerry Schwarz)
Path: cbosgd!mhuxj!mhuxt!eagle!jerry
Newsgroups: news.announce
Subject: Usenet Etiquette -- Please Read
Message-ID: <642@eagle.ATT.COM>
Date: Mon, 17 Jan 1994 11:14:55 -0500 (EST)
Followup-To: news.misc
Expires: Wed, 19 Jan 1994 00:00:00 -0500
Organization: AT&T Bell Laboratories, Murray Hill

body
body
body