 Son-of-RFC 1036 News Article Format and Transmission
 
 1. Introduction
Network  news articles resemble mail messages but are broad-
cast to potentially-large audiences, using a flooding  algo-
rithm  that  propagates one copy to each interested host (or
groups thereof), typically stores only one  copy  per  host,
and  does  not require any central administration or system-
atic registration of interested users.  Network news  origi-
nated as the medium of communication for Usenet, circa 1980.
Since then Usenet has grown explosively, and  many  Internet
sites  participate  in it.  In addition, the news technology
is now in widespread use for other purposes, on the Internet
and elsewhere.
 
The  earliest  news  interchange used the so-called "A News"
article  format.   Shortly  thereafter,  an  article  format
vaguely  resembling  Internet  mail  was  devised  and  used
briefly.  Both of those  formats  are  completely  obsolete;
they  are  documented  in  appendix A for historical reasons
only.  With publication of RFC 850 [rrr] in 1983, news arti-
cles  came  to closely resemble Internet mail messages, with
some restrictions and some  additional  headers.   RFC  1036
[rrr]  in 1987 updated RFC 850 without making major changes.
 
In the intervening five years, the RFC 1036  article  format
has  proven  quite  satisfactory,  although minor extensions
appear desirable to match recent developments in areas  such
as  multi-media  mail.  RFC 1036 itself has not proven quite
so satisfactory.  It is often  rather  vague  and  does  not
address  some  issues  at  all;  this has caused significant
interoperability problems at times, and implementations have
diverged  somewhat.  Worse, although it was intended primar-
ily to document existing  practice,  it  did  not  precisely
match  existing  practice even at the time it was published,
and the deviations have grown since.
 
This Draft attempts to specify the format of  articles,  and
the  procedures  used  to exchange them and process them, in
sufficient detail to allow full interoperability.  In  addi-
tion,  some  tentative suggestions are made about directions
for future development, in an attempt to  avert  unnecessary
divergence  and  consequent loss of interoperability.  Major
extensions (e.g.  cryptographic  authentication)  that  need
significant  development effort are left to be undertaken as
independent efforts.
 
     NOTE: One question this all may raise is:  why  is
     there  no  News-Version header, analogous to MIME-
     Version, specifying a version number corresponding
     to  this specification?  The answer is: it doesn't
     appear  to  be  useful,  given  news's   backward-
     compatibility  constraints.   The  major  use of a
     version number  is  indicating  which  of  several
     INCOMPATIBLE  interpretations  is  relevant.   The
     impossibility of orchestrating any sort of  simul-
     taneous change over news's installed base makes it
     necessary to avoid such incompatible  changes  (as
     opposed  to extensions) entirely.  MIME has a ver-
     sion number mostly because it introduced incompat-
     ible  changes  to  the  interpretation  of several
     "Content-"  headers.   This  Draft   attempts   no
     changes  in interpretation and it appears doubtful
     that future Drafts will find it feasible to intro-
     duce any.
 
     UNRESOLVED  ISSUE:  Should  this  be reconsidered?
     Only if the header has SPECIFIC IDENTIFIABLE  uses
     today.  Otherwise it's just useless added bulk.
 
As  in  this  Draft's  predecessors, the exact means used to
transmit articles from one host to another is not specified.
NNTP  [rrr]  is probably the most common transmission method
on the Internet, but a number of others are known to  be  in
use,  including  the UUCP protocol [rrr] extensively used in
the early days of Usenet and still much used on its  fringes
today.
 
Several  of  the mechanisms described in this Draft may seem
somewhat strange or even bizarre at first reading.  As  with
Internet  mail, there is no reasonable possibility of updat-
ing the entire installed base of news software promptly,  so
interoperability  with  old  software  is  crucial  and will
remain so.  Compatibility with existing practice and robust-
ness  in  an  imperfect world necessarily take priority over
elegance.
 
 
 
 2. Definitions, Notations, and Conventions
 
 2.1. Textual Notations
Throughout this Draft, "MAIL" is short for "RFC 822 [rrr] as
amended  by  RFC  1123  [rrr]".   (RFC 1123's amendments are
mostly relatively small, but they  are  not  insignificant.)
See  also  the  discussion  in  section 3 about this Draft's
relationship to MAIL.  "MIME" is short for  "RFCs  1341  and
1342" (or their updated replacements).
 
     UNRESOLVED ISSUE: Update these numbers.
 
"ASCII"  is  short  for "the ANSI X3.4 character set" [rrr].
While "ASCII" is often misused to refer to various character
sets  somewhat similar to X3.4, in this Draft, "ASCII" means
X3.4 and only X3.4.
 
     NOTE: The name is traditional (to the point  where
     the  ANSI standard sanctions it) even though it is
     no longer an acronym for the name of the standard.
 
     NOTE:  ASCII,  X3.4,  contains 128 characters, not
     all of them printable.  Character sets  with  more
     characters   are  not  ASCII,  although  they  may
     include it as a subset.
 
Certain words used to define the significance of  individual
requirements are capitalized.  "MUST" means that the item is
an absolute  requirement  of  the  specification.   "SHOULD"
means that the item is a strong recommendation: there may be
valid reasons to ignore it  in  unusual  circumstances,  but
this  should  be  done  only after careful study of the full
implications and a firm conclusion  that  it  is  necessary,
because  there are serious disadvantages to doing so.  "MAY"
means that the item is truly optional, and implementors  and
users  are warned that conformance is possible but not to be
relied on.
 
The term "compliant", applied to implementations etc., indi-
cates  satisfaction  of  all  relevant  "MUST"  and "SHOULD"
requirements.  The term "conditionally compliant"  indicates
satisfaction  of all relevant "MUST" requirements but viola-
tion of at least one relevant "SHOULD" requirement.
 
This Draft contains explanatory notes  using  the  following
format.   These  may be skipped by persons interested solely
in the content of the specification.   The  purpose  of  the
notes  is to explain why choices were made, to place them in
context, or to suggest possible implementation techniques.
 
     NOTE: While such explanatory notes may seem super-
     fluous  in  principle,  they  often help the less-
     than-omniscient reader grasp the  purpose  of  the
     specification and the constraints involved.  Given
     the limitations of natural language  for  descrip-
     tive  purposes, this improves the probability that
     implementors and users will  understand  the  true
     intent  of  the  specification  in cases where the
     wording is not entirely clear.
 
All numeric values are given  in  decimal  unless  otherwise
indicated.   Octets  are  assumed  to be unsigned values for
this purpose.  Large numbers are  written  using  the  North
American  convention, in which "," separates groups of three
digits but otherwise has no significance.
 
 
 2.2. Syntax Notation
Although the mechanisms specified  in  this  Draft  are  all
described  in prose, most are also described formally in the
modified BNF notation of RFC 822.  Implementors will need to
be  familiar  with  this  notation  to fully understand this
specification, and are referred to RFC 822  for  a  complete
explanation  of  the modified BNF notation.  Here is a brief
illustrative example:
 
     sentence  = clause *( punct clause ) "."
     punct     = ":" / ";"
     clause    = 1*word [ "(" clause ")" / "," 1*word ]
     word      = <any English word>

This defines a sentence as some clauses separated by  puncts
and  ended  by  a period, a punct as a colon or semicolon, a
clause as at least one <word> optionally followed by  either
a  parenthesized  clause  or  a  comma and at least one more
<word>, and a <word> as (informally) any English  word.   <>
are  used to enclose names when (and only when) distinguish-
ing them from surrounding text is useful.  The full form  of
the  repetition  notation  is <m>"*"<n><thing>, denoting <m>
through <n> repetitions of <thing>; <m>  defaults  to  zero,
<n>  to  infinity, and the "*" and <n> can be omitted if <m>
and <n> are equal, so 1*word is one or more  words,  1*5word
is one through five words, and 2word is exactly two words.
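
     NOTE: As an illustration only, the repetition
     notation can be decoded mechanically. The Python
     sketch below is not part of this specification; the
     name parse_repetition is hypothetical, and None is
     an assumed stand-in for "infinity".

```python
import re

# <m>"*"<n><thing>: optional digits, optional "*", optional digits, thing.
_REPEAT = re.compile(r"^(\d*)(\*?)(\d*)(.+)$")

def parse_repetition(item):
    """Split e.g. "1*5word" into (minimum, maximum, thing)."""
    match = _REPEAT.match(item)
    if match is None:
        raise ValueError("not a repetition item: %r" % item)
    low, star, high, thing = match.groups()
    if star:
        minimum = int(low) if low else 0        # <m> defaults to zero
        maximum = int(high) if high else None   # <n> defaults to infinity
    else:
        # No "*": <m> and <n> are equal; a bare thing occurs exactly once.
        minimum = maximum = int(low) if low else 1
    return minimum, maximum, thing
```

     For the examples in the text, "1*word" yields a
     minimum of 1 and no maximum, and "2word" yields a
     minimum and maximum of 2.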
 
The  character  "\"  is not special in any way in this nota-
tion.
 
This Draft is intended  to  be  self-contained;  all  syntax
rules  used in it are defined within it, and a rule with the
same name as one found in MAIL does not necessarily have the
same  definition.   The lexical layer of MAIL is NOT, repeat
NOT, used in this  Draft,  and  its  presence  must  not  be
assumed;  notably,  this  Draft  spells out all places where
white space is permitted/required and all places where  con-
structs resembling MAIL comments can occur.
 
     NOTE:  News  parsers  historically  have been much
     less permissive than MAIL parsers.
 
 
 2.3. Definitions
The term "character set", wherever it is used in this Draft,
refers to a coded character set, in the sense of ISO charac-
ter set standardization work, and must not be misinterpreted
as meaning merely "a set of characters".
 
In this Draft, ASCII character 32 is referred to as "blank";
the word "space" has a more generic meaning.
 
An "article" is the unit of news, analogous to a MAIL  "mes-
sage".
 
A "poster" is a human being (or software equivalent) submit-
ting a  possibly-compliant  article  to  be  "posted":  made
available  for  reading  on  all relevant hosts.  A "posting
agent" is software that assists posters to prepare articles,
including  determining  whether the final article is compli-
ant, passing it on to a  relayer  for  posting  if  so,  and
returning  it  to  the poster with an explanation if not.  A
"relayer" is  software  which  receives  allegedly-compliant
articles  from  posting  agents and/or other relayers, files
copies in a "news database", and possibly passes  copies  on
to other relayers.
 
     NOTE:  While  the  same software may well function
     both as a relayer and as part of a posting  agent,
     the  two  functions are distinct and should not be
     confused.  The  posting  agent's  purpose  is  (in
     part) to validate an article, supply header infor-
     mation that can or should  be  supplied  automati-
     cally, and generally take reasonable actions in an
     attempt to transform the poster's submission  into
     a  compliant article.  The relayer's purpose is to
     move already-compliant articles around efficiently
     without damaging them.
 
A "reader" is a human being reading news articles.  A "read-
ing agent" is software which presents articles to a  reader.
 
     NOTE:  Informal usage often uses "reader" for both
     these meanings, but this  introduces  considerable
     potential  for  confusion and misunderstanding, so
     this Draft takes care to make the distinction.
 
A "newsgroup" is a single news  forum,  a  logical  bulletin
board,  having a name and nominally intended for articles on
a specific topic.  An article is "posted to" a single  news-
group  or  several newsgroups.  When an article is posted to
more than one newsgroup, it is said  to  be  "cross-posted";
note that this differs from posting the same text as part of
each of several articles, one per newsgroup.  A  "hierarchy"
is  the set of all newsgroups whose names share a first com-
ponent (see the name syntax in section 5.5).
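
     NOTE: A minimal Python sketch of the "hierarchy"
     definition follows. It assumes that name components
     are separated by "." (the name syntax itself is
     given in section 5.5); the function names are
     hypothetical.

```python
def hierarchy(newsgroup):
    """Return the first component of a newsgroup name."""
    return newsgroup.split(".", 1)[0]

def same_hierarchy(first, second):
    """True if both newsgroups share a first component.

    The comparison is case-sensitive, following section 2.5.
    """
    return hierarchy(first) == hierarchy(second)
```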
 
A newsgroup may be "moderated", in  which  case  submissions
are  not  posted  directly,  but mailed to a "moderator" for
consideration and possible posting.   Moderators  are  typi-
cally  human but may be implemented partially or entirely in
software.
 
A "followup" is an article containing a response to the con-
tents of an earlier article (the followup's "precursor").  A
"followup agent" is a combination of reading agent and post-
ing agent that aids in the preparation and posting of a fol-
lowup.
 
Text  comparisons  are  "case-sensitive"  if  they  consider
uppercase  letters  (e.g. "A") different from lowercase let-
ters (e.g. "a"), and "case-insensitive" if letters differing
only  in  case  (e.g. "A" and "a") are considered identical.
Categories of text are said to be case-(in)sensitive if com-
parisons of such texts to others are case-(in)sensitive.
 
A  "cooperating  subnet"  is  a set of news-exchanging hosts
which is sufficiently well-coordinated (typically via a cen-
tral  administration of some sort) that stronger assumptions
can be made about hosts in the set than about news hosts  in
general.  This is typically used to relax restrictions which
are otherwise required for worst-case interoperability; mem-
bers  of  a cooperating subnet MAY interchange articles that
do not conform to this Draft's specifications, provided  all
members  have  agreed  to this and provided the articles are
not permitted to leak out of the subnet.  The word  "subnet"
is  used to emphasize that a cooperating subnet is typically
not an isolated universe; care must be  taken  that  traffic
leaving  the  subnet  complies  with the restrictions of the
larger net, not just those of the cooperating subnet.
 
A "message ID" is a unique identifier for an  article,  usu-
ally supplied by the posting agent which posted it.  It dis-
tinguishes the article from every other article ever  posted
anywhere (in theory).  Articles with the same message ID are
treated as identical copies of the same article even if they
are not in fact identical.
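
     NOTE: One possible way for a relayer to apply this
     rule is to remember the message IDs it has already
     filed. The Python sketch below is an assumption
     about implementation technique, not a requirement;
     the class name History and its method are
     hypothetical, and message IDs are compared exactly
     as written.

```python
class History:
    """Remembers message IDs so that repeat copies can be discarded."""

    def __init__(self):
        self._seen = set()

    def offer(self, message_id):
        """Return True if the article is new and should be filed."""
        if message_id in self._seen:
            # Same message ID: treated as a copy of the same article,
            # whatever its contents actually are.
            return False
        self._seen.add(message_id)
        return True
```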
 
A  "gateway"  is  software  which receives news articles and
converts them to messages of some other kind (e.g. mail to a
mailing list), or vice-versa; in essence it is a translating
relayer that straddles boundaries between different  methods
of  message  exchange.   The  most  common  type  of gateway
connects newsgroup(s) to mailing list(s),  either  unidirec-
tionally  or  bidirectionally,  but  there are also gateways
between news networks using this  Draft's  news  format  and
those using other formats.
 
A  "control  message"  is an article which is marked as con-
taining control information; a  relayer  receiving  such  an
article  will  (subject  to  permissions  etc.) take actions
beyond just filing and passing on the article.
 
     NOTE: "Control article" would be  more  consistent
     terminology, but "control message" is already well
     established.
 
An article's "reply address" is the address to which  mailed
replies  should  be  sent.  This is the address specified in
the article's From header (see section 5.2), unless it  also
has a Reply-To header (see section 6.3).
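
     NOTE: The selection rule can be sketched in Python
     as follows. The representation of headers as a list
     of (name, content) pairs and the function name are
     assumptions made for illustration; the function
     returns the header content and does not attempt to
     parse an address out of it.

```python
def reply_address_header(headers):
    """Return the content of Reply-To if present, else of From."""
    # Header names are case-insensitive, so compare in lowercase.
    found = {name.lower(): content for name, content in headers}
    if "reply-to" in found:
        return found["reply-to"]
    return found["from"]
```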
 
The  notation  (e.g.)  "(ASCII  17)"  following a name means
"this name refers to the ASCII character having  value  17".
An  "ASCII printable character" is an ASCII character in the
range 33-126.  An "ASCII  control  character"  is  an  ASCII
character  in  the  range  0-31, or the character DEL (ASCII
127).  A "non-ASCII character" is a character having a value
exceeding 127.
 
     NOTE: Blank is neither an "ASCII printable charac-
     ter" nor an "ASCII control character".
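
     NOTE: The character classes defined above can be
     summarized by the following Python sketch. The
     function name and the returned labels are
     illustrative only.

```python
def classify(character):
    """Classify one character using this Draft's terminology."""
    value = ord(character)
    if value > 127:
        return "non-ASCII"
    if value <= 31 or value == 127:
        return "control"
    if value == 32:
        return "blank"      # neither printable nor control
    return "printable"      # remaining range, 33-126
```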
 
 
 2.4. End Of Line
How the end of a text line is  represented  depends  on  the
context  and  the implementation.  For Internet transmission
via protocols such as SMTP [rrr], an  end-of-line  is  a  CR
(ASCII  13)  followed  by an LF (ASCII 10).  ISO C [rrr] and
many modern operating systems indicate  end-of-line  with  a
single  character,  typically  ASCII LF (aka "newline"), and
this is the normal convention when news is  transmitted  via
UUCP.  A variety of other methods are in use, including out-
of-band methods in which there is no specific character that
means end-of-line.
 
This Draft does not constrain how end-of-line is represented
in news, except that characters other than CR  and  LF  MUST
not  be  usurped  for  use  in  end-of-line representations.
Also, obviously, all software dealing with a particular copy
of  an  article  must  agree  on  the convention to be used.
"EOL" is used to mean "whatever  end-of-line  representation
is  appropriate";  it  is  not  necessarily  a  character or
sequence of characters.
 
     NOTE: If faced with picking an EOL  representation
     in the absence of other constraints, use of a sin-
     gle character simplifies processing, and the ASCII
     standard  [rrr] specifies that if one character is
     to be used for  this  purpose,  it  should  be  LF
     (ASCII 10).
 
     NOTE:  Inside  MIME encodings, use of the Internet
     canonical EOL representation (CR followed  by  LF)
     is mandatory.  See [rrr].
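
     NOTE: As a sketch of one common arrangement,
     software that stores articles with a single LF as
     EOL (an assumption, not a requirement of this
     Draft) might convert to and from the CR LF form
     used on Internet transports as shown below. The
     function names are hypothetical.

```python
def to_wire(text):
    """Convert local form (LF) to Internet form (CR LF)."""
    # Normalize first so text already in wire form is not doubled.
    return text.replace("\r\n", "\n").replace("\n", "\r\n")

def from_wire(text):
    """Convert Internet form (CR LF) to local form (LF)."""
    return text.replace("\r\n", "\n")
```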
 
 
 2.5. Case-Sensitivity
Text  in  newsgroup  names, header parameters, etc. is case-
sensitive unless stated otherwise.
 
     NOTE: This is at  variance  with  MAIL,  which  is
     case-insensitive  unless  stated otherwise, but is
     consistent  with  news  historical  practice   and
     existing news software.  See the comments on back-
     ward compatibility in section 1.
 
 
 2.6. Language
Various constant strings in this Draft, such as header names
and  month  names,  are derived from English words.  Despite
their derivation, these words do NOT change when the  poster
or  reader employing them is interacting in a language other
than English.  Posting and reading agents  SHOULD  translate
as  appropriate  in  their  interaction  with  the poster or
reader, but the forms that actually appear in  articles  are
always the English-derived ones defined in this Draft.
 
 
 
 3. Relation To MAIL (RFC 822 etc.)
The  primary  intent of this Draft is to completely describe
the news article format as a subset of MAIL's message format
augmented by some new headers.  Unless explicitly noted oth-
erwise, the intent throughout is that an article  MUST  also
be a valid MAIL message.
 
     NOTE:  Despite  obvious  similarities between news
     and mail, opinions vary on whether it is  possible
     or  desirable to unify them into a single service.
     However, it is unquestionably  both  possible  and
     useful to employ some of the same tools for manip-
     ulating both mail messages and news  articles,  so
     there  is specific advantage to be had in defining
     them compatibly.  Furthermore, there is no  appar-
     ent need to re-invent the wheel when slight exten-
     sions to an existing definition will suffice.
 
Given that this Draft  attempts  to  be  self-contained,  it
inevitably  contains  considerable repetition of information
found in MAIL.  This raises the possibility of unintentional
conflicts.  Unless specifically noted otherwise, any wording
in this Draft which  permits  behavior  that  is  not  MAIL-
compliant  is  erroneous  and should be followed only to the
extent that the result remains compliant with MAIL.
 
     NOTE: RFC 1036 said "where this standard conflicts
     with  [RFC 822], RFC-822 should be considered cor-
     rect and this standard in  error".   Taken  liter-
     ally, this was obviously incorrect, since RFC 1036
     imposed a number of restrictions not found in  RFC
     822.   The  intent,  however,  was  reasonable: to
     indicate  that  UNINTENTIONAL   differences   were
     errors in RFC 1036.
 
Implementors and users should note that MAIL is deliberately
an extensible standard, and most extensions devised for mail
are  also relevant to (and compatible with) news.  Note par-
ticularly MIME [rrr],  summarized  briefly  in  appendix  B,
which extends MAIL in a number of useful ways that are defi-
nitely relevant to news.   Also  of  note  is  the  work  in
progress  on  reconciling  PEM (Privacy Enhanced Mail, which
defines extensions for  authentication  and  security)  with
MIME, after which this may also be relevant to news.
 
     UNRESOLVED ISSUE: Update the MIME/PEM information.
 
Similarly, descriptions here of MIME  facilities  should  be
considered  correct  only  to  the  extent  that they do not
require or legitimize practices  that  would  violate  those
RFCs.   (Note that this Draft does extend the application of
some MIME facilities, but this is an extension  rather  than
an alteration.)
 
 
 
 4. Basic Format
 
 4.1. Overall Syntax
The overall syntax of a news article is:
 
     article         = 1*header separator body
     header          = start-line *continuation
     start-line      = header-name ":" space [ nonblank-text ] eol
     continuation    = space nonblank-text eol
     header-name     = 1*name-character *( "-" 1*name-character )
     name-character  = letter / digit
     letter          = <ASCII letter A-Z or a-z>
     digit           = <ASCII digit 0-9>
     separator       = eol
     body            = *( [ nonblank-text / space ] eol )
     eol             = <EOL>
     nonblank-text   = [ space ] text-character *( space-or-text )
     text-character  = <any ASCII character except NUL (ASCII 0),
                         HT (ASCII 9), LF (ASCII 10), CR (ASCII 13),
                         or blank (ASCII 32)>
     space           = 1*( <HT (ASCII 9)> / <blank (ASCII 32)> )
     space-or-text   = space / text-character

An  article consists of some headers followed by a body.  An
empty line separates the two.  The  headers  contain  struc-
tured information about the article and its transmission.  A
header begins with a header name identifying it, and can  be
continued  onto  subsequent lines by beginning the continua-
tion line(s) with white space.   (Note  that  section  4.2.3
adds some restrictions to the header syntax indicated here.)
The body is largely-unstructured text  significant  only  to
the poster and the readers.
 
     NOTE:  Terminology here follows the current custom
     in the news community, rather than the  MAIL  con-
     vention  of  (sometimes) referring to what is here
     called a "header" as a "header field" or  "field".
 
Note that the separator line must be truly empty, not just a
line containing white space.  Further empty lines  following
it  are  part  of the body, as are empty lines at the end of
the article.
 
     NOTE: Some systems  make  no  distinction  between
     empty lines and lines consisting entirely of white
     space;  indeed,  some  systems  cannot   represent
     entirely  empty  lines.  The grammar's requirement
     that header continuation lines contain some print-
     able  text is meant to ensure that the empty/space
     distinction cannot confuse identification  of  the
     separator line.
 
     NOTE:  It  is tempting to authorize posting agents
     to strip empty lines at the beginning and  end  of
     the  body,  but such empty lines could possibly be
     part of a preformatted document.
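
     NOTE: The division of an article at the separator
     line can be sketched in Python as follows. The
     sketch assumes that EOL is represented by a single
     LF; the function name is hypothetical. Only a truly
     empty line is taken as the separator, and any
     further empty lines are left in the body.

```python
def split_article(article):
    """Return (header_lines, body) for an article using LF as EOL."""
    # The separator is an EOL immediately following the EOL that ends
    # the last header line, i.e. the first occurrence of two LFs.
    head, separator, body = article.partition("\n\n")
    if not separator:
        raise ValueError("no separator line found")
    return head.split("\n"), body
```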
 
Implementors are warned that trailing white  space,  whether
alone  on  the  line or not, MAY be significant in the body,
notably in early versions of  the  "uuencode"  encoding  for
binary  data.  Trailing white space MUST be preserved unless
the article is known to have originated within a cooperating
subnet  that  avoids using significant trailing white space,
and SHOULD be preserved regardless.   Posters  SHOULD  avoid
using  conventions  or  encodings  which make trailing white
space significant;  for  encoding  of  binary  data,  MIME's
"base64"  encoding  is recommended.  Implementors are warned
that ISO C implementations  are  not  required  to  preserve
trailing  white space, and special precautions may be neces-
sary in implementations which do not.
 
     NOTE: Unfortunately, the signature-delimiter  con-
     vention (described in section 4.3.2) does use sig-
     nificant trailing white space.  It's too  late  to
     fix  this;  there  is work underway on defining an
     organized signature convention as  part  of  MIME,
     which is a preferable solution in the long run.
 
Posters  are warned that some very old relayer software mis-
behaves when the first non-empty line  of  an  article  body
begins with white space.
 
 4.2. Headers
 
 4.2.1. Names and Contents
Despite  the  restrictions  on header-name syntax imposed by
the grammar, relayers and  reading  agents  SHOULD  tolerate
header  names containing any ASCII printable character other
than colon (":", ASCII 58).
 
     NOTE: MAIL header  names  can  contain  any  ASCII
     printable  character (other than colon) in theory,
     but in practice, arbitrary header names are  known
     to  cause trouble for some news software.  Section
     4.1's restriction to alphanumeric sequences  sepa-
     rated by hyphens is believed to permit all widely-
     used header names without causing problems for any
     widely-used  software.   Software  is nevertheless
     encouraged to cope correctly with the  full  range
     of  possibilities,  since aberrations are known to
     occur.
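
     NOTE: The difference between the strict grammar of
     section 4.1 and the wider range that relayers and
     reading agents are asked to tolerate can be
     expressed as two patterns. This Python sketch is
     illustrative only; the names are hypothetical.

```python
import re

# Strict form: alphanumeric sequences separated by single hyphens.
STRICT_NAME = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")

# Tolerant form: any ASCII printable character (33-126) except
# colon (58), written here as the ranges 33-57 and 59-126.
TOLERANT_NAME = re.compile(r"[\x21-\x39\x3b-\x7e]+")

def is_strict_name(name):
    return STRICT_NAME.fullmatch(name) is not None

def is_tolerable_name(name):
    return TOLERANT_NAME.fullmatch(name) is not None
```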
 
Relayers MUST disregard headers not described in this  Draft
(that  is,  with  header names not mentioned in this Draft),
and pass them on unaltered.
 
Posters wishing to convey non-standard information in  head-
ers  SHOULD  use header names beginning with "X-".  No stan-
dard header name will ever be of this form.  Reading  agents
SHOULD  ignore  "X-"  headers,  or  at least treat them with
great care.
 
The order of headers in an article is not significant.  How-
ever, posting agents are encouraged to put mandatory headers
(see section 5) first, followed  by  optional  headers  (see
section 6), followed by headers not defined in this Draft.
 
     NOTE:  While  relayers  and reading agents must be
     prepared to handle any order, having the  signifi-
     cant  headers (the precise definition of "signifi-
     cant" depends on  context)  first  can  noticeably
     improve  efficiency,  especially in memory-limited
     environments where it is difficult to buffer up an
     arbitrary  quantity of headers while searching for
     the few that matter.
 
Header names are case-insensitive.   There  is  a  preferred
case  convention,  which  posters  and posting agents SHOULD
use: each hyphen-separated "word" has its initial letter (if
any)  in  uppercase  and  the rest in lowercase, except that
some abbreviations have all letters  uppercase  (e.g.  "Mes-
sage-ID"  and "MIME-Version").  The forms used in this Draft
are the preferred forms for the  headers  described  herein.
Relayers  and  reading agents are warned that articles might
not obey this convention.
 
     NOTE: Although software must be prepared  for  the
     possibility  of random use of case in header names
     (and other case-independent text), establishing  a
     preferred  convention reduces pointless diversity,
     and may permit optimized software that  looks  for
     the  preferred  forms  before  resorting  to less-
     efficient case-insensitive searches.
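
     NOTE: A posting agent might produce the preferred
     forms along the following lines. This Python sketch
     is illustrative; the table of abbreviations holds
     only the two examples given above and is not
     claimed to be complete, and the function names are
     hypothetical.

```python
# Abbreviations written entirely in uppercase (illustrative, not exhaustive).
ABBREVIATIONS = {"id": "ID", "mime": "MIME"}

def preferred_case(name):
    """Return a header name in the preferred case convention."""
    words = []
    for word in name.split("-"):
        lowered = word.lower()
        if lowered in ABBREVIATIONS:
            words.append(ABBREVIATIONS[lowered])
        else:
            # Initial letter (if any) uppercase, the rest lowercase.
            words.append(lowered[:1].upper() + lowered[1:])
    return "-".join(words)

def same_header_name(first, second):
    """Header names are compared without regard to case."""
    return first.lower() == second.lower()
```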
 
In general, a header can consist of several lines, with each
continuation line beginning with white space.  The EOLs pre-
ceding continuation lines are ignored when processing such a
header, effectively combining the start-line and the contin-
uations into a single logical line.  The logical line,  less
the  header  name,  colon, and any white space following the
colon, is the "header content".
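
     NOTE: The construction of logical lines and header
     contents can be sketched in Python as follows. The
     input is assumed to be the physical header lines
     with their EOLs already removed; the function name
     is hypothetical.

```python
def unfold(header_lines):
    """Return a list of (header name, header content) pairs."""
    logical = []
    for line in header_lines:
        if logical and line[:1] in (" ", "\t"):
            # Continuation: only the preceding EOL is dropped; the
            # leading white space stays part of the logical line.
            logical[-1] += line
        else:
            logical.append(line)
    pairs = []
    for line in logical:
        name, _, rest = line.partition(":")
        # Content is the logical line less the name, the colon, and
        # any white space following the colon.
        pairs.append((name, rest.lstrip(" \t")))
    return pairs
```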
 
 4.2.2. Undesirable Headers
A header whose content is empty  is  said  to  be  an  empty
header.   Relayers  and  reading  agents SHOULD not consider
presence or absence of an empty header to alter  the  seman-
tics  of  an  article  (although  syntactic  rules,  such as
requirements that certain header names appear at  most  once
in  an  article,  MUST  still be satisfied).  Posting agents
SHOULD delete empty headers  from  articles  before  posting
them.

Headers that merely state defaults explicitly (e.g., a
Followup-To header with the same content as the Newsgroups
header,   or   a  MIME  Content-Type  header  with  contents
"text/plain; charset=us-ascii") or  state  information  that
reading  agents  can  typically  determine easily themselves
(e.g. the length of the body in octets) are redundant,  con-
veying no information whatsoever.  Headers that state infor-
mation which cannot possibly be of use to a significant num-
ber  of relayers, reading agents, or readers (e.g., the name
of the software package used as the posting agent) are  use-
less and pointless.  Posters and posting agents SHOULD avoid
including redundant or useless headers in articles.
 
     NOTE: Information that someone,  somewhere,  might
     someday  find useful is best omitted from headers.
     (There's quite enough of it  in  article  bodies.)
     Headers  should contain information of known util-
     ity only.  This is not meant to preclude inclusion
     of  information  primarily meant for news-software
     debugging, but such information should be included
     only  if there is real reason, preferably based on
     experience, to suspect that it  may  be  genuinely
     useful.  Articles passing through gateways are the
     only obvious case  where  inclusion  of  debugging
     information appears clearly legitimate.  (See sec-
     tion 10.1.)
 
     NOTE: A useful rule of thumb for  software  imple-
     mentors  is:  "if  I had to pay a dollar a day for
     the transmission of this  header,  would  I  still
     think it worthwhile?".
 
 4.2.3. White Space and Continuations
The  colon  following the header name on the start-line MUST
be followed by white space, even if the header is empty.  If
the  header  is not empty, at least some of the content MUST
appear on the start-line.  Posting agents MUST enforce these
restrictions,  but  relayers (etc.) SHOULD accept even arti-
cles that violate them.
 
     NOTE: MAIL does not require white space after  the
     colon,  but  it  is  usual.  RFC 1036 required the
     white space,  even  in  empty  headers,  and  some
     existing   software  demands  it.   In  MAIL,  and
     arguably in RFC  1036  (although  the  wording  is
     vague), it is technically legitimate for the white
     space to be part of  a  continuation  line  rather
     than the start-line, but not all existing software
     will accept  this.   Deleting  empty  headers  and
     placing some content on the start-line avoids this
     issue...  which  is  desirable  because   trailing
     blanks,  easily  deleted by accident, are best not
     made significant in headers.
 
In general, posters and  posting  agents  SHOULD  use  blank
(ASCII  32), not tab (ASCII 9), where white space is desired
in headers.  Existing software does not consistently  accept
tab  as  synonymous with blank in all contexts.  In particu-
lar, RFC 1036 appeared to specify that the character immedi-
ately  following  the colon after a header name was required
to be a blank, and some news software insists  on  that,  so
this  character MUST be a blank.  Again, posting agents MUST
enforce these restrictions but relayers SHOULD be more  tol-
erant.
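
     NOTE: The checks that a posting agent is required
     to make on a start-line might look like the
     following Python sketch. The function name, the
     has_continuation parameter, and the wording of the
     problem reports are hypothetical. Relayers, being
     asked to be more tolerant, would not reject
     articles on these grounds.

```python
def check_start_line(start_line, has_continuation=False):
    """Return a list of problems found in one header start-line.

    start_line is the physical line without its EOL;
    has_continuation says whether continuation lines follow it.
    """
    name, colon, rest = start_line.partition(":")
    if not colon:
        return ["no colon after header name"]
    problems = []
    if not rest.startswith(" "):
        # The character right after the colon must be a blank
        # (ASCII 32); a tab is not acceptable in this position.
        problems.append("colon not followed by a blank")
    if has_continuation and rest.strip(" \t") == "":
        # A non-empty header must have some content on the start-line.
        problems.append("no content on the start-line")
    return problems
```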
 
Since  the white space beginning a continuation line remains
a part of the logical line, headers  can  be  "broken"  into
multiple  lines  only at white space.  Posting agents SHOULD
not break headers unnecessarily.  Relayers  SHOULD  preserve
existing header breaks, and SHOULD not introduce new breaks.
Breaking headers SHOULD be a last resort; relayers and read-
ing agents SHOULD handle long header lines gracefully.  (See
the discussion of size limits in section 4.6.)
 
 4.3. Body
Although the article body is unstructured for  most  of  the
purposes  of  this  Draft, structure MAY be imposed on it by
other means, notably MIME headers (see appendix B).
 
 4.3.1. Body Format Issues
The body of an article MAY be empty, although posting agents
SHOULD  consider this an error condition (meriting returning
the article to the poster for revision).   A  posting  agent
which does not reject such an article SHOULD issue a warning
message to the poster and supply  a  non-empty  body.   Note
that  the separator line MUST be present even if the body is
empty.
 
     NOTE: An empty body is  probably  a  poster  error
     except, arguably, for some control messages... and
     even they really ought to have a  body  explaining
     the  reason  for  the  control  message.  Some old
     reading agents are known to generate empty  bodies
     for  "cancel"  control messages, so posting agents
     might opt not to reject body-less articles in such
     cases  (although  it  would  be  better to fix the
     reading agents to request a body).  However,  some
     existing  news software is known to react badly to
     body-less articles, hence the request for  posting
     agents to insert a body in such cases.
 
     NOTE:  A possible posting-agent-supplied body text
     (already used by one widespread posting agent)  is
     "This  article  was  probably generated by a buggy
     news reader.".  (The use of "reader" to  refer  to
     the  reading  agent  is traditional, although this
     Draft uses more precise terminology.)
 
     NOTE: The requirement for the separator line  even
     in  a bodyless article is inherited from MAIL, and
     also distinguishes legitimately-bodyless  articles
     from articles accidentally truncated in the middle
     of the headers.
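
A posting agent's handling of body-less articles might look
something like the following sketch.  It assumes the article
is held as a Python string with "\n" as the local EOL; the
function and constant names are hypothetical, and the filler
text is the one quoted in the NOTE above.

```python
FILLER_BODY = "This article was probably generated by a buggy news reader.\n"


def split_article(text):
    """Split an article into (headers, body) at the separator line.

    The first empty line is the separator.  Its absence raises
    ValueError, since the article may have been truncated in the
    middle of the headers.
    """
    headers, separator, body = text.partition("\n\n")
    if not separator:
        raise ValueError("separator line missing")
    return headers + "\n", body


def ensure_body(text):
    """Return the article, with a filler body supplied if it had none."""
    headers, body = split_article(text)
    if body == "":
        # A real posting agent would also warn the poster here.
        return headers + "\n" + FILLER_BODY
    return text
```

Note how the separator line lets the code tell an empty body
(accepted, then filled in) from a truncated article (refused).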
 
Note that an article body is a sequence of lines  terminated
by  EOLs,  not  arbitrary  binary data, and in particular it
MUST end with an EOL.  However, relayers  SHOULD  treat  the
body  of  an  article as an uninterpreted sequence of octets
(except as mandated by changes of EOL representation and  by
control-message  processing)  and SHOULD avoid imposing con-
straints on it.  See also section 4.6.
 
4.3.2. Body Conventions
Although body lines can in principle be very long (see  sec-
tion  4.6  for  some  discussion  of length limits), posters
SHOULD restrict body line lengths to circa 70-75 characters.
On  systems  where  text  is conventionally stored with EOLs
only at paragraph breaks and  other  "hard  return"  points,
with  software  breaking lines as appropriate for display or
manipulation, posting agents SHOULD insert EOLs as necessary
so that posted articles comply with this restriction.
 
     NOTE:  News  originated in environments where line
     breaks in plain text files were  supplied  by  the
     user, not the software.  Be this good or bad, much
     reading-agent and posting-agent  software  assumes
     that  news  articles follow this convention, so it
     is often inconvenient to read or respond to  arti-
     cles  which  violate it.  The "70-75" number comes
     from the widespread use of display  devices  which
     are 80 columns wide, and the desire to leave a bit
     of margin for quoting etc. (see below).
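
On a system that stores one paragraph per line, a posting
agent could insert the EOLs with something like the sketch
below, which uses the textwrap module of the Python standard
library.  The function name and the width of 72 (a value
inside the 70-75 range) are assumptions for the example.

```python
import textwrap


def wrap_body(text, width=72):
    """Insert EOLs so that lines stay within `width` where possible.

    Lines already short enough are left untouched.  Words longer
    than `width` are not split, so such a line may still be long.
    """
    out = []
    for line in text.split("\n"):
        if len(line) <= width:
            out.append(line)
        else:
            wrapped = textwrap.wrap(
                line,
                width=width,
                break_long_words=False,
                break_on_hyphens=False,
            )
            out.extend(wrapped or [""])
    return "\n".join(out)
```

This sketch treats every long line as ordinary prose; a real
agent would want to treat quoted lines and tables with more
care.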
 
Reading agents confronted with body lines much  longer  than
the  available  output-device  width  SHOULD  break lines as
appropriate.  Posters are warned that such  breaks  may  not
occur exactly where the poster intends.
 
     NOTE:  "As  appropriate"  would  typically include
     breaking lines when supplying the text of an arti-
     cle to be quoted in a reply or followup, something
     that line-breaking reading agents often neglect to
     do now.
 
Although  styles  vary widely, for plain text it is usual to
use no left margin, leave the right edge ragged, use a  sin-
gle  empty  line  to  separate paragraphs, and employ normal
natural-language usage on matters such  as  upper/lowercase.
(In  particular,  articles SHOULD not be written entirely in
uppercase.  In environments where posters have  access  only
to  uppercase,  posting agents SHOULD translate it to lower-
case.)
 
     NOTE: Most people find substantial bodies of  text
     entirely  in  uppercase  relatively  hard to read,
     while all-lowercase  text  merely  looks  slightly
     odd.   The  common  association  of uppercase with
     strong emphasis adds to this.
 
Tone of voice does not carry well in written text, and  mis-
understandings are common when sarcasm, parody, or exaggera-
tion for humorous effect is attempted without explicit warn-
ing.   It has become conventional to use the sequence ":-)",
which (on most output devices) resembles a  rotated  "smiley
face"  symbol,  as  a  marker for text not meant to be taken
literally, especially when humor is intended.  This practice
aids  communication  and averts unintended ill-will; posters
are urged to use it.  A variety of analogous  sequences  are
used with less-standardized meanings [Sanderson].
 
The  order  of arrival of news articles at a particular host
depends somewhat on  transmission  paths,  and  occasionally
articles are lost for various reasons.  When responding to a
previous article, posters SHOULD not assume that all readers
understand the exact context.  It is common to quote some of
the previous article to establish context.  This  SHOULD  be
done  by  prefacing  each  quoted line (even if it is empty)
with the character ">".  This will result in multiple levels
of ">" when quoted context itself contains quoted context.
 
     NOTE:  It  may seem superfluous to put a prefix on
     empty lines, but it simplifies  implementation  of
     functions  such as "skip all quoted text" in read-
     ing agents.
 
Readability is enhanced if quoted text and new text are sep-
arated by an empty line.
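
The quoting convention is simple enough to sketch in a few
lines of Python.  The function names are hypothetical, and
the body is assumed to be a string using "\n" as EOL and
ending with an EOL.

```python
def body_lines(body):
    """Split a body into lines, dropping the EOLs."""
    lines = body.split("\n")
    if lines[-1] == "":
        lines.pop()  # the body ends with an EOL; no phantom last line
    return lines


def quote(body):
    """Prefix every line, empty ones included, with ">"."""
    return "".join(">" + line + "\n" for line in body_lines(body))


def unquoted_lines(body):
    """Return only the lines that are not quoted text."""
    return [line for line in body_lines(body) if not line.startswith(">")]
```

Because empty quoted lines carry the prefix too, "skip all
quoted text" reduces to a test on the first character.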
 
Posters  SHOULD  edit  quoted context to trim it down to the
minimum  necessary.   However,  posting  agents  SHOULD  not
attempt  to enforce this by imposing overly-simplistic rules
like "no more than 50% of the lines should be quotes".
 
     NOTE: While encouraging trimming is desirable, the
     50%  rule  imposed  by  some old posting agents is
     both inadequate and counterproductive.  Posters do
     not  respond  to  it by being more selective about
     quoting; they respond by padding short  responses,
     or  by  using  different  quoting styles to defeat
     automatic analysis.  The former  adds  unnecessary
     noise  and  volume,  while the latter also defeats
     more useful forms of automatic analysis that read-
     ing agents might wish to do.
 
     NOTE:  At  the  very  least, if a minimum-unquoted
     quota is being set, article  bodies  shorter  than
     (say)  20  lines, or perhaps articles which exceed
     the quota by only a few lines, should  be  exempt.
     This  avoids the ridiculous situation of complain-
     ing about a 5-line response to a 6-line quote.
 
     NOTE: A more subtle posting-agent rule,  suggested
     for  experimental  use, is to reject articles that
     appear to contain quoted signatures  (see  below).
     A quoted signature is almost certainly the  result
     of  a  careless  poster  not bothering to trim the
     quoted context.
     Also,  if  a  posting agent or followup agent pre-
     sents an article template to the poster for  edit-
     ing,  it  really  should  take note of whether the
     poster actually made any changes, and refrain from
     posting an unmodified template.
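
If a posting agent does set a quota despite the advice above,
the exemptions suggested in the NOTEs might be applied as in
this sketch.  The function name, the parameter names, and the
slack of 3 lines are assumptions; the 50% figure and the
20-line threshold are the ones mentioned above.

```python
def over_quota(lines, max_fraction=0.5, exempt_below=20, slack=3):
    """Decide whether a body (a list of lines) exceeds a quoting quota.

    Bodies shorter than `exempt_below` lines are always exempt, and
    a body may exceed the quota by up to `slack` lines.
    """
    if len(lines) < exempt_below:
        return False
    quoted = sum(1 for line in lines if line.startswith(">"))
    return quoted > int(len(lines) * max_fraction) + slack
```

Even with these exemptions, such a check is better used to
prompt a warning than to refuse the article.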
 
Some  followup  agents supply "attribution" lines for quoted
context, indicating where it first appeared and under  whose
name.   When  multiple  levels  of  quoting  are present and
quoted context is edited for  brevity,  "inner"  attribution
lines  are not always retained.  The editing process is also
somewhat error-prone.   Reading  agents  (and  readers)  are
warned not to assume that attributions are accurate.
 
     UNRESOLVED  ISSUE:  Should  a  standard format for
     attribution lines be defined?   There  is  already
     considerable diversity... but automatic news anal-
     ysis would be substantially aided  by  a  standard
     convention.
 
Early  difficulties in inferring return addresses from arti-
cle headers led to "signatures": short closing texts,  auto-
matically  added  to  the end of articles by posting agents,
identifying the poster and giving  network  addresses,  etc.
If  a  poster or posting agent does append a signature to an
article, the signature SHOULD be preceded with  a  delimiter
line  containing  (only)  two hyphens (ASCII 45) followed by
one blank (ASCII  32).   Posting  agents  SHOULD  limit  the
length  of  signatures,  since  verbose  excess bordering on
abuse is common if no restraint is imposed;  4  lines  is  a
common limit.
 
     NOTE:  While  signatures  are  arguably a blemish,
     they are a well-understood convention, and convey-
     ing  the same information in headers exposes it to
     mangling and makes it rather less conspicuous.   A
     standard  delimiter  line  makes  it  possible for
     reading agents to handle signatures  specially  if
     desired.    (This  is  unfortunately  hampered  by
     extensive misunderstanding of, and misuse of,  the
     delimiter.)
 
     NOTE: The choice of delimiter is somewhat unfortu-
     nate, since it relies on preservation of  trailing
     white  space,  but  it  is too well-established to
     change.  There is work underway to define  a  more
     sophisticated  signature  scheme  as part of MIME,
     and this will  presumably  supersede  the  current
     convention in due time.
 
     NOTE:  Four  75-column  lines of signature text is
     300 characters, which is ample to convey name  and
     mail-address  information  in  all  but  the  most
     bizarre situations.
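
The delimiter convention could be handled as in the sketch
below, which works on a body already split into lines
without EOLs.  The function names are hypothetical, and the
choice of the LAST delimiter line as the start of the
signature is an assumption, not something this Draft states.

```python
SIG_DELIMITER = "-- "  # two hyphens followed by one blank


def split_signature(lines):
    """Return (text_lines, signature_lines) for a list of body lines."""
    for i in range(len(lines) - 1, -1, -1):
        if lines[i] == SIG_DELIMITER:
            return lines[:i], lines[i + 1:]
    return lines, []


def append_signature(lines, signature, max_lines=4):
    """Append a signature, refusing one longer than `max_lines`."""
    if len(signature) > max_lines:
        raise ValueError("signature exceeds %d lines" % max_lines)
    return lines + [SIG_DELIMITER] + signature
```

The comparison is exact on purpose: a line of two hyphens
with no trailing blank is not a delimiter, so software that
strips trailing white space defeats the convention.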
 
4.4. Characters And Character Sets
Header and body lines MAY contain any ASCII characters other
than CR (ASCII 13), LF (ASCII 10), and NUL (ASCII 0).
 
     NOTE:  CR  and  LF are excluded because they clash
     with common  EOL  conventions.   NUL  is  excluded
     because  it  clashes with the C end-of-string con-
     vention, which is  significant  to  most  existing
     news   software.    These   three  characters  are
     unlikely to be transmitted successfully.
 
However, posters SHOULD avoid using ASCII control characters
except for tab (ASCII 9), formfeed (ASCII 12), and backspace
(ASCII 8).  Tab signifies sufficient horizontal white  space
to  reach  the next of a set of fixed positions; posters are
warned that there is no standard set of positions,  so  tabs
should be avoided if precise spacing is essential.  Formfeed
signifies a point at which a reading agent SHOULD pause  and
await  reader  interaction  before  displaying further text.
Backspace SHOULD be used only for  underlining,  done  by  a
sequence of underscores (ASCII 95) followed by an equal num-
ber of backspaces, signifying that the same number  of  text
characters  following  are  to  be  underlined.  Posters are
warned that underlining  is  not  available  on  all  output
devices  and  is  best  not relied on for essential meaning.
Reading agents SHOULD recognize underlining and translate it
to the appropriate commands for devices that support it.
 
     NOTE: Interpretation of almost all control charac-
     ters  is  device-specific  to  some  degree,   and
     devices  differ.   Tabs  and  underlining are sup-
     ported, to some extent, by most modern devices and
     reading  agents, hence the cautious exemptions for
     them.  The underlining method is specified because
     the  inverse method, text and then underscores, is
     tempting to the naive... but if sent unaltered  to
     a  device  that shows only the most recent of sev-
     eral overstruck characters rather than  a  compos-
     ite, the result can be utterly unreadable.
 
     NOTE: A common interpretation of tab is that it is
     a request to space forward to  the  next  position
     whose  number  is  one  more than a multiple of 8,
     with positions numbered sequentially  starting  at
     1.  (So tab positions are 9, 17, 25, ...)  Reading
     agents not constrained by existing system  conven-
     tions might wish to use this interpretation.
 
     NOTE: It will typically be necessary for a reading
     agent to catch and interpret  formfeed,  not  just
     send  it  to  the output device.  The actions per-
     formed by typical output devices  on  receiving  a
     formfeed  are neither adequate for nor appropriate
     to the pause-for-interaction meaning.
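
A reading agent's treatment of tabs and underlining might be
sketched as follows.  The function names and the form of the
result (plain text plus a list of underlined spans) are
assumptions made for the example; the tab interpretation is
the common one described in the NOTE above.

```python
import re

# N underscores followed by N backspaces underline the next N characters.
_UNDERLINE = re.compile(r"(_+)(\x08+)")


def expand_tabs(line):
    """Expand tabs to positions 9, 17, 25, ... (numbering from 1)."""
    return line.expandtabs(8)


def find_underlines(line):
    """Return (plain_text, spans); spans are (start, end) offsets
    into plain_text marking the underlined characters."""
    out = []
    spans = []
    i = 0
    while i < len(line):
        m = _UNDERLINE.match(line, i)
        if m and len(m.group(1)) == len(m.group(2)):
            n = len(m.group(1))
            text = line[m.end():m.end() + n]
            spans.append((len(out), len(out) + len(text)))
            out.extend(text)
            i = m.end() + n
        else:
            out.append(line[i])
            i += 1
    return "".join(out), spans
```

The spans can then be translated into whatever underlining
commands the output device supports, or simply ignored.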
 
Cooperating subnets which wish to employ non-ASCII character
sets  by using escape sequences (employing, e.g., ESC (ASCII
27), SO (ASCII 14), and SI (ASCII 15)) to alter the  meaning
of  superficially-ASCII  characters  MAY do so, but MUST use
MIME headers to alert reading agents to the particular char-
acter  set(s)  and escape sequences in use.  A reading agent
SHOULD not pass such an escape sequence through,  unaltered,
to  the  output  device  unless  the agent confirms that the
sequence is one used to affect character sets and has reason
to  believe  that the device is capable of interpreting that
particular sequence properly.
 
     NOTE:  Cooperating-subnet  organizers  are  warned
     that  some very old relayers strip certain control
     characters out of articles they pass  along.   ESC
     is known to be among the affected characters.
 
     NOTE:  There  are  now standard Internet encodings
     for Japanese [rrr] and Vietnamese [rrr] in partic-
     ular.
 
Articles  MUST  not  contain  any octet with value exceeding
127, i.e. any octet that is not an ASCII character.
 
     NOTE: This rule, like others, may  be  relaxed  by
     unanimous  consent of the members of a cooperating
     subnet, provided suitable precautions are taken to
     ensure  that  rule-violating  articles do not leak
     out of the subnet.  (This has already been done in
     many  areas  where  ASCII  is not adequate for the
     local language(s).)  Beware that articles contain-
     ing non-ASCII octets in headers are a violation of
     the MAIL specifications and  are  not  valid  MAIL
     messages.   MIME  offers a way to encode non-ASCII
     characters in ASCII for use in headers;  see  sec-
     tion 4.5.
 
     NOTE: While there is great interest in using 8-bit
     character sets, not all software  can  yet  handle
     them  correctly.  Hence the restriction to cooper-
     ating subnets.  MIME  encodings  can  be  used  to
     transmit  such  characters  while remaining within
     the octet restriction.
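
A posting agent outside any cooperating subnet could enforce
the octet restrictions of this section with a check like the
one sketched here.  The function name is hypothetical; the
article is taken as a Python bytes object.

```python
def forbidden_octets(article):
    """Return (offset, value) pairs for NUL and for octets above 127."""
    return [(i, b) for i, b in enumerate(article) if b == 0 or b > 127]
```

CR and LF are not reported, since in this sketch they are
assumed to occur only as part of the EOL representation.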
 
In anticipation of the day when it is possible to  use  non-
ASCII  characters  safely  anywhere,  and to provide for the
(substantial) cooperating subnets  that  are  already  using
them, transmission paths SHOULD treat news articles as unin-
terpreted sequences of octets (except perhaps for  transfor-
mations  between  EOL  representations)  and relayers SHOULD
treat non-ASCII characters in articles as  ordinary  charac-
ters.
 
     NOTE:  8-bit  enthusiasts  are warned that not all
     software conforms to  these  recommendations  yet.
     In particular, standard NNTP [rrr] is a 7-bit pro-
     tocol, and  there  may  be  implementations  which
     enforce  this rule.  Be warned, also, that it will
     never be safe to send raw binary data in the  body
     of news articles, because changes of EOL represen-
     tation may (will!) corrupt it.
 
Except  where  cooperating  subnets   permit   more   direct
approaches,  MIME [rrr] headers and encodings SHOULD be used
to transmit non-ASCII content using  ASCII  characters;  see
section  4.5, appendix B, and the MIME RFCs for details.  If
article content can be expressed in  ASCII,  it  SHOULD  be.
Failing  that, the order of preference for character sets is
that described in MIME [rrr].
 
     NOTE: Using the MIME facilities, it is possible to
     transmit ANY character set, and ANY form of binary
     data, using only ASCII characters.  Equally impor-
     tant,  such  articles  are self-describing and the
     reading agent can tell which octet-to-symbol  map-
     ping  is  intended!  Designation of some preferred
     character sets is intended to minimize the  number
     of character sets that a reading agent must under-
     stand in order to display most articles  properly.
 
Articles  containing  non-ASCII  characters,  articles using
ASCII characters (values 0 through 127)  to  refer  to  non-
ASCII  symbols, and articles using escape sequences to shift
character sets SHOULD include MIME headers indicating  which
character set(s) and conventions are being used, and MUST do
so  unless  such  articles  are  strictly  confined   to   a
cooperating subnet which has its own pre-agreed conventions.
MIME encodings are preferred over all these techniques.   If
it  comes to a relayer's attention that it is being asked to
pass an article using such techniques outward across what it
knows  to  be  the boundary of such a cooperating subnet, it
MUST report this error to its administrator, and MAY  refuse
to  pass the article beyond the subnet boundary.  If it does
pass the article, it MUST re-encode it with  MIME  encodings
to make it conform to this Draft.
 
     NOTE:  Such re-encoding is a non-trivial task, due
     to MIME rules such as the  prohibition  of  nested
     encodings.   It's not just a matter of pouring the
     body through a simple filter.
 
Reading agents SHOULD note MIME headers and attempt to  show
the   reader  the  closest  possible  approximation  to  the
intended content.  They SHOULD not just send the  octets  of
the  article to the output device unaltered, unless there is
reason to believe that the output device will indeed  inter-
pret  them  correctly.   Reading  agents MUST not pass ASCII
control characters or escape sequences, other than  as  dis-
cussed above, unaltered to the output device; only by chance
would the result be the desired one, and  there  is  serious
potential  for  harmful  side  effects, either accidental or
malicious.
 
     NOTE: Exactly what to  do  with  unwanted  control
     characters/sequences  depends on the philosophy of
     the reading agent, but passing  them  straight  to
     the  output device is almost always wrong.  If the
     reading agent wants to mark the presence of such a
     character/sequence  in  circumstances  where  only
     ASCII printable characters are  available,  trans-
     lating  it  to "#" might be a suitable method; "#"
     is a conspicuous character seldom used  in  normal
     text.
 
     NOTE: Reading agents should be aware that many old
     output devices (or the transmission paths to them)
     zero out the top bit of octets sent to them.  This
     can transform non-ASCII characters into ASCII con-
     trol characters.
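
The "#" substitution suggested above can be sketched as
follows, for a reading agent limited to ASCII printable
output.  The function name is hypothetical, and the sketch
assumes that formfeed and backspace underlining have already
been interpreted by the agent, so that only tab and the
local EOL ("\n") are passed through.

```python
PASSED_THROUGH = "\t\n"


def sanitize_for_display(text):
    """Replace everything except ASCII printables, tab, and EOL by "#"."""
    return "".join(
        ch if (" " <= ch <= "~" or ch in PASSED_THROUGH) else "#"
        for ch in text
    )
```

An ESC that begins a terminal control sequence thus reaches
the output device as a harmless "#" followed by ordinary
characters.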
 
Followup  agents MUST be careful to apply appropriate trans-
formations of representation to  the  outbound  followup  as
well  as  the  inbound  precursor.  A followup to an article
containing non-ASCII material is very likely to contain non-
ASCII material itself.
 
4.5. Non-ASCII Characters In Headers
All  octets found in headers MUST be ASCII characters.  How-
ever, it is desirable to have a way  of  encoding  non-ASCII
characters,  especially  in "human-readable" headers such as
Subject.  MIME [rrr]  provides  a  way  to  do  this.   Full
details  may be found in the MIME specifications; herewith a
quick summary to alert software authors to the issues...
 
     encoded-word  = "=?" charset "?" encoding "?" codes "?="
     charset       = 1*tag-char
     encoding      = 1*tag-char
     tag-char      = <ASCII printable character except !()<>@,;:\"[]/?=>
     codes         = 1*code-char
     code-char     = <ASCII printable character except ?>

An encoded word is a sequence of ASCII printable  characters
that  specifies the character set, encoding method, and bits
of (potentially) non-ASCII characters.   Encoded  words  are
allowed  only in certain positions in certain headers.  Spe-
cific headers impose restrictions on the content of  encoded
words beyond that specified in this section.  Posting agents
MUST ensure that any material  resembling  an  encoded  word
(complete  with  all delimiters), in a context where encoded
words may appear, really is an encoded word.
 
     NOTE: The  syntax  is  a  bit  ugly,  but  it  was
     designed  to  minimize  chances  of confusion with
     legitimate header contents, and to satisfy  diffi-
     cult constraints on use within existing headers.
 
An  encoded word MUST not be more than 75 octets long.  Each
line of a header containing encoded word(s) MUST be at  most
76 octets long, not counting the EOL.
 
     NOTE:  These  limits are meant to bound the looka-
     head needed to determine whether text that  begins
     "=?" is really an encoded word.
 
The  details  of  charsets and encodings are defined by MIME
[rrr]; the sequence of preferred character sets is the  same
as  MIME's.   Encoded  words  SHOULD not be used for content
expressible in ASCII.
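
The syntax and length limit above translate fairly directly
into a regular expression.  The following Python sketch is
an illustration only; the names are hypothetical, and it
assumes that "ASCII printable character" means the
characters 33 through 126, i.e. that blank is excluded.

```python
import re

PRINTABLE = "".join(chr(c) for c in range(33, 127))
TAG_CHARS = "".join(c for c in PRINTABLE if c not in '!()<>@,;:\\"[]/?=')
CODE_CHARS = "".join(c for c in PRINTABLE if c != "?")

# "=?" charset "?" encoding "?" codes "?="
ENCODED_WORD = re.compile(
    r"=\?([%s]+)\?([%s]+)\?([%s]+)\?="
    % (re.escape(TAG_CHARS), re.escape(TAG_CHARS), re.escape(CODE_CHARS))
)


def is_encoded_word(word):
    """True if `word` has encoded-word syntax and is at most 75 long."""
    return len(word) <= 75 and ENCODED_WORD.fullmatch(word) is not None
```

This checks form only; whether the charset and encoding are
ones that MIME defines is a separate question.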
 
When an encoded word is used, other than in a newsgroup name
(see  section  5.5),  it MUST be separated from any adjacent
non-space characters  (including  other  encoded  words)  by
white  space.   Reading  agents  displaying  the contents of
encoded words (as opposed  to  their  encoded  form)  should
ignore white space adjacent to encoded words.
 
     UNRESOLVED  ISSUE:  Should this section be deleted
     entirely, or made much more terse?   The  material
     is relevant, but too complex to discuss fully.
 
     NOTE: The deletion of intervening white space per-
     mits using multiple encoded words, implicitly con-
     catenated  by  the  deletion,  to encode text that
     will not fit within a single 75-character  encoded
     word.
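
For illustration, the email.header module of the Python
standard library implements the MIME encoded-word rules, and
shows the implicit concatenation at work.  The header text
and the function name are made up for the example.

```python
from email.header import decode_header, make_header


def display_form(header_text):
    """Decode any encoded words in a header's content for display."""
    return str(make_header(decode_header(header_text)))


# Two adjacent encoded words; the blank between them is dropped,
# and the "_" inside the second one decodes to a blank.
decoded = display_form("=?ISO-8859-1?Q?Andr=E9?= =?ISO-8859-1?Q?_Pirard?=")
```

Here the blank that survives in the decoded text comes from
inside the second encoded word, not from the white space
separating the two.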
 
Reading-agent  implementors  are  warned  that although this
Draft completely specifies where encoded words may appear in
the  headers  it  defines, there are other headers (e.g. the
MIME Content-Description header) that MAY contain them.
 
4.6. Size Limits
Implementations SHOULD avoid fixed constraints on the  sizes
of  lines  within  an  article and on the size of the entire
article.
 
Relayers SHOULD treat the body of an article as an  uninter-
preted  sequence of octets (except as mandated by changes of
EOL representation and processing of control messages),  not
to be altered or constrained in any way.
 
If  it  is  absolutely  necessary  for  an implementation to
impose a limit on the length of header lines, body lines, or
header  logical  lines,  that  limit  MUST  be at least 1000
octets, including EOL representations.  Relayers and  trans-
mission  paths  confronted  with lines beyond their internal
limits (if any)  MUST  not  simply  inject  EOLs  at  random
places;  they MAY break headers (as described in 4.2.3) as a
last resort, and otherwise they MUST either  pass  the  long
lines  through  unaltered,  or refuse to pass the article at
all (see section 9.1 for further discussion).
 
     NOTE: The limit here is essentially the same mini-
     mum  as  that  specified  for SMTP mail in RFC 821
     [rrr].  Implementors are  warned  that  Path  (see
     section  5.6)  and  References  (see  section 6.5)
     headers, in particular, often become several  hun-
     dred  characters  long,  so  1000 is not an overly
     generous limit.
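
An implementation that does have an internal line limit
might locate the offending lines as sketched below, and then
either pass them unaltered or refuse the article.  The
function name is hypothetical; the article is a Python bytes
object and the EOL representation is a parameter.

```python
def overlong_lines(article, limit=1000, eol=b"\n"):
    """Return the 1-based numbers of lines longer than `limit`
    octets, counting the EOL representation as part of the line."""
    return [
        number
        for number, line in enumerate(article.split(eol), 1)
        if len(line) + len(eol) > limit
    ]
```

The default of 1000 is the smallest limit this section
permits; a larger value, or no limit at all, is preferable.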
 
All implementations  MUST  be  able  to  handle  an  article
totalling  at least 65,000 octets, including headers and EOL
representations, gracefully and efficiently.  All  implemen-
tations  SHOULD  be  able  to handle an article totalling at
least 1,000,000 (one million) octets, including headers  and
EOL  representations,  gracefully  and efficiently.  "Grace-
fully and efficiently" is  intended  to  preclude  not  only
failures,  but also major loss of performance, serious prob-
lems in error recovery, or resource consumption beyond  what
is reasonably necessary.
 
     NOTE:  The intent here is to prohibit lowering the
     existing  de-facto  limit   any   further,   while
     strongly  encouraging  movement  towards  a higher
     one.  Actually, although improvements  are  desir-
     able  in some cases, much news software copes rea-
     sonably well with very large articles.   The  same
     cannot  be said of the communications software and
     protocols used to transmit news from one  host  to
     another, especially when slow communications links
     are  involved.   Occasional  huge  articles   that
     appear now (by accident or through ignorance) typ-
     ically leave trails of  failing  software,  system
     problems,  and irate administrators in their wake.
 
     NOTE: It is intended that the  successor  to  this
     Draft will raise the "MUST" limit to 1,000,000 and
     the "SHOULD" limit still further.
 
Posters SHOULD limit  posted  articles  to  at  most  60,000
octets,  including  headers  and EOL representations, unless
the articles are being posted only within a cooperating sub-
net which is known to be capable of handling larger articles
gracefully.  Posting agents presented with a  large  article
SHOULD warn the poster and request confirmation.
 
     NOTE:  The difference between this and the earlier
     "MUST" limit is margin for header growth,  differ-
     ing  EOL  representations,  and transmission over-
     heads.
 
     NOTE: Disagreeable though these limits are, it  is
     a fact that in current networks, an article larger
     than 64K (after header growth etc.) simply is  not
     transmitted  reliably.   Note  also  the  comments
     above on the trauma caused  by  single  extremely-
     large articles now; the problems are real and cur-
     rent.  These problems arguably  should  be  fixed,
     but this will not happen network-wide in the imme-
     diate future.  Hence  the  restriction  of  larger
     articles to cooperating subnets, for now.
 
Posters  using  non-ASCII characters in their text MUST take
into account the overhead involved in MIME encoding,  unless
the  article's  propagation  will  be  entirely limited to a
cooperating subnet which does not  use  MIME  encodings  for
non-ASCII  characters.   For  example,  MIME base64 encoding
involves growth by a factor  of  approximately  4/3,  so  an
article  which would likely have to use this encoding should
be at most about 45,000 octets before encoding.
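
The growth can be estimated ahead of time.  The sketch below
assumes base64 output in 76-character lines with a one-octet
EOL, which is what base64.encodebytes in the Python standard
library produces; the function name is hypothetical.

```python
import base64


def base64_size(n_octets, line_length=76, eol_length=1):
    """Size in octets of `n_octets` of data after base64 encoding,
    including the EOLs that end each encoded line."""
    chars = 4 * ((n_octets + 2) // 3)   # 3 octets become 4 characters
    lines = -(-chars // line_length)    # ceiling division
    return chars + lines * eol_length


# 45,000 octets of text grow to 60,790 octets once encoded.
size = base64_size(45000)
```

The EOLs push the growth slightly above 4/3, and the headers
come on top of that, so a poster aiming at a given total is
wise to leave some margin.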
 
Posters SHOULD use  MIME  "message/partial"  conventions  to
facilitate  automatic  reassembly  of a large document split
into smaller pieces for posting.  It is recommended that the
content identifier used should be a message ID, generated by
the same means as article message IDs (see section 5.3), and
that  all  parts  should have a See-Also header (see section
6.16) giving the message IDs of at least the previous  parts
and preferably all the parts.
 
     NOTE:  See-Also  is  more correct for this purpose
     than References, although References is in  common
     use  today  (with  less-formal reassembly arrange-
     ments).  MIME reassemblers should probably examine
     articles  suggested  by References headers if See-
     Also headers  are  not  present  to  indicate  the
     whereabouts   of   the   other   parts   of  "mes-
     sage/partial" articles.
 
To repeat: implementations SHOULD avoid fixed constraints on
the  sizes of lines within an article and on the size of the
entire article.
 
4.7. Example
Here is a sample article:
 
     From: jerry@eagle.ATT.COM (Jerry Schwarz)
     Path: cbosgd!mhuxj!mhuxt!eagle!jerry
     Newsgroups: news.announce
     Subject: Usenet Etiquette -- Please Read
     Message-ID: <642@eagle.ATT.COM>
     Date: Mon, 17 Jan 1994 11:14:55 -0500 (EST)
     Followup-To: news.misc
     Expires: Wed, 19 Jan 1994 00:00:00 -0500
     Organization: AT&T Bell Laboratories, Murray Hill

     body
     body
     body
 |