\documentstyle[12pt,multicol]{article}

\addtolength{\textwidth}{.5cm}

\def\diatop[#1|#2]{{\setbox1=\hbox{{#1{}}}\setbox2=\hbox{{#2{}}}%
	\dimen0=\ifdim\wd1>\wd2\wd1\else\wd2\fi%
	\dimen1=\ht2\advance\dimen1by-1ex%
	\setbox1=\hbox to1\dimen0{\hss#1\hss}%
	\rlap{\raise1\dimen1\box1}%
	\hbox to1\dimen0{\hss#2\hss}}}%
%e.g. of use: \diatop[\'|{\=o}] gives o macron acute

\title{Standardization of Sanskrit for Electronic Data Transfer
and Screen Representation}

\author{Dominik Wujastyk}

\date{9 September 1990}

\begin{document}

\maketitle

\section*{Text Encoding Guidelines}
During the 8th World Sanskrit Conference, Vienna 1990, a panel
was held to discuss the standardization of Sanskrit for
electronic data transfer.  Participants were encouraged to
acquire and study the {\em ACH-ACL-ALLC Guidelines for the
Encoding and Interchange of Machine-readable Texts}, edited by
Lou BURNARD and C.~M.~SPERBERG-MCQUEEN (Chicago and Oxford,
1990).  These {\em Guidelines\/} are available free of charge in
Europe from L.~Burnard, Oxford University Computing Service, 13
Banbury Road, Oxford OX2 6NN, England, or in the USA from
C.~M.~Sperberg-McQueen, Computer Center (MIC 135), University of
Illinois at Chicago, Box 6998, Chicago, IL 60680, USA.

\section*{7-bit coding for file transfer}
Professor H. Falk presented a program called {\tt CONVERT} that
conveniently converts any coding scheme used in a data file to
any other coding scheme.  This program was generously made
available at no cost, together with Turbo Pascal source code.
Prof.\ Falk also presented a very useful 7-bit, multi-byte
``mediation code'' which will be of general use for file
exchange.

\section*{8-bit character set for text display}
Finally, although the above two provisions cover all essential
needs, the panel still felt that a standard assignment of graphic
codes for the display of Sanskrit transliteration would be
helpful.  An ad hoc committee of interested parties was formed,
and two 8-bit ``code pages'' were designed: {\em Classical
Sanskrit\/} (CS), for standard use, and {\em Classical
Sanskrit Extended\/} (CSX), which includes the former but also
provides for Vedic, Middle Indo-Aryan (MIA), Tamil and some special usages.  These
code pages take as their point of departure IBM's code page 437,
the default set of character codes built into the IBM PC and
clones.  The characters listed below are replacements for the
characters in code page 437 which have the same numerical code.
E.g., character number 224 in code page 437 is a Greek letter alpha
($\alpha$); CS redefines it to be a with a macron (\=a).  All
codes not specified below are assumed to be as in code page 437.
E.g., character number 130 remains e acute (\'e).

The codes assigned were as follows:
\begin{multicols}{2}[\subsection*{Classical Sanskrit (CS)}]
\begin{small}
\begin{tabbing}
000 \= x underdot macron acute \= (normally German  eszett, xx) \kill
166 \>  l tilde         \>      \~ l     \\
167 \>  m overdot       \>      \.m     \\
224 \>  a macron        \>      \a=a     \\
225 \>  not used  (normally German {\em eszett}, \ss)   \\
226 \>  A macron        \>      \a=A     \\
227 \>  i macron        \>      \a=\i  \\
228 \>  I macron        \>      \a=I     \\
229 \>  u macron        \>      \a=u     \\
230 \>  U macron        \>      \a=U     \\
231 \>  r underdot      \>      \d r    \\
232 \>  R underdot      \>      \d R    \\
233 \>  r underdot macron\>     \diatop[\a=|\d r]\\
234 \>  R underdot macron\>     \diatop[\a=|\d R]\\
235 \>  l underdot      \>      \d l    \\
236 \>  L underdot      \>      \d L    \\
237 \>  l underdot macron\>     \diatop[\a=|\d l]\\
238 \>  L underdot macron\>     \diatop[\a=|\d L]\\
239 \>  n overdot       \>      \.n     \\
240 \>  N overdot       \>      \.N     \\
241 \>  t underdot      \>      \d t    \\
242 \>  T underdot      \>      \d T    \\
243 \>  d underdot      \>      \d d    \\
244 \>  D underdot      \>      \d D    \\
245 \>  n underdot      \>      \d n    \\
246 \>  N underdot      \>      \d N    \\
247 \>  s acute         \>      \a's    \\
248 \>  S acute         \>      \a'S    \\
249 \>  s underdot      \>      \d s    \\
250 \>  S underdot      \>      \d S    \\
251 \>  not used  (normally the root sign $\surd$) \\
252 \>  m underdot      \>      \d m    \\
253 \>  M underdot      \>      \d M    \\
254 \>  h underdot      \>      \d h    \\
255 \>  H underdot      \>      \d H    \\
\end{tabbing}
\end{small}
\end{multicols}
\newpage

\begin{multicols}{2}[\subsection*{Classical Sanskrit Extended (CSX) additions}
The following definitions are added to the above Classical
Sanskrit character set.]
\begin{small}
\begin{tabbing}
000 \= x underdot macron acute \= (normally German eszett, xx) \kill
159 \>  r underbar      \>      \b r    \\
168 \>  a macron breve  \>      \diatop[\u|\a=a]\\
169 \>  i macron breve  \>      \diatop[\u|\a=\i]\\
170 \>  u macron breve  \>      \diatop[\u|\a=u]\\
173 \>  n underbar      \>      \b n    \\
181 \>  a macron acute  \>      \diatop[\a'|\a=a]\\
182 \>  a macron grave  \>      \diatop[\a`|\a=a] \\
183 \>  i macron acute  \>      \diatop[\a'|\a=\i] \\
184 \>  i macron grave  \>      \diatop[\a`|\a=\i] \\
189 \>  u macron acute  \>      \diatop[\a'|\a=u] \\
190 \>  u macron grave  \>      \diatop[\a`|\a=u] \\
198 \>  r underdot acute\>      \diatop[\a'|\d r] \\
199 \>  r underdot grave\>      \diatop[\a`|\d r] \\
207 \>  r underdot macron acute\>
 \raisebox{.25ex}{\rlap{\a'{ }}}\diatop[\a=|\d r] \\
208 \>  a tilde         \>      \~ a     \\
209 \>  i tilde         \>      \~ \i    \\
210 \>  u tilde         \>      \~ u     \\
211 \>  e tilde         \>      \~ e     \\
212 \>  o tilde         \>      \~ o     \\
213 \>  e breve         \>      \u e    \\
214 \>  o breve         \>      \u o    \\
215 \>  l underbar      \>      \b l    \\
\end{tabbing}
\end{small}
\end{multicols}
\bigskip
These codes were chosen to have minimal impact on the standard
IBM PC extended ASCII character set, but they are intended for
general use in displaying Indological texts on any machine with
an 8-bit (or greater) character set.
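
As an illustration only, the assignments above could also be used
to typeset a CS-encoded file directly.  The following sketch
assumes \TeX\ 3.0 or later, an installation that passes 8-bit
input through unchanged, and the \verb|\diatop| macro defined in
the preamble of this document.  The use of active characters is
an assumption of this example, not part of the panel's proposal,
and only a few codes are shown.
\begin{verbatim}
% Sketch: make selected CS codes active and
% map each one to the corresponding accent.
% ^^e0 denotes the character with hex code E0 (224).
\catcode224=\active \def^^e0{\=a}   % a macron
\catcode227=\active \def^^e3{\=\i}  % i macron
\catcode231=\active \def^^e7{\d r}  % r underdot
\catcode247=\active \def^^f7{\'s}   % s acute
% 233: r underdot macron, built with \diatop
\catcode233=\active \def^^e9{\diatop[\=|\d r]}
\end{verbatim}
Within the {\tt tabbing} environment, where \verb|\=| and
\verb|\'| are redefined, such definitions would need \verb|\a=|
and \verb|\a'| instead, as in the tables above.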

Dr.\ D.~Wujastyk will make available small programs that load
the above character sets into EGA or VGA display adaptors,
for IBM PC users.

The above character codings have been approved by R. E. Emmerick,
H. Falk, R. Lariviere, G. J. Meulenbeld, H. Nakatani, M.
Tokunaga, D.~Wujastyk, P. Schreiner and M. Yano.

These character codings are primarily intended for use in
situations where the screen display of these characters is
required, such as in word processing.  They may, of course, be
used for data transfer, where, however, a 7-bit code (perhaps with
multi-byte character codes) is still preferable.  One such 7-bit
scheme is provided by H. Falk (see the section on 7-bit coding
for file transfer above).

\newpage

These character codings are currently open for discussion, and
comments may be directed to Dr.\ D.~Wujastyk at

Wellcome Institute,

183 Euston Road,

London NW1 2BN, England,\\
or by email at

Bitnet/Earn: {\tt dow@harvunxw} or

Janet: {\tt D.Wujastyk@uk.ac.ucl}.


After a suitable lapse of time, the character sets will be sent
to ECMA and ISO for registration.  They will also be sent to the
Text Encoding Initiative for registration, probably with H.
Falk's 7-bit coding scheme.

Such registration in no way enforces these schemes; it merely
makes them available centrally for reference.  Other schemes may
also be registered in the future.

\end{document}