%begin{latexonly}
\newif\ifpdf
\ifx\pdfoutput\undefined
\pdffalse
\else
\pdfoutput=1
\pdftrue
\fi

% Change this as needed :
%   - a4paper to your paper format
%   - the document class to your need (book, article, ...)
\ifpdf
\documentclass[a4paper, 12pt, pdftex]{report}
\else
%end{latexonly}
\documentclass[a4paper, 12pt, dvips]{report}
%begin{latexonly}
\fi
%end{latexonly}

% The packages we need
\usepackage{verbatim}
\usepackage{moreverb}
\usepackage{url}
\usepackage{tabularx}
\usepackage[final]{graphicx}
\usepackage[hyperindex,breaklinks=true,pdfborder={0 0 0}]{hyperref}
%begin{latexonly}
\ifpdf
\hypersetup{colorlinks=true,linkcolor=blue,urlcolor=blue,citecolor=red}
\fi
%end{latexonly}
\usepackage{html}
\begin{htmlonly}
\newcommand{\href}[2]{\htmladdnormallink{#2}{#1}}
\end{htmlonly}

% block style paragraphs tend to look better in technical docs
\parindent=0in
\parskip=10pt

\begin{document}

% Split Jar Specification

\appendix

\chapter{Split Jars and MANIFEST Extensions}

The jar file specification allows archiving and packaging classes and
resources, but the size of a single jar is limited. To overcome this
limitation with minimal changes to the jar format, we create a set of
jar files and add new key-value attributes to the jar MANIFEST. These
attributes indicate how many jars are in the set, which files, if any,
are split across multiple jars, and which jars contain them.

\section{Motivations and Limitations}

Java's zip implementation is limited to \~{}2GB. The zip64 extensions
(which allow larger files) do not solve the problem when media
limitations restrict the jar size. There must be a way to split the
archive into multiple files, and it must be possible to split
individual entries across jars.

A ``Split Jar'' is a set of normal jar files, one being the
\emph{primary} jar, and zero or more \emph{secondary} jars. The
primary jar file has additional manifest attributes to help
reconstruct the data. Entries may or may not be split across multiple
jars; those that are split need to be spliced back together upon
extraction. Secondary jar names derive from the basename of the
primary jar, and each segment of a split entry shares a basename
derived from the original entry name.

Segments of a split entry need not be in separate jars. Thus if a jar
is split to deal with media limitations, all the resulting jars may be
combined into a single primary jar, as long as the Manifest is
correctly updated.

A major benefit of this format is that the split archive contents can
be recovered manually by extracting the contents of all jar files in
the set, and simply concatenating the segments of the split entries.

Entry names in the primary and secondary jars must not conflict, so
that together they represent a single archive. This includes the
generated names of split entry segments. This ensures that each jar in
a split archive may be extracted to the same location without risk of
losing data. Split entry segments are then concatenated, manually or
by automation, to get the original data set. The manifest should always
be consulted to confirm that files which look like split entry segments
should actually be spliced together. It is possible that the files
were intended to be part of the archive as they are (see ``Naming
Conventions'' for name conflict resolution).

All segments of a split entry are given generated names, so that normal
jar tools will never unpack a segment under the original file name.
This ensures that no unsuspecting user mistakenly uses a truncated,
partial file.

\subsection{Warnings}

Signing jar entries which have been split has not been addressed.

Files cannot be compressed directly into streams when there are
potential name conflicts with the generated segment names. This
requires that robust tools collect a list of files to be added, and
determine any conflicts first to avoid the issue (see ``Naming
Conventions'': ``Jar Entry Names'').

Adding files to existing split jars may also have problems with name
conflicts.

\section{Naming Conventions}

A primary goal for this design is to allow split jars to be created
and unpacked manually with minimal problems. This is accomplished by
using a naming convention which lends itself to visual reconstruction.
When a jar file must be split into multiple jars, there is one primary
jar and one or more secondary jars sharing a common basename. When an
entry within the set of jars must be split, \emph{each} segment is
given a numbered suffix.

\subsection{Jar File Names}

For the primary jar \texttt{\emph{basename}.jar}, the names of
secondary jars must always be \texttt{\emph{basename}.split\#.jar}
where \texttt{\#} is an integer \emph{secondary jar ID} starting at
\texttt{1}. Leading zeros in the ID are ignored, and are encouraged
so that lexicographic sorting gives the numeric order. The jars can be
renamed, as long as the \emph{basename} is the same for all, and the
suffixes (\texttt{.split\#.jar}) remain the same. All entry names
within the set must be unique.
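
The naming rule above can be sketched in Python as follows. This is
only an illustration and not part of the specification: the function
names \texttt{secondary\_jar\_name} and \texttt{secondary\_jar\_id}, and
the \texttt{width} zero-padding parameter, are hypothetical.

\begin{verbatim}
```python
import re


def secondary_jar_name(primary, jar_id, width=1):
    """Return the name of secondary jar `jar_id` for a primary jar name."""
    if not primary.endswith(".jar"):
        raise ValueError("primary jar name must end in .jar")
    if jar_id < 1:
        raise ValueError("secondary jar IDs start at 1")
    basename = primary[:-len(".jar")]
    # Zero padding is optional; it only helps lexicographic sorting.
    return f"{basename}.split{jar_id:0{width}d}.jar"


def secondary_jar_id(primary, candidate):
    """Return the secondary jar ID encoded in `candidate`, or None."""
    basename = primary[:-len(".jar")]
    pattern = re.escape(basename) + r"\.split([0-9]+)\.jar"
    match = re.fullmatch(pattern, candidate)
    if match is None:
        return None
    jar_id = int(match.group(1))  # int() ignores leading zeros
    return jar_id if jar_id >= 1 else None
```
\end{verbatim}

Under these assumptions, ID 2 padded to three digits for
\texttt{example.jar} gives \texttt{example.split002.jar}, which parses
back to the ID 2.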

\subsection{Jar Entry Names}

For the split entry named \texttt{\emph{basename}} (including
suffixes), all segments are named using the template:
\texttt{\emph{basename}}\texttt{.---\#.\~{}}, where \texttt{\#} is an integer
\emph{segment ID} starting at \texttt{0}. These segments are
rejoined by concatenating them in numeric order into a file
named \texttt{\emph{basename}}. The template is recorded in the
\emph{main} section of the manifest.
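
As a minimal sketch of the template, the following Python fragment
generates segment names for a split entry. The helper names and the
fixed \texttt{max\_size} chunking are assumptions made for
illustration; the specification does not require segments of equal
size.

\begin{verbatim}
```python
DEFAULT_TEMPLATE = ".---#.~"  # '#' marks the position of the segment ID


def segment_name(entry, segment_id, template=DEFAULT_TEMPLATE):
    """Return the generated name of one segment of a split entry."""
    if template.count("#") != 1:
        raise ValueError("template must contain exactly one '#'")
    if segment_id < 0:
        raise ValueError("segment IDs start at 0")
    return entry + template.replace("#", str(segment_id))


def split_into_segments(entry, data, max_size):
    """Cut `data` into chunks of at most `max_size` bytes.

    Returns a dict mapping each generated segment name to its bytes.
    Intended only for entries that actually need to be split.
    """
    chunks = [data[i:i + max_size] for i in range(0, len(data), max_size)]
    return {segment_name(entry, n): chunk for n, chunk in enumerate(chunks)}
```
\end{verbatim}

For example, segment 1 of \texttt{movie.mpeg} is named
\texttt{movie.mpeg.---1.\~{}} under the default template.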

In the rare case where an entry is split, and the name of a real entry
conflicts with a generated segment name, a non-default suffix
template is used. In our case, all of the generated segments will have
'\texttt{\~{}}' characters appended, as needed, to eliminate potential
conflicts. This non-default template is recorded in the
\emph{per-entry} section of the manifest for the split entry.

Non-default suffixes are used for all \emph{potential} conflicts, even
in cases where there is no actual conflict, for example:

\begin{itemize}
  \item When the split entry does not generate enough segments to
        conflict, but the real entry's name matches the default
        template.
  \item When the conflicting real entry must also be split, so that
        its actual entries use generated suffixes.
\end{itemize}
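
The conflict rule can be sketched as follows, assuming the tool has
already collected the full list of names to be archived. The function
names are hypothetical, and treating any name that matches the
template with an arbitrary run of digits as a \emph{potential}
conflict is one reading of the rule above.

\begin{verbatim}
```python
import re

DEFAULT_TEMPLATE = ".---#.~"


def template_regex(entry, template):
    """Compile a pattern matching any segment name of `entry`."""
    prefix, suffix = template.split("#")
    return re.compile(
        re.escape(entry + prefix) + r"[0-9]+" + re.escape(suffix)
    )


def choose_template(entry, all_names, template=DEFAULT_TEMPLATE):
    """Append '~' to the template until no name to be archived
    could be mistaken for a segment of `entry`."""
    names = set(all_names)
    while any(template_regex(entry, template).fullmatch(n) for n in names):
        template += "~"
    return template
```
\end{verbatim}

Because the check uses the names of the original files rather than
the generated ones, a conflicting file that is itself split still
forces a non-default template.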

Examples are given below.

Other tools implementing split jars may (though are not encouraged to)
use different suffixes, but the numeric segment ID must be replaced
by '\#' in the template recorded in the manifest. Tools must sort
segments numerically, not lexicographically, as ``2'' generally sorts
after ``10'' lexicographically. However, tools are encouraged to
zero-pad segment IDs, as needed, so that lexicographic sorting is also
correct.
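
A minimal sketch of numeric ordering when rejoining segments is given
below. It assumes the extracted files are available as a mapping from
name to bytes; the function name \texttt{join\_segments} and the check
for missing or duplicate IDs are illustrative additions, not
requirements of this specification.

\begin{verbatim}
```python
import re


def join_segments(entry, template, extracted):
    """Concatenate the segments of `entry` in numeric ID order.

    `extracted` maps file names to bytes; names that do not match
    the template for `entry` are ignored.
    """
    prefix, suffix = template.split("#")
    pattern = re.compile(
        re.escape(entry + prefix) + r"([0-9]+)" + re.escape(suffix)
    )
    found = []
    for name, data in extracted.items():
        match = pattern.fullmatch(name)
        if match:
            # int() drops any zero padding, so '02' sorts before '10'.
            found.append((int(match.group(1)), data))
    found.sort(key=lambda pair: pair[0])
    ids = [segment_id for segment_id, _ in found]
    if ids != list(range(len(ids))):
        raise ValueError(f"missing or duplicate segments for {entry}: {ids}")
    return b"".join(data for _, data in found)
```
\end{verbatim}

With eleven unpadded segments, a plain lexicographic sort would place
segment 10 before segment 2; converting the ID to an integer avoids
this.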

\section{Manifest Attributes}

To minimize the changes needed to implement the split jar, we simply
add attributes to the manifest. Additional attributes are ignored by
other jar tools, so the only consequence is that split files, and
files located entirely in secondary jars, will not be available to
them.

To limit space overhead, and to allow jar files to be renamed, the
attributes are kept minimal.

\subsection{Main Section Attributes}

Three attributes are added to indicate the number of secondary jars,
the suffix template used to name them, and the default suffix template
added to the segments of split files.

% TODO: make this like an html <dl><dd>... <dt> ...</dl>
\begin{itemize}
  \item \texttt{Split-Jar-Secondary-Count}: the number of secondary jars
        in the set.
  \item \texttt{Split-Jar-Secondary-Suffix}: the suffix template
        inserted before the \texttt{.jar} suffix typical of jar
        files, to form the names of the secondary jar files in the
        set; typically \texttt{.split\#}.
  \item \texttt{Split-Entry-Suffix}: the suffix template appended to
        an entry name, to name each of the entry's constituent
        segments; typically \texttt{.---\#.\~{}}. The \texttt{\#}
        character indicates the location of the numeric value. This
        cannot currently be changed.
\end{itemize}

\subsection{Per-Entry Section Attributes}

Only files which are split require attributes in the manifest. A
space-separated list of integers is recorded, one for each jar
containing a segment of the entry. Entries which have a segment in the
primary jar file indicate this with the ID \texttt{0}.

No restriction is placed on the order of the entries, or on the IDs of
the jars in which any segment is contained.

% TODO: make this like an html <dl><dd>... <dt> ...</dl>
\begin{itemize}
  \item \texttt{Split-Entry-Jar-IDs}: a space-separated list of the
        integer IDs of the jars which contain the segments of the
        entry, with \texttt{0} denoting the primary jar.
  \item \texttt{Split-Entry-Suffix}: overrides the default
        \texttt{Split-Entry-Suffix} specified in the main section.
        Needed when one or more '\texttt{\~{}}' characters are appended
        due to a name conflict with real entries. This is not strictly
        necessary, as simply knowing the basename and unpacking all
        jars would allow the suffix to be determined, but it is
        included to save processing. This is currently not user
        configurable.
\end{itemize}
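
The interplay between the main and per-entry attributes can be
sketched as below. The sketch assumes the manifest has already been
parsed into plain dictionaries of attribute names to string values;
the function name \texttt{resolve\_split\_entry} and the range check
on jar IDs are hypothetical additions for illustration.

\begin{verbatim}
```python
def resolve_split_entry(main_attrs, entry_attrs):
    """Return (jar_ids, suffix_template) for one split entry.

    Both arguments map attribute names to their string values.
    """
    jar_ids = [int(tok) for tok in entry_attrs["Split-Entry-Jar-IDs"].split()]
    count = int(main_attrs["Split-Jar-Secondary-Count"])
    for jar_id in jar_ids:
        # 0 is the primary jar; 1..count are the secondary jars.
        if not 0 <= jar_id <= count:
            raise ValueError(f"jar ID {jar_id} outside 0..{count}")
    # A per-entry template overrides the default from the main section.
    template = entry_attrs.get(
        "Split-Entry-Suffix", main_attrs["Split-Entry-Suffix"]
    )
    return jar_ids, template
```
\end{verbatim}

An entry without its own \texttt{Split-Entry-Suffix} falls back to the
template given in the main section.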

\section{Examples}

Two examples follow: one simple, and another cluttered with
pathological cases. Notice that the jar ID number and the segment ID
of a split entry have no correlation. In most applications, there will
seldom be more than two segments in a single jar: the end of the last
entry of the previous jar, and perhaps the start of the last entry of
this jar, which is continued in the next. The examples aren't so well
organized though. :-)

\subsection{Basic Example}

\begin{verbatim}

Jar to Create
-------------
    example.jar

Files to Compress
-----------------
    movie.mpeg
    README
    song.mp3
    text.txt

Entries in Jars
---------------
    example.jar           movie.mpeg.---0.~
                          README

    example.split1.jar    movie.mpeg.---1.~
                          song.mp3.---0.~

    example.split2.jar    movie.mpeg.---2.~
                          song.mp3.---1.~
                          text.txt

MANIFEST (primary jar only)
---------------------------
    Manifest-Version: 1.0
    Created-By: 1.4.2_04-b05 (Sun Microsystems Inc.)
    Built-By: IzPack 1.6.0
    Main-Class: com.izforge.izpack.installer.Installer
    Split-Jar-Secondary-Count: 2
    Split-Entry-Suffix: .---#.~

    movie.mpeg
    Split-Entry-Jar-IDs: 0 1 2

    song.mp3
    Split-Entry-Jar-IDs: 1 2
\end{verbatim}

\subsection{Name Conflicts}

A pathological example showing name conflict resolution. It includes:

\begin{itemize}
  \item A direct conflict with a real archive file
        (\texttt{foo...}).

  \item An indirect conflict with a file by suffix template only
        (\texttt{bar...}).

  \item A conflict with a real archive file that is also split.
        Because both are split, there would be no name conflict
        amongst jar entries; however, the default suffix is not used
        anyway (\texttt{yin...}).

  \item A \emph{near} conflict, just to be annoying. Normal behavior
        (\texttt{chi...}).

  \item Files which look like segments of a split file, but are not,
        requiring the manifest to tell the difference
        (\texttt{zig...}).
\end{itemize}

\begin{verbatim}

Jar to Create
-------------
    example.jar

Files to Compress
-----------------
    foo.dat
    foo.dat.---0.~ ...... Extremely unlikely that these would exist,
                          much less need to be archived. Provided as an
                          example.
    bar.dat
    bar.dat.---555.~ .... Another unlikely case which would not conflict
                          (assume bar.dat is split into only 2 segments)
                          except for the suffix template.
    yin.dat
    yin.dat.---2.~ ...... Yet another template-only conflict; the
                          conflicting file itself needs to be split.
    chi.dat
    chi.dat.---0.~~ ..... No potential conflict.
    zig.dat.---0.~ ...... Files to be archived as they are, but not
    zig.dat.---1.~        intended to be spliced back together.

Entries in Jars
---------------
    example.jar           foo.dat.---0.~
                          foo.dat.---0.~~
                          bar.dat.---555.~
                          bar.dat.---0.~~
                          yin.dat.---0.~~
                          yin.dat.---2.~.---0.~
                          chi.dat.---0.~
                          chi.dat.---0.~~
                          zig.dat.---0.~
                          zig.dat.---1.~

    example.split1.jar    foo.dat.---1.~~
                          bar.dat.---1.~~
                          yin.dat.---1.~~
                          yin.dat.---2.~.---1.~
                          chi.dat.---1.~

MANIFEST (primary jar only)
---------------------------
    Manifest-Version: 1.0
    IzPack-Version: X.X.X
    Created-By: 1.4.2_04-b05 (Sun Microsystems Inc.)
    Built-By: IzPack
    Class-Path: 
    Main-Class: com.izforge.izpack.installer.Installer
    Split-Jar-Secondary-Count: 1
    Split-Entry-Suffix: .---#.~

    foo.dat
    Split-Entry-Jar-IDs: 0 1
    Split-Entry-Suffix: .---#.~~

    bar.dat
    Split-Entry-Jar-IDs: 0 1
    Split-Entry-Suffix: .---#.~~

    yin.dat
    Split-Entry-Jar-IDs: 0 1
    Split-Entry-Suffix: .---#.~~

    yin.dat.---2.~
    Split-Entry-Jar-IDs: 0 1

    chi.dat
    Split-Entry-Jar-IDs: 0 1
\end{verbatim}

\end{document}