/trunk/install/IzPack/src/doc/splitjar.tex - | gvSIG desktop 1 - gvSIG

root / trunk / install / IzPack / src / doc / splitjar.tex @ 11445

History | View | Annotate | Download (13 KB)

       %begin{latexonly}
       \newif\ifpdf
       \ifx\pdfoutput\undefined
       \pdffalse
       \else
       \pdfoutput=1
       \pdftrue
       \fi
       % Change this as needed :
       %   - a4paper to your paper format
       %   - the document class to your need (book, article, ...)
       \ifpdf
       \documentclass[a4paper, 12pt, pdftex]{report}
       \else
       %end{latexonly}
       \documentclass[a4paper, 12pt, dvips]{report}
       %begin{latexonly}
       \fi
       %end{latexonly}
       % The packages we need
       \usepackage{verbatim}
       \usepackage{moreverb}
       \usepackage{url}
       \usepackage{tabularx}
       \usepackage[final]{graphicx}
       \usepackage[hyperindex,breaklinks=true,pdfborder={0 0 0}]{hyperref}
       %begin{latexonly}
       \ifpdf
       \hypersetup{colorlinks=true,linkcolor=blue,urlcolor=blue,citecolor=red}
       \fi
       %end{latexonly}
       \usepackage{html}
       \begin{htmlonly}
       \newcommand{\href}[2]{\htmladdnormallink{#2}{#1}}
       \end{htmlonly}
       % block style paragraphs tend to look better in technical docs
       \parindent=0in
       \parskip=10pt
       \begin{document}
       % Split Jar Specification
       \appendix
       \chapter{Split Jars and MANIFEST Extensions}
       The jar file specification allows archiving and packaging classes and
       resources, but is limited in size. To overcome these limitations with
       minimal changes to the jar format we create a set of jar files and add
       new key-value attributes to the jar MANIFEST. These attributes
       indicate now many jars are in the set, and which, if any, files are
       split across multiple jars, and which jars they are contained in.
       \section{Motivations and Limitations}
       Java's zip implementation limited to \~{}2GB. The problem is not solved
       by zip64 extensions (which will allow larger files), when medium limitations
       restrict the jar size. There must be a way to split the archive into
       multiple files, and indeed, the individual entries must be split
       across jars.
       A ``Split Jar'' is a set of normal jar files, one being the
       \emph{primary} jar, and zero or more \emph{secondary} jars. The
       Primary jar file has additional manifest attributes to help
       reconstruct the data. Entries may or may not be split across multiple
       jars, and need to be spliced back together upon extraction. Secondary
       jar names derive from the basename of the primary jar, and each
       segment of a split entry, shares a basename derived from the original
       entry name.
       Segments of a split entry need not be in separate jars. Thus if a jar
       is split to deal with media limitations, all the resulting jars may be
       combined into a single primary jar, as long as the Manifest is
       correctly updated.
       A major benefit to this format is that the split archive contents can
       be recovered manually by extracting the contents of all jar files in
       the set, and simply concatenating the segments of the split entries.
       Entry names in the primary and secondary jars must not conflict, so
       that together they represent a single archive. This includes the
       generated names of split entry segments. This ensures that each jar in
       a split archive may be extracted to the same location without risk of
       loosing data. Split file segments are then be concatenated manually or
       by automation to get the original data set. The manifest should always
       be consulted to ensure that files which look like split entry segments
       should actually be spliced together. It is possible that the files
       were intended to be part of the archive (See ``Naming Conventions'',
       for name conflict resolution).
       All segments of a split jar are given generated names so that normal
       jar tools will never unpack the original file. This ensures that no
       unsuspecting user mistakenly uses a truncated, partial file.
       \subsection{Warnings}
       Signing jar entries which have been split has not been addressed.
       Files can not be compressed directly into streams when there are
       potential name conflicts with the generated segment names. This
       requires that robust tools collect a list of files to be added, and
       determine any conflicts first to avoid the issue (See Naming
       Conventions: Entry Names).
       Adding files to existing split jars may also have problems with name
       conflicts.
       \section{Naming Conventions}
       A primary goal for this design is to allow split jars to be created
       and unpacked manually with minimal problems. This is accomplished by
       using a naming convention which lends to visual reconstruction. When a
       jar file must be split into multiple segments, there is a primary
       file, and multiple secondary jars with a common name. When an entry
       within the set of jars must be split, \emph{each} segment is given
       a numbered suffix.
       \subsection{Jar File Names}
       For the primary jar \texttt{\emph{basename}.jar}, the names of
       secondary jars must always be \texttt{\emph{basename}.split\#.jar}
       where \texttt{\#} is an integer \emph{secondary jar ID} starting at
       \texttt{1}. Left padded zeros in the ID are ignored, and encouraged
       to allow lexicographical sorting. The jars can be renamed, as long as
       the \emph{basename} is the same for all, and the suffixes
       (\texttt{.split\#.jar}) remain the same. All entries within the set
       must be unique.
       \subsection{Jar Entry Names}
       For the split entry named \texttt{\emph{basename}} (including
       suffixes), all segments are named using the template:
       \texttt{\emph{basename}}\texttt{.---\#.\~{}}, where \texttt{\#} is an integer
       \emph{segment ID} starting at \texttt{0}. These segments are
       rejoined by concatenating the segments in numeric order, to a file
       named \texttt{basename}. The template is recorded in the \emph{main} section
       of the manifest.
       In the rare case where an entry is split, and the name of a real entry
       may conflicts with a generated segment name, a non-default suffix
       template is used. In Our case, all of the generated segments will have
       '\texttt{\~{}}' characters appended, as needed, to eliminate potential
       conflicts. This non-default template is recorded in the
       \emph{per-entry} section of the manifest for the split entry.
       Non-default suffixes are used for all \emph{potential} conflicts even in
       cases where there is no actual conflict.
       \begin{itemize}
         \item When the split entry does not generate enough segments to
               conflict, but the suffix matches the default template.
         \item When the conflicting real entry must also be split, thus its
               actual entries use generated suffixes.
       \end{itemize}\
       Examples are given below.
       Other tools implementing split jars may (though are not encouraged to)
       use different suffixes, though they must have numeric segment replaced
       by '\#' in the manifest. Tools must sort these numerically, not
       lexicographically as ``2'' is generally greater than ``10''
       lexicographically. However, tools are encouraged to zero padding names,
       as needed, so that lexicographic sorting is correct.
       \section{Manifest Attributes}
       To minimize changes needed to implement the split jar, we simply add
       attributes to the manifest. Additional attributes are ignored by other
       jar tools, so the only consequences is that files split files, and
       files completely located in secondary jars will not be available to
       them.
       To prevent adding too much space overhead, and allow jar files to be
       renamed, the entries are kept minimalistic.
       \subsection{Main Section Attributes}
       Two attribute are added to indicate the number of secondary jars, and
       the default suffix added to the segments of split files.
       % TODO: make this like an html <dl><dd>... <dt> ...</dl>
       \begin{itemize}
         \item \texttt{Split-Jar-Secondary-Count}: The number of secondary jars
               in the set.
         \item \texttt{Split-Jar-Secondary-Suffix}: the suffix template
               inserted prior to the \texttt{.jar} suffix typical of jar
               files, to make the names of secondary jar file in the set;
               typically \texttt{.split\#}.
         \item \texttt{Split-Entry-Suffix}: the suffix template appended to
               an entry name, to name each of the entries constituent parts;
               typically \texttt{.---\#.\~{}}. The \# char indicates the
               location of the numeric value.  This cannot currently be
               changed.
       \end{itemize}
       \subsection{Per-Entry Section Attributes}
       Only files which are split require an attributes in the manifest. A
       space separated list of integers is recorded; one for each jar
       containing a segment of the entry. Entries which have a segment in the
       primary jar file, indicate this with the id \texttt{0}.
       No restriction is placed on the order of the entries, or the IDs of the
       jar in which any segment is contained.
       % TODO: make this like an html <dl><dd>... <dt> ...</dl>
       \begin{itemize}
         \item \texttt{Split-Entry-Jar-IDs}: A space separated set of
               secondary jar IDs which contains the segments of the
               entry. Essentially a list of integers.
         \item \texttt{Split-Entry-Suffix}: Overrides the default
               Split-Entry-Suffix specified in the Main-Attributes. Needed
               when one (or more) '\texttt{\~{}}' chars are appended due to name
               conflict with real entries.  This is not strictly necessary,
               as simply knowing the basename and unpacking all jars would
               allow the suffix to be determined, but is included to conserve
               processing. This is currently not user configurable.
       \end{itemize}
       \section{Examples}
       Two examples, one simple, and another cluttered with pathological
       cases. Notice that the jar ID number and segment of a split entry have
       no correlation. In most applications, there will seldom be more than
       two segments in a single file: the end of the last entry to the
       previous jar, and maybe the last entry of this jar, which is continued
       in the next. The examples aren't so well organized though. :-)
       \subsection{Basic Example}
       % TODO: format for tex
       TODO: format for TeX
       Jar to Create
       -------------
           example.jar
       Files to Compress
       -----------------
           movie.mpeg
           README
           song.mp3
           text.txt
       Entries in Jars
       ---------------
           example.jar           movie.mpeg.---0.\~{}
                                 README
           example.split1.jar    movie.mpeg.---1.\~{}
                                 song.mp3.---0.\~{}
           example.split2.jar    movie.mpeg.---2.\~{}
                                 song.mp3.---1.\~{}
                                 text.txt
       MANIFEST (primary jar only)
       ---------------------------
           Manifest-Version: 1.0
           Created-By: 1.4.2\_04-b05 (Sun Microsystems Inc.)
           Built-By: IzPack 1.6.0
           Main-Class: com.izforge.izpack.installer.Installer
           Split-Jar-Secondary-Count: 2
           Split-Entry-Suffix: .---\#.\~{}
           movie.mpg
           Split-Entry-Jar-IDs: 0 1 2
           song.mp3
           Split-Entry-Jar-IDs: 1 2
       \subsection{Name Conflicts}
       Pathological example showing name conflict resolution.  Includes
       \begin{itemize}
         \item Direct conflict with real archive file
               (\texttt{foo...}).
         \item Indirect conflict with file by suffix template only
               (\texttt{bar...}).
         \item Conflict with real archive file that is also split. Due to
               both being split, there would be no name conflict amongst jar
               entries, however The default suffix is not used anyway
               (\texttt{yin...}).
         \item A \emph{near} conflict, just to be annoying. Normal behavior
               (\texttt{chi...}).
         \item Files which look like segments of a split file, but are not,
               requiring manifest to know the difference (\texttt{zig...}).
       \end{itemize}
       \begin{verbatim}
       Jar to Create
       -------------
           example.jar
       Files to Compress
       -----------------
           foo.dat
           foo.dat.---0.~{} .... Extremely unlikely that these would exist,
                               much less need to be archived. Provided as an
                               example.
           bar.dat
           bar.dat.---555.~{} .. Another unlikely case which would not conflict
                               (assume bar.dat is split into only 2 segments)
                               except for the suffix template.
           yin.dat
           yin.dat.---2.~{} .... Yet another template only conflict
                               conflicting file needs to be split.
           chi.dat
           chi.dat.---0.~{}~{} ... No potential conflict.
           zig.dat.---0.~{} .... Files to be archived as they are, but not
           zig.dat.---1.~{}      intended to be spliced back together.
       Entries in Jars
       ---------------
           example.jar           foo.dat.---0.~{}
                                 foo.dat.---0.~{}~{}
                                 bar.dat.---555.~{}
                                 bar.dat.---0.~{}~{}
                                 yin.dat.---0.~{}~{}
                                 yin.dat.---2.~{}.---0.~{}
                                 chi.dat.---0.~{}
                                 chi.dat.---2.~{}~{}
                                 zig.dat.---0.~{}
                                 zig.dat.---1.~{}
           example.split1.jar    foo.dat.---1.~{}~{}
                                 bar.dat.---1.~{}~{}
                                 yin.dat.---1.~{}~{}
                                 yin.dat.---2.~{}.---1.~{}
                                 chi.dat.---1.~{}
       MANIFEST (primary jar only)
       ---------------------------
           Manifest-Version: 1.0
           IzPack-Version: X.X.X
           Created-By: 1.4.2_04-b05 (Sun Microsystems Inc.)
           Built-By: IzPack
           Class-Path:
           Main-Class: com.izforge.izpack.installer.Installer
           Split-Jar-Secondary-Count: 2
           Split-Entry-Suffix: .---#.~{}
           foo.dat
           Split-Entry-Jar-IDs: 0 1
           Split-Entry-Suffix: .---#.~{}~{}
           bar.dat
           Split-Entry-Jar-IDs: 0 1
           Split-Entry-Suffix: .---#.~{}~{}
           fig.dat
           Split-Entry-Jar-IDs: 0 1
           Split-Entry-Suffix: .---#.~{}~{}
           fig.dat.---2.~{}
           Split-Entry-Jar-IDs: 0 1
           moa.dat
           Split-Entry-Jar-IDs: 0 1
       \end{verbatim}
       \end{document}