%begin{latexonly}
\newif\ifpdf
\ifx\pdfoutput\undefined
\pdffalse
\else
\pdfoutput=1
\pdftrue
\fi

% Change this as needed :
% - a4paper to your paper format
% - the document class to your need (book, article, ...)
\ifpdf
\documentclass[a4paper, 12pt, pdftex]{report}
\else
%end{latexonly}
\documentclass[a4paper, 12pt, dvips]{report}
%begin{latexonly}
\fi
%end{latexonly}

% The packages we need
\usepackage{verbatim}
\usepackage{moreverb}
\usepackage{url}
\usepackage{tabularx}
\usepackage[final]{graphicx}
\usepackage[hyperindex,breaklinks=true,pdfborder={0 0 0}]{hyperref}
%begin{latexonly}
\ifpdf
\hypersetup{colorlinks=true,linkcolor=blue,urlcolor=blue,citecolor=red}
\fi
%end{latexonly}
\usepackage{html}
\begin{htmlonly}
\newcommand{\href}[2]{\htmladdnormallink{#2}{#1}}
\end{htmlonly}

% block style paragraphs tend to look better in technical docs
\parindent=0in
\parskip=10pt

\begin{document}

% Split Jar Specification

\appendix

\chapter{Split Jars and MANIFEST Extensions}

The jar file specification allows archiving and packaging classes and
resources, but is limited in size. To overcome this limitation with
minimal changes to the jar format, we create a set of jar files and add
new key-value attributes to the jar MANIFEST. These attributes
indicate how many jars are in the set, which files, if any, are
split across multiple jars, and which jars contain them.

\section{Motivations and Limitations}

Java's zip implementation is limited to \~{}2GB. The zip64 extensions,
which allow larger files, do not solve the problem when media limitations
restrict the jar size. There must be a way to split the archive into
multiple files, and indeed, individual entries must be split
across jars.

A ``Split Jar'' is a set of normal jar files: one \emph{primary} jar
and zero or more \emph{secondary} jars. The
primary jar file has additional manifest attributes to help
reconstruct the data. Entries may or may not be split across multiple
jars; those that are split need to be spliced back together upon
extraction. Secondary
jar names derive from the basename of the primary jar, and the
segments of a split entry share a basename derived from the original
entry name.

Segments of a split entry need not be in separate jars. Thus if a jar
is split to deal with media limitations, all the resulting jars may be
combined into a single primary jar, as long as the manifest is
correctly updated.

A major benefit of this format is that the split archive contents can
be recovered manually by extracting the contents of all jar files in
the set and simply concatenating the segments of the split entries.

Entry names in the primary and secondary jars must not conflict, so
that together they represent a single archive. This includes the
generated names of split entry segments. This ensures that each jar in
a split archive may be extracted to the same location without risk of
losing data. Split file segments are then concatenated, manually or
by automation, to get the original data set. The manifest should always
be consulted to confirm that files which look like split entry segments
should actually be spliced together. It is possible that such files
were intended to be part of the archive as they are (see ``Naming
Conventions'' for name conflict resolution).

All segments of a split entry are given generated names so that normal
jar tools will never unpack a file under the original name. This
ensures that no
unsuspecting user mistakenly uses a truncated, partial file.

\subsection{Warnings}

Signing jar entries which have been split has not been addressed.

Files cannot be compressed directly into streams when there are
potential name conflicts with the generated segment names. Robust
tools must therefore collect the list of files to be added and
determine any conflicts first (see ``Naming Conventions'',
``Jar Entry Names'').

Adding files to existing split jars may also cause name
conflicts.

\section{Naming Conventions}

A primary goal of this design is to allow split jars to be created
and unpacked manually with minimal problems. This is accomplished by
a naming convention which lends itself to visual reconstruction. When a
jar file must be split into multiple files, there is one primary
jar and one or more secondary jars sharing a common basename. When an entry
within the set of jars must be split, \emph{each} segment is given
a numbered suffix.

\subsection{Jar File Names}

For the primary jar \texttt{\emph{basename}.jar}, the names of
secondary jars must always be \texttt{\emph{basename}.split\#.jar}
where \texttt{\#} is an integer \emph{secondary jar ID} starting at
\texttt{1}. Left-padded zeros in the ID are ignored, and are encouraged
because they allow lexicographical sorting. The jars can be renamed, as long as
the \emph{basename} is the same for all, and the suffixes
(\texttt{.split\#.jar}) remain the same. All entry names within the set
must be unique.

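As a rough illustration of this naming rule, the following Python sketch
builds and parses secondary jar names. The function names
\texttt{secondary\_jar\_name} and \texttt{secondary\_jar\_id} are
hypothetical and not part of IzPack, and the sketch assumes the usual
\texttt{.split\#} suffix.

```python
import re


def secondary_jar_name(primary, jar_id, width=0):
    """Build a secondary jar name from a primary name such as 'example.jar'.

    width is an optional zero-padding width for the ID (hypothetical knob).
    """
    if not primary.endswith(".jar"):
        raise ValueError("primary jar name must end in .jar")
    if jar_id < 1:
        raise ValueError("secondary jar IDs start at 1")
    basename = primary[: -len(".jar")]
    return basename + ".split" + str(jar_id).zfill(width) + ".jar"


def secondary_jar_id(primary, candidate):
    """Return the ID encoded in candidate, or None if it is not in the set.

    Leading zeros are ignored because the digits are converted with int().
    """
    basename = primary[: -len(".jar")]
    match = re.fullmatch(re.escape(basename) + r"\.split(\d+)\.jar", candidate)
    return int(match.group(1)) if match else None
```

For example, \texttt{secondary\_jar\_name("example.jar", 2, width=3)}
yields \texttt{example.split002.jar}, which parses back to the ID 2.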
\subsection{Jar Entry Names}

For the split entry named \texttt{\emph{basename}} (including
suffixes), all segments are named using the template
\texttt{\emph{basename}}\texttt{.---\#.\~{}}, where \texttt{\#} is an integer
\emph{segment ID} starting at \texttt{0}. The segments are
rejoined by concatenating them in numeric order into a file
named \texttt{\emph{basename}}. The template is recorded in the \emph{main} section
of the manifest.

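The template and the rejoining step can be sketched in Python as follows.
This is only an illustration: \texttt{segment\_name} and
\texttt{join\_segments} are hypothetical helpers, the extracted files are
assumed to be available as a dictionary mapping names to bytes, segment
IDs are assumed to be unpadded, and the manifest is assumed to have
already confirmed that the entry is split.

```python
def segment_name(entry, segment_id, template=".---#.~"):
    """Name of one segment: the entry name plus the template with '#' filled in."""
    return entry + template.replace("#", str(segment_id), 1)


def join_segments(entry, files, template=".---#.~"):
    """Concatenate segments 0, 1, 2, ... of entry found in files (name -> bytes)."""
    parts = []
    segment_id = 0
    while segment_name(entry, segment_id, template) in files:
        parts.append(files[segment_name(entry, segment_id, template)])
        segment_id += 1
    if not parts:
        raise KeyError("no segments found for " + entry)
    return b"".join(parts)
```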
In the rare case where an entry is split and the name of a real entry
may conflict with a generated segment name, a non-default suffix
template is used. In our case, all of the generated segments will have
'\texttt{\~{}}' characters appended, as needed, to eliminate potential
conflicts. This non-default template is recorded in the
\emph{per-entry} section of the manifest for the split entry.

Non-default suffixes are used for all \emph{potential} conflicts, even in
cases where there is no actual conflict:

\begin{itemize}
\item When the split entry does not generate enough segments to
conflict, but the name of a real entry matches the default template.
\item When the conflicting real entry must also be split, so that its
actual entries use generated suffixes.
\end{itemize}

Examples are given below.

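One possible way to pick the suffix template for a split entry is sketched
below in Python. It treats any real entry name that matches the template
with an arbitrary integer as a potential conflict and appends
'\texttt{\~{}}' until none remains. The function names are hypothetical,
and this is an interpretation of the rule above rather than the IzPack
implementation.

```python
import re


def template_regex(entry, template):
    """Regex matching entry + template with any integer in place of '#'."""
    prefix, _, suffix = template.partition("#")
    return re.compile(re.escape(entry + prefix) + r"\d+" + re.escape(suffix))


def choose_suffix_template(entry, all_names, default=".---#.~"):
    """Append '~' to the default template until no real name could conflict."""
    template = default
    while any(template_regex(entry, template).fullmatch(name) for name in all_names):
        template += "~"
    return template
```

With the file list of the ``Name Conflicts'' example, this sketch selects
\texttt{.---\#.\~{}\~{}} for \texttt{foo.dat}, \texttt{bar.dat} and
\texttt{yin.dat}, and keeps the default for \texttt{chi.dat}.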
Other tools implementing split jars may (though are not encouraged to)
use different suffixes, but the numeric segment ID must be replaced
by '\#' in the template recorded in the manifest. Tools must sort
segments numerically, not
lexicographically, as ``2'' is greater than ``10''
lexicographically. However, tools are encouraged to zero-pad names,
as needed, so that lexicographic sorting is also correct.

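The difference between the two orderings can be seen in a small Python
sketch. The helper \texttt{segment\_id} is hypothetical; it extracts the
integer at the position of '\#' in the template so that it can be used as
a sort key.

```python
import re


def segment_id(name, entry, template=".---#.~"):
    """Return the integer segment ID encoded in name, or None if it does not match."""
    prefix, _, suffix = template.partition("#")
    match = re.fullmatch(
        re.escape(entry + prefix) + r"(\d+)" + re.escape(suffix), name
    )
    return int(match.group(1)) if match else None


names = ["f.dat.---10.~", "f.dat.---2.~", "f.dat.---1.~"]

# Plain string sorting puts segment 10 before segment 2.
lexicographic = sorted(names)

# Sorting on the parsed integer gives the order needed for concatenation.
numeric = sorted(names, key=lambda name: segment_id(name, "f.dat"))
```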
\section{Manifest Attributes}

To minimize the changes needed to implement the split jar, we simply add
attributes to the manifest. Additional attributes are ignored by other
jar tools, so the only consequence is that split files, and
files located entirely in secondary jars, will not be available to
them.

To limit space overhead, and to allow jar files to be
renamed, the entries are kept minimalistic.

\subsection{Main Section Attributes}

Three attributes are added to indicate the number of secondary jars,
the suffix used to name them, and
the default suffix added to the segments of split files.

% TODO: make this like an html <dl><dd>... <dt> ...</dl>
\begin{itemize}
\item \texttt{Split-Jar-Secondary-Count}: the number of secondary jars
in the set.
\item \texttt{Split-Jar-Secondary-Suffix}: the suffix template
inserted before the \texttt{.jar} suffix typical of jar
files to form the names of the secondary jar files in the set;
typically \texttt{.split\#}.
\item \texttt{Split-Entry-Suffix}: the suffix template appended to
an entry name to name each of the entry's constituent parts;
typically \texttt{.---\#.\~{}}. The \# character indicates the
location of the numeric value. This cannot currently be
changed.
\end{itemize}

\subsection{Per-Entry Section Attributes}

Only files which are split require attributes in the manifest. A
space-separated list of integers is recorded, one for each jar
containing a segment of the entry. Entries which have a segment in the
primary jar file indicate this with the ID \texttt{0}.

No restriction is placed on the order of the entries, or on the IDs of the
jars in which any segment is contained.

% TODO: make this like an html <dl><dd>... <dt> ...</dl>
\begin{itemize}
\item \texttt{Split-Entry-Jar-IDs}: a space-separated set of
jar IDs identifying the jars which contain the segments of the
entry; essentially a list of integers.
\item \texttt{Split-Entry-Suffix}: overrides the default
\texttt{Split-Entry-Suffix} specified in the main section. Needed
when one or more '\texttt{\~{}}' characters are appended due to a name
conflict with real entries. This is not strictly necessary,
as simply knowing the basename and unpacking all jars would
allow the suffix to be determined, but it is included to save
processing. This is currently not user configurable.
\end{itemize}

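To show how these attributes might drive extraction, here is a minimal
Python sketch using the standard \texttt{zipfile} module. The helpers
\texttt{parse\_jar\_ids}, \texttt{read\_split\_entry} and
\texttt{make\_jar} are hypothetical, the jars are assumed to be already
opened and keyed by jar ID (\texttt{0} for the primary jar), and manifest
parsing itself is left out.

```python
import io
import re
import zipfile


def parse_jar_ids(value):
    """Parse a Split-Entry-Jar-IDs value such as '0 1 2' into a list of ints."""
    return [int(token) for token in value.split()]


def read_split_entry(entry, jar_ids, jars, template=".---#.~"):
    """Collect the segments of entry from the listed jars and concatenate them.

    jars maps a jar ID (0 = primary) to an open zipfile.ZipFile.
    """
    prefix, _, suffix = template.partition("#")
    pattern = re.compile(re.escape(entry + prefix) + r"(\d+)" + re.escape(suffix))
    segments = {}
    for jar_id in jar_ids:
        archive = jars[jar_id]
        for name in archive.namelist():
            match = pattern.fullmatch(name)
            if match:
                segments[int(match.group(1))] = archive.read(name)
    # Segment IDs start at 0 and must be contiguous.
    if not segments or sorted(segments) != list(range(len(segments))):
        raise ValueError("missing segments for " + entry)
    return b"".join(segments[i] for i in range(len(segments)))


def make_jar(entries):
    """Build an in-memory archive from a name -> bytes mapping (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in entries.items():
            archive.writestr(name, data)
    buffer.seek(0)
    return zipfile.ZipFile(buffer)
```

Because the jar IDs and segment IDs are unrelated, the sketch scans every
listed jar and orders the segments by the number found in each name.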
\section{Examples}

Two examples follow: one simple, and another cluttered with pathological
cases. Notice that the jar ID number and the segment ID of a split entry have
no correlation. In most applications, there will seldom be more than
two segments in a single jar: the end of the last entry of the
previous jar, and perhaps the start of the last entry of this jar, which
is continued
in the next. The examples aren't so well organized though. :-)

\subsection{Basic Example}

\begin{verbatim}

Jar to Create
-------------
example.jar

Files to Compress
-----------------
movie.mpeg
README
song.mp3
text.txt

Entries in Jars
---------------
example.jar         movie.mpeg.---0.~
                    README

example.split1.jar  movie.mpeg.---1.~
                    song.mp3.---0.~

example.split2.jar  movie.mpeg.---2.~
                    song.mp3.---1.~
                    text.txt

MANIFEST (primary jar only)
---------------------------
Manifest-Version: 1.0
Created-By: 1.4.2_04-b05 (Sun Microsystems Inc.)
Built-By: IzPack 1.6.0
Main-Class: com.izforge.izpack.installer.Installer
Split-Jar-Secondary-Count: 2
Split-Entry-Suffix: .---#.~

movie.mpeg
Split-Entry-Jar-IDs: 0 1 2

song.mp3
Split-Entry-Jar-IDs: 1 2
\end{verbatim}

\subsection{Name Conflicts}

A pathological example showing name conflict resolution. It includes:

\begin{itemize}
\item A direct conflict with a real archive file
(\texttt{foo...}).

\item An indirect conflict with a file, by suffix template only
(\texttt{bar...}).

\item A conflict with a real archive file that is also split. Because
both are split, there would be no name conflict amongst jar
entries; however, the default suffix is not used anyway
(\texttt{yin...}).

\item A \emph{near} conflict, just to be annoying. Normal behavior
(\texttt{chi...}).

\item Files which look like segments of a split file, but are not,
requiring the manifest to tell the difference (\texttt{zig...}).
\end{itemize}

\begin{verbatim}

Jar to Create
-------------
example.jar

Files to Compress
-----------------
foo.dat
foo.dat.---0.~ ...... Extremely unlikely that these would exist,
                      much less need to be archived. Provided as an
                      example.
bar.dat
bar.dat.---555.~ .... Another unlikely case which would not conflict
                      (assume bar.dat is split into only 2 segments)
                      except for the suffix template.
yin.dat
yin.dat.---2.~ ...... Yet another template-only conflict; the
                      conflicting file also needs to be split.
chi.dat
chi.dat.---0.~~ ..... No potential conflict.
zig.dat.---0.~ ...... Files to be archived as they are, but not
zig.dat.---1.~        intended to be spliced back together.

Entries in Jars
---------------
example.jar         foo.dat.---0.~
                    foo.dat.---0.~~
                    bar.dat.---555.~
                    bar.dat.---0.~~
                    yin.dat.---0.~~
                    yin.dat.---2.~.---0.~
                    chi.dat.---0.~
                    chi.dat.---0.~~
                    zig.dat.---0.~
                    zig.dat.---1.~

example.split1.jar  foo.dat.---1.~~
                    bar.dat.---1.~~
                    yin.dat.---1.~~
                    yin.dat.---2.~.---1.~
                    chi.dat.---1.~

MANIFEST (primary jar only)
---------------------------
Manifest-Version: 1.0
IzPack-Version: X.X.X
Created-By: 1.4.2_04-b05 (Sun Microsystems Inc.)
Built-By: IzPack
Class-Path:
Main-Class: com.izforge.izpack.installer.Installer
Split-Jar-Secondary-Count: 1
Split-Entry-Suffix: .---#.~

foo.dat
Split-Entry-Jar-IDs: 0 1
Split-Entry-Suffix: .---#.~~

bar.dat
Split-Entry-Jar-IDs: 0 1
Split-Entry-Suffix: .---#.~~

yin.dat
Split-Entry-Jar-IDs: 0 1
Split-Entry-Suffix: .---#.~~

yin.dat.---2.~
Split-Entry-Jar-IDs: 0 1

chi.dat
Split-Entry-Jar-IDs: 0 1
\end{verbatim}

\end{document}