.\" Copyright (c) 2007 Tim Kientzle
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd December 23, 2011
.Dt CPIO 5
.Os
.Sh NAME
.Nm cpio
.Nd format of cpio archive files
.Sh DESCRIPTION
The
.Nm
archive format collects any number of files, directories, and other
file system objects (symbolic links, device nodes, etc.) into a single
stream of bytes.
.Ss General Format
Each file system object in a
.Nm
archive comprises a header record with basic numeric metadata
followed by the full pathname of the entry and the file data.
The header record stores a series of integer values that generally
follow the fields in
.Va struct stat .
(See
.Xr stat 2
for details.)
The variants differ primarily in how they store those integers
(binary, octal, or hexadecimal).
The header is followed by the pathname of the
entry (the length of the pathname is stored in the header)
and any file data.
The end of the archive is indicated by a special record with
the pathname
.Dq TRAILER!!! .
.Ss PWB format
The PWB binary
.Nm
format is the original format, introduced when cpio first appeared as part of the
Programmer's Work Bench system, a variant of 6th Edition UNIX.
It stores numbers as 2-byte and 4-byte binary values.
Each entry begins with a header in the following format:
.Pp
.Bd -literal -offset indent
struct header_pwb_cpio {
        short   h_magic;
        short   h_dev;
        short   h_ino;
        short   h_mode;
        short   h_uid;
        short   h_gid;
        short   h_nlink;
        short   h_majmin;
        long    h_mtime;
        short   h_namesize;
        long    h_filesize;
};
.Ed
.Pp
The
.Va short
fields here are 16-bit integer values, while the
.Va long
fields are 32-bit integers.
Since PWB UNIX, like the 6th Edition UNIX
it was based on, ran only on PDP-11 computers, all fields
are in PDP-endian format, which stores each short in little-endian
byte order and each long as two such shorts, most significant one first.
That is, the long integer whose hexadecimal
representation is 0x12345678 would be stored in four successive bytes
as 0x34, 0x12, 0x78, 0x56.
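.Pp
The byte order described above can be illustrated with a short C sketch.
The function names
.Va pdp_read16
and
.Va pdp_read32
are hypothetical and are not part of any cpio implementation; the code
only assumes that the header bytes are available in a buffer.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit little-endian value from two successive bytes. */
uint16_t
pdp_read16(const unsigned char *p)
{
        assert(p != NULL);
        return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Read a 32-bit PDP-endian value: two little-endian 16-bit words,
 * with the most significant word stored first.
 */
uint32_t
pdp_read32(const unsigned char *p)
{
        uint32_t hi = pdp_read16(p);
        uint32_t lo = pdp_read16(p + 2);

        return (hi << 16) | lo;
}
```
.Ed
.Pp
Applied to the four bytes 0x34, 0x12, 0x78, 0x56 from the example above,
.Va pdp_read32
yields 0x12345678.
.Pp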
The fields are as follows:
.Bl -tag -width indent
.It Va h_magic
The integer value octal 070707.
.It Va h_dev , Va h_ino
The device and inode numbers from the disk.
These are used by programs that read
.Nm
archives to determine when two entries refer to the same file.
Programs that synthesize
.Nm
archives should be careful to set these to distinct values for each entry.
.It Va h_mode
The mode specifies both the regular permissions and the file type.
It also holds two bits that are irrelevant to the cpio format,
because the field is a raw copy of the mode field in the inode
representing the file.
These are the IALLOC flag, which shows that
the inode entry is in use, and the ILARG flag, which shows that the
file it represents is large enough to have indirect block pointers in
the inode.
The mode is decoded as follows:
.Pp
.Bl -tag -width "MMMMMMM" -compact
.It 0100000
IALLOC flag - irrelevant to cpio.
.It 0060000
This masks the file type bits.
.It 0040000
File type value for directories.
.It 0020000
File type value for character special devices.
.It 0060000
File type value for block special devices.
.It 0010000
ILARG flag - irrelevant to cpio.
.It 0004000
SUID bit.
.It 0002000
SGID bit.
.It 0001000
Sticky bit.
.It 0000777
The lower 9 bits specify read/write/execute permissions
for world, group, and user following standard POSIX conventions.
.El
.It Va h_uid , Va h_gid
The numeric user id and group id of the owner.
.It Va h_nlink
The number of links to this file.
Directories always have a value of at least two here.
Note that hardlinked files include file data with every copy in the archive.
.It Va h_majmin
For block special and character special entries,
this field contains the associated device number, with the major
number in the high byte, and the minor number in the low byte.
For all other entry types, it should be set to zero by writers
and ignored by readers.
.It Va h_mtime
Modification time of the file, indicated as the number
of seconds since the start of the epoch,
00:00:00 UTC January 1, 1970.
.It Va h_namesize
The number of bytes in the pathname that follows the header.
This count includes the trailing NUL byte.
.It Va h_filesize
The size of the file.
Note that this archive format is limited to 16
megabyte file sizes, because PWB UNIX, like 6th Edition, only used
an unsigned 24-bit integer for the file size internally.
.El
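.Pp
As an illustration of the
.Va h_mode
and
.Va h_majmin
descriptions above, the following C sketch decodes both fields.
All macro and function names are hypothetical.
The sketch also assumes that a file type value of zero denotes a regular
file, which follows the 6th Edition convention but is not spelled out in
the table above.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mode bits of the PWB format, taken from the table above. */
#define PWB_IALLOC 0100000      /* inode in use; irrelevant to cpio */
#define PWB_IFMT   0060000      /* mask for the file type bits */
#define PWB_IFDIR  0040000      /* directory */
#define PWB_IFCHR  0020000      /* character special device */
#define PWB_IFBLK  0060000      /* block special device */
#define PWB_ILARG  0010000      /* large file; irrelevant to cpio */

/* Return a one-character file type code in the style of a long listing. */
char
pwb_type_char(uint16_t mode)
{
        switch (mode & PWB_IFMT) {
        case PWB_IFDIR:
                return 'd';
        case PWB_IFCHR:
                return 'c';
        case PWB_IFBLK:
                return 'b';
        default:
                return '-';     /* assumption: no type bits, regular file */
        }
}

/* Return the SUID, SGID, sticky, and permission bits only. */
uint16_t
pwb_perm_bits(uint16_t mode)
{
        return mode & 07777;
}

/* Split h_majmin: major number in the high byte, minor in the low byte. */
void
pwb_split_dev(uint16_t majmin, unsigned *major, unsigned *minor)
{
        assert(major != NULL && minor != NULL);
        *major = (majmin >> 8) & 0xff;
        *minor = majmin & 0xff;
}
```
.Ed
.Pp
Note that the type is extracted with the two-bit mask 0060000, so the
IALLOC and ILARG flags never influence the result.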
.Pp
The pathname immediately follows the fixed header.
If
.Va h_namesize
is odd, an additional NUL byte is added after the pathname.
The file data is then appended, again with an additional NUL
appended if needed so that the next header starts at an even offset.
.Pp
Hardlinked files are not given special treatment;
the full file contents are included with each copy of the
file.
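.Pp
The padding rule above can be sketched in C as follows.
The names
.Va pad_even
and
.Va bin_next_header
are hypothetical; the header size of 26 bytes is derived from the
struct shown earlier (nine 2-byte fields and two 4-byte fields).
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIN_HEADER_SIZE 26      /* nine shorts and two longs */

/* Round n up to the next even number. */
size_t
pad_even(size_t n)
{
        return n + (n & 1);
}

/*
 * Given the offset of an entry's header, return the offset at which
 * the next header begins.
 */
size_t
bin_next_header(size_t offset, uint16_t namesize, uint32_t filesize)
{
        assert(offset % 2 == 0);        /* headers start at even offsets */
        return offset + BIN_HEADER_SIZE +
            pad_even(namesize) + pad_even(filesize);
}
```
.Ed
.Pp
For example, an entry at offset 0 with an 11-byte pathname (including
the NUL) and 5 bytes of data is followed by a header at offset
26 + 12 + 6 = 44.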
.Ss New Binary Format
The new binary
.Nm
format appeared when cpio was adopted into late 7th Edition UNIX.
It is exactly like the PWB binary format, described above, except for
three changes:
.Pp
First, UNIX now ran on more than one hardware type, so the endianness
of 16-bit integers must be determined by observing the magic number at
the start of the header.
The 32-bit integers are still always stored
with the most significant word first, though, so each of the two long
fields in the struct shown above was stored as an array of two 16-bit
integers, in the traditional order.
Those 16-bit integers, like all the others
in the struct, were accessed using a macro that byte-swapped them if
necessary.
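.Pp
One way a reader might apply this rule is sketched below in C.
The names are hypothetical, and the byte values follow from the fact
that octal 070707 is hexadecimal 0x71C7; this is an illustration under
those assumptions, not the macro used by the historical implementation.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum bin_order { BIN_ORDER_UNKNOWN, BIN_ORDER_LITTLE, BIN_ORDER_BIG };

/* Inspect the first two header bytes, which hold the magic 070707. */
enum bin_order
bin_detect_order(const unsigned char *hdr)
{
        assert(hdr != NULL);
        if (hdr[0] == 0xC7 && hdr[1] == 0x71)
                return BIN_ORDER_LITTLE;
        if (hdr[0] == 0x71 && hdr[1] == 0xC7)
                return BIN_ORDER_BIG;
        return BIN_ORDER_UNKNOWN;
}

/* Read one 16-bit field in the detected byte order. */
uint16_t
bin_read16(const unsigned char *p, enum bin_order order)
{
        assert(order != BIN_ORDER_UNKNOWN);
        if (order == BIN_ORDER_LITTLE)
                return (uint16_t)(p[0] | (p[1] << 8));
        return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Read one 32-bit field: two 16-bit words in the detected byte order,
 * always with the most significant word first.
 */
uint32_t
bin_read32(const unsigned char *p, enum bin_order order)
{
        uint32_t hi = bin_read16(p, order);
        uint32_t lo = bin_read16(p + 2, order);

        return (hi << 16) | lo;
}
```
.Ed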
.Pp
Next, 7th Edition had more file types to store, and the IALLOC and ILARG
flag bits were repurposed to accommodate these.
The revised use of the
various bits is as follows:
.Pp
.Bl -tag -width "MMMMMMM" -compact
.It 0170000
This masks the file type bits.
.It 0140000
File type value for sockets.
.It 0120000
File type value for symbolic links.
For symbolic links, the link body is stored as file data.
.It 0100000
File type value for regular files.
.It 0060000
File type value for block special devices.
.It 0040000
File type value for directories.
.It 0020000
File type value for character special devices.
.It 0010000
File type value for named pipes or FIFOs.
.It 0004000
SUID bit.
.It 0002000
SGID bit.
.It 0001000
Sticky bit.
.It 0000777
The lower 9 bits specify read/write/execute permissions
for world, group, and user following standard POSIX conventions.
.El
.Pp
Finally, the file size field now represents a signed 32-bit integer in
the underlying file system, so the maximum file size has increased to
2 gigabytes.
.Pp
Note that there is no obvious way to tell which of the two binary
formats an archive uses, other than to see which one makes more
sense.
The typical error scenario is that a PWB format archive
unpacked as if it were in the new format will create named sockets
instead of directories, and then fail to unpack files that should
go in those directories.
Running
.Nm bsdcpio Fl itv
on an unknown archive will make it obvious which it is: if it's
PWB format, directories will be listed with an 's' instead of
a 'd' as the first character of the mode string, and the larger
files will have a '?' in that position.
.Ss Portable ASCII Format
.St -susv2
standardized an ASCII variant that is portable across all
platforms.
It is commonly known as the
.Dq old character
format or as the
.Dq odc
format.
It stores the same numeric fields as the old binary format, but
represents them as 6-character or 11-character octal values.
.Pp
.Bd -literal -offset indent
struct cpio_odc_header {
        char    c_magic[6];
        char    c_dev[6];
        char    c_ino[6];
        char    c_mode[6];
        char    c_uid[6];
        char    c_gid[6];
        char    c_nlink[6];
        char    c_rdev[6];
        char    c_mtime[11];
        char    c_namesize[6];
        char    c_filesize[11];
};
.Ed
.Pp
The fields are identical to those in the new binary format.
The name and file body follow the fixed header.
Unlike the binary formats, there is no additional padding
after the pathname or file contents.
If the files being archived are themselves entirely ASCII, then
the resulting archive will be entirely ASCII, except for the
NUL byte that terminates the name field.
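.Pp
Reading one of the fixed-width octal fields described above might look
like the following C sketch.
The function name
.Va odc_parse_octal
and its return convention are assumptions made for this illustration.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Parse a fixed-width octal field of the given width (6 or 11 in the
 * odc header).  Returns 0 on success, or -1 if the field contains a
 * character that is not an octal digit.
 */
int
odc_parse_octal(const char *field, size_t width, uint64_t *out)
{
        uint64_t v = 0;
        size_t i;

        assert(field != NULL && out != NULL);
        for (i = 0; i < width; i++) {
                if (field[i] < '0' || field[i] > '7')
                        return -1;
                v = (v << 3) | (uint64_t)(field[i] - '0');
        }
        *out = v;
        return 0;
}
```
.Ed
.Pp
A 6-character field holds at most 18 bits, and an 11-character field
holds at most 33 bits, which is why this format can describe files of
up to 8 gigabytes.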
.Ss New ASCII Format
The "new" ASCII format uses 8-byte hexadecimal fields for
all numbers and separates device numbers into separate fields
for major and minor numbers.
.Pp
.Bd -literal -offset indent
struct cpio_newc_header {
        char    c_magic[6];
        char    c_ino[8];
        char    c_mode[8];
        char    c_uid[8];
        char    c_gid[8];
        char    c_nlink[8];
        char    c_mtime[8];
        char    c_filesize[8];
        char    c_devmajor[8];
        char    c_devminor[8];
        char    c_rdevmajor[8];
        char    c_rdevminor[8];
        char    c_namesize[8];
        char    c_check[8];
};
.Ed
.Pp
Except as specified below, the fields here match those specified
for the new binary format above.
.Bl -tag -width indent
.It Va c_magic
The string
.Dq 070701 .
.It Va c_check
This field is always set to zero by writers and ignored by readers.
See the next section for more details.
.El
.Pp
The pathname is followed by NUL bytes so that the total size
of the fixed header plus pathname is a multiple of four.
Likewise, the file data is padded to a multiple of four bytes.
Note that this format supports only 4 gigabyte files (unlike the
older ASCII format, which supports 8 gigabyte files).
.Pp
In this format, hardlinked files are handled by setting the
filesize to zero for each entry except the first one that
appears in the archive.
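.Pp
The field encoding and padding rules of this format can be sketched in
C as follows.
All names are hypothetical; the header size of 110 bytes follows from
the struct above (a 6-byte magic plus thirteen 8-byte fields), and the
sketch assumes that each header starts at an offset that is a multiple
of four.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NEWC_HEADER_SIZE 110    /* 6-byte magic plus 13 fields of 8 bytes */

/* Parse one 8-character hexadecimal field.  Returns 0 or -1. */
int
newc_parse_hex(const char *field, uint32_t *out)
{
        uint32_t v = 0;
        int i;

        assert(field != NULL && out != NULL);
        for (i = 0; i < 8; i++) {
                char c = field[i];
                uint32_t digit;

                if (c >= '0' && c <= '9')
                        digit = (uint32_t)(c - '0');
                else if (c >= 'a' && c <= 'f')
                        digit = (uint32_t)(c - 'a') + 10;
                else if (c >= 'A' && c <= 'F')
                        digit = (uint32_t)(c - 'A') + 10;
                else
                        return -1;
                v = (v << 4) | digit;
        }
        *out = v;
        return 0;
}

/* Round n up to the next multiple of four. */
size_t
pad4(size_t n)
{
        return (n + 3) & ~(size_t)3;
}

/* Offset of the file data for a header located at hdr_offset. */
size_t
newc_data_offset(size_t hdr_offset, uint32_t namesize)
{
        assert(hdr_offset % 4 == 0);
        return hdr_offset + pad4(NEWC_HEADER_SIZE + (size_t)namesize);
}

/* Offset of the header that follows this entry. */
size_t
newc_next_header(size_t hdr_offset, uint32_t namesize, uint32_t filesize)
{
        return newc_data_offset(hdr_offset, namesize) + pad4(filesize);
}
```
.Ed
.Pp
For example, a header at offset 0 with an 11-byte pathname occupies
110 + 11 = 121 bytes, so the file data begins at offset 124.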
.Ss New CRC Format
The CRC format is identical to the new ASCII format described
in the previous section except that the
.Va c_magic
field is set to
.Dq 070702
and the
.Va c_check
field is set to the sum of all bytes in the file data.
This sum is computed treating all bytes as unsigned values
and using unsigned arithmetic.
Only the least-significant 32 bits of the sum are stored.
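.Pp
A minimal C sketch of this checksum is shown below.
The function name
.Va cpio_checksum
is hypothetical; the code simply relies on unsigned 32-bit arithmetic
wrapping around to keep the least-significant 32 bits.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum all bytes of the file data as unsigned values, modulo 2^32. */
uint32_t
cpio_checksum(const unsigned char *data, size_t len)
{
        uint32_t sum = 0;
        size_t i;

        assert(data != NULL || len == 0);
        for (i = 0; i < len; i++)
                sum += data[i];
        return sum;
}
```
.Ed
.Pp
For the three bytes of the string "abc" the result is 97 + 98 + 99 = 294.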
.Ss HP variants
The
.Nm cpio
implementation distributed with HP-UX used XXXX but stored
device numbers differently XXX.
.Ss Other Extensions and Variants
Sun Solaris uses additional file types to store extended file
data, including ACLs and extended attributes, as special
entries in cpio archives.
.Pp
XXX Others? XXX
.Sh SEE ALSO
.Xr cpio 1 ,
.Xr tar 5
.Sh STANDARDS
The
.Nm cpio
utility is no longer a part of POSIX or the Single UNIX Specification.
It last appeared in
.St -susv2 .
It has been supplanted in subsequent standards by
.Xr pax 1 .
The portable ASCII format is currently part of the specification for the
.Xr pax 1
utility.
.Sh HISTORY
The original cpio utility was written by Dick Haight
while working in AT&T's Unix Support Group.
It appeared in 1977 as part of PWB/UNIX 1.0, the
.Dq Programmer's Work Bench
derived from
.At v6
that was used internally at AT&T.
Both the new binary and old character formats were in use
by 1980, according to the System III source released
by SCO under their
.Dq Ancient Unix
license.
The character format was adopted as part of
.St -p1003.1-88 .
XXX when did "newc" appear? Who invented it? When did HP come out with their variant? When did Sun introduce ACLs and extended attributes? XXX
.Sh BUGS
The
.Dq CRC
format is misnamed, as it uses a simple checksum and
not a cyclic redundancy check.
.Pp
The binary formats are limited to 16 bits for user id, group id,
device, and inode numbers.
They are limited to 16 megabyte and 2
gigabyte file sizes for the older and newer variants, respectively.
.Pp
The old ASCII format is limited to 18 bits for
the user id, group id, device, and inode numbers.
It is limited to 8 gigabyte file sizes.
.Pp
The new ASCII format is limited to 4 gigabyte file sizes.
.Pp
None of the cpio formats store user or group names,
which are essential when moving files between systems with
dissimilar user or group numbering.
.Pp
Especially when writing older cpio variants, it may be necessary
to map actual device/inode values to synthesized values that
fit the available fields.
With very large filesystems, this may be necessary even for
the newer formats.