Class ZipArchiveInputStream

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable, InputStreamStatistics
    Direct Known Subclasses:
    JarArchiveInputStream

    public class ZipArchiveInputStream
    extends ArchiveInputStream<ZipArchiveEntry>
    implements InputStreamStatistics
    Implements an input stream that can read Zip archives.

    As of Apache Commons Compress it transparently supports Zip64 extensions and thus individual entries and archives larger than 4 GB or with more than 65536 entries.

    The ZipFile class is preferred when reading from files because ZipArchiveInputStream cannot read the central directory header before returning entries. In particular, ZipArchiveInputStream:

    • may return entries that are not part of the central directory at all and shouldn't be considered part of the archive.
    • may return several entries with the same name.
    • will not return internal or external attributes.
    • may return incomplete extra field data.
    • may return unknown sizes and CRC values for entries until the next entry has been reached if the archive uses the data descriptor feature.
    See Also:
    ZipFile
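
    The last limitation in the list above can be observed with any streaming ZIP reader. The sketch below is a stand-in that uses the JDK's java.util.zip classes rather than Commons Compress (an assumption made so the example runs without extra dependencies), and the class and method names are hypothetical. It writes one DEFLATED entry without sizes set up front, which makes the writer emit a data descriptor, and then shows that the size stays unknown (-1) until the entry has been read to its end.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class DataDescriptorSketch {

    /** Builds a one-entry archive; sizes are not set up front, so a data descriptor is written. */
    static byte[] buildArchive(byte[] payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("hello.txt"));
            zos.write(payload);
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    /** Returns { size seen right after the header, size seen after draining the entry }. */
    static long[] sizesBeforeAndAfterReading(byte[] archive) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry = zis.getNextEntry();
            long before = entry.getSize();
            byte[] buffer = new byte[1024];
            while (zis.read(buffer) != -1) {
                // drain the entry so the reader reaches the data descriptor
            }
            return new long[] { before, entry.getSize() };
        }
    }
}
```

    Code that needs sizes or CRC values before reading the data is better served by a reader that consults the central directory, which is why ZipFile is recommended for files.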
    • Constructor Summary

      Constructors 
      Constructor Description
      ZipArchiveInputStream​(java.io.InputStream inputStream)
      Create an instance using UTF-8 encoding
      ZipArchiveInputStream​(java.io.InputStream inputStream, java.lang.String encoding)
      Create an instance using the specified encoding
      ZipArchiveInputStream​(java.io.InputStream inputStream, java.lang.String encoding, boolean useUnicodeExtraFields)
      Create an instance using the specified encoding
      ZipArchiveInputStream​(java.io.InputStream inputStream, java.lang.String encoding, boolean useUnicodeExtraFields, boolean allowStoredEntriesWithDataDescriptor)
      Create an instance using the specified encoding
      ZipArchiveInputStream​(java.io.InputStream inputStream, java.lang.String encoding, boolean useUnicodeExtraFields, boolean allowStoredEntriesWithDataDescriptor, boolean skipSplitSig)
      Create an instance using the specified encoding
    • Method Summary

      Modifier and Type Method Description
      private boolean bufferContainsSignature​(java.io.ByteArrayOutputStream bos, int offset, int lastRead, int expectedDDLen)
      Checks whether the current buffer contains the signature of a "data descriptor", "local file header" or "central directory entry".
      private int cacheBytesRead​(java.io.ByteArrayOutputStream bos, int offset, int lastRead, int expectedDDLen)
      If the last read bytes could hold a data descriptor and an incomplete signature then save the last bytes to the front of the buffer and cache everything in front of the potential data descriptor into the given ByteArrayOutputStream.
      boolean canReadEntryData​(ArchiveEntry ae)
      Whether this class is able to read the given entry.
      private static boolean checksig​(byte[] signature, byte[] expected)  
      void close()  
      private void closeEntry()
      Closes the current ZIP archive entry and positions the underlying stream to the beginning of the next entry.
      private boolean currentEntryHasOutstandingBytes()
      If the compressed size of the current entry is included in the entry header and there are any outstanding bytes in the underlying stream, then this returns true.
      private void drainCurrentEntryData()
      Reads all data of the current entry from the underlying stream that has not been read yet.
      private int fill()  
      private boolean findEocdRecord()
      Reads forward until the signature of the "End of central directory" record is found.
      private long getBytesInflated()
      Gets the number of bytes Inflater has actually processed.
      long getCompressedCount()
      Gets the amount of raw or compressed bytes read by the stream.
      ZipArchiveEntry getNextEntry()
      Returns the next Archive Entry in this Stream.
      ZipArchiveEntry getNextZipEntry()
      Deprecated.
      long getUncompressedCount()
      Gets the amount of decompressed bytes returned by the stream.
      private boolean isApkSigningBlock​(byte[] suspectLocalFileHeader)
      Checks whether this might be an APK Signing Block.
      private boolean isFirstByteOfEocdSig​(int b)  
      static boolean matches​(byte[] signature, int length)
      Checks if the signature matches what is expected for a ZIP file.
      private void processZip64Extra​(ZipLong size, ZipLong cSize)
      Records whether a Zip64 extra is present and sets the size information from it if sizes are 0xFFFFFFFF and the entry doesn't use a data descriptor.
      private void pushback​(byte[] buf, int offset, int length)  
      int read​(byte[] buffer, int offset, int length)  
      private void readDataDescriptor()  
      private int readDeflated​(byte[] buffer, int offset, int length)
      Implementation of read for DEFLATED entries.
      private void readFirstLocalFileHeader()
      Fills the given array with the first local file header and deals with splitting/spanning markers that may prefix the first LFH.
      private int readFromInflater​(byte[] buffer, int offset, int length)
      Potentially reads more bytes to fill the inflater's buffer and reads from it.
      private void readFully​(byte[] b)  
      private void readFully​(byte[] b, int off)  
      private int readOneByte()
      Reads bytes by reading from the underlying stream rather than the (potentially inflating) archive stream - which read(byte[], int, int) would do.
      private byte[] readRange​(int len)  
      private int readStored​(byte[] buffer, int offset, int length)
      Implementation of read for STORED entries.
      private void readStoredEntry()
      Caches a stored entry that uses the data descriptor.
      private void realSkip​(long value)
      Skips bytes by reading from the underlying stream rather than the (potentially inflating) archive stream - which skip(long) would do.
      long skip​(long value)
      Skips over and discards value bytes of data from this input stream.
      private void skipRemainderOfArchive()
      Reads the stream until it finds the "End of central directory record" and consumes it as well.
      private boolean supportsCompressedSizeFor​(ZipArchiveEntry entry)
      Whether the compressed size for the entry is either known or not required by the compression method being used.
      private boolean supportsDataDescriptorFor​(ZipArchiveEntry entry)
      Whether this entry requires a data descriptor this library can work with.
      • Methods inherited from class java.io.InputStream

        available, mark, markSupported, read, reset
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • USE_ZIPFILE_INSTEAD_OF_STREAM_DISCLAIMER

        private static final java.lang.String USE_ZIPFILE_INSTEAD_OF_STREAM_DISCLAIMER
        See Also:
        Constant Field Values
      • LFH

        private static final byte[] LFH
      • CFH

        private static final byte[] CFH
      • DD

        private static final byte[] DD
      • APK_SIGNING_BLOCK_MAGIC

        private static final byte[] APK_SIGNING_BLOCK_MAGIC
      • LONG_MAX

        private static final java.math.BigInteger LONG_MAX
      • zipEncoding

        private final ZipEncoding zipEncoding
        The ZIP encoding to use for file names and the file comment.
      • encoding

        final java.lang.String encoding
      • useUnicodeExtraFields

        private final boolean useUnicodeExtraFields
        Whether to look for and use Unicode extra fields.
      • inputStream

        private final java.io.InputStream inputStream
        Wrapped stream, will always be a PushbackInputStream.
      • inf

        private final java.util.zip.Inflater inf
        Inflater used for all deflated entries.
      • buf

        private final java.nio.ByteBuffer buf
        Buffer used to read from the wrapped stream.
      • closed

        private boolean closed
        Whether the stream has been closed.
      • hitCentralDirectory

        private boolean hitCentralDirectory
        Whether the stream has reached the central directory - and thus found all entries.
      • lastStoredEntry

        private java.io.ByteArrayInputStream lastStoredEntry
        When reading a stored entry that uses the data descriptor this stream has to read the full entry and caches it. This is the cache.
      • allowStoredEntriesWithDataDescriptor

        private final boolean allowStoredEntriesWithDataDescriptor
        Whether the stream will try to read STORED entries that use a data descriptor. Setting it to true means we will not stop reading an entry with the compressed size, instead we will stop reading an entry when a data descriptor is met (by finding the Data Descriptor Signature). This will completely break down in some cases - like JARs in WARs.

        See also: https://issues.apache.org/jira/projects/COMPRESS/issues/COMPRESS-555 and https://github.com/apache/commons-compress/pull/137#issuecomment-690835644

      • uncompressedCount

        private long uncompressedCount
        Count of decompressed bytes for the current entry.
      • skipSplitSig

        private final boolean skipSplitSig
        Whether the stream will try to skip the ZIP split signature (08074B50) at the beginning.
      • lfhBuf

        private final byte[] lfhBuf
      • skipBuf

        private final byte[] skipBuf
      • shortBuf

        private final byte[] shortBuf
      • wordBuf

        private final byte[] wordBuf
      • twoDwordBuf

        private final byte[] twoDwordBuf
      • entriesRead

        private int entriesRead
    • Constructor Detail

      • ZipArchiveInputStream

        public ZipArchiveInputStream​(java.io.InputStream inputStream)
        Create an instance using UTF-8 encoding
        Parameters:
        inputStream - the stream to wrap
      • ZipArchiveInputStream

        public ZipArchiveInputStream​(java.io.InputStream inputStream,
                                     java.lang.String encoding)
        Create an instance using the specified encoding
        Parameters:
        inputStream - the stream to wrap
        encoding - the encoding to use for file names, use null for the platform's default encoding
        Since:
        1.5
      • ZipArchiveInputStream

        public ZipArchiveInputStream​(java.io.InputStream inputStream,
                                     java.lang.String encoding,
                                     boolean useUnicodeExtraFields)
        Create an instance using the specified encoding
        Parameters:
        inputStream - the stream to wrap
        encoding - the encoding to use for file names, use null for the platform's default encoding
        useUnicodeExtraFields - whether to use InfoZIP Unicode Extra Fields (if present) to set the file names.
      • ZipArchiveInputStream

        public ZipArchiveInputStream​(java.io.InputStream inputStream,
                                     java.lang.String encoding,
                                     boolean useUnicodeExtraFields,
                                     boolean allowStoredEntriesWithDataDescriptor)
        Create an instance using the specified encoding
        Parameters:
        inputStream - the stream to wrap
        encoding - the encoding to use for file names, use null for the platform's default encoding
        useUnicodeExtraFields - whether to use InfoZIP Unicode Extra Fields (if present) to set the file names.
        allowStoredEntriesWithDataDescriptor - whether the stream will try to read STORED entries that use a data descriptor
        Since:
        1.1
      • ZipArchiveInputStream

        public ZipArchiveInputStream​(java.io.InputStream inputStream,
                                     java.lang.String encoding,
                                     boolean useUnicodeExtraFields,
                                     boolean allowStoredEntriesWithDataDescriptor,
                                     boolean skipSplitSig)
        Create an instance using the specified encoding
        Parameters:
        inputStream - the stream to wrap
        encoding - the encoding to use for file names, use null for the platform's default encoding
        useUnicodeExtraFields - whether to use InfoZIP Unicode Extra Fields (if present) to set the file names.
        allowStoredEntriesWithDataDescriptor - whether the stream will try to read STORED entries that use a data descriptor
        skipSplitSig - whether the stream will try to skip the ZIP split signature (08074B50) at the beginning. Set this to true if you want to read a split archive.
        Since:
        1.20
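
        The split signature is quoted above as the hexadecimal value 08074B50. As a small illustration of how that value appears in the stream, the sketch below (hypothetical class and method names, JDK only) converts it to its on-disk byte order and checks whether a buffer starts with it. It is not the library's code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class SplitSignatureSketch {

    /** The split signature as it is usually written in hexadecimal. */
    static final int SPLIT_SIGNATURE = 0x08074B50;

    /** ZIP stores multi-byte values little-endian, so the value appears as 50 4B 07 08. */
    static byte[] onDiskBytes() {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(SPLIT_SIGNATURE).array();
    }

    /** True if the first four bytes of data are the split signature. */
    static boolean startsWithSplitSignature(byte[] data) {
        if (data.length < 4) {
            return false;
        }
        int first = ByteBuffer.wrap(data, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        return first == SPLIT_SIGNATURE;
    }
}
```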
    • Method Detail

      • checksig

        private static boolean checksig​(byte[] signature,
                                        byte[] expected)
      • matches

        public static boolean matches​(byte[] signature,
                                      int length)
        Checks if the signature matches what is expected for a ZIP file. Does not currently handle self-extracting ZIPs which may have arbitrary leading content.
        Parameters:
        signature - the bytes to check
        length - the number of bytes to check
        Returns:
        true, if this stream is a ZIP archive stream, false otherwise
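
        A minimal sketch of this kind of check, assuming that only the local file header signature is tested (the real method may accept further signatures). The class and method names are hypothetical, and the JDK's ZipOutputStream is used only to produce sample input.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipSignatureSketch {

    /** "PK" followed by 3 and 4: the local file header signature that opens a regular archive. */
    private static final byte[] LOCAL_FILE_HEADER = {0x50, 0x4B, 0x03, 0x04};

    /** Mirrors the documented parameters: the leading bytes and how many of them are valid. */
    static boolean looksLikeZip(byte[] signature, int length) {
        if (length < LOCAL_FILE_HEADER.length || signature.length < LOCAL_FILE_HEADER.length) {
            return false;
        }
        for (int i = 0; i < LOCAL_FILE_HEADER.length; i++) {
            if (signature[i] != LOCAL_FILE_HEADER[i]) {
                return false;
            }
        }
        return true;
    }

    /** Produces a small real archive with the JDK writer, used here only as sample input. */
    static byte[] tinyArchive() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.write('a');
            zos.closeEntry();
        }
        return bos.toByteArray();
    }
}
```

        A buffer that begins with other content, such as the executable stub of a self-extracting archive, fails this check even if ZIP data follows later.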
      • bufferContainsSignature

        private boolean bufferContainsSignature​(java.io.ByteArrayOutputStream bos,
                                                int offset,
                                                int lastRead,
                                                int expectedDDLen)
                                         throws java.io.IOException
        Checks whether the current buffer contains the signature of a "data descriptor", "local file header" or "central directory entry".

        If it contains such a signature, reads the data descriptor and positions the stream right after the data descriptor.

        Throws:
        java.io.IOException
      • cacheBytesRead

        private int cacheBytesRead​(java.io.ByteArrayOutputStream bos,
                                   int offset,
                                   int lastRead,
                                   int expectedDDLen)
        If the last read bytes could hold a data descriptor and an incomplete signature then save the last bytes to the front of the buffer and cache everything in front of the potential data descriptor into the given ByteArrayOutputStream.

        Data descriptor plus incomplete signature (3 bytes in the worst case) can be 20 bytes max.

      • canReadEntryData

        public boolean canReadEntryData​(ArchiveEntry ae)
        Whether this class is able to read the given entry.

        May return false if it is set up to use encryption or a compression method that hasn't been implemented yet.

        Overrides:
        canReadEntryData in class ArchiveInputStream<ZipArchiveEntry>
        Parameters:
        ae - the entry to test
        Returns:
        true if this stream can read the entry's data, false otherwise.
        Since:
        1.1
      • close

        public void close()
                   throws java.io.IOException
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Overrides:
        close in class java.io.InputStream
        Throws:
        java.io.IOException
      • closeEntry

        private void closeEntry()
                         throws java.io.IOException
        Closes the current ZIP archive entry and positions the underlying stream to the beginning of the next entry. All per-entry variables and data structures are cleared.

        If the compressed size of this entry is included in the entry header, then any outstanding bytes are simply skipped from the underlying stream without uncompressing them. This allows an entry to be safely closed even if the compression method is unsupported.

        In case we don't know the compressed size of this entry or have already buffered too much data from the underlying stream to support uncompression, then the uncompression process is completed and the end position of the stream is adjusted based on the result of that process.

        Throws:
        java.io.IOException - if an error occurs
      • currentEntryHasOutstandingBytes

        private boolean currentEntryHasOutstandingBytes()
        If the compressed size of the current entry is included in the entry header and there are any outstanding bytes in the underlying stream, then this returns true.
        Returns:
        true, if current entry is determined to have outstanding bytes, false otherwise
      • drainCurrentEntryData

        private void drainCurrentEntryData()
                                    throws java.io.IOException
        Reads all data of the current entry from the underlying stream that has not been read yet.
        Throws:
        java.io.IOException
      • fill

        private int fill()
                  throws java.io.IOException
        Throws:
        java.io.IOException
      • findEocdRecord

        private boolean findEocdRecord()
                                throws java.io.IOException
        Reads forward until the signature of the "End of central directory" record is found.
        Throws:
        java.io.IOException
      • getBytesInflated

        private long getBytesInflated()
        Gets the number of bytes Inflater has actually processed.

        For Java versions before Java 7, the getBytes* methods in Inflater/Deflater seem to return unsigned ints rather than longs, so the values start over at 0 once they reach 2^32.

        The stream knows how many bytes it has read, but not how many the Inflater actually consumed - it should be between the total number of bytes read for the entry and the total number minus the last read operation. Here we just try to make the value close enough to the bytes we've read by assuming the number of bytes consumed must be smaller than (or equal to) the number of bytes read but not smaller by more than 2^32.
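
        The correction described above can be sketched as plain arithmetic. The class, method and parameter names below are hypothetical; the sketch assumes the counter reported by the Inflater may have wrapped at 2^32 and that the stream's own byte count is an upper bound that exceeds the true value by less than 2^32.

```java
class InflaterCountSketch {

    private static final long TWO_EXP_32 = 1L << 32;

    /**
     * Adds multiples of 2^32 to the (possibly wrapped) Inflater count for as long as
     * the result does not exceed the number of bytes read from the stream.
     */
    static long correctWrappedCount(long reportedByInflater, long bytesReadFromStream) {
        long corrected = reportedByInflater;
        if (bytesReadFromStream >= TWO_EXP_32) {
            while (corrected + TWO_EXP_32 <= bytesReadFromStream) {
                corrected += TWO_EXP_32;
            }
        }
        return corrected;
    }
}
```

        For example, a reported count of 5 with 2^32 + 10 bytes read from the stream is corrected to 2^32 + 5, while counts below 2^32 are returned unchanged.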

      • getCompressedCount

        public long getCompressedCount()
        Description copied from interface: InputStreamStatistics
        Gets the amount of raw or compressed bytes read by the stream.
        Specified by:
        getCompressedCount in interface InputStreamStatistics
        Returns:
        the amount of raw or compressed bytes read by the stream.
        Since:
        1.17
      • getNextZipEntry

        @Deprecated
        public ZipArchiveEntry getNextZipEntry()
                                        throws java.io.IOException
        Deprecated.
        Gets the next entry.
        Returns:
        the next entry.
        Throws:
        java.io.IOException
      • getUncompressedCount

        public long getUncompressedCount()
        Description copied from interface: InputStreamStatistics
        Gets the amount of decompressed bytes returned by the stream.
        Specified by:
        getUncompressedCount in interface InputStreamStatistics
        Returns:
        the amount of decompressed bytes returned by the stream.
        Since:
        1.17
      • isApkSigningBlock

        private boolean isApkSigningBlock​(byte[] suspectLocalFileHeader)
                                   throws java.io.IOException
        Checks whether this might be an APK Signing Block.

        Unfortunately, the APK signing block does not start with a signature; it ends with one. It starts with a length, so this method parses the suspected length, skips ahead far enough, looks for the signature and returns true if it is found.

        Parameters:
        suspectLocalFileHeader - the bytes read from the underlying stream in the expectation that they would hold the local file header of the next entry.
        Returns:
        true if this looks like an APK signing block
        Throws:
        java.io.IOException
        See Also:
        https://source.android.com/security/apksigning/v2
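
        The approach can be sketched on an in-memory buffer. The sketch assumes the layout given in the linked Android documentation: an 8-byte little-endian length that does not count itself, the payload, the same length repeated, and the 16-byte magic "APK Sig Block 42" at the very end. The class and method names are hypothetical, and the real method works on the stream rather than on a complete array.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ApkSigningBlockSketch {

    static final byte[] MAGIC = "APK Sig Block 42".getBytes(StandardCharsets.US_ASCII);

    /** Parses the leading length, jumps to where the magic should be and compares it. */
    static boolean looksLikeApkSigningBlock(byte[] data) {
        if (data.length < 8) {
            return false;
        }
        long len = ByteBuffer.wrap(data, 0, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
        // The block must at least hold the repeated length and the magic, and must fit in data.
        if (len < 8 + MAGIC.length || len > data.length - 8) {
            return false;
        }
        int magicStart = (int) (8 + len - MAGIC.length);
        byte[] candidate = Arrays.copyOfRange(data, magicStart, magicStart + MAGIC.length);
        return Arrays.equals(candidate, MAGIC);
    }

    /** Builds a synthetic block with a zero-filled payload, for demonstration only. */
    static byte[] buildBlock(int payloadLength) {
        long len = payloadLength + 8L + MAGIC.length;
        ByteBuffer block = ByteBuffer.allocate((int) (8 + len)).order(ByteOrder.LITTLE_ENDIAN);
        block.putLong(len);
        block.position(block.position() + payloadLength);
        block.putLong(len);
        block.put(MAGIC);
        return block.array();
    }
}
```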
      • isFirstByteOfEocdSig

        private boolean isFirstByteOfEocdSig​(int b)
      • processZip64Extra

        private void processZip64Extra​(ZipLong size,
                                       ZipLong cSize)
                                throws java.util.zip.ZipException
        Records whether a Zip64 extra is present and sets the size information from it if sizes are 0xFFFFFFFF and the entry doesn't use a data descriptor.
        Throws:
        java.util.zip.ZipException
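
        The rule can be illustrated with a small decision function. This is a sketch under assumptions, not the library's code: the names are hypothetical, a null zip64Size stands for "no Zip64 extra field present", and -1 stands for "not known yet".

```java
class Zip64SizeSketch {

    /** The 32-bit header value that signals "look in the Zip64 extra field". */
    static final long ZIP64_MAGIC = 0xFFFFFFFFL;

    /**
     * Returns the size to use for an entry, or -1 when a data descriptor
     * will supply it after the entry's data.
     */
    static long resolveSize(long headerSize, Long zip64Size, boolean usesDataDescriptor) {
        if (usesDataDescriptor) {
            return -1;
        }
        if (headerSize == ZIP64_MAGIC && zip64Size != null) {
            return zip64Size;
        }
        return headerSize;
    }
}
```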
      • pushback

        private void pushback​(byte[] buf,
                              int offset,
                              int length)
                       throws java.io.IOException
        Throws:
        java.io.IOException
      • read

        public int read​(byte[] buffer,
                        int offset,
                        int length)
                 throws java.io.IOException
        Overrides:
        read in class java.io.InputStream
        Throws:
        java.io.IOException
      • readDataDescriptor

        private void readDataDescriptor()
                                 throws java.io.IOException
        Throws:
        java.io.IOException
      • readDeflated

        private int readDeflated​(byte[] buffer,
                                 int offset,
                                 int length)
                          throws java.io.IOException
        Implementation of read for DEFLATED entries.
        Throws:
        java.io.IOException
      • readFirstLocalFileHeader

        private void readFirstLocalFileHeader()
                                       throws java.io.IOException
        Fills the given array with the first local file header and deals with splitting/spanning markers that may prefix the first LFH.
        Throws:
        java.io.IOException
      • readFromInflater

        private int readFromInflater​(byte[] buffer,
                                     int offset,
                                     int length)
                              throws java.io.IOException
        Potentially reads more bytes to fill the inflater's buffer and reads from it.
        Throws:
        java.io.IOException
      • readFully

        private void readFully​(byte[] b)
                        throws java.io.IOException
        Throws:
        java.io.IOException
      • readFully

        private void readFully​(byte[] b,
                               int off)
                        throws java.io.IOException
        Throws:
        java.io.IOException
      • readOneByte

        private int readOneByte()
                         throws java.io.IOException
        Reads bytes by reading from the underlying stream rather than the (potentially inflating) archive stream - which read(byte[], int, int) would do. Also updates bytes-read counter.
        Throws:
        java.io.IOException
      • readRange

        private byte[] readRange​(int len)
                          throws java.io.IOException
        Throws:
        java.io.IOException
      • readStored

        private int readStored​(byte[] buffer,
                               int offset,
                               int length)
                        throws java.io.IOException
        Implementation of read for STORED entries.
        Throws:
        java.io.IOException
      • readStoredEntry

        private void readStoredEntry()
                              throws java.io.IOException
        Caches a stored entry that uses the data descriptor.
        • Reads a stored entry until the signature of a local file header, central directory header or data descriptor has been found.
        • Stores all entry data in lastStoredEntry.
        • Rewinds the stream to position it at the data descriptor.
        • Reads the data descriptor.

        After calling this method the entry should know its size, the entry's data is cached and the stream is positioned at the next local file or central directory header.

        Throws:
        java.io.IOException
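
        The first step, scanning for one of the three signatures, can be sketched on a complete in-memory buffer. The names are hypothetical. The real method reads from a stream in chunks and therefore also has to cope with signatures that straddle two reads, which this sketch leaves out.

```java
class SignatureScanSketch {

    private static final byte[][] SIGNATURES = {
        {0x50, 0x4B, 0x03, 0x04}, // local file header
        {0x50, 0x4B, 0x01, 0x02}, // central directory header
        {0x50, 0x4B, 0x07, 0x08}, // data descriptor
    };

    /** Returns the index of the first signature found in data, or -1 if there is none. */
    static int indexOfFirstSignature(byte[] data) {
        for (int i = 0; i + 4 <= data.length; i++) {
            for (byte[] sig : SIGNATURES) {
                if (data[i] == sig[0] && data[i + 1] == sig[1]
                        && data[i + 2] == sig[2] && data[i + 3] == sig[3]) {
                    return i;
                }
            }
        }
        return -1;
    }
}
```

        A scan like this cannot tell entry data from real headers. A stored entry whose data is itself a ZIP archive matches at its very first byte, which is consistent with the warning on allowStoredEntriesWithDataDescriptor about JARs in WARs.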
      • realSkip

        private void realSkip​(long value)
                       throws java.io.IOException
        Skips bytes by reading from the underlying stream rather than the (potentially inflating) archive stream - which skip(long) would do. Also updates bytes-read counter.
        Throws:
        java.io.IOException
      • skip

        public long skip​(long value)
                  throws java.io.IOException
        Skips over and discards value bytes of data from this input stream.

        This implementation may end up skipping over some smaller number of bytes, possibly 0, if and only if it reaches the end of the underlying stream.

        The actual number of bytes skipped is returned.

        Overrides:
        skip in class java.io.InputStream
        Parameters:
        value - the number of bytes to be skipped.
        Returns:
        the actual number of bytes skipped.
        Throws:
        java.io.IOException - if an I/O error occurs.
        java.lang.IllegalArgumentException - if value is negative.
      • skipRemainderOfArchive

        private void skipRemainderOfArchive()
                                     throws java.io.IOException
        Reads the stream until it finds the "End of central directory record" and consumes it as well.
        Throws:
        java.io.IOException
      • supportsCompressedSizeFor

        private boolean supportsCompressedSizeFor​(ZipArchiveEntry entry)
        Whether the compressed size for the entry is either known or not required by the compression method being used.
      • supportsDataDescriptorFor

        private boolean supportsDataDescriptorFor​(ZipArchiveEntry entry)
        Whether this entry requires a data descriptor this library can work with.
        Returns:
        true if allowStoredEntriesWithDataDescriptor is true, the entry doesn't require any data descriptor or the method is DEFLATED or ENHANCED_DEFLATED.
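
        The documented return value can be written as a single boolean expression. The sketch below follows the wording above and works on raw header values instead of a ZipArchiveEntry. The names are hypothetical; the constants are the ZIP format's method codes and the general purpose flag bit 3, which marks the use of a data descriptor.

```java
class DataDescriptorSupportSketch {

    static final int DEFLATED = 8;
    static final int ENHANCED_DEFLATED = 9;

    /** General purpose bit 3: sizes and CRC follow the entry data in a data descriptor. */
    static final int DATA_DESCRIPTOR_FLAG = 1 << 3;

    static boolean supportsDataDescriptor(int generalPurposeFlags, int method,
                                          boolean allowStoredEntriesWithDataDescriptor) {
        boolean usesDataDescriptor = (generalPurposeFlags & DATA_DESCRIPTOR_FLAG) != 0;
        return allowStoredEntriesWithDataDescriptor
                || !usesDataDescriptor
                || method == DEFLATED
                || method == ENHANCED_DEFLATED;
    }
}
```

        With the flag set and method 0 (STORED), the result depends entirely on allowStoredEntriesWithDataDescriptor, which matches the purpose of that constructor parameter.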