Interface Chunker

All Known Implementing Classes:
IterativeStreamChunker

public interface Chunker
Interface for algorithms that are able to chunk data streams for data deduplication.

Use ChunkerBuilder for convenient construction of instances.

Author:
Daniel Tischner <zabuza.dev@gmail.com>
  • Method Summary

    Modifier and Type Method Description
    default java.lang.Iterable<Chunk> chunk​(byte[] data)
    Chunks the given data into chunks.
    java.lang.Iterable<Chunk> chunk​(java.io.InputStream stream, long size)
    Chunks the given stream into chunks.
    default java.lang.Iterable<Chunk> chunk​(java.nio.file.Path path)
    Chunks the data available at the given path.
    default java.lang.Iterable<Chunk> chunk​(java.util.stream.Stream<? extends java.nio.file.Path> paths)
    Chunks all given regular files into chunks.
  • Method Details

    • chunk

      java.lang.Iterable<Chunk> chunk(java.io.InputStream stream, long size)
      Chunks the given stream into chunks. The stream is read lazily: bytes are consumed from it only as the resulting iterable is iterated.

      Chunks own their bytes, so it is preferable to process them directly and avoid first collecting all of them.

      Parameters:
      stream - The data stream to chunk, not null
      size - The number of bytes in the stream that are to be chunked. The stream must offer at least that many bytes. Must be strictly positive.
      Returns:
      The chunks of the stream, lazily populated
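
      The lazy contract described above can be illustrated with a minimal sketch. This is not the library's algorithm: the class LazyChunkingSketch, the chunkSize parameter, the fixed-size cutting strategy, and the use of byte[] in place of Chunk are all assumptions made for illustration. The point is only that bytes leave the stream when the iterator advances, not when chunk is called.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Objects;

final class LazyChunkingSketch {

    // Hypothetical stand-in for a chunker: cuts fixed-size pieces and uses
    // byte[] in place of Chunk. Bytes are read only when next() is called.
    static Iterable<byte[]> chunk(InputStream stream, long size, int chunkSize) {
        Objects.requireNonNull(stream);
        if (size <= 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("size and chunkSize must be positive");
        }
        return () -> new Iterator<byte[]>() {
            private long remaining = size;

            @Override
            public boolean hasNext() {
                return remaining > 0;
            }

            @Override
            public byte[] next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int length = (int) Math.min(chunkSize, remaining);
                try {
                    byte[] data = stream.readNBytes(length);
                    if (data.length < length) {
                        throw new IllegalStateException("stream ended before size bytes were read");
                    }
                    remaining -= length;
                    return data;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
    }

    public static void main(String[] args) {
        InputStream stream = new ByteArrayInputStream(new byte[10]);
        for (byte[] piece : chunk(stream, 10, 4)) {
            System.out.println(piece.length + " bytes");
        }
    }
}
```

      With 10 bytes and a piece size of 4, the loop sees pieces of 4, 4 and 2 bytes, and the stream still holds all 10 bytes until the first call to next().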
    • chunk

      default java.lang.Iterable<Chunk> chunk(java.util.stream.Stream<? extends java.nio.file.Path> paths)
      Chunks all given regular files into chunks. The stream of paths is consumed lazily: files are read only as the resulting iterable is iterated.

      Chunks own their bytes, so it is preferable to process them directly and avoid first collecting all of them.

      The stream is consumed sequentially; files are not processed in parallel.

      Parameters:
      paths - Stream of files to process, only regular files are chunked, not null
      Returns:
      The chunks of the given files, lazily populated
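
      The advice to process chunks directly rather than collecting them can be sketched as a deduplication-style pass that keeps only a hash per chunk. The class DirectProcessingSketch and the method countUniqueChunks are hypothetical, and byte[] again stands in for Chunk; a real caller would pass the iterable returned by a chunk method.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DirectProcessingSketch {

    // Counts distinct chunks by content hash. byte[] stands in for Chunk
    // (an assumption); only the hashes are retained, never the chunk bytes.
    static int countUniqueChunks(Iterable<byte[]> chunks) {
        MessageDigest sha256;
        try {
            sha256 = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
        Set<String> seenHashes = new HashSet<>();
        for (byte[] chunk : chunks) {
            // digest(byte[]) hashes the chunk and resets the digest for reuse.
            seenHashes.add(Base64.getEncoder().encodeToString(sha256.digest(chunk)));
        }
        return seenHashes.size();
    }

    public static void main(String[] args) {
        List<byte[]> chunks = List.of(new byte[] {1, 2}, new byte[] {3}, new byte[] {1, 2});
        System.out.println(countUniqueChunks(chunks)); // prints 2
    }
}
```

      Because each chunk becomes unreachable once its hash is stored, memory use stays bounded by the number of distinct hashes rather than by the total amount of data.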
    • chunk

      default java.lang.Iterable<Chunk> chunk(byte[] data)
      Chunks the given data into chunks. The data is read lazily: chunks are produced only as the resulting iterable is iterated.
      Parameters:
      data - The data to chunk, not null and not empty
      Returns:
      The chunks of the data, lazily populated
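
      Since this overload is a default method, one plausible way for it to work is to delegate to the stream-based method. The sketch below shows that pattern under stated assumptions: SketchChunker is a hypothetical stand-in for the interface, byte[] replaces Chunk, and the delegation itself is a guess, not something this page confirms.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Objects;

final class ByteArrayOverloadSketch {

    // Hypothetical stand-in for the interface, with byte[] in place of Chunk.
    interface SketchChunker {
        Iterable<byte[]> chunk(InputStream stream, long size);

        // Assumed delegation: wrap the array in a stream and pass its length.
        default Iterable<byte[]> chunk(byte[] data) {
            Objects.requireNonNull(data);
            if (data.length == 0) {
                throw new IllegalArgumentException("data must not be empty");
            }
            return chunk(new ByteArrayInputStream(data), data.length);
        }
    }

    // Trivial implementation for demonstration: emits the whole input as one chunk.
    static SketchChunker wholeInputChunker() {
        return (stream, size) -> {
            try {
                return List.of(stream.readNBytes((int) size));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    public static void main(String[] args) {
        Iterable<byte[]> chunks = wholeInputChunker().chunk(new byte[] {1, 2, 3});
        System.out.println(chunks.iterator().next().length); // prints 3
    }
}
```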
    • chunk

      default java.lang.Iterable<Chunk> chunk(java.nio.file.Path path)
      Chunks the data available at the given path. The path must either be a regular file or a directory. In case of a directory, the method recursively traverses the directory and lazily collects all regular files.

      The stream of collected files is consumed lazily: files are read only as the resulting iterable is iterated.

      Chunks own their bytes, so it is preferable to process them directly and avoid first collecting all of them.

      The stream is consumed sequentially; files are not processed in parallel.

      Parameters:
      path - Either a regular file or a directory to traverse, only regular files are processed, not null
      Returns:
      The chunks of the regular files found at the path, lazily populated
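
      The traversal rule (a regular file is taken as is, a directory is walked recursively, and only regular files count) can be sketched with java.nio.file.Files.walk. The class PathTraversalSketch and its helpers are hypothetical, and the sketch collects the files eagerly for brevity, whereas the method above is documented to collect them lazily.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PathTraversalSketch {

    // Lists the regular files a path resolves to: the file itself, or every
    // regular file below a directory. Collected eagerly here for brevity.
    static List<Path> regularFiles(Path path) {
        try (Stream<Path> walk = Files.walk(path)) {
            return walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a small temporary tree: root/a.bin and root/sub/b.bin.
    static Path demoTree() {
        try {
            Path root = Files.createTempDirectory("chunker-sketch");
            Files.createDirectories(root.resolve("sub"));
            Files.write(root.resolve("a.bin"), new byte[] {1});
            Files.write(root.resolve("sub").resolve("b.bin"), new byte[] {2});
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The directory "sub" is traversed but not listed; both files are.
        System.out.println(regularFiles(demoTree()).size()); // prints 2
    }
}
```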