Archive Object

The Archive class exists within the Archiver namespace.

class Archive

A class interface for reading from and writing to tar-like entities on disk.

In Write mode, an Archive can stream data to individual ‘files’ within a self-contained archive format, grouping files together for ease of transport and preventing clutter. The Write format mimics the tar standard. Read mode allows data to be extracted from such archives, including tar archives created using external tools, so long as those archives are uncompressed.

Constructors & Initialisers

Archive()

Default initialiser, which places the object into an ::ArchiveMode::Uninitialised state. Throws an error if used before Open() is called.

Archive(std::string archivePath, ArchiveMode mode = Read)

Initialises the object and immediately calls Open().

Parameters:
  • archivePath – The name of the archive on disk

  • mode – The ArchiveMode assigned to the object

void Open(std::string archivePath, ArchiveMode mode)

Initialises the Archive into a valid state and sets the file location of the archive on disk.

Depending on the mode, calls either OpenForReading(), OpenForWriting() or OpenForAppending(). See the documentation of those functions for more details.

Parameters:
  • archivePath – The name of the archive on disk

  • mode – The ArchiveMode assigned to the object

Throws:
  • Warning – when called on an already-open archive, before Close() has been called.

  • runtime_error – when mode is ::ArchiveMode::Uninitialised

void ChangeMode(ArchiveMode mode)

As with Open(), but only changes the Mode, not the Name.

Functionally acts to create a new Archive object in-place with the new mode.

Parameters:

mode – The new mode to assign to Mode

Destructors and Cleaners

~Archive()

Shuts down any still-open streams, calling Close()

void Close()

Deactivates the current Archive stream and begins flushing the archive.

In Read mode, simply shuts down Stream. In Write mode, signals the end of the archive, calling WriteCleanup() and, if necessary, cleaning up duplicate files.

Writing Functions

void WriteFile(WriteMetaData &&input)

The base write-to-tar function. Adds a file with this data into the target archive.

This code is near-verbatim the code from tar_to_stream. As a result, it has a lot of dark sorcery, magic number hackery inside.

Parameters:

input – A formatted WriteMetaData object, corresponding to the bytestream of the data to be written.

Throws:
  • warning – If input.filename already exists as a file within the archive — this file is then overwritten.

  • logic_error – If called whilst in Read mode
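
The "magic numbers" mentioned above come from the tar header layout. The following is a minimal sketch of that layout, assuming the common ustar field offsets (name at byte 0, size at 124, checksum at 148, magic at 257). It is not the tar_to_stream code and not the library's implementation; makeHeader, headerChecksum, putOctal and paddingFor are hypothetical names chosen for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <string>

constexpr std::size_t kBlockSize = 512;  // tar block size

// Padding bytes needed to round a payload up to a whole number of blocks.
constexpr std::size_t paddingFor(std::size_t size) {
    return (kBlockSize - size % kBlockSize) % kBlockSize;
}

// Writes (width - 1) zero-padded octal digits followed by a NUL terminator.
inline void putOctal(char *field, std::size_t width, unsigned long long value) {
    field[width - 1] = '\0';
    for (std::size_t i = width - 1; i-- > 0;) {
        field[i] = static_cast<char>('0' + (value & 7u));
        value >>= 3;
    }
}

// Sum of all header bytes, with the 8-byte checksum field counted as spaces.
inline unsigned int headerChecksum(const std::array<char, kBlockSize> &h) {
    unsigned int sum = 0;
    for (std::size_t i = 0; i < kBlockSize; ++i) {
        const bool inChecksumField = (i >= 148 && i < 156);
        sum += inChecksumField ? 32u : static_cast<unsigned char>(h[i]);
    }
    return sum;
}

// Hypothetical helper: builds a minimal ustar-style header for a regular file.
inline std::array<char, kBlockSize> makeHeader(const std::string &name, std::size_t size) {
    std::array<char, kBlockSize> h{};                       // zero-filled block
    std::memcpy(h.data(), name.data(), name.size() < 100 ? name.size() : 100);
    putOctal(h.data() + 100, 8, 0644);                      // mode
    putOctal(h.data() + 108, 8, 0);                         // uid
    putOctal(h.data() + 116, 8, 0);                         // gid
    putOctal(h.data() + 124, 12, size);                     // size in octal
    putOctal(h.data() + 136, 12, 0);                        // mtime
    h[156] = '0';                                           // typeflag: regular file
    std::memcpy(h.data() + 257, "ustar", 6);                // magic, including its NUL
    std::memcpy(h.data() + 263, "00", 2);                   // version
    putOctal(h.data() + 148, 7, headerChecksum(h));         // six digits, then NUL
    h[155] = ' ';                                           // checksum field ends with a space
    return h;
}
```

A file entry under this layout is the header block, the raw data, and then paddingFor(size) zero bytes so that the next header starts on a block boundary.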

void WriteFile(const std::string &fileName, const std::string &data)

Writes the passed string as data into a file in the archive.

Parameters:
  • fileName – The name of the file within the archive (can include subdirectories)

  • data – The string which will comprise the full contents of the file. Control characters (such as \n) are interpreted rather than escaped.

Throws:
  • warning – If fileName already exists as a file within the archive — this file is then overwritten.

  • logic_error – If called whilst in Read mode

std::stringstream &ActivateStream(const std::string &fileName)

Activate FileBuffer as a valid stream, pointing towards a particular file within the archive.

Sets OpenFileName and FileOpen as persistent entities.

Throws:

warning – If fileName already exists as a file within the archive — this file is then overwritten.

Returns:

A reference to FileBuffer

void DeactivateStream()

Closes FileBuffer, and flushes the output to WriteFile, adding it into the archive at the specified location.

Unsets OpenFileName and FileOpen.

template<class T>
inline std::stringstream &operator<<(const T &msg)

Streams the input data into FileBuffer.

Throws:

logic_error – If called before ActivateStream() is called.

Returns:

A reference to FileBuffer, allowing chaining of streams
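
The interplay of ActivateStream(), operator<< and DeactivateStream() can be sketched with a small in-memory stand-in. BufferedWriter below is hypothetical and is not the Archive class: it stores finished files in a std::map instead of calling WriteFile(), and the choice to flush an already-open stream when a new one is activated is an assumption, not documented behaviour.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical in-memory stand-in illustrating the buffering pattern only.
class BufferedWriter {
public:
    std::stringstream &ActivateStream(const std::string &fileName) {
        if (FileOpen) DeactivateStream();  // assumption: flush any previous stream
        OpenFileName = fileName;
        FileOpen = true;
        return FileBuffer;
    }

    void DeactivateStream() {
        if (!FileOpen) return;
        Files[OpenFileName] = FileBuffer.str();  // stands in for WriteFile()
        FileBuffer.str("");                      // empty the buffer
        FileBuffer.clear();                      // reset stream state flags
        FileOpen = false;
        OpenFileName.clear();
    }

    template <class T>
    std::stringstream &operator<<(const T &msg) {
        if (!FileOpen) throw std::logic_error("no active stream");
        FileBuffer << msg;
        return FileBuffer;  // returning the buffer allows chaining
    }

    std::map<std::string, std::string> Files;  // finished files, keyed by name

private:
    bool FileOpen = false;
    std::string OpenFileName;
    std::stringstream FileBuffer;
};
```

With this stand-in, w.ActivateStream("data/run.txt"); w << "x = " << 42 << "\n"; w.DeactivateStream(); leaves the text "x = 42\n" stored under "data/run.txt". Only the first << goes through the class operator; the chained ones act directly on the returned buffer.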

Reading Functions

std::vector<std::string> ListFiles()

Get a vector list of files contained in the archive.

Throws:

logic_error – if called whilst in write mode

Returns:

A vector with each element being the name of a file in the archive

std::string GetText(std::string file)

Get the entire text of a file as a single string.

Parameters:

file – The file to be queried

Throws:
  • logic_error – if called whilst in write mode

  • runtime_error – if file not in the archive

Returns:

A string representing the entire file, interpreted as plaintext.

void ForLineIn(const std::string &fileName, std::function<void(std::string_view)> perLineFunction)

Iterates line-by-line through a plaintext file, calling perLineFunction on each line.

Parameters:
  • fileName – The file to be iterated through

  • perLineFunction – A function (expressed as a lambda or an std::function) which is called on each line. The function cannot have a return value, but may capture-by-reference values outside the lambda scope.

Throws:
  • logic_error – if called whilst in write mode

  • runtime_error – if fileName not in the archive

template<typename ...ColumnTypes, typename TupleFunctor>
inline void ForTabularLineIn(const std::string &fileName, std::string delimiter, TupleFunctor perTupleFunction)

Parse the datafile and convert each line into a tuple of datatypes, based on an assumed regular tabular layout, then perform a callback function on each line.

Each line is converted into a tuple of types specified by the ColumnTypes parameter (e.g. ForTabularLineIn<int,int,int> attempts to read the file as a set of three integers). Tuples are then accessed via std::get<i> to get the ith element of the tuple.

Parameters:
  • ColumnTypes – An (arbitrary) number of typenames, representing the converted types of each element. typenames must have an associated convert() function.

  • TupleFunctor – (optional; usually compiler-inferred) template parameter for the type of the per-tuple callback function

  • fileName – The file to be queried

  • delimiter – The string delimiter between ColumnTypes in the file

  • perTupleFunction – The function to be called on the tuple produced from each line

Throws:
  • runtime_error – If the columns of the file cannot be interpreted as a regular grid of len(ColumnTypes) with consistent data types with the specified delimiter

  • runtime_error – If a datatype in a column cannot be converted into the specified type

  • logic_error – when called whilst in write mode

  • logic_error – when fileName not in the archive
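
The conversion of one delimited line into a typed tuple can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: convertField is a hypothetical stand-in for the convert() function mentioned above (here it simply uses operator>>), and splitLine and lineToTuple are likewise names invented for this sketch.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for convert(): parses one field via operator>>.
// Trailing characters after a valid prefix are not rejected in this sketch.
template <typename T>
T convertField(const std::string &field) {
    std::istringstream in(field);
    T value{};
    if (!(in >> value)) throw std::runtime_error("cannot convert field '" + field + "'");
    return value;
}

// Splits a line on a non-empty string delimiter.
inline std::vector<std::string> splitLine(const std::string &line, const std::string &delimiter) {
    if (delimiter.empty()) throw std::invalid_argument("delimiter must not be empty");
    std::vector<std::string> fields;
    std::size_t start = 0;
    for (;;) {
        const std::size_t end = line.find(delimiter, start);
        fields.push_back(line.substr(start, end - start));  // npos yields the remainder
        if (end == std::string::npos) break;
        start = end + delimiter.size();
    }
    return fields;
}

// Converts field I of the line into element I of the tuple.
template <typename Tuple, std::size_t... I>
Tuple toTupleImpl(const std::vector<std::string> &fields, std::index_sequence<I...>) {
    return Tuple{convertField<std::tuple_element_t<I, Tuple>>(fields[I])...};
}

// Converts one line into std::tuple<ColumnTypes...>, checking the column count.
template <typename... ColumnTypes>
std::tuple<ColumnTypes...> lineToTuple(const std::string &line, const std::string &delimiter = " ") {
    const std::vector<std::string> fields = splitLine(line, delimiter);
    if (fields.size() != sizeof...(ColumnTypes))
        throw std::runtime_error("unexpected number of columns");
    return toTupleImpl<std::tuple<ColumnTypes...>>(fields, std::index_sequence_for<ColumnTypes...>{});
}
```

For example, lineToTuple<int, double, std::string>("3,2.5,abc", ",") yields a tuple whose elements are read with std::get<0>, std::get<1> and std::get<2>; a line with the wrong number of columns throws runtime_error.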

template<typename ...ColumnTypes>
inline std::vector<std::tuple<ColumnTypes...>> GetTabular(const std::string &fileName, std::string delimiter = " ")

Parse the datafile and convert each line into a tuple of datatypes.

Each line is converted into a tuple of types specified by the ColumnTypes parameter (e.g. GetTabular<int,int,int> attempts to read the file as a set of three integers). Tuples are then accessed via std::get<i> to get the ith element of the tuple.

Parameters:
  • ColumnTypes – An (arbitrary) number of typenames, representing the converted types of each element. typenames must have an associated convert() function.

  • fileName – The file to be queried

  • delimiter – The string delimiter between ColumnTypes in the file

Throws:
  • runtime_error – If the columns of the file cannot be interpreted as a regular grid of len(ColumnTypes) with consistent data types with the specified delimiter

  • runtime_error – If a datatype in a column cannot be converted into the specified type

  • logic_error – if called whilst in write mode

  • logic_error – if fileName not in the archive

Returns:

A vector of tuples, with each element representing a line in the datafile.
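
Working with a returned vector of tuples might look like the sketch below. It operates on a plain std::vector<std::tuple<...>> and never touches an archive; Row, sumSecondColumn and labelsAbove are hypothetical names, and the column types are chosen only for illustration.

```cpp
#include <string>
#include <tuple>
#include <vector>

// Assumed row shape, as if produced by GetTabular<int, double, std::string>.
using Row = std::tuple<int, double, std::string>;

// Access a single column with std::get<i>.
inline double sumSecondColumn(const std::vector<Row> &rows) {
    double total = 0.0;
    for (const Row &row : rows) total += std::get<1>(row);
    return total;
}

// Structured bindings (C++17) unpack a whole row at once.
inline std::vector<std::string> labelsAbove(const std::vector<Row> &rows, int threshold) {
    std::vector<std::string> labels;
    for (const auto &[id, value, label] : rows) {
        (void)value;  // column not needed here
        if (id > threshold) labels.push_back(label);
    }
    return labels;
}
```

std::get<i> suits single-column access, while structured bindings read more naturally when several columns of a row are used together.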

Public Functions

Archive(const Archive&) = delete

Deleted copy constructor to prevent pointer nonsense.

Archive &operator=(const Archive&) = delete

Deleted copy-assignment operator to prevent pointer nonsense.
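
The effect of deleting the copy operations can be checked at compile time. StreamOwner below is a hypothetical stand-in for a class that owns a file stream; it is a sketch of the pattern, not the Archive class itself.

```cpp
#include <fstream>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a class that owns a stream to a file on disk.
class StreamOwner {
public:
    explicit StreamOwner(std::string name) : Name(std::move(name)) {}

    StreamOwner(const StreamOwner &) = delete;             // no copy construction
    StreamOwner &operator=(const StreamOwner &) = delete;  // no copy assignment

private:
    std::string Name;
    std::fstream Stream;  // never opened in this sketch
};

// Both checks are evaluated by the compiler; no runtime cost.
static_assert(!std::is_copy_constructible_v<StreamOwner>, "copy construction is disabled");
static_assert(!std::is_copy_assignable_v<StreamOwner>, "copy assignment is disabled");
```

Because user-declared copy operations suppress the implicitly generated move operations, a class written this way is not movable either, so instances are passed around by reference or pointer.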

Public Members

bool ExpectingEmpty = false

When true, suppresses the warnings associated with HasWritten. Indicates that an archive is allowed to be empty.

Private Functions

void OpenForReading()

Configures the Archive for Reading and opens the associated streams.

Called when Open() or ChangeMode() set the mode to reading, or at the beginning of an OpenForAppending(). Calls BuildIndex() in order to establish archive integrity.

Throws:
  • runtime_error – If the file given by Name does not exist on disk (or is otherwise inaccessible)

  • runtime_error – If the archive cannot be read, is corrupted, or is missing the tar-defined nullbyte termination sequence

void OpenForWriting()

Configures the Archive for Writing and opens the associated streams.

Called when Open() or ChangeMode() set the mode to writing, or at the end of an OpenForAppending(). Overwrites any existing files at the location of Name.

Throws:
  • warning – If a file with the name Name already exists, but does not prevent it from being overwritten

  • runtime_error – If the output file stream cannot be opened

void OpenForAppending()

Copies an existing archive in-place, and then opens it. Allows for new files to be added to an existing archive.

Warning

Does not allow for appending to files in an existing archive. This operation attempts to read the archive entirely into memory, before re-writing it. This is computationally expensive. Append mode should be used with caution.

Throws:

runtime_error – If either OpenForReading() or OpenForWriting() would throw an error.

void WriteCleanup(unsigned int tailBlockRepetition = 2u)

Writes the terminating null-byte strings to an archive during Close().

void CheckWriteRegistry(std::string fileName)

Checks if fileName exists within FileIndex.

If fileName is found, sets RequiresDuplicateCleanup to true, which flags the old version as being redundant.

Throws:

warning – If fileName is found, and not in Append mode.

void BuildIndex()

Scans through the archive block-by-block and detects the breakpoints between files, building an index of the internal structure of the archive.

Throws:

runtime_error – If the archive is not correctly null-terminated
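
A block-by-block index scan can be sketched over an in-memory byte string. This is an illustration under stated assumptions, not the library's implementation: it assumes the ustar layout (name in the first 100 header bytes, octal size at offset 124), stops at the first all-zero block rather than checking for two, and does not verify header checksums. EntryInfo is a hypothetical stand-in for ReadMetaData.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <unordered_map>

constexpr std::size_t kBlock = 512;  // tar block size

// Hypothetical stand-in for ReadMetaData.
struct EntryInfo {
    std::size_t dataOffset;  // byte offset of the entry's first data block
    std::size_t size;        // file size in bytes
};

// True if the block starting at pos consists only of NUL bytes.
inline bool isZeroBlock(const std::string &bytes, std::size_t pos) {
    for (std::size_t i = 0; i < kBlock; ++i) {
        if (bytes[pos + i] != '\0') return false;
    }
    return true;
}

// Walks header blocks, recording where each file's data begins.
inline std::unordered_map<std::string, EntryInfo> buildIndex(const std::string &bytes) {
    std::unordered_map<std::string, EntryInfo> index;
    std::size_t pos = 0;
    while (pos + kBlock <= bytes.size()) {
        if (isZeroBlock(bytes, pos)) return index;  // terminator reached

        std::string name = bytes.substr(pos, 100);  // NUL-padded name field
        name = name.substr(0, name.find('\0'));

        const std::string sizeField = bytes.substr(pos + 124, 12);
        const std::size_t size = std::strtoull(sizeField.c_str(), nullptr, 8);

        index[name] = EntryInfo{pos + kBlock, size};  // a later duplicate replaces an earlier one

        const std::size_t dataBlocks = (size + kBlock - 1) / kBlock;  // round up
        pos += kBlock * (1 + dataBlocks);  // skip the header and its data blocks
    }
    throw std::runtime_error("archive is not null-terminated");
}
```

Because later entries overwrite earlier ones in the map, a duplicated file name resolves to the most recently written copy, in line with the priority rule described for RequiresDuplicateCleanup.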

bool ReadBlock(char *buffer)

Reads in a block of size BLOCK_SIZE into the memory buffer.

Parameters:

buffer – a c-style buffer, part of the file-streaming primitives

void StreamBlocks(std::string fileName, std::function<void(std::string)> dataCallback)

Iteratively calls ReadBlock(), and performs a callback function on it.

This forms the core of all Read functions. Note, however, that each Block is 512 bytes long, and does not necessarily correspond to anything particularly meaningful. File reads based on e.g. linebreaks have to get clever.

Parameters:
  • fileName – The file to be queried

  • dataCallback – The function to be called on each block. Usually an accumulator of some form
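
One way to recover lines from fixed-size blocks is to carry the unfinished tail of each block over to the next. The sketch below shows such an accumulator; LineAssembler is a hypothetical helper, not part of the library, and it assumes '\n' line endings.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <utility>

// Accumulates arbitrary chunks of text and emits each complete line once.
class LineAssembler {
public:
    explicit LineAssembler(std::function<void(std::string_view)> onLine)
        : OnLine(std::move(onLine)) {}

    // Feed one block; any trailing partial line is kept for the next call.
    void feed(std::string_view chunk) {
        Pending.append(chunk);
        std::size_t start = 0;
        std::size_t newline = 0;
        while ((newline = Pending.find('\n', start)) != std::string::npos) {
            // The view is only valid during the callback.
            OnLine(std::string_view(Pending).substr(start, newline - start));
            start = newline + 1;
        }
        Pending.erase(0, start);  // drop everything already emitted
    }

    // Emit a final line that was not newline-terminated, if any.
    void finish() {
        if (!Pending.empty()) {
            OnLine(Pending);
            Pending.clear();
        }
    }

private:
    std::function<void(std::string_view)> OnLine;
    std::string Pending;  // text received since the last newline
};
```

A block callback would call feed() once per block and finish() after the last one, so a line split across a 512-byte boundary is still delivered in one piece.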

void CheckValidState(ArchiveMode targetState)

A check performed at various stages to enforce the write/read divide.

Parameters:

targetState – The expected write/read/append state of the Archive at the calling location

Throws:

logic_error – when Mode does not match targetState
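
The guard itself can be very small. In the sketch below, the Read and Uninitialised enumerators appear in this documentation, while Write and Append are assumed names; checkValidState is a hypothetical free-function version that takes the current mode explicitly.

```cpp
#include <stdexcept>

// Write and Append are assumed enumerator names.
enum class ArchiveMode { Uninitialised, Read, Write, Append };

// Throws logic_error unless the current mode matches the expected one.
inline void checkValidState(ArchiveMode current, ArchiveMode target) {
    if (current != target) {
        throw std::logic_error("Archive is not in the mode required by this call");
    }
}
```

A reading function would begin with a call such as checkValidState(mode, ArchiveMode::Read), so misuse surfaces as a logic_error before any stream is touched.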

Private Members

bool HasWritten

When true, the Archive has called WriteFile, and an archive has been populated. Used to throw warnings on creating empty archives.

bool HasClosed

When true, the Archive has gracefully closed down (via the Close() function). Calls to Write or Read will fail.

bool FileOpen

When true, a Write-stream is currently active (via ActivateStream()), writing to a single file within the archive. Used to throw warnings, and detect when to shut down the Stream object.

std::string OpenFileName

If FileOpen is true, holds the name of the file the Write-stream is currently opened to.

std::stringstream FileBuffer

If FileOpen is true, this acts as the Write-stream. Objects streamed into the Archive are streamed here until DeactivateStream() is called.

bool RequiresDuplicateCleanup

Detects if a file has been written with the same name as another one. The older file remains within the archive, but cannot be accessed, since the newer one takes priority. If true, triggers the duplication-cleanup process in the destructor.

std::fstream Stream

The stream associated with the archive on disk. Performs the complex read and write operations. Not to be confused with FileBuffer.

std::unordered_map<std::string, ReadMetaData> FileIndex

A list of files within the current archive. In read mode, the ReadMetaData is populated, allowing for Stream to jump to the correct location. In Write mode, this is used to set RequiresDuplicateCleanup, and the ReadMetaData is spoofed.

std::string Name

The filename and path associated with the archive - set during the constructor, or by Open()

ArchiveMode Mode

The current ArchiveMode. Set by construction, Open() or ChangeMode(). Changing the mode is (essentially) equivalent to re-constructing the Archive in-place.

Private Static Attributes

static constexpr size_t BLOCK_SIZE = 512

A magic number associated with the tar block size. DO NOT CHANGE!