Archive Object
The Archive class exists within the Archiver namespace.
-
class Archive
A class interface for reading from and writing to tar-like entities on disk.
In Write mode, an Archive can stream data to individual ‘files’ within a self-contained archive format, grouping files together for ease of transport and to prevent clutter. The Write format mimics the tar standard. Read mode allows data to be extracted from such archives, including tar archives created using external tools, so long as those archives are uncompressed.
Constructors & Initialisers
-
Archive()
Default initialiser, which places the object into an ::ArchiveMode::Uninitialised state. Throws an error if used before Open() is called.
-
Archive(std::string archivePath, ArchiveMode mode = Read)
Initialises the object and immediately calls Open()
- Parameters:
archivePath – The name of the archive on disk
mode – The ArchiveMode assigned to the object
-
void Open(std::string archivePath, ArchiveMode mode)
Initialises the Archive into a valid state and sets the file location of the archive on disk.
Depending on the mode, calls either OpenForReading(), OpenForWriting() or OpenForAppending(). See the documentation of those functions for more details.
- Parameters:
archivePath – The name of the archive on disk
mode – The ArchiveMode assigned to the object
- Throws:
Warning – when called on an already-open archive, before Close() has been called.
runtime_error – when mode is ::ArchiveMode::Uninitialised
Destructors and Cleaners
-
void Close()
Deactivates the current Archive stream and begins flushing the archive.
In Read mode, simply shuts down Stream. In Write mode, signals the end of the archive by calling WriteCleanup() and, if necessary, cleaning up duplicate files.
Writing Functions
-
void WriteFile(WriteMetaData &&input)
The base write-to-tar function. Adds a file with this data into the target archive.
This code is taken near-verbatim from tar_to_stream. As a result, it contains a fair amount of dark sorcery and magic-number hackery.
- Parameters:
input – A formatted WriteMetaData object, corresponding to the bytestream of the data to be written.
- Throws:
warning – If input.filename already exists as a file within the archive — this file is then overwritten.
logic_error – If called whilst in Read mode
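The 'magic numbers' in a tar writer are mostly fixed field offsets and widths in the 512-byte header block. The following is a minimal sketch of building such a header, assuming the standard ustar layout; MakeTarHeader and PaddedSize are hypothetical helpers written for illustration, not the code used inside WriteFile.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

constexpr std::size_t kBlockSize = 512; // tar block size

// Hypothetical helper: builds one ustar header block for a regular file.
// Assumption: the sketch only handles names shorter than 100 bytes.
std::array<char, kBlockSize> MakeTarHeader(const std::string &fileName, std::size_t dataSize)
{
    assert(fileName.size() < 100);
    std::array<char, kBlockSize> header{};                 // zero-filled block
    fileName.copy(header.data(), fileName.size());         // name:  bytes 0-99
    std::snprintf(header.data() + 100, 8, "%07o", 0644u);  // mode:  bytes 100-107
    std::snprintf(header.data() + 108, 8, "%07o", 0u);     // uid:   bytes 108-115
    std::snprintf(header.data() + 116, 8, "%07o", 0u);     // gid:   bytes 116-123
    std::snprintf(header.data() + 124, 12, "%011llo",
                  static_cast<unsigned long long>(dataSize)); // size, in octal
    std::snprintf(header.data() + 136, 12, "%011llo", 0ull);  // mtime (left at 0 here)
    std::memset(header.data() + 148, ' ', 8);              // checksum field counts as spaces
    header[156] = '0';                                     // typeflag: regular file
    std::memcpy(header.data() + 257, "ustar", 6);          // magic, including its NUL
    std::memcpy(header.data() + 263, "00", 2);             // version

    // The checksum is the sum of all header bytes, taken as unsigned values.
    unsigned int checksum = 0;
    for (char c : header)
        checksum += static_cast<unsigned char>(c);
    std::snprintf(header.data() + 148, 7, "%06o", checksum); // six octal digits + NUL
    header[155] = ' ';
    return header;
}

// File data following the header is padded with zeros to a whole number of blocks.
constexpr std::size_t PaddedSize(std::size_t dataSize)
{
    return (dataSize + kBlockSize - 1) / kBlockSize * kBlockSize;
}
```

A file is then written as the header block, the raw data, and PaddedSize(dataSize) - dataSize zero bytes.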
-
void WriteFile(const std::string &fileName, const std::string &data)
Writes the passed string as data into a file in the archive.
- Parameters:
fileName – The name of the file within the archive (can include subdirectories)
data – The string which will comprise the full contents of the file. Control characters (such as \n) are executed.
- Throws:
warning – If fileName already exists as a file within the archive — this file is then overwritten.
logic_error – If called whilst in Read mode
-
std::stringstream &ActivateStream(const std::string &fileName)
Activate FileBuffer as a valid stream, pointing towards a particular file within the archive.
Sets OpenFileName and FileOpen as persistent entities.
- Throws:
warning – If fileName already exists as a file within the archive — this file is then overwritten.
- Returns:
A reference to FileBuffer
-
void DeactivateStream()
Closes FileBuffer, and flushes the output to WriteFile, adding it into the archive at the specified location.
Unsets OpenFileName and FileOpen.
-
template<class T>
inline std::stringstream &operator<<(const T &msg)
Streams the input data into FileBuffer.
- Throws:
logic_error – If called before ActivateStream() is called.
- Returns:
A reference to FileBuffer, allowing chaining of streams
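The interplay between ActivateStream(), operator<< and DeactivateStream() is a buffer-then-flush pattern. The sketch below imitates it with a hypothetical BufferedWriterSketch class, in which a std::map stands in for the archive on disk. The member names mirror those documented here, but the bodies are assumptions rather than the library's implementation.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the write-stream part of Archive.
class BufferedWriterSketch
{
public:
    std::map<std::string, std::string> Files; // stand-in for the archive contents

    std::stringstream &ActivateStream(const std::string &fileName)
    {
        assert(!fileName.empty());
        if (FileOpen)
            DeactivateStream(); // assumption: an open stream is flushed first
        OpenFileName = fileName;
        FileOpen = true;
        return FileBuffer;
    }

    void DeactivateStream()
    {
        if (!FileOpen)
            return;
        Files[OpenFileName] = FileBuffer.str(); // the real class hands this to WriteFile
        FileBuffer.str("");                     // empty the buffer for the next file
        FileBuffer.clear();
        OpenFileName.clear();
        FileOpen = false;
    }

    template <class T>
    std::stringstream &operator<<(const T &msg)
    {
        if (!FileOpen)
            throw std::logic_error("operator<< used before ActivateStream()");
        FileBuffer << msg;
        return FileBuffer; // returning the buffer lets further << calls chain
    }

private:
    bool FileOpen = false;
    std::string OpenFileName;
    std::stringstream FileBuffer;
};
```

Because operator<< returns the buffer itself, only the first << in a chain goes through the FileOpen check. The rest are ordinary stream insertions.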
Reading Functions
-
std::vector<std::string> ListFiles()
Get a vector list of files contained in the archive.
- Throws:
logic_error – when called whilst in write mode
- Returns:
A vector with each element being the name of a file in the archive
-
std::string GetText(std::string file)
Get the entire text of a file as a single string
- Parameters:
file – The file to be queried
- Throws:
logic_error – if called whilst in write mode
runtime_error – if file not in the archive
- Returns:
A string representing the entire file, interpreted as plaintext.
-
void ForLineIn(const std::string &fileName, std::function<void(std::string_view)> perLineFunction)
Iterates line-by-line through a plaintext file, calling perLineFunction on each line.
- Parameters:
fileName – The file to be iterated through
perLineFunction – A function (expressed as a lambda or an std::function) which is called on each line. The function cannot have a return value, but may capture-by-reference values outside the lambda scope.
- Throws:
logic_error – if called whilst in write mode
runtime_error – if fileName not in the archive
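The callback style used by ForLineIn can be sketched on an in-memory string. ForEachLine below is a hypothetical stand-in that takes the text directly rather than reading it from an archive. It shows how a void callback can still produce results by capturing variables by reference.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>

// Hypothetical stand-in for ForLineIn, operating on text already in memory.
void ForEachLine(std::string_view text,
                 const std::function<void(std::string_view)> &perLineFunction)
{
    assert(perLineFunction); // an empty std::function cannot be called
    std::size_t start = 0;
    while (start < text.size())
    {
        std::size_t end = text.find('\n', start);
        if (end == std::string_view::npos)
            end = text.size(); // final line without a trailing newline
        perLineFunction(text.substr(start, end - start));
        start = end + 1;
    }
}

// Example callback use: the lambda returns nothing, but updates 'count'
// through a by-reference capture.
inline std::size_t CountNonEmptyLines(std::string_view text)
{
    std::size_t count = 0;
    ForEachLine(text, [&count](std::string_view line) {
        if (!line.empty())
            ++count;
    });
    return count;
}
```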
-
template<typename ...ColumnTypes, typename TupleFunctor>
inline void ForTabularLineIn(const std::string &fileName, std::string delimiter, TupleFunctor perTupleFunction)
Parses the datafile and converts each line into a tuple of datatypes, assuming a regular tabular layout. A callback function is then called on each tuple.
Each line is converted into a tuple of the types specified by the ColumnTypes parameter (e.g. ForTabularLineIn<int,int,int> attempts to read the file as a set of three integers per line). Elements are then accessed via std::get<i> to get the ith element of the tuple.
- Parameters:
ColumnTypes – An (arbitrary) number of typenames, representing the converted types of each element. typenames must have an associated convert() function.
TupleFunctor – (optional — usually compiler-inferred) template parameter for the type of the per-tuple callback function
fileName – The file to be queried
delimiter – The string delimiter between ColumnTypes in the file
perTupleFunction – The function to be called on each tuple
- Throws:
runtime_error – If the columns of the file cannot be interpreted as a regular grid of len(ColumnTypes) with consistent data types with the specified delimiter
runtime_error – If a datatype in a column cannot be converted into the specified type
logic_error – when called whilst in write mode
logic_error – when fileName not in the archive
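The line-to-tuple conversion can be sketched with a variadic template. Everything below is hypothetical: ConvertField is a stream-based stand-in for the convert() function the column types are required to have, and ForEachTuple takes the text directly instead of reading it from an archive.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <tuple>
#include <vector>

// Splits a line on a (non-empty) string delimiter.
std::vector<std::string> SplitLine(const std::string &line, const std::string &delimiter)
{
    if (delimiter.empty())
        throw std::runtime_error("Delimiter must not be empty");
    std::vector<std::string> fields;
    std::size_t start = 0;
    for (;;)
    {
        const std::size_t end = line.find(delimiter, start);
        if (end == std::string::npos)
        {
            fields.push_back(line.substr(start));
            return fields;
        }
        fields.push_back(line.substr(start, end - start));
        start = end + delimiter.size();
    }
}

// Stand-in for convert(): parses one field, rejecting trailing junk.
template <typename T>
T ConvertField(const std::string &field)
{
    std::istringstream stream(field);
    T value{};
    char extra;
    if (!(stream >> value) || (stream >> extra))
        throw std::runtime_error("Cannot convert field: " + field);
    return value;
}

template <typename... ColumnTypes>
std::tuple<ColumnTypes...> ParseLine(const std::string &line, const std::string &delimiter)
{
    const std::vector<std::string> fields = SplitLine(line, delimiter);
    assert(!fields.empty()); // SplitLine always yields at least one field
    if (fields.size() != sizeof...(ColumnTypes))
        throw std::runtime_error("Unexpected number of columns in: " + line);
    std::size_t i = 0;
    // Elements of a braced initialiser list are evaluated left to right,
    // so each column type is paired with the matching field.
    return std::tuple<ColumnTypes...>{ConvertField<ColumnTypes>(fields[i++])...};
}

template <typename... ColumnTypes, typename TupleFunctor>
void ForEachTuple(const std::string &text, const std::string &delimiter,
                  TupleFunctor perTupleFunction)
{
    std::istringstream stream(text);
    std::string line;
    while (std::getline(stream, line))
        perTupleFunction(ParseLine<ColumnTypes...>(line, delimiter));
}
```

A call such as ForEachTuple<int, double>(text, ",", callback) names only the column types; the functor type is deduced from the callback argument.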
-
template<typename ...ColumnTypes>
inline std::vector<std::tuple<ColumnTypes...>> GetTabular(const std::string &fileName, std::string delimiter = " ")
Parses the datafile and converts each line into a tuple of datatypes.
Each line is converted into a tuple of the types specified by the ColumnTypes parameter (e.g. GetTabular<int,int,int> attempts to read the file as a set of three integers per line). Elements are then accessed via std::get<i> to get the ith element of the tuple.
- Parameters:
ColumnTypes – An (arbitrary) number of typenames, representing the converted types of each element. typenames must have an associated convert() function.
fileName – The file to be queried
delimiter – The string delimiter between ColumnTypes in the file
- Throws:
runtime_error – If the columns of the file cannot be interpreted as a regular grid of len(ColumnTypes) with consistent data types with the specified delimiter
runtime_error – If a datatype in a column cannot be converted into the specified type
logic_error – if called whilst in write mode
logic_error – if fileName not in the archive
- Returns:
A vector of tuples, with each element representing a line in the datafile.
Public Functions
Public Members
-
bool ExpectingEmpty = false
When true, suppresses the warnings associated with HasWritten. Indicates that an archive is allowed to be empty.
Private Functions
-
void OpenForReading()
Configures the Archive for Reading and opens the associated streams.
Called when Open() or ChangeMode() set the mode to reading, or at the beginning of an OpenForAppending(). Calls BuildIndex() in order to establish archive integrity.
- Throws:
runtime_error – If the file given by Name does not exist on disk (or is otherwise inaccessible)
runtime_error – If the archive cannot be read, is corrupted, or is missing the tar-defined nullbyte termination sequence
-
void OpenForWriting()
Configures the Archive for Writing and opens the associated streams.
Called when Open() or ChangeMode() set the mode to writing, or at the end of an OpenForAppending(). Overwrites any existing files at the location of Name.
- Throws:
warning – If a file named Name already exists; the warning does not prevent the file from being overwritten
runtime_error – If the output file stream cannot be opened
-
void OpenForAppending()
Copies an existing archive in-place, and then opens it. Allows for new files to be added to an existing archive.
Warning
Does not allow for appending to files in an existing archive. This operation attempts to read the archive entirely into memory, before re-writing it. This is computationally expensive. Append mode should be used with caution.
- Throws:
runtime_error – If either OpenForReading() or OpenForWriting() would throw an error.
-
void WriteCleanup(unsigned int tailBlockRepetition = 2u)
Writes the terminating null-byte blocks to an archive during Close().
-
void CheckWriteRegistry(std::string fileName)
Checks if fileName exists within FileIndex.
If fileName is found, sets RequiresDuplicateCleanup to true, which flags the old version as being redundant.
- Throws:
warning – If fileName is found, and not in Append mode.
-
void BuildIndex()
Scans through the archive block-by-block and detects the breakpoints between files, building an index of the internal structure of the archive.
- Throws:
runtime_error – If the archive is not correctly null-terminated
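The block-by-block scan can be sketched on an archive held in memory. The code below assumes the standard tar header layout (name at bytes 0-99, size as octal text at bytes 124-135). EntryLocation is a hypothetical stand-in for ReadMetaData, and as a simplification the first all-zero block is treated as the terminator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>

constexpr std::size_t kBlockSize = 512; // tar block size

// Hypothetical stand-in for ReadMetaData: where a file's data lives in the archive.
struct EntryLocation
{
    std::size_t dataOffset;
    std::size_t dataSize;
};

std::unordered_map<std::string, EntryLocation> BuildIndexSketch(const std::string &archive)
{
    std::unordered_map<std::string, EntryLocation> index;
    std::size_t offset = 0;
    while (offset + kBlockSize <= archive.size())
    {
        assert(offset % kBlockSize == 0); // headers always start on a block boundary
        const auto blockBegin = archive.begin() + static_cast<std::ptrdiff_t>(offset);
        const auto blockEnd = blockBegin + static_cast<std::ptrdiff_t>(kBlockSize);
        if (std::all_of(blockBegin, blockEnd, [](char c) { return c == '\0'; }))
            return index; // zero block: end of archive reached

        // Name field: up to 100 bytes, cut at the first NUL if there is one.
        std::string name = archive.substr(offset, 100);
        name = name.substr(0, name.find('\0'));

        // Size field: octal digits (assumes a well-formed numeric field).
        const std::size_t dataSize = std::stoull(archive.substr(offset + 124, 11), nullptr, 8);
        const std::size_t dataOffset = offset + kBlockSize;
        if (dataOffset + dataSize > archive.size())
            throw std::runtime_error("Archive is truncated");

        index[name] = EntryLocation{dataOffset, dataSize}; // a later duplicate replaces an earlier one

        // Skip the data, which is padded up to a whole number of blocks.
        offset = dataOffset + (dataSize + kBlockSize - 1) / kBlockSize * kBlockSize;
    }
    throw std::runtime_error("Archive is not null-terminated");
}
```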
-
bool ReadBlock(char *buffer)
Reads in a block of size BLOCK_SIZE into the memory buffer.
- Parameters:
buffer – a c-style buffer, part of the file-streaming primitives
-
void StreamBlocks(std::string fileName, std::function<void(std::string)> dataCallback)
Iteratively calls ReadBlock(), and performs a callback function on it.
This forms the core of all Read functions. Note, however, that each block is 512 bytes long, and does not necessarily correspond to anything particularly meaningful. File reads based on e.g. line breaks therefore have to handle lines that span block boundaries.
- Parameters:
fileName – The file to be queried
dataCallback – The function to be called on each block. Usually an accumulator of some form
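One way a line-based reader can cope with arbitrary 512-byte block boundaries is to carry the unfinished tail of each block over into the next one. The sketch below illustrates that idea on an in-memory string. StreamInBlocks and ForEachLineBlockwise are hypothetical names, and this is not necessarily how the library's readers are implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>

constexpr std::size_t kBlockSize = 512; // tar block size

// Stand-in for StreamBlocks: hands the data to the callback one block at a time.
void StreamInBlocks(const std::string &data,
                    const std::function<void(std::string)> &dataCallback)
{
    for (std::size_t offset = 0; offset < data.size(); offset += kBlockSize)
        dataCallback(data.substr(offset, kBlockSize));
}

// Reassembles lines from blocks, carrying partial lines across block boundaries.
void ForEachLineBlockwise(const std::string &data,
                          const std::function<void(std::string_view)> &perLineFunction)
{
    assert(perLineFunction); // an empty std::function cannot be called
    std::string carry;       // text seen so far that does not yet end in '\n'
    StreamInBlocks(data, [&](std::string block) {
        carry += block;
        std::size_t start = 0;
        std::size_t end = 0;
        while ((end = carry.find('\n', start)) != std::string::npos)
        {
            perLineFunction(std::string_view(carry).substr(start, end - start));
            start = end + 1;
        }
        carry.erase(0, start); // keep only the unfinished line
    });
    if (!carry.empty())
        perLineFunction(carry); // last line had no trailing newline
}
```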
-
void CheckValidState(ArchiveMode targetState)
A check performed at various stages to enforce the write/read divide.
Private Members
-
bool HasWritten
When true, the Archive has called WriteFile, and an archive has been populated. Used to throw warnings on creating empty archives.
-
bool HasClosed
When true, the Archive has gracefully closed down (via the Close() function). Calls to Write or Read will fail.
-
bool FileOpen
When true, a Write-stream is currently active (via ActivateStream()), writing to a single file within the archive. Used to throw warnings, and detect when to shut down the Stream object.
-
std::string OpenFileName
If FileOpen is true, holds the name of the file the Write-stream is currently opened to.
-
std::stringstream FileBuffer
If FileOpen is true, this acts as the Write-stream. Objects streamed into the Archive are streamed here until DeactivateStream() is called.
-
bool RequiresDuplicateCleanup
Detects if a file has been written with the same name as another one. The older file remains within the archive, but cannot be accessed, since the newer one takes priority. If true, triggers the duplication-cleanup process in the destructor.
-
std::fstream Stream
The stream associated with the archive on disk. Performs the complex read and write operations. Not to be confused with FileBuffer.
-
std::unordered_map<std::string, ReadMetaData> FileIndex
A list of files within the current archive. In read mode, the ReadMetaData is populated, allowing for Stream to jump to the correct location. In Write mode, this is used to set RequiresDuplicateCleanup, and the ReadMetaData is spoofed.
-
std::string Name
The filename and path associated with the archive - set during the constructor, or by Open()
-
ArchiveMode Mode
The current ArchiveMode. Set by construction, Open() or ChangeMode(). Changing the mode is (essentially) equivalent to re-constructing the Archive in place.
Private Static Attributes
-
static constexpr size_t BLOCK_SIZE = 512
A magic number associated with the tar block size. DO NOT CHANGE!
-
Archive()