ersilia.io.readers package
Submodules
ersilia.io.readers.file module
- class ersilia.io.readers.file.BaseJsonFile(path, IO, entity_is_list, expected_number)
Bases:
object
Base class for handling JSON files.
- Parameters:
path (str) – Path to the file.
IO (object) – IO handler object.
entity_is_list (bool) – Whether the entity is a list.
expected_number (int) – Expected number of elements.
- class ersilia.io.readers.file.BaseTabularFile(path, IO, entity_is_list, expected_number, filter_by_column_validity=None, sniff_line_limit=100)
Bases:
object
Base class for handling tabular files.
- Parameters:
path (str) – Path to the file.
IO (object) – IO handler object.
entity_is_list (bool) – Whether the entity is a list.
expected_number (int) – Expected number of columns.
filter_by_column_validity (bool, optional) – Whether to filter by column validity.
sniff_line_limit (int, optional) – Line limit for sniffing the file.
- get_delimiter()
Get the delimiter used in the file.
- Returns:
The delimiter used in the file.
- Return type:
str
- get_string_delimiter()
Get the string delimiter used in the file.
- Returns:
The string delimiter.
- Return type:
str
- has_header()
Check if the file has a header.
- Returns:
True if the file has a header, False otherwise.
- Return type:
bool
- is_flattened()
Check if the file is flattened.
- Returns:
True if the file is flattened, False otherwise.
- Return type:
bool
- is_input(v: str) → bool
Check if a value is an input.
- Parameters:
v (str) – The value to check.
- Returns:
True if the value is an input, False otherwise.
- Return type:
bool
- is_key(v: str) → bool
Check if a value is a key.
- Parameters:
v (str) – The value to check.
- Returns:
True if the value is a key, False otherwise.
- Return type:
bool
- is_single_input()
Check if the file has a single input.
- Returns:
True if the file has a single input, False otherwise.
- Return type:
bool
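As an illustration of what delimiter sniffing with a line limit can look like, the following is a minimal sketch built on Python's standard csv.Sniffer. It is not the ersilia implementation: the function name sniff_tabular and the set of candidate delimiters are assumptions made for this example.

```python
import csv
import io
from itertools import islice


def sniff_tabular(handle, sniff_line_limit=100):
    """Guess the delimiter and quote character from the first lines of a file.

    Hypothetical helper for illustration only; not part of ersilia.
    """
    # Read at most `sniff_line_limit` lines so large files stay cheap to inspect.
    sample = "".join(islice(handle, sniff_line_limit))
    # Restrict the candidates to comma, tab and semicolon (an assumption).
    dialect = csv.Sniffer().sniff(sample, delimiters=",\t;")
    return {
        "delimiter": dialect.delimiter,
        "string_delimiter": dialect.quotechar,
    }


if __name__ == "__main__":
    demo = io.StringIO("smiles,name\nCCO,ethanol\nCCC,propane\n")
    print(sniff_tabular(demo)["delimiter"])
```

Capping the sample size keeps the inspection cost independent of file size, which is presumably the purpose of a parameter such as sniff_line_limit.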
- class ersilia.io.readers.file.BatchCacher
Bases:
object
Class to handle caching of file batches.
- get_cached_files(prefix)
Get cached files with a specific prefix.
- Parameters:
prefix (str) – The prefix of the cached files.
- Returns:
List of cached files with the specified prefix.
- Return type:
list
- get_cached_input_files()
Get cached input files.
- Returns:
List of cached input files.
- Return type:
list
- get_cached_output_files()
Get cached output files.
- Returns:
List of cached output files.
- Return type:
list
- name_cached_output_files(cached_inputs, output_template)
Name cached output files based on cached input files and an output template.
- Parameters:
cached_inputs (list) – List of cached input files.
output_template (str) – Template for naming the output files.
- Returns:
List of named cached output files.
- Return type:
list
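The reference above does not specify the naming scheme for cached outputs. The sketch below shows one plausible way to derive one output name per cached input from a template. The function name, the index-based suffix, and the file names are all assumptions for illustration, not the behavior of BatchCacher.

```python
import os


def name_outputs_from_template(cached_inputs, output_template):
    """Return one output file name per cached input file.

    Hypothetical scheme: insert the batch index before the template's
    extension, e.g. "out.csv" becomes "out-0.csv", "out-1.csv", and so on.
    """
    stem, ext = os.path.splitext(output_template)
    return [f"{stem}-{i}{ext}" for i, _ in enumerate(cached_inputs)]


if __name__ == "__main__":
    print(name_outputs_from_template(["in-0.csv", "in-1.csv"], "out.csv"))
```

Keeping the input and output lists in the same order lets a caller pair each batch with its result by position.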
- class ersilia.io.readers.file.FileTyper(path)
Bases:
object
Class to determine the type of a file based on its extension.
- Parameters:
path (str) – Path to the file.
- is_csv()
Check if the file is a CSV file.
- Returns:
True if the file is a CSV file, False otherwise.
- Return type:
bool
- is_hdf5()
Check if the file is an HDF5 file.
- Returns:
True if the file is an HDF5 file, False otherwise.
- Return type:
bool
- is_json()
Check if the file is a JSON file.
- Returns:
True if the file is a JSON file, False otherwise.
- Return type:
bool
- is_tabular()
Check if the file is a tabular file (CSV or TSV).
- Returns:
True if the file is a tabular file, False otherwise.
- Return type:
bool
- is_tsv()
Check if the file is a TSV file.
- Returns:
True if the file is a TSV file, False otherwise.
- Return type:
bool
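Extension-based typing of this kind can be sketched with os.path.splitext. The class below is a hypothetical stand-in written for illustration, not the FileTyper source, and the exact set of extensions it accepts (in particular .h5 and .hdf5 for HDF5) is an assumption.

```python
import os


class ExtensionTyper:
    """Classify a file by its extension (hypothetical stand-in for illustration)."""

    def __init__(self, path):
        # Lower-case the extension so "DATA.CSV" and "data.csv" are treated alike.
        self.ext = os.path.splitext(path)[1].lower()

    def is_csv(self):
        return self.ext == ".csv"

    def is_tsv(self):
        return self.ext == ".tsv"

    def is_tabular(self):
        # Tabular means CSV or TSV, matching the description above.
        return self.is_csv() or self.is_tsv()

    def is_json(self):
        return self.ext == ".json"

    def is_hdf5(self):
        # Assumed extensions; HDF5 files are commonly saved as .h5 or .hdf5.
        return self.ext in (".h5", ".hdf5")


if __name__ == "__main__":
    print(ExtensionTyper("molecules.tsv").is_tabular())
```

Because the check looks only at the file name, it says nothing about the actual contents; a misnamed file would be misclassified.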
- class ersilia.io.readers.file.JsonFileReader(path, IO)
Bases:
StandardJsonFileReader
Class to read and standardize JSON files.
- Parameters:
path (str) – Path to the file.
IO (object) – IO handler object.
- class ersilia.io.readers.file.JsonFileShapeStandardizer(src_path, dst_path, input_shape, IO)
Bases:
BaseJsonFile
Class to standardize the shape of JSON files.
- Parameters:
src_path (str) – Source path of the file.
dst_path (str) – Destination path of the standardized file.
input_shape (str or object) – Input shape specification.
IO (object) – IO handler object.
- class ersilia.io.readers.file.StandardJsonFileReader(path)
Bases:
BatchCacher
Class to read standard JSON files.
- Parameters:
path (str) – Path to the file.
Examples
>>> sjfr = StandardJsonFileReader("data.json")
>>> sjfr.read()
[{'key': 'value'}, {'key': 'value'}]
- is_worth_splitting()
Check if the JSON file is worth splitting into smaller chunks.
- Returns:
True if the JSON file is worth splitting, False otherwise.
- Return type:
bool
- class ersilia.io.readers.file.StandardTabularFileReader(path)
Bases:
BatchCacher
Class to read standard tabular files.
- Parameters:
path (str) – Path to the file.
- get_delimiter()
Get the delimiter used in the file.
- Returns:
The delimiter used in the file.
- Return type:
str
- is_worth_splitting()
Check if the file is worth splitting into smaller chunks.
- Returns:
True if the file is worth splitting, False otherwise.
- Return type:
bool
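The criterion behind is_worth_splitting is not documented here. One simple possibility is a row-count threshold, sketched below. The function name, the batch_size default, and the header handling are assumptions for illustration and may differ from what StandardTabularFileReader does.

```python
import io
from itertools import islice


def exceeds_batch_size(handle, batch_size=1000, has_header=True):
    """Return True if the file holds more data rows than fit in one batch.

    Hypothetical criterion for illustration only.
    """
    skip = 1 if has_header else 0
    # Stop reading as soon as batch_size + 1 data rows have been seen,
    # so the check never scans the whole file.
    rows_seen = sum(1 for _ in islice(handle, skip, skip + batch_size + 1))
    return rows_seen > batch_size


if __name__ == "__main__":
    demo = io.StringIO("smiles\nCCO\nCCC\nCCN\n")
    print(exceeds_batch_size(demo, batch_size=2))
```

Reading lazily with an early stop keeps the check cheap even for very large inputs.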
- class ersilia.io.readers.file.TabularFileReader(path, IO, sniff_line_limit=100)
Bases:
StandardTabularFileReader
Class to read and standardize tabular files.
- Parameters:
path (str) – Path to the file.
IO (object) – IO handler object.
sniff_line_limit (int, optional) – Line limit for sniffing the file.
- class ersilia.io.readers.file.TabularFileShapeStandardizer(src_path, dst_path, input_shape, IO, sniff_line_limit=100)
Bases:
BaseTabularFile
Class to standardize the shape of tabular files.
- Parameters:
src_path (str) – Source path of the file.
dst_path (str) – Destination path of the standardized file.
input_shape (str or object) – Input shape specification.
IO (object) – IO handler object.
sniff_line_limit (int, optional) – Line limit for sniffing the file.
Examples
tfss = TabularFileShapeStandardizer(
    "data.csv",
    "standard_data.csv",
    "single",
    IOHandler(),
)
tfss.standardize()
ersilia.io.readers.pyinput module
- class ersilia.io.readers.pyinput.PyInputReader(input, IO)
Bases:
object
Class to read and process Python input data.
- Parameters:
input (any) – The input data.
IO (object) – IO handler object.