ersilia.io.readers package

Submodules

ersilia.io.readers.file module

class ersilia.io.readers.file.BaseJsonFile(path, IO, entity_is_list, expected_number)[source]

Bases: object

Base class for handling JSON files.

Parameters:
  • path (str) – Path to the file.

  • IO (object) – IO handler object.

  • entity_is_list (bool) – Whether the entity is a list.

  • expected_number (int) – Expected number of elements.

is_single_input()[source]

Check if the JSON file has a single input.

Returns:

True if the JSON file has a single input, False otherwise.

Return type:

bool

read_input_json()[source]

Read the input JSON file.

Returns:

Parsed JSON data.

Return type:

dict or list
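
As a rough illustration of what reading a JSON input and testing for a single input might involve, here is a minimal standalone sketch. It does not use the ersilia classes; `read_json` and `looks_like_single_input` are hypothetical names, and the rule "a non-list payload or a one-element list counts as single" is an assumption, not the library's documented behavior.

```python
import json


def read_json(path):
    """Parse a JSON file and return the decoded object (a dict or a list)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def looks_like_single_input(data):
    """Assumed rule: a non-list payload, or a one-element list, is a single input."""
    if isinstance(data, list):
        return len(data) == 1
    return True
```

Under this assumed rule, a payload such as ["CCO", "CCC"] would be treated as multiple inputs, while {"input": "CCO"} would be treated as one.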

class ersilia.io.readers.file.BaseTabularFile(path, IO, entity_is_list, expected_number, filter_by_column_validity=None, sniff_line_limit=100)[source]

Bases: object

Base class for handling tabular files.

Parameters:
  • path (str) – Path to the file.

  • IO (object) – IO handler object.

  • entity_is_list (bool) – Whether the entity is a list.

  • expected_number (int) – Expected number of columns.

  • filter_by_column_validity (bool, optional) – Whether to filter by column validity.

  • sniff_line_limit (int, optional) – Line limit for sniffing the file.

get_delimiter()[source]

Get the delimiter used in the file.

Returns:

The delimiter used in the file.

Return type:

str
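
One common way to detect a delimiter from a bounded number of lines is the standard library's csv.Sniffer. The sketch below is an assumption about how such sniffing could work, not a copy of the ersilia implementation; `sniff_delimiter`, the `candidates` argument, and the comma fallback are all hypothetical.

```python
import csv
from itertools import islice


def sniff_delimiter(path, sniff_line_limit=100, candidates=",\t"):
    """Guess the column delimiter from the first `sniff_line_limit` lines."""
    with open(path, "r", newline="", encoding="utf-8") as f:
        # Read at most `sniff_line_limit` lines so large files stay cheap to inspect.
        sample = "".join(islice(f, sniff_line_limit))
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffing can fail, for example on single-column files; assume a comma.
        return ","
```

Restricting the candidates to a comma and a tab keeps the guess within the CSV and TSV formats this module mentions.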

get_string_delimiter()[source]

Get the string delimiter used in the file.

Returns:

The string delimiter.

Return type:

str

has_header()[source]

Check if the file has a header.

Returns:

True if the file has a header, False otherwise.

Return type:

bool

is_flattened()[source]

Check if the file is flattened.

Returns:

True if the file is flattened, False otherwise.

Return type:

bool

is_input(v: str) → bool[source]

Check if a value is an input.

Parameters:

v (str) – The value to check.

Returns:

True if the value is an input, False otherwise.

Return type:

bool

is_key(v: str) → bool[source]

Check if a value is a key.

Parameters:

v (str) – The value to check.

Returns:

True if the value is a key, False otherwise.

Return type:

bool

is_single_input()[source]

Check if the file has a single input.

Returns:

True if the file has a single input, False otherwise.

Return type:

bool

read_input_columns()[source]

Read the input columns from the file.

Returns:

List of input columns.

Return type:

list

resolve_columns()[source]

Resolve the columns in the file to determine input and key columns.

Return type:

None

class ersilia.io.readers.file.BatchCacher[source]

Bases: object

Class to handle caching of file batches.

get_cached_files(prefix)[source]

Get cached files with a specific prefix.

Parameters:

prefix (str) – The prefix of the cached files.

Returns:

List of cached files with the specified prefix.

Return type:

list

get_cached_input_files()[source]

Get cached input files.

Returns:

List of cached input files.

Return type:

list

get_cached_output_files()[source]

Get cached output files.

Returns:

List of cached output files.

Return type:

list

name_cached_output_files(cached_inputs, output_template)[source]

Name cached output files based on cached input files and an output template.

Parameters:
  • cached_inputs (list) – List of cached input files.

  • output_template (str) – Template for naming the output files.

Returns:

List of named cached output files.

Return type:

list
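
To make the idea of pairing each cached input with an output name concrete, here is a small sketch. The naming scheme (template stem, a dash, the chunk index, then the template extension) is purely an assumption for illustration, and `name_outputs` is a hypothetical helper rather than the library's method.

```python
import os


def name_outputs(cached_inputs, output_template):
    """Derive one output path per cached input (assumed naming scheme)."""
    stem, ext = os.path.splitext(output_template)
    # One output per input chunk, numbered in the same order as the inputs.
    return [f"{stem}-{i}{ext}" for i, _ in enumerate(cached_inputs)]
```

Keeping the output list the same length and order as the input list makes it straightforward to match each chunk's result back to its source.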

class ersilia.io.readers.file.FileTyper(path)[source]

Bases: object

Class to determine the type of a file based on its extension.

Parameters:

path (str) – Path to the file.

get_extension()[source]

Get the file extension.

Returns:

The file extension.

Return type:

str

is_csv()[source]

Check if the file is a CSV file.

Returns:

True if the file is a CSV file, False otherwise.

Return type:

bool

is_hdf5()[source]

Check if the file is an HDF5 file.

Returns:

True if the file is an HDF5 file, False otherwise.

Return type:

bool

is_json()[source]

Check if the file is a JSON file.

Returns:

True if the file is a JSON file, False otherwise.

Return type:

bool

is_tabular()[source]

Check if the file is a tabular file (CSV or TSV).

Returns:

True if the file is a tabular file, False otherwise.

Return type:

bool

is_tsv()[source]

Check if the file is a TSV file.

Returns:

True if the file is a TSV file, False otherwise.

Return type:

bool

is_valid_input_file()[source]

Check if the file is a valid input file.

Returns:

True if the file is a valid input file, False otherwise.

Return type:

bool

is_valid_output_file()[source]

Check if the file is a valid output file.

Returns:

True if the file is a valid output file, False otherwise.

Return type:

bool
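
Extension-based typing of this kind can be sketched in a few lines. The class below is a hypothetical stand-in, not the ersilia FileTyper; the accepted HDF5 extensions ("h5", "hdf5") and the case-insensitive comparison are assumptions.

```python
import os


class ExtensionTyper:
    """Hypothetical stand-in that classifies a file by its extension."""

    def __init__(self, path):
        self.path = path

    def get_extension(self):
        # splitext keeps the leading dot; drop it and normalize the case.
        return os.path.splitext(self.path)[1].lstrip(".").lower()

    def is_csv(self):
        return self.get_extension() == "csv"

    def is_tsv(self):
        return self.get_extension() == "tsv"

    def is_json(self):
        return self.get_extension() == "json"

    def is_hdf5(self):
        # Assumed set of HDF5 extensions.
        return self.get_extension() in ("h5", "hdf5")

    def is_tabular(self):
        # Tabular means CSV or TSV, matching the description above.
        return self.is_csv() or self.is_tsv()
```

For instance, ExtensionTyper("data.TSV").is_tabular() evaluates to True under these assumptions, because the extension is lowercased before comparison.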

class ersilia.io.readers.file.JsonFileReader(path, IO)[source]

Bases: StandardJsonFileReader

Class to read and standardize JSON files.

Parameters:
  • path (str) – Path to the file.

  • IO (object) – IO handler object.

read()[source]

Read the content of the JSON file.

Returns:

Parsed JSON data.

Return type:

dict or list

class ersilia.io.readers.file.JsonFileShapeStandardizer(src_path, dst_path, input_shape, IO)[source]

Bases: BaseJsonFile

Class to standardize the shape of JSON files.

Parameters:
  • src_path (str) – Source path of the file.

  • dst_path (str) – Destination path of the standardized file.

  • input_shape (str or object) – Input shape specification.

  • IO (object) – IO handler object.

standardize()[source]

Standardize the shape of the JSON file.

Return type:

None

class ersilia.io.readers.file.StandardJsonFileReader(path)[source]

Bases: BatchCacher

Class to read standard JSON files.

Parameters:

path (str) – Path to the file.

Examples

>>> from ersilia.io.readers.file import StandardJsonFileReader
>>> sjfr = StandardJsonFileReader("data.json")
>>> sjfr.read()
[{'key': 'value'}, {'key': 'value'}]

is_worth_splitting()[source]

Check if the JSON file is worth splitting into smaller chunks.

Returns:

True if the JSON file is worth splitting, False otherwise.

Return type:

bool

read()[source]

Read the content of the JSON file.

Returns:

Parsed JSON data.

Return type:

dict or list

split_in_cache()[source]

Split the JSON file into smaller chunks and cache them.

Returns:

List of cached input files.

Return type:

list

class ersilia.io.readers.file.StandardTabularFileReader(path)[source]

Bases: BatchCacher

Class to read standard tabular files.

Parameters:

path (str) – Path to the file.

get_delimiter()[source]

Get the delimiter used in the file.

Returns:

The delimiter used in the file.

Return type:

str

is_worth_splitting()[source]

Check if the file is worth splitting into smaller chunks.

Returns:

True if the file is worth splitting, False otherwise.

Return type:

bool

read()[source]

Read the content of the file.

Returns:

List of rows in the file.

Return type:

list

read_header()[source]

Read the header of the file.

Returns:

List of header columns.

Return type:

list

split_in_cache()[source]

Split the file into smaller chunks and cache them.

Returns:

List of cached input files.

Return type:

list
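
The split-and-cache idea can be illustrated with a standalone sketch that writes fixed-size row batches to a temporary directory. Everything here is an assumption made for illustration: the function name `split_rows_in_cache`, the `batch_size` parameter, the chunk file names, and the choice to repeat the header in every chunk.

```python
import csv
import os
import tempfile


def split_rows_in_cache(path, batch_size=2, cache_dir=None):
    """Write the rows of a CSV file into chunk files and return their paths."""
    cache_dir = cache_dir or tempfile.mkdtemp(prefix="batch-cache-")
    with open(path, "r", newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    chunk_paths = []
    for start in range(0, len(body), batch_size):
        chunk_path = os.path.join(cache_dir, f"input-{start // batch_size}.csv")
        with open(chunk_path, "w", newline="", encoding="utf-8") as out:
            writer = csv.writer(out)
            writer.writerow(header)  # repeat the header so each chunk stands alone
            writer.writerows(body[start:start + batch_size])
        chunk_paths.append(chunk_path)
    return chunk_paths
```

Repeating the header in each chunk means every cached file can be read independently, at the cost of a little duplicated data.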

class ersilia.io.readers.file.TabularFileReader(path, IO, sniff_line_limit=100)[source]

Bases: StandardTabularFileReader

Class to read and standardize tabular files.

Parameters:
  • path (str) – Path to the file.

  • IO (object) – IO handler object.

  • sniff_line_limit (int, optional) – Line limit for sniffing the file.

read()[source]

Read the content of the file.

Returns:

List of rows in the file.

Return type:

list

class ersilia.io.readers.file.TabularFileShapeStandardizer(src_path, dst_path, input_shape, IO, sniff_line_limit=100)[source]

Bases: BaseTabularFile

Class to standardize the shape of tabular files.

Parameters:
  • src_path (str) – Source path of the file.

  • dst_path (str) – Destination path of the standardized file.

  • input_shape (str or object) – Input shape specification.

  • IO (object) – IO handler object.

  • sniff_line_limit (int, optional) – Line limit for sniffing the file.

Examples

>>> tfss = TabularFileShapeStandardizer(
...     "data.csv",
...     "standard_data.csv",
...     "single",
...     IOHandler(),
... )
>>> tfss.standardize()

standardize()[source]

Standardize the shape of the tabular file.

Return type:

None

ersilia.io.readers.pyinput module

class ersilia.io.readers.pyinput.PyInputReader(input, IO)[source]

Bases: object

Class to read and process Python input data.

Parameters:
  • input (any) – The input data.

  • IO (object) – IO handler object.

is_single_input()[source]

Check if the input data is a single input.

Returns:

True if the input data is a single input, False otherwise.

Return type:

bool

read()[source]

Read the input data.

Returns:

List of input data.

Return type:

list
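
A reader for in-memory Python input typically has to normalize a lone value and a collection of values into one list form. The sketch below shows one way that could look; `is_single` and `as_list` are hypothetical helpers, and treating a bare string as the single-input case is an assumption rather than the documented behavior of PyInputReader.

```python
def is_single(value):
    """Assumed rule: a bare string is one input; any other iterable is many."""
    return isinstance(value, str)


def as_list(value):
    """Return the input as a list, wrapping a single string in a one-element list."""
    if is_single(value):
        return [value]
    return list(value)
```

The string check comes first because a string is itself iterable, and calling list() on it would split it into characters.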

Module contents