ersilia.core package

Submodules

ersilia.core.base module

class ersilia.core.base.ErsiliaBase(config_json=None, credentials_json=None)[source]

Bases: object

Base class of Ersilia.

This class provides the shared configuration used by many of the classes in the package.

ersilia.core.model module

class ersilia.core.model.ErsiliaModel(model: str, output_source: OutputSource | None = None, service_class: str | None = None, config_json: dict | None = None, credentials_json: dict | None = None, verbose: bool | None = None, fetch_if_not_available: bool = True, preferred_port: int | None = None, cache: bool = True, maxmemory: float | None = None)[source]

Bases: ErsiliaBase

ErsiliaModel class for managing and interacting with different models.

This class provides methods to fetch, serve, run, and close models from a model hub. It also supports tracking runs and handling various input and output formats.

Parameters:
  • model (str) – The identifier of the model.

  • output_source (OutputSource, optional) – The source of the output, by default OutputSource.LOCAL_ONLY.

  • service_class (str, optional) – The service class, by default None.

  • config_json (dict, optional) – Configuration in JSON format, by default None.

  • credentials_json (dict, optional) – Credentials in JSON format, by default None.

  • verbose (bool, optional) – Verbosity flag, by default None.

  • fetch_if_not_available (bool, optional) – Whether to fetch the model if not available locally, by default True.

  • preferred_port (int, optional) – Preferred port for serving the model, by default None.

  • track_runs (bool, optional) – Whether to track runs, by default False.

  • cache (bool, optional) – Whether to use the Redis cache, by default True.

  • maxmemory (float, optional) – Fraction of the system memory that Redis may use, by default None.

Examples

Fetching a model. Because fetch is a coroutine, it must be run with asyncio:

import asyncio

model = ErsiliaModel(model="model_id")
asyncio.run(model.fetch())

Serving a model:

model = ErsiliaModel(model="model_id")
model.serve()

Running a model:

model = ErsiliaModel(model="model_id")
result = model.run(
    input="input_data.csv",
    output="output_data.csv",
)

Closing a model:

model = ErsiliaModel(model="model_id")
model.close()
api(api_name=None, input=None, output=None, batch_size=100)[source]

Run the specified API with the given input and output.

This method executes the specified API (usually the run endpoint) using the provided input and output parameters. It handles splitting the input file into batches and caching, if necessary.

Parameters:
  • api_name (str, optional) – The name of the API to run, by default None.

  • input (str, optional) – The input data, by default None.

  • output (str, optional) – The output data, by default None.

  • batch_size (int, optional) – The batch size, by default 100 (DEFAULT_BATCH_SIZE).

Returns:

The result of the API run.

Return type:

Any
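The batch_size parameter controls how a large input is divided before it is sent to the model. As a minimal sketch, assuming the input has already been read into a list of strings, the hypothetical helper split_into_batches (not part of Ersilia) shows how inputs can be grouped into consecutive batches of at most batch_size items:

```python
from typing import Iterator, List


def split_into_batches(inputs: List[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items from inputs."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(inputs), batch_size):
        yield inputs[start:start + batch_size]


# Five inputs with batch_size=2 give batches of 2, 2 and 1 items.
smiles = ["CCO", "c1ccccc1", "CC(=O)O", "N#N", "O"]
batches = list(split_into_batches(smiles, batch_size=2))
```

The last batch may be smaller than batch_size, so downstream code should not assume every batch is full.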

api_task(api_name, input, output, batch_size)[source]

Run the specified API task with the given input and output.

This method executes the specified API task using the provided input and output parameters. It returns the result of the API task, which can be a generator, file, or other data types.

Parameters:
  • api_name (str) – The name of the API to run.

  • input (str) – The input data.

  • output (str) – The output data.

  • batch_size (int) – The batch size.

Returns:

The result of the API task.

Return type:

Any

close()[source]

Close the model services and session.

This method stops the model service and closes the session.

example(n_samples, file_name=None, simple=True)[source]

Generate example data for the model.

This method generates example data for the model using the specified number of samples. The generated data can be saved to a file if a file name is provided.

Parameters:
  • n_samples (int) – The number of samples to generate.

  • file_name (str, optional) – The file name to save the examples, by default None.

  • simple (bool, optional) – Whether to generate simple examples, by default True.

Returns:

The generated example data (for example, a file path or a list of SMILES strings).

Return type:

Any
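The return value depends on whether file_name is given. The sketch below illustrates that behaviour under stated assumptions: the pool of SMILES strings, the single "input" CSV column and the helper make_examples are all hypothetical, and Ersilia draws its real examples from its own sources.

```python
import csv
import random
from typing import List, Optional, Union

# Hypothetical pool of example inputs (small molecules as SMILES strings).
EXAMPLE_POOL = ["CCO", "CC(=O)O", "c1ccccc1", "CCN", "CCCl", "O=C=O"]


def make_examples(n_samples: int, file_name: Optional[str] = None,
                  seed: int = 0) -> Union[str, List[str]]:
    """Return up to n_samples inputs, or write them to file_name and return the path."""
    rng = random.Random(seed)  # seeded so the same examples come back each time
    samples = rng.sample(EXAMPLE_POOL, k=min(n_samples, len(EXAMPLE_POOL)))
    if file_name is None:
        return samples
    with open(file_name, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["input"])  # assumed header name
        writer.writerows([smiles] for smiles in samples)
    return file_name
```

Returning the path when a file is written, and the list otherwise, mirrors the "path, list of SMILES" return description above.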

fetch()[source]

This method fetches the model from the Ersilia Model Hub.

get_apis()[source]

Get the list of available APIs for the model.

This method retrieves the list of APIs that are available for the model.

Returns:

The list of available APIs.

Return type:

list

info()[source]

Get the information of the model.

This method reads the information file of the model and returns its content as a dictionary.

Returns:

The information of the model.

Return type:

dict

property input_type

Get the input type of the model.

This property reads the input type information from the model’s card file and returns it as a list of input types.

Returns:

The list of input types (such as compounds).

Return type:

list

is_valid()[source]

Check if the model identifier is valid.

This method verifies if the provided model identifier is valid by checking its existence and validity in the model hub.

Returns:

True if the model identifier is valid, False otherwise.

Return type:

bool

property meta

Get the metadata of the model.

This property returns the metadata of the model, which provides additional information about the model, such as its description, version, and author.

Returns:

The metadata of the model.

Return type:

dict

property output_type

Get the output type of the model.

This property reads the output type information from the model’s card file and returns it as a list of output types.

Returns:

The list of output types (such as descriptor, score, or probability).

Return type:

list

property paths

Get the paths related to the model.

This property returns a dictionary containing various paths related to the model, such as the destination path, repository path, and BentoML path.

Returns:

The dictionary containing paths.

Return type:

dict

run(**kwargs)

property schema

Get the schema of the model.

This property returns the schema of the model, which defines the structure and format of the model’s input and output data.

Returns:

The schema of the model.

Return type:

dict

serve(**kwargs)

setup()[source]

Set up the necessary requirements for the model.

This method ensures that the required dependencies and resources for the model are available.

property size

Get the size of the model.

This property reads the size information from the model’s size file and returns it as a dictionary.

Returns:

The size of the model.

Return type:

dict

update_model_usage_time(model_id)[source]

Update the model usage time.

This method updates the usage time of the specified model by recording the current timestamp in the fetched models file.

Parameters:

model_id (str) – The identifier of the model.
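The sketch below shows the idea of recording a last-used timestamp per model. It assumes, purely for illustration, that the fetched models file is a JSON object mapping model identifiers to ISO-8601 timestamps; the real file may use a different layout, and update_model_usage_time here is a standalone stand-in rather than the method itself.

```python
import json
import os
from datetime import datetime, timezone


def update_model_usage_time(fetched_models_file: str, model_id: str) -> str:
    """Store the current UTC time as the last-used timestamp for model_id.

    Assumed file format: {"model_id": "timestamp", ...}.
    """
    usage = {}
    if os.path.exists(fetched_models_file):
        with open(fetched_models_file) as handle:
            usage = json.load(handle)
    timestamp = datetime.now(timezone.utc).isoformat()
    usage[model_id] = timestamp  # overwrite any earlier timestamp for this model
    with open(fetched_models_file, "w") as handle:
        json.dump(usage, handle, indent=2)
    return timestamp
```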

ersilia.core.modelbase module

class ersilia.core.modelbase.ModelBase(**kwargs)[source]

Bases: ErsiliaBase

Base class for managing models.

This class provides foundational functionality for handling models, including initialization, validation, and checking local availability.

Parameters:
  • model_id_or_slug (str, optional) – The model identifier or slug, by default None.

  • repo_path (str, optional) – The repository path, by default None.

  • config_json (dict, optional) – Configuration in JSON format, by default None.

is_available_locally()[source]

Check if the model is available locally either from the status file or from DockerHub.

Returns:

True if the model is available locally, False otherwise.

Return type:

bool

is_valid()[source]

Check if the model identifier and slug are valid.

Returns:

True if the model identifier and slug are valid, False otherwise.

Return type:

bool
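A validity check of this kind can be sketched with regular expressions. The formats below are assumptions: identifiers are taken to look like "eos3b5e" ("eos" followed by four lowercase alphanumeric characters) and slugs like "molecular-weight" (lowercase words joined by hyphens). The real method may instead consult the model hub, so treat this only as an illustration of requiring both values to be valid.

```python
import re
from typing import Optional

# Assumed formats, for illustration only.
MODEL_ID_PATTERN = re.compile(r"eos[a-z0-9]{4}")
SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid(model_id: Optional[str], slug: Optional[str]) -> bool:
    """Return True only if both the identifier and the slug match the assumed formats."""
    if model_id is None or slug is None:
        return False
    # fullmatch rejects trailing characters, including a stray newline.
    return bool(MODEL_ID_PATTERN.fullmatch(model_id)) and bool(SLUG_PATTERN.fullmatch(slug))
```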

was_fetched_from_dockerhub()[source]

Check if the model was fetched from DockerHub by reading the DockerHub file.

Returns:

True if the model was fetched from DockerHub, False otherwise.

Return type:

bool

ersilia.core.session module

class ersilia.core.session.Session(config_json)[source]

Bases: ErsiliaBase

Session class for managing model sessions.

This class provides functionality to manage sessions, including opening, closing, and updating session information. Sessions are essential for tracking the state and usage of models, ensuring that all necessary information is stored and can be retrieved when needed.

Parameters:

config_json (dict) – Configuration in JSON format.

close()[source]

Close the current session.

This method removes the session file, effectively closing the session.

current_identifier()[source]

Get the current identifier from the session.

This method retrieves the current identifier from the session data.

Returns:

The current identifier, or None if no session data is available.

Return type:

str or None

current_model_id()[source]

Get the current model ID from the session.

This method retrieves the current model ID from the session data.

Returns:

The current model ID, or None if no session data is available.

Return type:

str or None

current_output_source()[source]

Get the current output source from the session.

This method retrieves the current output source from the session data.

Returns:

The current output source, or None if no session data is available.

Return type:

str or None

current_service_class()[source]

Get the current service class from the session.

This method retrieves the current service class from the session data.

Returns:

The current service class, or None if no session data is available.

Return type:

str or None

get()[source]

Get the current session data.

This method retrieves the current session data from the session file. The session file is a JSON file that contains information about the current session, such as the model ID, timestamp, identifier, tracking status, service class, and output source.

Returns:

The session data, or None if no session file exists.

Return type:

dict or None

open(model_id, track_runs)[source]

Open a new session for the specified model.

This method creates a new session for the specified model and saves the session data.

Parameters:
  • model_id (str) – The identifier of the model.

  • track_runs (bool) – Whether to track runs.
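The session lifecycle described in this class (open writes the session file, get reads it, close deletes it) can be sketched as a small JSON-backed class. SessionSketch and the field names are hypothetical; they follow the fields listed for get() (model ID, timestamp, identifier, tracking status, service class, output source) but are not Ersilia's actual keys.

```python
import json
import os
import time
import uuid
from typing import Optional


class SessionSketch:
    """Hypothetical JSON-file session mirroring open(), get() and close()."""

    def __init__(self, session_file: str):
        self.session_file = session_file

    def open(self, model_id: str, track_runs: bool) -> None:
        data = {
            "model_id": model_id,
            "timestamp": str(time.time()),
            "identifier": str(uuid.uuid4()),  # unique ID for this session
            "track_runs": track_runs,
            "service_class": None,  # filled in later by a register_* step
            "output_source": None,
        }
        # Inside the method, open() still refers to the built-in function.
        with open(self.session_file, "w") as handle:
            json.dump(data, handle, indent=4)

    def get(self) -> Optional[dict]:
        if not os.path.exists(self.session_file):
            return None  # no session file means no open session
        with open(self.session_file) as handle:
            return json.load(handle)

    def current_model_id(self) -> Optional[str]:
        data = self.get()
        return None if data is None else data.get("model_id")

    def close(self) -> None:
        if os.path.exists(self.session_file):
            os.remove(self.session_file)
```

Because the state lives in a file rather than in memory, separate processes (for example, consecutive CLI commands) can see the same session.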

register_output_source(output_source)[source]

Register the output source in the session.

This method updates the session data with the provided output source.

Parameters:

output_source (str) – The output source to register.

register_service_class(service_class)[source]

Register the service class in the session.

This method updates the session data with the provided service class.

Parameters:

service_class (str) – The service class to register.

tracking_status()[source]

Get the tracking status from the session.

This method retrieves the tracking status from the session data.

Returns:

The tracking status, or None if no session data is available.

Return type:

bool or None

update_cpu_time(cpu_time)[source]

Update the total CPU time in the session data.

This method adds the provided CPU time to the total CPU time stored in the session data.

Parameters:

cpu_time (float) – The CPU time to add.

update_peak_memory(peak_memory)[source]

Update the peak memory usage in the session data.

This method updates the peak memory usage in the session data if the new peak is higher than the stored peak memory.

Parameters:

peak_memory (float) – The peak memory usage to update.

update_total_memory(additional_memory)[source]

Update the total memory usage in the session data.

This method updates the total memory usage in the session data by adding the provided additional memory.

Parameters:

additional_memory (float) – The additional memory to add.
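The three update methods above follow two different rules: CPU time and total memory accumulate, while peak memory keeps only the largest value seen. A minimal sketch of those rules, using an in-memory dictionary with hypothetical key names instead of the session file:

```python
from typing import Dict


def update_cpu_time(session: Dict[str, float], cpu_time: float) -> None:
    # CPU time accumulates across calls.
    session["cpu_time"] = session.get("cpu_time", 0.0) + cpu_time


def update_peak_memory(session: Dict[str, float], peak_memory: float) -> None:
    # Only a higher value replaces the stored peak.
    if peak_memory > session.get("peak_memory", 0.0):
        session["peak_memory"] = peak_memory


def update_total_memory(session: Dict[str, float], additional_memory: float) -> None:
    # Memory is summed, like CPU time.
    session["total_memory"] = session.get("total_memory", 0.0) + additional_memory


session: Dict[str, float] = {}
for cpu, mem in [(1.5, 200.0), (0.5, 350.0), (2.0, 100.0)]:
    update_cpu_time(session, cpu)
    update_peak_memory(session, mem)
    update_total_memory(session, mem)
```

After the loop, cpu_time is 4.0, peak_memory is 350.0 and total_memory is 650.0.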

ersilia.core.tracking module

class ersilia.core.tracking.AwsConfig[source]

Bases: ErsiliaBase

This class is responsible for retrieving AWS credentials from the environment variables or the AWS config file. It checks for the presence of AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION in the environment variables. If not found, it looks for them in the AWS config file located at ~/.aws/credentials and ~/.aws/config. If the credentials are found, they are returned as a dictionary.

get()[source]

Get the AWS credentials from the environment variables or the AWS config file.

Returns:

A dictionary containing the AWS credentials.

Return type:

dict
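The lookup order described for AwsConfig (environment variables first, then ~/.aws/credentials and ~/.aws/config) can be sketched with the standard library's configparser. The function name get_aws_credentials, the use of the [default] profile, and the returned key names are assumptions for illustration; the file paths are parameters so the logic can be tested without touching a real AWS setup.

```python
import configparser
import os
from typing import Dict, Optional

# Environment variable -> key name used in the AWS credentials/config files.
ENV_TO_FILE_KEY = {
    "AWS_ACCESS_KEY_ID": "aws_access_key_id",
    "AWS_SECRET_ACCESS_KEY": "aws_secret_access_key",
    "AWS_REGION": "region",
}


def get_aws_credentials(
    credentials_path: str = "~/.aws/credentials",
    config_path: str = "~/.aws/config",
) -> Optional[Dict[str, str]]:
    """Return the three values, preferring the environment over the [default] profile."""
    # interpolation=None keeps characters such as '%' in secrets literal.
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips missing files; both files use the [default] section name.
    parser.read([os.path.expanduser(credentials_path), os.path.expanduser(config_path)])
    creds = {}
    for env_name, file_key in ENV_TO_FILE_KEY.items():
        value = os.environ.get(env_name) or parser.get("default", file_key, fallback=None)
        if not value:
            return None  # a required value is missing everywhere
        creds[file_key] = value
    return creds
```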

is_valid()[source]

Validate access to AWS.

Returns:

True if the AWS access configured on the system is valid, False otherwise.

Return type:

bool

class ersilia.core.tracking.RunTracker(model_id, config_json)[source]

Bases: ErsiliaBase

This class is responsible for tracking model runs. It calculates run metadata based on a model’s inputs, outputs, and other run-specific features, before uploading them to AWS to be ingested into Ersilia’s Splunk dashboard.

Parameters:
  • model_id (str) – The identifier of the model.

  • config_json (dict) – Configuration in JSON format.

create_event_data(**kwargs)

get_file_sizes(input_file, output_file)[source]

Calculate the size of the input and output dataframes.

Parameters:
  • input_file (str) – File path containing the input data.

  • output_file (str) – File path containing the output data.

Returns:

A dictionary containing the input size and the output size.

Return type:

dict
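As a simpler variant, the sketch below measures the on-disk size of each file in bytes rather than the size of the loaded dataframes, so its numbers will generally differ from what the method reports. The function and its key names are hypothetical.

```python
import os
from typing import Dict


def get_file_sizes(input_file: str, output_file: str) -> Dict[str, int]:
    """Return the on-disk sizes of both files in bytes (key names are hypothetical)."""
    return {
        "input_size": os.path.getsize(input_file),
        "output_size": os.path.getsize(output_file),
    }
```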

log_files_metrics(file_log)[source]

Log the number of errors and warnings in the log files.

Parameters:

file_log (str) – The log file to be read.

Returns:

A dictionary containing the error count and warning count.

Return type:

dict
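Counting errors and warnings can be sketched as a line-by-line scan. This assumes log lines carry the uppercase level names ERROR and WARNING (as Python's logging module emits by default); the real method may match a different log format, and the returned key names are hypothetical.

```python
from typing import Dict


def log_files_metrics(file_log: str) -> Dict[str, int]:
    """Count log lines that mention ERROR or WARNING (case-sensitive, assumed format)."""
    error_count = 0
    warning_count = 0
    with open(file_log, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if "ERROR" in line:
                error_count += 1
            if "WARNING" in line:
                warning_count += 1
    return {"error_count": error_count, "warning_count": warning_count}
```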

summarize_output(output_file)[source]

This method summarizes the output of a model run.

Parameters:

output_file (str) – The path to the output file.

Returns:

data – A dictionary containing the summarized data.

Return type:

dict
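The exact fields of the summary are not described here, so the following is only a hypothetical sketch of summarizing a CSV output file: it records the column names, the number of data rows, and the number of empty cells.

```python
import csv
from typing import Dict, List, Union


def summarize_output(output_file: str) -> Dict[str, Union[int, List[str]]]:
    """Hypothetical summary: header columns, row count and empty-cell count."""
    with open(output_file, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # first row holds the column names
        n_rows = 0
        n_empty = 0
        for row in reader:
            n_rows += 1
            n_empty += sum(1 for cell in row if cell.strip() == "")
    return {"columns": header, "n_rows": n_rows, "n_empty_cells": n_empty}
```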

track(input, output, metadata, time_seconds)[source]

Track the model run and upload the data to an S3 bucket.

This method collects relevant data for the run, updates the session file with the stats, and uploads the data to AWS if credentials are available.

Parameters:
  • input (str) – The input data used in the model run.

  • output (str) – The output data in the form of a CSV file path.

  • metadata (dict) – The metadata of the model.

Returns:

This method does not return any value.

Return type:

None

upload_to_s3(event_id)[source]

Upload event information to an S3 bucket.

Parameters:

event_id (list) – Event identifier.

Returns:

True if uploading completed successfully, False otherwise.

Return type:

bool

validate_aws_access(**kwargs)

Module contents