ersilia.core package

Submodules

ersilia.core.base module

class ersilia.core.base.ErsiliaBase(config_json=None, credentials_json=None)[source]

Bases: object

Base class of Ersilia.

This class is used as a configuration for many of the classes of the package.

ersilia.core.model module

class ersilia.core.model.ErsiliaModel(model: str, output_source: OutputSource = 'local-only', save_to_lake: bool = True, service_class: str | None = None, config_json: dict | None = None, credentials_json: dict | None = None, verbose: bool | None = None, fetch_if_not_available: bool = True, preferred_port: int | None = None, track_runs: bool = False)[source]

Bases: ErsiliaBase

ErsiliaModel class for managing and interacting with different models.

This class provides methods to fetch, serve, run, and close models from the Ersilia Model Hub. It also supports tracking runs and handling various input and output formats.

Parameters:
  • model (str) – The identifier of the model.

  • output_source (OutputSource, optional) – The source of the output, by default OutputSource.LOCAL_ONLY.

  • save_to_lake (bool, optional) – Whether to save results to the data lake, by default True.

  • service_class (str, optional) – The service class, by default None.

  • config_json (dict, optional) – Configuration in JSON format, by default None.

  • credentials_json (dict, optional) – Credentials in JSON format, by default None.

  • verbose (bool, optional) – Verbosity flag, by default None.

  • fetch_if_not_available (bool, optional) – Whether to fetch the model if not available locally, by default True.

  • preferred_port (int, optional) – Preferred port for serving the model, by default None.

  • track_runs (bool, optional) – Whether to track runs, by default False.

Examples

Fetching a model (this requires asyncio, since fetch is a coroutine):

model = ErsiliaModel(model="model_id")
model.fetch()

Serving a model:

model = ErsiliaModel(model="model_id")
model.serve()

Running a model:

model = ErsiliaModel(model="model_id")
result = model.run(
    input="input_data.csv",
    output="output_data.csv",
)

Closing a model:

model = ErsiliaModel(model="model_id")
model.close()
api(api_name=None, input=None, output=None, batch_size=100)[source]

Run the specified API with the given input and output.

This method executes the specified API (usually the run endpoint) using the provided input and output parameters. It handles file splitting and caching if necessary.

Parameters:
  • api_name (str, optional) – The name of the API to run, by default None.

  • input (str, optional) – The input data, by default None.

  • output (str, optional) – The output data, by default None.

  • batch_size (int, optional) – The batch size, by default DEFAULT_BATCH_SIZE (100).

Returns:

The result of the API run.

Return type:

Any
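
A minimal usage sketch (the model must be served first; file names are illustrative):

model = ErsiliaModel(model="model_id")
model.serve()
result = model.api(
    api_name="run",
    input="input_data.csv",
    output="output_data.csv",
)
model.close()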

api_task(api_name, input, output, batch_size)[source]

Run the specified API task with the given input and output.

This method executes the specified API task using the provided input and output parameters. It returns the result of the API task, which can be a generator, file, or other data types.

Parameters:
  • api_name (str) – The name of the API to run.

  • input (str) – The input data.

  • output (str) – The output data.

  • batch_size (int) – The batch size.

Returns:

The result of the API task.

Return type:

Any

close()[source]

Close the model services and session.

This method stops the model service and closes the session.

example(n_samples, file_name=None, simple=True)[source]

Generate example data for the model.

This method generates example data for the model using the specified number of samples. The generated data can be saved to a file if a file name is provided.

Parameters:
  • n_samples (int) – The number of samples to generate.

  • file_name (str, optional) – The file name to save the examples, by default None.

  • simple (bool, optional) – Whether to generate simple examples, by default True.

Returns:

The generated example data (e.g., a file path or a list of SMILES strings).

Return type:

Any
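
A minimal usage sketch (the sample count and file name are illustrative):

model = ErsiliaModel(model="model_id")
samples = model.example(n_samples=5)  # e.g. a list of SMILES strings
model.example(n_samples=5, file_name="examples.csv")  # or write them to a file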

fetch()[source]

This method fetches the model from the Ersilia Model Hub.

get_apis()[source]

Get the list of available APIs for the model.

This method retrieves the list of APIs that are available for the model.

Returns:

The list of available APIs.

Return type:

list
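
For example, to list the endpoints exposed by a served model:

model = ErsiliaModel(model="model_id")
model.serve()
print(model.get_apis())  # typically includes "run"
model.close()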

info()[source]

Get the information of the model.

This method reads the information file of the model and returns its content as a dictionary.

Returns:

The information of the model.

Return type:

dict

property input_type

Get the input type of the model.

This property reads the input type information from the model’s card file and returns it as a list of input types.

Returns:

The list of input types (such as compounds).

Return type:

list

is_valid()[source]

Check if the model identifier is valid.

This method verifies if the provided model identifier is valid by checking its existence and validity in the model hub.

Returns:

True if the model identifier is valid, False otherwise.

Return type:

bool

property meta

Get the metadata of the model.

This property returns the metadata of the model, which provides additional information about the model, such as its description, version, and author.

Returns:

The metadata of the model.

Return type:

dict

property output_type

Get the output type of the model.

This property reads the output type information from the model’s card file and returns it as a list of output types.

Returns:

The list of output types (such as descriptor, score, or probability).

Return type:

list

property paths

Get the paths related to the model.

This property returns a dictionary containing various paths related to the model, such as the destination path, repository path, and BentoML path.

Returns:

The dictionary containing paths.

Return type:

dict

run(input: str | None = None, output: str | None = None, batch_size: int = 100, track_run: bool = False)[source]

Run the model with the given input and output.

This method executes the model using the provided input and output parameters. It first tries to run the model using the standard API, and if that fails, it falls back to the conventional run method. It also tracks the run if the track_run flag is set.

Parameters:
  • input (str, optional) – The input data, by default None.

  • output (str, optional) – The output data, by default None.

  • batch_size (int, optional) – The batch size, by default DEFAULT_BATCH_SIZE (100).

  • track_run (bool, optional) – Whether to track the run, by default False.

Returns:

The result of the model run (such as the output CSV file name or a JSON object).

Return type:

Any
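
A sketch using the optional parameters (values and file names are illustrative):

model = ErsiliaModel(model="model_id")
model.serve()
result = model.run(
    input="input_data.csv",
    output="output_data.csv",
    batch_size=50,
    track_run=True,
)
model.close()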

property schema

Get the schema of the model.

This property returns the schema of the model, which defines the structure and format of the model’s input and output data.

Returns:

The schema of the model.

Return type:

dict

serve()[source]

Serve the model by starting the necessary services.

This method sets up the required dependencies, opens a session, and starts the model service. It registers the service class and output source, updates the model’s URL and process ID (PID), and tracks resource usage if tracking is enabled.

setup()[source]

Set up the necessary requirements for the model.

This method ensures that the required dependencies and resources for the model are available.

property size

Get the size of the model.

This property reads the size information from the model’s size file and returns it as a dictionary.

Returns:

The size of the model.

Return type:

dict

update_model_usage_time(model_id)[source]

Update the model usage time.

This method updates the usage time of the specified model by recording the current timestamp in the fetched models file.

Parameters:

model_id (str) – The identifier of the model.

ersilia.core.modelbase module

class ersilia.core.modelbase.ModelBase(**kwargs)[source]

Bases: ErsiliaBase

Base class for managing models.

This class provides foundational functionality for handling models, including initialization, validation, and checking local availability.

Parameters:
  • model_id_or_slug (str, optional) – The model identifier or slug, by default None.

  • repo_path (str, optional) – The repository path, by default None.

  • config_json (dict, optional) – Configuration in JSON format, by default None.
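
A minimal usage sketch (the identifier is illustrative):

model = ModelBase(model_id_or_slug="model_id")
if model.is_valid() and model.is_available_locally():
    print("model is valid and available locally")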

is_available_locally()[source]

Check if the model is available locally either from the status file or from DockerHub.

Returns:

True if the model is available locally, False otherwise.

Return type:

bool

is_valid()[source]

Check if the model identifier and slug are valid.

Returns:

True if the model identifier and slug are valid, False otherwise.

Return type:

bool

was_fetched_from_dockerhub()[source]

Check if the model was fetched from DockerHub by reading the DockerHub file.

Returns:

True if the model was fetched from DockerHub, False otherwise.

Return type:

bool

ersilia.core.session module

class ersilia.core.session.Session(config_json)[source]

Bases: ErsiliaBase

Session class for managing model sessions.

This class provides functionality to manage sessions, including opening, closing, and updating session information. Sessions are essential for tracking the state and usage of models, ensuring that all necessary information is stored and can be retrieved when needed.

Parameters:

config_json (dict) – Configuration in JSON format.
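
A minimal usage sketch (assuming the default configuration, i.e. config_json=None):

session = Session(config_json=None)
session.open(model_id="model_id", track_runs=False)
data = session.get()  # dict with model ID, timestamp, service class, output source, etc.
session.close()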

close()[source]

Close the current session.

This method removes the session file, effectively closing the session.

current_identifier()[source]

Get the current identifier from the session.

This method retrieves the current identifier from the session data.

Returns:

The current identifier, or None if no session data is available.

Return type:

str or None

current_model_id()[source]

Get the current model ID from the session.

This method retrieves the current model ID from the session data.

Returns:

The current model ID, or None if no session data is available.

Return type:

str or None

current_output_source()[source]

Get the current output source from the session.

This method retrieves the current output source from the session data.

Returns:

The current output source, or None if no session data is available.

Return type:

str or None

current_service_class()[source]

Get the current service class from the session.

This method retrieves the current service class from the session data.

Returns:

The current service class, or None if no session data is available.

Return type:

str or None

get()[source]

Get the current session data.

This method retrieves the current session data from the session file. The session file is a JSON file that contains information about the current session, such as the model ID, timestamp, identifier, tracking status, service class, and output source.

Returns:

The session data, or None if no session file exists.

Return type:

dict or None

open(model_id, track_runs)[source]

Open a new session for the specified model.

This method creates a new session for the specified model and saves the session data.

Parameters:
  • model_id (str) – The identifier of the model.

  • track_runs (bool) – Whether to track runs.

register_output_source(output_source)[source]

Register the output source in the session.

This method updates the session data with the provided output source.

Parameters:

output_source (str) – The output source to register.

register_service_class(service_class)[source]

Register the service class in the session.

This method updates the session data with the provided service class.

Parameters:

service_class (str) – The service class to register.

tracking_status()[source]

Get the tracking status from the session.

This method retrieves the tracking status from the session data.

Returns:

The tracking status, or None if no session data is available.

Return type:

bool or None

update_cpu_time(cpu_time)[source]

Update the total CPU time in the session data.

This method updates the total CPU time usage in the session data by adding the provided CPU time.

Parameters:

cpu_time (float) – The CPU time to add.

update_peak_memory(peak_memory)[source]

Update the peak memory usage in the session data.

This method updates the peak memory usage in the session data if the new peak is higher than the stored peak memory.

Parameters:

peak_memory (float) – The peak memory usage to update.

update_total_memory(additional_memory)[source]

Update the total memory usage in the session data.

This method updates the total memory usage in the session data by adding the provided additional memory.

Parameters:

additional_memory (float) – The additional memory to add.

ersilia.core.tracking module

class ersilia.core.tracking.RunTracker(model_id, config_json)[source]

Bases: ErsiliaBase

This class is responsible for tracking model runs. It calculates the desired metadata based on a model’s inputs, outputs, and other run-specific features, before uploading them to AWS to be ingested into Ersilia’s Splunk dashboard.

Parameters:
  • model_id (str) – The identifier of the model.

  • config_json (dict) – Configuration in JSON format.

check_types(result, metadata)[source]

Check the types of the output file against the expected types.

This method checks the shape of the output file (list vs single) and the types of each column.

Parameters:
  • result (list) – The output data.

  • metadata (dict) – The metadata dictionary.

Returns:

A dictionary containing the number of mismatched types and a boolean for whether the shape is correct.

Return type:

dict

get_file_sizes(input_file, output_file)[source]

Calculate the size of the input and output dataframes, as well as the average size of each row.

Parameters:
  • input_file (pd.DataFrame) – Pandas dataframe containing the input data.

  • output_file (pd.DataFrame) – Pandas dataframe containing the output data.

Returns:

Dictionary containing the input size, output size, average input size, and average output size.

Return type:

dict

get_memory_info()[source]

Retrieve the memory information of the current process.

Returns:

A tuple containing the memory usage in MB and the total CPU time.

Return type:

tuple

get_peak_memory()[source]

Calculate the peak memory usage of Ersilia’s Python instance during the run.

Returns:

The peak memory usage in megabytes (MB).

Return type:

float

log_result(result)[source]

Log the result of the model run.

This method logs the result of the model run to a CSV file.

Parameters:

result (list) – The result data.

track(**kwargs)

ersilia.core.tracking.flatten_dict(data)[source]

Flatten the nested dictionaries from the generator into a single-level dictionary.

Parameters:

data (dict) – The nested dictionary to flatten.

Returns:

The flattened dictionary.

Return type:

dict
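
A minimal sketch (the exact key-joining convention depends on the implementation):

nested = {"run": {"input": {"size": 10}, "output": {"size": 10}}}
flat = flatten_dict(nested)
# e.g. a single-level dict such as {"input_size": 10, "output_size": 10},
# with keys joined according to the implementation's convention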

ersilia.core.tracking.get_nan_counts(data_list)[source]

Calculate the number of NaN values in each key of a list of dictionaries.

Parameters:

data_list (list) – List of dictionaries containing the data.

Returns:

The count of NaN values for each key.

Return type:

int

ersilia.core.tracking.log_files_metrics(file_log)[source]

Log the number of errors and warnings in the log files.

Parameters:

file_log (str) – The log file to be read.

Returns:

A dictionary containing the error count and warning count.

Return type:

dict

ersilia.core.tracking.serialize_session_json_to_csv(json_file, csv_file)[source]

Serialize session JSON data to a CSV file.

Parameters:
  • json_file (str) – The path to the JSON file.

  • csv_file (str) – The path to the CSV file.

ersilia.core.tracking.serialize_tracking_json_to_csv(json_file, csv_file)[source]

Serialize tracking JSON data to a CSV file.

Parameters:
  • json_file (str) – The path to the JSON file.

  • csv_file (str) – The path to the CSV file.
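
Usage sketch for both serializers (the file paths are illustrative):

serialize_session_json_to_csv("session.json", "session.csv")
serialize_tracking_json_to_csv("tracking.json", "tracking.csv")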

ersilia.core.tracking.upload_to_cddvault(output_df, api_key)[source]

Upload the output dataframe from the model run to CDD Vault.

Parameters:
  • output_df (pd.DataFrame) – The output dataframe from the model run.

  • api_key (str) – The API key for CDD Vault’s API.

Returns:

True if the API call was successful, False otherwise.

Return type:

bool

ersilia.core.tracking.upload_to_s3(model_id, metadata, bucket='ersilia-models-runs')[source]

Upload a file to an S3 bucket.

Parameters:
  • model_id (str) – The identifier of the model.

  • metadata (dict) – The metadata to upload.

  • bucket (str, optional) – The S3 bucket to upload to, by default TRACKING_BUCKET ('ersilia-models-runs').

Returns:

True if the file was uploaded successfully, False otherwise.

Return type:

bool
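
A usage sketch (requires valid AWS credentials in the environment; the metadata fields are illustrative):

ok = upload_to_s3(
    model_id="model_id",
    metadata={"model_id": "model_id", "run_time_s": 12.3},
)
print(ok)  # True on success, False otherwise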

Module contents