ersilia.core package¶
Submodules¶
ersilia.core.base module¶
ersilia.core.model module¶
- class ersilia.core.model.ErsiliaModel(model: str, output_source: OutputSource = 'local-only', save_to_lake: bool = True, service_class: str | None = None, config_json: dict | None = None, credentials_json: dict | None = None, verbose: bool | None = None, fetch_if_not_available: bool = True, preferred_port: int | None = None, track_runs: bool = False)[source]¶
Bases: ErsiliaBase
ErsiliaModel class for managing and interacting with different models.
This class provides methods to fetch, serve, run, and close models from the model hub. It also supports tracking runs and handling various input and output formats.
- Parameters:
model (str) – The identifier of the model.
output_source (OutputSource, optional) – The source of the output, by default OutputSource.LOCAL_ONLY.
save_to_lake (bool, optional) – Whether to save to lake, by default True.
service_class (str, optional) – The service class, by default None.
config_json (dict, optional) – Configuration in JSON format, by default None.
credentials_json (dict, optional) – Credentials in JSON format, by default None.
verbose (bool, optional) – Verbosity flag, by default None.
fetch_if_not_available (bool, optional) – Whether to fetch the model if not available locally, by default True.
preferred_port (int, optional) – Preferred port for serving the model, by default None.
track_runs (bool, optional) – Whether to track runs, by default False.
Examples
Fetching a model (note that this requires asyncio, since fetch is a coroutine):

```python
model = ErsiliaModel(model="model_id")
model.fetch()
```

Serving a model:

```python
model = ErsiliaModel(model="model_id")
model.serve()
```

Running a model:

```python
model = ErsiliaModel(model="model_id")
result = model.run(
    input="input_data.csv",
    output="output_data.csv",
)
```

Closing a model:

```python
model = ErsiliaModel(model="model_id")
model.close()
```
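Putting these steps together, a minimal end-to-end sketch (illustrative only; "model_id" is a placeholder for a real model identifier, and wrapping fetch in asyncio.run follows the coroutine note above — drop the wrapper if your version runs fetch synchronously):

```python
import asyncio

from ersilia.core.model import ErsiliaModel

model = ErsiliaModel(model="model_id")  # placeholder identifier

# fetch is documented as a coroutine, so it is awaited here via asyncio.run
asyncio.run(model.fetch())

model.serve()
result = model.run(input="input_data.csv", output="output_data.csv")
model.close()
```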
- api(api_name=None, input=None, output=None, batch_size=100)[source]¶
Run the specified API with the given input and output.
This method executes the specified API (usually the run endpoint) using the provided input and output parameters. It handles file splitting and caching if necessary.
- Parameters:
api_name (str, optional) – The name of the API to run, by default None.
input (str, optional) – The input data, by default None.
output (str, optional) – The output data, by default None.
batch_size (int, optional) – The batch size, by default DEFAULT_BATCH_SIZE (100).
- Returns:
The result of the API run.
- Return type:
Any
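For illustration, a hedged call to api using the run endpoint mentioned above. The file names are placeholders, and model is assumed to be a served ErsiliaModel as in the class examples:

```python
result = model.api(
    api_name="run",
    input="input_data.csv",    # placeholder input file
    output="output_data.csv",  # placeholder output file
    batch_size=100,
)
```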
- api_task(api_name, input, output, batch_size)[source]¶
Run the specified API task with the given input and output.
This method executes the specified API task using the provided input and output parameters. It returns the result of the API task, which can be a generator, file, or other data types.
- Parameters:
api_name (str) – The name of the API to run.
input (str) – The input data.
output (str) – The output data.
batch_size (int) – The batch size.
- Returns:
The result of the API task.
- Return type:
Any
- close()[source]¶
Close the model services and session.
This method stops the model service and closes the session.
- example(n_samples, file_name=None, simple=True)[source]¶
Generate example data for the model.
This method generates example data for the model using the specified number of samples. The generated data can be saved to a file if a file name is provided.
- Parameters:
n_samples (int) – The number of samples to generate.
file_name (str, optional) – The file name to save the examples, by default None.
simple (bool, optional) – Whether to generate simple examples, by default True.
- Returns:
The generated example data (a file path, a list of SMILES strings, etc.).
- Return type:
Any
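A minimal sketch of generating example inputs, assuming model is a fetched ErsiliaModel; the file name is illustrative:

```python
# Generate five example inputs and also write them to a CSV file
examples = model.example(n_samples=5, file_name="examples.csv", simple=True)
```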
- get_apis()[source]¶
Get the list of available APIs for the model.
This method retrieves the list of APIs that are available for the model.
- Returns:
The list of available APIs.
- Return type:
list
- info()[source]¶
Get the information of the model.
This method reads the information file of the model and returns its content as a dictionary.
- Returns:
The information of the model.
- Return type:
dict
- property input_type¶
Get the input type of the model.
This property reads the input type information from the model’s card file and returns it as a list of input types.
- Returns:
The list of input types (such as compounds).
- Return type:
list
- is_valid()[source]¶
Check if the model identifier is valid.
This method verifies if the provided model identifier is valid by checking its existence and validity in the model hub.
- Returns:
True if the model identifier is valid, False otherwise.
- Return type:
bool
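A short, hedged sketch of validating an identifier before doing anything else with the model (the flow, not the exact error handling, is the point here):

```python
model = ErsiliaModel(model="model_id")  # placeholder identifier
if not model.is_valid():
    raise ValueError("Unknown model identifier")
```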
- property meta¶
Get the metadata of the model.
This property returns the metadata of the model, which provides additional information about the model, such as its description, version, and author.
- Returns:
The metadata of the model.
- Return type:
dict
- property output_type¶
Get the output type of the model.
This property reads the output type information from the model’s card file and returns it as a list of output types.
- Returns:
The list of output types (such as descriptor, score, or probability).
- Return type:
list
- property paths¶
Get the paths related to the model.
This property returns a dictionary containing various paths related to the model, such as the destination path, repository path, and BentoML path.
- Returns:
The dictionary containing paths.
- Return type:
dict
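For illustration, inspecting a model through the accessors documented above; the printed values depend entirely on the specific model:

```python
print(model.info())       # dict read from the model's information file
print(model.input_type)   # e.g. a list such as ["compound"]
print(model.output_type)  # e.g. a list such as ["score"]
print(model.paths)        # destination, repository and BentoML paths
```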
- run(input: str | None = None, output: str | None = None, batch_size: int = 100, track_run: bool = False)[source]¶
Run the model with the given input and output.
This method executes the model using the provided input and output parameters. It first tries to run the model using the standard API, and if that fails, it falls back to the conventional run method. It also tracks the run if the track_run flag is set.
- Parameters:
input (str, optional) – The input data, by default None.
output (str, optional) – The output data, by default None.
batch_size (int, optional) – The batch size, by default DEFAULT_BATCH_SIZE (100).
track_run (bool, optional) – Whether to track the run, by default False.
- Returns:
The result of the model run (such as an output CSV file name or JSON data).
- Return type:
Any
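A hedged variant of the run example from the class docstring, enabling run tracking and an explicit batch size (file names remain placeholders):

```python
result = model.run(
    input="input_data.csv",
    output="output_data.csv",
    batch_size=100,
    track_run=True,  # record this run via the tracking machinery
)
```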
- property schema¶
Get the schema of the model.
This property returns the schema of the model, which defines the structure and format of the model’s input and output data.
- Returns:
The schema of the model.
- Return type:
dict
- serve()[source]¶
Serve the model by starting the necessary services.
This method sets up the required dependencies, opens a session, and starts the model service. It registers the service class and output source, updates the model’s URL and process ID (PID), and tracks resource usage if tracking is enabled.
- setup()[source]¶
Setup the necessary requirements for the model.
This method ensures that the required dependencies and resources for the model are available.
- property size¶
Get the size of the model.
This property reads the size information from the model’s size file and returns it as a dictionary.
- Returns:
The size of the model.
- Return type:
dict
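Likewise, the schema and size properties can be inspected directly (illustrative; the returned dictionaries vary per model):

```python
print(model.schema)  # structure and format of the model's input and output
print(model.size)    # size information read from the model's size file
```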
ersilia.core.modelbase module¶
- class ersilia.core.modelbase.ModelBase(**kwargs)[source]¶
Bases: ErsiliaBase
Base class for managing models.
This class provides foundational functionality for handling models, including initialization, validation, and checking local availability.
- Parameters:
model_id_or_slug (str, optional) – The model identifier or slug, by default None.
repo_path (str, optional) – The repository path, by default None.
config_json (dict, optional) – Configuration in JSON format, by default None.
- is_available_locally()[source]¶
Check if the model is available locally either from the status file or from DockerHub.
- Returns:
True if the model is available locally, False otherwise.
- Return type:
bool
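A minimal sketch of checking local availability through ModelBase, assuming the keyword arguments documented above; "model_id" is a placeholder:

```python
from ersilia.core.modelbase import ModelBase

mb = ModelBase(model_id_or_slug="model_id")
if mb.is_available_locally():
    print("Model is already available locally")
else:
    print("Model needs to be fetched first")
```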
ersilia.core.session module¶
- class ersilia.core.session.Session(config_json)[source]¶
Bases: ErsiliaBase
Session class for managing model sessions.
This class provides functionality to manage sessions, including opening, closing, and updating session information. Sessions are essential for tracking the state and usage of models, ensuring that all necessary information is stored and can be retrieved when needed.
- Parameters:
config_json (dict) – Configuration in JSON format.
- close()[source]¶
Close the current session.
This method removes the session file, effectively closing the session.
- current_identifier()[source]¶
Get the current identifier from the session.
This method retrieves the current identifier from the session data.
- Returns:
The current identifier, or None if no session data is available.
- Return type:
str or None
- current_model_id()[source]¶
Get the current model ID from the session.
This method retrieves the current model ID from the session data.
- Returns:
The current model ID, or None if no session data is available.
- Return type:
str or None
- current_output_source()[source]¶
Get the current output source from the session.
This method retrieves the current output source from the session data.
- Returns:
The current output source, or None if no session data is available.
- Return type:
str or None
- current_service_class()[source]¶
Get the current service class from the session.
This method retrieves the current service class from the session data.
- Returns:
The current service class, or None if no session data is available.
- Return type:
str or None
- get()[source]¶
Get the current session data.
This method retrieves the current session data from the session file. The session file is a JSON file that contains information about the current session, such as the model ID, timestamp, identifier, tracking status, service class, and output source.
- Returns:
The session data, or None if no session file exists.
- Return type:
dict or None
- open(model_id, track_runs)[source]¶
Open a new session for the specified model.
This method creates a new session for the specified model and saves the session data.
- Parameters:
model_id (str) – The identifier of the model.
track_runs (bool) – Whether to track runs.
- register_output_source(output_source)[source]¶
Register the output source in the session.
This method updates the session data with the provided output source.
- Parameters:
output_source (str) – The output source to register.
- register_service_class(service_class)[source]¶
Register the service class in the session.
This method updates the session data with the provided service class.
- Parameters:
service_class (str) – The service class to register.
- tracking_status()[source]¶
Get the tracking status from the session.
This method retrieves the tracking status from the session data.
- Returns:
The tracking status, or None if no session data is available.
- Return type:
bool or None
- update_cpu_time(cpu_time)[source]¶
Updates the total CPU time usage in the session data by adding the provided CPU time.
- Parameters:
cpu_time (float) – The CPU time to add.
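A minimal sketch of the session lifecycle; passing config_json=None is an assumption (mirroring the default used elsewhere in the package) and "model_id" is a placeholder:

```python
from ersilia.core.session import Session

session = Session(config_json=None)    # assumed to fall back to defaults
session.open(model_id="model_id", track_runs=False)

data = session.get()                   # full session dictionary, or None
model_id = session.current_model_id()  # identifier of the served model
tracking = session.tracking_status()   # whether runs are being tracked

session.close()                        # removes the session file
```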
ersilia.core.tracking module¶
- class ersilia.core.tracking.RunTracker(model_id, config_json)[source]¶
Bases: ErsiliaBase
This class is responsible for tracking model runs. It calculates the desired metadata based on a model’s inputs, outputs, and other run-specific features, before uploading it to AWS for ingestion into Ersilia’s Splunk dashboard.
- Parameters:
model_id (str) – The identifier of the model.
config_json (dict) – Configuration in JSON format.
- check_types(result, metadata)[source]¶
Check the types of the output file against the expected types.
This method checks the shape of the output file (list vs single) and the types of each column.
- Parameters:
result (list) – The output data.
metadata (dict) – The metadata dictionary.
- Returns:
A dictionary containing the number of mismatched types and a boolean for whether the shape is correct.
- Return type:
dict
- get_file_sizes(input_file, output_file)[source]¶
Calculate the size of the input and output dataframes, as well as the average size of each row.
- Parameters:
input_file (pd.DataFrame) – Pandas dataframe containing the input data.
output_file (pd.DataFrame) – Pandas dataframe containing the output data.
- Returns:
Dictionary containing the input size, output size, average input size, and average output size.
- Return type:
dict
- get_memory_info()[source]¶
Retrieve the memory information of the current process.
- Returns:
A tuple containing the memory usage in MB and the total CPU time.
- Return type:
tuple
- get_peak_memory()[source]¶
Calculate the peak memory usage of Ersilia’s Python instance during the run.
- Returns:
The peak memory usage in megabytes (MB).
- Return type:
float
- log_result(result)[source]¶
Log the result of the model run.
This method logs the result of the model run to a CSV file.
- Parameters:
result (list) – The result data.
- track(**kwargs)¶
- ersilia.core.tracking.flatten_dict(data)[source]¶
Flatten the nested dictionaries from the generator into a single-level dictionary.
- Parameters:
data (dict) – The nested dictionary to flatten.
- Returns:
The flattened dictionary.
- Return type:
dict
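An illustrative call; the exact key-naming convention of the flattened dictionary depends on the implementation:

```python
from ersilia.core.tracking import flatten_dict

nested = {"run": {"time_sec": 1.2, "status": "ok"}}
flat = flatten_dict(nested)  # single-level dict combining the nested keys
```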
- ersilia.core.tracking.get_nan_counts(data_list)[source]¶
Calculate the number of NaN values in each key of a list of dictionaries.
- Parameters:
data_list (list) – List of dictionaries containing the data.
- Returns:
The count of NaN values for each key.
- Return type:
int
- ersilia.core.tracking.log_files_metrics(file_log)[source]¶
Log the number of errors and warnings in the log files.
- Parameters:
file_log (str) – The log file to be read.
- Returns:
A dictionary containing the error count and warning count.
- Return type:
dict
- ersilia.core.tracking.serialize_session_json_to_csv(json_file, csv_file)[source]¶
Serialize session JSON data to a CSV file.
- Parameters:
json_file (str) – The path to the JSON file.
csv_file (str) – The path to the CSV file.
- ersilia.core.tracking.serialize_tracking_json_to_csv(json_file, csv_file)[source]¶
Serialize tracking JSON data to a CSV file.
- Parameters:
json_file (str) – The path to the JSON file.
csv_file (str) – The path to the CSV file.
- ersilia.core.tracking.upload_to_cddvault(output_df, api_key)[source]¶
Upload the output dataframe from the model run to CDD Vault.
- Parameters:
output_df (pd.DataFrame) – The output dataframe from the model run.
api_key (str) – The API key for CDD Vault’s API.
- Returns:
True if the API call was successful, False otherwise.
- Return type:
bool
- ersilia.core.tracking.upload_to_s3(model_id, metadata, bucket='ersilia-models-runs')[source]¶
Upload a file to an S3 bucket.
- Parameters:
model_id (str) – The identifier of the model.
metadata (dict) – The metadata to upload.
bucket (str, optional) – The S3 bucket to upload to, by default TRACKING_BUCKET ('ersilia-models-runs').
- Returns:
True if the file was uploaded successfully, False otherwise.
- Return type:
bool