ersilia.utils.identifiers package

Submodules

ersilia.utils.identifiers.arbitrary module

class ersilia.utils.identifiers.arbitrary.ArbitraryIdentifier[source]

Bases: object

Class for handling arbitrary identifiers.

encode(text: str) → str[source]

Encode the given text using MD5.

Parameters:

text (str) – The text to encode.

Returns:

The encoded text.

Return type:

str

ersilia.utils.identifiers.arbitrary.Identifier

alias of ArbitraryIdentifier

ersilia.utils.identifiers.compound module

class ersilia.utils.identifiers.compound.CompoundIdentifier(local=True, concurrency_limit=10, cache_maxsize=128)[source]

Bases: object

A class to handle compound identification and conversion between different chemical identifiers.

Parameters:
  • local (bool, optional) – If True, use local RDKit for SMILES to InChIKey conversion. Default is True.

  • concurrency_limit (int, optional) – The maximum number of concurrent API requests. Default is 10.

  • cache_maxsize (int, optional) – The maximum size of the LRU cache for storing API results. Default is 128.

Examples

identifier = CompoundIdentifier()
smiles = "CCO"
inchikey = identifier.encode(smiles)
print(inchikey)

static chemical_identifier_resolver(identifier)[source]

Resolve a chemical identifier to a SMILES string using the NCI Chemical Identifier Resolver.

Parameters:

identifier (str) – The chemical identifier to resolve.

Returns:

The resolved SMILES string, or ‘UNPROCESSABLE_INPUT’ if resolution fails.

Return type:

str

convert_smiles_to_inchikey_with_rdkit(smiles)[source]

Convert a SMILES string to an InChIKey using RDKit. Results are cached to improve performance.

Parameters:

smiles (str) – The SMILES string to convert.

Returns:

The InChIKey of the compound, or None if conversion fails.

Return type:

str
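The caching described above can be illustrated with the standard library. This is only a sketch, assuming an LRU cache sized like the cache_maxsize constructor parameter; fake_convert and make_cached_converter are hypothetical names, and the real method calls RDKit rather than the stand-in shown here.

```python
from functools import lru_cache


def make_cached_converter(convert, maxsize=128):
    """Wrap a conversion function in an LRU cache holding up to maxsize results."""
    return lru_cache(maxsize=maxsize)(convert)


calls = []


def fake_convert(smiles):
    # Hypothetical stand-in for the RDKit-based conversion (not the real logic).
    calls.append(smiles)
    return "KEY-" + smiles


cached_convert = make_cached_converter(fake_convert, maxsize=128)
cached_convert("CCO")
cached_convert("CCO")  # second call is served from the cache
```

Repeated inputs then cost a dictionary lookup instead of a full conversion, which is the usual reason for caching in a batch workflow.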

encode(smiles)[source]

Get the InChIKey of a compound based on its SMILES string.

Parameters:

smiles (str) – The SMILES string of the compound.

Returns:

The InChIKey of the compound, or ‘UNPROCESSABLE_INPUT’ if conversion fails.

Return type:

str

async encode_batch(smiles_list)[source]

Encode a batch of SMILES strings asynchronously.

Parameters:

smiles_list (list) – The list of SMILES strings to encode.

Returns:

The list of encoded results.

Return type:

list

guess_type(text)[source]

Guess the type of the given text (either ‘smiles’ or ‘inchikey’).

Parameters:

text (str) – The text to guess the type of.

Returns:

The guessed type (‘smiles’, ‘inchikey’, or ‘UNPROCESSABLE_INPUT’).

Return type:

str
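One plausible way to make this kind of guess is to test for the fixed InChIKey layout first and treat everything else as a SMILES candidate. The sketch below assumes that approach; guess_type_sketch and its smiles_validator argument are hypothetical and do not describe how the library actually decides.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def guess_type_sketch(text, smiles_validator=None):
    """Return 'inchikey', 'smiles', or 'UNPROCESSABLE_INPUT' for the given text."""
    if not isinstance(text, str) or not text.strip():
        return "UNPROCESSABLE_INPUT"
    text = text.strip()
    if INCHIKEY_PATTERN.match(text):
        return "inchikey"
    # Assumption: anything else is a SMILES unless a validator rejects it.
    if smiles_validator is None or smiles_validator(text):
        return "smiles"
    return "UNPROCESSABLE_INPUT"


print(guess_type_sketch("LFQSCWFLJHTTHZ-UHFFFAOYSA-N"))
print(guess_type_sketch("CCO"))
```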

is_input_header(h)[source]

Check if the given header is an input header.

Parameters:

h (str) – The header to check.

Returns:

True if the header is an input header, False otherwise.

Return type:

bool

is_key_header(h)[source]

Check if the given header is a key header.

Parameters:

h (str) – The header to check.

Returns:

True if the header is a key header, False otherwise.

Return type:

bool

async process_smiles(smiles, semaphore, session, result_list)[source]

Process a SMILES string asynchronously.

Parameters:
  • smiles (str) – The SMILES string to process.

  • semaphore (asyncio.Semaphore) – The semaphore to limit concurrency.

  • session (aiohttp.ClientSession) – The HTTP session for making requests.

  • result_list (list) – The list to store results.
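The pattern behind encode_batch and process_smiles, where a shared semaphore bounds the number of concurrent requests and each task appends to a shared list, can be sketched with asyncio alone. The names process_item and encode_all are hypothetical, and the sleep stands in for the HTTP call that the real method makes through its aiohttp session.

```python
import asyncio


async def process_item(item, semaphore, result_list):
    # Only `concurrency_limit` tasks may be inside this block at once.
    async with semaphore:
        await asyncio.sleep(0.01)  # placeholder for a network request
        result_list.append((item, item.upper()))


async def encode_all(items, concurrency_limit=10):
    semaphore = asyncio.Semaphore(concurrency_limit)
    result_list = []
    await asyncio.gather(
        *(process_item(item, semaphore, result_list) for item in items)
    )
    return result_list


batch_results = asyncio.run(
    encode_all(["cco", "ccn", "c1ccccc1"], concurrency_limit=2)
)
```

Because tasks finish in whatever order the event loop schedules them, the order of result_list is not guaranteed to match the input order in this sketch; storing the input alongside each result keeps them matchable.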

unichem_resolver(inchikey)[source]

Resolve an InChIKey to a SMILES string using UniChem.

Parameters:

inchikey (str) – The InChIKey to resolve.

Returns:

The resolved SMILES string, or None if resolution fails.

Return type:

str

validate_smiles(smiles)[source]

Validate a SMILES string.

Parameters:

smiles (str) – The SMILES string to validate.

Returns:

True if the SMILES string is valid, False otherwise.

Return type:

bool

ersilia.utils.identifiers.compound.Identifier

alias of CompoundIdentifier

ersilia.utils.identifiers.file module

class ersilia.utils.identifiers.file.FileIdentifier(chunk_size=10000)[source]

Bases: object

A class to handle file identification and generate MD5 hashes for files.

Parameters:

chunk_size (int, optional) – The size of the chunks to read from the file. Default is 10000 bytes.

encode(filename, n=None)[source]

Generate an MD5 hash for the given file.

Parameters:
  • filename (str) – The path to the file to hash.

  • n (int, optional) – The number of characters of the hash to return. Default is None, which returns the full hash.

Returns:

The MD5 hash of the file, or the filename if MD5 is not available.

Return type:

str
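Hashing a file in fixed-size chunks keeps memory use constant regardless of file size. A minimal sketch with hashlib follows, assuming a hexadecimal digest and simple truncation to n characters; md5_of_file is a hypothetical name, and the fallback to the filename mentioned above is not shown.

```python
import hashlib


def md5_of_file(filename, chunk_size=10000, n=None):
    """Return the hex MD5 digest of a file, optionally cut to n characters."""
    digest = hashlib.md5()
    with open(filename, "rb") as handle:
        # Read until read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    hexdigest = digest.hexdigest()
    return hexdigest if n is None else hexdigest[:n]
```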

ersilia.utils.identifiers.file.Identifier

alias of FileIdentifier

ersilia.utils.identifiers.long module

ersilia.utils.identifiers.long.Identifier

alias of LongIdentifier

class ersilia.utils.identifiers.long.LongIdentifier[source]

Bases: object

A class to generate long identifiers (UUIDs).

static encode()[source]

Generate a UUID or a random identifier if UUID is not available.

Returns:

A UUID string or a randomly generated identifier.

Return type:

str
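For orientation, generating a UUID string with the standard library looks like the sketch below. It assumes a random (version 4) UUID, which this page does not specify; long_identifier is a hypothetical name, and the random fallback described above is omitted.

```python
import uuid


def long_identifier():
    # uuid4() builds a random UUID; str() gives the 36-character hyphenated form.
    return str(uuid.uuid4())


example_id = long_identifier()
print(example_id)
```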

ersilia.utils.identifiers.model module

ersilia.utils.identifiers.model.Identifier

alias of ModelIdentifier

class ersilia.utils.identifiers.model.ModelIdentifier[source]

Bases: object

A class to generate identifiers for new Ersilia models and to validate model identifiers.

choice()[source]

Generate a unique model identifier that does not exist in the Ersilia repository.

Returns:

A unique model identifier.

Return type:

str

encode()[source]

Generate a new model identifier.

Returns:

A new model identifier.

Return type:

str

exists(model_id)[source]

Check if a model identifier exists in the Ersilia repository.

Parameters:

model_id (str) – The model identifier to check.

Returns:

True if the model identifier exists, False otherwise.

Return type:

bool

generate(n)[source]

Generate a list of unique model identifiers.

Parameters:

n (int) – The number of model identifiers to generate.

Returns:

A list of unique model identifiers.

Return type:

list

is_test(s)[source]

Check if a given model identifier is a test identifier.

Parameters:

s (str) – The model identifier to check.

Returns:

True if the model identifier is a test identifier, False otherwise.

Return type:

bool

is_valid(s)[source]

Check if a given string is a valid model identifier.

Parameters:

s (str) – The string to check.

Returns:

True if the string is a valid model identifier, False otherwise.

Return type:

bool

ersilia.utils.identifiers.protein module

ersilia.utils.identifiers.short module

ersilia.utils.identifiers.short.Identifier

alias of ShortIdentifier

class ersilia.utils.identifiers.short.ShortIdentifier[source]

Bases: object

A class to generate short identifiers.

encode()[source]

Generate a short identifier based on the current timestamp or a random number.

Returns:

A short identifier string.

Return type:

str

ersilia.utils.identifiers.text module

ersilia.utils.identifiers.text.Identifier

alias of TextIdentifier

class ersilia.utils.identifiers.text.TextIdentifier[source]

Bases: object

A class to handle text identification by generating MD5 checksums.

This class provides methods to generate a unique identifier (checksum) for a given text string using the MD5 hashing algorithm. It also includes a method to perform a basic validation check on the generated checksum.

encode(text: str) → str[source]

Generate an MD5 checksum for the given text.

This method takes a text string as input and returns a unique identifier (checksum) for the text using the MD5 hashing algorithm. The checksum is prefixed with “key” to distinguish it from other strings.

Parameters:

text (str) – The text string to generate a checksum for.

Returns:

The MD5 checksum of the text, prefixed with “key”.

Return type:

str
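The described behaviour, an MD5 checksum with a "key" prefix, can be approximated as follows. This is a sketch under assumptions: UTF-8 encoding of the input and a hexadecimal digest are guesses, and text_checksum is a hypothetical name rather than the library's function.

```python
import hashlib


def text_checksum(text):
    """Return 'key' followed by the hex MD5 digest of the UTF-8 encoded text."""
    return "key" + hashlib.md5(text.encode("utf-8")).hexdigest()


print(text_checksum("hello"))  # → key5d41402abc4b2a76b9719d911017c592
```

The same text always maps to the same identifier, so the result can serve as a stable lookup key.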

ersilia.utils.identifiers.timestamp module

ersilia.utils.identifiers.timestamp.Identifier

alias of TimeStampIdentifier

class ersilia.utils.identifiers.timestamp.TimeStampIdentifier[source]

Bases: object

Class for handling timestamp identifiers.

encode()[source]

Encode the current timestamp.

Returns:

The encoded timestamp.

Return type:

str

Module contents