ersilia.utils.identifiers package
Submodules
ersilia.utils.identifiers.arbitrary module
- class ersilia.utils.identifiers.arbitrary.ArbitraryIdentifier
Bases: object
Class for handling arbitrary identifiers.
- ersilia.utils.identifiers.arbitrary.Identifier
alias of ArbitraryIdentifier
ersilia.utils.identifiers.compound module
- class ersilia.utils.identifiers.compound.CompoundIdentifier(local=True, concurrency_limit=10, cache_maxsize=128)
Bases: object
A class to handle compound identification and conversion between different chemical identifiers.
- Parameters:
local (bool, optional) – If True, use local RDKit for SMILES to InChIKey conversion. Default is True.
concurrency_limit (int, optional) – The maximum number of concurrent API requests. Default is 10.
cache_maxsize (int, optional) – The maximum size of the LRU cache for storing API results. Default is 128.
Examples
    identifier = CompoundIdentifier()
    smiles = "CCO"
    inchikey = identifier.encode(smiles)
    print(inchikey)
- static chemical_identifier_resolver(identifier)
Resolve a chemical identifier to a SMILES string using the NCI tool.
- Parameters:
identifier (str) – The chemical identifier to resolve.
- Returns:
The resolved SMILES string, or ‘UNPROCESSABLE_INPUT’ if resolution fails.
- Return type:
str
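As a rough illustration of such a lookup, the sketch below assumes that "the NCI tool" is the public NCI Chemical Identifier Resolver at cactus.nci.nih.gov and that it is queried with a plain HTTP GET. The base URL and the helper names `build_resolver_url` and `resolve_to_smiles` are assumptions made for this example and are not taken from ersilia.

```python
import urllib.error
import urllib.parse
import urllib.request

# Assumed base URL of the NCI Chemical Identifier Resolver (not stated in these docs).
NCI_BASE_URL = "https://cactus.nci.nih.gov/chemical/structure"


def build_resolver_url(identifier):
    """Build the request URL for resolving an identifier to SMILES."""
    # Percent-encode the identifier so spaces, '#', '/' and similar are safe in a path.
    quoted = urllib.parse.quote(identifier, safe="")
    return "{0}/{1}/smiles".format(NCI_BASE_URL, quoted)


def resolve_to_smiles(identifier, timeout=10):
    """Return the resolved SMILES, or 'UNPROCESSABLE_INPUT' if the lookup fails."""
    if not identifier:
        return "UNPROCESSABLE_INPUT"
    try:
        url = build_resolver_url(identifier)
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8").strip()
    except (urllib.error.URLError, OSError, ValueError):
        # Network errors, HTTP errors (e.g. 404 for unknown names) and bad input end up here.
        return "UNPROCESSABLE_INPUT"
```

Calling `resolve_to_smiles("aspirin")` would issue one network request. The sentinel string mirrors the documented failure value, so callers can treat both paths uniformly.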
- convert_smiles_to_inchikey_with_rdkit(smiles)
Converts a SMILES string to an InChIKey using RDKit. The results are cached to improve performance.
- Parameters:
smiles (str) – The SMILES string to convert.
- Returns:
The InChIKey of the compound, or None if conversion fails.
- Return type:
str
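The effect of caching conversion results can be pictured with `functools.lru_cache`. This is only a sketch: `slow_convert` is a stand-in that does not call RDKit and returns a fake key, and wrapping it this way is an assumption about how such a cache could be built, not a description of the class internals.

```python
from functools import lru_cache

# Counts how often the underlying conversion actually runs.
calls = {"count": 0}


def slow_convert(smiles):
    """Stand-in for the real RDKit conversion; returns None for empty input."""
    calls["count"] += 1
    if not smiles:
        return None
    return "FAKEKEY-" + smiles.upper()


# maxsize mirrors the documented cache_maxsize default of 128.
cached_convert = lru_cache(maxsize=128)(slow_convert)

first = cached_convert("CCO")
second = cached_convert("CCO")  # served from the cache; slow_convert is not called again
```

With a bounded LRU cache, repeated inputs skip the conversion entirely, while the least recently used entries are evicted once the cache holds `maxsize` results.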
- encode(smiles)
Get the InChIKey of a compound based on its SMILES string.
- Parameters:
smiles (str) – The SMILES string of the compound.
- Returns:
The InChIKey of the compound, or ‘UNPROCESSABLE_INPUT’ if conversion fails.
- Return type:
str
- async encode_batch(smiles_list)
Encode a batch of SMILES strings asynchronously.
- Parameters:
smiles_list (list) – The list of SMILES strings to encode.
- Returns:
The list of encoded results.
- Return type:
list
- guess_type(text)
Guess whether the given text is a SMILES string or an InChIKey.
- Parameters:
text (str) – The text to guess the type of.
- Returns:
The guessed type (‘smiles’, ‘inchikey’, or ‘UNPROCESSABLE_INPUT’).
- Return type:
str
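One way to picture this kind of type guessing is a format check for InChIKeys followed by a fallback. The sketch below relies on the standard InChIKey layout (14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, 1 uppercase letter). The SMILES branch is a deliberately crude assumption made for this example; a real check would parse the string with a cheminformatics toolkit. The name `guess_type_sketch` is hypothetical.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def guess_type_sketch(text):
    """Classify text as 'inchikey', 'smiles', or 'UNPROCESSABLE_INPUT'."""
    if not isinstance(text, str) or not text.strip():
        return "UNPROCESSABLE_INPUT"
    text = text.strip()
    if INCHIKEY_PATTERN.match(text):
        return "inchikey"
    # Assumption: anything else without internal whitespace is a SMILES candidate.
    if not any(ch.isspace() for ch in text):
        return "smiles"
    return "UNPROCESSABLE_INPUT"
```

For example, `guess_type_sketch("LFQSCWFLJHTTHZ-UHFFFAOYSA-N")` matches the InChIKey pattern, while `guess_type_sketch("CCO")` falls through to the SMILES branch.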
- is_input_header(h)
Check if the given header is an input header.
- Parameters:
h (str) – The header to check.
- Returns:
True if the header is an input header, False otherwise.
- Return type:
bool
- is_key_header(h)
Check if the given header is a key header.
- Parameters:
h (str) – The header to check.
- Returns:
True if the header is a key header, False otherwise.
- Return type:
bool
- async process_smiles(smiles, semaphore, session, result_list)
Process a SMILES string asynchronously.
- Parameters:
smiles (str) – The SMILES string to process.
semaphore (asyncio.Semaphore) – The semaphore to limit concurrency.
session (aiohttp.ClientSession) – The HTTP session for making requests.
result_list (list) – The list to store results.
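The semaphore-limited pattern that the asynchronous methods suggest can be sketched with the standard library alone. Here `fake_lookup` stands in for the HTTP request (no `aiohttp` session is used), and the extra `index` argument is an assumption added so that results keep their input order. None of these helper names come from ersilia.

```python
import asyncio


async def fake_lookup(smiles):
    """Stand-in for an HTTP request; a real version would use an HTTP client session."""
    await asyncio.sleep(0.01)
    return "KEY-" + smiles


async def process_one(index, smiles, semaphore, result_list):
    """Process one SMILES while holding the semaphore, storing the result at its index."""
    async with semaphore:
        try:
            result_list[index] = await fake_lookup(smiles)
        except Exception:
            result_list[index] = "UNPROCESSABLE_INPUT"


async def encode_batch_sketch(smiles_list, concurrency_limit=10):
    """Encode all SMILES concurrently, with at most concurrency_limit lookups in flight."""
    semaphore = asyncio.Semaphore(concurrency_limit)
    result_list = [None] * len(smiles_list)
    tasks = [
        process_one(i, smiles, semaphore, result_list)
        for i, smiles in enumerate(smiles_list)
    ]
    await asyncio.gather(*tasks)
    return result_list


results = asyncio.run(encode_batch_sketch(["CCO", "CCN", "C"], concurrency_limit=2))
```

All tasks are scheduled at once, but the semaphore lets only `concurrency_limit` of them run the lookup at any moment, which keeps the load on a remote service bounded.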
- ersilia.utils.identifiers.compound.Identifier
alias of CompoundIdentifier
ersilia.utils.identifiers.file module
- class ersilia.utils.identifiers.file.FileIdentifier(chunk_size=10000)
Bases: object
A class to handle file identification and generate MD5 hashes for files.
- Parameters:
chunk_size (int, optional) – The size of the chunks to read from the file. Default is 10000 bytes.
- encode(filename, n=None)
Generate an MD5 hash for the given file.
- Parameters:
filename (str) – The path to the file to hash.
n (int, optional) – The number of characters of the hash to return. Default is None, which returns the full hash.
- Returns:
The MD5 hash of the file, or the filename if MD5 is not available.
- Return type:
str
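Chunked hashing of this kind can be sketched with `hashlib`. The function name `md5_of_file` is hypothetical, the hex digest format is an assumption, and the documented fallback to the filename when MD5 is unavailable is left out.

```python
import hashlib
import os
import tempfile


def md5_of_file(filename, chunk_size=10000, n=None):
    """Return the hex MD5 digest of a file, optionally truncated to n characters."""
    digest = hashlib.md5()
    with open(filename, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    full_hash = digest.hexdigest()
    return full_hash if n is None else full_hash[:n]


# Usage: hash a small temporary file, then clean it up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world" * 5000)
    tmp_path = tmp.name

full = md5_of_file(tmp_path)
short = md5_of_file(tmp_path, n=8)
os.remove(tmp_path)
```

Reading in fixed-size chunks keeps memory use constant regardless of file size, and the digest is identical to hashing the whole content in one call.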
- ersilia.utils.identifiers.file.Identifier
alias of FileIdentifier
ersilia.utils.identifiers.long module
- ersilia.utils.identifiers.long.Identifier
alias of LongIdentifier
ersilia.utils.identifiers.model module
- ersilia.utils.identifiers.model.Identifier
alias of ModelIdentifier
- class ersilia.utils.identifiers.model.ModelIdentifier
Bases: object
A class to generate identifiers for new Ersilia models and to check whether an identifier already exists.
- choice()
Generate a unique model identifier that does not exist in the Ersilia repository.
- Returns:
A unique model identifier.
- Return type:
str
- encode()
Generate a new model identifier.
- Returns:
A new model identifier.
- Return type:
str
- exists(model_id)
Check if a model identifier exists in the Ersilia repository.
- Parameters:
model_id (str) – The model identifier to check.
- Returns:
True if the model identifier exists, False otherwise.
- Return type:
bool
- generate(n)
Generate a list of unique model identifiers.
- Parameters:
n (int) – The number of model identifiers to generate.
- Returns:
A list of unique model identifiers.
- Return type:
list
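Generating identifiers that are unique and not already taken can be sketched as rejection sampling against a set of known identifiers. The identifier format used here (an `eos` prefix followed by four lowercase alphanumeric characters) and the in-memory `existing` set are assumptions for illustration only. The documented `exists` method checks the Ersilia repository instead, and these pages do not specify the real format.

```python
import random
import string


def new_identifier(prefix="eos", length=4, rng=random):
    """Draw one random candidate identifier (hypothetical format)."""
    alphabet = string.ascii_lowercase + string.digits
    return prefix + "".join(rng.choice(alphabet) for _ in range(length))


def generate_unique(n, existing, rng=random):
    """Return n identifiers that are neither in `existing` nor repeated."""
    taken = set(existing)
    fresh = []
    while len(fresh) < n:
        candidate = new_identifier(rng=rng)
        # Reject candidates that collide with known or already generated identifiers.
        if candidate not in taken:
            taken.add(candidate)
            fresh.append(candidate)
    return fresh


# A seeded generator makes the example reproducible.
ids = generate_unique(5, existing={"eos0aaa"}, rng=random.Random(0))
```

Tracking generated candidates in the same set as the existing ones guarantees that the returned list contains no duplicates.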
ersilia.utils.identifiers.protein module
ersilia.utils.identifiers.short module
- ersilia.utils.identifiers.short.Identifier
alias of ShortIdentifier
ersilia.utils.identifiers.text module
- ersilia.utils.identifiers.text.Identifier
alias of TextIdentifier
- class ersilia.utils.identifiers.text.TextIdentifier
Bases: object
A class to handle text identification by generating MD5 checksums.
This class provides methods to generate a unique identifier (checksum) for a given text string using the MD5 hashing algorithm. It also includes a method to perform a basic validation check on the generated checksum.
- encode(text: str) -> str
Generate an MD5 checksum for the given text.
This method takes a text string as input and returns a unique identifier (checksum) for the text using the MD5 hashing algorithm. The checksum is prefixed with “key” to distinguish it from other strings.
- Parameters:
text (str) – The text string to generate a checksum for.
- Returns:
The MD5 checksum of the text, prefixed with “key”.
- Return type:
str
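A minimal sketch of a prefixed MD5 checksum follows, assuming the text is UTF-8 encoded and the digest is rendered as hexadecimal. Neither detail is stated above, and the function name `text_key` is hypothetical.

```python
import hashlib


def text_key(text):
    """Return 'key' followed by the hex MD5 digest of the UTF-8 encoded text."""
    return "key" + hashlib.md5(text.encode("utf-8")).hexdigest()


checksum = text_key("hello")
```

The same input always yields the same checksum, so the result can serve as a stable lookup key for a piece of text.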
ersilia.utils.identifiers.timestamp module
- ersilia.utils.identifiers.timestamp.Identifier
alias of TimeStampIdentifier