evo_prot_grad.common.tokenizers
ExpertTokenizer
evo_prot_grad.common.tokenizers.ExpertTokenizer
Bases: abc.ABC
Base interface for custom Expert tokenizers.
__init__(alphabet: List[str]) -> None
  Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| alphabet | List[str] | A list of amino acid characters. | required | 
get_vocab() -> Dict
  Return the vocabulary, a mapping from amino acid characters to integer indices.
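As a rough illustration of what such a mapping might look like, the sketch below builds a vocab by enumerating an alphabet. The index order (position in the alphabet) and the helper name `build_vocab` are assumptions made for illustration; the documentation above does not specify how indices are assigned.

```python
from typing import Dict, List


def build_vocab(alphabet: List[str]) -> Dict[str, int]:
    """Hypothetical helper: map each amino acid character to its position in `alphabet`."""
    return {aa: idx for idx, aa in enumerate(alphabet)}


# The 20 canonical amino acids, listed alphabetically by one-letter code.
alphabet = list("ACDEFGHIKLMNPQRSTVWY")
vocab = build_vocab(alphabet)
```

With this assumed ordering, the size of the vocab equals the `vocab_size` dimension of the one-hot tensors.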
__call__(seqs: List[str]) -> torch.FloatTensor
  Abstract method. Convert seqs to one-hot tensors.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| seqs | List[str] | A list of protein sequence strings of len [parallel_chains]. | required | 
Returns:
| Name | Type | Description | 
|---|---|---|
| ohs | torch.FloatTensor | One-hot tensor of shape [parallel_chains, seq_len, vocab_size]. | 
decode(ohs: torch.Tensor) -> List[str]
  Abstract method. Convert one-hot tensors back to a list of string sequences.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| ohs | torch.Tensor | One-hot tensor of shape [parallel_chains, seq_len, vocab_size]. | required | 
Returns:
| Name | Type | Description | 
|---|---|---|
| seqs | List[str] | A list of protein sequence strings of len [parallel_chains]. | 
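Because the class derives from abc.ABC and marks `__call__` and `decode` as abstract, a custom tokenizer has to implement both before it can be instantiated. The sketch below illustrates that contract with a local stand-in base class (`SketchTokenizer`, a hypothetical name) that mirrors the documented signatures, so it runs without the library or torch installed. Nested lists stand in for tensors, and the vocab ordering is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

# Stand-in for a tensor of shape [parallel_chains, seq_len, vocab_size].
OneHot = List[List[List[float]]]


class SketchTokenizer(ABC):
    """Hypothetical stand-in mirroring the documented ExpertTokenizer interface."""

    def __init__(self, alphabet: List[str]) -> None:
        self.alphabet = alphabet

    def get_vocab(self) -> Dict[str, int]:
        # Assumption: indices follow the order of the alphabet.
        return {aa: i for i, aa in enumerate(self.alphabet)}

    @abstractmethod
    def __call__(self, seqs: List[str]) -> OneHot: ...

    @abstractmethod
    def decode(self, ohs: OneHot) -> List[str]: ...


class MyTokenizer(SketchTokenizer):
    """A custom tokenizer that implements both abstract methods."""

    def __call__(self, seqs: List[str]) -> OneHot:
        vocab = self.get_vocab()
        size = len(vocab)
        return [[[1.0 if vocab[aa] == j else 0.0 for j in range(size)] for aa in seq]
                for seq in seqs]

    def decode(self, ohs: OneHot) -> List[str]:
        # Pick the index of the largest entry at each sequence position.
        return ["".join(self.alphabet[row.index(max(row))] for row in oh) for oh in ohs]


try:
    SketchTokenizer(list("ACDE"))  # fails: abstract methods are not implemented
    instantiated = True
except TypeError:
    instantiated = False

tok = MyTokenizer(list("ACDE"))
roundtrip = tok.decode(tok(["ACCA", "DEAD"]))
```

When subclassing the real `ExpertTokenizer`, the same two methods would return and accept torch tensors instead of nested lists.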
OneHotTokenizer
evo_prot_grad.common.tokenizers.OneHotTokenizer
Bases: ExpertTokenizer
Converts a string of amino acids into one-hot tensors.
get_vocab() -> Dict
  Return the vocabulary, a mapping from amino acid characters to integer indices.
__init__(alphabet: List[str])
  Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| alphabet | List[str] | A list of amino acid characters. | required | 
__call__(seqs: List[str]) -> torch.FloatTensor
  Convert seqs to one-hot tensors. Assumes that all sequences have the same length. Sequences with spaces between amino acids are also handled.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| seqs | List[str] | A list of protein sequence strings of len [parallel_chains]. | required | 
Returns:
| Name | Type | Description | 
|---|---|---|
| ohs | torch.FloatTensor | One-hot tensor of shape [parallel_chains, seq_len, vocab_size]. | 
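The encoding step can be sketched without dependencies as follows. Nested lists stand in for the torch.FloatTensor, and the function name `one_hot_encode` and the index ordering are hypothetical. This illustrates the documented behavior (equal-length sequences, optional spaces between amino acids) and is not the library's implementation.

```python
from typing import List


def one_hot_encode(seqs: List[str], alphabet: List[str]) -> List[List[List[float]]]:
    """Hypothetical sketch: encode equal-length sequences as nested one-hot lists."""
    index = {aa: i for i, aa in enumerate(alphabet)}
    # Drop spaces so that "A C D" and "ACD" are treated identically.
    cleaned = [s.replace(" ", "") for s in seqs]
    if len({len(s) for s in cleaned}) > 1:
        raise ValueError("all sequences must have the same length")
    return [[[1.0 if index[aa] == j else 0.0 for j in range(len(alphabet))]
             for aa in s] for s in cleaned]


alphabet = list("ACDE")
ohs = one_hot_encode(["A C D", "EDC"], alphabet)
# (parallel_chains, seq_len, vocab_size)
shape = (len(ohs), len(ohs[0]), len(ohs[0][0]))
```

Here two chains of three residues over a four-letter alphabet yield a nested structure with dimensions parallel_chains = 2, seq_len = 3, vocab_size = 4.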
decode(ohs: torch.Tensor) -> List[str]
  Convert one-hot tensors back to a list of sequence strings, with a space between adjacent amino acids.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| ohs | torch.Tensor | One-hot tensor of shape [parallel_chains, seq_len, vocab_size]. | required | 
Returns:
| Name | Type | Description | 
|---|---|---|
| seqs | List[str] | A list of protein sequence strings of len [parallel_chains]. |
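Decoding can be sketched as taking the arg-max over the vocabulary axis at each position and joining the resulting characters with spaces. As before, nested lists stand in for tensors, and `one_hot_decode` and the index ordering are hypothetical choices for illustration.

```python
from typing import List


def one_hot_decode(ohs: List[List[List[float]]], alphabet: List[str]) -> List[str]:
    """Hypothetical sketch: decode nested one-hot lists into space-separated strings."""
    seqs = []
    for oh in ohs:  # one [seq_len, vocab_size] matrix per chain
        chars = [alphabet[row.index(max(row))] for row in oh]
        seqs.append(" ".join(chars))
    return seqs


alphabet = list("ACDE")
ohs = [
    [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],  # "A", "D"
    [[0.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0]],  # "E", "C"
]
decoded = one_hot_decode(ohs, alphabet)
```

The space-separated output matches the input format that `__call__` is documented to accept, so decoded strings can be fed back into the tokenizer.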