Implementation of different ACMG criteria for sequence variants
This section contains the internal building blocks for the implementation of the different ACMG criteria for sequence variants.
PVS1
For sequence variants:
PVS1 criteria for Sequence Variants (SeqVar).
- class src.seqvar.auto_pvs1.AutoPVS1[source]#
Bases:
SeqVarPVS1Helper
Handles the PVS1 criteria assessment for sequence variants.
- _calc_alt_reg(var_pos: int, exons: List[Exon], strand: GenomicStrand) Tuple[int, int]#
Calculates the altered region’s start and end positions.
This method calculates the start and end positions of the altered region based on the position of the variant in the coding sequence and the exons of the gene. The method is implemented as follows:
- If the genomic strand is plus, the start position is the variant position, and the end position is the last exon’s end position.
- If the genomic strand is minus, the start position is the first exon’s start position, and the end position is the variant position.
- Parameters:
var_pos – The position of the variant in the coding sequence.
exons – A list of exons of the gene where the variant occurs.
strand – The genomic strand of the gene.
- Returns:
The start and end positions of the altered region.
- Return type:
Tuple[int, int]
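The two strand cases described for _calc_alt_reg can be sketched as follows. This is a minimal illustration, not the library's code: `Strand` and `ExonSpan` are hypothetical stand-ins for `GenomicStrand` and `Exon`, and the exons are assumed to be sorted by ascending genomic coordinate.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Strand(Enum):
    """Hypothetical stand-in for GenomicStrand."""
    PLUS = "plus"
    MINUS = "minus"


@dataclass
class ExonSpan:
    """Hypothetical stand-in for Exon; only the boundaries are modelled."""
    start: int
    end: int


def calc_alt_reg(var_pos: int, exons: List[ExonSpan], strand: Strand) -> Tuple[int, int]:
    """Return (start, end) of the altered region for the given strand."""
    if not exons:
        raise ValueError("at least one exon is required")
    if strand is Strand.PLUS:
        # Plus strand: from the variant to the end of the last exon.
        return var_pos, exons[-1].end
    # Minus strand: from the start of the first exon to the variant.
    return exons[0].start, var_pos
```

For example, with exons spanning 100-200 and 300-400, a plus-strand variant at 150 yields the region (150, 400), and a minus-strand variant at 350 yields (100, 350).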
- _closest_alt_start_cdn(cds_info: Dict[str, CdsInfo], hgvs: str) int | None#
Calculate the closest potential start codon.
The method calculates the closest potential start codon based on the position of the variant in the coding sequence and the CDS information of the gene.
- Parameters:
cds_info – A dictionary containing the CDS information for all transcripts.
hgvs – The main transcript ID.
- Returns:
The position of the closest potential start codon, or None if not found.
- Return type:
Optional[int]
- _convert_consequence(var_data: AutoACMGSeqVarData) SeqVarPVS1Consequence[source]#
Convert the VEP consequence of the sequence variant to the internal representation.
- Parameters:
var_data – The data of the sequence variant being analyzed.
- Returns:
The internal representation of the VEP consequence.
- Return type:
SeqVarPVS1Consequence
- _count_lof_vars(seqvar: SeqVar, start_pos: int, end_pos: int) Tuple[int, int]#
Counts Loss-of-Function (LoF) variants in the specified range.
The method retrieves variants from the specified range and iterates through the available data of each variant. The method counts the number of LoF variants and the number of frequent LoF variants in the specified range, based on the gnomAD genomes data (for the consequence of Nonsense and Frameshift variants) and the allele frequency (for the frequency of the LoF variants in the general population).
Note
A LoF variant is considered frequent if its occurrence in the general population exceeds some threshold. We use a threshold of 0.1% to determine if the LoF variant is frequent.
- Parameters:
seqvar – The sequence variant being analyzed.
start_pos – The start position of the range.
end_pos – The end position of the range.
- Returns:
The number of frequent LoF variants and the total number of LoF variants.
- Return type:
Tuple[int, int]
- _count_pathogenic_vars(seqvar: SeqVar, start_pos: int, end_pos: int) Tuple[int, int]#
Counts pathogenic variants in the specified range.
The method retrieves variants from the specified range and iterates through the ClinVar data of each variant to count the number of pathogenic variants and the total number of variants.
- Parameters:
seqvar – The sequence variant being analyzed.
start_pos – The start position of the range.
end_pos – The end position of the range.
- Returns:
The number of pathogenic variants and the total number of variants.
- Return type:
Tuple[int, int]
- Raises:
InvalidAPIResposeError – If the API response is invalid or cannot be processed.
- _find_aff_exon_pos(var_pos: int, exons: List[Exon]) Tuple[int, int]#
Find start and end positions of the affected exon.
- Parameters:
var_pos – The position of the variant in the coding sequence.
exons – A list of exons of the gene where the variant occurs.
- Returns:
The start and end positions of the affected exon.
- Return type:
Tuple[int, int]
- Raises:
AlgorithmError – If the affected exon is not found.
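A lookup of the affected exon might look like the sketch below. It assumes the variant position and the exon boundaries share one coordinate system with inclusive ends, represents exons as plain `(start, end)` tuples, and raises the built-in `LookupError` where the documented method raises `AlgorithmError`.

```python
from typing import List, Tuple


def find_aff_exon_pos(var_pos: int, exons: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return the (start, end) of the exon that contains var_pos."""
    for start, end in exons:
        # Assumption: both boundaries are inclusive.
        if start <= var_pos <= end:
            return start, end
    # Stand-in for AlgorithmError in the documented method.
    raise LookupError(f"no exon contains position {var_pos}")
```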
- _get_conseq(val: SeqVarPVS1Consequence) List[str]#
Get the VEP consequence of the sequence variant by value.
- Parameters:
val – The value of the consequence.
- Returns:
The VEP consequences of the sequence variant.
- Return type:
List[str]
- _skipping_exon_pos(seqvar: SeqVar, exons: List[Exon]) Tuple[int, int]#
Find the start and end positions of the exon closest to the sequence variant.
The method locates the exon that may be skipped as a consequence of the variant and returns its boundaries.
- Parameters:
seqvar – The sequence variant being analyzed.
exons – A list of exons of the gene where the variant occurs.
- Returns:
The start and end positions of the exon skipping region.
- Return type:
Tuple[int, int]
- alt_start_cdn(cds_info: Dict[str, CdsInfo], hgvs: str) bool#
Check if the variant introduces an alternative start codon in other transcripts.
Implementation of the rule:
- Iterate through all transcripts and check whether the coding sequence start differs from that of the main transcript.
- If the start codon differs, the rule is met.
Note
Rule: If the variant introduces an alternative start codon in other transcripts, it is considered to be non-pathogenic.
- Parameters:
hgvs – The main transcript ID.
cds_info – A dictionary containing the CDS information for all transcripts.
- Returns:
True if the variant introduces an alternative start codon in other transcripts, False otherwise.
- Return type:
bool
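The comparison of coding sequence starts can be illustrated with a simplified sketch. The structure of `CdsInfo` is not shown in this reference, so the example assumes a hypothetical mapping from transcript ID to CDS start position; returning False for an unknown main transcript is also an assumption.

```python
from typing import Dict


def has_alt_start(cds_starts: Dict[str, int], main_tx: str) -> bool:
    """Return True if any other transcript has a CDS start different from the main one."""
    if main_tx not in cds_starts:
        # Assumption: without the main transcript there is nothing to compare against.
        return False
    main_start = cds_starts[main_tx]
    return any(
        start != main_start
        for tx, start in cds_starts.items()
        if tx != main_tx
    )
```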
- annonars_client#
Annonars client for the API.
- comment_pvs1: str#
Comment to store the prediction explanation.
- config: Config#
Configuration settings.
- crit4prot_func(seqvar: SeqVar, exons: List[Exon], strand: GenomicStrand) bool#
Checks if the truncated or altered region is critical for the protein function.
This method assesses the impact of a sequence variant based on the presence of pathogenic variants downstream of the new stop codon, utilizing both experimental and clinical evidence.
Implementation of the rule:
- Calculate the range of the altered region, based on the position of the variant in the coding sequence and the exons of the gene.
- Fetch variants from the specified range of the altered region.
- Count the number of pathogenic variants in that region by iterating through the ClinVar data of each variant.
- Consider the region critical if the frequency of pathogenic variants exceeds 5%.
Note: The significance of a truncated or altered region is determined by the presence and frequency of pathogenic variants downstream from the new stop codon. We use a threshold of 5% to determine if the region is critical for the protein function.
- Parameters:
seqvar – The sequence variant being analyzed.
cds_pos – The position of the variant in the coding sequence.
exons – A list of exons of the gene where the variant occurs.
strand – The genomic strand of the gene.
- Returns:
True if the altered region is critical for the protein function, otherwise False.
- Return type:
bool
- Raises:
InvalidAPIResponseError – If the API response is invalid or cannot be processed.
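The final decision step of crit4prot_func reduces to a ratio test on the (pathogenic, total) counts that _count_pathogenic_vars returns. A minimal sketch, assuming that a region without any variants is treated as not critical (the reference does not state how this case is handled):

```python
def is_region_critical(pathogenic: int, total: int, threshold: float = 0.05) -> bool:
    """Return True if the share of pathogenic variants exceeds the threshold."""
    if total == 0:
        # Assumption: no variants in the region means no evidence of criticality.
        return False
    return pathogenic / total > threshold
```

With 6 pathogenic variants out of 100 the region counts as critical; with exactly 5 out of 100 it does not, because the comparison is strict.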
- exon_skip_or_cryptic_ss_disrupt(seqvar: SeqVar, exons: List[Exon], consequences: List[str], strand: GenomicStrand) bool#
Check if the variant causes exon skipping or cryptic splice site disruption.
The method checks if the variant causes exon skipping or cryptic splice site disruption based on the position of the variant in the coding sequence and the exons of the gene.
Implementation of the rule:
- If the exon length is not a multiple of 3, the variant is predicted to cause exon skipping.
- If the variant is a splice acceptor or donor variant, the method predicts cryptic splice site disruption.
Note
Rule: If the variant causes exon skipping or cryptic splice site disruption, it is considered to be pathogenic.
- Parameters:
seqvar – The sequence variant being analyzed.
exons – A list of exons of the gene where the variant occurs.
consequences – A list of VEP consequences of the sequence variant.
- Returns:
True if the variant causes exon skipping or cryptic splice site disruption, False if it preserves the reading frame.
- Return type:
bool
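The two checks of exon_skip_or_cryptic_ss_disrupt can be approximated as below. This is a simplification under stated assumptions: the exon is passed as half-open coordinates so that its length is `end - start`, and the splice check only looks for the VEP terms `splice_acceptor_variant` and `splice_donor_variant`; the documented method may evaluate splicing in more detail.

```python
from typing import Iterable

# VEP (Sequence Ontology) terms for canonical splice-site variants.
SPLICE_SITE_TERMS = {"splice_acceptor_variant", "splice_donor_variant"}


def exon_skip_or_splice_disrupt(
    exon_start: int, exon_end: int, consequences: Iterable[str]
) -> bool:
    """Return True if skipping the exon shifts the frame or a splice site is hit."""
    exon_length = exon_end - exon_start  # assumption: half-open [start, end)
    if exon_length % 3 != 0:
        # Removing this exon would shift the reading frame.
        return True
    return any(term in SPLICE_SITE_TERMS for term in consequences)
```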
- in_bio_relevant_tx(transcript_tags: List[str]) bool#
Checks if the exon with SeqVar is in a biologically relevant transcript.
- Implementation of the rule:
If the variant is located in a transcript with a MANE Select tag, it is considered to be in a biologically relevant transcript.
- Parameters:
transcript_tags – A list of tags for the transcript.
- Returns:
True if the variant is in a biologically relevant transcript, False otherwise.
- Return type:
bool
- lof_freq_in_pop(seqvar: SeqVar, exons: List[Exon], strand: GenomicStrand) bool#
Checks if the Loss-of-Function (LoF) variants in the exon are frequent in the general population.
This function determines the frequency of LoF variants within a specified genomic region and evaluates whether this frequency exceeds a defined threshold indicative of common occurrence in the general population.
Implementation of the rule:
- Calculate the range of the altered region (coding sequence of the transcript).
- Count the number of LoF variants and frequent LoF variants in that region.
- Consider the LoF variants frequent in the general population if the share of “frequent” LoF variants among all LoF variants exceeds 10%.
Note: Two thresholds are involved. An individual LoF variant counts as frequent if its allele frequency in the general population exceeds 0.1% (see _count_lof_vars). The 10% threshold applies to the share of such frequent LoF variants in the region.
- Parameters:
seqvar – The sequence variant being analyzed.
exons – A list of exons of the gene where the variant occurs.
strand – The genomic strand of the gene.
- Returns:
True if the LoF variant frequency is greater than 10%, False otherwise.
- Return type:
bool
- Raises:
InvalidAPIResponseError – If the API response is invalid or cannot be processed.
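The interplay of the per-variant allele frequency threshold (0.1%, from the note on _count_lof_vars) and the 10% share threshold can be sketched as follows. The input is assumed to be a plain list of allele frequencies of LoF variants, which is a simplification of the gnomAD data the documented methods read; treating an empty region as "not frequent" is also an assumption.

```python
from typing import Iterable, Tuple


def count_lof(allele_freqs: Iterable[float], af_threshold: float = 0.001) -> Tuple[int, int]:
    """Return (number of frequent LoF variants, total number of LoF variants)."""
    freqs = list(allele_freqs)
    frequent = sum(1 for af in freqs if af > af_threshold)
    return frequent, len(freqs)


def lof_frequent_in_population(frequent: int, total: int, ratio_threshold: float = 0.1) -> bool:
    """Return True if the share of frequent LoF variants exceeds the ratio threshold."""
    if total == 0:
        # Assumption: no LoF variants observed means "not frequent".
        return False
    return frequent / total > ratio_threshold
```

For allele frequencies 0.0005, 0.002, 0.01 and 0.0, two of four variants are frequent, so the share is 50% and the check passes.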
- lof_rm_gt_10pct_of_prot(prot_pos: int, prot_length: int) bool#
Check if the LoF variant removes more than 10% of the protein.
The method checks if the LoF variant removes more than 10% of the protein based on the position of the variant in the protein and the length of the protein.
Note
- Rule:
A LoF variant is considered to remove more than 10% of the protein if the variant removes more than 10% of the protein.
- Parameters:
prot_pos – The position of the variant in the protein.
prot_length – The length of the protein.
- Returns:
True if the LoF variant removes more than 10% of the protein, False otherwise.
- Return type:
bool
- predict_pvs1(seqvar: SeqVar, var_data: AutoACMGSeqVarData) AutoACMGCriteria[source]#
Predict the PVS1 criteria.
- undergo_nmd(var_pos: int, hgnc_id: str, strand: GenomicStrand, exons: List[Exon]) bool#
Classifies if the variant undergoes Nonsense-mediated decay (NMD).
Implementation of the rule:
- If the variant is in the GJB2 gene, it is always predicted to undergo NMD.
- Otherwise, the method checks whether the new stop codon position, including the 5’ UTR length, is less than or equal to the NMD cutoff. If it is, the variant is predicted to undergo NMD; otherwise, it is predicted to escape NMD.
Note
Rule: If the variant is located in the last exon or in the last 50 nucleotides of the penultimate exon, it is NOT predicted to undergo NMD.
Important: For the GJB2 gene (HGNC:4284), the variant is always predicted to undergo NMD.
- Parameters:
var_pos – The position of the new stop codon, including the 5’ UTR length.
hgnc_id – The HGNC ID of the gene.
strand – The genomic strand of the gene.
exons – A list of exons of the gene.
- Returns:
True if the variant undergoes NMD, False if variant escapes NMD.
- Return type:
bool
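One way to derive the NMD cutoff from the rule above is to subtract the last exon and the last 50 nucleotides of the penultimate exon from the transcript length. The sketch below assumes exon lengths are given in transcript order (5' to 3') and that the stop position is a 1-based transcript position including the 5' UTR; treating a single-exon transcript as escaping NMD is an assumption, since its only exon is also its last exon.

```python
from typing import List

GJB2_HGNC_ID = "HGNC:4284"  # gene that is always predicted to undergo NMD


def undergoes_nmd(stop_pos: int, hgnc_id: str, exon_lengths: List[int]) -> bool:
    """Return True if a stop codon at stop_pos is predicted to trigger NMD."""
    if hgnc_id == GJB2_HGNC_ID:
        return True
    if len(exon_lengths) < 2:
        # Assumption: with fewer than two exons the stop is always in the last exon.
        return False
    last_exon = exon_lengths[-1]
    penultimate_tail = min(50, exon_lengths[-2])
    nmd_cutoff = sum(exon_lengths) - last_exon - penultimate_tail
    return stop_pos <= nmd_cutoff
```

For exon lengths 100, 200 and 150 the cutoff is 450 - 150 - 50 = 250, so a stop codon at position 250 is predicted to undergo NMD and one at 251 to escape it.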
- up_pathogenic_vars(seqvar: SeqVar, exons: List[Exon], strand: GenomicStrand) bool#
Look for pathogenic variants upstream of the closest potential in-frame start codon.
The method checks for pathogenic variants upstream of the closest potential in-frame start codon. The method is implemented as follows:
- Find the closest potential in-frame start codon.
- Fetch and count pathogenic variants in the specified range.
- Return True if pathogenic variants are found, otherwise False.
- Parameters:
seqvar – The sequence variant being analyzed.
cds_pos – The position of the variant in the coding sequence.
exons – A list of exons of the gene where the variant occurs.
strand – The genomic strand of the gene.
- Returns:
True if pathogenic variants are found upstream of the closest potential in-frame start codon, False otherwise.
- Return type:
bool
- verify_pvs1(seqvar: SeqVar, var_data: AutoACMGSeqVarData) Tuple[PVS1Prediction, PVS1PredictionSeqVarPath, str][source]#
Make the PVS1 prediction.
The prediction is based on the PVS1 decision tree for sequence variants. The prediction and the prediction path are stored in the prediction and prediction_path attributes.
- Parameters:
seqvar – The sequence variant being analyzed.
var_data – The data of the sequence variant.
- Returns:
The prediction, the prediction path, and the comment.
- Return type:
Tuple[PVS1Prediction, PVS1PredictionSeqVarPath, str]
- src.seqvar.auto_pvs1.CUSTOM_VCEP_PVS1 = ['HGNC:92', 'HGNC:7551', 'HGNC:1748', 'HGNC:4175', 'HGNC:4136', 'HGNC:3546', 'HGNC:3551', 'HGNC:7720', 'HGNC:129', 'HGNC:7448', 'HGNC:10483', 'HGNC:17098', 'HGNC:1100', 'HGNC:1101', 'HGNC:10585', 'HGNC:10588', 'HGNC:10590', 'HGNC:10596', 'HGNC:10586', 'HGNC:6547', 'HGNC:795', 'HGNC:26144', 'HGNC:175', 'HGNC:3349', 'HGNC:583', 'HGNC:7127', 'HGNC:7325', 'HGNC:7329', 'HGNC:9122', 'HGNC:10294', 'HGNC:4065', 'HGNC:11621', 'HGNC:5024', 'HGNC:4195', 'HGNC:10471', 'HGNC:8582', 'HGNC:9588', 'HGNC:6742', 'HGNC:11634', 'HGNC:11079', 'HGNC:11411', 'HGNC:3811', 'HGNC:6990', 'HGNC:12496', 'HGNC:12765', 'HGNC:186', 'HGNC:17642', 'HGNC:6024', 'HGNC:6193', 'HGNC:9831', 'HGNC:9832', 'HGNC:6010', 'HGNC:11998', 'HGNC:12687']#
List of genes with custom PVS1 criteria.
- class src.seqvar.auto_pvs1.SeqVarPVS1Helper[source]#
Bases:
AutoACMGHelper
Helper methods for PVS1 criteria for sequence variants.
- _calc_alt_reg(var_pos: int, exons: List[Exon], strand: GenomicStrand) Tuple[int, int][source]#
Calculates the altered region’s start and end positions.
This method calculates the start and end positions of the altered region based on the position of the variant in the coding sequence and the exons of the gene. The method is implemented as follows:
- If the genomic strand is plus, the start position is the variant position, and the end position is the last exon’s end position.
- If the genomic strand is minus, the start position is the first exon’s start position, and the end position is the variant position.
- Parameters:
var_pos – The position of the variant in the coding sequence.
exons – A list of exons of the gene where the variant occurs.
strand – The genomic strand of the gene.
- Returns:
The start and end positions of the altered region.
- Return type:
Tuple[int, int]
- _closest_alt_start_cdn(cds_info: Dict[str, CdsInfo], hgvs: str) int | None[source]#
Calculate the closest potential start codon.
The method calculates the closest potential start codon based on the position of the variant in the coding sequence and the CDS information of the gene.
- Parameters:
cds_info – A dictionary containing the CDS information for all transcripts.
hgvs – The main transcript ID.
- Returns:
The position of the closest potential start codon, or None if not found.
- Return type:
Optional[int]
- _count_lof_vars(seqvar: SeqVar, start_pos: int, end_pos: int) Tuple[int, int][source]#
Counts Loss-of-Function (LoF) variants in the specified range.
The method retrieves variants from the specified range and iterates through the available data of each variant. The method counts the number of LoF variants and the number of frequent LoF variants in the specified range, based on the gnomAD genomes data (for the consequence of Nonsense and Frameshift variants) and the allele frequency (for the frequency of the LoF variants in the general population).
Note
A LoF variant is considered frequent if its occurrence in the general population exceeds some threshold. We use a threshold of 0.1% to determine if the LoF variant is frequent.
- Parameters:
seqvar – The sequence variant being analyzed.
start_pos – The start position of the range.
end_pos – The end position of the range.
- Returns:
The number of frequent LoF variants and the total number of LoF variants.
- Return type:
Tuple[int, int]
- _count_pathogenic_vars(seqvar: SeqVar, start_pos: int, end_pos: int) Tuple[int, int][source]#
Counts pathogenic variants in the specified range.
The method retrieves variants from the specified range and iterates through the ClinVar data of each variant to count the number of pathogenic variants and the total number of variants.
- Parameters:
seqvar – The sequence variant being analyzed.
start_pos – The start position of the range.
end_pos – The end position of the range.
- Returns:
The number of pathogenic variants and the total number of variants.
- Return type:
Tuple[int, int]
- Raises:
InvalidAPIResposeError – If the API response is invalid or cannot be processed.
- _find_aff_exon_pos(var_pos: int, exons: List[Exon]) Tuple[int, int][source]#
Find start and end positions of the affected exon.
- Parameters:
var_pos – The position of the variant in the coding sequence.
exons – A list of exons of the gene where the variant occurs.
- Returns:
The start and end positions of the affected exon.
- Return type:
Tuple[int, int]
- Raises:
AlgorithmError – If the affected exon is not found.
- _get_conseq(val: SeqVarPVS1Consequence) List[str][source]#
Get the VEP consequence of the sequence variant by value.
- Parameters:
val – The value of the consequence.
- Returns:
The VEP consequences of the sequence variant.
- Return type:
List[str]
- _skipping_exon_pos(seqvar: SeqVar, exons: List[Exon]) Tuple[int, int][source]#
Find the start and end positions of the exon closest to the sequence variant.
The method locates the exon that may be skipped as a consequence of the variant and returns its boundaries.
- Parameters:
seqvar – The sequence variant being analyzed.
exons – A list of exons of the gene where the variant occurs.
- Returns:
The start and end positions of the exon skipping region.
- Return type:
Tuple[int, int]
- alt_start_cdn(cds_info: Dict[str, CdsInfo], hgvs: str) bool[source]#
Check if the variant introduces an alternative start codon in other transcripts.
Implementation of the rule:
- Iterate through all transcripts and check whether the coding sequence start differs from that of the main transcript.
- If the start codon differs, the rule is met.
Note
Rule: If the variant introduces an alternative start codon in other transcripts, it is considered to be non-pathogenic.
- Parameters:
hgvs – The main transcript ID.
cds_info – A dictionary containing the CDS information for all transcripts.
- Returns:
True if the variant introduces an alternative start codon in other transcripts, False otherwise.
- Return type:
bool
- annonars_client#
Annonars client for the API.
- comment_pvs1: str#
Comment to store the prediction explanation.
- config: Config#
Configuration settings.
- crit4prot_func(seqvar: SeqVar, exons: List[Exon], strand: GenomicStrand) bool[source]#
Checks if the truncated or altered region is critical for the protein function.
This method assesses the impact of a sequence variant based on the presence of pathogenic variants downstream of the new stop codon, utilizing both experimental and clinical evidence.
Implementation of the rule:
- Calculate the range of the altered region, based on the position of the variant in the coding sequence and the exons of the gene.
- Fetch variants from the specified range of the altered region.
- Count the number of pathogenic variants in that region by iterating through the ClinVar data of each variant.
- Consider the region critical if the frequency of pathogenic variants exceeds 5%.
Note: The significance of a truncated or altered region is determined by the presence and frequency of pathogenic variants downstream from the new stop codon. We use a threshold of 5% to determine if the region is critical for the protein function.
- Parameters:
seqvar – The sequence variant being analyzed.
cds_pos – The position of the variant in the coding sequence.
exons – A list of exons of the gene where the variant occurs.
strand – The genomic strand of the gene.
- Returns:
True if the altered region is critical for the protein function, otherwise False.
- Return type:
bool
- Raises:
InvalidAPIResponseError – If the API response is invalid or cannot be processed.
- exon_skip_or_cryptic_ss_disrupt(seqvar: SeqVar, exons: List[Exon], consequences: List[str], strand: GenomicStrand) bool[source]#
Check if the variant causes exon skipping or cryptic splice site disruption.
The method checks if the variant causes exon skipping or cryptic splice site disruption based on the position of the variant in the coding sequence and the exons of the gene.
Implementation of the rule:
- If the exon length is not a multiple of 3, the variant is predicted to cause exon skipping.
- If the variant is a splice acceptor or donor variant, the method predicts cryptic splice site disruption.
Note
Rule: If the variant causes exon skipping or cryptic splice site disruption, it is considered to be pathogenic.
- Parameters:
seqvar – The sequence variant being analyzed.
exons – A list of exons of the gene where the variant occurs.
consequences – A list of VEP consequences of the sequence variant.
- Returns:
True if the variant causes exon skipping or cryptic splice site disruption, False if it preserves the reading frame.
- Return type:
bool
- in_bio_relevant_tx(transcript_tags: List[str]) bool[source]#
Checks if the exon with SeqVar is in a biologically relevant transcript.
- Implementation of the rule:
If the variant is located in a transcript with a MANE Select tag, it is considered to be in a biologically relevant transcript.
- Parameters:
transcript_tags – A list of tags for the transcript.
- Returns:
True if the variant is in a biologically relevant transcript, False otherwise.
- Return type:
bool
- lof_freq_in_pop(seqvar: SeqVar, exons: List[Exon], strand: GenomicStrand) bool[source]#
Checks if the Loss-of-Function (LoF) variants in the exon are frequent in the general population.
This function determines the frequency of LoF variants within a specified genomic region and evaluates whether this frequency exceeds a defined threshold indicative of common occurrence in the general population.
Implementation of the rule:
- Calculate the range of the altered region (coding sequence of the transcript).
- Count the number of LoF variants and frequent LoF variants in that region.
- Consider the LoF variants frequent in the general population if the share of “frequent” LoF variants among all LoF variants exceeds 10%.
Note: Two thresholds are involved. An individual LoF variant counts as frequent if its allele frequency in the general population exceeds 0.1% (see _count_lof_vars). The 10% threshold applies to the share of such frequent LoF variants in the region.
- Parameters:
seqvar – The sequence variant being analyzed.
exons – A list of exons of the gene where the variant occurs.
strand – The genomic strand of the gene.
- Returns:
True if the LoF variant frequency is greater than 10%, False otherwise.
- Return type:
bool
- Raises:
InvalidAPIResponseError – If the API response is invalid or cannot be processed.
- lof_rm_gt_10pct_of_prot(prot_pos: int, prot_length: int) bool[source]#
Check if the LoF variant removes more than 10% of the protein.
The method checks if the LoF variant removes more than 10% of the protein based on the position of the variant in the protein and the length of the protein.
Note
- Rule:
A LoF variant is considered to remove more than 10% of the protein if the variant removes more than 10% of the protein.
- Parameters:
prot_pos – The position of the variant in the protein.
prot_length – The length of the protein.
- Returns:
True if the LoF variant removes more than 10% of the protein, False otherwise.
- Return type:
bool
- undergo_nmd(var_pos: int, hgnc_id: str, strand: GenomicStrand, exons: List[Exon]) bool[source]#
Classifies if the variant undergoes Nonsense-mediated decay (NMD).
Implementation of the rule:
- If the variant is in the GJB2 gene, it is always predicted to undergo NMD.
- Otherwise, the method checks whether the new stop codon position, including the 5’ UTR length, is less than or equal to the NMD cutoff. If it is, the variant is predicted to undergo NMD; otherwise, it is predicted to escape NMD.
Note
Rule: If the variant is located in the last exon or in the last 50 nucleotides of the penultimate exon, it is NOT predicted to undergo NMD.
Important: For the GJB2 gene (HGNC:4284), the variant is always predicted to undergo NMD.
- Parameters:
var_pos – The position of the new stop codon, including the 5’ UTR length.
hgnc_id – The HGNC ID of the gene.
strand – The genomic strand of the gene.
exons – A list of exons of the gene.
- Returns:
True if the variant undergoes NMD, False if variant escapes NMD.
- Return type:
bool
- up_pathogenic_vars(seqvar: SeqVar, exons: List[Exon], strand: GenomicStrand) bool[source]#
Look for pathogenic variants upstream of the closest potential in-frame start codon.
The method checks for pathogenic variants upstream of the closest potential in-frame start codon. The method is implemented as follows:
- Find the closest potential in-frame start codon.
- Fetch and count pathogenic variants in the specified range.
- Return True if pathogenic variants are found, otherwise False.
- Parameters:
seqvar – The sequence variant being analyzed.
cds_pos – The position of the variant in the coding sequence.
exons – A list of exons of the gene where the variant occurs.
strand – The genomic strand of the gene.
- Returns:
True if pathogenic variants are found upstream of the closest potential in-frame start codon, False otherwise.
- Return type:
bool
PS1 and PM5
Implementation of PS1 and PM5 prediction for sequence variants.
- class src.seqvar.auto_ps1_pm5.AutoPS1PM5[source]#
Bases:
AutoACMGHelper
Class for PS1 and PM5 prediction.
- _affect_splicing(var_data: AutoACMGSeqVarData) bool[source]#
Check if the variant affects splicing. If any of the SpliceAI scores is above the threshold, the variant is considered to affect splicing.
- Parameters:
var_data – The variant information.
- Returns:
True if the variant affects splicing, False otherwise.
- Return type:
bool
- _get_var_info(seqvar: SeqVar) AnnonarsVariantResponse | None[source]#
Get variant information from Annonars.
- Parameters:
seqvar – The sequence variant.
- Returns:
Annonars response if the variant is found, None otherwise.
- Return type:
Optional[AnnonarsVariantResponse]
- _is_missense(var_data: AutoACMGSeqVarData) bool[source]#
Check if the variant’s consequence is missense.
- Parameters:
var_data – The variant information.
- Returns:
True if the variant is a missense variant, False otherwise.
- Return type:
bool
- _is_pathogenic(variant_info: VariantResult) bool[source]#
Check if the variant is pathogenic based on ClinVar data.
- Parameters:
variant_info – Annonars variant information
- Returns:
True if the variant is pathogenic
- Return type:
bool
- _is_splice_affecting(var_data: AutoACMGSeqVarData) bool[source]#
Check if the variant is a splice-affecting variant.
- Parameters:
var_data – The variant information.
- Returns:
True if the variant is a splice-affecting variant, False otherwise.
- Return type:
bool
- _parse_HGVSp(pHGVSp: str) AminoAcid | None[source]#
Parse the pHGVSp from VEP into its components.
- Parameters:
pHGVSp – The protein change in HGVS format.
- Returns:
The amino acid change if the pHGVSp is valid, None otherwise.
- Return type:
Optional[AminoAcid]
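Parsing a protein-level HGVS string with the pattern listed for REGEX_HGVSP in this module can be sketched as below. The example returns a plain tuple of strings and an integer instead of the `AminoAcid` type, whose definition is not shown here, and the use of `search` rather than `match` is an assumption.

```python
import re
from typing import Optional, Tuple

# Same pattern as documented for REGEX_HGVSP.
REGEX_HGVSP = re.compile(r"p\.(\D+)(\d+)(\D+)")


def parse_hgvsp(p_hgvsp: str) -> Optional[Tuple[str, int, str]]:
    """Split a protein change such as 'p.Arg97Gly' into (ref, position, alt)."""
    match = REGEX_HGVSP.search(p_hgvsp)
    if match is None:
        return None
    ref_aa, position, alt_aa = match.groups()
    return ref_aa, int(position), alt_aa
```

Strings without a residue position, such as `p.=`, do not match the pattern and yield None.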
- annonars_client#
Annonars client for the API.
- comment_ps1pm5: str#
Comment to store the prediction explanation.
- config: Config#
Configuration settings.
- predict_ps1pm5(seqvar: SeqVar, var_data: AutoACMGSeqVarData) Tuple[AutoACMGCriteria, AutoACMGCriteria][source]#
Predict PS1 and PM5 criteria.
- prediction_ps1pm5: PS1PM5 | None#
Prediction result.
- verify_ps1pm5(seqvar: SeqVar, var_data: AutoACMGSeqVarData) Tuple[PS1PM5 | None, str][source]#
Predicts the criteria PS1 and PM5 for the provided sequence variant.
- Implementation of the rule PS1 and PM5:
The method implements the rule by:
- Getting the primary variant information and parsing the primary amino acid change.
- Iterating over all possible alternative bases and getting the alternative variant information.
- Parsing the alternative amino acid change and checking if the alternative variant is pathogenic.
- If the alternative variant is pathogenic and the amino acid change is the same as the primary variant, then PS1 is set to True.
- If the alternative variant is pathogenic and the amino acid change is different from the primary variant, then PM5 is set to True.
Note
Rules:
- PS1: Same amino acid change as a previously established pathogenic variant regardless of nucleotide change.
- PM5: Novel missense change at an amino acid residue where a different missense change determined to be pathogenic has been seen before.
- Returns:
Prediction result and the comment with the explanation.
- Return type:
Tuple[Optional[PS1PM5], str]
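The decision logic of the PS1 and PM5 rules can be illustrated in isolation from the Annonars lookups. In this sketch the alternative variants at the same position are assumed to be already resolved into hypothetical `(amino acid, is_pathogenic)` pairs; the documented method obtains them by querying each alternative base.

```python
from typing import Iterable, Tuple


def classify_ps1_pm5(
    primary_alt_aa: str, alternatives: Iterable[Tuple[str, bool]]
) -> Tuple[bool, bool]:
    """Return (PS1, PM5) given the primary amino acid change and the alternatives."""
    ps1 = False
    pm5 = False
    for alt_aa, pathogenic in alternatives:
        if not pathogenic:
            continue
        if alt_aa == primary_alt_aa:
            ps1 = True  # same amino acid change, already known pathogenic
        else:
            pm5 = True  # different pathogenic change at the same residue
    return ps1, pm5
```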
- src.seqvar.auto_ps1_pm5.DNA_BASES = ['A', 'C', 'G', 'T']#
DNA bases
- src.seqvar.auto_ps1_pm5.REGEX_HGVSP = re.compile('p\\.(\\D+)(\\d+)(\\D+)')#
Regular expression for parsing pHGVSp
PM1
Implementation of PM1 criteria.
- class src.seqvar.auto_pm1.AutoPM1[source]#
Bases:
AutoACMGHelper
Class for PM1 prediction.
- _count_vars(seqvar: SeqVar, start_pos: int, end_pos: int) Tuple[int, int][source]#
Counts pathogenic and benign variants in the specified range.
The method retrieves variants from the specified range and iterates through the ClinVar data of each variant to count the number of pathogenic and benign SNVs.
Note
The method considers “Pathogenic” and “Likely pathogenic” as pathogenic and “Benign” and “Likely benign” as benign.
- Parameters:
seqvar – The sequence variant being analyzed.
start_pos – The start position of the range.
end_pos – The end position of the range.
- Returns:
The number of pathogenic and benign variants.
- Return type:
Tuple[int, int]
- Raises:
AlgorithmError – If end position is less than the start position.
InvalidAPIResposeError – If the API response is invalid or cannot be processed.
- _get_affected_exon(var_data: AutoACMGSeqVarData, seqvar: SeqVar) int[source]#
Get the affected exon number for the variant.
Go through all exons and count them before the variant position. The method also considers the strand of the gene.
- Parameters:
var_data – AutoACMGData object
seqvar – SeqVar object
- Returns:
Affected exon number
- Return type:
int
- Raises:
AlgorithmError – If the strand is invalid.
- _get_uniprot_domain(seqvar: SeqVar) Tuple[int, int] | None[source]#
Retrieve the UniProt domain for the variant and return the start and end positions if found or None otherwise.
- Parameters:
seqvar – The sequence variant being analyzed.
- Returns:
The start and end positions of the UniProt domain if found, None otherwise.
- Return type:
Optional[Tuple[int, int]]
- Raises:
AlgorithmError – If tabix fails to query the UniProt file.
- annonars_client#
Annonars client for the API.
- comment_pm1: str#
Comment to store the prediction explanation.
- config: Config#
Configuration settings.
- predict_pm1(seqvar: SeqVar, var_data: AutoACMGSeqVarData) AutoACMGCriteria[source]#
Predict PM1 criteria.
- prediction_pm1: PM1 | None#
Prediction result.
- verify_pm1(seqvar: SeqVar, var_data: AutoACMGSeqVarData) Tuple[PM1 | None, str][source]#
Predict PM1 criteria.
The method verifies the PM1 criteria for the variant in the following steps:
- Check if the variant is in the mitochondrial genome. If so, return PM1 as not met.
- Otherwise, count the number of pathogenic and benign variants in the range of 25 bases before and after the variant position. If the number of pathogenic variants is greater than or equal to the threshold, return PM1 as met.
- Otherwise, retrieve the UniProt domain for the variant and count the number of pathogenic and benign variants in the domain. If the number of pathogenic variants is greater than or equal to 1/4 of the domain length, return PM1 as met.
- Otherwise, return PM1 as not met.
- Parameters:
seqvar – The sequence variant being analyzed.
var_data – AutoACMGData object
- Returns:
The prediction result and the explanation
- Return type:
Tuple[Optional[PM1], str]
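The decision flow described for `verify_pm1` can be sketched as a pure function. This is a hedged illustration only: `pm1_decision` and all of its parameters are hypothetical, the window threshold is passed in because its value is not stated here, and computing the domain length as `end - start + 1` is an assumption.

```python
from typing import Optional, Tuple


def pm1_decision(
    is_mitochondrial: bool,
    pathogenic_in_window: int,
    window_threshold: int,
    domain: Optional[Tuple[int, int]],
    pathogenic_in_domain: int,
) -> bool:
    """Return True if PM1 would be met under the described rules."""
    if is_mitochondrial:
        return False
    # Step 1: pathogenic variants within 25 bases on either side of the variant.
    if pathogenic_in_window >= window_threshold:
        return True
    # Step 2: fall back to the UniProt domain, if one was found.
    if domain is not None:
        start, end = domain
        domain_length = end - start + 1  # assumed inclusive coordinates
        if pathogenic_in_domain >= domain_length / 4:
            return True
    return False
```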
PM2, BA1, BS1, BS2#
Implementation of BA1, BS1, BS2, PM2 prediction for sequence variants.
- class src.seqvar.auto_pm2_ba1_bs1_bs2.AutoPM2BA1BS1BS2[source]#
Bases:
AutoACMGHelper
Class for PM2, BA1, BS1, BS2 prediction.
- _ba1_exception(seqvar: SeqVar) bool[source]#
Check the exception for BA1 criteria, specified by VCEP modification. If the variant is in the exception list, return True.
- Parameters:
seqvar – The sequence variant.
- Returns:
True if the variant is in the exception list.
- Return type:
bool
- _bs2_not_applicable(var_data: AutoACMGSeqVarData) bool[source]#
Check if the BS2 criteria is not applicable.
By default, the BS2 criteria is applicable. Only some specific VCEP modifications can exclude the BS2 criteria.
- Parameters:
var_data – The variant data.
- Returns:
True if the BS2 criteria is not applicable.
- Return type:
bool
- _check_zyg(seqvar: SeqVar, var_data: AutoACMGSeqVarData) bool[source]#
Check the zygosity of the sequence variant.
If the variant is mitochondrial, it is not considered for BS2 criteria. Otherwise, parse the allele condition and check the zygosity.
If the variant is on the X chromosome, check the allele counts for XX and XY as follows:
- for dominant: XX allele count - 2 * XX nhomalt + XY allele count > 2
- for recessive: XX nhomalt + XY nhomalt > 2
- for dominant/recessive: XX allele count - 2 * XX nhomalt + XY allele count > 2 and XX nhomalt + XY nhomalt > 2
If the variant is on an autosomal chromosome, check the allele count as follows:
- for dominant: allele count - 2 * nhomalt > 5
- for recessive: nhomalt > 5
- for dominant/recessive: allele count - 2 * nhomalt > 5 and nhomalt > 5
Return True if the condition is met for a recessive (homozygous), dominant (heterozygous), or X-linked (hemizygous) disorder.
- Parameters:
seqvar – The sequence variant.
var_data – The variant data.
- Returns:
True if the zygosity condition for a recessive (homozygous), dominant (heterozygous), or X-linked (hemizygous) disorder is met.
- Return type:
bool
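The count comparisons listed for `_check_zyg` can be written out as a small sketch. The function names, the plain-string allele conditions and the keyword defaults are hypothetical; only the inequalities are taken from the description above.

```python
def _combine(condition: str, dominant_ok: bool, recessive_ok: bool) -> bool:
    """Pick the check that matches the allele condition (hypothetical strings)."""
    if condition == "dominant":
        return dominant_ok
    if condition == "recessive":
        return recessive_ok
    if condition == "dominant/recessive":
        return dominant_ok and recessive_ok
    raise ValueError(f"unknown allele condition: {condition!r}")


def check_autosomal(condition: str, ac: int, nhomalt: int, min_count: int = 5) -> bool:
    # Heterozygous carriers: allele count minus two alleles per homozygote.
    dominant_ok = ac - 2 * nhomalt > min_count
    recessive_ok = nhomalt > min_count
    return _combine(condition, dominant_ok, recessive_ok)


def check_x_linked(
    condition: str,
    xx_ac: int,
    xx_nhomalt: int,
    xy_ac: int,
    xy_nhomalt: int,
    min_count: int = 2,
) -> bool:
    dominant_ok = xx_ac - 2 * xx_nhomalt + xy_ac > min_count
    recessive_ok = xx_nhomalt + xy_nhomalt > min_count
    return _combine(condition, dominant_ok, recessive_ok)
```

For example, an autosomal variant with an allele count of 10 and 2 homozygotes has 6 heterozygous alleles, so the dominant check passes while the recessive check does not.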
- _get_af(seqvar: SeqVar, var_data: AutoACMGSeqVarData) float | None[source]#
Get the allele frequency for the sequence variant.
- Parameters:
seqvar – The sequence variant.
var_data – The variant data.
- Returns:
The allele frequency. None if no controls data is available.
- Return type:
Optional[float]
- _get_allele_cond(seqvar: SeqVar) AlleleCondition[source]#
Get the allele condition for the sequence variant.
Get the ClinGen dosage for the gene from the gene transcript data (mehari). If the ClinGen dosage is unknown, try the DECIPHER and DOMINO scores. Compare the scores to specific thresholds to determine the allele condition.
- Parameters:
seqvar – The sequence variant.
- Returns:
The allele condition.
- Return type:
AlleleCondition
- _get_any_af(var_data: AutoACMGSeqVarData) AlleleCount | None[source]#
Get the highest allele frequency information for any population. The control group has priority.
- Parameters:
var_data – The variant data.
- Returns:
The highest allele frequency for any population. None if no data found.
- Return type:
Optional[AlleleCount]
- _get_control_af(var_data: AutoACMGSeqVarData) AlleleCount | None[source]#
Get the allele frequency information for the control population.
- Parameters:
var_data – The variant data.
- Returns:
The allele frequency for the control population. None if no data found.
- Return type:
Optional[AlleleCount]
- _get_m_af(var_data: AutoACMGSeqVarData) float | None[source]#
Get the allele frequency for the mitochondrial sequence variant.
- Parameters:
var_data – The variant data.
- Returns:
The allele frequency. None if no controls data is available.
- Return type:
Optional[float]
- annonars_client#
Annonars client for the API.
- comment_pm2ba1bs1bs2: str#
Comment to store the prediction explanation.
- config: Config#
Configuration settings.
- predict_pm2ba1bs1bs2(seqvar: SeqVar, var_data: AutoACMGSeqVarData) Tuple[AutoACMGCriteria, AutoACMGCriteria, AutoACMGCriteria, AutoACMGCriteria][source]#
Predict PM2, BA1, BS1, BS2 criteria.
- prediction_pm2ba1bs1bs2: PM2BA1BS1BS2 | None#
Prediction result.
- verify_pm2ba1bs1bs2(seqvar: SeqVar, var_data: AutoACMGSeqVarData) Tuple[PM2BA1BS1BS2 | None, str][source]#
Predicts the PM2, BA1, BS1, BS2 criteria for the sequence variant.
Predict criteria by checking the allele frequency data and comparing it to the thresholds. Assign PM2, BA1 and BS1 criteria based on the allele frequency data. For BS2 criteria, check zygosity of the variant.
Note
Rules:
PM2: Absent from controls allele frequency data.
BA1: Allele frequency is >5%.
BS1: Allele frequency is between 1% and 5%.
BS2: Observed in a healthy adult individual for a recessive (homozygous), dominant (heterozygous), or X-linked (hemizygous) disorder, with full penetrance expected at an early age.
- Parameters:
seqvar – The sequence variant.
var_data – The variant data.
- Returns:
The prediction result and the explanation.
- Return type:
Tuple[Optional[PM2BA1BS1BS2], str]
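The frequency part of these rules can be sketched as a simple classifier. This is an illustration under stated assumptions, not the module's logic: `classify_allele_frequency` is a hypothetical name, treating a missing or zero frequency as "absent" is an assumption, and so is the handling of the exact 1% and 5% boundaries. BS2 is left out because it depends on the zygosity check rather than on the frequency alone.

```python
from typing import Dict, Optional


def classify_allele_frequency(af: Optional[float]) -> Dict[str, bool]:
    """Map an allele frequency (as a fraction) onto PM2, BA1 and BS1."""
    result = {"PM2": False, "BA1": False, "BS1": False}
    if af is None or af == 0.0:
        # Assumed meaning of "absent from controls".
        result["PM2"] = True
    elif af > 0.05:
        result["BA1"] = True
    elif af >= 0.01:
        # Boundary handling at exactly 1% and 5% is an assumption.
        result["BS1"] = True
    return result
```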
PM4 and BP3#
Implementation of PM4 and BP3 rules for sequence variants.
- class src.seqvar.auto_pm4_bp3.AutoPM4BP3[source]#
Bases:
AutoACMGHelper
Class for PM4 and BP3 prediction.
- _bp3_not_applicable(seqvar: SeqVar, var_data: AutoACMGSeqVarData) bool[source]#
Check if BP3 is not applicable for the variant.
- Parameters:
seqvar – Sequence variant.
var_data – The variant information.
- Returns:
True if BP3 is not applicable, False otherwise.
- Return type:
bool
- _in_repeat_region(seqvar: SeqVar) bool[source]#
Check if the variant is in a repeat region using the RepeatMasker track.
- Parameters:
seqvar – Sequence variant.
- Returns:
True if the variant is in a repeat region, False otherwise.
- Return type:
bool
- Raises:
AlgorithmError – If tabix fails to query the RepeatMasker track.
- _is_stop_loss(var_data: AutoACMGSeqVarData) bool[source]#
Check if the variant’s consequence is a stop-loss.
- Parameters:
var_data – The variant information.
- Returns:
True if the variant is a stop-loss variant, False otherwise.
- Return type:
bool
- annonars_client#
Annonars client for the API.
- comment_pm4bp3: str#
Comment to store the prediction explanation.
- config: Config#
Configuration settings.
- is_inframe_delins(var_data: AutoACMGSeqVarData) bool[source]#
Check if the variant’s consequence is an in-frame deletion/insertion.
- Parameters:
var_data – The variant information.
- Returns:
True if the variant is an in-frame deletion/insertion, False otherwise.
- Return type:
bool
- predict_pm4bp3(seqvar: SeqVar, var_data: AutoACMGSeqVarData) Tuple[AutoACMGCriteria, AutoACMGCriteria][source]#
Predict PM4 and BP3 criteria.
- prediction_pm4bp3: PM4BP3 | None#
Prediction result.
- verify_pm4bp3(seqvar: SeqVar, var_data: AutoACMGSeqVarData) Tuple[PM4BP3 | None, str][source]#
Predicts PM4 and BP3 criteria for the provided sequence variant.
Implementation of the rule:
- If the variant is a stop-loss variant, PM4 is True and BP3 is False.
- If the variant is an in-frame deletion/insertion:
  - If the variant is not in a repeat region, PM4 is True and BP3 is False.
  - If the variant is in a repeat region, PM4 is False and BP3 is True.
- Otherwise, PM4 and BP3 are False.
Note
Rules:
PM4: Protein length changes due to in-frame deletions/insertions in a non-repeat region or stop-loss variants.
BP3: In-frame deletions/insertions in a repetitive region without a known function.
- Parameters:
seqvar – Sequence variant.
var_data – The variant information.
- Returns:
Prediction result and comment.
- Return type:
Tuple[Optional[PM4BP3], str]
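The rule above reduces to a short decision function. The sketch below is illustrative: `pm4_bp3` is a hypothetical name, and its three boolean inputs stand for the results of `_is_stop_loss`, `is_inframe_delins` and `_in_repeat_region`.

```python
from typing import Tuple


def pm4_bp3(is_stop_loss: bool, is_inframe_delins: bool, in_repeat_region: bool) -> Tuple[bool, bool]:
    """Return (PM4, BP3) for the given variant properties."""
    if is_stop_loss:
        return True, False
    if is_inframe_delins:
        if in_repeat_region:
            return False, True
        return True, False
    return False, False
```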
PP2 and BP1#
Implementation of PP2 and BP1 criteria.
- class src.seqvar.auto_pp2_bp1.AutoPP2BP1[source]#
Bases:
AutoACMGHelper
Class for PP2 and BP1 prediction.
- _get_missense_vars(seqvar: SeqVar, start_pos: int, end_pos: int) Tuple[int, int, int][source]#
Counts pathogenic, benign, and total missense variants in the specified range.
The method retrieves variants from the specified range and iterates through the ClinVar data of each variant to count the number of pathogenic variants, benign variants, and the total number of missense variants.
- Parameters:
seqvar – The sequence variant being analyzed.
start_pos – The start position of the range.
end_pos – The end position of the range.
- Returns:
The number of pathogenic variants, benign variants, and the total number of missense variants.
- Return type:
Tuple[int, int, int]
- Raises:
AlgorithmError – If end position is less than the start position.
InvalidAPIResposeError – If the API response is invalid or cannot be processed.
- _is_missense(var_data: AutoACMGSeqVarData) bool[source]#
Check if the variant is a missense variant.
- Parameters:
var_data – The variant information.
- Returns:
True if the variant is a missense variant, False otherwise.
- Return type:
bool
- annonars_client#
Annonars client for the API.
- comment_pp2bp1: str#
Comment to store the prediction explanation.
- config: Config#
Configuration settings.
- predict_pp2bp1(seqvar: SeqVar, var_data: AutoACMGSeqVarData) Tuple[AutoACMGCriteria, AutoACMGCriteria][source]#
Predict PP2 and BP1 criteria.
- prediction_pp2bp1: PP2BP1 | None#
Prediction result.
- verify_pp2bp1(seqvar: SeqVar, var_data: AutoACMGSeqVarData) Tuple[PP2BP1 | None, str][source]#
Predict PP2 and BP1 criteria.
The method verifies the PP2 and BP1 criteria for the provided sequence variant. It checks if the variant is a missense variant and assigns PP2 and BP1 based on the missense Z-score. If the Z-score is not available, the method counts the pathogenic and benign missense variants in the range of the variant and assigns PP2 and BP1 based on the ratio of pathogenic and benign variants.
- Parameters:
seqvar – The sequence variant being analyzed.
var_data – The variant information.
- Returns:
The prediction result and the explanation.
- Return type:
Tuple[Optional[PP2BP1], str]
PP3 and BP4#
Implementation of the PP3 and BP4 criteria.
- class src.seqvar.auto_pp3_bp4.AutoPP3BP4[source]#
Bases:
AutoACMGHelper
Class for PP3 and BP4 prediction.
- _affect_spliceAI(var_data: AutoACMGSeqVarData) bool[source]#
Predict splice site alterations using SpliceAI.
If any of the SpliceAI scores exceeds its specific threshold, the variant is considered to affect splicing.
- Parameters:
var_data – The data containing variant scores and thresholds.
- Returns:
True if the variant is a splice site alteration, False otherwise.
- Return type:
bool
- _is_benign_score(var_data: AutoACMGSeqVarData, *score_threshold_pairs: Tuple[str, float]) bool[source]#
Check if any of the specified scores meet their corresponding threshold.
- Parameters:
var_data – Variant data containing scores and thresholds.
score_threshold_pairs – Pairs of score attributes and their corresponding benign thresholds.
- Returns:
True if any of the specified scores meet their corresponding threshold, False otherwise.
- Return type:
bool
- _is_benign_splicing(var_data: AutoACMGSeqVarData) bool[source]#
Check if the variant is benign based on splicing scores.
Checks if the Ada and RF scores are less than the thresholds.
- Parameters:
var_data – The variant information.
- Returns:
True if the variant is benign, False otherwise.
- Return type:
bool
- Raises:
MissingDataError – If the Ada and RF scores are missing.
- _is_inframe_indel(var_data: AutoACMGSeqVarData) bool[source]#
Check if the variant’s consequence is an inframe indel.
- Parameters:
var_data – The variant information.
- Returns:
True if the variant is an inframe indel, False otherwise.
- Return type:
bool
- _is_intron_variant(var_data: AutoACMGSeqVarData) bool[source]#
Check if the variant’s consequence is an intron variant.
- Parameters:
var_data – The variant information.
- Returns:
True if the variant is an intron variant, False otherwise.
- Return type:
bool
- _is_missense_variant(var_data: AutoACMGSeqVarData) bool[source]#
Check if the variant’s consequence is a missense variant.
- Parameters:
var_data – The variant information.
- Returns:
True if the variant is a missense variant, False otherwise.
- Return type:
bool
- _is_pathogenic_score(var_data: AutoACMGSeqVarData, *score_threshold_pairs: Tuple[str, float]) bool[source]#
Check if any of the specified scores meet their corresponding threshold.
- Parameters:
var_data – Variant data containing scores and thresholds.
score_threshold_pairs – Pairs of score attributes and their corresponding pathogenic thresholds.
- Returns:
True if any of the specified scores meet their corresponding threshold, False otherwise.
- Return type:
bool
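The variadic `score_threshold_pairs` pattern can be illustrated with a minimal sketch. Everything here is hypothetical: the function name, the attribute names `score_a` and `score_b`, the use of `>=` as the meaning of "meets the threshold" for pathogenic scores, and the choice to skip missing scores.

```python
from types import SimpleNamespace
from typing import Any, Tuple


def any_score_meets_threshold(scores: Any, *score_threshold_pairs: Tuple[str, float]) -> bool:
    """Return True if any named score reaches its threshold."""
    for attribute, threshold in score_threshold_pairs:
        value = getattr(scores, attribute, None)
        # Missing scores are skipped rather than treated as failures (assumption).
        if value is not None and value >= threshold:
            return True
    return False


# Hypothetical score container with one missing value.
example_scores = SimpleNamespace(score_a=0.8, score_b=None)
```

A benign counterpart would presumably compare in the opposite direction, but the comparison operator used by `_is_benign_score` is not stated here.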
- _is_pathogenic_splicing(var_data: AutoACMGSeqVarData) bool[source]#
Check if the variant is pathogenic based on splicing scores.
Checks if the Ada and RF scores are greater than the thresholds.
- Parameters:
var_data – The variant information.
- Returns:
True if the variant is pathogenic, False otherwise.
- Return type:
bool
- Raises:
MissingDataError – If the Ada and RF scores are missing.
- _is_splice_variant(var_data: AutoACMGSeqVarData) bool[source]#
Check if the variant’s consequence is splice-related.
- Parameters:
var_data – The variant information.
- Returns:
True if the variant is a splice variant, False otherwise.
- Return type:
bool
- _is_synonymous_variant(var_data: AutoACMGSeqVarData) bool[source]#
Check if the variant’s consequence is a synonymous variant.
- Parameters:
var_data – The variant information.
- Returns:
True if the variant is a synonymous variant, False otherwise.
- Return type:
bool
- _is_utr_variant(var_data: AutoACMGSeqVarData) bool[source]#
Check if the variant’s consequence is a UTR variant.
- Parameters:
var_data – The variant information.
- Returns:
True if the variant is a UTR variant, False otherwise.
- Return type:
bool
- annonars_client#
Annonars client for the API.
- comment_pp3bp4: str#
Comment to store the prediction explanation.
- config: Config#
Configuration settings.
- predict_pp3bp4(seqvar: SeqVar, var_data: AutoACMGSeqVarData) Tuple[AutoACMGCriteria, AutoACMGCriteria][source]#
Predict PP3 and BP4 criteria.
- prediction_pp3bp4: PP3BP4 | None#
Prediction result.
- verify_pp3bp4(seqvar: SeqVar, var_data: AutoACMGSeqVarData) Tuple[PP3BP4 | None, str][source]#
Predict PP3 and BP4 criteria.
The method checks the variant’s pathogenicity based on the provided scores and thresholds. If the default strategy is used, it checks the pathogenic and benign scores against the thresholds. Otherwise, it checks the pathogenic and benign scores against the specified thresholds and then checks the splicing scores. If the variant is a splice site alteration or has a pathogenic score, it is considered pathogenic. If the variant doesn’t affect splicing and has a benign score, it is considered benign.
Note
The non-default assessment strategy is used for some VCEPs.
- Parameters:
seqvar – Sequence variant.
var_data – The variant information.
- Returns:
The prediction result and the comment.
- Return type:
Tuple[Optional[PP3BP4], str]
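The final combination step described for `verify_pp3bp4` can be sketched as below. `combine_pp3_bp4` and its boolean inputs are hypothetical; they stand for the outcomes of the score and splicing checks, and the sketch says nothing about how the default and non-default strategies compute those outcomes.

```python
from typing import Tuple


def combine_pp3_bp4(has_pathogenic_score: bool, has_benign_score: bool, affects_splicing: bool) -> Tuple[bool, bool]:
    """Return (PP3, BP4) from the individual check results."""
    # Pathogenic: splice site alteration or a pathogenic score.
    pp3 = affects_splicing or has_pathogenic_score
    # Benign: no splicing effect and a benign score.
    bp4 = (not affects_splicing) and has_benign_score
    return pp3, bp4
```

Read literally, the description lets both flags be True when a variant has a pathogenic and a benign score without a splicing effect; how the module resolves that case is not stated here.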
BP7#
Implementation of BP7 criteria.
- class src.seqvar.auto_bp7.AutoBP7[source]#
Bases:
AutoACMGHelper
Class for BP7 prediction.
- _affect_canonical_ss(seqvar: SeqVar, var_data: AutoACMGSeqVarData) bool[source]#
Predict if the variant affects a canonical splice site.
Check if the variant position is within +1/-2 of the start/end of an intron. Note that for the minus strand, the donor site is at the start of the intron and the acceptor site is at the end of the intron.
- Parameters:
seqvar – The sequence variant.
var_data – The variant data.
- Returns:
True if the variant affects a canonical splice site, False otherwise.
- Return type:
bool
- Raises:
MissingDataError – If the strand information is missing.
- _is_bp7_exception(seqvar: SeqVar, var_data: AutoACMGSeqVarData) bool[source]#
Helper function to check if the variant is an exception.
By default, there are no exceptions for BP7. This function can be overridden in specific VCEP implementations.
- Parameters:
seqvar – The sequence variant.
var_data – The variant data.
- Returns:
True if the variant is an exception, False otherwise.
- Return type:
bool
- _is_conserved(var_data: AutoACMGSeqVarData) bool[source]#
Predict if the variant is conserved.
Check if the variant is conserved using the phyloP100 score.
- Parameters:
var_data – The variant information.
- Returns:
True if the variant is conserved, False otherwise.
- Return type:
bool
- _is_intronic(var_data: AutoACMGSeqVarData) bool[source]#
Predict if the variant is intronic.
Check if the variant’s consequence is intronic.
- Parameters:
var_data – The variant information.
- Returns:
True if the variant is intronic, False otherwise.
- Return type:
bool
- _is_synonymous(var_data: AutoACMGSeqVarData) bool[source]#
Predict if the variant is synonymous.
Check if the variant’s consequence is synonymous.
- Parameters:
var_data – The variant information.
- Returns:
True if the variant is synonymous, False otherwise.
- Return type:
bool
- _spliceai_impact(var_data: AutoACMGSeqVarData) bool[source]#
Predict splice site alterations using SpliceAI.
If any of the SpliceAI scores exceeds its specific threshold, the variant is predicted to affect splicing.
- Parameters:
var_data – The data containing variant scores and thresholds.
- Returns:
True if the variant is a splice site alteration, False otherwise.
- Return type:
bool
- annonars_client#
Annonars client for the API.
- comment_bp7: str#
Comment to store the prediction explanation.
- config: Config#
Configuration settings.
- predict_bp7(seqvar: SeqVar, var_data: AutoACMGSeqVarData) AutoACMGCriteria[source]#
Predict BP7 criterion.
- prediction_bp7: BP7 | None#
Prediction result.
- verify_bp7(seqvar: SeqVar, var_data: AutoACMGSeqVarData) Tuple[BP7 | None, str][source]#
Predict BP7 criterion.
Predict if the variant meets the BP7 criterion by:
- Checking if the variant is in the mitochondrial genome. If so, BP7 is not met.
- Checking if the variant is either synonymous, or intronic without affecting a canonical splice site, and in both cases is not conserved and not predicted to affect splicing. If so, BP7 is met.
Note
Some VCEP implementations might have exceptions for BP7. In that case, the prediction avoids the default checks and returns that BP7 is not met.
- Parameters:
seqvar – The sequence variant.
var_data – The variant data.
- Returns:
The prediction and the comment.
- Return type:
Tuple[Optional[BP7], str]
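The BP7 checks can be condensed into one boolean function. This is a hedged sketch: `bp7_met` is a hypothetical name, and each parameter stands for the result of the corresponding helper (`_is_bp7_exception`, `_is_synonymous`, `_is_intronic`, `_affect_canonical_ss`, `_is_conserved`, `_spliceai_impact`).

```python
def bp7_met(
    is_mitochondrial: bool,
    is_exception: bool,
    is_synonymous: bool,
    is_intronic: bool,
    affects_canonical_ss: bool,
    is_conserved: bool,
    affects_splicing: bool,
) -> bool:
    """Return True if BP7 would be met under the described checks."""
    # VCEP exceptions and mitochondrial variants never meet BP7.
    if is_exception or is_mitochondrial:
        return False
    # Both qualifying variant types must be non-conserved without splicing impact.
    if is_conserved or affects_splicing:
        return False
    if is_synonymous:
        return True
    return is_intronic and not affects_canonical_ss
```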