Implementation of different ACMG criteria for structural variants
This section contains the internal building blocks for the implementation of the different ACMG criteria for structural variants.
PVS1
For structural variants:
PVS1 criteria for Structural Variants (StrucVar).
- class src.strucvar.auto_pvs1.AutoPVS1
Bases: StrucVarHelper
Handles the PVS1 criteria assessment for structural variants.
- _calc_cds(exons: List[Exon], strand: GenomicStrand, start_codon: int, stop_codon: int) → List[Exon]
Remove UTRs from exons.
- Parameters:
exons – List of exons for the gene.
strand – The genomic strand of the gene.
start_codon – Position of the start codon.
stop_codon – Position of the stop codon.
- Returns:
List of exons without UTRs.
- Return type:
List[Exon]
- Raises:
MissingDataError – If the genomic strand is not set.
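The UTR removal can be pictured as clipping each exon to the span between the start and stop codons. The following is a minimal sketch under stated assumptions, not the library's implementation: `SimpleExon` and `trim_utrs` are hypothetical stand-ins for `Exon` and `_calc_cds`, coordinates are assumed to be inclusive genomic positions, the strand is a plain string, and a `ValueError` stands in for `MissingDataError`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SimpleExon:
    """Hypothetical exon with inclusive genomic coordinates."""
    start: int
    end: int


def trim_utrs(exons: List[SimpleExon], strand: Optional[str],
              start_codon: int, stop_codon: int) -> List[SimpleExon]:
    """Clip exons to the coding region between the start and stop codons."""
    if strand not in ("+", "-"):
        raise ValueError("genomic strand is not set")
    # Assumption: on the minus strand the start codon may have the larger
    # genomic coordinate, so order the two positions before clipping.
    cds_lo, cds_hi = sorted((start_codon, stop_codon))
    cds_exons = []
    for exon in exons:
        lo, hi = max(exon.start, cds_lo), min(exon.end, cds_hi)
        if lo <= hi:  # keep only exons that overlap the coding region
            cds_exons.append(SimpleExon(lo, hi))
    return cds_exons


exons = [SimpleExon(100, 200), SimpleExon(300, 400), SimpleExon(500, 600)]
cds = trim_utrs(exons, "+", start_codon=150, stop_codon=550)
```

Here the first and last exons are shortened to the coding part, and the middle exon is kept unchanged.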
- _count_lof_vars(strucvar: StrucVar) → Tuple[int, int]
Counts Loss-of-Function (LoF) variants within the range of a structural variant.
The method retrieves the variants in the range defined by the structural variant’s start and stop positions and iterates through the data available for each one. It uses the gnomAD genomes data to identify LoF variants by consequence (nonsense and frameshift) and the allele frequency to decide whether a LoF variant is frequent in the general population, and it counts both the LoF variants and the frequent LoF variants.
Note
A LoF variant is considered frequent if its occurrence in the general population exceeds a threshold of 0.1%.
- Parameters:
strucvar – The structural variant being analyzed.
- Returns:
The number of frequent LoF variants and the total number of LoF variants.
- Return type:
Tuple[int, int]
- Raises:
AlgorithmError – If the end position is less than the start position.
InvalidAPIResposeError – If the API response is invalid or cannot be processed.
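The counting rule described above can be sketched as follows. This is an illustration only: `PopVariant`, `count_lof`, and the consequence labels are hypothetical, the variants are passed in directly rather than fetched from an API, and a `ValueError` stands in for `AlgorithmError`.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Hypothetical consequence labels; the real annotation vocabulary may differ.
LOF_CONSEQUENCES = {"nonsense", "frameshift"}
FREQUENT_AF = 0.001  # 0.1% allele frequency, per the note above


@dataclass(frozen=True)
class PopVariant:
    """Hypothetical record for one population variant."""
    pos: int
    consequence: str
    allele_freq: float


def count_lof(variants: Iterable[PopVariant], start: int, end: int) -> Tuple[int, int]:
    """Return (frequent LoF count, total LoF count) for variants in [start, end]."""
    if end < start:
        raise ValueError("end position is less than start position")
    frequent = total = 0
    for var in variants:
        if not start <= var.pos <= end:
            continue  # outside the structural variant
        if var.consequence not in LOF_CONSEQUENCES:
            continue  # not a LoF variant
        total += 1
        if var.allele_freq > FREQUENT_AF:
            frequent += 1
    return frequent, total


variants = [
    PopVariant(120, "nonsense", 0.0005),
    PopVariant(150, "frameshift", 0.02),
    PopVariant(170, "missense", 0.3),
    PopVariant(900, "nonsense", 0.5),  # outside the queried range
]
counts = count_lof(variants, start=100, end=200)
```

The return order mirrors the documented one: frequent LoF variants first, then all LoF variants.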
- _count_pathogenic_vars(strucvar: StrucVar) → Tuple[int, int]
Counts pathogenic variants in the range specified by the structural variant.
The method retrieves variants from the range defined by the structural variant’s start and stop positions and iterates through the ClinVar data of each variant to count the number of pathogenic variants and the total number of variants. The method considers a variant pathogenic if its classification is “Pathogenic” or “Likely pathogenic”.
- Parameters:
strucvar – The structural variant being analyzed.
- Returns:
The number of pathogenic variants and the total number of variants.
- Return type:
Tuple[int, int]
- Raises:
InvalidAPIResposeError – If the API response is invalid or cannot be processed.
- _minimal_deletion(strucvar: StrucVar, exons: List[Exon]) → bool
Check if the variant is a minimal deletion. A minimal deletion affects at least one full exon.
- Parameters:
strucvar – The structural variant.
exons – The exons of the gene.
- Returns:
True if the deletion affects at least one full exon, False otherwise.
- Return type:
bool
- Raises:
AlgorithmError – If the variant is not a deletion.
MissingDataError – If exons are not available.
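As a rough illustration of the "at least one full exon" test, a deletion covers an exon when it starts at or before the exon start and ends at or after the exon end. The sketch below assumes inclusive `(start, end)` tuples and uses the hypothetical name `covers_full_exon`; a `ValueError` stands in for `MissingDataError`, and the check that the variant is a deletion is omitted.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # hypothetical (start, end) pair, inclusive


def covers_full_exon(del_start: int, del_end: int, exons: List[Interval]) -> bool:
    """Return True if the deletion fully contains at least one exon."""
    if not exons:
        raise ValueError("exons are not available")
    return any(del_start <= start and end <= del_end for start, end in exons)


exons = [(100, 200), (300, 400)]
full = covers_full_exon(250, 450, exons)      # contains the second exon
partial = covers_full_exon(150, 350, exons)   # only clips both exons
```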
- annonars_client
Annonars client for the API.
- comment_pvs1: str
Comment to store the prediction explanation.
- config: Config
Configuration settings.
- crit4prot_func(strucvar: StrucVar) → bool
Check if the deletion is critical for protein function.
The method fetches the variants between the start and the end of the structural variant, then iterates through the ClinVar data of each variant to count the pathogenic variants in that region. The region is considered critical if the proportion of pathogenic variants exceeds 5%.
- Parameters:
strucvar – The structural variant being analyzed.
- Returns:
True if the deletion is critical for protein function, False otherwise.
- Return type:
bool
- Raises:
AlgorithmError – If the API response is invalid or cannot be processed.
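The 5% rule can be sketched on top of a simple pathogenic-variant count. The example assumes the ClinVar classifications are already available as plain strings; `count_pathogenic` and `region_is_critical` are hypothetical names, and treating a region without any variants as not critical is an assumption of this sketch, not something the documentation states.

```python
from typing import Sequence, Tuple

# Classifications counted as pathogenic, as described for _count_pathogenic_vars.
PATHOGENIC_LABELS = {"pathogenic", "likely pathogenic"}


def count_pathogenic(classifications: Sequence[str]) -> Tuple[int, int]:
    """Return (pathogenic count, total count) for the given classifications."""
    pathogenic = sum(
        1 for label in classifications
        if label.strip().lower() in PATHOGENIC_LABELS
    )
    return pathogenic, len(classifications)


def region_is_critical(classifications: Sequence[str], threshold: float = 0.05) -> bool:
    """Return True if the share of pathogenic variants exceeds the threshold."""
    pathogenic, total = count_pathogenic(classifications)
    if total == 0:
        return False  # assumption: no variants means no evidence of criticality
    return pathogenic / total > threshold


labels = ["Pathogenic", "Benign", "Likely pathogenic", "Uncertain significance"]
critical = region_is_critical(labels)
mostly_benign = region_is_critical(["Benign"] * 20 + ["Pathogenic"])
```

In the second call only 1 of 21 variants is pathogenic (about 4.8%), which stays below the 5% threshold.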
- del_disrupt_rf(strucvar: StrucVar, exons: List[Exon], strand: GenomicStrand) → bool
Check if the single or multiple exon deletion disrupts the reading frame.
Find the start and end positions of the alteration based on the affected exon(s). If both positions lie within the intron(s) adjacent to the affected exon(s), the deletion does not disrupt the reading frame. Otherwise, there are two cases:
  - The deletion starts within an exon: if the offset from the start of that exon to the start of the deletion is a multiple of 3, the deletion does not disrupt the reading frame.
  - The deletion stops within an exon: if the offset from the start of the last affected exon to the stop of the deletion is a multiple of 3, the deletion does not disrupt the reading frame.
- Parameters:
strucvar – The structural variant.
exons – The exons of the gene.
strand – The genomic strand of the variant.
- Returns:
True if the deletion disrupts the reading frame, False otherwise.
- Raises:
MissingDataError – If exons or strand are not available.
AlgorithmError – If fewer than one full exon is affected.
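The rule described above can be followed literally in a few lines. This sketch makes several simplifying assumptions: it handles the plus strand only, exons are inclusive `(start, end)` tuples sorted by position, the name `deletion_disrupts_frame` is hypothetical, and a `ValueError` is raised when no exon is touched instead of the `AlgorithmError` raised when no full exon is affected.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # hypothetical (start, end) pair, inclusive


def deletion_disrupts_frame(del_start: int, del_end: int, exons: List[Interval]) -> bool:
    """Apply the offset-modulo-3 rule to the exons touched by the deletion."""
    affected = [(s, e) for s, e in exons if s <= del_end and del_start <= e]
    if not affected:
        raise ValueError("no exon is affected by the deletion")
    first_start, _ = affected[0]
    last_start, last_end = affected[-1]
    # Case 1: the deletion starts inside the first affected exon.
    if del_start >= first_start and (del_start - first_start) % 3 != 0:
        return True
    # Case 2: the deletion stops inside the last affected exon.
    if del_end <= last_end and (del_end - last_start) % 3 != 0:
        return True
    # Both breakpoints are intronic, or every in-exon offset is a multiple of 3.
    return False


exons = [(100, 199), (300, 399)]
intronic = deletion_disrupts_frame(250, 450, exons)      # both breakpoints in introns
in_frame = deletion_disrupts_frame(103, 450, exons)      # start offset 3
out_of_frame = deletion_disrupts_frame(101, 450, exons)  # start offset 1
```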
- static dup_disrupt_rf() → bool
Check if the duplication disrupts the reading frame. NOT IMPLEMENTED!
- full_gene_del(strucvar: StrucVar, exons: List[Exon]) → bool
Check if the variant is a full gene deletion. The deletion affects the whole gene if the start position of the deletion is less than or equal to the start of the first exon and the stop position of the deletion is greater than or equal to the end of the last exon.
- Parameters:
strucvar – The structural variant.
exons – The exons of the gene.
- Returns:
True if the variant is a full gene deletion, False otherwise.
- Return type:
bool
- Raises:
MissingDataError – If exons are not available.
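The full gene deletion condition reduces to two comparisons against the outermost exons. A minimal sketch, assuming exons are inclusive `(start, end)` tuples sorted by genomic position; `is_full_gene_deletion` is a hypothetical name and a `ValueError` stands in for `MissingDataError`.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # hypothetical (start, end) pair, inclusive


def is_full_gene_deletion(del_start: int, del_end: int, exons: List[Interval]) -> bool:
    """Return True if the deletion spans from the first exon start to the last exon end."""
    if not exons:
        raise ValueError("exons are not available")
    first_start = exons[0][0]
    last_end = exons[-1][1]
    return del_start <= first_start and del_end >= last_end


exons = [(100, 200), (300, 400), (500, 600)]
whole_gene = is_full_gene_deletion(50, 700, exons)
exact_span = is_full_gene_deletion(100, 600, exons)
partial = is_full_gene_deletion(150, 700, exons)
```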
- in_bio_relevant_tsx(transcript_tags: List[str]) → bool
Check if the deletion is in a biologically relevant transcript. Check if the transcript has a MANE Select tag.
- Parameters:
transcript_tags – The tags of the transcript.
- Returns:
True if the deletion is in a biologically relevant transcript, False otherwise.
- Return type:
bool
- lof_freq_in_pop(strucvar: StrucVar) → bool
Checks if the Loss-of-Function (LoF) variants within the structural variant are frequent in the general population.
This function determines the frequency of LoF variants within the range specified by the structural variant and evaluates whether this frequency exceeds a defined threshold indicative of common occurrence in the general population.
Implementation of the rule:
  - Retrieve the number of LoF variants and the number of frequent LoF variants in the range defined by the structural variant.
  - Consider the LoF variants frequent in the general population if the share of “frequent” LoF variants among all LoF variants exceeds 10%.
Note: An individual LoF variant is considered frequent if its occurrence in the general population exceeds a per-variant threshold (0.1%, as documented for _count_lof_vars). The 10% threshold applies to the share of frequent LoF variants among all LoF variants in the range.
- Parameters:
strucvar – The structural variant being analyzed.
- Returns:
True if more than 10% of the LoF variants in the range are frequent, False otherwise.
- Return type:
bool
- Raises:
AlgorithmError – If the API response is invalid or cannot be processed.
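Given the two counts returned by `_count_lof_vars`, the 10% rule is a single ratio test. The sketch below uses the hypothetical name `lof_frequent_in_population`; returning False when there are no LoF variants at all is an assumption of this example, since the documentation does not say how that case is handled.

```python
def lof_frequent_in_population(frequent_lof: int, total_lof: int,
                               threshold: float = 0.10) -> bool:
    """Return True if the share of frequent LoF variants exceeds the threshold."""
    if total_lof == 0:
        return False  # assumption: no LoF variants means "not frequent"
    return frequent_lof / total_lof > threshold


common = lof_frequent_in_population(frequent_lof=3, total_lof=10)  # 30%
rare = lof_frequent_in_population(frequent_lof=1, total_lof=20)    # 5%
```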
- lof_rm_gt_10pct_of_prot(strucvar: StrucVar, exons: List[Exon], strand: GenomicStrand, start_codon: int, stop_codon: int) → bool
Determine if the deletion removes more than 10% of the protein-coding sequence.
First remove the UTRs from the exons. Then iterate through the CDS exons and calculate the total CDS length and the length of the deleted region. Return True if the deletion removes more than 10% of the protein-coding sequence, False otherwise.
- Parameters:
strucvar – The structural variant being analyzed.
exons – List of exons for the gene.
strand – The genomic strand of the gene.
start_codon – Position of the start codon.
stop_codon – Position of the stop codon.
- Returns:
True if the deletion removes more than 10% of the protein, False otherwise.
- Return type:
bool
- Raises:
AlgorithmError – If the total CDS length is zero.
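The length calculation can be sketched as an interval overlap summed over the CDS exons. The example assumes the UTRs have already been removed, that coordinates are inclusive `(start, end)` tuples, and that `deleted_cds_fraction` is a hypothetical helper; a `ValueError` stands in for `AlgorithmError`.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # hypothetical (start, end) pair, inclusive


def deleted_cds_fraction(del_start: int, del_end: int, cds_exons: List[Interval]) -> float:
    """Return the fraction of coding bases removed by the deletion."""
    total = deleted = 0
    for start, end in cds_exons:
        total += end - start + 1
        overlap = min(end, del_end) - max(start, del_start) + 1
        if overlap > 0:
            deleted += overlap
    if total == 0:
        raise ValueError("total CDS length is zero")
    return deleted / total


cds_exons = [(1, 100), (201, 300)]  # 200 coding bases in total
large = deleted_cds_fraction(51, 100, cds_exons) > 0.10  # 50 of 200 bases
small = deleted_cds_fraction(91, 100, cds_exons) > 0.10  # 10 of 200 bases
```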
- predict_pvs1(strucvar: StrucVar, var_data: AutoACMGStrucVarData) → AutoACMGCriteria
Predict the PVS1 criteria.
- static presumed_in_tandem() → bool
Check if the duplication is presumed in tandem. NOT IMPLEMENTED!
- static proven_in_tandem() → bool
Check if the duplication is proven in tandem. NOT IMPLEMENTED!
- undergo_nmd(strucvar: StrucVar, exons: List[Exon], strand: GenomicStrand) → bool
Check if the variant undergoes nonsense-mediated decay (NMD).
Check if the whole deletion affects only the last exon and the last 50 base pairs of the penultimate exon. If so, the variant does not undergo NMD.
- Parameters:
strucvar – The structural variant.
exons – The exons of the gene.
strand – The genomic strand of the variant.
- Returns:
True if the variant undergoes NMD, False otherwise.
- Raises:
MissingDataError – If exons or strand are not available.
AlgorithmError – If fewer than 2 exons are available.
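For a plus-strand transcript, the check above amounts to comparing the deletion start with a cutoff 50 base pairs before the end of the penultimate exon. This sketch is a simplification under stated assumptions: plus strand only, inclusive `(start, end)` tuples sorted by position, a deletion that extends toward the 3' end, a hypothetical function name `undergoes_nmd`, and a `ValueError` in place of `AlgorithmError`.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # hypothetical (start, end) pair, inclusive


def undergoes_nmd(del_start: int, exons: List[Interval], window: int = 50) -> bool:
    """Return False if the deletion starts inside the NMD-escape region."""
    if len(exons) < 2:
        raise ValueError("at least two exons are required")
    pen_start, pen_end = exons[-2]
    # The escape region covers the last `window` bases of the penultimate
    # exon (or the whole exon if it is shorter) plus the last exon.
    escape_start = max(pen_start, pen_end - window + 1)
    return del_start < escape_start


exons = [(100, 199), (300, 399), (500, 599)]
escapes = undergoes_nmd(350, exons)   # starts at the first base of the escape region
triggers = undergoes_nmd(349, exons)  # starts one base upstream of it
```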
- verify_pvs1(strucvar: StrucVar, var_data: AutoACMGStrucVarData) → Tuple[PVS1Prediction, PVS1PredictionStrucVarPath, str]
Make the PVS1 prediction.
The prediction is based on the PVS1 decision tree for structural variants.
- Parameters:
strucvar – The structural variant.
var_data – The variant information.
- Returns:
The prediction, prediction path, and the comment.
- Return type:
Tuple[PVS1Prediction, PVS1PredictionStrucVarPath, str]
- class src.strucvar.auto_pvs1.StrucVarHelper
Bases: AutoACMGHelper
Helper methods for PVS1 criteria for Structural Variants (StrucVar).
- _calc_cds(exons: List[Exon], strand: GenomicStrand, start_codon: int, stop_codon: int) → List[Exon]
Remove UTRs from exons.
- Parameters:
exons – List of exons for the gene.
strand – The genomic strand of the gene.
start_codon – Position of the start codon.
stop_codon – Position of the stop codon.
- Returns:
List of exons without UTRs.
- Return type:
List[Exon]
- Raises:
MissingDataError – If the genomic strand is not set.
- _count_lof_vars(strucvar: StrucVar) → Tuple[int, int]
Counts Loss-of-Function (LoF) variants within the range of a structural variant.
The method retrieves the variants in the range defined by the structural variant’s start and stop positions and iterates through the data available for each one. It uses the gnomAD genomes data to identify LoF variants by consequence (nonsense and frameshift) and the allele frequency to decide whether a LoF variant is frequent in the general population, and it counts both the LoF variants and the frequent LoF variants.
Note
A LoF variant is considered frequent if its occurrence in the general population exceeds a threshold of 0.1%.
- Parameters:
strucvar – The structural variant being analyzed.
- Returns:
The number of frequent LoF variants and the total number of LoF variants.
- Return type:
Tuple[int, int]
- Raises:
AlgorithmError – If the end position is less than the start position.
InvalidAPIResposeError – If the API response is invalid or cannot be processed.
- _count_pathogenic_vars(strucvar: StrucVar) → Tuple[int, int]
Counts pathogenic variants in the range specified by the structural variant.
The method retrieves variants from the range defined by the structural variant’s start and stop positions and iterates through the ClinVar data of each variant to count the number of pathogenic variants and the total number of variants. The method considers a variant pathogenic if its classification is “Pathogenic” or “Likely pathogenic”.
- Parameters:
strucvar – The structural variant being analyzed.
- Returns:
The number of pathogenic variants and the total number of variants.
- Return type:
Tuple[int, int]
- Raises:
InvalidAPIResposeError – If the API response is invalid or cannot be processed.
- _minimal_deletion(strucvar: StrucVar, exons: List[Exon]) → bool
Check if the variant is a minimal deletion. A minimal deletion affects at least one full exon.
- Parameters:
strucvar – The structural variant.
exons – The exons of the gene.
- Returns:
True if the deletion affects at least one full exon, False otherwise.
- Return type:
bool
- Raises:
AlgorithmError – If the variant is not a deletion.
MissingDataError – If exons are not available.
- annonars_client
Annonars client for the API.
- comment_pvs1: str
Comment to store the prediction explanation.
- config: Config
Configuration settings.
- crit4prot_func(strucvar: StrucVar) → bool
Check if the deletion is critical for protein function.
The method fetches the variants between the start and the end of the structural variant, then iterates through the ClinVar data of each variant to count the pathogenic variants in that region. The region is considered critical if the proportion of pathogenic variants exceeds 5%.
- Parameters:
strucvar – The structural variant being analyzed.
- Returns:
True if the deletion is critical for protein function, False otherwise.
- Return type:
bool
- Raises:
AlgorithmError – If the API response is invalid or cannot be processed.
- del_disrupt_rf(strucvar: StrucVar, exons: List[Exon], strand: GenomicStrand) → bool
Check if the single or multiple exon deletion disrupts the reading frame.
Find the start and end positions of the alteration based on the affected exon(s). If both positions lie within the intron(s) adjacent to the affected exon(s), the deletion does not disrupt the reading frame. Otherwise, there are two cases:
  - The deletion starts within an exon: if the offset from the start of that exon to the start of the deletion is a multiple of 3, the deletion does not disrupt the reading frame.
  - The deletion stops within an exon: if the offset from the start of the last affected exon to the stop of the deletion is a multiple of 3, the deletion does not disrupt the reading frame.
- Parameters:
strucvar – The structural variant.
exons – The exons of the gene.
strand – The genomic strand of the variant.
- Returns:
True if the deletion disrupts the reading frame, False otherwise.
- Raises:
MissingDataError – If exons or strand are not available.
AlgorithmError – If fewer than one full exon is affected.
- static dup_disrupt_rf() → bool
Check if the duplication disrupts the reading frame. NOT IMPLEMENTED!
- full_gene_del(strucvar: StrucVar, exons: List[Exon]) → bool
Check if the variant is a full gene deletion. The deletion affects the whole gene if the start position of the deletion is less than or equal to the start of the first exon and the stop position of the deletion is greater than or equal to the end of the last exon.
- Parameters:
strucvar – The structural variant.
exons – The exons of the gene.
- Returns:
True if the variant is a full gene deletion, False otherwise.
- Return type:
bool
- Raises:
MissingDataError – If exons are not available.
- in_bio_relevant_tsx(transcript_tags: List[str]) → bool
Check if the deletion is in a biologically relevant transcript. Check if the transcript has a MANE Select tag.
- Parameters:
transcript_tags – The tags of the transcript.
- Returns:
True if the deletion is in a biologically relevant transcript, False otherwise.
- Return type:
bool
- lof_freq_in_pop(strucvar: StrucVar) → bool
Checks if the Loss-of-Function (LoF) variants within the structural variant are frequent in the general population.
This function determines the frequency of LoF variants within the range specified by the structural variant and evaluates whether this frequency exceeds a defined threshold indicative of common occurrence in the general population.
Implementation of the rule:
  - Retrieve the number of LoF variants and the number of frequent LoF variants in the range defined by the structural variant.
  - Consider the LoF variants frequent in the general population if the share of “frequent” LoF variants among all LoF variants exceeds 10%.
Note: An individual LoF variant is considered frequent if its occurrence in the general population exceeds a per-variant threshold (0.1%, as documented for _count_lof_vars). The 10% threshold applies to the share of frequent LoF variants among all LoF variants in the range.
- Parameters:
strucvar – The structural variant being analyzed.
- Returns:
True if more than 10% of the LoF variants in the range are frequent, False otherwise.
- Return type:
bool
- Raises:
AlgorithmError – If the API response is invalid or cannot be processed.
- lof_rm_gt_10pct_of_prot(strucvar: StrucVar, exons: List[Exon], strand: GenomicStrand, start_codon: int, stop_codon: int) → bool
Determine if the deletion removes more than 10% of the protein-coding sequence.
First remove the UTRs from the exons. Then iterate through the CDS exons and calculate the total CDS length and the length of the deleted region. Return True if the deletion removes more than 10% of the protein-coding sequence, False otherwise.
- Parameters:
strucvar – The structural variant being analyzed.
exons – List of exons for the gene.
strand – The genomic strand of the gene.
start_codon – Position of the start codon.
stop_codon – Position of the stop codon.
- Returns:
True if the deletion removes more than 10% of the protein, False otherwise.
- Return type:
bool
- Raises:
AlgorithmError – If the total CDS length is zero.
- static presumed_in_tandem() → bool
Check if the duplication is presumed in tandem. NOT IMPLEMENTED!
- static proven_in_tandem() → bool
Check if the duplication is proven in tandem. NOT IMPLEMENTED!
- undergo_nmd(strucvar: StrucVar, exons: List[Exon], strand: GenomicStrand) → bool
Check if the variant undergoes nonsense-mediated decay (NMD).
Check if the whole deletion affects only the last exon and the last 50 base pairs of the penultimate exon. If so, the variant does not undergo NMD.
- Parameters:
strucvar – The structural variant.
exons – The exons of the gene.
strand – The genomic strand of the variant.
- Returns:
True if the variant undergoes NMD, False otherwise.
- Raises:
MissingDataError – If exons or strand are not available.
AlgorithmError – If fewer than 2 exons are available.