Usage

Command Line Interface

HmtNote can be used as a command line tool, using the annotate command and providing the input VCF file name and the file name or path where the annotated VCF will be saved:

hmtnote annotate input.vcf annotated.vcf

By default, HmtNote will annotate the VCF file using all four groups of annotations (basic, cross-reference, variability and predictions). If desired, you can select which specific annotation you want, using respectively --basic, --crossref, --variab and --predict (or -b, -c, -v, -p), or any combination of these options:

hmtnote annotate input.vcf annotated_basic.vcf --basic
hmtnote annotate input.vcf annotated_crossreferences.vcf --crossref
hmtnote annotate input.vcf annotated_variability.vcf --variab
hmtnote annotate input.vcf annotated_predictions.vcf --predict
hmtnote annotate input.vcf annotate_basic_variability.vcf --basic --variab

It is also possible to convert the resulting annotated VCF file to CSV format, for a simpler visual inspection of the data, by simply specifying the --csv option (please note that an output VCF file name must be provided):

hmtnote annotate input.vcf annotated.vcf --csv

An additional annotated.csv file will be created in the same directory of annotated.vcf.

By default, HmtNote works by pulling the required data from HmtVar on the fly, but if you’re planning to annotate VCF files offline, first download the annotation database using the dump command:

hmtnote dump

After that, HmtNote is capable of working even when no internet connection is available; this can be achieved using the --offline option after the usual annotation command:

hmtnote annotate input.vcf annotated.vcf --offline
hmtnote annotate input.vcf annotated_variability.vcf --variab --offline

HmtNote will look for data in the dumped database, which was saved as hmtnote_dump.pkl, to perform annotations. PLEASE NOTE: when working in *online mode*, HmtNote will retrieve from HmtVar only those entries that correspond to variants contained in the input VCF file; the ``dump`` command, instead, downloads the entire HmtVar database (actually just the subset used by HmtNote) to the local disk. Although this local database is not bigger than a few dozen MB, the download process may take a while.

PLEASE NOTE: data in HmtVar is subject to frequent updates, so please remember to run ``hmtnote dump`` as frequently as possible to be sure you’re working with the latest data. Use the offline mode at your own risk.

Python Module

HmtNote can also be imported in a Python script and its function annotate_vcf() can be used to annotate a given VCF:

from hmtnote import annotate_vcf
annotate("input.vcf", "annotated.vcf")

By default, annotate_vcf() will annotate the VCF using all four groups of annotations (basic, cross-reference, variability and predictions). If desired, you can specify which kind of annotation you want, using respectively the basic=True, crossref=True, variab=True, predict=True arguments, or any combination of them:

annotate("input.vcf", "annotated_basic.vcf", basic=True)
annotate("input.vcf", "annotated_crossreferences.vcf", crossref=True)
annotate("input.vcf", "annotated_variability.vcf", variab=True)
annotate("input.vcf", "annotated_predictions.vcf", predict=True)
annotate("input.vcf", "annotate_basic_variability.vcf", basic=True, variab=True)

An additional annotated CSV can be produced from the output VCF using the csv=True argument:

annotate("input.vcf", "annotated.vcf", csv=True)

If you want to work offline, HmtNote offers an offline mode, that will download the annotation database so that it can be used when no internet connection is available. The dump() function allows to download the local HmtNote database:

from hmtnote import dump
dump()

Now it is possible to perform offline annotation of VCF files, by simply adding the offline=True argument to the usual annotation function:

annotate("input.vcf", "annotated.vcf", offline=True)
annotate("input.vcf", "annotated_variability.vcf, variab=True, offline=True)

Please read above for potential limitations of the offline mode.

Annotations

HmtNote offers several annotations, grouped for simplicity into basic, cross-reference, variability and predictions, depending on the type of information they provide.

Basic

Basic information about the variant; they include:

  • Locus: Locus to which the variant belongs

  • AaChange: Aminoacidic change determined

  • Pathogenicity: Pathogenicity predicted by HmtVar

  • DiseaseScore: Disease score calculated by HmtVar

  • HmtVar: HmtVar ID of the variant (can be used to view the related VariantCard on https://www.hmtvar.uniba.it/varCard/<HmtVarID>)

Cross-reference

Cross-reference information about the variant; they include:

  • Clinvar: Clinvar ID of the variant

  • dbSNP: dbSNP ID of the variant

  • OMIM: OMIM ID of the variant

  • MitomapAssociatedDiseases: Diseases associated to the variant according to Mitomap

  • MitomapSomaticMutations: Diseases associated to the variant according to Mitomap Somatic Mutations

  • MitomapHeteroplasmy: The variant was found as heteroplasmic in Mitomap datasets

  • MitomapHomoplasmy: The variant was found as homoplasmic in Mitomap datasets

  • SomaticMutationsHeteroplasmy: The variant was found as heteroplasmic in Mitomap Somatic Mutations datasets

  • SomaticMutationsHomoplasmy: The variant was found as homoplasmic in Mitomap Somatic Mutations datasets

  • 1KGenomesHeteroplasmy: The variant was found as heteroplasmic in 1KGenomes datasets

  • 1KGenomesHomoplasmy: The variant was found as homoplasmic in 1KGenomes datasets

Variability

Variability and allele frequency data about the variant; they include:

  • NtVarH: Nucleotide variability of the position in healthy individuals

  • NtVarP: Nucleotide variability of the position in patient individuals

  • AaVarH: Aminoacid variability of the position in healthy individuals

  • AaVarP: Aminoacid variability of the position in patient individuals

  • AlleleFreqH: Allele frequency of the variant in healthy individuals overall

  • AlleleFreqP: Allele frequency of the variant in patient individuals overall

  • AlleleFreqH_AF: Allele frequency of the variant in healthy individuals from Africa

  • AlleleFreqP_AF: Allele frequency of the variant in patient individuals from Africa

  • AlleleFreqH_AM: Allele frequency of the variant in healthy individuals from America

  • AlleleFreqP_AM: Allele frequency of the variant in patient individuals from America

  • AlleleFreqH_AS: Allele frequency of the variant in healthy individuals from Asia

  • AlleleFreqP_AS: Allele frequency of the variant in patient individuals from Asia

  • AlleleFreqH_EU: Allele frequency of the variant in healthy individuals from Europe

  • AlleleFreqP_EU: Allele frequency of the variant in patient individuals from Europe

  • AlleleFreqH_OC: Allele frequency of the variant in healthy individuals from Oceania

  • AlleleFreqP_OC: Allele frequency of the variant in patient individuals from Oceania

Predictions

Pathogenicity prediction information of the variant from external resources; they include:

  • MutPred_Prediction: Pathogenicity prediction offered by MutPred

  • MutPred_Probability: Confidence of the pathogenicity prediction offered by MutPred

  • Panther_Prediction: Pathogenicity prediction offered by Panther

  • Panther_Probability: Confidence of the pathogenicity prediction offered by Panther

  • PhDSNP_Prediction: Pathogenicity prediction offered by PhD SNP

  • PhDSNP_Probability: Confidence of the pathogenicity prediction offered by PhD SNP

  • SNPsGO_Prediction: Pathogenicity prediction offered by SNPs & GO

  • SNPsGO_Probability: Confidence of the pathogenicity prediction offered by SNPs & GO

  • Polyphen2HumDiv_Prediction: Pathogenicity prediction offered by Polyphen2 HumDiv

  • Polyphen2HumDiv_Probability: Confidence of the pathogenicity prediction offered by Polyphen2 HumDiv

  • Polyphen2HumVar_Prediction: Pathogenicity prediction offered by Polyphen2 HumVar

  • Polyphen2HumVar_Probability: Confidence of the pathogenicity prediction offered by Polyphen2 HumVar