Mutations in half or more of sequences for pango-designated SARS-CoV-2 lineages
These files provide the mutations in half or more of sequences for pango-designated SARS-CoV-2 lineages as computed from the Nextstrain SARS-CoV-2 GenBank metadata.
Mutations are reported as shown by the Nextstrain metadata pipeline except that RNA nucleotide mutations are prefixed by "Nuc:".
Also, when an RNA nucleotide mutation produces a non-synonymous amino acid mutation, both mutations should be listed. The exceptions occur when the Nextstrain pipeline does not report mutations for the protein, as happens with ORF2b, ORF3b, ORF3c, ORF3d, ORF9c, ORF10, and N*. N* evolved during the pandemic, and whether the others are even expressed has been disputed, despite publications on the functions of at least some of them.
These files are working intermediates generated by my emerging lineage search code. In essence, they are lists of mutations that my code ignores as uninteresting when they occur in a lineage.
Sequences are assigned to lineages either by matching names between the Nextstrain SARS-CoV-2 GenBank metadata file and the pango-designation lineages.csv file or by the lineage called in the Nextstrain SARS-CoV-2 GenBank metadata file. Some samples may be assigned to two different lineages because of batching to control job size, but this would only occur if Nextstrain calls a lineage different from the designation in the master lineages.csv file.
There may also be lineages with no mutations listed. This will occur for small lineages with no representation in the GenBank data.
Note that when a mutation that evolves after lineage designation comes to be present in half or more of the sequences for that lineage, it will be added to that lineage's file here.
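The half-or-more rule can be sketched in Python as follows. This is only an illustration, not the code that generates these files: the function name `mutations_in_half_or_more`, the representation of each sequence as a set of mutation strings, and the example mutation strings are all assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List, Set


def mutations_in_half_or_more(sequences: Iterable[Set[str]]) -> List[str]:
    """Return mutations present in at least half of the given sequences.

    Each sequence is assumed to be a set of mutation strings
    (hypothetical representation chosen for this sketch).
    """
    seqs = list(sequences)
    if not seqs:
        # A lineage with no sequences yields no mutations.
        return []
    # Count in how many sequences each mutation appears.
    counts = Counter(mutation for seq in seqs for mutation in seq)
    total = len(seqs)
    # Integer comparison avoids floating-point issues at exactly one half.
    return sorted(m for m, c in counts.items() if 2 * c >= total)


# Illustrative input only; these are not taken from the data files.
example = [
    {"S:D614G", "Nuc:A23403G"},
    {"S:D614G"},
    {"S:N501Y"},
    {"S:D614G", "S:N501Y"},
]
print(mutations_in_half_or_more(example))
```

Under this sketch, a mutation found in exactly half of the sequences is kept, and a lineage with no sequences produces an empty list.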
To manage the number of entries in a single directory, the mutation files are in a path hierarchy determined by their lineage designation. Each letter of the lineage name gives one directory level, and each dot-separated number gives one directory level. So XBB.1.9.2's mutations are in X/B/B/1/9/2/XBB.1.9.2-muts.txt relative to the repo base path. This may seem inconvenient, but it should avoid file system performance issues.
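For scripted access, the path rule above can be sketched in Python. The function name `lineage_mutation_path` is hypothetical and not part of this repo, and the sketch assumes lineage names consist of a letter prefix followed by dot-separated numbers, as in the XBB.1.9.2 example.

```python
from pathlib import PurePosixPath


def lineage_mutation_path(lineage: str) -> str:
    """Build the repo-relative path of a lineage's mutation file.

    Assumes a name such as "XBB.1.9.2": a letter prefix followed by
    dot-separated numbers.
    """
    prefix, *numbers = lineage.split(".")
    # One directory level per letter, then one per dot-separated number.
    levels = list(prefix) + numbers
    return str(PurePosixPath(*levels, f"{lineage}-muts.txt"))


print(lineage_mutation_path("XBB.1.9.2"))  # X/B/B/1/9/2/XBB.1.9.2-muts.txt
```

The returned path uses forward slashes and is relative to the repo base path, so it can be joined to a local checkout directory.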
I can make no warranty on the accuracy of this data. Use at your own risk.