Usage¶
Command Line Interface¶
allfreqs can be used as a command line tool, using the allfreqs
command and providing the input
fasta or csv file with multialigned sequences:
# multialignment in fasta format
$ allfreqs multialg_seqs.fasta
# if reference is stored separately:
$ allfreqs multialg_seqs.fasta --reference my_ref.fasta
# multialignment in csv format
# e.g. seq1,ACGTACGT
# seq2,A-CTAGGT
$ allfreqs multialg_seqs.csv
# if reference is stored separately:
$ allfreqs multialg_seqs.csv --reference my_ref.csv
The program will use the first sequence in the multialignment as the reference sequence; if this is
not the case, you can supply a reference sequence using the --reference|-r
option followed by
the fasta or csv file with the desired reference sequence to use. Please note that in this case
both multialigned sequences and reference sequence must be in the same format (both fasta or csv
files).
By default, allfreqs will add frequencies of non-standard (ambiguous) nucleotides together, showing
them in the oth
column of the output; it is possible to show them in separate columns, specific
for each of them, using the --ambiguous
flag.
allfreqs will calculate allele frequencies for each position in the multialignment and save them as
a csv file called all_freqs.csv
in the current working directory. It is possible to specify a
different output location using the --out|-o
option followed by the desired path/filename.
Python Module¶
allfreqs can be used in a Python script by importing its AlleleFreqs
class.
This class has two methods, .from_fasta()
and .from_csv()
, which can be used to load
multialignments from either fasta or csv files respectively. Both methods accept a mandatory
sequences
argument, which specifies the file containing multialigned sequences, and an optional
reference
argument, which can be used to specify the reference sequence in case it is not
reported as the first sequence in the provided sequences
file:
from allfreqs import AlleleFreqs
# multialignment in fasta format
a = AlleleFreqs.from_fasta(sequences="multialg_seqs.fasta")
# if reference is stored separately:
a = AlleleFreqs.from_fasta(sequences="multialg_seqs.fasta", reference="my_ref.fasta")
# multialignment in csv format
a = AlleleFreqs.from_csv(sequences="multialg_seqs.csv")
# if reference is stored separately:
a = AlleleFreqs.from_csv(sequences="multialg_seqs.csv", reference="my_ref.csv")
The AlleleFreqs
class has two useful properties:
df
, which returns a dataframe with sequences as rows and single positions as columns;frequencies
, which returns a dataframe with the actual allele frequencies for each position.
By default, allfreqs will add frequencies of non-standard (ambiguous) nucleotides together, showing
them in the oth
column of the output; it is possible to show them in separate columns, specific
for each of them, using the ambiguous=True
option.
These allele frequencies can be saved to a csv file using the .to_csv()
method; by default they
will be saved to a file called all_freqs.csv
in the current working directory, but this can be
overridden by providing the output_file
argument to .to_csv()
.