Package pal.alignment

Class AlignmentUtils


  • public class AlignmentUtils
    extends java.lang.Object
    Helper utilities for alignments.
    Version:
    $Id: AlignmentUtils.java,v 1.29 2004/10/14 02:01:43 matt Exp $
    Author:
    Alexei Drummond
    Note:
    • 14 August 2003 - Changed call to new SimpleAlignment() to reflect change in constructors (referred to not calculating frequencies, but that no longer happens anyhow)
    • Constructor Detail

      • AlignmentUtils

        public AlignmentUtils()
    • Method Detail

      • report

        public static void report​(Alignment a,
                                  java.io.PrintWriter out)
        Report number of sequences, sites, and data type
      • print

        public static void print​(Alignment a,
                                 java.io.PrintWriter out)
        Print alignment (default format: INTERLEAVED)
      • printPlain

        public static void printPlain​(Alignment a,
                                      java.io.PrintWriter out)
        Print alignment (in plain format)
      • printPlain

        public static void printPlain​(Alignment a,
                                      java.io.PrintWriter out,
                                      boolean relaxed)
        Print alignment (in plain format)
      • printSequential

        public static void printSequential​(Alignment a,
                                           java.io.PrintWriter out)
        Print alignment (in PHYLIP SEQUENTIAL format)
      • printInterleaved

        public static void printInterleaved​(Alignment a,
                                            java.io.PrintWriter out)
        Print alignment (in PHYLIP 3.4 INTERLEAVED format)
      • printCLUSTALW

        public static void printCLUSTALW​(Alignment a,
                                         java.io.PrintWriter out)
        Print alignment (in CLUSTAL W format)
      • getAlignedSequenceIndices

        public static final void getAlignedSequenceIndices​(Alignment a,
                                                           int i,
                                                           int[] indices,
                                                           DataType dataType,
                                                           int unknownState)
        Writes the state indices for sequence i into the supplied indices array.
      • getAlignedStates

        public static final int[][] getAlignedStates​(Alignment base)
        Unknown characters are given the state of -1
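        As an illustration of the state-index convention described above, the following is a minimal sketch that does not use the PAL classes. It assumes a hypothetical four-letter nucleotide alphabet ("ACGT") and treats every other character, including the gap, as unknown. The class AlignedStatesSketch and the method toStates are invented for this example and are not part of pal.alignment.

```java
/** Illustrative only: not part of pal.alignment. */
final class AlignedStatesSketch {
    // Hypothetical alphabet; the position of a character in this string is its state index.
    private static final String STATES = "ACGT";

    /** Maps each character to its state index, or to unknownState if it is not in STATES. */
    static int[][] toStates(String[] sequences, int unknownState) {
        int[][] states = new int[sequences.length][];
        for (int i = 0; i < sequences.length; i++) {
            String s = sequences[i];
            states[i] = new int[s.length()];
            for (int j = 0; j < s.length(); j++) {
                int idx = STATES.indexOf(Character.toUpperCase(s.charAt(j)));
                states[i][j] = idx >= 0 ? idx : unknownState;
            }
        }
        return states;
    }

    public static void main(String[] args) {
        int[][] states = toStates(new String[] {"AC-T", "GNTA"}, -1);
        System.out.println(java.util.Arrays.deepToString(states));
    }
}
```

        Passing -1 as unknownState mirrors the default stated for the single-argument method; whether PAL treats gaps the same way as other unknown characters is not stated here.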
      • getAlignedStates

        public static final int[][] getAlignedStates​(Alignment base,
                                                     int unknownState)
      • getAlignmentPenalty

        public static double getAlignmentPenalty​(Alignment a,
                                                 TransitionPenaltyTable penalties,
                                                 double gapCreation,
                                                 double gapExtension)
        Returns the total sum-of-pairs alignment penalty using gap creation and extension penalties and the transition penalties in the TransitionPenaltyTable provided. By default this is end-weighted.
      • getAlignmentPenalty

        public static double getAlignmentPenalty​(Alignment a,
                                                 DataType dataType,
                                                 TransitionPenaltyTable penalties,
                                                 double gapCreation,
                                                 double gapExtension,
                                                 boolean local)
        Returns the total sum-of-pairs alignment distance using gap creation and extension penalties and transition penalties as defined in the TransitionPenaltyTable provided. The gap cost is calculated as follows: given a gap of length len, the cost is gapCreation + (len-1)*gapExtension
        Parameters:
        gapCreation - the cost of the initial gap opening character
        gapExtension - the cost of the remaining gap characters
        local - true if end gaps are ignored, false otherwise
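        The gap cost formula above can be sketched as follows. This is only an illustration of the affine gap term, applied to the gap runs of a single sequence; the documented method sums penalties over pairs of sequences and also adds transition penalties, which this sketch does not attempt. The class GapCostSketch, its method names, and the use of '-' as the gap character are assumptions made for the example.

```java
/** Illustrative only: not part of pal.alignment. */
final class GapCostSketch {
    /** Cost of one gap of length len: gapCreation + (len - 1) * gapExtension. */
    static double gapCost(int len, double gapCreation, double gapExtension) {
        if (len <= 0) {
            return 0.0; // no gap, no cost
        }
        return gapCreation + (len - 1) * gapExtension;
    }

    /**
     * Sums the gap cost over every run of '-' in one aligned sequence.
     * If local is true, runs touching either end of the sequence are ignored.
     */
    static double totalGapCost(String seq, double gapCreation, double gapExtension, boolean local) {
        double total = 0.0;
        int n = seq.length();
        int i = 0;
        while (i < n) {
            if (seq.charAt(i) != '-') {
                i++;
                continue;
            }
            int start = i;
            while (i < n && seq.charAt(i) == '-') {
                i++;
            }
            boolean endGap = start == 0 || i == n;
            if (!(local && endGap)) {
                total += gapCost(i - start, gapCreation, gapExtension);
            }
        }
        return total;
    }
}
```

        For example, with gapCreation = 10 and gapExtension = 1, a gap of length 3 costs 10 + 2*1 = 12.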
      • getSuitableInstance

        public static DataType getSuitableInstance​(Alignment alignment)
        Guess the data type suitable for a given sequence data set
        Parameters:
        alignment - alignment
        Returns:
        suitable DataType object
      • getSuitableInstance

        public static DataType getSuitableInstance​(java.lang.String[] sequences)
        Guess the data type suitable for a given sequence data set
        Parameters:
        sequences - the alignment represented as an array of strings
        Returns:
        suitable DataType object
      • getSuitableInstance

        public static DataType getSuitableInstance​(char[][] sequences)
        Guess the data type suitable for a given sequence data set
        Parameters:
        sequences - the alignment represented as an array of character arrays
        Returns:
        suitable DataType object
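        How the data type is guessed is not described here. One common heuristic, shown below purely as a sketch, is to measure the fraction of characters that look like nucleotides and compare it with a cutoff. The class DataTypeGuessSketch, the enum Guess, the character set "ACGTUN", the skipped characters '-' and '?', and the threshold parameter are all hypothetical and should not be read as the rule PAL applies.

```java
/** Illustrative only: not part of pal.alignment. */
final class DataTypeGuessSketch {
    enum Guess { NUCLEOTIDES, AMINO_ACIDS }

    /**
     * Returns NUCLEOTIDES if at least the given fraction of informative characters
     * are nucleotide-like, otherwise AMINO_ACIDS. The threshold is a hypothetical cutoff.
     */
    static Guess guess(String[] sequences, double threshold) {
        long nucleotideLike = 0;
        long total = 0;
        for (String s : sequences) {
            for (int i = 0; i < s.length(); i++) {
                char c = Character.toUpperCase(s.charAt(i));
                if (c == '-' || c == '?') {
                    continue; // assumed gap/unknown markers carry no information
                }
                total++;
                if ("ACGTUN".indexOf(c) >= 0) {
                    nucleotideLike++;
                }
            }
        }
        if (total == 0) {
            return Guess.NUCLEOTIDES; // arbitrary choice for empty input
        }
        return (double) nucleotideLike / total >= threshold ? Guess.NUCLEOTIDES : Guess.AMINO_ACIDS;
    }
}
```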
      • estimateCodonFrequenciesF1X4

        public static double[] estimateCodonFrequenciesF1X4​(Alignment a)
        Estimate the frequencies of codons, calculated from the average nucleotide frequencies. This corresponds to CodonFreq = F1X4 (1) in PAML.
        Parameters:
        a - The base alignment, will be converted to nucleotides
        Returns:
        The codon frequencies as estimated from the average nucleotide frequencies
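        The F1X4 idea can be sketched without the PAL classes: estimate one set of four nucleotide frequencies from the whole data set, then take the product of the three relevant frequencies for each codon. The class F1X4Sketch, the "ACGT" ordering, and the codon index 16*first + 4*second + third are assumptions for this example. The sketch returns all 64 products and does not exclude stop codons or renormalize, so it may differ from the array the documented method returns.

```java
/** Illustrative only: not part of pal.alignment. */
final class F1X4Sketch {
    private static final String NUC = "ACGT"; // assumed state order

    /** Average nucleotide frequencies over all sequences; characters outside ACGT are skipped. */
    static double[] nucleotideFrequencies(String[] sequences) {
        double[] freqs = new double[4];
        double total = 0;
        for (String s : sequences) {
            for (int i = 0; i < s.length(); i++) {
                int idx = NUC.indexOf(Character.toUpperCase(s.charAt(i)));
                if (idx >= 0) {
                    freqs[idx]++;
                    total++;
                }
            }
        }
        if (total > 0) {
            for (int k = 0; k < 4; k++) {
                freqs[k] /= total;
            }
        }
        return freqs;
    }

    /** Codon frequency = product of the frequencies of its three nucleotides. */
    static double[] codonFrequencies(double[] pi) {
        double[] f = new double[64];
        for (int a = 0; a < 4; a++) {
            for (int b = 0; b < 4; b++) {
                for (int c = 0; c < 4; c++) {
                    f[16 * a + 4 * b + c] = pi[a] * pi[b] * pi[c];
                }
            }
        }
        return f;
    }
}
```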
      • estimateCodonFrequenciesF3X4

        public static double[] estimateCodonFrequenciesF3X4​(Alignment a)
        Estimate the frequencies of codons, calculated from the average nucleotide frequencies at the three codon positions. This corresponds to CodonFreq = F3X4 (2) in PAML.
        Parameters:
        a - The base alignment, will be converted to nucleotides
        Returns:
        The codon frequencies as estimated from the average nucleotide frequencies at the three codon positions
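        For F3X4, each codon position has its own set of four nucleotide frequencies, and a codon's frequency is the product of one entry from each set. The sketch below takes such a 3-by-4 table as input. The class F3X4Sketch, the table layout [position][nucleotide], and the codon index 16*first + 4*second + third are assumptions for illustration; stop codons are not excluded and no renormalization is done.

```java
/** Illustrative only: not part of pal.alignment. */
final class F3X4Sketch {
    /**
     * positionFreqs[p][n] is the frequency of nucleotide n at codon position p (p = 0, 1, 2).
     * Returns 64 values, one per codon, each the product of three position-specific frequencies.
     */
    static double[] codonFrequencies(double[][] positionFreqs) {
        if (positionFreqs.length != 3) {
            throw new IllegalArgumentException("expected frequencies for 3 codon positions");
        }
        double[] f = new double[64];
        for (int a = 0; a < 4; a++) {
            for (int b = 0; b < 4; b++) {
                for (int c = 0; c < 4; c++) {
                    f[16 * a + 4 * b + c] =
                        positionFreqs[0][a] * positionFreqs[1][b] * positionFreqs[2][c];
                }
            }
        }
        return f;
    }
}
```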
      • estimateFrequencies

        public static double[] estimateFrequencies​(Alignment a)
        Estimate state frequencies by counting the states in the alignment
      • estimateTupletFrequencies

        public static double[][] estimateTupletFrequencies​(Alignment a,
                                                           int tupletSize)
        Estimates frequencies via tuplets. This is most useful for nucleotide data where the frequencies at each codon position are of interest (tuplet size = 3)
        Parameters:
        a - The input alignment
        tupletSize - the size of the tuplet
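        A minimal sketch of per-position counting follows. It assumes the result is laid out as [position within tuplet][state], that the first site is position 0, and that characters outside the given state string are skipped; none of this is confirmed by the description above. The class TupletFrequencySketch and its parameters are hypothetical.

```java
/** Illustrative only: not part of pal.alignment. */
final class TupletFrequencySketch {
    /**
     * Returns freqs[p][s]: the relative frequency of state s among the sites whose
     * index modulo tupletSize equals p. Characters not found in states are ignored.
     */
    static double[][] estimate(String[] sequences, String states, int tupletSize) {
        double[][] freqs = new double[tupletSize][states.length()];
        double[] totals = new double[tupletSize];
        for (String s : sequences) {
            for (int site = 0; site < s.length(); site++) {
                int state = states.indexOf(s.charAt(site));
                if (state < 0) {
                    continue; // gap or unknown character
                }
                int pos = site % tupletSize;
                freqs[pos][state]++;
                totals[pos]++;
            }
        }
        for (int p = 0; p < tupletSize; p++) {
            if (totals[p] > 0) {
                for (int k = 0; k < states.length(); k++) {
                    freqs[p][k] /= totals[p];
                }
            }
        }
        return freqs;
    }
}
```

        With tupletSize = 3 and nucleotide data in frame, row p of the result holds the nucleotide frequencies at codon position p.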
      • isSiteRedundant

        public static final boolean isSiteRedundant​(Alignment a,
                                                    int site)
      • removeRedundantSites

        public static final Alignment removeRedundantSites​(Alignment a)
      • isGap

        public static final boolean isGap​(Alignment a,
                                          int seq,
                                          int site)
        Returns true if the alignment has a gap at the site in the sequence specified.
      • getPositionMisalignmentInfo

        public static void getPositionMisalignmentInfo​(Alignment a,
                                                       java.io.PrintWriter out,
                                                       int startingCodonPosition,
                                                       CodonTable translator,
                                                       boolean removeIncompleteCodons)
        Parameters:
        startingCodonPosition - from {0,1,2}, representing codon position of first value in sequences...
        translator - the translator to use for converting codons to amino acids.
        removeIncompleteCodons - removes end codons that are not complete (due to startingPosition, and sequence length).
      • getPositionMisalignmentInfo

        public static void getPositionMisalignmentInfo​(Alignment a,
                                                       java.io.PrintWriter out,
                                                       int startingCodonPosition)
        Parameters:
        startingCodonPosition - from {0,1,2}, representing codon position of first value in sequences...
      • concatAlignments

        public static final Alignment concatAlignments​(Alignment[] alignments,
                                                       DataType dt)
        Concatenates an array of alignments such that the resulting alignment is all of the sub-alignments placed alongside each other
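        The side-by-side concatenation can be pictured with plain strings: sequence i of the result is sequence i of each sub-alignment joined in order. The sketch below assumes every sub-alignment has the same number of sequences in the same order. The class ConcatSketch and its String[][] representation are hypothetical stand-ins for Alignment objects.

```java
/** Illustrative only: not part of pal.alignment. */
final class ConcatSketch {
    /** alignments[k][i] is sequence i of sub-alignment k. */
    static String[] concat(String[][] alignments) {
        if (alignments.length == 0) {
            return new String[0];
        }
        int numSeqs = alignments[0].length;
        StringBuilder[] rows = new StringBuilder[numSeqs];
        for (int i = 0; i < numSeqs; i++) {
            rows[i] = new StringBuilder();
        }
        for (String[] sub : alignments) {
            if (sub.length != numSeqs) {
                throw new IllegalArgumentException("sub-alignments differ in sequence count");
            }
            for (int i = 0; i < numSeqs; i++) {
                rows[i].append(sub[i]); // place this block to the right of the previous ones
            }
        }
        String[] result = new String[numSeqs];
        for (int i = 0; i < numSeqs; i++) {
            result[i] = rows[i].toString();
        }
        return result;
    }
}
```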
      • getSequenceCharArray

        public static final char[] getSequenceCharArray​(Alignment a,
                                                        int sequence)
        Returns a particular sequence of an alignment as a char array
      • getSequenceString

        public static final java.lang.String getSequenceString​(Alignment a,
                                                               int sequence)
        Returns a particular sequence of an alignment as a String
      • getChangedDataType

        public static final Alignment getChangedDataType​(Alignment a,
                                                         DataType dt)
        Returns an alignment which follows the pattern of the input alignment except that all sites which do not contain states in dt (excluding the gap character) are removed. The DataType of the returned alignment is dt.
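        One possible reading of the site filter described above is sketched here: a site (column) is kept only if every character in it is either the gap character or a state of the target data type. That interpretation, the class SiteFilterSketch, and the use of a string of valid state characters in place of a DataType are assumptions; the sketch also assumes all sequences have the same length.

```java
/** Illustrative only: not part of pal.alignment. */
final class SiteFilterSketch {
    /** Keeps only the columns in which every character is the gap or occurs in states. */
    static String[] keepValidSites(String[] sequences, String states, char gap) {
        int numSites = sequences.length == 0 ? 0 : sequences[0].length();
        StringBuilder[] rows = new StringBuilder[sequences.length];
        for (int i = 0; i < rows.length; i++) {
            rows[i] = new StringBuilder();
        }
        for (int site = 0; site < numSites; site++) {
            boolean valid = true;
            for (String s : sequences) {
                char c = s.charAt(site);
                if (c != gap && states.indexOf(c) < 0) {
                    valid = false;
                    break;
                }
            }
            if (valid) {
                for (int i = 0; i < rows.length; i++) {
                    rows[i].append(sequences[i].charAt(site));
                }
            }
        }
        String[] result = new String[rows.length];
        for (int i = 0; i < rows.length; i++) {
            result[i] = rows[i].toString();
        }
        return result;
    }
}
```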
      • countUnknowns

        public static final int countUnknowns​(Alignment a,
                                              DataType dt)
        Counts the characters of an alignment that are not within a data type.
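        A simple way to picture this check is to scan every character and count those that the data type does not recognise. In the sketch below the data type is replaced by a string of recognised characters, so the caller decides whether the gap character counts as known. The class UnknownCountSketch and this representation are hypothetical.

```java
/** Illustrative only: not part of pal.alignment. */
final class UnknownCountSketch {
    /** Counts characters across all sequences that do not occur in recognised. */
    static int countUnknowns(String[] sequences, String recognised) {
        int count = 0;
        for (String s : sequences) {
            for (int i = 0; i < s.length(); i++) {
                if (recognised.indexOf(s.charAt(i)) < 0) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

        A result of zero indicates that every character belongs to the recognised set.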
      • getLeadingIncompleteCodonsStripped

        public static final Alignment getLeadingIncompleteCodonsStripped​(Alignment base)
        Creates a new nucleotide alignment based on the input that has any leading incomplete codons (that is, the first codon of the sequence that is not a gap/unknown but is not complete - has a nucleotide unknown) replaced by a triplet of unknowns
        Parameters:
        base - The basis alignment (of any molecular data type)
        Returns:
        the resulting alignment
      • getConsistency

        public static double getConsistency​(Alignment a,
                                            Alignment b)
        Returns:
        the consistency of homology assignment between two alignments.