Class FastaFileIterator

  • All Implemented Interfaces:
    java.lang.Iterable<java.lang.String>, java.util.Iterator<java.lang.String>

    public class FastaFileIterator
    extends FileIterator<java.lang.String>
    Opens a FASTA file and iterates over all sequences in the file.
    Author:
    pcingola
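The class description above (open a FASTA file, return one sequence per iteration, expose the current header) can be illustrated with a minimal, self-contained sketch. It is not SnpEff's implementation: `SimpleFastaReader` is a hypothetical name, it reads from any `BufferedReader` instead of a file name, and it assumes the common FASTA layout of a `>` header line followed by zero or more sequence lines, which are concatenated into one string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical minimal FASTA iterator (not SnpEff's FileIterator): each record
// is a '>' header line followed by zero or more sequence lines.
public class SimpleFastaReader implements Iterator<String> {
    private final BufferedReader reader;
    private String pendingHeader; // header line already read for the following record
    private String nextHeader;    // header of the record buffered in 'next'
    private String next;          // record buffered by hasNext(), or null
    private String header;        // header of the record most recently returned by next()

    public SimpleFastaReader(BufferedReader reader) {
        this.reader = reader;
    }

    /** Header (without the leading '>') of the sequence last returned by next(). */
    public String getHeader() {
        return header;
    }

    @Override
    public boolean hasNext() {
        if (next == null) next = readNext();
        return next != null;
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        String seq = next;
        header = nextHeader;
        next = null;
        return seq;
    }

    /** Reads one record: a header line plus the concatenated sequence lines after it. */
    private String readNext() {
        try {
            String line;
            // Skip any lines that precede the first header
            while (pendingHeader == null) {
                line = reader.readLine();
                if (line == null) return null; // end of input
                if (line.startsWith(">")) pendingHeader = line.substring(1).trim();
            }
            nextHeader = pendingHeader;
            pendingHeader = null;
            StringBuilder seq = new StringBuilder();
            // Concatenate sequence lines until the next header or end of input
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(">")) {
                    pendingHeader = line.substring(1).trim();
                    break;
                }
                seq.append(line.trim());
            }
            return seq.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String fasta = ">seq1 first record\nACGT\nTTAA\n>seq2\nGGCC\n";
        SimpleFastaReader it = new SimpleFastaReader(new BufferedReader(new StringReader(fasta)));
        while (it.hasNext()) {
            String seq = it.next();
            System.out.println(it.getHeader() + "\t" + seq);
        }
    }
}
```

Reading one line ahead (`pendingHeader`) is what lets the iterator detect where a multi-line sequence ends without pushing the header line back into the stream.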
    • Field Detail

      • TRANSCRIPT_ID_SEPARATORS_REGEX

        public static java.lang.String TRANSCRIPT_ID_SEPARATORS_REGEX
      • TRANSCRIPT_ID_SEPARATORS

        public static char[] TRANSCRIPT_ID_SEPARATORS
    • Constructor Detail

      • FastaFileIterator

        public FastaFileIterator(java.lang.String fastaFileName)
    • Method Detail

      • fastaHeader2Ids

        public java.util.List<java.lang.String> fastaHeader2Ids()
        Tries to parse sequence IDs from the current FASTA header.
      • getHeader

        public java.lang.String getHeader()
        Returns the header of the current sequence.
      • getName

        public java.lang.String getName()
        Returns the sequence name (the first 'word' of the header): the characters after the leading '>' and before the first space, with any leading 'chr', 'chr:', etc. removed.
      • getTranscriptId

        public java.lang.String getTranscriptId()
        Gets the transcript ID from a FASTA header (ENSEMBL protein files). Format example: '>ENSP00000356130 pep:known chromosome:GRCh37:1:205111633:205180694:-1 gene:ENSG00000133059 transcript:ENST00000367162'
      • readNext

        protected java.lang.String readNext()
        Reads the next sequence from the file.
        Specified by:
        readNext in class FileIterator<java.lang.String>
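The header-parsing rules described for `getName()` and `getTranscriptId()` can be sketched as two static helpers. This is a hypothetical illustration (`FastaHeaderParser`, `name`, and `transcriptId` are not part of this API). It assumes that the name ends at the first whitespace, that only the `chr:` and `chr` prefixes are stripped (case-insensitively; the real method's "etc." suggests it handles more), and that the transcript ID is the value of the first whitespace-separated `transcript:` token, as in the ENSEMBL example above. The `TRANSCRIPT_ID_SEPARATORS` fields are not used because their values are not documented here.

```java
import java.util.Locale;

// Hypothetical helpers mirroring the documented behavior of getName() and
// getTranscriptId(); not the SnpEff implementation.
public final class FastaHeaderParser {
    private FastaHeaderParser() {}

    /** First word after '>', with a leading "chr:" or "chr" removed (assumed case-insensitive). */
    public static String name(String headerLine) {
        String h = headerLine.startsWith(">") ? headerLine.substring(1) : headerLine;
        String first = h.trim().split("\\s+", 2)[0];
        String lower = first.toLowerCase(Locale.ROOT);
        // Check the longer prefix first so "chr:X" becomes "X", not ":X"
        if (lower.startsWith("chr:")) return first.substring(4);
        if (lower.startsWith("chr")) return first.substring(3);
        return first;
    }

    /** Value of the first "transcript:" token in an ENSEMBL-style header, or null if absent. */
    public static String transcriptId(String headerLine) {
        final String key = "transcript:";
        for (String token : headerLine.trim().split("\\s+")) {
            if (token.startsWith(key)) return token.substring(key.length());
        }
        return null;
    }

    public static void main(String[] args) {
        String header = ">ENSP00000356130 pep:known chromosome:GRCh37:1:205111633:205180694:-1"
                + " gene:ENSG00000133059 transcript:ENST00000367162";
        System.out.println(name(header));         // ENSP00000356130
        System.out.println(transcriptId(header)); // ENST00000367162
        System.out.println(name(">chr1 assembled")); // 1
    }
}
```

Returning `null` when no `transcript:` token is present is a design choice of this sketch; the documented method does not say what it returns for non-ENSEMBL headers.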