Class ArffLoader.ArffReader

  • All Implemented Interfaces:
    RevisionHandler
    Enclosing class:
    ArffLoader

    public static class ArffLoader.ArffReader
    extends java.lang.Object
    implements RevisionHandler
    Reads data from an ARFF file, either in incremental or batch mode.

    Typical code for batch usage:

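     import java.io.BufferedReader;
     import java.io.FileReader;
     import weka.core.Instances;
     import weka.core.converters.ArffLoader.ArffReader;

     BufferedReader reader =
       new BufferedReader(new FileReader("/some/where/file.arff"));
     ArffReader arff = new ArffReader(reader);
     Instances data = arff.getData();
     data.setClassIndex(data.numAttributes() - 1);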
     
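    Typical code for incremental usage:

     import java.io.BufferedReader;
     import java.io.FileReader;
     import weka.core.Instance;
     import weka.core.Instances;
     import weka.core.converters.ArffLoader.ArffReader;

     BufferedReader reader =
       new BufferedReader(new FileReader("/some/where/file.arff"));
     // reads the header only; 1000 is the initial capacity of the dataset
     ArffReader arff = new ArffReader(reader, 1000);
     Instances data = arff.getStructure();
     data.setClassIndex(data.numAttributes() - 1);
     Instance inst;
     // readInstance() returns null at end of file
     while ((inst = arff.readInstance(data)) != null) {
       data.add(inst);
     }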
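
    When the rows do not need to be kept in memory, the incremental loop can process each instance and then discard it. The following is a minimal sketch under stated assumptions: the class name IncrementalMeanSketch, the helper meanOfFirstAttribute and the inline ARFF text are illustrative and not part of Weka, and a StringReader stands in for the file.

```java
import java.io.IOException;
import java.io.StringReader;

import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffLoader.ArffReader;

public class IncrementalMeanSketch {

  // Hypothetical sample data; the first attribute must be numeric.
  static final String ARFF =
      "@relation demo\n"
    + "@attribute x numeric\n"
    + "@attribute y numeric\n"
    + "@data\n"
    + "1,10\n"
    + "2,20\n"
    + "3,30\n";

  // Streams the rows one at a time and averages attribute 0 without storing them.
  static double meanOfFirstAttribute(String arffText) throws IOException {
    // Capacity 0: only the header is read and no rows are added to the dataset.
    ArffReader arff = new ArffReader(new StringReader(arffText), 0);
    Instances structure = arff.getStructure();
    double sum = 0.0;
    int count = 0;
    Instance inst;
    while ((inst = arff.readInstance(structure)) != null) {
      sum += inst.value(0);
      count++;
    }
    return count == 0 ? Double.NaN : sum / count;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(meanOfFirstAttribute(ARFF));
  }
}
```

    Because the loop never calls data.add(inst), memory use should stay roughly constant regardless of how many rows the source contains.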
     
    Version:
    $Revision: 11137 $
    Author:
    Eibe Frank (eibe@cs.waikato.ac.nz), Len Trigg (trigg@cs.waikato.ac.nz), fracpete (fracpete at waikato dot ac dot nz)
    • Constructor Summary

      Constructors
      Constructor
        Description
      ArffReader(java.io.Reader reader)
        Reads the data completely from the reader.
      ArffReader(java.io.Reader reader, int capacity)
        Reads only the header and reserves the specified space for instances.
      ArffReader(java.io.Reader reader, Instances template, int lines)
        Reads the data without header according to the specified template.
      ArffReader(java.io.Reader reader, Instances template, int lines, int capacity)
        Initializes the reader without reading the header according to the specified template.
    • Constructor Detail

      • ArffReader

        public ArffReader(java.io.Reader reader)
                   throws java.io.IOException
        Reads the data completely from the reader. The data can be accessed via the getData() method.
        Parameters:
        reader - the reader to use
        Throws:
        java.io.IOException - if something goes wrong
        See Also:
        getData()
      • ArffReader

        public ArffReader(java.io.Reader reader,
                          int capacity)
                   throws java.io.IOException
        Reads only the header and reserves the specified space for instances. Instances can then be read one at a time via readInstance().
        Parameters:
        reader - the reader to use
        capacity - the capacity of the new dataset
        Throws:
        java.io.IOException - if something goes wrong
        java.lang.IllegalArgumentException - if capacity is negative
        See Also:
        getStructure(), readInstance(Instances)
      • ArffReader

        public ArffReader(java.io.Reader reader,
                          Instances template,
                          int lines)
                   throws java.io.IOException
        Reads the data without header according to the specified template. The data can be accessed via the getData() method.
        Parameters:
        reader - the reader to use
        template - the template header
        lines - the number of lines read so far
        Throws:
        java.io.IOException - if something goes wrong
        See Also:
        getData()
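
        The template constructor can be used when the rows arrive separately from the header. The following is a minimal sketch under stated assumptions: the class name TemplateReadSketch, the helper readRows and both sample strings are illustrative and not part of Weka, and 0 is passed for lines because nothing has been read from the row source beforehand.

```java
import java.io.IOException;
import java.io.StringReader;

import weka.core.Instances;
import weka.core.converters.ArffLoader.ArffReader;

public class TemplateReadSketch {

  // Hypothetical header-only ARFF text and a separate block of data rows.
  static final String HEADER =
      "@relation demo\n"
    + "@attribute x numeric\n"
    + "@attribute y numeric\n"
    + "@data\n";
  static final String ROWS = "4,40\n5,50\n";

  // Parses the header once, then reads header-less rows against that template.
  static Instances readRows(String headerText, String rowText) throws IOException {
    Instances template =
        new ArffReader(new StringReader(headerText), 0).getStructure();
    ArffReader body = new ArffReader(new StringReader(rowText), template, 0);
    return body.getData();
  }

  public static void main(String[] args) throws IOException {
    Instances data = readRows(HEADER, ROWS);
    System.out.println(data.numInstances() + " rows read");
  }
}
```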
      • ArffReader

        public ArffReader(java.io.Reader reader,
                          Instances template,
                          int lines,
                          int capacity)
                   throws java.io.IOException
        Initializes the reader without reading the header according to the specified template. The data must be read via the readInstance() method.
        Parameters:
        reader - the reader to use
        template - the template header
        lines - the number of lines read so far
        capacity - the capacity of the new dataset
        Throws:
        java.io.IOException - if something goes wrong
        See Also:
        getData(), readInstance(Instances)
    • Method Detail

      • getLineNo

        public int getLineNo()
        Returns the current line number.
        Returns:
        the current line number
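
        One plausible use of getLineNo() is to report where parsing stopped when readInstance throws. The following is a minimal sketch under stated assumptions: the class name LineNumberSketch, the helper lineOfFirstError, its return-value convention and the sample strings are illustrative and not part of Weka.

```java
import java.io.IOException;
import java.io.StringReader;

import weka.core.Instances;
import weka.core.converters.ArffLoader.ArffReader;

public class LineNumberSketch {

  static final String HEADER =
      "@relation demo\n"
    + "@attribute x numeric\n"
    + "@attribute y numeric\n"
    + "@data\n";

  // Hypothetical inputs: one well-formed, one with a non-numeric value.
  static final String GOOD = HEADER + "1,10\n2,20\n";
  static final String BAD = HEADER + "1,10\noops,20\n";

  // Returns -1 if all rows parse, 0 if the header itself could not be read,
  // otherwise the line number the reader reports after the failure.
  static int lineOfFirstError(String arffText) {
    ArffReader arff = null;
    try {
      arff = new ArffReader(new StringReader(arffText), 0);
      Instances structure = arff.getStructure();
      while (arff.readInstance(structure) != null) {
        // rows are discarded; only parse errors matter here
      }
      return -1;
    } catch (IOException e) {
      return arff == null ? 0 : arff.getLineNo();
    }
  }

  public static void main(String[] args) {
    System.out.println("good input: " + lineOfFirstError(GOOD));
    System.out.println("bad input stopped near line: " + lineOfFirstError(BAD));
  }
}
```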
      • readInstance

        public Instance readInstance(Instances structure)
                              throws java.io.IOException
        Reads a single instance using the tokenizer and returns it.
        Parameters:
        structure - the dataset header information, will get updated in case of string or relational attributes
        Returns:
        the instance read, or null if the end of the file has been reached
        Throws:
        java.io.IOException - if the information is not read successfully
      • readInstance

        public Instance readInstance(Instances structure,
                                     boolean flag)
                              throws java.io.IOException
        Reads a single instance using the tokenizer and returns it.
        Parameters:
        structure - the dataset header information, will get updated in case of string or relational attributes
        flag - whether the method should test for a carriage return after each instance
        Returns:
        the instance read, or null if the end of the file has been reached
        Throws:
        java.io.IOException - if the information is not read successfully
      • getStructure

        public Instances getStructure()
        Returns the header format, i.e. the attribute information without any instances.
        Returns:
        the header format
      • getData

        public Instances getData()
        Returns the data that was read.
        Returns:
        the data
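
        The difference between getStructure() and getData() is easiest to see after a batch read. The following is a minimal sketch under stated assumptions: the class name StructureVsDataSketch, the helper counts and the sample ARFF text are illustrative and not part of Weka.

```java
import java.io.IOException;
import java.io.StringReader;

import weka.core.Instances;
import weka.core.converters.ArffLoader.ArffReader;

public class StructureVsDataSketch {

  // Hypothetical sample data with two numeric attributes and three rows.
  static final String ARFF =
      "@relation demo\n"
    + "@attribute x numeric\n"
    + "@attribute y numeric\n"
    + "@data\n"
    + "1,10\n"
    + "2,20\n"
    + "3,30\n";

  // Returns { attributes in the header, rows in the header, rows in the data }.
  static int[] counts(String arffText) throws IOException {
    // The single-argument constructor reads all rows up front (batch mode).
    ArffReader arff = new ArffReader(new StringReader(arffText));
    Instances structure = arff.getStructure();
    Instances data = arff.getData();
    return new int[] {
      structure.numAttributes(), structure.numInstances(), data.numInstances()
    };
  }

  public static void main(String[] args) throws IOException {
    int[] c = counts(ARFF);
    System.out.println(c[0] + " attributes, " + c[1]
        + " rows in structure, " + c[2] + " rows in data");
  }
}
```

        The header object is expected to carry the attribute definitions only, while the object returned by getData() also holds the rows.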
      • getRevision

        public java.lang.String getRevision()
        Returns the revision string.
        Specified by:
        getRevision in interface RevisionHandler
        Returns:
        the revision