All Superinterfaces:

java.lang.Iterable<java.util.List<Attribute>>

All Known Implementing Classes:

MySQLDataSet, TextFileDataSet
```
public interface DataSet
extends java.lang.Iterable<java.util.List<Attribute>>
```
DataSet is a container composed of examples. Each example (or DataRecord) is a set of named attributes, either discrete or continuous, that represents a particular case of some class of object. DataSet provides methods to manipulate those DataRecords and to calculate information gain and entropy over the data set.
```
 TODO:
 - methods to read from .names/.data files
 - methods to export in format .names/.data
 - methods to read from databases tables?
 - optimizations in the sorting to avoid multiple sorting.
 
```

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| `Attribute` | `allTheSame()` | Returns the most common output attribute if the rest of the attributes are exactly the same over the whole data set. |
| `boolean` | `allTheSameOutput()` | |
| `void` | `close()` | Closes the underlying data source if possible. |
| `java.util.HashMap<Attribute,java.lang.Integer>` | `getFrequencies(int lo, int hi, int fieldIndex)` | Gets a map between the different values of the attribute at the fieldIndex and their respective frequencies. |
| `int` | `getItemsCount()` | |
| `MetaData` | `getMetaData()` | |
| `int` | `getOutputIndex()` | |
| `DataSet` | `getSubset(int lo, int hi)` | Creates a slice of this data set as a new data set from [lo, hi). |
| `java.lang.Iterable<java.util.List<Attribute>>` | `sortOver(int fieldIndex)` | Sorts the data set over the fieldIndex as primary key, using the output index to break any ties. |
| `java.lang.Iterable<java.util.List<Attribute>>` | `sortOver(int lo, int hi, int fieldIndex)` | Sorts the data set over the fieldIndex as primary key, using the output index to break any ties, and limits the elements to [lo, hi). |
| `DataSet[]` | `splitKeepingRelation(double proportion)` | Splits this data set into two new data sets, preserving the proportion between the output classes. |

Methods inherited from interface java.lang.Iterable
forEach, iterator, spliterator

- Method Detail
  - getOutputIndex
```
int getOutputIndex()
```
    Returns:
    
    The index of the field used as output.
  - getItemsCount
```
int getItemsCount()
```
    Returns:
    
    The total count of elements in this data set.
  - getMetaData
```
MetaData getMetaData()
```
    Returns:
    
    A MetaData object containing information about the attributes in the data set.
  - sortOver
```
java.lang.Iterable<java.util.List<Attribute>> sortOver(int fieldIndex)
```
    Sorts the data set over the fieldIndex as primary key, using the output index to break any ties. This index is remembered for future internal reference.
    
    Parameters:
    
    fieldIndex - The field to sort over
    
    Returns:
    
    An iterable representation of this data set sorted over the field index.
  - sortOver
```
java.lang.Iterable<java.util.List<Attribute>> sortOver(int lo,
                                                       int hi,
                                                       int fieldIndex)
```
    Sorts the data set over the fieldIndex as primary key, using the output index to break any ties, and limits the elements to [lo, hi). This index is remembered for future internal reference.
    
    Parameters:
    
    lo - The lower bound (inclusive) of the data set to be returned
    
    hi - The upper bound (exclusive) of the data set to be returned.
    
    fieldIndex - The field to sort over
    
    Returns:
    
    An iterable representation of this data set, restricted to [lo, hi) and sorted over the field index.
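
    The ordering described above (primary key on fieldIndex, ties broken by the output index, elements limited to [lo, hi)) can be sketched with a chained `Comparator`. This is only an illustration under assumptions: records are modeled as `List<String>` because the `Attribute` API is not shown here, the class `SortOverSketch` and its extra `outputIndex` parameter are hypothetical, and the sketch sorts a copy, whereas the real implementations may sort in place.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in: each record is a List<String>; the real interface uses List<Attribute>.
final class SortOverSketch {

    // Returns a sorted copy of rows [lo, hi): primary key fieldIndex, ties broken by outputIndex.
    static List<List<String>> sortOver(List<List<String>> rows, int lo, int hi,
                                       int fieldIndex, int outputIndex) {
        List<List<String>> slice = new ArrayList<>(rows.subList(lo, hi));
        Comparator<List<String>> byField =
                Comparator.comparing((List<String> r) -> r.get(fieldIndex));
        slice.sort(byField.thenComparing((List<String> r) -> r.get(outputIndex)));
        return slice;
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("sunny", "no"),
                List.of("rain", "yes"),
                List.of("rain", "no"),
                List.of("overcast", "yes"));
        // Sort only the first three rows by field 0, breaking ties on field 1.
        // prints [[rain, no], [rain, yes], [sunny, no]]
        System.out.println(sortOver(rows, 0, 3, 0, 1));
    }
}
```

    The fourth row is excluded because the upper bound is exclusive.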
  - getSubset
```
DataSet getSubset(int lo,
                  int hi)
```
    Creates a slice of this data set as a new data set from [lo, hi).
    
    Parameters:
    
    lo - The lower bound (inclusive) of the data set to be returned
    
    hi - The upper bound (exclusive) of the data set to be returned.
    
    Returns:
    
    A new data set that is a copy of this data set over [lo, hi).
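
    A minimal sketch of the half-open slice semantics, assuming the records live in a plain `java.util.List`. The class `SubsetSketch` is hypothetical, and the bounds check is an assumption, since the documentation does not say how out-of-range arguments are handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating [lo, hi) slicing; not part of the documented API.
final class SubsetSketch {

    // Copies rows [lo, hi) into a new list; the original list is left untouched.
    static <T> List<T> getSubset(List<T> rows, int lo, int hi) {
        if (lo < 0 || hi > rows.size() || lo > hi) {
            // Assumption: invalid ranges are rejected rather than clamped.
            throw new IndexOutOfBoundsException("invalid range [" + lo + ", " + hi + ")");
        }
        return new ArrayList<>(rows.subList(lo, hi));
    }
}
```

    With this convention a slice holds exactly `hi - lo` records, and `getSubset(rows, 0, k)` together with `getSubset(rows, k, rows.size())` covers every record exactly once.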
  - allTheSameOutput
```
boolean allTheSameOutput()
```
    Returns:
    
    True if every record has the same class (the value at the output index).
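
    The check can be sketched as a single pass that compares each record's output value with the first one. Records are modeled as `List<String>` and the class `SameOutputSketch` is hypothetical; returning true for an empty input is an assumption, not documented behavior.

```java
import java.util.List;

// Hypothetical stand-in: each record is a List<String>; the real interface uses List<Attribute>.
final class SameOutputSketch {

    // True when every row carries the same value at outputIndex.
    static boolean allTheSameOutput(List<List<String>> rows, int outputIndex) {
        if (rows.isEmpty()) {
            return true; // assumption: an empty data set is treated as uniform
        }
        String first = rows.get(0).get(outputIndex);
        for (List<String> row : rows) {
            if (!row.get(outputIndex).equals(first)) {
                return false;
            }
        }
        return true;
    }
}
```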
  - allTheSame
```
Attribute allTheSame()
```
    Returns the most common output attribute if the rest of the attributes are exactly the same over the whole data set. If even one record differs from the rest in even one non-output attribute, this method returns null.
    
    Returns:
    
    The most common output attribute, or null if any record differs from the rest.
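
    The behavior described above can be sketched as follows, assuming records are `List<String>` values. The class `AllTheSameSketch` is hypothetical; returning null for an empty input and breaking ties between equally common outputs arbitrarily are both assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: each record is a List<String>; the real interface uses List<Attribute>.
final class AllTheSameSketch {

    // Returns the most common output value if all non-output fields match across rows, else null.
    static String allTheSame(List<List<String>> rows, int outputIndex) {
        if (rows.isEmpty()) {
            return null; // assumption: no answer for an empty data set
        }
        List<String> first = rows.get(0);
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> row : rows) {
            for (int i = 0; i < first.size(); i++) {
                if (i != outputIndex && !row.get(i).equals(first.get(i))) {
                    return null; // one differing non-output attribute is enough
                }
            }
            counts.merge(row.get(outputIndex), 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) { // ties are resolved arbitrarily in this sketch
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

    In a decision-tree setting, a non-null result typically signals that no further split can separate the records, so the returned value can label a leaf.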
  - splitKeepingRelation
```
DataSet[] splitKeepingRelation(double proportion)
```
    Splits this data set into two new data sets, preserving the proportion between the output classes. The first data set contains the given proportion of each class. For instance, if the data set has 100 elements distributed between 2 classes in a 60/40 ratio, the first set will contain (60*proportion + 40*proportion) elements and the second will contain the rest. This method is useful for generating training/test sets from one large data set.
    
    Parameters:
    
    proportion - the proportion of elements of each class to keep in the first data set.
    
    Returns:
    
    An array of two elements holding the data sets described above.
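
    A class-preserving (stratified) split of this kind can be sketched by grouping records by output value and taking the same share from each group. This is an illustration only: records are modeled as `List<String>`, the class `StratifiedSplitSketch` is hypothetical, `proportion` is assumed to lie in [0, 1], and the rounding rule and the absence of shuffling are assumptions about details the documentation leaves open.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: each record is a List<String>; the real interface uses List<Attribute>.
final class StratifiedSplitSketch {

    // Returns two lists: index 0 holds about `proportion` of each class, index 1 holds the rest.
    static List<List<List<String>>> split(List<List<String>> rows, int outputIndex,
                                          double proportion) {
        // Group the records by their output class, keeping first-seen class order.
        Map<String, List<List<String>>> byClass = new LinkedHashMap<>();
        for (List<String> row : rows) {
            byClass.computeIfAbsent(row.get(outputIndex), k -> new ArrayList<>()).add(row);
        }
        List<List<String>> first = new ArrayList<>();
        List<List<String>> second = new ArrayList<>();
        for (List<List<String>> group : byClass.values()) {
            // Assumption: per-class counts are rounded to the nearest integer.
            int take = (int) Math.round(group.size() * proportion);
            first.addAll(group.subList(0, take));
            second.addAll(group.subList(take, group.size()));
        }
        return List.of(first, second);
    }
}
```

    For the 60/40 example above with a proportion of 0.75, this sketch puts 45 + 30 = 75 records in the first list and the remaining 25 in the second, so both keep the 60/40 ratio.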
  - getFrequencies
```
java.util.HashMap<Attribute,java.lang.Integer> getFrequencies(int lo,
                                                              int hi,
                                                              int fieldIndex)
```
    Gets a map between the different values of the attribute at the fieldIndex and their respective frequencies. Only records in [lo, hi) are counted.
    
    Parameters:
    
    lo - The lower bound (inclusive) of the range to count
    
    hi - The upper bound (exclusive) of the range to count.
    
    fieldIndex - The field to count
    
    Returns:
    
    a map with the different values and their respective frequencies.
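
    A frequency map like this is the usual input for an entropy calculation. The sketch below counts values over [lo, hi) and then computes Shannon entropy in bits. It is an illustration only: records are modeled as `List<String>`, the class `FrequencySketch` and its `entropy` helper are hypothetical, and how the real implementations compute entropy is not shown in this documentation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: each record is a List<String>; the real interface uses List<Attribute>.
final class FrequencySketch {

    // Counts how often each value of fieldIndex occurs among rows [lo, hi).
    static Map<String, Integer> getFrequencies(List<List<String>> rows, int lo, int hi,
                                               int fieldIndex) {
        Map<String, Integer> freq = new HashMap<>();
        for (List<String> row : rows.subList(lo, hi)) {
            freq.merge(row.get(fieldIndex), 1, Integer::sum);
        }
        return freq;
    }

    // Shannon entropy in bits: H = -sum(p * log2(p)) over the observed values.
    static double entropy(Map<String, Integer> freq) {
        int total = 0;
        for (int count : freq.values()) {
            total += count;
        }
        double h = 0.0;
        for (int count : freq.values()) {
            double p = (double) count / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h; // 0.0 for an empty map or a single observed value
    }
}
```

    Passing the output index as `fieldIndex` gives the class distribution of the range, whose entropy is the quantity that information gain compares before and after a split.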
  - close
```
void close()
```
    Closes the underlying data source if possible.