Package libai.classifiers.dataset
Class MySQLDataSet
- java.lang.Object
-
- libai.classifiers.dataset.MySQLDataSet
-
-
Constructor Summary
MySQLDataSet(java.sql.Connection conn, java.lang.String tableName, int output)
-
Method Summary
Modifier and Type, Method, Description
Attribute allTheSame() - Returns the most common output attribute if the rest of the attributes are exactly the same over the whole data set.
boolean allTheSameOutput()
void clean()
void close() - Closes the underlying data source if possible.
java.util.HashMap<Attribute,java.lang.Integer> getFrequencies(int lo, int hi, int fieldIndex) - Gets a map between the different values of the attribute at the fieldIndex and their respective frequencies.
int getItemsCount()
MetaData getMetaData()
int getOutputIndex()
DataSet getSubset(int lo, int hi) - Creates a slice of this data set as a new data set from [lo, hi).
java.util.Iterator<java.util.List<Attribute>> iterator()
java.lang.Iterable<java.util.List<Attribute>> sortOver(int fieldIndex) - Sorts the data set over the fieldIndex as primary key, using the output index to break any ties.
java.lang.Iterable<java.util.List<Attribute>> sortOver(int lo, int hi, int fieldIndex) - Sorts the data set over the fieldIndex as primary key, using the output index to break any ties, and limits the elements to [lo, hi).
DataSet[] splitKeepingRelation(double proportion) - Splits this data set into two new data sets in which the proportion between the output classes is kept.
java.lang.String toString()
-
-
-
Method Detail
-
getSubset
public DataSet getSubset(int lo, int hi)
Description copied from interface: DataSet
Create a slice of this data set as a new data set from [lo, hi).
-
getOutputIndex
public int getOutputIndex()
- Specified by:
getOutputIndex in interface DataSet
- Returns:
- The index of the field used as output.
-
getItemsCount
public int getItemsCount()
- Specified by:
getItemsCount in interface DataSet
- Returns:
- The total count of elements in this data set.
-
getMetaData
public MetaData getMetaData()
- Specified by:
getMetaData in interface DataSet
- Returns:
- A MetaData object containing information about the attributes on the data set.
-
sortOver
public java.lang.Iterable<java.util.List<Attribute>> sortOver(int fieldIndex)
Description copied from interface: DataSet
Sorts the data set over the fieldIndex as primary key, using the output index to break any ties. This index is remembered for future internal references.
-
sortOver
public java.lang.Iterable<java.util.List<Attribute>> sortOver(int lo, int hi, int fieldIndex)
Description copied from interface: DataSet
Sorts the data set over the fieldIndex as primary key, using the output index to break any ties, and limits the elements to [lo, hi). This index is remembered for future internal references.
- Specified by:
sortOver in interface DataSet
- Parameters:
lo - The lower bound (inclusive) of the data set to be returned.
hi - The upper bound (exclusive) of the data set to be returned.
fieldIndex - The field to sort over.
- Returns:
- An iterable representation of this data set, sorted over the field index and limited to [lo, hi).
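The ordering described above (primary key, output index as tie-breaker, half-open range) can be illustrated on an in-memory list. This is only a sketch and not the MySQLDataSet implementation: the class name SortOverSketch is hypothetical, rows are plain lists of strings rather than Attribute objects, and applying [lo, hi) to the positions of the already sorted rows is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: rows are plain lists of strings, not libai Attribute objects.
final class SortOverSketch {
    // Sorts by fieldIndex, breaks ties with outputIndex, then keeps positions
    // [lo, hi) of the sorted order. Applying the range after sorting is an assumption.
    static List<List<String>> sortOver(List<List<String>> rows, int lo, int hi,
                                       int fieldIndex, int outputIndex) {
        Comparator<List<String>> byField =
                Comparator.comparing((List<String> row) -> row.get(fieldIndex));
        Comparator<List<String>> order =
                byField.thenComparing((List<String> row) -> row.get(outputIndex));
        List<List<String>> sorted = new ArrayList<>(rows); // leave the input untouched
        sorted.sort(order);
        return new ArrayList<>(sorted.subList(lo, hi));
    }
}
```

Because the upper bound is exclusive, a call with lo = 0 and hi = 3 yields exactly three rows.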
-
splitKeepingRelation
public DataSet[] splitKeepingRelation(double proportion)
Description copied from interface: DataSet
Splits this data set into two new data sets in which the proportion between the output classes is kept. The first data set contains the given proportion of the original data set. For instance, if the data set has 100 elements distributed between 2 classes in a 60/40 proportion, the first set will contain (60*proportion + 40*proportion) elements and the second data set will contain the rest. This method is useful for generating training/test sets from one massive data set.
- Specified by:
splitKeepingRelation in interface DataSet
- Parameters:
proportion - The proportion of the elements of each class to keep in the first data set.
- Returns:
- An array of 2 positions with the data sets as described above.
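The class-preserving split described above can be sketched with plain collections. This is an illustration under stated assumptions, not the library's code: StratifiedSplitSketch is a hypothetical name, each element is reduced to its class label, and rounding the per-class count to the nearest integer is an assumption because the documentation does not say how fractional counts are handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: each element is represented by its class label alone.
final class StratifiedSplitSketch {
    // Returns two lists; the first receives about `proportion` of every class.
    // Rounding to the nearest integer is an assumption, not documented behavior.
    static List<List<String>> split(List<String> labels, double proportion) {
        // Group the elements by class, keeping first-seen class order.
        Map<String, List<String>> byClass = new LinkedHashMap<>();
        for (String label : labels) {
            byClass.computeIfAbsent(label, k -> new ArrayList<>()).add(label);
        }
        List<String> first = new ArrayList<>();
        List<String> second = new ArrayList<>();
        for (List<String> group : byClass.values()) {
            int keep = (int) Math.round(group.size() * proportion);
            first.addAll(group.subList(0, keep));
            second.addAll(group.subList(keep, group.size()));
        }
        List<List<String>> result = new ArrayList<>();
        result.add(first);
        result.add(second);
        return result;
    }
}
```

With 60 elements of one class, 40 of another, and a proportion of 0.75, the first list holds 45 + 30 = 75 elements and the second holds the remaining 25, so both keep the 60/40 relation.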
-
iterator
public java.util.Iterator<java.util.List<Attribute>> iterator()
- Specified by:
iterator in interface java.lang.Iterable<java.util.List<Attribute>>
-
clean
public void clean()
-
allTheSameOutput
public boolean allTheSameOutput()
- Specified by:
allTheSameOutput in interface DataSet
- Returns:
- True if all the classes (value of output index) are the same.
-
allTheSame
public Attribute allTheSame()
Description copied from interface: DataSet
Returns the most common output attribute if the rest of the attributes are exactly the same over the whole data set. If there is one single record with one single attribute different from the rest, then this method will return null.
- Specified by:
allTheSame in interface DataSet
- Returns:
- The most common output attribute or null if there is one record different from the rest.
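The contract above can be made concrete with a small in-memory sketch. It is not the MySQLDataSet implementation: AllTheSameSketch is a hypothetical name, rows are plain lists of strings, and both the null result for an empty data set and the handling of ties between equally frequent outputs are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: rows are plain lists of strings, not libai Attribute objects.
final class AllTheSameSketch {
    // Returns the most frequent value at outputIndex when every other field is
    // identical across all rows; returns null as soon as one field differs.
    static String allTheSame(List<List<String>> rows, int outputIndex) {
        if (rows.isEmpty()) {
            return null; // assumption: the documentation does not cover empty data sets
        }
        List<String> reference = rows.get(0);
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> row : rows) {
            for (int i = 0; i < row.size(); i++) {
                if (i != outputIndex && !row.get(i).equals(reference.get(i))) {
                    return null;
                }
            }
            counts.merge(row.get(outputIndex), 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) { // ties resolve arbitrarily in this sketch
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }
}
```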
-
getFrequencies
public java.util.HashMap<Attribute,java.lang.Integer> getFrequencies(int lo, int hi, int fieldIndex)
Description copied from interface: DataSet
Gets a map between the different values of the attribute at the fieldIndex and their respective frequencies. It limits the count space to [lo, hi).
- Specified by:
getFrequencies in interface DataSet
- Parameters:
lo - The lower bound (inclusive) of the data set to be returned.
hi - The upper bound (exclusive) of the data set to be returned.
fieldIndex - The field to count.
- Returns:
- A map with the different values and their respective frequencies.
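The counting over a half-open range can be sketched as follows. This is an illustration only: FrequenciesSketch is a hypothetical name, rows are plain lists of strings rather than Attribute objects, and positions refer to the order of the list passed in, which may not match how MySQLDataSet orders its rows internally.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: rows are plain lists of strings, not libai Attribute objects.
final class FrequenciesSketch {
    // Counts how often each value of the field at fieldIndex occurs in rows [lo, hi).
    static Map<String, Integer> getFrequencies(List<List<String>> rows,
                                               int lo, int hi, int fieldIndex) {
        Map<String, Integer> frequencies = new HashMap<>();
        for (int i = lo; i < hi; i++) { // hi itself is excluded
            frequencies.merge(rows.get(i).get(fieldIndex), 1, Integer::sum);
        }
        return frequencies;
    }
}
```

Rows outside [lo, hi) never contribute to the counts, so a value that appears only at position hi or later is absent from the map.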
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
-
-