crowdUser
Class Refiner

java.lang.Object
  extended by crowdUser.Refiner

public class Refiner
extends java.lang.Object

Parses csv-formatted raw results and allows them to be improved by 'get another label', a program designed by P. Ipeirotis (a computer science researcher specializing in crowdsourcing).

In doing so, noise is reduced and the comparisons become more accurate, which ensures a better sort.

More generally, this class parses results so that they can easily be used by a CrowdManager instance, for example by transforming a csv file into a List of String arrays.

Author:
Leo Perrin (perrin.leo@gmail.com)

Field Summary
private  java.lang.Integer columnAnswer
          The column that stores the worker's answer to the question asked.
private  java.lang.Integer columnAxis
          The column in which the axis considered for the comparison is stored.
private  java.lang.Integer columnMedia1
          The column that holds the identifier of the first media.
private  java.lang.Integer columnMedia2
          The column that holds the identifier of the second media.
private  java.lang.String[] columnsCSV
          The labels of the columns in the csv file.
private  java.lang.Integer columnWorkerId
          The column that holds the identifier of the worker who performed the comparison.
private  java.util.Map<java.lang.String,java.lang.Integer> compScores
          A Map containing the score of each comparison: for each HIT, if the worker said the right member was greater, score += 1; otherwise score += -1.
private  java.util.List<java.lang.String[]> data
          A List of arrays of String containing the data currently studied: raw data or processed data, depending on the moment.
 
Constructor Summary
Refiner(java.util.List<java.lang.String> fields)
          Creates a Refiner instance and sets the fields considered in the formatted results.
 
Method Summary
 void csv2List(java.lang.String csvFile)
          Puts the data contained in a csv file in the data attribute.
private  java.lang.String generateCorrectFile()
          Creates the "correct file" used by get another label (it is returned as a String).
private  java.lang.String generateInputFile()
          Creates the "input file" used by 'get another label' (it is returned as a String) and puts the comparisons in the compScores and compNumber Map.
 void getanotherlabel()
          Calls Panos Ipeirotis and his students' get another label to process the results of the HIT.
 java.util.List<java.lang.String[]> getData()
          Returns the data attribute.
private  void getFields(java.util.List<java.lang.String> csvFields)
          Initializes columnsCSV and column* using the content of the 'csvFields' parameter.
private  java.lang.String signature(java.lang.String axis, java.lang.String idMedia1, java.lang.String idMedia2)
          Returns the "signature" of a comparison, a String identifying it without ambiguity.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

columnsCSV

private java.lang.String[] columnsCSV
The labels of the columns in the csv file.


columnAxis

private java.lang.Integer columnAxis
The column in which the axis considered for the comparison is stored.


columnMedia1

private java.lang.Integer columnMedia1
The column that holds the identifier of the first media.


columnMedia2

private java.lang.Integer columnMedia2
The column that holds the identifier of the second media.


columnWorkerId

private java.lang.Integer columnWorkerId
The column that holds the identifier of the worker who performed the comparison.


columnAnswer

private java.lang.Integer columnAnswer
The column that stores the worker's answer to the question asked.


compScores

private java.util.Map<java.lang.String,java.lang.Integer> compScores
A Map containing the score of each comparison: for each HIT, if the worker said the right member was greater, score += 1; otherwise score += -1. Each comparison is identified by its signature.


data

private java.util.List<java.lang.String[]> data
A List of arrays of String containing the data currently studied: raw data or processed data, depending on the moment.

Constructor Detail

Refiner

public Refiner(java.util.List<java.lang.String> fields)
Creates a Refiner instance and sets the fields considered in the formatted results.

It uses the fields contained in the 'fields' parameter to initialize the columnsCSV attribute as well as all the attributes corresponding to a column number (i.e., column[Axis | Media[1|2] | Answer | WorkerId]).

Parameters:
fields - A list containing the fields of the CSV file containing the HIT results.
Method Detail

signature

private java.lang.String signature(java.lang.String axis,
                                   java.lang.String idMedia1,
                                   java.lang.String idMedia2)
Returns the "signature" of a comparison, a String identifying it without ambiguity.

The signature of a comparison is defined as follows: if a comparison has been performed between idMedia1 and idMedia2 along the axis axis, its signature is axis__idMedia1__idMedia2. It is the key under which the result of the comparison is stored in the compScores attribute.

Parameters:
axis - the axis along which the studied comparison is performed
idMedia1 - the identifier of the first media
idMedia2 - the identifier of the second media
Returns:
sig, a String : the signature of a comparison.
See Also:
compScores
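
As an illustration of the format described above, here is a minimal sketch of how such a signature could be built. It is not the actual Refiner source: the class name SignatureSketch and the sample axis and media identifiers are hypothetical, and only the axis__idMedia1__idMedia2 layout comes from this page.

```java
// Illustrative sketch only; the real Refiner.signature is private and its source is not shown here.
final class SignatureSketch {
    // Separator taken from the documented format axis__idMedia1__idMedia2.
    private static final String SEPARATOR = "__";

    static String signature(String axis, String idMedia1, String idMedia2) {
        return axis + SEPARATOR + idMedia1 + SEPARATOR + idMedia2;
    }

    public static void main(String[] args) {
        // Hypothetical axis and media identifiers.
        System.out.println(signature("brightness", "img12", "img47")); // prints brightness__img12__img47
    }
}
```

In this sketch the key depends on the order of the two identifiers, so swapping idMedia1 and idMedia2 yields a different signature.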

getFields

private void getFields(java.util.List<java.lang.String> csvFields)
Initializes columnsCSV and column* using the content of the 'csvFields' parameter.

For each String contained in csvFields, the method checks whether its value is one of the compulsory ones (i.e., idMedia1, idMedia2, WorkerId, axis, and a last, much longer one corresponding to the question asked), in which case the column number is saved in the corresponding attribute.

Although MiscData[1/2] are also compulsory fields, their column numbers are not saved here, to avoid using space unnecessarily.

Parameters:
csvFields - A list containing the fields of the CSV file containing the HIT results.
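
The lookup described above could be sketched as follows. This is an assumption-laden illustration, not the actual implementation: the class FieldLookupSketch and its answerLabel constructor parameter are hypothetical, the header strings are assumed to match the names listed above exactly, and column numbers are assumed to be 0-based.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: not the actual Refiner source.
final class FieldLookupSketch {
    String[] columnsCSV;
    Integer columnAxis, columnMedia1, columnMedia2, columnWorkerId, columnAnswer;

    // Hypothetical: the label of the answer column, i.e. the text of the question asked.
    private final String answerLabel;

    FieldLookupSketch(String answerLabel) {
        this.answerLabel = answerLabel;
    }

    void getFields(List<String> csvFields) {
        columnsCSV = csvFields.toArray(new String[0]);
        for (int i = 0; i < columnsCSV.length; i++) {
            String label = columnsCSV[i];
            if (label.equals(answerLabel)) {
                columnAnswer = i;
                continue;
            }
            switch (label) {
                case "axis":     columnAxis = i;     break;
                case "idMedia1": columnMedia1 = i;   break;
                case "idMedia2": columnMedia2 = i;   break;
                case "WorkerId": columnWorkerId = i; break;
                default:         break; // e.g. MiscData1, MiscData2: position not recorded
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical question text used as the answer column label.
        FieldLookupSketch sketch = new FieldLookupSketch("Which one is brighter?");
        sketch.getFields(Arrays.asList("WorkerId", "axis", "idMedia1", "idMedia2",
                "MiscData1", "MiscData2", "Which one is brighter?"));
        System.out.println("answer column: " + sketch.columnAnswer); // prints answer column: 6
    }
}
```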

csv2List

public void csv2List(java.lang.String csvFile)
Puts the data contained in a csv file in the data attribute.

The data attribute is a List of arrays of String. Each array holds the content of one row of the csv file, and each of its elements is the content of one column of that row.

Parameters:
csvFile - The path to the csv file within the '../data/HITresults/' directory, minus the extension (for instance, results$JOB_ID instead of results$JOB_ID.csv).
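
A minimal sketch of this kind of loading is given below, under stated assumptions. The class CsvToListSketch and its methods resolve and readRows are hypothetical names; the sketch splits on plain commas and does not handle quoted fields, and it keeps every non-empty line, whereas the real method may treat the header row or quoting differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: not the actual Refiner source.
final class CsvToListSketch {
    // Builds the path described above: '../data/HITresults/' + csvFile + '.csv'.
    static Path resolve(String csvFile) {
        return Paths.get("..", "data", "HITresults", csvFile + ".csv");
    }

    // Reads every non-empty line and splits it into columns (naive comma split).
    static List<String[]> readRows(Reader source) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;
                }
                // The -1 limit keeps trailing empty columns.
                rows.add(line.split(",", -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a results file; the values are made up.
        List<String[]> data = readRows(new StringReader("w1,color,m1,m2,1\nw2,color,m1,m2,2\n"));
        System.out.println(data.size() + " rows read");
    }
}
```

To read an actual file with this sketch, the Reader could be obtained with java.nio.file.Files.newBufferedReader(resolve(csvFile)).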

getanotherlabel

public void getanotherlabel()
Calls Panos Ipeirotis and his students' get another label to process the results of the HIT.

This method has three steps:

  1. All the "files" (here, String formatted like files) are generated using the appropriate method, except the costFile which is hard coded.
  2. An ipeirotis.gal.scripts.DawidSkene instance is then created if there are enough HIT results to improve their quality; otherwise, results are taken directly from the correctFile String. Indeed, too little data causes get another label to crash violently.
  3. The results are parsed and put in the data attribute.

For more details on how get another label works, feel free to read its partially commented source files or to browse its absence of documentation.


generateCorrectFile

private java.lang.String generateCorrectFile()
Creates the "correct file" used by get another label (it is returned as a String).

"Get another label" needs a "correct file" in its input. For more details, see this (correct file section).


generateInputFile

private java.lang.String generateInputFile()
Creates the "input file" used by 'get another label' (it is returned as a String) and puts the comparisons in the compScores and compNumber Map.

"Get another label" needs an "input file" in its input. For more details, see this (input file section).

This String, formatted like a file (it contains "\n" at the end of each "line"), contains the raw results of the HITs. The String is actually generated at the end of the "try" block. In this String, results are recorded the way they are in the csv file, i.e., "1" means that the greater of the two media is the first one, and "2" means the contrary.

However, the score of each comparison is calculated in this function for better performance. The score is an integer that starts at 0; 1 is added if media1 is greater than media2, and -1 is added in the opposite case. All the raw results are checked only once, and results are added to the compScores Map attribute as they are checked.
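
The single-pass scoring described above could look roughly like the following sketch. The class ScoreSketch, its scores method, and the explicit column-index parameters are hypothetical; skipping rows whose answer is neither "1" nor "2" is also an assumption, since this page does not say how such rows are handled.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: not the actual Refiner source.
final class ScoreSketch {
    static Map<String, Integer> scores(List<String[]> rows,
                                       int colAxis, int colMedia1, int colMedia2, int colAnswer) {
        Map<String, Integer> compScores = new HashMap<>();
        for (String[] row : rows) {
            int delta;
            if ("1".equals(row[colAnswer])) {
                delta = 1;   // the first media was judged greater
            } else if ("2".equals(row[colAnswer])) {
                delta = -1;  // the second media was judged greater
            } else {
                continue;    // assumption: any other answer is ignored
            }
            // Key in the documented signature format axis__idMedia1__idMedia2.
            String key = row[colAxis] + "__" + row[colMedia1] + "__" + row[colMedia2];
            // A missing key starts from delta, which is equivalent to starting at 0 and adding delta.
            compScores.merge(key, delta, Integer::sum);
        }
        return compScores;
    }

    public static void main(String[] args) {
        // Made-up rows laid out as: axis, idMedia1, idMedia2, answer.
        List<String[]> rows = Arrays.asList(
                new String[] {"color", "m1", "m2", "1"},
                new String[] {"color", "m1", "m2", "1"},
                new String[] {"color", "m1", "m2", "2"});
        System.out.println(scores(rows, 0, 1, 2, 3)); // prints {color__m1__m2=1}
    }
}
```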


getData

public java.util.List<java.lang.String[]> getData()
Returns the data attribute.

Returns:
this.data