crowdUser
Class CrowdManager

java.lang.Object
  extended by crowdUser.CrowdManager

public class CrowdManager
extends java.lang.Object

Serves as an intermediary between the MySQLbase database in which HIT results are stored and CrowdFlower, our crowdsourcing provider.

This class performs several operations. After retrieving unknown comparisons, it submits them to the crowd using CrowdFlower's "self service". When the HITs are finished, it downloads the results from the Antechamber (see the online documentation), saves them, passes them to its Refiner instance (refiner) for parsing, and sends them to dbSQL, the MySQLbase instance in which they are stored.

Once this is done, it uses refiner to reduce the noise in the hard results and thus obtain refined results (more details in the Refiner documentation). These are then sent to dbSQL.

Author:
Leo Perrin (perrin.leo@gmail.com)

Field Summary
private  java.util.List<java.lang.String> axes
          The axes along which each comparison must be performed.
private  java.lang.String cfKey
          The key given by CrowdFlower to authenticate yourself.
private  java.lang.Integer comparisonsNumber
          The number of comparisons performed during the current iteration.
private  MediaBase dbMedia
          The MediaBase instance in which the names of the media are stored.
private  MySQLbase dbSQL
          The MySQLbase instance to which it is connected and in which HIT results must be stored.
private  java.util.List<java.lang.Integer> greater
          The IDs of the media alleged to be "greater".
private  java.lang.Integer originalJobId
          The identifier of the "original" job, i.e. the one from which every characteristic (webhook address, tags, CML configuration...) will be copied into all the jobs created during the current execution of CPS.
private  Refiner refiner
          The Refiner instance used to process the results of the HITs.
private  java.lang.String resultsUrl
          The URL from which the HIT results are downloaded, most likely that of the csv/ folder of the Antechamber.
private  java.util.List<java.lang.Integer> smaller
          The IDs of the media alleged to be "smaller".
 
Constructor Summary
CrowdManager(java.lang.Integer firstJobId, java.lang.String key, java.lang.String resultsUrl, MySQLbase baseSQL, MediaBase baseDIR, java.util.List<java.lang.String> fieldsHardData)
          Creates an instance of the CrowdManager class.
 
Method Summary
 java.lang.Integer createNewJob()
          Creates a new job at CrowdFlower that is a copy of the "original" one (the one with the originalJobId identifier) and returns its identifier.
 java.util.List<java.lang.String> getAxes()
          Returns the value of axes.
 java.lang.Integer getComparisonsNumber()
          Returns the value of comparisonsNumber.
 java.util.List<java.lang.Integer> getGreater()
          Returns the value of greater.
 java.lang.String getHITresults(java.lang.Integer jobId)
          Downloads results from the URL specified in the resultsUrl attribute and returns the path to the saved file minus its extension.
 java.util.List<java.lang.Integer> getSmaller()
          Returns the value of smaller.
 void readHITresults(java.lang.String path)
          Adds the HIT results contained in the CSV file at path to the dbSQL MySQLbase database.
static void thisIsTheEnd(java.lang.String antechamberUrl, java.lang.String results)
          Asks the Antechamber to send an email signaling the end of the Splitsort.
 void uploadHIT(java.lang.Integer jobId)
          Uploads to CrowdFlower the CSV file corresponding to the required comparisons.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

greater

private java.util.List<java.lang.Integer> greater
The IDs of the media alleged to be "greater". They are to be compared with those of the "smaller" attribute through HITs.


smaller

private java.util.List<java.lang.Integer> smaller
The IDs of the media alleged to be "smaller". They are to be compared with those of the "greater" attribute through HITs.


axes

private java.util.List<java.lang.String> axes
The axes along which each comparison must be performed.


comparisonsNumber

private java.lang.Integer comparisonsNumber
The number of comparisons performed during the current iteration.


cfKey

private java.lang.String cfKey
The key given by CrowdFlower to authenticate yourself. If you have an account (and you definitely need one), the key is available from that account.


originalJobId

private java.lang.Integer originalJobId
The identifier of the "original" job, i.e. the one from which every characteristic (webhook address, tags, CML configuration...) will be copied into all the jobs created during the current execution of CPS.


resultsUrl

private java.lang.String resultsUrl
The URL from which the HIT results are downloaded, most likely that of the csv/ folder of the Antechamber.


dbMedia

private MediaBase dbMedia
The MediaBase instance in which the names of the media are stored.


dbSQL

private MySQLbase dbSQL
The MySQLbase instance to which it is connected and in which HIT results must be stored.


refiner

private Refiner refiner
The Refiner instance used to process the results of the HITs.

Constructor Detail

CrowdManager

public CrowdManager(java.lang.Integer firstJobId,
                    java.lang.String key,
                    java.lang.String resultsUrl,
                    MySQLbase baseSQL,
                    MediaBase baseDIR,
                    java.util.List<java.lang.String> fieldsHardData)
Creates an instance of the CrowdManager class. Initializes all of its attributes except greater, smaller and axes. These are set afterward through the corresponding getters, in the spirit of CrowdManagerInstance.getGreater() = someArrayOfIntegers (written literally, that assignment is not valid Java, so the returned list is presumably filled in place).

Parameters:
firstJobId - The identifier of an already configured job in your CrowdFlower account.
key - The key given by CrowdFlower.
resultsUrl - The URL where results$JOB_ID.txt is located, most likely that of the csv/ folder of the Antechamber.
baseSQL - The MySQLbase in which the results must be stored.
baseDIR - The MediaBase in which the names of the sorted media are stored.
fieldsHardData - The fields of the CSV file returned by the Antechamber containing the HITs' hard results.
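
Because a method call cannot be the target of an assignment in Java, one plausible way to populate greater and smaller after construction is to fill the lists returned by the getters. The sketch below illustrates that pattern on a stand-in class; ManagerStandIn and the sample identifiers are hypothetical, and it is only an assumption that the real getters return the internal, mutable lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for CrowdManager, reduced to the two lists of interest.
final class ManagerStandIn {
    private final List<Integer> greater = new ArrayList<>();
    private final List<Integer> smaller = new ArrayList<>();

    // Assumption: the getters expose the internal lists rather than copies.
    List<Integer> getGreater() { return greater; }
    List<Integer> getSmaller() { return smaller; }

    public static void main(String[] args) {
        ManagerStandIn manager = new ManagerStandIn();
        // Pair i of the comparison is (greater[i], smaller[i]).
        manager.getGreater().addAll(Arrays.asList(12, 30));
        manager.getSmaller().addAll(Arrays.asList(7, 19));
        System.out.println(manager.getGreater() + " vs " + manager.getSmaller());
    }
}
```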
Method Detail

createNewJob

public java.lang.Integer createNewJob()
Creates a new job at CrowdFlower that is a copy of the "original" one (the one with the originalJobId identifier) and returns its identifier.

Creates a copy of the original job using an HTTP POST to the address given in CrowdFlower's documentation. When the POST is done, CrowdFlower returns a JSON-formatted string containing, among other things, the identifier of the job.

Returns:
The identifier of the new job.
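
As a rough illustration of the last step, the sketch below pulls a numeric identifier out of a JSON-formatted string. The field name "id", the sample response, and the class and method names are assumptions made for the example; the layout of the actual response is not shown here, and a real implementation would more likely rely on a JSON library than on a regular expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JobIdSketch {
    // Matches a pair such as "id": 123; the field name is an assumption.
    private static final Pattern ID_FIELD = Pattern.compile("\"id\"\\s*:\\s*(\\d+)");

    /** Returns the first numeric "id" found in the response, or null if there is none. */
    static Integer extractJobId(String jsonResponse) {
        Matcher m = ID_FIELD.matcher(jsonResponse);
        return m.find() ? Integer.valueOf(m.group(1)) : null;
    }

    public static void main(String[] args) {
        // Hypothetical response body, not a real CrowdFlower answer.
        String response = "{\"id\": 4242, \"title\": \"copy of the original job\"}";
        System.out.println(extractJobId(response));
    }
}
```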

uploadHIT

public void uploadHIT(java.lang.Integer jobId)
Uploads to CrowdFlower the CSV file corresponding to the required comparisons.

The CSV (comma-separated values) file is first generated. The fields used are NOT set using config.xml; they are hard coded. Indeed, CrowdFlower needs nothing other than the axis along which the comparison is performed, the identifiers of the media (so they can be sent back), their names (so they can be displayed) and a text associated with each one (for example to respect a CC-BY license).

Once generated, it is sent using an HTTP PUT request containing the CSV file, the identifier of the job (jobId) and the key given by CrowdFlower (the cfKey attribute).

Parameters:
jobId - The identifier of the job that will receive the data contained in the csv file.
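
The generation step can be sketched as follows. This is a minimal example, not the code of uploadHIT: the column names, the sample row, and the class and method names are hypothetical, since the hard-coded fields are not listed in this documentation. It only shows how one row per comparison could be written while quoting fields that contain commas, such as a license notice.

```java
import java.util.ArrayList;
import java.util.List;

final class ComparisonCsvSketch {
    /** Quotes a field when it contains a comma, a quote or a line break. */
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Joins the fields of one row with commas after escaping each of them. */
    static String row(String... fields) {
        List<String> escaped = new ArrayList<>();
        for (String f : fields) {
            escaped.add(escape(f));
        }
        return String.join(",", escaped);
    }

    public static void main(String[] args) {
        StringBuilder csv = new StringBuilder();
        // Hypothetical header: axis, media identifiers, media names, associated texts.
        csv.append(row("axis", "greater_id", "smaller_id", "greater_name",
                       "smaller_name", "greater_text", "smaller_text")).append('\n');
        csv.append(row("sharpness", "12", "7", "a.jpg", "b.jpg",
                       "Photo by A, CC-BY", "Photo by B, CC-BY")).append('\n');
        System.out.print(csv);
    }
}
```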

getHITresults

public java.lang.String getHITresults(java.lang.Integer jobId)
Downloads results from the URL specified in the resultsUrl attribute and returns the path to it minus its extension (e.g. '../data/HITresults/results42' for the results stored in the '../data/HITresults/results42.csv' file).

First, you have to make sure the results are at the given address: was the correct webhook sent to the Antechamber? Is the processing of the raw results finished? An email is sent when both of these operations are finished.

If so, use this method to download the results$JOB_ID.txt file containing the answers of the workers and to save it as ../data/results/results$JOB_ID.csv.

Parameters:
jobId - The identifier of the job used to obtain comparisons. It is given by CrowdFlower's web application when a new job is created, and it is also returned by the createNewJob() method.
Returns:
The path to the csv file containing the results of the job.
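
The naming scheme and the download itself can be sketched with the standard library alone. The class and method names below are hypothetical and the base URL in main is a placeholder; the sketch assumes that the remote file is named results$JOB_ID.txt directly under resultsUrl, as described above, and it does not reproduce any check the real method may perform.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class ResultPathSketch {
    /** Builds the remote address of results$JOB_ID.txt under the given base URL. */
    static String remoteUrl(String resultsUrl, int jobId) {
        String base = resultsUrl.endsWith("/") ? resultsUrl : resultsUrl + "/";
        return base + "results" + jobId + ".txt";
    }

    /** Removes the extension of the file name, leaving the directories untouched. */
    static String stripExtension(String path) {
        int dot = path.lastIndexOf('.');
        int slash = path.lastIndexOf('/');
        return dot > slash ? path.substring(0, dot) : path;
    }

    /** Copies the remote file to target, replacing any previous download. */
    static void download(String url, Path target) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) {
        // Placeholder address; no network access is performed here.
        System.out.println(remoteUrl("http://example.org/csv", 42));
        System.out.println(stripExtension("../data/HITresults/results42.csv"));
    }
}
```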

readHITresults

public void readHITresults(java.lang.String path)
Adds the HIT results contained in the CSV file at path to the dbSQL MySQLbase database.

The hard results are sent to refiner so that it can parse them. They are then sent to dbSQL to be stored. At this point, dbSQL contains all of the results obtained.

After that, the results are refined by refiner in order to reduce the noise; see the Refiner.getanotherlabel method for more details on how it is done. Again, when this is over, the refined results are stored in dbSQL.

Parameters:
path - The path to the CSV file containing the HIT results.
See Also:
Refiner
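
To give an intuition of what reducing the noise in crowd answers can mean, the sketch below aggregates several workers' answers to the same comparison with a plain majority vote. This is a deliberately simpler technique swapped in for illustration; it is not the method implemented by Refiner, which is documented separately, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class MajorityVoteSketch {
    /**
     * Returns true when strictly more than half of the workers confirmed the
     * alleged order of the two media; a tie counts as not confirmed.
     */
    static boolean confirmed(List<Boolean> workerAnswers) {
        int yes = 0;
        for (boolean answer : workerAnswers) {
            if (answer) {
                yes++;
            }
        }
        return 2 * yes > workerAnswers.size();
    }

    public static void main(String[] args) {
        // Three workers judged the same pair; one of them disagreed.
        System.out.println(confirmed(Arrays.asList(true, true, false)));
    }
}
```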

thisIsTheEnd

public static void thisIsTheEnd(java.lang.String antechamberUrl,
                                java.lang.String results)
Asks the Antechamber to send an email signaling the end of the Splitsort.

Uses an HTTP POST, built with java.net, to the correct URL to ask the Antechamber to send an e-mail containing the identifier of this last job, information on how to retrieve the results, and the content of the results String as an attachment.

Parameters:
antechamberUrl - The URL to which the information must be posted.
results - Content to be sent as an attachment file along with the e-mail. The name it will have is set in the Antechamber.
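
A form-encoded POST with java.net might look like the sketch below. The parameter names job_id and results, the class name and the placeholder values are assumptions; the fields the Antechamber actually expects are not listed in this documentation. The sketch requires Java 10 or later for the URLEncoder overload that takes a Charset.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class AntechamberPostSketch {
    /** Encodes the parameters as an application/x-www-form-urlencoded body. */
    static String formBody(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
                .append('=')
                .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    /** Sends the body with an HTTP POST and returns the response status code. */
    static int post(String url, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("job_id", "42");              // hypothetical field name
        params.put("results", "sorted: a, b");   // hypothetical field name
        // Only the body is printed; calling post(...) would need a reachable URL.
        System.out.println(formBody(params));
    }
}
```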

getGreater

public java.util.List<java.lang.Integer> getGreater()
Returns the value of greater.

Returns:
greater

getSmaller

public java.util.List<java.lang.Integer> getSmaller()
Returns the value of smaller.

Returns:
smaller

getAxes

public java.util.List<java.lang.String> getAxes()
Returns the value of axes.

Returns:
axes

getComparisonsNumber

public java.lang.Integer getComparisonsNumber()
Returns the value of comparisonsNumber.

Returns:
comparisonsNumber