0.8
Sorting media using crowdsourcing.   
LIRIS

crowdUser.CrowdManager Class Reference

Serves as an intermediary between the MySQLbase database in which HIT results are stored and CrowdFlower, our crowdsourcing provider.

Public Member Functions

 CrowdManager (Integer firstJobId, String key, String resultsUrl, MySQLbase baseSQL, MediaBase baseDIR, List< String > fieldsHardData)
 Creates an instance of the CrowdManager class.
Integer createNewJob ()
 Creates a new job at CrowdFlower that is a copy of the "original" one (the one with the originalJobId identifier) and returns its identifier.
void uploadHIT (Integer jobId)
 Uploads to CrowdFlower the CSV file describing the required comparisons.
String getHITresults (Integer jobId)
 Downloads results from the URL specified in the resultsUrl attribute and returns the path to the downloaded file minus its extension (e.g. '../data/HITresults/results42' for the results stored in the '../data/HITresults/results42.csv' file).
void readHITresults (String path)
 Adds the HIT results contained in the csv file in path to the dbSQL myDataBases.MySQLbase database.
List< Integer > getGreater ()
 Returns the value of greater.
List< Integer > getSmaller ()
 Returns the value of smaller.
List< String > getAxes ()
 Returns the value of axes.
Integer getComparisonsNumber ()
 Returns the value of comparisonsNumber.

Static Public Member Functions

static void thisIsTheEnd (String antechamberUrl, String results)
 Asks the Antechamber to send an email signaling the end of the Splitsort.

Private Attributes

List< Integer > greater
 The IDs of the media alleged to be "greater".
List< Integer > smaller
 The IDs of the media alleged to be "smaller".
List< String > axes
 The axes along which each comparison must be performed.
Integer comparisonsNumber
 The number of comparisons performed during the current iteration.
String cfKey
 The key given by CrowdFlower to authenticate yourself.
Integer originalJobId
 The identifier of the "original" job, i.e. the one whose characteristics (webhook address, tags, CML configuration...) are copied to every job created during the current execution of CPS.
String resultsUrl
 The URL from which the HIT results are downloaded.
MediaBase dbMedia
 The MediaBase instance in which the names of the media are stored.
MySQLbase dbSQL
 The MySQLbase instance to which it is connected and in which HIT results must be stored.
Refiner refiner
 The Refiner instance used to treat the result of the HITs.

Detailed Description

Serves as an intermediary between the MySQLbase database in which HIT results are stored and CrowdFlower, our crowdsourcing provider.

This class performs several operations: after retrieving the unknown comparisons, it submits them to the crowd using CrowdFlower's "self service". When the HITs are finished, it downloads the results from the Antechamber (see the online documentation), saves them, passes them to its Refiner instance (refiner) for parsing, and sends them to dbSQL, a myDataBases.MySQLbase instance in which they are stored.

Once this is done, it uses refiner to reduce the noise in the hard results and thus to obtain refined results (more details in the Refiner documentation). These are then sent to dbSQL.

Author:
Leo Perrin (perrin.leo@gmail.com)

Definition at line 39 of file CrowdManager.java.


Constructor & Destructor Documentation

crowdUser.CrowdManager.CrowdManager ( Integer  firstJobId,
String  key,
String  resultsUrl,
MySQLbase  baseSQL,
MediaBase  baseDIR,
List< String >  fieldsHardData 
)

Creates an instance of the CrowdManager class.

Initializes all of its attributes. The greater, smaller and axes attributes start as empty lists; they are filled afterward through the lists returned by their getters, for example crowdManagerInstance.getGreater().addAll(someListOfIntegers).

Parameters:
firstJobId  The identifier of a job in your CrowdFlower account that is already configured.
key  The key given by CrowdFlower.
resultsUrl  The URL where results$JOB_ID.txt is located, most likely that of the csv/ folder of the Antechamber.
baseSQL  The MySQLbase in which the results must be stored.
baseDIR  The MediaBase in which the names of the sorted media are stored.
fieldsHardData  The fields of the CSV file returned by the Antechamber containing the HITs' hard results.

Definition at line 110 of file CrowdManager.java.

  {
        this.comparisonsNumber = 0;
        this.resultsUrl = resultsUrl;
        this.greater = new ArrayList<Integer>();
        this.smaller = new ArrayList<Integer>();
        this.axes = new ArrayList<String>();
        this.originalJobId = firstJobId;
        this.cfKey = key;
        this.dbSQL = baseSQL;
        this.dbMedia = baseDIR ;
        this.refiner = new Refiner(fieldsHardData);
  }
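
Because the getters return the lists themselves, a caller can queue comparisons by appending to the returned lists. The sketch below illustrates this with a hypothetical stand-in class, ComparisonLists, which mirrors only the three list attributes and their getters. It is not the actual CrowdManager, and the addComparison helper is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring only the three list attributes of CrowdManager.
final class ComparisonLists {
    private final List<Integer> greater = new ArrayList<Integer>();
    private final List<Integer> smaller = new ArrayList<Integer>();
    private final List<String> axes = new ArrayList<String>();

    List<Integer> getGreater() { return this.greater; }
    List<Integer> getSmaller() { return this.smaller; }
    List<String> getAxes() { return this.axes; }

    // Illustrative helper (not part of the original class): queues one comparison
    // by appending to the three parallel lists through their getters.
    static void addComparison(ComparisonLists lists, String axis, int greaterId, int smallerId) {
        lists.getAxes().add(axis);
        lists.getGreater().add(greaterId);
        lists.getSmaller().add(smallerId);
    }
}
```

The three lists are parallel: entry i of axes, greater and smaller together describe one comparison, so they should always be appended to together.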

Member Function Documentation

Integer crowdUser.CrowdManager.createNewJob ( )

Creates a new job at CrowdFlower that is a copy of the "original" one (the one with the originalJobId identifier) and returns its identifier.

Creates a copy of the original job using an HTTP POST to the address given in CrowdFlower's documentation. When the POST is done, CrowdFlower returns a JSON-formatted string containing, among other things, the identifier of the job.

Returns:
The identifier of the new job.

Definition at line 139 of file CrowdManager.java.

  {
        Integer id = 0;

        String postUrl = "http://api.crowdflower.com/v1/jobs/" + this.originalJobId
                  + "/copy.json?key=" + this.cfKey ;
        // posting the request
        try
        {
              URL url = new URL(postUrl);
              HttpURLConnection conn = (HttpURLConnection) url.openConnection();
              // Building an HTTP request
              conn.setDoOutput(true);
              conn.setRequestMethod("POST");
              conn.connect();
              // getting the answer from CrowdFlower and reading the "id" field
              BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
              String line, idString = "";
              while ((line = rd.readLine()) != null)
                    if (line.contains('"'+"id"+'"'))
                          // we go to the index of ' "id": ' and read the number. In order not to include '"id":', 5 is added.
                          for (int i = line.lastIndexOf('"' +"id"+'"')+5; i<line.length(); i++)
                                if ( line.charAt(i) == ',' )
                                      break;
                                else
                                      idString += line.charAt(i) ;
              // trim() guards against whitespace between the colon and the number
              id = Integer.parseInt(idString.trim());
              rd.close();
        } catch (Exception e) { e.printStackTrace(); }

        return id;
  }
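
The character-by-character scan above depends on the exact layout of the response. As a hedged alternative, the sketch below extracts the identifier with a regular expression that tolerates whitespace around the colon. JobIdParser and extractId are hypothetical names, not part of CrowdManager. Unlike the listing, which uses lastIndexOf, this sketch returns the first "id" field it finds, and a real JSON parser would be more robust than either approach.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original class.
final class JobIdParser {
    // Matches the key "id" followed by a colon and an integer, tolerating whitespace.
    private static final Pattern ID_FIELD = Pattern.compile("\"id\"\\s*:\\s*(\\d+)");

    // Returns the first "id" value found in the response, or -1 if there is none.
    static int extractId(String json) {
        Matcher m = ID_FIELD.matcher(json);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }
}
```

Returning -1 when no identifier is found lets the caller distinguish a failed copy from a valid job, whereas the listing above returns 0 in that case.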
List<String> crowdUser.CrowdManager.getAxes ( )

Returns the value of axes.

Returns:
axes

Definition at line 382 of file CrowdManager.java.

  { return this.axes; }
Integer crowdUser.CrowdManager.getComparisonsNumber ( )

Returns the value of comparisonsNumber.

Returns:
comparisonsNumber

Definition at line 390 of file CrowdManager.java.

  { return this.comparisonsNumber; }
List<Integer> crowdUser.CrowdManager.getGreater ( )

Returns the value of greater.

Returns:
greater

Definition at line 366 of file CrowdManager.java.

  { return this.greater ; }
String crowdUser.CrowdManager.getHITresults ( Integer  jobId)

Downloads results from the URL specified in the resultsUrl attribute and returns the path to the downloaded file minus its extension (e.g. '../data/HITresults/results42' for the results stored in the '../data/HITresults/results42.csv' file).

First, make sure the results are available at the given address: was the correct webhook sent to the Antechamber? Has the processing of the raw results finished? An email is sent once both operations have completed.

If so, use this method to download the results$JOB_ID.txt file containing the workers' answers and save it as ../data/HITresults/results$JOB_ID.csv.

Parameters:
jobId  The identifier of the job used to obtain comparisons. It is given by CrowdFlower's web application when a new job is created, and is also returned by the createNewJob() method.
Returns:
The path to the csv file containing the results of the job.

Definition at line 259 of file CrowdManager.java.

  {
        try
        {
              // creating connection
              String urlString = this.resultsUrl + "csv/" + jobId + ".txt";
              System.out.println(urlString);
              URL url = new URL(urlString);
              System.out.println("Opening connection to " + urlString + "...");
              // reading input
              InputStream is = url.openStream();
              System.out.flush();
              // creating results$JOB_ID.csv file
              FileOutputStream fos=null;
              fos = new FileOutputStream("../data/HITresults/results" + jobId + ".csv");
              // writing to file
              int oneChar, count=0;
              while ((oneChar=is.read()) != -1)
              {
                    fos.write(oneChar);
                    count++;
              }
              // close everything
              is.close();
              fos.close();
              System.out.println("csv results file downloaded, " + count + " byte(s) copied");
        }
        catch (Exception e) { e.printStackTrace(); }
        // return the name of the file minus ".csv"
        return "results" + jobId ;
  }
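
The listing copies the response one byte at a time and leaves both streams open if an exception occurs. A minimal sketch of the same copy step using java.nio.file.Files.copy and try-with-resources follows. ResultsSaver and save are hypothetical names introduced here for illustration, not part of CrowdManager.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the original class.
final class ResultsSaver {
    // Copies everything readable from 'in' into 'target', replacing any
    // existing file, and returns the number of bytes written.
    static long save(InputStream in, Path target) throws IOException {
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Under these assumptions, the download step could be written as ResultsSaver.save(url.openStream(), Paths.get("../data/HITresults/results" + jobId + ".csv")), and the returned count replaces the manual byte counter.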
List<Integer> crowdUser.CrowdManager.getSmaller ( )

Returns the value of smaller.

Returns:
smaller

Definition at line 374 of file CrowdManager.java.

  { return this.smaller; }
void crowdUser.CrowdManager.readHITresults ( String  path)

Adds the HIT results contained in the csv file in path to the dbSQL myDataBases.MySQLbase database.

The hard results are sent to refiner so that it can parse them. They are then sent to dbSQL to be stored. At this point, dbSQL contains all of the results obtained.

After that, the results are refined by refiner in order to reduce the noise; see the Refiner.getanotherlabel method for more details on how this is done. Again, when this is over, the refined results are stored in dbSQL.

Parameters:
path  The path to the CSV file containing the HIT results.
See also:
Refiner

Definition at line 306 of file CrowdManager.java.

  {
        List<String[]> entries ;
        // retrieve hard data from the csv file
        this.refiner.csv2List(path) ;
        entries = this.refiner.getData() ;
        this.dbSQL.insertHardDataInSQL(entries);
        // retrieve good data from the refiner
        this.refiner.getanotherlabel() ;
        entries = this.refiner.getData() ;
        this.dbSQL.insertResultsInSQL(entries,this.dbMedia) ;
  }
static void crowdUser.CrowdManager.thisIsTheEnd ( String  antechamberUrl,
String  results 
) [static]

Asks the Antechamber to send an email signaling the end of the Splitsort.

Uses an HTTP POST (via java.net) to the given URL to ask the Antechamber to send an e-mail containing the identifier of this last job, information on how to retrieve the results, and the content of the results String as an attachment.

Parameters:
antechamberUrl  The URL to which the information must be posted.
results  Content to be sent as an attachment file along with the e-mail. The name it will have is set in the Antechamber.

Definition at line 333 of file CrowdManager.java.

  {
        try
        {
              URL url = new URL(antechamberUrl);
              HttpURLConnection conn = (HttpURLConnection) url.openConnection();
              // Building an HTTP request
              conn.setDoOutput(true);
              conn.setRequestMethod("POST");
              OutputStreamWriter osw = new OutputStreamWriter(conn.getOutputStream());
              String data = URLEncoder.encode("signal", "UTF-8") + "=" + URLEncoder.encode("sort_finished","UTF-8")
                  + "&" + URLEncoder.encode("payload", "UTF-8") + "=" + URLEncoder.encode(results, "UTF-8");
              osw.write(data);
              osw.flush();
              osw.close();
              // getting the answer from the Antechamber
              BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
              String line;
              while ((line = rd.readLine()) != null) {
                  System.out.println(line);
              }
              rd.close();
        } catch (Exception e) { e.printStackTrace(); }
  }
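
The body of the POST is an application/x-www-form-urlencoded string with two fields, signal and payload. The sketch below isolates that encoding step so it can be checked without a network connection. EndSignal and formBody are hypothetical names, not part of CrowdManager.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of the original class.
final class EndSignal {
    // Builds the form-encoded body: signal=sort_finished&payload=<encoded results>.
    static String formBody(String results) {
        try {
            return URLEncoder.encode("signal", "UTF-8") + "=" + URLEncoder.encode("sort_finished", "UTF-8")
                    + "&" + URLEncoder.encode("payload", "UTF-8") + "=" + URLEncoder.encode(results, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }
}
```

Encoding the payload matters because the results String may contain characters such as '&' or '=' that would otherwise be read as field separators.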
void crowdUser.CrowdManager.uploadHIT ( Integer  jobId)

Uploads to CrowdFlower the CSV file describing the required comparisons.

The CSV (comma-separated values) file is generated first. The fields used are NOT read from config.xml; they are hard-coded. Indeed, CrowdFlower needs nothing more than the axis along which the comparison is performed, the identifiers of the media (so they can be sent back), their names (so they can be displayed) and a text associated with each one (for example to comply with a CC-BY license).

Once generated, it is sent using an HTTP PUT request containing the CSV file, the identifier of the job (jobId) and the key given by CrowdFlower (the cfKey attribute).

Parameters:
jobId  The identifier of the job that will receive the data contained in the CSV file.

Definition at line 187 of file CrowdManager.java.

  {
        this.comparisonsNumber += this.greater.size() ;
        String post = "";
        // writing the .csv file and the post String
        try{
              BufferedWriter fichier = new BufferedWriter
                              (new FileWriter("../data/HITdata/" + jobId + ".csv"));
              String line = "axis,idMedia1,idMedia2,urlMedia1,urlMedia2,miscData1,miscData2";
              fichier.write(line);
              fichier.newLine();
              post += line ;
              for (int i=0; i<this.greater.size(); i++)
              {
                    line = this.axes.get(i) + "," + this.greater.get(i) + "," + this.smaller.get(i) +
                        "," + this.dbMedia.getMedia(this.greater.get(i)) +
                        "," + this.dbMedia.getMedia(this.smaller.get(i)) +
                        ",\"" + this.dbMedia.getContent().get(this.greater.get(i)).get("miscData") +
                        "\",\"" + this.dbMedia.getContent().get(this.smaller.get(i)).get("miscData") + "\"";
                    fichier.write(line);
                    fichier.newLine();
                    post += "\n" + line ;
              }
              fichier.close();
            
        }
        catch (Exception e) { e.printStackTrace(); }
        
        // posting the file

        try
        {
              URL url = new URL("http://api.crowdflower.com/v1/jobs/" + jobId +
                        "/upload.json?key=" + this.cfKey);
              HttpURLConnection conn = (HttpURLConnection) url.openConnection();
              // Building an HTTP request
              conn.setDoOutput(true);
              conn.setRequestMethod("PUT");
              conn.setRequestProperty("Content-Type", "text/csv");
              OutputStreamWriter osw = new OutputStreamWriter(conn.getOutputStream());
              osw.write(post);
              osw.flush();
              osw.close();
              
              // getting the answer from CrowdFlower
              BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
              String line;
              while ((line = rd.readLine()) != null) {
                  System.out.println(line);
              }
              
              rd.close();
        } catch (Exception e) { e.printStackTrace(); }

  }
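
The listing wraps each miscData value in double quotes but does not escape quotes inside the value, so a text such as a license notice containing a quotation mark would yield a malformed row. The sketch below shows one way to build a row with embedded quotes doubled, following the usual CSV convention (RFC 4180). HitCsv, quote and row are hypothetical names, and whether CrowdFlower's upload endpoint follows that convention is an assumption, not something the original documentation states.

```java
// Hypothetical helper, not part of the original class.
final class HitCsv {
    // Header line used by uploadHIT in the listing above.
    static final String HEADER = "axis,idMedia1,idMedia2,urlMedia1,urlMedia2,miscData1,miscData2";

    // Wraps a field in double quotes and doubles any embedded double quote.
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Builds one data row in the same column order as HEADER.
    static String row(String axis, int id1, int id2, String url1, String url2,
                      String misc1, String misc2) {
        return axis + "," + id1 + "," + id2 + "," + url1 + "," + url2
                + "," + quote(misc1) + "," + quote(misc2);
    }
}
```

As in the listing, only the two miscData columns are quoted; the axis and media names are assumed not to contain commas or quotes.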

Member Data Documentation

List<String> crowdUser.CrowdManager.axes [private]

The axes along which each comparison must be performed.

Definition at line 58 of file CrowdManager.java.

String crowdUser.CrowdManager.cfKey [private]

The key given by CrowdFlower to authenticate yourself.

If you have an account (and you definitely need one), it is available from your CrowdFlower account.

Definition at line 67 of file CrowdManager.java.

Integer crowdUser.CrowdManager.comparisonsNumber [private]

The number of comparisons performed during the current iteration.

Definition at line 62 of file CrowdManager.java.

MediaBase crowdUser.CrowdManager.dbMedia [private]

The MediaBase instance in which the names of the media are stored.

Definition at line 82 of file CrowdManager.java.

MySQLbase crowdUser.CrowdManager.dbSQL [private]

The MySQLbase instance to which it is connected and in which HIT results must be stored.

Definition at line 86 of file CrowdManager.java.

List<Integer> crowdUser.CrowdManager.greater [private]

The IDs of the media alleged to be "greater".

They are to be compared with those of the "smaller" attribute through HITs.

Definition at line 49 of file CrowdManager.java.

Integer crowdUser.CrowdManager.originalJobId [private]

The identifier of the "original" job, i.e. the one whose characteristics (webhook address, tags, CML configuration...) are copied to every job created during the current execution of CPS.

Definition at line 74 of file CrowdManager.java.

Refiner crowdUser.CrowdManager.refiner [private]

The Refiner instance used to treat the result of the HITs.

Definition at line 90 of file CrowdManager.java.

String crowdUser.CrowdManager.resultsUrl [private]

The URL from which the HIT results are downloaded.

Definition at line 78 of file CrowdManager.java.

List<Integer> crowdUser.CrowdManager.smaller [private]

The IDs of the media alleged to be "smaller".

They are to be compared with those of the "greater" attribute through HITs.

Definition at line 54 of file CrowdManager.java.


The documentation for this class was generated from the following file:
CrowdManager.java