Data Challenge

Guidance to Data Challenge at PHMAP 2021
Data Challenge Committee

  1. Introduction
    The Data Challenge at PHMAP 2021 is a competition open to all potential attendees. This year, participants are given a rich time-series dataset and are required to carry out regression and classification, as described in Section 7.
    • Participants are encouraged to enroll in the conference to see the presentations of the three winning data challengers, as well as many paper presentations on data analytics and diagnostics.
    • In this Data Challenge at PHMAP 2021, participants are encouraged to apply state-of-the-art algorithms and models to regression and classification problems that arise in real-world industrial settings. Exclusive access to a rich dataset has been provided for this competition in collaboration with SK Telecom.
    • Data were acquired under a variety of conditions: not only fault-free operation, but also faults seeded under controlled conditions with the support of domain experts. Hence, the dataset comprises signals in six categories: Normal, Unbalance, Belt-looseness, Belt-looseness (High), Bearing fault and Bolt fault.
    • Two sensors mounted on the compressor monitor the signals, so the dataset contains two signals, one from each channel.


  2. Teams
    • Collaboration is encouraged, and teams may comprise students, researchers, and professionals in various industries, from a single organization or several. There is no restriction on team size, other than that a team must have at least one member.
    • Selection and award of the winning teams are contingent upon:
      • Registering for and attending the PHMAP 2021 Conference.
      • Submitting a peer-reviewed conference paper that includes the analysis results and techniques, and presenting it at the conference.


  3. Prize
    The three top-ranked teams will be invited to present at a special session of the conference, and prizes will be awarded during the conference as follows:
    • 1st place: USD 1,000
    • 2nd place: USD 500
    • 3rd place: USD 300


  4. Important Dates
    Table 1: Important Dates
    Key Dates                          PHMAP 2021
    Round Open                         May 1, 2021
    Evaluation Data Open               May 20, 2021
    Round Closed                       Jun 11, 2021
    Preliminary Submission Due         July 10, 2021
    Preliminary Winner Announcement    July 24, 2021
    Winning Paper Submission Due       Aug 20, 2021
    Winner Announcement                Aug 27, 2021
    Conference Dates                   Sep 8, 2021


  5. System Information
    • Equipment
      • Type of Equipment: Oil-injection screw compressor
      • Motor: 15 kW
      • Shaft rotation speed of motor: 3,600 rpm
      • Shaft rotation speed of screw: 7,200 rpm

    • Figure 1: Shape of equipment (Compressor)

    • Data Acquisition from System
      • Sampling rate of Data Acquisition
          ① 10,544 samples per second
      • Output Channels
          ① Channel 1: Measuring vibration from Motor
          ② Channel 2: Measuring vibration from Screw

      Figure 2: Motor and Screw on Compressor
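
    The figures above imply a few derived quantities that are often useful when inspecting vibration signals. The sketch below works them out, assuming the listed speeds are nominal (actual shaft speed under load may differ) and that the sampling rate applies to both channels; the variable and function names are illustrative only.

```python
# Values copied from Section 5; names are illustrative, not part of the challenge kit.
SAMPLING_RATE_HZ = 10_544   # samples per second
MOTOR_RPM = 3_600           # nominal motor shaft speed
SCREW_RPM = 7_200           # nominal screw shaft speed


def rpm_to_hz(rpm: float) -> float:
    """Convert revolutions per minute to revolutions per second (Hz)."""
    return rpm / 60.0


motor_hz = rpm_to_hz(MOTOR_RPM)    # 60.0 Hz
screw_hz = rpm_to_hz(SCREW_RPM)    # 120.0 Hz

# Highest frequency representable at this sampling rate (Nyquist frequency).
nyquist_hz = SAMPLING_RATE_HZ / 2  # 5272.0 Hz

# Number of samples recorded during one shaft revolution.
samples_per_motor_rev = SAMPLING_RATE_HZ / motor_hz  # about 175.7
samples_per_screw_rev = SAMPLING_RATE_HZ / screw_hz  # about 87.9
```

    Under these assumptions, the nominal rotation frequencies (60 Hz and 120 Hz) lie far below the 5,272 Hz Nyquist limit, so shaft-rate components and many of their harmonics are representable in the sampled signals.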


  6. Datasets
    • Dataset provided for both Regression and Classification
      No. of Files                      No. of Channels    No. of Classes
      10 files, distinguished by the    2 channels         6 classes
      time at which each signal                            – 1 Normal class
      was measured                                         – 5 Abnormal classes
    • The two tasks share one experimental dataset, composed of 10 zip files recorded at different times, as follows.
      • Data recorded from 2020/11/16 (year/month/day)
          ① Normal 1st: Data recorded from 12:20:36 to 13:01:53
          ② Unbalance 1st: Data recorded from 10:46:41 to 11:17:44
          ③ Belt-Looseness 1st: Data recorded from 11:38:05 to 11:54:59
          ④ Belt-Looseness High 1st: Data recorded from 11:59:14 to 12:10:46
          ⑤ Bearing-fault 1st: Data recorded from 12:12:36 to 13:02:50
      • Data recorded from 2021/01/20 (year/month/day)
          ① Unbalance 2nd: Data recorded from 13:35:35 to 13:54:57
          ② Belt-Looseness 2nd: Data recorded from 14:11:30 to 15:37:03
          ③ Bearing-fault 2nd: Data recorded from 15:58:28 to 17:46:39
      • Data recorded from 2021/02/01 (year/month/day)
          ① Normal 2nd: Data recorded from 11:05:44 to 13:30:04
          ② Unbalance 3rd: Data recorded from 13:55:32 to 14:57:04
    • There are 2 channels, one from Motor and the other from Screw.
    • Description of 6 Classes
      • Normal → Fault-free operating condition
      • Unbalance → Mismatch between the center of mass and the axis of rotation
      • Belt-Looseness → Looseness of the V-belt connecting the motor pulley and the screw pulley
      • Belt-Looseness High → High looseness of the V-belt
      • Bearing fault → Grease removed from the ball bearing on the motor, which induces wear-out
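
    The recording windows listed above differ considerably in length, which matters for a class-averaged metric. The sketch below computes the span of each window and the total per class. It assumes each file covers its whole interval continuously, which this document does not state; the names `RECORDINGS`, `duration_seconds`, and `per_class` are illustrative.

```python
from collections import defaultdict
from datetime import datetime

# (label, date, start, end) copied from the list in Section 6.
RECORDINGS = [
    ("Normal 1st", "2020/11/16", "12:20:36", "13:01:53"),
    ("Unbalance 1st", "2020/11/16", "10:46:41", "11:17:44"),
    ("Belt-Looseness 1st", "2020/11/16", "11:38:05", "11:54:59"),
    ("Belt-Looseness High 1st", "2020/11/16", "11:59:14", "12:10:46"),
    ("Bearing-fault 1st", "2020/11/16", "12:12:36", "13:02:50"),
    ("Unbalance 2nd", "2021/01/20", "13:35:35", "13:54:57"),
    ("Belt-Looseness 2nd", "2021/01/20", "14:11:30", "15:37:03"),
    ("Bearing-fault 2nd", "2021/01/20", "15:58:28", "17:46:39"),
    ("Normal 2nd", "2021/02/01", "11:05:44", "13:30:04"),
    ("Unbalance 3rd", "2021/02/01", "13:55:32", "14:57:04"),
]


def duration_seconds(date: str, start: str, end: str) -> int:
    """Length of a recording window in whole seconds."""
    fmt = "%Y/%m/%d %H:%M:%S"
    t0 = datetime.strptime(f"{date} {start}", fmt)
    t1 = datetime.strptime(f"{date} {end}", fmt)
    return int((t1 - t0).total_seconds())


# Span of each individual recording window.
durations = {label: duration_seconds(d, s, e) for label, d, s, e in RECORDINGS}

# Total span per class; the class name is the label without its ordinal suffix.
per_class = defaultdict(int)
for label, seconds in durations.items():
    per_class[label.rsplit(" ", 1)[0]] += seconds

for name, seconds in sorted(per_class.items()):
    print(f"{name}: {seconds} s")
```

    Under that assumption, Belt-Looseness High has by far the shortest total span (692 s against 11,137 s for Normal), so participants may want to check class balance before training a classifier.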


  7. Tasks
    • Task I: Regression (Imputation)
      • As shown in Figure 3, given N samples of the training set, predict the period in the test set.

        Figure 3: Description of Task I “Regression” and Dataset given

    • Task II: Classification
      • As shown in Figure 4, given N samples of the training set, determine which class the test set belongs to.

        Figure 4: Description of Task II “Classification” and Dataset given


  8. Evaluation
    The results from every team will be evaluated by the PHMAP Local Organizing Committee and all teams will be ranked. The top three scoring teams will be invited to present at a special session of the conference and will be recognized during the conference.
    • Algorithms and models will be assessed by the total score obtained from the Regression and Classification tasks.
      • Evaluation Metrics
          ① Regression: RMSE
          ② Classification: F1 Score (Macro)
    • Using these evaluation metrics, participants are ranked in each of the Regression and Classification tasks; the total score is then derived from these ranks according to NDCG.
      • NDCG (Normalized Discounted Cumulative Gain)
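
    For local validation, the three metrics named above can be sketched in plain Python using their standard textbook definitions. This is not the committee's evaluation code. In particular, this document does not specify how task ranks are converted into the relevance values that NDCG consumes, so `ndcg` below is only the generic formula with a log2 discount, and all function names are illustrative.

```python
import math
from typing import Hashable, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over all classes seen in either input."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    scores = []
    for c in set(y_true) | set(y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def ndcg(relevances: Sequence[float]) -> float:
    """Generic NDCG: DCG of the given order divided by DCG of the ideal order."""
    def dcg(rels: Sequence[float]) -> float:
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

    Lower RMSE is better, while higher macro F1 and NDCG are better. Because macro F1 weights every class equally, errors on a rarely observed class affect the score as much as errors on a common one.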


  9. Submission for Data Challenge
    • A correct submission consists of a CSV file named after the task and the team name (e.g., if the task is “classification”, the file name will be “classification-team name.csv”). The CSV file should have just one column, containing the values inferred by the team’s model for the evaluation data.
        ① Code for Evaluation will be released later
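
    A minimal sketch of writing such a file with Python's standard `csv` module follows. The document does not say whether a header row is expected or how rows should be ordered, so this sketch assumes no header and one value per row in the order given; `submission_filename` and `write_submission` are hypothetical helper names.

```python
import csv
from pathlib import Path
from typing import Iterable, Union


def submission_filename(task: str, team_name: str) -> str:
    """Build the file name in the '<task>-<team name>.csv' pattern described above."""
    return f"{task}-{team_name}.csv"


def write_submission(
    task: str,
    team_name: str,
    predictions: Iterable[Union[int, float, str]],
    out_dir: str = ".",
) -> Path:
    """Write one prediction per row in a single column (assumption: no header row)."""
    path = Path(out_dir) / submission_filename(task, team_name)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        for value in predictions:
            writer.writerow([value])
    return path
```

    For example, `write_submission("classification", "team name", [0, 2, 1])` would create `classification-team name.csv` with three rows. The exact label encoding expected by the organizers should be confirmed once the evaluation code is released.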