Data Challenge

Guidance to Data Challenge at PHMAP 2021

Data is provided by

! Evaluation Open on May 28, 2021. Check ‘9. Submission for Data Challenge’ at the bottom.
! Updated on May 24, 2021 (Check for changes here.)
! Updated on May 18, 2021 (Check for changes here.)

  1. Introduction
    The Data Challenge at PHMAP 2021 is a competition open to all potential attendees. This year, participants are given a rich time-series dataset and are required to carry out the regression and classification tasks described in Section 7.
    • Participants are encouraged to enroll in the conference to see the presentations of the three winning data challengers, as well as many paper presentations on data analytics and diagnostics.
    • In this Data Challenge at PHMAP 2021, participants are encouraged to apply state-of-the-art algorithms and models to regression and classification problems that arise in real-world industrial settings. In collaboration with SK Telecom, exclusive access to a rich dataset is provided for this competition.
    • The data were acquired under a variety of conditions: not only fault-free operation, but also faults seeded under controlled conditions with the support of domain experts. The dataset therefore comprises signals in five categories: Normal, Unbalance, Belt-looseness, Belt-looseness (High), and Bearing fault.
    • Two sensors mounted on the compressor record the signals, so the dataset contains two signals, one per channel.


  2. Teams
    • Collaboration is encouraged and teams may comprise students, researchers, and professionals in various industries, from single or multiple organizations. There is no restriction on team size beyond having at least one member.
    • The winning teams will be selected and awarded contingent upon:
      • Registering for and attending the PHMAP 2021 Conference.
      • Submitting a peer-reviewed conference paper that includes the analysis results and techniques, and presenting it at the conference.


  3. Prize
    The top three ranked teams will be invited to present at a special session of the conference, and prizes will be awarded during the conference as follows:
    • 1st place: USD 1,000
    • 2nd place: USD 500
    • 3rd place: USD 300


  4. Important Dates
    Table 1: Important Dates
    Key Dates                            PHMAP 2021
    Round Open                           May 1, 2021
    Evaluation Open                      May 28, 2021
    Round Closed & Submission Due        Jun 25, 2021
    Winner Announcement                  July 9, 2021
    Winning Paper Submission Due         Aug 20, 2021
    Conference Dates                     Sep 8, 2021


  5. System Information
    • Equipment
      • Type of Equipment: Oil-injection screw compressor
      • Motor: 15 kW
      • Shaft rotating speed of Motor: 3,600 rpm
      • Shaft rotating speed of Screw: 7,200 rpm

      Figure 1: Shape of equipment (Compressor)

    • Data Acquisition from System
      • Sampling rate of Data Acquisition
          ① 10,544 samples per second
      • Output Channels
          ① Channel 1: Measuring vibration from Motor
          ② Channel 2: Measuring vibration from Screw

      Figure 2: Motor and Screw on Compressor
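    The rotating speeds and sampling rate listed above fix a few quantities that are often useful when inspecting vibration signals. The following is a minimal sketch of that arithmetic only; the function and constant names are illustrative and are not part of the challenge material.

```python
# Values taken from Section 5; names are illustrative.
SAMPLING_RATE_HZ = 10_544  # samples per second
MOTOR_RPM = 3_600
SCREW_RPM = 7_200


def rpm_to_hz(rpm: float) -> float:
    """Convert revolutions per minute to revolutions per second (Hz)."""
    return rpm / 60.0


def samples_per_revolution(rpm: float, fs: float = SAMPLING_RATE_HZ) -> float:
    """Number of samples recorded during one full shaft revolution."""
    return fs / rpm_to_hz(rpm)


motor_hz = rpm_to_hz(MOTOR_RPM)      # rotation frequency of the motor shaft
screw_hz = rpm_to_hz(SCREW_RPM)      # rotation frequency of the screw shaft
nyquist_hz = SAMPLING_RATE_HZ / 2.0  # highest frequency the sampling can represent

print(f"motor: {motor_hz} Hz, {samples_per_revolution(MOTOR_RPM):.1f} samples/rev")
print(f"screw: {screw_hz} Hz, {samples_per_revolution(SCREW_RPM):.1f} samples/rev")
print(f"Nyquist frequency: {nyquist_hz} Hz")
```

    The motor and screw rotate at 60 Hz and 120 Hz respectively, so one revolution spans roughly 176 and 88 samples.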


  6. Datasets
    • The dataset provided for both Regression and Classification is consecutive sequential data.
      No. of files                        No. of Channels   No. of Classes
      10 files, differentiated by the     2 Channels        5 Classes
      time at which each signal                             – 1 Normal Class
      was measured                                          – 4 Abnormal Classes
    • The two tasks share one experimental dataset, which is composed of 10 zip files recorded at different times, as follows.
      • Data recorded from 2020/11/16 (year/month/day)
          ① Normal 1st: Data recorded from 12:20:36 ~
          ② Unbalance 1st: Data recorded from 10:46:41 ~
          ③ Belt-Looseness 1st: Data recorded from 11:38:05 ~
          ④ Belt-Looseness High 1st: Data recorded from 11:59:14 ~
      • Data recorded from 2020/12/20 (year/month/day)
          ① Bearing-fault 1st: Data recorded from 12:12:36 ~
      • Data recorded from 2021/01/20 (year/month/day)
          ① Unbalance 2nd: Data recorded from 13:35:35 ~
          ② Belt-Looseness 2nd: Data recorded from 14:11:30 ~
          ③ Bearing-fault 2nd: Data recorded from 15:58:28 ~
      • Data recorded from 2021/02/01 (year/month/day)
          ① Normal 2nd: Data recorded from 11:05:44 ~
          ② Unbalance 3rd: Data recorded from 13:55:32 ~
    • There are 2 channels, one from Motor and the other from Screw.
    • Description of 5 Classes
      • Normal → Fault-free operating condition
      • Unbalance → Unbalance between the center of mass and the axis of rotation
      • Belt-Looseness → Looseness of the V-belt connecting the motor pulley and the screw pulley
      • Belt-Looseness High → High looseness of the V-belt
      • Bearing fault → Grease removed from the ball bearing on the motor, which induces wear-out


  7. Tasks
    • To load the training dataset from the CSV files, pandas.read_csv is recommended, with the parameter setting float_precision="round_trip".
    • Task I: Regression (Imputation)
      • As shown in Figure 3, given the period in the orange box, predict the period in the red box in the test set.
      • As seen in the Test Set in Figure 3, the orange boxes cover 0.4 s (= 4216 samples) in Channel 1 and 0.3 s (= 3162 samples) in Channel 2; from these, predict the red box, which covers 0.1 s (= 1054 samples) in Channel 2. Participants may use the Training Set as they wish.
      • The number of test samples will be approximately 19,000.

        Figure 3: Description of Task I “Regression”
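    The sample counts in Task I suggest one way to cut training examples out of a continuous two-channel recording. The sketch below is only an illustration under stated assumptions: the recording is assumed to be a NumPy array of shape (n_samples, 2) with Channel 1 in column 0 and Channel 2 in column 1, and the red box is assumed to be the final 0.1 s of the window. Neither is confirmed by the text, and the function name is hypothetical.

```python
import numpy as np

WINDOW = 4216      # 0.4 s given in Channel 1
KNOWN_CH2 = 3162   # 0.3 s given in Channel 2
TARGET_CH2 = 1054  # 0.1 s of Channel 2 to predict


def make_regression_example(signal: np.ndarray, start: int):
    """Cut one (ch1, ch2_known, ch2_target) triple from a (n_samples, 2) array.

    Assumption: the segment to predict is the last 0.1 s of the window.
    """
    window = signal[start:start + WINDOW]
    if window.shape != (WINDOW, 2):
        raise ValueError("window runs past the end of the signal or has wrong shape")
    ch1 = window[:, 0]                   # full 0.4 s of Channel 1
    ch2_known = window[:KNOWN_CH2, 1]    # first 0.3 s of Channel 2
    ch2_target = window[KNOWN_CH2:, 1]   # remaining 0.1 s of Channel 2
    return ch1, ch2_known, ch2_target


# Synthetic stand-in for a real recording, used only to show the shapes.
rng = np.random.default_rng(0)
demo_signal = rng.standard_normal((10_000, 2))
ch1, ch2_known, ch2_target = make_regression_example(demo_signal, start=100)
print(ch1.shape, ch2_known.shape, ch2_target.shape)
```

    Sliding the start index over the training recordings would yield as many such triples as needed; where the red box actually sits should be checked against Figure 3.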
    • Task II: Classification
      • As shown in Figure 4, classify each given test sample into one of the classes.
      • As seen in the Test Set in Figure 4, each test sample has shape 128 (0.012 s) × 2 and is to be classified into one of the 5 classes. Participants may use the Training Set as they wish.
      • The number of test samples will be approximately 78,000.

        Figure 4: Description of Task II “Classification”
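    For Task II, training recordings can be loaded with the recommended pandas setting and cut into 128 × 2 windows matching the test sample shape. The following is a minimal sketch, assuming each CSV holds two headerless numeric columns (Channel 1, Channel 2); the actual file layout is not described here, and the helper names are hypothetical. An in-memory CSV stands in for a real file.

```python
import io

import numpy as np
import pandas as pd

WINDOW = 128  # samples per classification window


def load_signal(source) -> np.ndarray:
    """Read a two-column CSV into a float64 array of shape (n_samples, 2).

    Assumption: no header row; adjust `header` if the real files have one.
    """
    df = pd.read_csv(source, header=None, float_precision="round_trip")
    return df.to_numpy(dtype=np.float64)


def to_windows(signal: np.ndarray, window: int = WINDOW) -> np.ndarray:
    """Split a (n_samples, n_channels) array into non-overlapping windows.

    Trailing samples that do not fill a whole window are dropped.
    """
    n = len(signal) // window
    return signal[: n * window].reshape(n, window, signal.shape[1])


# In-memory stand-in for a real CSV file, used only for demonstration.
demo_csv = io.StringIO("\n".join(f"{i * 0.1},{i * -0.1}" for i in range(300)))
signal = load_signal(demo_csv)
windows = to_windows(signal)
print(signal.shape, windows.shape)
```

    Non-overlapping windows are the simplest choice; overlapping windows would produce more training samples from the same recordings.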


  8. Evaluation
    The results from every team will be evaluated by the PHMAP Local Organizing Committee and all teams will be ranked. The top three scoring teams will be invited to present at a special session of the conference and will be recognized during the conference.
    • Algorithms and models will be assessed by the total score obtained from the Regression and Classification tasks.
      • Evaluation Metrics
          ① Regression: RMSE
          ② Classification: F1 Score (Macro)
    • With these evaluation metrics, participants are ranked in each of the Regression and Classification tasks, and the total score is derived from these ranks according to NDCG.
      • NDCG (Normalized Discounted Cumulative Gain)
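    The two per-task metrics named above can be computed locally before submitting. The sketch below uses their standard definitions in NumPy; it assumes class labels are encoded as integers 0 to 4, which is not stated here, and it is not the committee's scoring code. How the task ranks are combined through NDCG is not specified, so that step is not sketched.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between two equally shaped arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def macro_f1(y_true, y_pred, n_classes: int = 5) -> float:
    """Unweighted mean of per-class F1 scores.

    Assumption: labels are integers in [0, n_classes). A class with no true
    or predicted samples contributes an F1 of 0 to the average.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_true == c) & (y_pred == c))
        fp = np.sum((y_true != c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


print(rmse([0.0, 0.0], [3.0, 4.0]))
print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2))
```

    Lower RMSE and higher macro F1 are better; macro averaging weights all five classes equally regardless of how many test samples each class has.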


  9. Submission for Data Challenge
    • During the competition period, all participants can check their score and rank on the evaluation pages, which are technically supported by Kaggle and follow the Kaggle competition rules.
      • Note that anyone can view the details of this competition via the links below.
        – Task 1. Regression (Imputation):

        – Task 2. Classification:

      • Use of the evaluation pages is restricted to those with access to a link provided by the committee, which is given only to those who wish to participate in this competition. To participate in the data challenge, please click the “Go to Application Form” button below and fill out the form with your team information.

      Go to Application Form