CPII system ranks FIRST in BOTH SEEN and UNSEEN Tests in the 2022 DialDoc Workshop organized by The Association for Computational Linguistics (ACL)
EVENT | JUNE | 2022
We are pleased to announce that the system developed by our Principal Investigator, Professor Helen Meng, ranked FIRST in BOTH the SEEN and UNSEEN Tests of the 2022 DialDoc Workshop, organized by the Association for Computational Linguistics (ACL). This year, the 2022 DialDoc Workshop hosted the Shared Task on building open-book, goal-oriented dialogue systems. For the top 3 systems in each Test, the organizers selected 100 out of some 800 dialogue turns and sent the selected turns to anonymous human evaluators, who scored each turn manually. The top 3 systems were then re-ranked according to the human scores. After re-ranking, the CPII system ranked FIRST in the SEEN Test, SECOND in the UNSEEN Test, and was judged the BEST system overall.
In the Human-scored Re-ranking of the MultiDoc2Dial Challenge, Prof Helen Meng’s system ranked first overall.
This challenge provides the necessary data to contestants for free. The data include several thousand passages drawn from over 480 US government webpages, such as those of the Department of Motor Vehicles, Veterans Affairs (va.gov), the Social Security Administration (ssa.gov), and Student Aid (studentaid.gov). Contestants need to generate natural responses that answer questions likely to be asked by members of the general public seeking related information. The 2022 Shared Task, MultiDoc2Dial, is a new task with a dataset for modeling goal-oriented dialogues grounded in multiple documents. The aim is to address more realistic scenarios in which a goal-oriented, information-seeking conversation involves multiple topics and is hence grounded in different documents.
Prof Meng’s team developed a dialogue system and competed in the Shared Task. The evaluation includes both a SEEN Test and an UNSEEN Test, depending on whether some of the test data may have been included (by the organizers) in the training data (please see Figure 2-1a). Out of over 500 submissions from around the world, automatic evaluation shows that the CPII system ranks FIRST in both the SEEN and UNSEEN Tests.
The ranking results on the LeaderBoard are as follows:
LeaderBoard Ranking based on Automatic Evaluation Results on the Shared Task SEEN Test
LeaderBoard Ranking based on Automatic Evaluation Results on the Shared Task UNSEEN Test
More information can be found on the following websites: