SacreBLEU poster

This page contains an accessible presentation of the SacreBLEU poster, which I developed to present the ideas in the SacreBLEU paper. The presentation was itself the subject of a paper at the 2021 ICLR workshop on Rethinking ML Papers. That paper suggests that authors consider presenting posters in comic form, at least in certain situations, and shows how a panel-by-panel decomposition can be easily annotated for accessibility using standard web accessibility tools.

I used Procreate on an iPad Pro to draw the comic, and I estimate that it took me about 20 hours. You can see a time-lapse of the drawing in this YouTube video.

# Image Description
1 Title: "A call (and proposal) for clarity in reporting BLEU scores". Author: Matt Post.
2 Text: Not so long ago in a galaxy you may in fact recognize
3 Text announces that "Everyone is excited about the new paper from Acme University". A researcher, Stella Chercheure, is scrolling through a website named Fritter. She sees an announcement from ACME corporation about a new "Transmogrifier model". One user responds, "whoa so much bleu score." Another responds, "This reminds me of when I had the idea in 1997", providing an illegible arXiv link. Stella remarks, "I've got to add this to our toolkit."
4 Text reads, "She gets to work". Stella is pictured at a whiteboard, erasing the equation "P(f|e) = P(e|f)P(f)", replacing it with "y = softmax(Wx+". There is a garbage can with crumpled up paper around it.
5 Text reads, "One month later, the new system is better, and Stella had new ideas that improved it further still!". Stella, arms folded and smiling, comments, "I can't believe no one thought to use Lagrange multipliers and Kronecker delta functions before."
6 Text reads, "But trouble brews. She can't match Acme's scores." A table shows that her numbers are higher internally against her own baseline, but she can't match their baseline scores. Red text reads, "Problem: BLEU is underspecified, and details can be hard to find in papers."
7 Text at the top reads, "An email to the authors helps." It shows an email from Author 2, with a date of "A week later" and a subject of "Re: (not) feelin' blue". The email reads, "Hi Stella, In the rush to meet the ArXiv deadline, we forgot to mention we applied lambda-lambda pre-processing...". Text at the bottom reads, "With this info, she establishes parity. Time to write things up for *ACL."
8 Text at the top reads, "Meanwhile...", and at the bottom, "Hot startup Initech has something even newer!". The panel shows an article from a tech site named "Diverge" (whose logo resembles that of the tech site named "Verge"), with a masthead reading "Tech—Science—Sensationalism". A headline reads, "Initech's new translator invents its own AI! Are nuclear weapons next?". An image shows bombs falling on a hillside.
9 The image is a split panel. In the middle, Stella is holding her face in her hands, with her mouth open. Text on the left reads, "Stella would like to include [Initech's] scores, but unfortunately, a comparison is not possible." In red text, "Problem: different reference processing cannot be compared." Stella exclaims, "beta-lambda-xi pre-processing?! You have got to be kidding me." On the right, text reads "Curiosity moves her to investigate". Text scattered about reads "UNKS", "casing?", "compound splitting?", "reference count", "tokenization", and "test set". A table shows different pre-processing configurations and varying BLEU scores. Stella says, "The variance in scores is larger than the gains reported in many papers!"
10 Mirroring the text from the previous panel, text at the top reads, "Frustration leads her to curse!" A large fiery shape, varying from red to yellow, consumes the panel. The text SACREBLEU!! is written overtop it.
11 The SacreBLEU character appears. He is a stereotypical Frenchman, wearing a beret and smoking a cigarette. His skin is blue. He says, "Did someone call my name?"
12 A long conversation ensues. Stella: "Who are you?". Sacre: "I'm the solution to all your problems." (an asterisk on "all" is clarified at the bottom with a box reading "closed world assumption"). Stella: "You make BLEU scores more comparable?" Sacre: "Well, not past ones, but for future work, I suggest everyone scores detokenized (underlined) outputs with the same reference tokenization." Stella: "How do we decide which one?" Sacre: "Typically this is accomplished with wars, but everyone's afraid of Initech..."
13 The conversation continues. Sacre: "A neutral choice would be to use WMT's. They provide many of our test sets, after all." Stella: "I can live with that."
14 Stella is typing on a computer. She says, "I'll just recompute those numbers a minute now. Where did I put those references?" Sacre responds, "Actually, why don't you let me handle that?" Stella: "Okay...thanks!". White text on blue reads, "Solution: when reporting scores, use WMT scoring on detokenized system outputs."
15 Text reads "SacreBLEU: installation and usage are easy". It shows a bash shell prompt containing two commands for installing and using sacrebleu: "pip install sacrebleu" and "cat out.detok | sacrebleu -t wmt18 -l de-en". Underneath is a bulleted summary noting that (*) scores comparable w/ matrix.statmt.org (*) downloads common references for you (*) info string to save your peers some grief!. At the very bottom is sacrebleu's verbose signature string.
16 The panel contains a caricature of a paper entitled "Even Better MT", by Stella Chercheure. A generic image shows scores going up, and at the bottom of the paper, it reads "With comparable scores!". Text at the bottom reads, "The grief you save may be your own."
17 White text on a blackboard-style background reads "Summary" with three bullets: (1) BLEU scores can vary wildly with different parameterizations (2) Papers often do not report all the details (3) This is a needless impediment to science! Further text reads "Two proposed remedies: (1) Use WMT scoring (on detokenized references) so scores can be directly compared across papers (2) Include all the details in the writeup." At the bottom, the text reads, "SacreBLEU can help!"
18 The poster footer contains hand-drawn images noting the author's affiliation (Johns Hopkins University) and where the work was done (Amazon), and linking to the GitHub repository (github.com/mjpost/sacrebleu).
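
For readers who want to see the problem from panels 9 and 17 in miniature, here is a rough sketch of how the tokenization choice alone can change a BLEU-style score. It is a toy, single-reference, unsmoothed sentence-level approximation written for illustration only; it is not SacreBLEU's implementation, and the function names and example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def toy_bleu(hyp_tokens, ref_tokens, max_n=4):
    """Toy BLEU: clipped n-gram precisions, geometric mean, brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp_tokens, n)
        ref_counts = ngrams(ref_tokens, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if total == 0 or matches == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp_tokens) >= len(ref_tokens):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref_tokens) / len(hyp_tokens))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


def whitespace_tokenize(text):
    """Split on whitespace only, so punctuation stays attached to words."""
    return text.split()


def punctuation_tokenize(text):
    """Split punctuation marks off into their own tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


# Hypothetical system output and reference, used only for illustration.
hypothesis = "The cat sat on the mat, quietly."
reference = "The cat sat on the mat, purring quietly."

for tokenize in (whitespace_tokenize, punctuation_tokenize):
    score = toy_bleu(tokenize(hypothesis), tokenize(reference))
    print(f"{tokenize.__name__}: {score:.4f}")
```

The same output and reference receive different scores under the two tokenizers, which is why the poster recommends scoring detokenized outputs with one agreed-upon reference tokenization and reporting the configuration used.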

©2021 Matt Post