[{"data":1,"prerenderedAt":84},["ShallowReactive",2],{"article-2021\u002Fneurips-2021":3},{"id":4,"title":5,"authors":6,"body":12,"date":68,"description":69,"extension":70,"image":71,"meta":72,"navigation":73,"path":74,"related_works":75,"seo":79,"sitemap":80,"stem":81,"subtitle":82,"__hash__":83},"articles\u002Farticles\u002F2021\u002Fneurips-2021.md","Dataset presented at NeurIPS 2021",[7],{"name":8,"blurb":9,"src":10,"link":11},"Prof. Dr. Dennis Riehle","Dennis Riehle leads the EoTlab at the University of Koblenz. His current research interests include Data Science, Internet-of-Things and Smart Sensors.","\u002Fmedia\u002Fpeople\u002Fdennis-riehle\u002Fportrait.jpg","https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fdennismriehle",{"type":13,"value":14,"toc":62},"minimark",[15,42,47,50,54],[16,17,18,19,26,27,31,32,35,36,41],"p",{},"Machine learning algorithms require large datasets of labeled data for training. In practice, such datasets are often unavailable and difficult to create. As part of the ",[20,21,25],"a",{"href":22,"rel":23},"https:\u002F\u002Fwww.moderat.nrw\u002F",[24],"nofollow","MODERAT! project",", we have created two labeled datasets named ",[28,29,30],"code",{},"RP-Mod"," and ",[28,33,34],{},"RP-Crowd",". These datasets help in training models for classifying user comments at ",[20,37,40],{"href":38,"rel":39},"https:\u002F\u002Frp-online.de\u002F",[24],"Rheinische Post",", a German news outlet. The datasets and corresponding analyses were presented at this year's NeurIPS, which was held virtually.",[43,44,46],"h2",{"id":45},"paper-abstract","Paper Abstract",[16,48,49],{},"Abuse and hate are penetrating social media and many comment sections of news media companies. These platform providers invest considerable efforts to moderate user-generated contributions to prevent losing readers who get appalled by inappropriate texts. This is further enforced by legislative actions, which make non-clearance of these comments a punishable action. 
While (semi-)automated solutions using Natural Language Processing and advanced Machine Learning techniques are getting increasingly sophisticated, the domain of abusive language detection still struggles as large non-English and well-curated datasets are scarce or not publicly available. With this work, we publish and analyse the largest annotated German abusive language comment datasets to date. In contrast to existing datasets, we achieve a high labelling standard by conducting a thorough crowd-based annotation study that complements professional moderators’ decisions, which are also included in the dataset. We compare and cross-evaluate the performance of baseline algorithms and state-of-the-art transformer-based language models, which are fine-tuned on our datasets and an existing alternative, showing the usefulness for the community.",[43,51,53],{"id":52},"dataset","Dataset",[16,55,56,57,61],{},"The aforementioned datasets are publicly available from Zenodo: ",[20,58,59],{"href":59,"rel":60},"https:\u002F\u002Fzenodo.org\u002Frecord\u002F5291339",[24],".",{"title":63,"searchDepth":64,"depth":64,"links":65},"",2,[66,67],{"id":45,"depth":64,"text":46},{"id":52,"depth":64,"text":53},"2021-12-07","The researchers working on the MODERAT! project presented their datasets RP-Mod and RP-Crowd at the NeurIPS 2021 conference.","md","\u002Fmedia\u002Farticles\u002F2021\u002Fneurips-2021\u002Fgathertown-neurips-2021.jpg",{},true,"\u002Farticles\u002F2021\u002Fneurips-2021",[76],{"name":77,"link":78},"Assenmacher, D., Niemann, M., Müller, K., Seiler, M. V., Riehle, D. M., & Trautmann, H. (2021). RP-Mod & RP-Crowd: Moderator- and Crowd-Annotated German News Comment Datasets. 
In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1 (NeurIPS Datasets and Benchmarks 2021), Virtual Event, 1–14.","https:\u002F\u002Fdatasets-benchmarks-proceedings.neurips.cc\u002Fpaper\u002F2021\u002Ffile\u002Fc9e1074f5b3f9fc8ea15d152add07294-Paper-round2.pdf",{"title":5,"description":69},{"loc":74},"articles\u002F2021\u002Fneurips-2021",null,"k9QXLEuyXokv1SDFIRslVe200Z_VSN_VL6LWkNwEURE",1782386010139]