<?xml version="1.0"?>
<News hasArchived="false" page="1" pageCount="1" pageSize="10" timestamp="Mon, 20 Apr 2026 23:43:48 -0400" url="https://beta.my.umbc.edu/groups/csee/posts.xml?tag=malware">
<NewsItem contentIssues="true" id="135903" important="false" status="posted" url="https://beta.my.umbc.edu/groups/csee/posts/135903">
<Title>Talk: MalDICT Benchmark Malware Datasets, 12-1 ET Fri. Oct 6</Title>
<Tagline>Data on Behaviors, Platforms, Exploitation, and Packers</Tagline>
<Body>
<![CDATA[
    <div class="html-content"><div><img src="https://www.csee.umbc.edu/wp-content/uploads/sites/659/2023/10/malware.jpg" style="max-width: 100%; height: auto;"></div><div><strong><br></strong></div><div><strong>The UMBC Cyber Defense Lab presents</strong></div><div><br></div><h4>MalDICT: Benchmark Datasets on Malware Behaviors, Platforms, Exploitation, and Packers</h4><div><br></div><div><strong>RJ Joyce, CSEE Department, UMBC</strong></div><div><strong><br></strong></div><div><strong>12-1pm ET, Friday, 6 October 2023, via <a href="https://umbc.webex.com/meet/sherman" rel="nofollow external" class="bo">WebEx</a></strong></div><div><strong><br></strong></div><div><strong>Joint work with Edward Raff, Charles Nicholas, and James Holt</strong></div><div><br></div><div>Existing research on malware classification focuses almost exclusively on two tasks: distinguishing between malicious and benign files, and classifying malware by family. Malware, however, can be categorized according to many other types of attributes, and the ability to identify these attributes in newly emerging malware using machine learning will provide significant value to analysts. In particular, we have identified four tasks which are under-represented in prior work: classification by behaviors that malware exhibits, platforms that malware runs on, vulnerabilities that malware exploits, and packers that packed the malware. To obtain labels for training and evaluating ML classifiers on these tasks, we created an antivirus (AV) tagging tool called ClarAVy. ClarAVy's sophisticated AV label parser distinguishes itself from prior AV-based taggers with the ability to accurately parse 882 different AV label formats used by 90 different AV products. We are releasing benchmark datasets for each of these four classification tasks, tagged using ClarAVy and comprising nearly 5.5 million malicious files in total. 
Our malware behavior dataset includes 75 distinct tags -- nearly seven times more than the only prior benchmark dataset with behavioral tags. To our knowledge, we are the first to release datasets with malware platform, exploitation, and packer tags.</div><div><br></div><div><strong><a href="https://www.linkedin.com/in/rj-joyce/" rel="nofollow external" class="bo">RJ Joyce</a> </strong>(<a href="mailto:joyce8@umbc.edu">joyce8@umbc.edu</a>) is a PhD student at UMBC under the supervision of Dr. Charles Nicholas and Dr. Edward Raff. Presently, RJ works as a data scientist at Booz Allen Hamilton performing research at the intersection of malware analysis and machine learning. RJ is also a visiting lecturer at UMBC and is teaching the CMSC-426 Principles of Computer Security course this semester.</div><div><br></div><div>Host: Alan T. Sherman, <a href="mailto:sherman@umbc.edu">sherman@umbc.edu</a>. Support for this event was provided in part by the National Science Foundation under SFS grant DGE-1753681.  The UMBC Cyber Defense Lab meets biweekly Fridays 12-1pm.  All meetings are open to the public.  Upcoming CDL meetings: Oct. 20 (1-2pm) Josh Benaloh (Microsoft), ElectionGuard; Nov. 3, Jason Rheinhart (Sandia), Risk analysis; Nov. 17 (1-2pm) Austin Murdoch (Sixmap); Dec. 1, Enis Golaszewski (UMBC), Automatic cryptographic bindings; Jan. 16-19, 2024, UMBC SFS/CySP Research Study.</div><div><br></div></div>
]]>
</Body>
<Summary>The UMBC Cyber Defense Lab presents     MalDICT: Benchmark Datasets on Malware Behaviors, Platforms, Exploitation, and Packers     RJ Joyce,  CSEE Department, UMBC     12-1pm ET, Friday, 6 October...</Summary>
<TrackingUrl>https://beta.my.umbc.edu/api/v0/pixel/news/135903/guest@my.umbc.edu/dae74886999cf28e98b8b45457d022a9/api/pixel</TrackingUrl>
<Tag>antivirus</Tag>
<Tag>cybersecurity</Tag>
<Tag>dataset</Tag>
<Tag>malware</Tag>
<Group token="csee">Computer Science and Electrical Engineering</Group>
<GroupUrl>https://beta.my.umbc.edu/groups/csee</GroupUrl>
<AvatarUrl>https://assets3-beta.my.umbc.edu/system/shared/avatars/groups/000/000/099/d117dca133c64bf78a4b7696dd007189/xsmall.png?1314043393</AvatarUrl>
<AvatarUrl size="original">https://assets1-beta.my.umbc.edu/system/shared/avatars/groups/000/000/099/d117dca133c64bf78a4b7696dd007189/original.png?1314043393</AvatarUrl>
<AvatarUrl size="xxlarge">https://assets1-beta.my.umbc.edu/system/shared/avatars/groups/000/000/099/d117dca133c64bf78a4b7696dd007189/xxlarge.png?1314043393</AvatarUrl>
<AvatarUrl size="xlarge">https://assets4-beta.my.umbc.edu/system/shared/avatars/groups/000/000/099/d117dca133c64bf78a4b7696dd007189/xlarge.png?1314043393</AvatarUrl>
<AvatarUrl size="large">https://assets3-beta.my.umbc.edu/system/shared/avatars/groups/000/000/099/d117dca133c64bf78a4b7696dd007189/large.png?1314043393</AvatarUrl>
<AvatarUrl size="medium">https://assets1-beta.my.umbc.edu/system/shared/avatars/groups/000/000/099/d117dca133c64bf78a4b7696dd007189/medium.png?1314043393</AvatarUrl>
<AvatarUrl size="small">https://assets2-beta.my.umbc.edu/system/shared/avatars/groups/000/000/099/d117dca133c64bf78a4b7696dd007189/small.png?1314043393</AvatarUrl>
<AvatarUrl size="xsmall">https://assets3-beta.my.umbc.edu/system/shared/avatars/groups/000/000/099/d117dca133c64bf78a4b7696dd007189/xsmall.png?1314043393</AvatarUrl>
<AvatarUrl size="xxsmall">https://assets3-beta.my.umbc.edu/system/shared/avatars/groups/000/000/099/d117dca133c64bf78a4b7696dd007189/xxsmall.png?1314043393</AvatarUrl>
<Sponsor>UMBC Cyber Defense Lab</Sponsor>
<PawCount>1</PawCount>
<CommentCount>0</CommentCount>
<CommentsAllowed>true</CommentsAllowed>
<PostedAt>Sun, 01 Oct 2023 14:30:22 -0400</PostedAt>
</NewsItem>

<NewsItem contentIssues="false" id="132449" important="false" status="posted" url="https://beta.my.umbc.edu/groups/csee/posts/132449">
<Title>Talk: AVScan2Vec, Feature Learning on Antivirus Data, 4/14</Title>
<Tagline>Learns Semantics for Production-Scale Malware Corpora</Tagline>
<Body>
<![CDATA[
    <div class="html-content"><div><strong>The UMBC Cyber Defense Lab presents</strong></div><div><br></div><h5>AVScan2Vec: Feature Learning on Antivirus Scan<br>Data for Production-Scale Malware Corpora</h5><div><br></div><h5>RJ Joyce</h5><div><strong>UMBC Discovery, Research, and Experimental Analysis of Malware Lab</strong></div><div><br></div><div><strong>12-1pm ET, Friday, 14 April 2023, via <a href="https://umbc.webex.com/meet/sherman" rel="nofollow external" class="bo">WebEx</a></strong></div><div><br></div><div><strong>Joint work with Tirth Patel, Dr. Charles Nicholas, and Dr. Edward Raff</strong></div><div><br></div><div>We introduce <strong>AVScan2Vec</strong>, a sequence-to-sequence autoencoder that can ingest AV scan data, extract semantic meaning, and produce meaningful feature vectors for malware. AVScan2Vec is able to bypass several limitations of prior malware feature-extraction methods, while simultaneously showing noteworthy improvement in several relevant ML tasks. Our implementation of AVScan2Vec in combination with Dynamic Continuous Indexing is especially potent, enabling 10 nearest-neighbor lookup queries in ~16ms on a dataset containing over seven million malware samples. Automation has become increasingly vital to the field of malware analysis because manual effort is slow and costly. Improving malware feature extraction, and with it common tasks such as classification, clustering, and nearest-neighbor lookup of malware, has been a significant research focus. Many approaches rely on features that can only be obtained using prolonged analysis. Due to the enormous quantity and variety of malware, however, applying these feature extraction techniques to a production-size malware corpus would be infeasible. Other, more scalable feature-extraction methods are hindered by static obfuscation, restricted to a single file format, and/or limited in their capacity to identify higher-level malware features. 
Our work explores the under-recognized potential of antivirus (AV) scan data, which is relatively cheap to acquire and contains rich features.</div><div><br></div><div><a href="https://www.linkedin.com/in/rj-joyce/" rel="nofollow external" class="bo"><strong>RJ Joyce</strong></a> (<a href="mailto:joyce8@umbc.edu">joyce8@umbc.edu</a>) is a PhD student at UMBC under the supervision of Dr. Charles Nicholas and Dr. Edward Raff. Presently, RJ works as a data scientist at Booz Allen Hamilton performing research at the intersection of malware analysis and machine learning. RJ is also a visiting lecturer at UMBC and is teaching the Principles of Computer Security course this semester.</div><div><br></div><div>Host: Alan T. Sherman, <a href="mailto:sherman@umbc.edu">sherman@umbc.edu</a>. Support for this event was provided in part by the National Science Foundation under SFS grant DGE-1753681.  The UMBC Cyber Defense Lab meets biweekly Fridays 12-1pm. All meetings are open to the public.  Upcoming CDL meetings: April 28, Roberto Yus (UMBC), Privacy; May 5, CSEE Research Day (ECS Atrium); May 12, Kia-Won-Tia von Wrex (UMBC), Cyberdawgs</div><div><br></div></div>
]]>
</Body>
<Summary>The UMBC Cyber Defense Lab presents     AVScan2Vec: Feature Learning on Antivirus Scan Data for Production-Scale Malware Corpora     RJ Joyce  UMBC Discovery, Research, and Experimental Analysis...</Summary>
<TrackingUrl>https://beta.my.umbc.edu/api/v0/pixel/news/132449/guest@my.umbc.edu/0c344d3039da4b8a069a92af2af16780/api/pixel</TrackingUrl>
<Tag>cybersecurity</Tag>
<Tag>embeddings</Tag>
<Tag>malware</Tag>
<Group token="csee">Computer Science and Electrical Engineering</Group>
<GroupUrl>https://beta.my.umbc.edu/groups/csee</GroupUrl>
<AvatarUrl>https://assets3-beta.my.umbc.edu/system/shared/avatars/groups/000/000/099/d117dca133c64bf78a4b7696dd007189/xsmall.png?1314043393</AvatarUrl>
<AvatarUrl size="original">https://assets1-beta.my.umbc.edu/system/shared/avatars/groups/000/000/099/d117dca133c64bf78a4b7696dd007189/original.png?1314043393</AvatarUrl>
<AvatarUrl size="xxlarge">https://assets1-beta.my.umbc.edu/system/shared/avatars/groups/000/000/099/d117dca133c64bf78a4b7696dd007189/xxlarge.png?1314043393</AvatarUrl>
<AvatarUrl size="xlarge">https://assets4-beta.my.umbc.edu/system/shared/avatars/groups/000/000/099/d117dca133c64bf78a4b7696dd007189/xlarge.png?1314043393</AvatarUrl>
<AvatarUrl size="large">https://assets3-beta.my.umbc.edu/system/shared/avatars/groups/000/000/099/d117dca133c64bf78a4b7696dd007189/large.png?1314043393</AvatarUrl>
<AvatarUrl size="medium">https://assets1-beta.my.umbc.edu/system/shared/avatars/groups/000/000/099/d117dca133c64bf78a4b7696dd007189/medium.png?1314043393</AvatarUrl>
<AvatarUrl size="small">https://assets2-beta.my.umbc.edu/system/shared/avatars/groups/000/000/099/d117dca133c64bf78a4b7696dd007189/small.png?1314043393</AvatarUrl>
<AvatarUrl size="xsmall">https://assets3-beta.my.umbc.edu/system/shared/avatars/groups/000/000/099/d117dca133c64bf78a4b7696dd007189/xsmall.png?1314043393</AvatarUrl>
<AvatarUrl size="xxsmall">https://assets3-beta.my.umbc.edu/system/shared/avatars/groups/000/000/099/d117dca133c64bf78a4b7696dd007189/xxsmall.png?1314043393</AvatarUrl>
<Sponsor>UMBC Cyber Defense Lab</Sponsor>
<PawCount>0</PawCount>
<CommentCount>0</CommentCount>
<CommentsAllowed>true</CommentsAllowed>
<PostedAt>Sun, 09 Apr 2023 14:15:09 -0400</PostedAt>
<EditAt>Sun, 09 Apr 2023 14:15:53 -0400</EditAt>
</NewsItem>

</News>
