Machine Learning: Protect against tomorrow’s threats
HackNTU
TrendMicro Datasets
Trend Micro R&D, 林瑞豪
July, 2017
【HITCON Hackathon 2017】 TrendMicro Datasets

  1. HackNTU: TrendMicro Datasets
      - Trend Micro R&D, 林瑞豪
      - July, 2017
  2. Datasets
      - Malware Dataset (malware data): 18,000
      - Spam Dataset (spam mail data): 200,000
      - Network IPS Dataset (network intrusion prevention system event logs): 1,000,000
  3. MALWARE DATASET
  4. Malware Dataset
      - Sample volume: 18,000 PE malware
      - Sample size: 48 MB
      - Collected between Aug. 2015 and Jan. 2016
      - Data category:
          - PE header info: JSON format
          - Section table: JSON format
          - Import table: TSV
  5. File Information
      - Each folder contains the information of a PE file:
          - info: File Header & Resource Information
          - sections: Section Table
          - import: Import Table
  6. Data File Example: info (PE header info in JSON format)

          $ cat info
          {
            "DllCharacteristics": "0x8000",
            "TimeDateStamp": 538976288,
            "BaseOfCode": "0x1000",
            "FileEntropy": 5.3841451825025688,
            "ImageVersion": "1.0",
            "LoaderFlags": "0x0",
            "SizeOfStackCommit": 4096,
            "SizeOfUninitializedData": 4608,
            "SizeOfHeapReserve": 1048576,
            "LinkerVersion": "2.25",
            "SizeOfHeapCommit": 4096,
            "SizeOfStackReserve": 2097152,
            "OperatingSystemVersion": "4.0",
            "SizeOfHeaders": 1024,
            "Subsystem": "0x3",
            "NumberOfSections": 8,
            "FileAlignment": "0x200",
            "SubsystemVersion": "4.0",
            "BaseOfData": "0x3000",
            "SizeOfOptionalHeader": 224,
            "AddressOfEntryPoint": "0x1000",
            "SectionAlignment": "0x1000",
            "SizeOfCode": 7168,
            "ImageBase": "0x400000",
            "SizeOfInitializedData": 14848,
            "NumberOfSymbols": 0,
            "SizeOfImage": 45056,
            "NumberOfRvaAndSizes": 16,
            "FileSize": 15886,
            "Characteristics": "0x32f"
          }
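As a minimal sketch of reading such a file, the snippet below parses an abbreviated copy of the `info` example and converts the hexadecimal strings to integers so they can be used as numeric features. The helper name `parse_info` and the inline `INFO_TEXT` sample are illustrative assumptions, not part of the dataset or its tooling.

```python
import json

# Abbreviated copy of the `info` example; a real file would be read from disk.
INFO_TEXT = """
{
  "DllCharacteristics": "0x8000",
  "FileEntropy": 5.3841451825025688,
  "LinkerVersion": "2.25",
  "NumberOfSections": 8,
  "AddressOfEntryPoint": "0x1000",
  "ImageBase": "0x400000",
  "SizeOfImage": 45056,
  "FileSize": 15886,
  "Characteristics": "0x32f"
}
"""


def parse_info(text):
    """Parse an `info` document, turning "0x..." strings into integers.

    Hypothetical helper: other strings (e.g. version numbers) are left as-is.
    """
    record = json.loads(text)
    parsed = {}
    for key, value in record.items():
        if isinstance(value, str) and value.lower().startswith("0x"):
            parsed[key] = int(value, 16)
        else:
            parsed[key] = value
    return parsed


info = parse_info(INFO_TEXT)
print(info["ImageBase"], info["AddressOfEntryPoint"], info["FileSize"])
```

Version fields such as `LinkerVersion` stay as strings here; whether to split them into major and minor numbers depends on the model being built.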
  7. Fields of info
      - FileSize: File size
      - FileEntropy: Entropy of whole file
      - AddressOfEntryPoint: Entry point address
      - BaseOfCode: Beginning of code section
      - BaseOfData: Beginning of data section
      - ImageBase: Preferred address space in memory
      - TimeDateStamp: Low 32 bits of the time stamp of the image
      - NumberOfSections: Number of sections
      - NumberOfSymbols: Number of symbols in symbol table
      - NumberOfRvaAndSizes: Number of directory entries
      - Characteristics: Characteristics of the image
      - DllCharacteristics: DLL characteristics
      - SizeOfOptionalHeader: Size of optional header
      - SizeOfCode: Size of code sections
      - SizeOfInitializedData: Size of initialized data sections
      - SizeOfUninitializedData: Size of uninitialized data sections
      - SizeOfImage: Size of the image
      - SizeOfHeaders: Size of header sections
      - SizeOfStackReserve: Reserved size for stack
      - SizeOfStackCommit: Committed size for stack
      - SizeOfHeapReserve: Reserved size for heap
      - SizeOfHeapCommit: Committed size for heap
      - FileAlignment: Section alignment in file
      - SectionAlignment: Section alignment in memory
      - LoaderFlags
      - Subsystem: Subsystem required to run this image
      - SubsystemVersion: Version of subsystem
      - LinkerVersion: Version of linker
      - ImageVersion: Version of image
      - OperatingSystemVersion: Version of OS
      - Reference: https://msdn.microsoft.com/en-us/library/windows/desktop/ms680339%28v=vs.85%29.aspx
      - Also listed, without descriptions: CompanyName, ProductName, LegalCopyright, FileDescription, FileVersion, ProductVersion
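The slides do not say how `FileEntropy` is computed. A common choice, assumed here, is the Shannon entropy of the byte distribution in bits per byte (range 0 to 8), which is consistent with the example value of about 5.38. The function name `shannon_entropy` is illustrative.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0).

    Assumption: this mirrors how the dataset's entropy fields were derived;
    the slides do not confirm the exact formula.
    """
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(shannon_entropy(b"aaaa"))            # a single repeated byte carries no information
print(shannon_entropy(bytes(range(256))))  # every byte value equally likely
```

Packed or encrypted sections tend to score close to 8, which is one reason entropy is a popular feature for malware models.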
  8. Data File Example: sections (section table in JSON format)

          $ cat sections
          [
            {
              "Index": 0,
              "Name": ".text\u0000\u0000\u0000",
              "Entropy": 5.8200022539922749,
              "VirtualSize": 6676,
              "Flags": "R-X CODE",
              "RawSize": 7168,
              "VirtualAddress": "0x1000"
            },
            {
              "Index": 1,
              "Name": ".data\u0000\u0000\u0000",
              "Entropy": 0.057256602241154482,
              "VirtualSize": 68,
              "Flags": "RW- IDATA",
              "RawSize": 512,
              "VirtualAddress": "0x3000"
            },
            {
              "Index": 2,
              "Name": ".rdata\u0000\u0000",
              "Entropy": 5.043049159297726,
              "VirtualSize": 1824,
              "Flags": "R-- IDATA",
              "RawSize": 2048,
              "VirtualAddress": "0x4000"
            },
            …skip…
            {
              "Index": 7,
              "Name": ".rsrc\u0000\u0000\u0000",
              "Entropy": 4.7784771683762584,
              "VirtualSize": 1256,
              "Flags": "RW- IDATA",
              "RawSize": 1536,
              "VirtualAddress": "0xa000"
            }
          ]
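A possible way to load the section table is sketched below. It assumes section names are padded to eight bytes with NUL characters, written as `\u0000` escapes in the JSON, and that the first token of `Flags` holds read/write/execute permissions. The helper `load_sections` and the two-entry sample are illustrative only.

```python
import json

# Two entries copied from the `sections` example (raw string keeps the \u escapes).
SECTIONS_TEXT = r"""
[
  {"Index": 0, "Name": ".text\u0000\u0000\u0000", "Entropy": 5.8200022539922749,
   "VirtualSize": 6676, "Flags": "R-X CODE", "RawSize": 7168,
   "VirtualAddress": "0x1000"},
  {"Index": 1, "Name": ".data\u0000\u0000\u0000", "Entropy": 0.057256602241154482,
   "VirtualSize": 68, "Flags": "RW- IDATA", "RawSize": 512,
   "VirtualAddress": "0x3000"}
]
"""


def load_sections(text):
    """Return one cleaned-up dict per section (hypothetical helper)."""
    sections = []
    for entry in json.loads(text):
        # "R-X CODE" -> permissions "R-X", content kind "CODE"
        perms, _, kind = entry["Flags"].partition(" ")
        sections.append({
            "index": entry["Index"],
            "name": entry["Name"].rstrip("\x00"),  # drop NUL padding
            "entropy": entry["Entropy"],
            "virtual_address": int(entry["VirtualAddress"], 16),
            "virtual_size": entry["VirtualSize"],
            "raw_size": entry["RawSize"],
            "executable": "X" in perms,
            "writable": "W" in perms,
            "kind": kind,
        })
    return sections


sections = load_sections(SECTIONS_TEXT)
for section in sections:
    print(section["name"], section["entropy"], section["executable"])
```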
  9. Fields in sections
      - Name: Section name
      - VirtualAddress: Section virtual address
      - VirtualSize: Section virtual size
      - RawSize: Section raw size
      - Entropy: Section entropy
      - Flags: Section RWX flags
      - References:
          - https://msdn.microsoft.com/en-us/library/windows/desktop/ms680341%28v=vs.85%29.aspx
          - https://msdn.microsoft.com/en-us/library/ms809762.aspx?ppud=4
  10. Data File Example: import (columns: DLL name, function)

          $ cat import
          cygwin1.dll __cxa_atexit
          cygwin1.dll __getreent
          cygwin1.dll __main
          cygwin1.dll _dll_crt0@0
          cygwin1.dll _fopen64
          cygwin1.dll _impure_ptr
          cygwin1.dll atoi
          cygwin1.dll calloc
          cygwin1.dll cygwin_detach_dll
          cygwin1.dll cygwin_internal
          cygwin1.dll dll_dllcrt0
          cygwin1.dll exit
          cygwin1.dll fclose
          cygwin1.dll fflush
          cygwin1.dll fopen
          cygwin1.dll fprintf
          cygwin1.dll free
          cygwin1.dll fwrite
          cygwin1.dll getc
          cygwin1.dll malloc
          cygwin1.dll posix_memalign
          cygwin1.dll printf
          cygwin1.dll putc
          cygwin1.dll puts
          cygwin1.dll realloc
          cygwin1.dll vfprintf
          KERNEL32.dll GetModuleHandleA
          KERNEL32.dll GetProcAddress
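The import table can be grouped by DLL with the standard `csv` module. This sketch assumes the two columns are separated by a single tab, as the TSV format stated for the dataset implies. The helper `parse_imports` and the lower-casing of DLL names are illustrative choices.

```python
import csv
import io
from collections import defaultdict

# A few rows from the `import` example, with an assumed tab separator.
IMPORT_TEXT = (
    "cygwin1.dll\t__cxa_atexit\n"
    "cygwin1.dll\tmalloc\n"
    "KERNEL32.dll\tGetModuleHandleA\n"
    "KERNEL32.dll\tGetProcAddress\n"
)


def parse_imports(text):
    """Map each DLL name (lower-cased) to the functions imported from it."""
    imports = defaultdict(list)
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed lines
        dll, function = row
        imports[dll.lower()].append(function)
    return dict(imports)


imports = parse_imports(IMPORT_TEXT)
print(sorted(imports))
print(imports["kernel32.dll"])
```

DLL names are lower-cased because Windows treats them case-insensitively, so `KERNEL32.dll` and `kernel32.dll` should count as the same feature.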
  11. Malware Dataset Example
  12. Malware Dataset Application
      - One-class malware identification
      - Unsupervised malware classification
      - https://tbrain.nchc.org.tw/index.php?r=script%2Fview&title_id=6
  13. SPAM DATASET
  14. Spam Dataset
      - Sample volume: 200,000
      - Sample size: 1.75 GB
      - Received in July, 2016
      - Format: EML
      - Field categories:
          - From address
          - Subject
          - Date
          - Body (MIME)
  15. Read Spam Dataset
      - Python standard library: email
          - https://docs.python.org/3/library/email-examples.html
  16. Spam Mail Example

          Message-ID: <3210276217-URSBFSAWVWJITSNSTQBQAGZC@fauudpop.chamblee.default.com>
          From: "Alisa Sharpe" <Sharpe_Alisa@chamblee.default.com>
          Subject: Re: Enjoy envious stares when you wear our watches
          To: <removed>
          Date: Tue, 12 Jul 2016 06:49:09 +0600
          Mime-Version: 1.0
          Content-Type: text/html;
          Content-Transfer-Encoding: 7Bit

          Like a certain brand of watches, but never wanted to pay the price?
          Solve your dilemma now<br>
          <a href="hxxp://889457.finewatch2016.ru#FOlmBCUdEp8EJhjsUpA9GmlqFV4g"style="color:#0B7303;">HOT OFFER!</a>
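The following sketch shows one way to pull the listed field categories (From address, Subject, Date, Body) out of an EML message with the standard `email` package. The inline `RAW` message is a shortened copy of the example; variable names are illustrative.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Shortened copy of the spam example; real files would be opened from disk.
RAW = (
    "Message-ID: <3210276217-URSBFSAWVWJITSNSTQBQAGZC@fauudpop.chamblee.default.com>\n"
    'From: "Alisa Sharpe" <Sharpe_Alisa@chamblee.default.com>\n'
    "Subject: Re: Enjoy envious stares when you wear our watches\n"
    "To: <removed>\n"
    "Date: Tue, 12 Jul 2016 06:49:09 +0600\n"
    "Mime-Version: 1.0\n"
    "Content-Type: text/html;\n"
    "Content-Transfer-Encoding: 7Bit\n"
    "\n"
    "Like a certain brand of watches, but never wanted to pay the price?\n"
)

msg = message_from_string(RAW)

sender_name, sender_addr = parseaddr(msg["From"])  # display name, address
subject = msg["Subject"]
sent_at = parsedate_to_datetime(msg["Date"])       # timezone-aware datetime
content_type = msg.get_content_type()
body = msg.get_payload().strip()                   # single-part message body

print(sender_addr, sent_at.isoformat(), content_type)
print(subject)
print(body)
```

For files on disk, `email.message_from_binary_file` opened in `"rb"` mode avoids guessing the text encoding up front. Multipart messages would need `msg.walk()` instead of a single `get_payload()` call.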
  17. Spam Dataset Application
      - Emergent spam topic
      - Spam identification
  18. NETWORK IPS DATASET
  19. Network IPS Dataset
      - Attack behavior log of home router
      - Sample volume: 1,000,000
      - Sample size: 250 MB
      - Format: CSV
      - Field categories:
          - Device info
          - Event info
          - Router IP (obfuscated)
  20. Device Info Fields
      - device_dev_name
          - Apple iPad Mini, Google Nexus 5, Sony PlayStation 4, Synology NAS, etc.
      - device_os_name
          - Apple iOS, Android, Linux, Wii, etc.
      - device_type_name
          - Desktop/Laptop, NAS, DVR, IP Camera, etc.
      - device_vendor_name
      - device_hashed_mac
  21. Event Info Fields
      - event_protocol_id
          - Assigned Internet Protocol Numbers by IANA
          - 1: ICMP, 6: TCP, 17: UDP, etc.
          - https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml
      - event_self_ipv4
          - Usually a private IP or an obfuscated public IP
      - event_time
      - event_flow_outbound_or_inbound
      - event_role_device_or_router
      - event_role_server_or_client
  22. Event Rule Fields
      - event_rule_category
          - Access Control, Web Attack, Buffer Overflow, DoS/DDoS, BotNet, etc.
      - event_rule_name
          - EXPLOIT Bitcoin/LiteCoin/Dogecoin Mining Activity - 1
          - WEB Cross-site Scripting (document.cookie) attempt
          - SHELLCODE NOP Sled, etc.
      - event_rule_reference
          - CVE-2005-0211, CVE-2011-2133, CVE-2014-4116, etc.
      - event_rule_severity
  23. Network IPS Example

          Apple iPhone 6 Plus,,c17bdadda83e4200d7ed41b7e6cf5b43c62e725c,Apple iOS,Smartphone,"Apple Inc.",6,outbound,device,client,Web Attack,1055396,WEB Cross-site Scripting -9,CVE-2011-2260;CVE-2011-2710;CVE-2012-0017;CVE-2012-0551;CVE-2012-0719;CVE-2012-1859;CVE-2012-4939;CVE-2013-5013;CVE-2014-2092;CVE-2013-7051;CVE-2014-1754;CVE-2014-6325;CVE-2014-6535;CVE-2014-2856;CVE-2014-5360;CVE-2016-0712;CVE-2016-3212;CVE-2016-6837,5,192.168.1.238,12/28/2016 2:21:04 AM,165.170.147.184
          Synology NAS,,3a85f6a9e776fb803e08ed991fe348b984001bfd,Linux,NAS,Synology Inc.,17,outbound,device,client,DoS/DDoS,1130172,DNS DNS Amplification Attacks -1,TA13-088A;CVE-2013-unknown,4,192.168.0.5,12/16/2016 1:39:06 AM,166.29.195.94
          Sony PlayStation 3,Game Console,185b7ce02ec4d07df4b48a8d6f94fdcde8b36492,XMB,Game Console,Sony Corporation,6,outbound,device,client,Web Attack,1130054,WEB Directory Traversal -5.a,CVE-2014-1619,4,192.168.1.133,12/23/2016 12:23:17 AM,163.51.28.15
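The example rows carry no header line, so the sketch below assigns column names by position. The ordering is an assumption inferred from the field slides. The names `unknown`, `event_rule_id`, and `router_ip` are hypothetical labels for columns the slides do not name. It also maps protocol numbers to names, splits the semicolon-separated references, and parses the timestamp assuming a month/day/year, 12-hour format.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Assumed column order; "unknown", "event_rule_id" and "router_ip" are
# hypothetical labels for columns the slides do not name.
COLUMNS = [
    "device_dev_name", "unknown", "device_hashed_mac", "device_os_name",
    "device_type_name", "device_vendor_name", "event_protocol_id",
    "event_flow_outbound_or_inbound", "event_role_device_or_router",
    "event_role_server_or_client", "event_rule_category", "event_rule_id",
    "event_rule_name", "event_rule_reference", "event_rule_severity",
    "event_self_ipv4", "event_time", "router_ip",
]
PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}  # subset of the IANA registry

# Two rows copied from the example above.
SAMPLE = '''Synology NAS,,3a85f6a9e776fb803e08ed991fe348b984001bfd,Linux,NAS,Synology Inc.,17,outbound,device,client,DoS/DDoS,1130172,DNS DNS Amplification Attacks -1,TA13-088A;CVE-2013-unknown,4,192.168.0.5,12/16/2016 1:39:06 AM,166.29.195.94
Sony PlayStation 3,Game Console,185b7ce02ec4d07df4b48a8d6f94fdcde8b36492,XMB,Game Console,Sony Corporation,6,outbound,device,client,Web Attack,1130054,WEB Directory Traversal -5.a,CVE-2014-1619,4,192.168.1.133,12/23/2016 12:23:17 AM,163.51.28.15
'''


def parse_events(text):
    """Turn raw CSV rows into dicts with a few derived fields."""
    events = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != len(COLUMNS):
            continue  # skip rows that do not match the assumed layout
        event = dict(zip(COLUMNS, row))
        protocol_id = int(event["event_protocol_id"])
        event["protocol"] = PROTOCOLS.get(protocol_id, str(protocol_id))
        event["references"] = [
            ref for ref in event["event_rule_reference"].split(";") if ref
        ]
        event["time"] = datetime.strptime(
            event["event_time"], "%m/%d/%Y %I:%M:%S %p"
        )
        events.append(event)
    return events


events = parse_events(SAMPLE)
by_category = Counter(event["event_rule_category"] for event in events)
print(by_category)
```

Counting events per rule category, device type, or hour of day is a simple first step before looking for attack patterns or anomalies in the log.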
  24. Network IPS Dataset Application
      - Discover network attack patterns
      - Anomalous behavior detection
  25. T-BRAIN
  26. T-Brain
      - https://tbrain.nchc.org.tw/
      - Dataset
      - Brain
          - xgboost, Keras-theano, Keras-tensorflow
          - Pandas, sklearn
      - Community
  27. Login
  28. Dataset on T-Brain
  29. Download Dataset
      - Password: TBrain
  30. New script for dataset
  31. Jupyter Notebook
  32. Hide the script
  33. Sample scripts
  34. Sample script
  35. T-Brain
  36. THANK YOU