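01 Background
Malicious code (Malicious/Unwanted Code) broadly refers to any computer code that poses an actual or potential threat to a network or system. Its harms include leaking information from the target system, abusing resources, damaging system integrity and availability, and violating the target system's security policy.
In recent years, the volume of malicious code has grown explosively. Even so, security researchers have found that many new samples are variants of existing malicious code. Attackers produce many of these variants with metamorphism, packing, polymorphism, code obfuscation, and similar techniques. These techniques let the variants evade traditional detection methods such as blacklists/whitelists and signature matching [1][10]. In fact, these variants reflect homology relationships among malicious code. For example, WannaCry, which appeared in 2017, is homologous with the known malware Wcy. It still bypassed a wide range of detection tools and caused severe economic losses. Discovering homology relationships has therefore become a focus of attention in the cyber security field.
Malicious code homology analysis (Homology Analysis) studies the derivation relationships among malicious code samples. It draws on their internal and external features and on the patterns by which they are created and propagated. Homology takes many forms, including family homology, developer homology, type homology, and attack-source homology. This article focuses on type homology analysis of malicious code. Type homology analysis can help detect malicious code, issue early warnings, formulate emergency response plans, and anticipate how incidents will develop.
This article first introduces the relevant background knowledge and briefly reviews the state of homology analysis techniques. It then presents the design of a homology analysis scheme based on image classification. Finally, it verifies the effectiveness of that scheme through detailed experiments.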
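02 Fundamentals
2.1 Type Homology
Malicious code comes in many types, including computer viruses, worms, Trojans, backdoors, and logic bombs. Samples of the same type tend to be similar in the following three respects.
(1) Key code segments
To implement a given malicious function, homologous malicious code contains similar key code fragments (for example, for DLL injection or RPC services). These similar fragments are also called gene codes.
(2) System function calls
Malicious operations usually rely on calls to operating system functions. Homologous malicious code may therefore show similar function names, call frequencies, and call orders.
(3) Functional behavior
Each type of malicious code has characteristic destructive behavior. For example, ransomware reads and writes user data, and remote-control Trojans view the screen or webcam. Similarities in functional behavior show up in file, process, network, and registry activity.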
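The idea in (2) that homologous samples call functions with similar names, frequencies, and order can be made concrete by comparing two API call traces in three ways. The sketch uses Jaccard similarity over the set of API names, cosine similarity over call counts, and Jaccard similarity over API bigrams as a simple proxy for call order. It is a minimal illustration, not this article's method. The traces are hypothetical, and the bigram length (n=2) is an assumption.

```python
import math
from collections import Counter


def name_similarity(a, b):
    """Jaccard similarity over the sets of distinct API names (names only)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def frequency_similarity(a, b):
    """Cosine similarity between per-API call-count vectors (call frequency)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[k] * cb[k] for k in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def order_similarity(a, b, n=2):
    """Jaccard similarity over API n-grams; n=2 is an assumed default capturing local call order."""
    grams_a = {tuple(a[i:i + n]) for i in range(len(a) - n + 1)}
    grams_b = {tuple(b[i:i + n]) for i in range(len(b) - n + 1)}
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    # Hypothetical traces; real ones would come from a sandbox or from static import analysis.
    trace_a = ["CreateFileW", "WriteFile", "CloseHandle", "RegSetValueExW"]
    trace_b = ["CreateFileW", "WriteFile", "WriteFile", "CloseHandle"]
    print(name_similarity(trace_a, trace_b))                 # 0.75
    print(round(frequency_similarity(trace_a, trace_b), 3))  # 0.816
    print(order_similarity(trace_a, trace_b))                # 0.5
```

The two traces share 3 of 4 distinct names (0.75) but only 2 of 4 bigrams (0.5). This shows that the order measure separates samples that the name set alone would treat as close.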
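2.2 Image Conversion
Image conversion renders the byte-stream content of malicious code visually. The byte stream holds the complete information that makes up a malicious program, for example the header, data sections, code sections, and trailing data of PE-format malware.
Attackers often take open-source malicious code fragments and develop and repackage them further to create variants. Malicious code with the same function or from the same family also shares code fragments. Shared fragments therefore show up as similar byte-stream content, which in turn maps to similar textures in the images.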
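A minimal Python sketch of the byte-to-image mapping described above: each byte becomes one grayscale pixel (0 is black, 255 is white), and bytes are laid out row by row. The width heuristic gives larger files wider images. It is an assumption modeled on the size-to-width table reported by Nataraj et al. [3]. Zero-padding the last row and writing PGM output are also illustrative choices, not this article's implementation.

```python
def choose_width(size_bytes):
    """Assumed width heuristic: larger files get wider images (modeled on Nataraj et al. [3])."""
    size_kb = size_bytes / 1024
    for limit_kb, width in [(10, 32), (30, 64), (60, 128), (100, 256),
                            (200, 384), (500, 512), (1000, 768)]:
        if size_kb < limit_kb:
            return width
    return 1024


def bytes_to_grayscale(data, width=None):
    """Map each byte (0-255) to one grayscale pixel, filling rows left to right.

    The last row is zero-padded so that the image is rectangular.
    """
    width = width or choose_width(len(data))
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))
        rows.append(row)
    return rows


def write_pgm(rows, path):
    """Save pixel rows as a binary PGM (P5) image for quick visual inspection."""
    height, width = len(rows), len(rows[0])
    with open(path, "wb") as fh:
        fh.write(f"P5 {width} {height} 255\n".encode("ascii"))
        fh.write(bytes(v for row in rows for v in row))
```

For example, reading a hypothetical file `sample.bin` in binary mode and passing its contents to `bytes_to_grayscale` yields rows of pixel values. `write_pgm` then stores them in a simple format that many image viewers can open.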
Figures 1 and 2 show the images produced for two families of rogueware (Application), InstallMonster and Hacktool. The example samples come from different families, yet their textures are strikingly similar.

Figure 1. Application-InstallMonster

Figure 2. Application-Hacktool
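03 State of the Art
As in malicious code detection, the features used for homology analysis are either static or dynamic. Static features include structural features of the malicious code, API sequence features, code semantics, and binary content features [1]. Dynamic features are typically control-flow graphs, features of the resource objects that are read and modified, and dynamic API calls. Once features have been extracted, methods such as association analysis, machine-learning classification, and graph analysis can trace a sample's origin.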
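As a concrete example of the binary content features listed above, the sketch computes simple static features directly from raw bytes. It produces a normalized 256-bin byte histogram, the Shannon entropy, and an entropy profile over sliding windows, similar in spirit to the entropy graphs of [4]. The window size and step are assumed values, not parameters from the cited work.

```python
import math
from collections import Counter


def byte_histogram(data):
    """Normalized 256-bin byte-frequency vector (a simple binary content feature)."""
    counts = Counter(data)
    total = len(data) or 1
    return [counts.get(b, 0) / total for b in range(256)]


def shannon_entropy(data):
    """Shannon entropy in bits per byte: 0.0 for constant data, at most 8.0."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())


def entropy_profile(data, window=256, step=128):
    """Entropy of overlapping windows (window and step are assumed values).

    Long high-entropy stretches often point to packed or encrypted regions.
    """
    last_start = max(len(data) - window, 0)
    return [shannon_entropy(data[i:i + window]) for i in range(0, last_start + 1, step)]
```

The histogram gives a fixed-length vector that can feed any classifier. The profile preserves where in the file high- or low-entropy regions occur.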
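Association analysis computes the similarity between malicious code features, for example with the Jaccard coefficient [14,15], Hamming distance, or cosine distance. It then decides from that similarity whether samples are related. On top of similarity analysis, a relationship network can be built with samples as nodes and similarities as edges. Such networks are mainly used to visualize clusters. They also help trace origins and uncover relationships among large numbers of scattered samples [18]. Common techniques have been studied and applied in malicious code homology analysis, including classifiers such as SVM and XGBoost [14], the DBSCAN clustering algorithm [10,15], and fuzzy hashing [14].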
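The association-analysis workflow above can be sketched in three steps. First, compute pairwise similarity (Jaccard, Hamming, or cosine). Second, add an edge between two samples when their similarity meets a threshold. Third, read off groups as connected components of the resulting network. The feature sets, the 0.5 threshold, and the connected-components grouping are assumptions for illustration. Real systems often use weighted edges or community detection instead.

```python
import math
from itertools import combinations


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def hamming_similarity(x, y):
    """1 - normalized Hamming distance, for equal-length bit strings or hash digests."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    return 1 - sum(p != q for p, q in zip(x, y)) / len(x)


def cosine(u, v):
    """Cosine similarity of two numeric vectors (1 - cosine distance)."""
    dot = sum(p * q for p, q in zip(u, v))
    norm_u = math.sqrt(sum(p * p for p in u))
    norm_v = math.sqrt(sum(q * q for q in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_network(features, threshold=0.5):
    """Samples are nodes; add a weighted edge when Jaccard similarity >= threshold (assumed 0.5)."""
    edges = {}
    for a, b in combinations(features, 2):
        score = jaccard(features[a], features[b])
        if score >= threshold:
            edges[(a, b)] = score
    return edges


def connected_groups(nodes, edges):
    """Group samples by connected component of the similarity network."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adjacency[node] - seen)
        groups.append(group)
    return groups
```

Consider three hypothetical samples whose feature sets are imported API names: s1 = {CreateFileW, WriteFile, RegSetValueExW}, s2 = {CreateFileW, WriteFile}, s3 = {socket, connect, send}. Here s1 and s2 are linked (similarity 2/3) and s3 stays on its own.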
Graph analysis operates on the control-flow graph of malicious code. It extracts graph-structure attributes such as out/in degree, betweenness centrality, and clustering coefficient [16]. It then decides whether samples are related by comparing these attributes. Graph analysis can also mine similar subgraphs within a cluster to form cluster genes, and then determine homology by comparing genes [12]. Zhao et al. applied a Graph Convolutional Network (GCN) to classify the API call graphs of malicious code and thereby analyze homology [17].
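The graph attributes named above can be computed in plain Python on a small directed graph given as an adjacency dict without duplicate edges. The sketch covers out/in degree, unnormalized betweenness centrality via Brandes' algorithm (unweighted), and the local clustering coefficient on the undirected view of the graph. The example call graph is hypothetical. Libraries such as NetworkX provide tested implementations of the same measures, with their own normalization defaults.

```python
from collections import deque


def all_nodes(graph):
    """Every node that appears as a key or as a call target."""
    return set(graph) | {v for targets in graph.values() for v in targets}


def degrees(graph):
    """{node: (out_degree, in_degree)} for a directed adjacency dict."""
    nodes = all_nodes(graph)
    in_deg = dict.fromkeys(nodes, 0)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    return {n: (len(graph.get(n, ())), in_deg[n]) for n in nodes}


def betweenness(graph):
    """Unnormalized betweenness centrality via Brandes' algorithm (unweighted, directed)."""
    nodes = all_nodes(graph)
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        order, preds = [], {n: [] for n in nodes}
        sigma = dict.fromkeys(nodes, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(nodes, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:                      # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):         # accumulate dependencies, farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


def clustering(graph):
    """Local clustering coefficient on the undirected view of the graph (self-loops ignored)."""
    nodes = all_nodes(graph)
    undirected = {n: set() for n in nodes}
    for u, targets in graph.items():
        for v in targets:
            if u != v:
                undirected[u].add(v)
                undirected[v].add(u)
    coeff = {}
    for n, nbrs in undirected.items():
        k = len(nbrs)
        if k < 2:
            coeff[n] = 0.0
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in undirected[a])
        coeff[n] = 2 * links / (k * (k - 1))
    return coeff
```

In the hypothetical graph {"main": ["init", "payload"], "init": ["payload"], "payload": ["send"], "send": []}, node payload lies on the only shortest paths main→send and init→send. Its betweenness is therefore 2.0.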
2011Ä꣬NatarajµÈÈËÌá³ö½«¶ñÒâ´úÂëµÄÁ÷ÄÚÈÝת»»³É»Ò¶ÈͼÏñ£¬È»ºóÌáÈ¡GIST¡¢¾Ö²¿¿Õ¼äƽ¾ùÖµµÈÌØÕ÷£¬ÍŽáKNNËã·¨¶Ô¶ñÒâ´úÂë¾ÙÐзÖÀà[3]¡£Ëæºó£¬·ºÆðÁËһЩÑо¿ÑÓÐø¸Ã˼Ð÷£¬ºÃ±È½«×Ö½ÚìØ[4]¡¢APIŲÓÃ[5]¡¢opcode¹þÏ£[7]µÈת»»ÎªÍ¼Ïñ£¬¾í»ýÉñ¾ÍøÂ磨Conventional Neural Network£¬CNN£©[7]¡¢ÊÇ·ÇÆÚÓ°Ïó£¨Long-short Term Memory£¬LSTM£©ÍøÂç[8,9]µÈÉî¶ÈѧϰҪÁìÏà¼Ì±»Ó¦ÓÃÓÚ¶ñÒâ´úÂëͬԴÆÊÎö¡£
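Homology analysis based on image classification does not require analysts to have reverse-engineering expertise and needs no manual feature extraction. It is therefore relatively flexible to apply. Thanks to the rapid progress of computer vision, it can also achieve high accuracy. The rest of this article focuses on this class of techniques.
04 Scheme Design
Among homology analysis schemes based on image classification, a typical implementation is CNN-based malicious code homology analysis. It consists mainly of the following parts:
(1) Dataset construction: decide how malicious code is divided into classes, collect samples, and label their classes to form the training data. This article divides classes by malware type.
(2) Image conversion: convert the training samples into images that serve as input to the CNN.
(3) CNN construction: build the CNN architecture (e.g., VGGNet, GoogLeNet, ResNet).
(4) Model training: feed the training data into the CNN and train it to obtain a classification model.
(5) Model application: convert a sample under test into an image, feed it to the classification model, and assign the sample to the class the model outputs. The application workflow is shown in Figure 3: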

Figure 3. Malicious code homology analysis workflow based on CNN image classification
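To illustrate what steps (3) to (5) compute, here is a deliberately tiny CNN forward pass in pure Python. It has one convolution layer, ReLU, 2x2 max pooling, a fully connected layer, and softmax over the classes. It is a sketch under strong assumptions. The weights are random rather than trained; step (4) would fit them by backpropagation. Inputs are assumed to be pre-resized to 8x8 with pixel values scaled to [0, 1], and the class names in the test are hypothetical. A practical system would use an established architecture such as VGGNet, GoogLeNet, or ResNet in a deep-learning framework.

```python
import math
import random


def conv2d(img, kernel):
    """'Valid' 2-D cross-correlation, which is what CNN convolution layers compute."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(img[y + i][x + j] * kernel[i][j] for i in range(kh) for j in range(kw))
             for x in range(len(img[0]) - kw + 1)]
            for y in range(len(img) - kh + 1)]


def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping size x size max pooling (trailing rows/columns are dropped)."""
    return [[max(fmap[y + i][x + j] for i in range(size) for j in range(size))
             for x in range(0, len(fmap[0]) - size + 1, size)]
            for y in range(0, len(fmap) - size + 1, size)]


def softmax(logits):
    top = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyCNN:
    """conv -> ReLU -> 2x2 max-pool -> fully connected -> softmax.

    Weights are random placeholders (assumption); step (4) would learn them by
    backpropagation on the labeled training images.
    """

    def __init__(self, classes, n_kernels=2, ksize=3, image_size=8, seed=0):
        rng = random.Random(seed)
        self.classes = list(classes)
        self.kernels = [[[rng.uniform(-1, 1) for _ in range(ksize)] for _ in range(ksize)]
                        for _ in range(n_kernels)]
        pooled = (image_size - ksize + 1) // 2
        n_inputs = n_kernels * pooled * pooled
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in self.classes]
        self.bias = [0.0] * len(self.classes)

    def forward(self, img):
        """Return class probabilities for one image_size x image_size image with values in [0, 1]."""
        features = []
        for kernel in self.kernels:
            for row in max_pool(relu(conv2d(img, kernel))):
                features.extend(row)
        logits = [sum(w * f for w, f in zip(ws, features)) + b
                  for ws, b in zip(self.weights, self.bias)]
        return softmax(logits)

    def predict(self, img):
        """Step (5): report the most probable class."""
        probs = self.forward(img)
        return self.classes[probs.index(max(probs))]
```

The convolution and pooling stages turn local byte textures into features. The fully connected layer and softmax map those features to one probability per malware class.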
05 Experimental Analysis
In this experiment, we collected malicious code samples of seven types; details are given in Table 1.

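Table 1. Experimental dataset
The dataset was split into a training set and a test set at a ratio of 4:1. We trained the CNN architecture described above for 200 iterations. The model loss converged to 0.0088, and training accuracy reached 0.9957. Figure 4 shows the training process.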

Figure 4. Training process
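A 4:1 split can be done per class, so that every malware type keeps the same proportion in both sets. The article does not say whether its split was stratified. The per-class split and the fixed random seed below are therefore assumptions.

```python
import random
from collections import defaultdict


def split_4_to_1(samples, seed=42):
    """Split (path, label) pairs into train/test at 4:1 within each label (assumed stratified)."""
    by_label = defaultdict(list)
    for item in samples:
        by_label[item[1]].append(item)
    rng = random.Random(seed)            # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label][:]
        rng.shuffle(items)
        n_test = round(len(items) / 5)   # one fifth of each class goes to the test set
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test
```

Splitting within each class keeps small classes represented in the test set. A purely random split could leave a rare type with few or no test samples.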
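Table 2 lists the model's metrics on the test set; overall accuracy is 0.93.
Figure 5 shows the confusion matrix on the test set. Of the seven classes in the experiment, Trojan, a comparatively complex malware type, has the lowest test accuracy.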

Table 2. Test performance

Figure 5. Confusion matrix
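The overall accuracy and the per-class figures read off a confusion matrix can be computed as sketched below. Rows are true classes and columns are predicted classes. The text does not say which metrics Table 2 reports, so precision, recall, and F1 are assumed here as common choices. Per-class recall is one common reading of the per-class accuracy discussed for Trojan.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def report(matrix, classes):
    """Return overall accuracy and per-class precision, recall, and F1."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(classes)))
    per_class = {}
    for i, c in enumerate(classes):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)   # column sum
        actual = sum(matrix[i])                      # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    return correct / total, per_class
```

Off-diagonal cells show which classes are confused with each other. This is where overlapping types such as Trojan tend to lose accuracy.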
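06 Conclusion
Malicious code homology analysis serves two purposes. First, it can trace and locate attack sources or attackers, help thwart APT attacks, and deter attackers. Second, malware detection techniques have blind spots, and homology analysis can help detect and defend against malware. Based on our analysis and verification, we conclude that malicious code homology analysis based on image classification is feasible. However, malware types overlap in complex ways and have no clear boundaries, which is one factor limiting classification accuracy. Family is clearly a more precise homology division than type. Yet some families have very large numbers of samples while others have only a few traceable samples. Dividing classes by family therefore requires solving the class-imbalance problem. Our preliminary judgment is that a finer-grained class division would further improve classification accuracy, although these questions need further exploration.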
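One common way to handle the family-level imbalance mentioned above is to weight classes by inverse frequency or to oversample minority families. The sketch below shows both. The weighting formula n_samples / (n_classes × count_c) matches scikit-learn's `class_weight="balanced"` heuristic. The random oversampling routine is a simple illustrative assumption, not a technique evaluated in this article, and it should be applied to the training split only.

```python
import random
from collections import Counter, defaultdict


def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c): rare families get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def oversample(samples, seed=0):
    """Randomly duplicate (item, label) pairs of smaller classes until every class
    has as many samples as the largest class."""
    by_label = defaultdict(list)
    for item in samples:
        by_label[item[1]].append(item)
    target = max(len(items) for items in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        balanced.extend(rng.choice(items) for _ in range(target - len(items)))
    return balanced
```

Weighting leaves the data untouched and changes the loss, while oversampling changes the data. The duplicates it adds can encourage overfitting to the few samples of a rare family.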
References
[1] Chu Q F, Zhu X Y, Liu G S. A survey of malicious code homology judgment techniques[J]. Communications Technology, 2017, 50(007):1484-1492. (in Chinese)
[2] Goldberg L, Goldberg P, Phillips C, et al. Constructing Computer Virus Phylogenies[J]. Journal of Algorithms, 1998, 26(01):188-208.
[3] Nataraj L, Karthikeyan S, Jacob G, et al. Malware images: visualization and automatic classification[C]. IEEE Symposium on Visualization for Cyber Security, Pittsburgh, PA, USA, ACM, 2011.
[4] Han K S, Lim J H, Kang B, et al. Malware analysis using visualized images and entropy graphs[J]. International Journal of Information Security, 2015, 14(1):1-14.
[5] Kolosnjaji B, Zarras A, Webster G, et al. Deep Learning for Classification of Malware System Call Sequences[C]// Australasian Joint Conference on Artificial Intelligence. Springer International Publishing, 2016.
[6] Ni S, Qian Q, Zhang R. Malware identification using visualization images and deep learning[J]. Computers & Security, 2018, 77(Aug.):871-885.
[7] Raff E, Barker J, Sylvester J, et al. Malware Detection by Eating a Whole EXE. 2017.
[8] Le Q, Boydell O, Mac Namee B, et al. Deep learning at the shallow end: Malware classification for non-domain experts[J]. Digital Investigation: The International Journal of Digital Forensics & Incident Response, 2018.
[9] Venkatraman S, Alazab M, Vinayakumar R. A hybrid deep learning image-based analysis for effective malware detection[J]. Information Security Technical Report, 2019, 47(Aug.):377-389.
[10] Qian Y C, Peng G J, et al. Homology analysis and family clustering of malicious code[J]. Computer Engineering and Applications, 2015, 56(18):76-81. (in Chinese)
[11] Park L, Yu J, Kang H K, et al. Birds of a Feature: Intrafamily clustering for version identification of packed malware[J]. IEEE Systems Journal, 2020, 14(3):4545-4556.
[12] Zhao B L, Shan Z, Liu F D, et al. Malware homology identification based on a gene perspective[J]. Frontiers of Information Technology & Electronic Engineering, 2019(6):801-815.
[13] Li Y, Sundaramurthy S C, Bardas A G, et al. Experimental study of fuzzy hashing in malware clustering analysis[C]. USENIX, Washington DC, USA, 2015: 1-8.
[14] Ahmadi M, Giacinto G, Ulyanov D, et al. Novel feature extraction, selection and fusion for effective malware family classification[DB]. 2015.
[15] Kinable J, Kostakis O. Malware Classification based on Call Graph Clustering[J]. Journal of Computer Virology and Hacking Techniques, 2011, 7(04):233-245.
[16] Jang J W, Woo J, Mohaisen A, et al. Mal-Netminer: Malware Classification Approach Based on Social Network Analysis of System Call Graph[J]. Mathematical Problems in Engineering, 2015, 2015(PT.18):731-734.
[17] Zhao B L, et al. Homology analysis of malicious code based on graph structure[J]. Journal on Communications, 2017, 38(S2):86-93. (in Chinese)
[18]Sanders H, Saxe J. Malware data science: Attack detection and attribution[M]. No Starch Press, 2018.
[19] Ronen R, Radu M, Feuerstein C, et al. Microsoft Malware Classification Challenge[DB]. 2018. https://arxiv.org/pdf/1802.10135.pdf.