Human Activity Recognition in Video Using FREAK-HOG Features and a Cascade Support Vector Machine

Article type: Research article (Persian)

Authors

Azarbaijan Shahid Madani University, Faculty of Information Technology and Computer Engineering

Abstract

In recent years, automatic recognition of human activities in video has become an important research area. Its applications are broad and include surveillance and security systems, responsive user interfaces, education, healthcare, and the extraction of motion and behavioral information. However, challenges such as illumination changes, moving backgrounds and cameras, clutter and crowding, and the complexity and diversity of the activities performed make it difficult to build systems that are both reliable in recognition accuracy and acceptably fast. A common approach in this field is to extract feature points across a sequence of frames, use their motion information to describe the movements performed, and then recognize the activity from these descriptions. To improve recognition accuracy, this paper proposes describing the feature points extracted from the frame sequence with a texture descriptor inspired by the human retina, combined with an appearance-motion descriptor. In addition, to speed up model construction and reduce the overhead introduced by combining the proposed features, a cascade approach to building the classifier is presented. Experiments on the large-scale UCF101 dataset show that the proposed method achieves high accuracy and good speed, and that its performance is comparable to the state of the art in this field.
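The abstract does not say how the retina-inspired descriptor (FREAK, per the title) and the appearance descriptor (HOG) are combined. One simple possibility, shown here only as an illustration and not as the authors' method, is early fusion: unpack the binary FREAK code into 0/1 values, L2-normalize each descriptor separately so neither dominates because of its length or value range, and concatenate them. The function names, the equal default weights, and the 96-value HOG length are hypothetical placeholders; the 64-byte (512-bit) FREAK size matches what OpenCV's FREAK extractor produces.

```python
import numpy as np


def unpack_binary_descriptor(packed):
    """Expand a packed binary descriptor (uint8 bytes, the layout OpenCV uses
    for FREAK: 64 bytes = 512 bits) into a float vector of 0/1 values."""
    return np.unpackbits(np.asarray(packed, dtype=np.uint8)).astype(np.float64)


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit Euclidean length (eps guards against zero)."""
    return v / (np.linalg.norm(v) + eps)


def fuse_descriptors(freak_bytes, hog_vector, freak_weight=1.0, hog_weight=1.0):
    """Hypothetical early fusion of one keypoint's FREAK and HOG descriptors.

    Assumption (not taken from the paper): each part is L2-normalised on its
    own, optionally weighted, and the two are concatenated into one vector.
    """
    f = l2_normalize(unpack_binary_descriptor(freak_bytes)) * freak_weight
    h = l2_normalize(np.asarray(hog_vector, dtype=np.float64)) * hog_weight
    return np.concatenate([f, h])


# Example with random stand-in data (not real descriptors).
rng = np.random.default_rng(0)
freak = rng.integers(0, 256, size=64).astype(np.uint8)  # 512-bit FREAK-like code
hog = rng.random(96)                                      # placeholder HOG length
fused = fuse_descriptors(freak, hog)
print(fused.shape)  # (608,)
```

Unpacking to 0/1 values also means that, before normalization, the squared Euclidean distance between two unpacked FREAK codes equals their Hamming distance, the metric normally used to match binary descriptors, so the fused vector stays compatible with Euclidean-based classifiers.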
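The abstract states only that a cascade approach is used to speed up classifier training. A common meaning of "cascade SVM" is a layered scheme: split the training set into disjoint parts, train an SVM on each, keep only each part's support vectors, merge the survivors pairwise, and retrain until a single model remains. The sketch below assumes that scheme; it is not necessarily the authors' exact procedure. To stay self-contained, `train_linear_svm` is a hypothetical stand-in (a linear SVM fit by subgradient descent on the hinge loss) and support vectors are approximated as points with margin at most 1 + `tol`; a real system would use a proper solver such as LIBSVM or scikit-learn's `SVC`, whose `support_` attribute lists the support-vector indices. The sketch is binary; UCF101's 101 classes would require a wrapper such as one-vs-rest.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.1, epochs=300, lr=0.5):
    """Stand-in linear SVM: full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))). Labels must be +1/-1."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for t in range(1, epochs + 1):
        active = y * (X @ w + b) < 1  # points violating the margin
        ya, Xa = y[active], X[active]
        grad_w = lam * w - (ya[:, None] * Xa).sum(axis=0) / n
        grad_b = -ya.sum() / n
        step = lr / np.sqrt(t)  # decaying step size
        w, b = w - step * grad_w, b - step * grad_b
    return w, b


def support_vectors(X, y, w, b, tol=0.05):
    """Keep points on or inside the margin; for a hinge-loss SVM only these
    determine the solution. If filtering would leave fewer than two classes
    (possible with the approximate solver), keep the whole subset instead."""
    keep = y * (X @ w + b) <= 1 + tol
    if len(np.unique(y[keep])) < 2:
        return X, y
    return X[keep], y[keep]


def cascade_svm(X, y, n_parts=4, seed=0):
    """Single-pass cascade: train on disjoint parts, keep each part's support
    vectors, merge results pairwise and retrain until one set remains.
    Returns the final (w, b) and the size of the last training set."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(X)), n_parts)
    layer = [(X[p], y[p]) for p in parts]
    while True:
        reduced = []
        for Xp, yp in layer:
            w, b = train_linear_svm(Xp, yp)
            reduced.append(support_vectors(Xp, yp, w, b))
        if len(layer) == 1:
            return w, b, len(layer[0][1])
        # Merge neighbouring support-vector sets to form the next layer.
        layer = [
            (np.vstack([r[0] for r in reduced[i:i + 2]]),
             np.concatenate([r[1] for r in reduced[i:i + 2]]))
            for i in range(0, len(reduced), 2)
        ]


# Toy usage on two well-separated synthetic blobs (not real video features).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 0.5, size=(200, 2)),
               rng.normal(-2.0, 0.5, size=(200, 2))])
y = np.concatenate([np.ones(200), -np.ones(200)])
w, b, n_last = cascade_svm(X, y)
print(n_last, np.mean(np.sign(X @ w + b) == y))
```

The speed-up comes from each first-layer SVM seeing only a fraction of the data and later layers seeing only the surviving support vectors. The cascade as usually described can also feed the final support vectors back into the first layer and repeat until the set stops changing; this single-pass sketch omits that step.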

Keywords
