Source: School of Computer Science | Published: 2019-11-28 | Views: 71
Talk title: Performance-Enhancing Deep-learning Models for Crowd Counting
Speaker biography: Professor Xiangjian He is the Director of the Computer Vision and Pattern Recognition Laboratory at the Global Big Data Technologies Centre (GBDTC) at the University of Technology Sydney (UTS). He is an IEEE Senior Member and has been an IEEE Signal Processing Society Student Committee member. In 2018, he and his colleagues received a UTS Chancellor's Award for Research Excellence for a project funded by Sydney Trains and RMCRC. He has also been awarded the title of 'Internationally Registered Technology Specialist' by the International Technology Institute (ITI). He led a UTS-PolyU joint research team that won the 1st Runner-Up prize in the 2017 VIP Cup and the championship in the 2019 VIP Cup, both awarded by the IEEE Signal Processing Society. He has received many competitive national and regional grants, including five grants awarded by the Australian Research Council (ARC), five grants awarded by the National Natural Science Foundation of China (NSFC), and two GRF grants awarded by the Hong Kong Research Grants Council (RGC). In recent years, he has published many high-quality papers in prestigious journals and at premier international conferences and workshops. He has recently served in editorial roles for various international journals. He is an Associate Editor of the Springer-Nature Computer Science journal and has been an Advisor of HKIE Transactions.
Abstract: Counting people or objects with significantly varying scales and densities has attracted much interest from the research community, yet it remains an open problem. Although many attempts have been reported, real-world difficulties, such as huge variation in subjects’ sizes in images and serious occlusion among people, keep it challenging. In this talk, an Adaptive Counting Convolutional Neural Network (A-CCNN), which adaptively accounts for the scale variation of objects in a frame, is presented. Then, a pruning strategy is presented that removes two kinds of filters: irrelevant filters, whose feature maps contain little information, and negative filters, which are identified by a mask learned on a training dataset. Finally, a simple but efficient and effective network, named DENet, is introduced. It is composed of two components: a detection network (DNet) and an encoder-decoder estimation network (ENet). The proposed models are evaluated on the ShanghaiTech, UCF, and WorldExpo’10 datasets, among others.
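As a rough illustration of the first pruning criterion mentioned in the abstract (dropping filters whose feature maps carry little information), the sketch below scores each filter by the mean absolute activation of its feature maps and keeps only those at or above a threshold. This is an assumption for illustration only: the abstract does not state which information measure the talk uses, the names `mean_abs_activation` and `select_filters` and the threshold value are hypothetical, and the learned mask for negative filters is not modeled.

```python
from typing import List, Sequence

# One feature map is an H x W grid of activations (nested sequences of floats).
FeatureMap = Sequence[Sequence[float]]


def mean_abs_activation(feature_maps: Sequence[FeatureMap]) -> float:
    """Average absolute activation of one filter over a batch of its feature maps.

    Used here as an assumed stand-in for "how much information" the filter carries.
    """
    total = 0.0
    count = 0
    for fmap in feature_maps:
        for row in fmap:
            for value in row:
                total += abs(value)
                count += 1
    return total / count if count else 0.0


def select_filters(per_filter_maps: Sequence[Sequence[FeatureMap]],
                   threshold: float) -> List[int]:
    """Return indices of filters whose score is at least `threshold`."""
    return [
        index
        for index, maps in enumerate(per_filter_maps)
        if mean_abs_activation(maps) >= threshold
    ]


# Toy data: three filters, each with a single 2 x 2 feature map.
maps = [
    [[[0.0, 0.01], [0.0, 0.0]]],  # filter 0: almost silent
    [[[1.0, -2.0], [0.5, 0.5]]],  # filter 1: clearly active
    [[[0.0, 0.0], [0.0, 0.0]]],   # filter 2: silent
]
kept = select_filters(maps, threshold=0.1)
print(kept)
```

In a real network the scores would be computed from activations collected over a training or validation set, and the retained indices would then be used to slice the convolution weights before fine-tuning.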