Fusing differentiable rendering and language–image contrastive learning for superior zero-shot point cloud classification

2024-07-17


Xie, Jinlong; Cheng, Long; Wang, Gang; Hu, Min; Yu, Zaiyang; Du, Minghua; Ning, Xin Source: Displays, v 84, September 2024; ISSN: 01419382; DOI: 10.1016/j.displa.2024.102773; Article number: 102773; Publisher: Elsevier B.V.

Author affiliation:

School of Integrated Circuits, University of Chinese Academy of Sciences, Beijing; 100029, China

Chinese Academy of Sciences, Institute of Semiconductors, Annlab, Beijing; 100083, China

School of Control and Computer Engineering, North China Electric Power University, Beijing; 102206, China

School of Computing and Data Engineering, NingboTech University, Ningbo; 315100, China

Department of Bioengineering, Imperial College London, London; SW7 2AZ, United Kingdom

Department of Emergency, the First Medical Center, Chinese PLA General Hospital, Beijing; 100853, China

Beijing Ratu Technology Co., Ltd, Beijing; 100096, China

Abstract:

Zero-shot point cloud classification involves recognizing categories not encountered during training. Current models often exhibit reduced accuracy on unseen categories without 3D pre-training, emphasizing the need for improved precision and interoperability. We propose a novel approach integrating differentiable rendering with contrastive language–image pre-training. Initially, differentiable rendering autonomously learns representative viewpoints from the data, enabling the transformation of point clouds into multi-view images while preserving key visual information. This transformation facilitates optimized viewpoint selection during training, refining the final feature representation. Features are extracted from the multi-view images and integrated into a global multi-view feature using a cross-attention mechanism. On the textual side, a large language model (LLM) is provided with 3D heuristic prompts to generate 3D-specific text reflecting category-specific traits, from which textual features are derived. The LLM's extensive pre-trained knowledge enables it to capture abstract notions and categorical features relevant to distinct point cloud categories. Visual and textual features are aligned in a unified embedding space, enabling zero-shot classification. Throughout training, the Structural Similarity Index (SSIM) is integrated into the loss function to encourage the model to discern more distinctive viewpoints, reduce redundancy in multi-view imagery, and enhance computational efficiency. Experimental results on the ModelNet10, ModelNet40, and ScanObjectNN datasets demonstrate classification accuracies of 75.68%, 66.42%, and 52.03%, respectively, surpassing prevailing methods in zero-shot point cloud classification accuracy.
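The final matching step described in the abstract — fusing per-view visual features into one global feature and comparing it against class text features in a shared embedding space — can be sketched as below. This is a minimal illustration, not the paper's implementation: the features are assumed to be pre-extracted, and the paper's cross-attention fusion is stood in for by a simple similarity-weighted pooling; all function names are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project feature vectors onto the unit sphere for cosine comparison."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def fuse_views(view_feats):
    """Fuse per-view features (n_views, dim) into one global feature.

    Simplified stand-in for the paper's cross-attention mechanism:
    each view is weighted by its similarity to the mean view feature,
    so redundant or off-axis views contribute less.
    """
    mean = l2_normalize(view_feats.mean(axis=0))
    weights = view_feats @ mean                 # similarity of each view to the mean
    weights = np.exp(weights - weights.max())   # softmax over views
    weights /= weights.sum()
    return (weights[:, None] * view_feats).sum(axis=0)

def zero_shot_classify(view_feats, class_text_feats):
    """Predict the class whose text feature is closest (in cosine terms)
    to the fused multi-view visual feature — no training on these classes."""
    v = l2_normalize(fuse_views(view_feats))
    t = l2_normalize(class_text_feats, axis=1)  # (n_classes, dim)
    scores = t @ v                              # cosine similarity per class
    return int(np.argmax(scores)), scores
```

In a CLIP-style pipeline, `view_feats` would come from an image encoder applied to the rendered views and `class_text_feats` from a text encoder applied to the LLM-generated, category-specific prompts; classification then reduces to the nearest-neighbour search shown here.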




