Dual-Dependency Attention Transformer for Fine-Grained Visual Classification
Blog Article
Visual transformers (ViTs) are widely used in various visual tasks, such as fine-grained visual classification (FGVC). However, the self-attention mechanism, the core module of visual transformers, incurs quadratic computational and memory complexity. The sparse-attention and local-attention approaches currently favored by most researchers are not well suited to FGVC tasks, which require dense feature extraction and global dependency modeling. To address this challenge, we propose a dual-dependency attention transformer model.
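For intuition on the quadratic cost: every token computes a score against every other token, producing an N x N attention matrix. The short sketch below is illustrative only; the 14 x 14 patch grid and 64-dimensional head are assumed sizes, not values from the paper.

```python
# Illustrative only: standard self-attention over N patch tokens builds an
# N x N score matrix, so compute and memory grow quadratically with N.
import torch

N, d = 196, 64                      # e.g. a 14x14 patch grid, 64-dim head (assumed sizes)
q, k, v = (torch.randn(N, d) for _ in range(3))

scores = (q @ k.t()) / d ** 0.5     # shape (N, N) -- the quadratic term
out = scores.softmax(dim=-1) @ v    # shape (N, d)
print(scores.shape)                 # torch.Size([196, 196])
```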
The proposed model decouples global token interactions into two paths. The first is a position-dependency attention pathway based on the intersection of two types of grouped attention. The second is a semantic-dependency attention pathway based on dynamic central aggregation. This design enhances high-quality semantic modeling of discriminative cues while reducing the computational cost to linear complexity.
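To make the two paths concrete, here is a minimal, hedged sketch of how such a block could be wired up in PyTorch. It is not the authors' implementation: interpreting the intersecting grouped attention as row-wise then column-wise attention over the patch grid, approximating dynamic central aggregation with centers pooled from the current tokens, and all names here (DualDependencyAttentionSketch, grouped_attention, num_centers) are assumptions made for illustration.

```python
import torch
import torch.nn as nn


def grouped_attention(tokens, groups):
    # tokens: (N, d); attention is computed only inside each of `groups`
    # equal chunks, so the cost stays linear in N for a fixed group size.
    N, d = tokens.shape
    g = tokens.reshape(groups, N // groups, d)
    scores = (g @ g.transpose(1, 2)) / d ** 0.5        # (groups, N/groups, N/groups)
    return (scores.softmax(dim=-1) @ g).reshape(N, d)


class DualDependencyAttentionSketch(nn.Module):
    """Illustrative two-path attention block; not the authors' implementation."""

    def __init__(self, dim, num_centers=4):
        super().__init__()
        self.num_centers = num_centers
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, grid=14):
        # x: (N, dim) patch tokens laid out on a grid x grid map, N = grid * grid
        N, d = x.shape

        # Position-dependency path: two grouped attentions whose groupings
        # intersect -- first along grid rows, then along grid columns.
        pos = grouped_attention(x, grid)
        pos = grouped_attention(
            pos.reshape(grid, grid, d).transpose(0, 1).reshape(N, d), grid
        )

        # Semantic-dependency path: every token attends to a few centers
        # pooled from the current tokens (a stand-in for dynamic central
        # aggregation); linear in N because the center count is constant.
        centers = x.reshape(self.num_centers, -1, d).mean(dim=1)   # (num_centers, d)
        scores = (x @ centers.t()) / d ** 0.5                      # (N, num_centers)
        sem = scores.softmax(dim=-1) @ centers                     # (N, d)

        return self.proj(pos + sem)


tokens = torch.randn(196, 64)                 # 14x14 patches, 64-dim embeddings
block = DualDependencyAttentionSketch(dim=64)
print(block(tokens).shape)                    # torch.Size([196, 64])
```

Both paths avoid the full N x N attention matrix: the grouped path attends within fixed-size groups and the semantic path attends to a constant number of centers, so the overall cost scales linearly with the number of tokens.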
In addition, we develop discriminative enhancement strategies that increase the sensitivity of high-confidence discriminative cue tracking through a knowledge-based representation approach. Experiments on three datasets, NABirds, CUB, and DOGS, show that the method is well suited to fine-grained image classification and strikes a balance between computational cost and performance.