X-JEPA: A Novel Self-Supervised Framework for Cross-Modal Remote Sensing Retrieval via Predictive Semantic Alignment
Published in Proceedings of the Winter Conference on Applications of Computer Vision (WACV) 2026, 2025
We propose X-JEPA, a predictive self-supervised joint-embedding architecture for cross-modal remote sensing image retrieval (RS-CMIR). Instead of reconstructing pixels or relying on contrastive pairs, X-JEPA learns by forecasting semantic embeddings across modalities, enforcing modality-invariant alignment through a geometry-aware Prediction Space Alignment (PSA) loss that preserves latent-space structure without requiring paired inputs. Evaluated on the large-scale BEN-14K (Sentinel-1/Sentinel-2) and fMoW (RGB/Sentinel) benchmarks, X-JEPA achieves up to an 11.0% F1 improvement in cross-modal retrieval and 9.8% in unimodal settings over MAE, SatMAE, CrossMAE, CSMAE-SESD, CROMA, SkySense, DeCUR, and REJEPA, while remaining comparatively lightweight and parameter-efficient.
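To make the predictive objective concrete, here is a minimal NumPy sketch of what a cross-modal embedding-prediction loss with a geometry-preserving term could look like. This is an illustrative assumption, not the paper's implementation: the function name `psa_loss`, the linear predictor, and the choice of a pairwise cosine-similarity term for the geometry-aware component are all hypothetical, inferred only from the abstract's high-level description.

```python
import numpy as np

def psa_loss(z_a: np.ndarray, z_b: np.ndarray, predictor) -> float:
    """Hypothetical sketch: forecast modality-B embeddings from modality-A
    embeddings, scoring prediction error plus a geometry-preservation term."""
    z_pred = predictor(z_a)                  # predicted B-space embeddings
    pred_err = np.mean((z_pred - z_b) ** 2)  # embedding prediction error

    # Geometry-aware term (assumption): match the pairwise
    # cosine-similarity structure of predicted and target latents.
    def sim(z):
        z = z / np.linalg.norm(z, axis=1, keepdims=True)
        return z @ z.T

    geo_err = np.mean((sim(z_pred) - sim(z_b)) ** 2)
    return float(pred_err + geo_err)

# Toy batch: 8 scenes, 16-dim embeddings per modality (random stand-ins
# for Sentinel-1 and Sentinel-2 encoder outputs).
rng = np.random.default_rng(0)
z_s1 = rng.standard_normal((8, 16))
z_s2 = rng.standard_normal((8, 16))
W = rng.standard_normal((16, 16)) * 0.1      # toy linear predictor weights
loss = psa_loss(z_s1, z_s2, predictor=lambda z: z @ W)
print(loss)
```

Note that no paired pixel reconstruction or negative sampling appears here: the objective operates entirely in latent space, which is consistent with the abstract's claim of avoiding both pixel reconstruction and contrastive pairs.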
