# XFREDHD

May 2026
For any pair of samples \(x_i, x_j\), the sketch preserves inner products up to a relative distortion \(\epsilon\):

\[ \big|\langle \tilde{x}_i, \tilde{x}_j\rangle - \langle x_i, x_j\rangle\big| \le \epsilon\,\|x_i\|\,\|x_j\| \]
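This distortion bound can be checked empirically. The excerpt does not pin down XFREDHD's exact projection matrix, so the minimal sketch below uses a plain Gaussian random projection (one standard choice) and measures the worst-case relative error over all pairs:

```python
import numpy as np

rng = np.random.default_rng(0)

D, S, N = 10_000, 1_024, 8            # ambient dim, sketch dim, #samples
X = rng.standard_normal((N, D))

# Gaussian random projection: i.i.d. N(0, 1/S) entries, so the sketched
# Gram matrix is an unbiased estimate of the original one.
R = rng.standard_normal((D, S)) / np.sqrt(S)
X_tilde = X @ R

G       = X @ X.T                      # original inner products
G_tilde = X_tilde @ X_tilde.T          # sketched inner products

norms = np.linalg.norm(X, axis=1)
# Empirical distortion |<x~_i, x~_j> - <x_i, x_j>| / (|x_i| |x_j|)
eps = np.abs(G_tilde - G) / np.outer(norms, norms)
print(eps.max())                       # typically a few percent for S ~ 10^3
```

The observed maximum distortion shrinks roughly like \(1/\sqrt{S}\), which is why sketch widths of \(10^3\)–\(10^4\) suffice even for million-dimensional inputs.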
a.sanchez@uv.es

## Abstract

The exponential growth of data in scientific, industrial, and social domains has made the analysis of ultra-high-dimensional (UHD) datasets a pressing challenge. Conventional dimensionality-reduction techniques (e.g., PCA, t-SNE, UMAP) either suffer from prohibitive computational costs or fail to preserve intricate feature-level relationships when the dimensionality exceeds 10⁶. We introduce XFREDHD (eXtreme Feature-Rich Embedding for Data in High Dimensions), a scalable, end-to-end framework that couples a feature-wise random projection with an adaptive hierarchical auto-encoder and a graph-preserving regularizer. XFREDHD reduces dimensionality by up to three orders of magnitude while maintaining >95% of pairwise cosine similarity and enabling downstream tasks (classification, clustering, anomaly detection) to achieve state-of-the-art performance. Extensive experiments on synthetic benchmarks, genomics, hyperspectral imaging, and large-scale recommender-system logs demonstrate that XFREDHD outperforms existing baselines in both accuracy and runtime (up to 12× speed-up on a 64-GPU cluster). We release the open-source implementation (Apache 2.0) and a curated suite of UHD datasets to foster reproducibility.

## 1. Introduction

High-dimensional data arise in numerous domains:
The resulting sketch \(\tilde{X} \in \mathbb{R}^{N \times S}\) can be computed on the fly and fits comfortably in GPU memory for \(S \approx 10^3\)–\(10^4\).
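The on-the-fly computation can be sketched as a streaming pass over feature chunks, so that neither the full data matrix nor the projection matrix is ever materialized at once. This is a minimal illustration under assumed details (per-chunk seeding, chunk layout); it is not the paper's implementation:

```python
import numpy as np

def sketch_streaming(chunks, S, seed=0):
    """Accumulate X_tilde = X @ R without materializing R or the full X.

    `chunks` yields (start, block) pairs, where `block` holds columns
    [start, start + block.shape[1]) of X. Each feature chunk draws its own
    deterministically seeded slice of R, so the sketch is reproducible.
    """
    out = None
    for start, block in chunks:
        d = block.shape[1]
        rng = np.random.default_rng((seed, start))       # per-chunk seed
        R_chunk = rng.standard_normal((d, S)) / np.sqrt(S)
        part = block @ R_chunk
        out = part if out is None else out + part
    return out

# Toy usage: stream a 1,000-dimensional matrix in chunks of 250 features.
N, D, S = 4, 1_000, 64
X = np.random.default_rng(1).standard_normal((N, D))
chunks = [(s, X[:, s:s + 250]) for s in range(0, D, 250)]
X_tilde = sketch_streaming(chunks, S)
```

Because each chunk's slice of \(R\) is regenerated from its seed, the same sketch can be recomputed at query time without storing \(R\).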
| Domain                      | Typical Dimensionality | Example                                 |
|-----------------------------|------------------------|-----------------------------------------|
| Genomics & Transcriptomics  | 10⁶ – 10⁸              | Single-cell RNA-seq expression matrices |
| Remote Sensing              | 10⁴ – 10⁶              | Hyperspectral cubes (hundreds of bands) |
| Recommender Systems         | 10⁶ – 10⁹              | User–item interaction tensors           |
| Natural Language Processing | 10⁵ – 10⁷              | Contextualized token embeddings         |
¹ Department of Computer Science, University of Valencia, Spain
² Department of Electrical Engineering, Indian Institute of Technology Delhi, India
³ Data Science Lab, Stanford University, USA
The total loss:

\[ \mathcal{L} = \sum_{k=1}^{3} \lambda_k\,\mathcal{L}_{\text{rec}}^{(k)} + \lambda_g\,\mathcal{L}_{\text{GPR}} \]
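The weighted combination is straightforward to compute; the graph-preserving term is not specified in this excerpt, so the sketch below substitutes one plausible instantiation (a cosine-affinity gap between input and embedding), and the weights are illustrative defaults rather than values from the paper:

```python
import numpy as np

def gpr_loss(X, Z):
    """Graph-preserving regularizer L_GPR — an assumed instantiation, not
    the paper's definition: mean squared gap between the cosine-affinity
    matrices of the raw data X and the embedding Z."""
    def affinity(M):
        Mn = M / np.linalg.norm(M, axis=1, keepdims=True)
        return Mn @ Mn.T
    return float(np.mean((affinity(X) - affinity(Z)) ** 2))

def total_loss(rec_losses, l_gpr, lambdas=(1.0, 0.5, 0.25), lambda_g=0.1):
    """L = sum_{k=1..3} lambda_k * L_rec^(k) + lambda_g * L_GPR, where
    rec_losses are the three per-level auto-encoder reconstruction losses."""
    return sum(lk * rk for lk, rk in zip(lambdas, rec_losses)) + lambda_g * l_gpr
```

With unit reconstruction losses at every level and `l_gpr = 2.0`, the defaults give `1.0 + 0.5 + 0.25 + 0.1 * 2.0 = 1.95`, and an embedding identical in affinity structure to the input incurs zero regularization.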