Nearest Neighbor Search Based on Bit String Partition and Multiple Index
Abstract Bit strings produced by hashing are among the most effective tools for similarity search over massive data. To address the poor search performance caused by existing bit string indexing schemes, this paper proposes a nearest neighbor search algorithm based on bit string partitioning and multiple indexes. First, because bit string partitioning is in essence a combinatorial optimization problem, a greedy strategy is used to obtain an approximate solution. Second, for the neighbor query phase, a new query expansion and fusion mechanism is proposed on top of the multi-index structure. Finally, a query-adaptive method is used to reduce the imbalance among the multiple indexes. Experiments run in Matlab on the MNIST, CIFAR-10, SIFT-1M, and GIST-1M datasets show that the algorithm is effective and general for hash-based index structures and for nearest neighbor search.
Authors Miao Jianhui; Li Zhiyang; Zhou Zeyan; Yang Chuanfu; Liu Zhaobin; Liu Weijiang (School of Information Science and Technology, Dalian Maritime University, Dalian 116026)
Source Journal of Computer-Aided Design & Computer Graphics (indexed in EI, CSCD, and the Peking University Core Journals list), 2019, No. 5, pp. 771-779, 9 pages
Funding National Natural Science Foundation of China (61300187, 61672379).
Keywords hash representation; bit string partition; multi-table index; query expansion; nearest neighbor search
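About the authors Miao Jianhui (b. 1994), male, master's student; main research interest: massive data processing. Corresponding author: Li Zhiyang (b. 1982), male, Ph.D., associate professor, master's supervisor, CCF member; main research interests: cloud computing and big data (lizy0205@gmail.com). Zhou Zeyan (b. 1993), female, master's student; main research interest: image processing. Yang Chuanfu (b. 1993), male, master's student; main research interest: parallel computing. Liu Zhaobin (b. 1974), male, Ph.D., professor, master's supervisor, CCF member; main research interests: cloud computing and cloud security. Liu Weijiang (b. 1969), male, Ph.D., professor, master's supervisor; main research interest: network security.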
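Illustrative sketch The abstract describes splitting hash bit strings into parts, indexing each part in its own table, and merging the per-table candidates at query time. The Python below is a minimal sketch of that general multi-index idea only, and it is not the authors' algorithm. It assumes a fixed, equal-width contiguous partition (the paper instead chooses the partition greedily), it omits the query-adaptive balancing step, and every function name is hypothetical. It relies on the pigeonhole fact that two codes within Hamming distance r must agree to within floor(r / m) bits on at least one of m substrings.

```python
from collections import defaultdict
from itertools import combinations


def split_code(code, n_bits, n_parts):
    """Split an n_bits-wide integer code into n_parts contiguous substrings.

    Assumption: n_bits is divisible by n_parts.
    """
    width = n_bits // n_parts
    mask = (1 << width) - 1
    return [(code >> (i * width)) & mask for i in range(n_parts)]


def build_index(codes, n_bits, n_parts):
    """Build one hash table per substring position: substring value -> item ids."""
    tables = [defaultdict(list) for _ in range(n_parts)]
    for item_id, code in enumerate(codes):
        for table, sub in zip(tables, split_code(code, n_bits, n_parts)):
            table[sub].append(item_id)
    return tables


def neighbors_within(sub, width, radius):
    """Yield every width-bit value within Hamming distance `radius` of `sub`."""
    for k in range(radius + 1):
        for positions in combinations(range(width), k):
            flipped = sub
            for p in positions:
                flipped ^= 1 << p
            yield flipped


def search(query, radius, codes, tables, n_bits):
    """Return sorted ids of all codes within Hamming distance `radius` of `query`."""
    n_parts = len(tables)
    width = n_bits // n_parts
    sub_radius = radius // n_parts  # pigeonhole bound for each substring
    candidates = set()
    for table, sub in zip(tables, split_code(query, n_bits, n_parts)):
        # Query expansion: probe all nearby substring values in this table.
        for probe in neighbors_within(sub, width, sub_radius):
            candidates.update(table.get(probe, ()))
    # Fusion: the union of per-table candidates is verified on the full code.
    return sorted(i for i in candidates
                  if bin(codes[i] ^ query).count("1") <= radius)
```

For example, with `codes = [0b00000000, 0b00000011, 0b11110000, 0b10000001]` and `tables = build_index(codes, 8, 2)`, calling `search(0b00000001, 2, codes, tables, 8)` probes both 4-bit tables with a substring radius of 1 and keeps only the candidates whose full 8-bit distance is at most 2.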
