# Welcome to itembed

itembed is yet another variation of the well-known word2vec method proposed by Mikolov et al. [1], applied to unordered sequences, commonly referred to as itemsets. The contribution of itembed is twofold:

  1. Modifying the base algorithm to handle unordered sequences, which has an impact on the definition of context windows;
  2. Using the two embedding sets introduced in word2vec for supervised learning.

A similar philosophy is described by Wu et al. in StarSpace [2] and by Barkan and Koenigstein in item2vec [3]. itembed uses Numba [4] to achieve high performance.
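
To make the first point concrete, the snippet below is a minimal NumPy sketch of skip-gram with negative sampling applied to itemsets: because an itemset is unordered, every other item in the same itemset acts as context, rather than a fixed-size window around a position. It also keeps two embedding matrices, mirroring the two embedding sets mentioned in the second point. This is an illustration of the idea only, not the itembed API; the itemsets, dimensions, and hyperparameters are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and a few itemsets (unordered, so there is no window offset).
num_items = 6
itemsets = [
    [0, 2, 3],
    [1, 2, 4, 5],
    [0, 3, 5],
]

# Two embedding sets, as in word2vec: one for "target" items, one for "context" items.
dim = 8
syn0 = rng.normal(scale=0.1, size=(num_items, dim))  # target embeddings
syn1 = rng.normal(scale=0.1, size=(num_items, dim))  # context embeddings


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


learning_rate = 0.025
num_negatives = 2

for epoch in range(100):
    for itemset in itemsets:
        for i, target in enumerate(itemset):
            # Unordered context: every other item of the itemset is a positive pair,
            # instead of only the neighbors within a fixed-size window.
            for j, context in enumerate(itemset):
                if i == j:
                    continue
                # Skip-gram with negative sampling: one positive pair plus a few
                # random negatives drawn uniformly from the vocabulary.
                items = [context] + list(rng.integers(0, num_items, num_negatives))
                labels = [1.0] + [0.0] * num_negatives
                for item, label in zip(items, labels):
                    score = sigmoid(syn0[target] @ syn1[item])
                    g = learning_rate * (label - score)
                    old_target = syn0[target].copy()
                    syn0[target] += g * syn1[item]
                    syn1[item] += g * old_target

# After training, syn0 (or a combination of syn0 and syn1) can serve as item
# features for a downstream supervised model.
print(syn0.round(2))
```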

## Citation

If you use this software in your work, please cite it, for instance using the following BibTeX entry:

@software{itembed,
  author = {Johan Berdat},
  title = {itembed},
  url = {https://github.com/sdsc-innovation/itembed},
  version = {0.5.1},
  date = {2024-02-28},
}

  [1] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. 2013. arXiv:1301.3781

  [2] Ledell Wu, Adam Fisch, Sumit Chopra, Keith Adams, Antoine Bordes, and Jason Weston. StarSpace: embed all the things! 2017. arXiv:1709.03856

  [3] Oren Barkan and Noam Koenigstein. Item2vec: neural item embedding for collaborative filtering. 2017. arXiv:1603.04259

  [4] Siu Kwan Lam, Antoine Pitrou, and Stanley Seibert. Numba: an LLVM-based Python JIT compiler. In Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, 1–6. 2015. URL: https://doi.org/10.1145/2833157.2833162