# Welcome to itembed
This is yet another variation of the well-known word2vec method, proposed by Mikolov et al.[^1], applied to unordered sequences, which are commonly referred to as itemsets.
The contribution of itembed is twofold:
- Modifying the base algorithm to handle unordered sequences, which changes how context windows are defined;
- Using the two embedding sets introduced in word2vec for supervised learning.
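The first point can be illustrated with a minimal, plain-NumPy sketch of skip-gram training over itemsets. This is not itembed's actual API; the toy itemsets, dimensions, and hyperparameters are illustrative. Since an itemset has no order, there is no positional window: every other item in the set serves as context for each target item, and two embedding sets (one per role) are updated, as in word2vec.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy itemsets: unordered "baskets" of item indices (illustrative data).
itemsets = [[0, 1, 2], [1, 2, 3], [0, 2, 3]]
num_items = 4
dim = 8

# Two embedding sets, as in word2vec: one for the target role,
# one for the context role.
syn0 = rng.normal(0.0, 0.1, (num_items, dim))
syn1 = rng.normal(0.0, 0.1, (num_items, dim))


def train_step(itemset, lr=0.05, num_negative=2):
    for i, target in enumerate(itemset):
        for j, context in enumerate(itemset):
            # No positional window in an unordered itemset: all other
            # items in the set are context for the target item.
            if i == j:
                continue
            # One positive pair plus a few uniformly drawn negatives.
            samples = [(context, 1.0)] + [
                (int(rng.integers(num_items)), 0.0)
                for _ in range(num_negative)
            ]
            for item, label in samples:
                # Logistic loss gradient on the dot product of the
                # target embedding and the context embedding.
                score = 1.0 / (1.0 + np.exp(-syn0[target] @ syn1[item]))
                step = lr * (label - score)
                delta = step * syn1[item]
                syn1[item] += step * syn0[target]
                syn0[target] += delta


for _ in range(100):
    for itemset in itemsets:
        train_step(itemset)
```

After training, `syn0` holds target-role embeddings and `syn1` context-role embeddings; keeping both sets is what enables the supervised use mentioned above, where items from different domains are embedded in distinct roles.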
A similar philosophy is described by Wu et al. in StarSpace[^2] and by Barkan and Koenigstein in item2vec[^3].
itembed uses Numba[^4] to achieve high performance.
## Citation
If you use this software in your work, please cite it, for instance with the following BibTeX entry:
```bibtex
@software{itembed,
    author = {Johan Berdat},
    title = {itembed},
    url = {https://github.com/sdsc-innovation/itembed},
    version = {0.5.1},
    date = {2024-02-28},
}
```
[^1]: Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. 2013. arXiv:1301.3781.

[^2]: Ledell Wu, Adam Fisch, Sumit Chopra, Keith Adams, Antoine Bordes, and Jason Weston. StarSpace: embed all the things! 2017. arXiv:1709.03856.

[^3]: Oren Barkan and Noam Koenigstein. Item2vec: neural item embedding for collaborative filtering. 2017. arXiv:1603.04259.

[^4]: Siu Kwan Lam, Antoine Pitrou, and Stanley Seibert. Numba: a LLVM-based Python JIT compiler. In Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, 1–6. 2015. URL: https://doi.org/10.1145/2833157.2833162.