Gensim is a Python library for topic modeling, document indexing, and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community.
Features
- All algorithms are memory-independent with respect to corpus size: input is streamed and processed out-of-core, so it can be larger than RAM
- Easy to plug in your own input corpus/datastream (trivial streaming API)
- Easy to extend with other Vector Space algorithms (trivial transformation API)
- Efficient multicore implementations of popular algorithms
- Can run Latent Semantic Analysis and Latent Dirichlet Allocation on a cluster of computers
- Documentation available
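As a rough illustration of the streaming idea in the feature list, the sketch below shows a corpus that is read one document at a time and never held fully in memory. It uses only the Python standard library and does not call gensim itself. The class name `StreamedCorpus`, the `open_stream` callable, and the hand-written `token2id` mapping are hypothetical names chosen for this example. The one assumption about gensim is the output format: each document is a sparse bag-of-words vector, a list of `(token_id, count)` pairs, which is the shape gensim expects from a corpus iterable.

```python
from collections import Counter
import io


class StreamedCorpus:
    """Hypothetical iterable that yields one bag-of-words document at a time."""

    def __init__(self, open_stream, token2id):
        # open_stream: zero-argument callable returning a fresh iterator of
        # text lines (one document per line), so the corpus can be re-read.
        self.open_stream = open_stream
        # token2id: mapping from token string to integer id (illustrative).
        self.token2id = token2id

    def __iter__(self):
        for line in self.open_stream():
            # Count only the tokens that appear in the vocabulary.
            counts = Counter(
                self.token2id[tok]
                for tok in line.lower().split()
                if tok in self.token2id
            )
            # Sparse vector: list of (token_id, count) pairs sorted by id.
            yield sorted(counts.items())


# Toy input: an in-memory stream stands in for a large file on disk.
text = "human computer interface\ncomputer system system\n"
token2id = {"human": 0, "computer": 1, "interface": 2, "system": 3}

corpus = StreamedCorpus(lambda: io.StringIO(text), token2id)
docs = list(corpus)  # materialized here only to inspect the toy result
```

Because only one line is in memory at any point during iteration, the same pattern would apply to a file far larger than RAM by replacing the `io.StringIO` stand-in with a callable that opens the file.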
Categories
Machine Learning
License
GNU Library or Lesser General Public License version 3.0 (LGPLv3)