Support Numo Gem for performing SVD #198
@mattr- appreciate all the PR reviews you've already done for me! I still have this PR marked as a draft, but I think it's ready for an initial round of feedback if you have time to do a review. Here are a couple things to focus on:
Took a quick 👀
Looks good so far.
test/lsi/lsi_test.rb
Outdated
# require_relative '../test_helper'
# require 'debug'
This looks like it might be debugging code left over. Would you mind removing it?
**Background:** The slow step of LSI is computing the SVD (singular value decomposition) of a matrix. Even with a relatively small collection of documents (say, about 20 blog posts), the native Ruby implementation is too slow to be usable (taking hours to complete).

To work around this problem, classifier-reborn lets you optionally use the `gsl` gem, which calls the [GNU Scientific Library](https://www.gnu.org/software/gsl/) for matrix calculations. Computations with this gem are orders of magnitude faster than the Ruby-only matrix implementation, fast enough that using LSI with Jekyll finishes in a reasonable amount of time (seconds).

Unfortunately, [rb-gsl](https://github.com/SciRuby/rb-gsl) is unmaintained. There's a commit on main that makes it compatible with Ruby 3, but nobody has released the gem, so the only way to use rb-gsl with Ruby 3 right now is to specify the git hash in your Gemfile. See SciRuby/rb-gsl#67. This will become increasingly problematic because Ruby 2.7 is now in [security maintenance](https://www.ruby-lang.org/en/news/2022/04/12/ruby-2-7-6-released/) and will reach end of life in less than a year.

Notably, `rb-gsl` depends on the [narray](https://github.com/masa16/narray#new-version-is-under-development---rubynumonarray) gem. `narray` is deprecated, and its readme suggests using `Numo::NArray` instead.

**Changes:** In this PR, my goal is to provide an alternative matrix implementation that can perform singular value decomposition quickly and works with Ruby 3. Doing so will make classifier-reborn compatible with Ruby 3 without depending on the unmaintained/unreleased gsl gem. There aren't many gems that provide fast matrix support for Ruby, but [Numo](https://github.com/ruby-numo) seems to be more actively maintained than rb-gsl, and Numo has a working Ruby 3 implementation that can perform a singular value decomposition, which is exactly what we need. This requires [numo-narray](https://github.com/ruby-numo/numo-narray) and [numo-linalg](https://github.com/ruby-numo/numo-linalg).

My goal is to allow users to (optionally) use classifier-reborn with Numo/LAPACK the same way they'd use it with GSL. That is, the user installs the `numo-narray` and `numo-linalg` gems (with their required C libraries), and classifier-reborn detects and uses them if they are found.
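For readers who want a picture of the "detect and use if found" behavior described above, here is a minimal sketch. It is not the code from this PR: the helper names `try_require` and `detect_svd_backend`, the constant `SVD_BACKEND`, and the preference order (Numo, then GSL, then pure Ruby) are all assumptions made for illustration.

```ruby
# Hypothetical helper: returns true only if every listed library loads.
def try_require(*libs)
  libs.each { |lib| require lib }
  true
rescue LoadError, StandardError
  # LoadError means the gem is not installed. Rescuing StandardError is an
  # assumption, meant to cover a gem that is installed but cannot locate
  # its native C library at load time.
  false
end

# Hypothetical helper: picks a matrix backend for the SVD step.
# The order of preference here is an assumption, not taken from the PR.
def detect_svd_backend
  return :numo if try_require('numo/narray', 'numo/linalg')
  return :gsl  if try_require('gsl')

  :ruby # fall back to the slow pure-Ruby matrix implementation
end

SVD_BACKEND = detect_svd_backend
puts "SVD backend: #{SVD_BACKEND}"
```

With a scheme like this, installing or removing the optional gems changes the backend without any configuration on the user's side.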
@mattr- This is ready for review! No rush 🙂 I addressed your previous comment, added a little polish, and updated the docs since your last review. Also, I tested this on a personal Jekyll site (i.e. with
@jekyllbot: merge +minor
In jekyll#198, I added support for using [Numo::Linalg](https://github.com/ruby-numo/numo-linalg) as the linear algebra backend for classifier-reborn. At that time, I updated the docs with instructions for installing Numo, but the macOS docs were a little vague because I hadn't tested them myself. Since then, I've been able to verify the instructions on macOS and clarify a few steps. So this commit updates the docs for installing Numo on macOS. The gem installation arguments I'm using come from the [Numo docs](https://github.com/ruby-numo/numo-linalg/blob/master/doc/select-backend.md).