Big variation in number of hits when js_render is true #31

Open
beauchette opened this issue Mar 28, 2023 · 1 comment
Comments

@beauchette

Description

I have a website with thousands of pages. Most of them are rendered by the server, but some use JavaScript "plugins", and there is no server-side rendering (SSR) for those features.

I just installed Typesense 0.24.0 and typesense-docsearch-scraper from git.

My first run put 110k hits in my collection and the second put 98k, which is odd because the site does not change that much in an hour. I then added "js_render": true to my config, and that run produced only 90 hits.

What could cause this?

The number of hits also varies a lot between runs: I launched the scraper with js_wait = 2 and got 194 hits, then ran it again without touching config.json and got 90 hits again.
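For readers following along, here is a minimal sketch of the two settings discussed above, written as a Python dict and dumped to JSON. Only "js_render" and "js_wait" come from this report; the index name and start URL are hypothetical placeholders, and a real config.json would need the rest of the scraper's settings (selectors and so on).

```python
import json

# Hypothetical fragment of a scraper config.json.
# Only "js_render" and "js_wait" are taken from the report;
# "index_name" and "start_urls" values are placeholders.
config = {
    "index_name": "my_site",
    "start_urls": ["https://example.com/"],
    "js_render": True,  # render pages in a headless browser before extraction
    "js_wait": 2,       # seconds to wait for JavaScript to finish
}

# Serialize to the JSON text that would go into config.json.
config_json = json.dumps(config, indent=2)
print(config_json)
```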

Steps to reproduce

I set up a fresh Typesense 0.24.0 and the latest typesense-docsearch-scraper from git, then ran the scraper against my website.

Expected Behavior

I'd expect two consecutive executions to produce approximately the same number of hits, and I'd expect js_render set to true to produce more hits, because there is more content to index.
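To make "approximately the same" concrete, here is a small sketch that computes the relative spread between per-run hit counts. The function names and the 5% tolerance are assumptions chosen for illustration, not anything the scraper provides; the counts are the ones quoted in this report.

```python
def relative_spread(counts):
    """Return (max - min) / max for a list of per-run hit counts."""
    if not counts:
        raise ValueError("need at least one count")
    highest = max(counts)
    if highest == 0:
        return 0.0
    return (highest - min(counts)) / highest


def runs_are_consistent(counts, tolerance=0.05):
    """True when the runs differ by no more than `tolerance` (assumed 5%)."""
    return relative_spread(counts) <= tolerance


# Counts quoted in this report.
no_js_runs = [110_000, 98_000]  # js_render disabled
js_runs = [194, 90]             # js_render enabled

print(round(relative_spread(no_js_runs), 3))  # 0.109
print(round(relative_spread(js_runs), 3))     # 0.536
print(runs_are_consistent(no_js_runs), runs_are_consistent(js_runs))
```

By this measure neither pair of runs is consistent: the runs without js_render differ by about 11%, and the runs with it differ by more than half.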

Actual Behavior

The discrepancies in the number of hits described above.

Metadata

Typesense: 0.24.0
typesense-docsearch-scraper: latest from git (as of Mar 28, 2023)
OS: vanilla Ubuntu 22.04

@beauchette

The js_render problem is due to Chrome 112 with its matching ChromeDriver, which fails very quickly.

On the other hand, I still don't understand how, with js_render=false, I can get anywhere between 75k and 100k records when the site doesn't change by more than 2 or 3 pages a day.
