[Bug]: dbzer0.com wordpress blog not being fetched #5147

Closed
5 tasks done
DraconicNEO opened this issue Oct 30, 2024 · 17 comments
Labels

  • area: federation (support federation via activitypub)
  • bug (Something isn't working)

Comments

@DraconicNEO

Requirements

  • Is this a bug report? For questions or discussions use https://lemmy.ml/c/lemmy_support
  • Did you check to see if this issue already exists?
  • Is this only a single bug? Do not put multiple bugs in one issue.
  • Do you agree to follow the rules in our Code of Conduct?
  • Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.

Summary

When I search for posts from the dbzer0.com WordPress blog, they don't show up in search. I know the blog is federated over ActivityPub, since Mastodon and other services can pull its posts up. So it looks like a bug in Lemmy's WordPress federation.

Steps to Reproduce

  1. Attempt to search for a post (e.g. this one) in your Lemmy instance
  2. Wait for search to complete
  3. No results

Technical Details

N/A (Not an Admin)

Version

BE 0.19.6-beta-13

Lemmy Instance URL

pawb.social

@DraconicNEO added the "bug" label (Something isn't working) on Oct 30, 2024
@Nutomic added the "area: federation" label (support federation via activitypub) on Oct 31, 2024
@Nutomic
Member

Nutomic commented Nov 6, 2024

@pfefferle I looked into this, and the problem is that dbzer0.com returns HTML when Lemmy attempts to fetch from it. Strangely, fetching the same URL with curl -H "Accept: application/activity+json", which is the same header Lemmy uses, works as expected. I already talked with the blog author and he says there is no unusual configuration. The same problem also happens with some other WordPress sites I tested, but not with your site pfefferle.org.

Do you have any idea what might cause this?
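
One way to narrow this kind of problem down is to look at the Content-Type that comes back instead of eyeballing the body. The sketch below is illustrative only: `is_activitypub_type` and `fetched_content_type` are hypothetical helper names, and the list of accepted media types is an assumption for the example, not necessarily what Lemmy checks.

```shell
# Hypothetical helper: succeeds when a Content-Type value looks like
# ActivityPub JSON. The accepted list is an assumption for illustration.
is_activitypub_type() {
  # Media types are case-insensitive, so normalise before matching.
  ct=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$ct" in
    application/activity+json*|application/ld+json*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: request a URL with the ActivityPub Accept header and
# print only the Content-Type of the response. It is defined here but not
# called, because it needs network access.
fetched_content_type() {
  curl -sS -o /dev/null -H 'Accept: application/activity+json' \
    -w '%{content_type}' "$1"
}
```

Something like `is_activitypub_type "$(fetched_content_type "$url")" || echo "not ActivityPub JSON"` would then flag any URL that answers with text/html.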

@pfefferle

pfefferle commented Nov 6, 2024

Can you point me to the code that runs the fetch?

I checked some of the common issues, but the site seems to work as expected and I can confirm that it works on Mastodon and also on Misskey.

@Nutomic
Member

Nutomic commented Nov 7, 2024

https://github.com/LemmyNet/activitypub-federation-rust/blob/main/src/fetch/mod.rs#L98

Nothing unusual there either, and I haven't heard about any problems like this with other platforms.

@DraconicNEO
Author

@db0 Is there anything you know about this issue that might help? You run dbzer0.com, so you could check whether there's anything unusual in your site's configuration.

@db0
Contributor

db0 commented Nov 10, 2024

I already did, and I sent the config I'm using to the Lemmy devs.

@freamon

freamon commented Nov 15, 2024

It's not the post itself that's returning HTML instead of JSON, it's the URL for the 'audience' field in that post.

On pfefferle's blog, attributedTo resolves to a Person, so Mastodon can use it, and audience resolves to a Group, so Lemmy can see it.
On dbzer0's blog, attributedTo resolves to a Person, so Mastodon can use it, but audience doesn't even resolve to JSON, so Lemmy can't see it (and it doesn't affect Mastodon, because Mastodon doesn't care about 'audience').

db0's blog is arguably misconfigured, because if you guess what the 'audience' field should be based on pfefferle's blog, and try to get https://dbzer0.com/@dbzer0.com instead of https://dbzer0.com/?author=0, then it properly resolves to a Group.

@db0
Contributor

db0 commented Nov 16, 2024

db0's blog is arguably misconfigured, because if you guess what the 'audience' field should be based on pfefferle's blog, and try to get https://dbzer0.com/@dbzer0.com instead of https://dbzer0.com/?author=0, then it properly resolves to a Group.

I am just using the ActivityPub plugin on a vanilla WordPress install. I haven't set anything fancy, so I don't see how I'm "misconfigured". If someone can actually point to the wrong configuration, I can change it.

@DraconicNEO
Author

DraconicNEO commented Nov 16, 2024

What I don't understand is how it can work flawlessly on Mastodon but not at all on Lemmy. It's clearly federating out the data.

@freamon

freamon commented Nov 16, 2024

@DraconicNEO - maybe it will help to re-create what's happening, using the command line.

You search for the post:
curl --header 'accept: application/activity+json' https://dbzer0.com/blog/how-did-we-move-from-forums-to-reddit-facebook-groups-and-discord/ | jq .

This gives you a nice JSON file. Looking at the 'attributedTo' field, you see it's by https://dbzer0.com/blog/author/db0/. You don't have that person in your DB, so you fetch the details:

curl --header 'accept: application/activity+json' https://dbzer0.com/blog/author/db0/ | jq .

This also gives you a nice JSON file. If you're Mastodon, well you've got the post, and you've got the author, so you're happy.

If you're Lemmy though, you also want a community, because every post on Lemmy is in a community, so you look at the 'audience' field. That says it belongs in https://dbzer0.com/?author=0. You don't have that community in your DB, so you try to fetch the details:

curl --header 'accept: application/activity+json' 'https://dbzer0.com/?author=0'

This doesn't provide a nice JSON file. Instead it offers up a bunch of HTML. Lemmy can't use that, so it can't put it in the right community, so it gives up.

This isn't Lemmy's fault. It's doing exactly what it would do when searching for any other post, it's just that the JSON representation of db0's blog post is providing bad information. Why it's doing that, I don't know, but I'd suggest that this Issue doesn't belong to Lemmy, it belongs to WordPress.
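
The manual walk above can be sketched as a small script. This is a rough illustration under stated assumptions, not how Lemmy implements its fetch: `ap_fetch`, `ap_type` and `check_post` are hypothetical names, jq is assumed to be installed, and 'attributedTo' and 'audience' are assumed to be plain URL strings (ActivityPub also allows arrays and embedded objects, which this does not handle).

```shell
# Hypothetical helper: fetch a URL as ActivityPub JSON and print the body.
ap_fetch() {
  curl -sS --header 'accept: application/activity+json' "$1"
}

# Hypothetical helper: print the "type" of the object at a URL.
# Fails if the body is not JSON or has no "type".
ap_type() {
  ap_fetch "$1" | jq -er '.type' 2>/dev/null
}

# Walk post -> attributedTo -> audience and report what each hop resolves to.
check_post() {
  post_json=$(ap_fetch "$1") || return 1
  if ! printf '%s' "$post_json" | jq -e . >/dev/null 2>&1; then
    echo "post: $1 -> not JSON"
    return 1
  fi
  for field in attributedTo audience; do
    # Assumption: the field, when present, is a single URL string.
    url=$(printf '%s' "$post_json" | jq -r --arg f "$field" '.[$f] // empty')
    if [ -z "$url" ]; then
      echo "$field: not set"
      continue
    fi
    if kind=$(ap_type "$url"); then
      echo "$field: $url -> $kind"
    else
      echo "$field: $url -> not JSON"
    fi
  done
}
```

Calling `check_post` with the post URL prints one line per field, so a hop that returns HTML instead of JSON shows up as "not JSON" next to the URL that caused it.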

@DraconicNEO
Author

DraconicNEO commented Nov 17, 2024

This isn't Lemmy's fault. It's doing exactly what it would do when searching for any other post, it's just that the JSON representation of db0's blog post is providing bad information. Why it's doing that, I don't know, but I'd suggest that this Issue doesn't belong to Lemmy, it belongs to WordPress.

Technically speaking, Lemmy's WordPress integration is really a hack: WordPress blogs are not communities. It was set up this way because Lemmy's devs did not want any kind of microblog functionality, but people really wanted access to WordPress blogs, which are basically longer microblogs. So they were made to appear as communities in Lemmy even though they are not entities with the Group flag.

@freamon

freamon commented Nov 17, 2024

they are not entities with the Group flag.

Yeah they are:
curl --header 'accept: application/activity+json' https://pfefferle.org/@pfefferle.org | jq .type

curl --header 'accept: application/activity+json' https://dbzer0.com/@dbzer0.com | jq .type

The response is "Group" both times.
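
The same check can be turned into a yes/no test that composes with the curl commands above. A minimal sketch, assuming jq is available and that "type" is a single string; `is_group` is a hypothetical name.

```shell
# Hypothetical helper: read an ActivityPub document on stdin and succeed
# only if it parses as JSON and its "type" is exactly "Group".
is_group() {
  kind=$(jq -r '.type // empty' 2>/dev/null)
  [ "$kind" = "Group" ]
}
```

For example, `curl --header 'accept: application/activity+json' https://dbzer0.com/@dbzer0.com | is_group && echo Group`. An HTML response simply fails the check, because jq cannot parse it.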

@pfefferle

@db0 have you updated the plugin to the latest version? (4.2.0)

@db0
Contributor

db0 commented Nov 18, 2024

@pfefferle Yes I'm on the latest version.

OK I've gone through my settings and started experimenting with various switches.

Switching to this option

image

Seems to have changed the attributedTo field. I checked and I can now discover the community https://lemmy.dbzer0.com/c/[email protected] and one of the posts within it as well.

@pfefferle

That's weird! Maybe a migration glitch in one of the WordPress plugin updates. Happy that it works now, and sorry if it was because of me 🥸

@db0
Contributor

db0 commented Nov 18, 2024

All good https://lemmy.dbzer0.com/post/31734668

If you want me to help you troubleshoot why only that particular setting works, lemme know. The only thing I can imagine causing this is that I have multiple users on my blog.

@pfefferle

Ah, maybe there is still a bug in the code, because the audience field should not be set if the blog user is disabled. I will check that!

@DraconicNEO
Author

@pfefferle Yes I'm on the latest version.

OK I've gone through my settings and started experimenting with various switches.

Switching to this option

image

Seems to have changed the attributedTo field. I checked and I can now discover the community https://lemmy.dbzer0.com/c/[email protected] and one of the posts within it as well.

Good to see that it's now working. I guess we can consider this issue resolved.
