[-] marsara9@lemmy.world 45 points 1 year ago

Thanks for the shout-out.

But FYI I've run into some bugs that are preventing new content from being indexed. So you won't see anything newer than about a week old until I can find a new method to fetch new posts.

[-] marsara9@lemmy.world 49 points 1 year ago* (last edited 1 year ago)

Playing devil's advocate for a bit... So these are just cross-posts. Which existed even on Reddit. ...I assume they weren't handled in any way in Sync or Reddit?

But let's say this is fixed... What to do about the multiple comment threads? How would you reconcile them with each other? Especially since the user can choose different ways to sort the comments as well. Would all of this logic that's normally handled by the Lemmy back-end now need to run on your phone? Also how do you choose which post / instance to actually display and which ones to hide?

Btw, I'm not trying to dismiss the idea. Just want to call out some of the technical problems that might come up trying to implement such a feature. As well as ask questions to try and determine exactly how such a feature is expected to work.

[-] marsara9@lemmy.world 29 points 1 year ago

With ActivityPub all of the primary ids contain the domain of the hosting server. So if you lose your domain none of the other instances know that you're the authority on those communities, posts, comments or users. So essentially federation breaks with all of the old data.
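To illustrate why the domain matters, here is a minimal sketch that pulls the hosting server out of an ActivityPub-style id. The ids and the `lemmy.example` domain are made up for illustration, and `authority_of` is a hypothetical helper, not part of Lemmy.

```python
from urllib.parse import urlparse


def authority_of(activitypub_id: str) -> str:
    """Return the host embedded in an ActivityPub-style id (a URL)."""
    return urlparse(activitypub_id).hostname or ""


# Hypothetical ids: every object is named by a URL on the hosting server.
post_id = "https://lemmy.example/post/12345"
user_id = "https://lemmy.example/u/alice"

print(authority_of(post_id))
print(authority_of(user_id))
```

Because the host is baked into every id, other instances have nothing to tie the old ids to once that domain no longer points at your server.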

[-] marsara9@lemmy.world 12 points 1 year ago

Unless you have an account there's no easy way to get access to the content on the page. Once you have an account there's technically nothing stopping you from just saving the HTML file to your computer.

Something else you can try though, assuming you don't have an account, is to just turn off JavaScript. If the site lets you partially load the content and then asks you to create an account to read more, they usually just block the content by having JavaScript add an opaque overlay. With JavaScript disabled, obviously it's not there to add the overlay and you're able to keep reading.

154

I keep seeing people complain about not being able to find active communities that match their interests. So I've added a new feature to https://www.search-lemmy.com/ that lets you search posts for a particular topic and then tells you which communities have the most posts matching your search query.

And assuming that you've set your home instance correctly, those links will even open up in your home instance, so that you can subscribe directly to them.

For example, if you search for 'linux' (https://www.search-lemmy.com/find-communities/results?query=linux&page=1) it gives you a link to each community, tells you which instance it's on and how many matches it found for your query.

All of the same filters that you can use on the normal search can be used here as well. So if you just want to find the best community that mentions linux on lemmy.world (https://www.search-lemmy.com/find-communities/results?query=linux+instance%3Alemmy.world&page=1), you can filter by just that instance. Click on the Search Tips button to see a list of all of the available filters.
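If you'd rather build these links in a script, here is a rough sketch that reproduces the two example URLs above. The path and the `query` / `page` parameters are copied from those URLs; the function name `find_communities_url` is just something made up for this example, and the site may accept other parameters that aren't shown here.

```python
from typing import Optional
from urllib.parse import urlencode

# Path taken from the example links in this post.
BASE = "https://www.search-lemmy.com/find-communities/results"


def find_communities_url(terms: str, instance: Optional[str] = None, page: int = 1) -> str:
    """Build a find-communities URL, optionally adding an instance: filter."""
    query = terms if instance is None else f"{terms} instance:{instance}"
    # urlencode turns spaces into '+' and ':' into '%3A', matching the links above.
    return f"{BASE}?{urlencode({'query': query, 'page': page})}"


print(find_communities_url("linux"))
print(find_communities_url("linux", instance="lemmy.world"))
```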

P.S. I'm aware of https://lemmyverse.net/ etc... and while those are great as well, this allows you to search to see what people are actually talking about on the various communities.

Again, if you have any feature requests or find any bugs, PLEASE reach out or ideally go to my github (https://github.com/marsara9/lemmy-search) and log a bug there.

248

A couple days ago I updated https://search-lemmy.com/ to 0.4.0.

New features that several people were asking for:

  • The UI has been overhauled and it should be much easier to find your home instance now.
  • Search itself has been overhauled, and search performance has improved significantly. It now also searches for related terms automatically. You may now see fewer search results, but ideally they should be more relevant. You can also now include basic syntax like:
    • quotes: "some terms that must be together"
    • negative terms: cat -dog (shows posts about cats that don't mention dogs)
    • either or: cat OR dog (shows posts about either cats or dogs). The default search behavior is now an implicit AND, but order doesn't matter.
  • I've added several new filters that you can use including:
    • !safeoff -- Disables safe search allowing NSFW posts to appear in the search results (NSFW is now hidden by default)
    • since:YYYY-MM-DD -- shows only posts that have occurred since the specified date
    • until:YYYY-MM-DD -- same as above but in reverse. It will only show posts up to the given date.
  • I've removed the preferred-instance query parameter from the results URL so it should be easier to share links to search results now.
  • The date the post was created or last updated is now displayed in the search results.
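For anyone curious how the query syntax above could be broken apart, here is a minimal sketch. This is NOT the actual lemmy-search implementation; `ParsedQuery` and `parse_query` are hypothetical names, and it only covers quotes, negative terms, OR, since:/until: and !safeoff as described in the list.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ParsedQuery:
    required: List[str] = field(default_factory=list)      # implicit AND
    excluded: List[str] = field(default_factory=list)      # -term
    either: List[List[str]] = field(default_factory=list)  # a OR b groups
    since: Optional[date] = None
    until: Optional[date] = None
    safe_search: bool = True                                # NSFW hidden by default


# A token is either a "quoted phrase" or a run of non-space characters.
TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def parse_query(text: str) -> ParsedQuery:
    parsed = ParsedQuery()
    tokens = [(m.group(1) or m.group(2), m.group(1) is not None)
              for m in TOKEN.finditer(text)]
    prev = None  # "term" or "either": what an OR would attach to
    i = 0
    while i < len(tokens):
        tok, quoted = tokens[i]
        if not quoted and tok == "OR" and prev and i + 1 < len(tokens):
            right = tokens[i + 1][0]
            if prev == "term":
                # Move the previous term out of the AND list into a new OR group.
                parsed.either.append([parsed.required.pop(), right])
            else:
                parsed.either[-1].append(right)
            prev = "either"
            i += 2
            continue
        if not quoted and tok == "!safeoff":
            parsed.safe_search = False
            prev = None
        elif not quoted and tok.startswith("since:"):
            parsed.since = date.fromisoformat(tok[len("since:"):])
            prev = None
        elif not quoted and tok.startswith("until:"):
            parsed.until = date.fromisoformat(tok[len("until:"):])
            prev = None
        elif not quoted and tok.startswith("-") and len(tok) > 1:
            parsed.excluded.append(tok[1:])
            prev = None
        else:
            parsed.required.append(tok)
            prev = "term"
        i += 1
    return parsed


q = parse_query('"lemmy search" cat OR dog -bird since:2023-07-01 !safeoff')
print(q)
```

In this sketch a quoted phrase is always treated as a plain term, so `"OR"` in quotes is searched for literally rather than acting as an operator.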

Bug Fixes:

  • Site performance should now be stable. Fixed a bug related to the database pool that was causing the site to hang.
  • Fixed a bug that would cause broken links.
  • Fixed various bugs with the crawler causing posts to be missed.

Known Issues:

  • If you set your home-instance to a fairly small instance, the number of search results is also relatively small. Once (https://github.com/LemmyNet/lemmy/issues/3259) is resolved, I should be able to show links regardless of what your home instance is set to, allowing you to search the entire Fediverse.
  • Currently, searching only looks at the post title and body. Comments aren't indexed yet. This also depends on the above issue in Lemmy itself.

Finally some things to note:

I've started to refactor the code to abstract Lemmy away from the actual search engine, as I'm now preparing to search other Fediverse platforms like Kbin, and maybe even Mastodon, etc...

[-] marsara9@lemmy.world 52 points 1 year ago

IMHO federation doesn't bring any real benefits to git and introduces a lot of risks.

The git protocol, if you will, already allows developers to back up and move their repositories as needed. And the primary concern with source control is having a stable and secure place to host it. GitHub already provides that, free of charge.

Introducing federation, how do you control who can and cannot make changes to your codebase? How do you ensure you maintain access if a server goes down?

So while it's nice that you can self host and federate git with GitLab, what value does that provide over the status quo? And how do those benefits outweigh the risks outlined above?

[-] marsara9@lemmy.world 13 points 1 year ago

https://www.search-lemmy.com/

https://www.github.com/marsara9/lemmy-search

It only works for Lemmy, for now. And please feel free to post any feature requests or bugs to GitHub as it's still fairly new.

You can also check my comment/post history for more details.

[-] marsara9@lemmy.world 21 points 1 year ago* (last edited 1 year ago)

https://www.search-lemmy.com

http://www.github.com/marsara9/lemmy-search

Just add community:[!nostupidquestions@lemmy.world](/c/nostupidquestions@lemmy.world) at the end of your query.

[-] marsara9@lemmy.world 14 points 1 year ago

It might not have been crawled yet. The search engine will periodically search for new content but this isn't instant. So it may take a day or two to find it.

[-] marsara9@lemmy.world 16 points 1 year ago

Not yet but I can add this feature

[-] marsara9@lemmy.world 14 points 1 year ago

Ya, now if everyone can stop finding bugs! So I can take some time off. /jk

[-] marsara9@lemmy.world 43 points 1 year ago

Yes but that search doesn't take you to the instance that you are logged into already. Which is one of my main goals with this site. While that did give me the inspiration for this and has the power of Google behind it, it lacks knowledge about how the fediverse actually works.

[-] marsara9@lemmy.world 14 points 1 year ago* (last edited 1 year ago)

I've already got some complaints about that. You can see one of the issues raised on GitHub.

At the moment, I'm only picking up Mastodon posts that are federated to Lemmy, but you can't choose Mastodon as a preferred-instance yet. If and when I decide to add Mastodon support, I'll reach out to the admins over there to get feedback first.

Edit and note to any server admin: If you want to block the crawler from hitting your site, just add lemmy-search to your robots.txt and crawling will be prevented. But this doesn't stop cross-federation posts from being picked up on another instance.

1882
submitted 1 year ago* (last edited 1 year ago) by marsara9@lemmy.world to c/fediverse@lemmy.world

I shared bits and pieces of this before, but it's officially up and running now: https://www.search-lemmy.com/

This is an enhanced search engine for Lemmy, with a few primary goals:

  • You can choose a preferred instance. After you choose your primary instance and perform a search, ALL links will open in that instance.
  • This aims to be a replacement for using site:reddit.com in Google, but just for the fediverse.
  • You can filter the search results by:
    • Instance -- This will filter the results to only show communities that belong to a particular instance. Just type something like instance:lemmy.world or instance:https://lemmy.world/. This is separate from your preferred instance, such that you can search for posts on lemmy.world while still opening them on lemmy.ml.
    • Community -- You can refine the search by a specific community. You use the same syntax that you'd use here: community:[!fediverse@lemmy.world](/c/fediverse@lemmy.world).
    • Author -- Similar to the above you can also filter by a specific author such as: author:@marsara9@lemmy.world.
  • The entire thing is open-source. You can view the code and even host your own instance... See more details here: https://github.com/marsara9/lemmy-search.
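As a rough illustration of the preferred-instance idea, here is a sketch that turns a community handle into a link on your home instance. It relies on the `/c/name@host` path that appears in the community link above; `community_link` is a hypothetical helper and not necessarily how search-lemmy builds its links. It only covers communities, not posts or comments.

```python
def community_link(community: str, home_instance: str) -> str:
    """Map '!name@host' (or 'name@host') to a community URL on home_instance."""
    name, _, host = community.lstrip("!").partition("@")
    if not name or not host:
        raise ValueError(f"expected '!name@host', got {community!r}")
    return f"https://{home_instance}/c/{name}@{host}"


# A lemmy.world community, opened from a lemmy.ml account.
print(community_link("!fediverse@lemmy.world", "lemmy.ml"))
```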

NOTE: This only supports Lemmy instances for now. Other types of fediverse instances may be supported in the future, depending on how this works out.

I've been working on this over just the last few weeks, so it hasn't had a chance to crawl much of the fediverse yet. For now it only supports lemmy.world and lemmy.ml but other preferred-instances will come online as time goes by.

If anyone finds any bugs, and I'm sure you will, or if anyone has any suggestions PLEASE raise an issue on GitHub for me to track. Lastly, if anyone wants to help contribute please feel free to reach out.

NOTE TO SERVER ADMINS: You can prevent your site from being crawled by adding lemmy-search to your robots.txt for the user-agent.
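For admins who want to double-check their rules, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content is an assumed example of what "adding lemmy-search for the user-agent" might look like (a `Disallow: /` under that user-agent), and `lemmy.example` is a placeholder domain. How the crawler itself interprets the file is up to its own code.

```python
from urllib.robotparser import RobotFileParser

# Assumed example rules: block only the lemmy-search user-agent.
ROBOTS_TXT = """\
User-agent: lemmy-search
Disallow: /
"""


def crawler_allowed(robots_txt: str, url: str, agent: str = "lemmy-search") -> bool:
    """Check whether the given user-agent may fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(crawler_allowed(ROBOTS_TXT, "https://lemmy.example/post/1"))
print(crawler_allowed(ROBOTS_TXT, "https://lemmy.example/post/1", agent="other-bot"))
```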


marsara9

joined 1 year ago