(The picture shows sample output for an arbitrary query on “vegan vege pesc”. Irrelevant side note: there is no free-world venue for pescatarians, just one on L/W (lemmy.world) that scrolled off the screen.)
CF
The federation is not wholly decentralised, obviously, when giant centralised fiefdoms like Facebook “Threads”™ and Cloudflare hook in their technofeudal variety of oppressive infra and abuse their power.
Each post submission begins with finding a relevant venue for the content, and that venue must be consistent with my sense of ethics. Cloudflare nodes are automatically nixed because Cloudflare is inherently centralised, a walled garden regardless of the user count of any given node. CF is a non-starter for an open, free, and fair society (fair implying balance of power, equality, transparency, etc.).
My script queries the catalog of communities for relevant venues. It still prints communities in the Cloudflare walled garden because it’s useful to see which names match my regex queries, which sometimes helps me form a better query. It’s the only thing #LemmyWorld is good for (a shit-ton of community names with redundant variations of the same subject matter). Those results are printed in red, tagged with a thundercloud (🌩), and listed first (because when they scroll off the terminal I don’t usually care to scroll up to see them).
non-CF
CF is not the only issue. Some non-CF nodes are centralised because of uncontrolled growth to disproportionately large sizes. I don’t rule them out hard-and-fast like CF nodes, but they get low, “last resort” favourability. They are tagged with a warning symbol (⚠) and printed in yellow.
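The three display tiers described above (Cloudflare results in red with 🌩 and listed first, oversized nodes in yellow with ⚠, everything else in cyan) could be sketched in plain POSIX shell roughly as follows. This is a hypothetical illustration, not the actual script: the tab-separated input format, the `cf`/`big`/`ok` tier labels, the function names and the ANSI colour codes are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of the display tiers; not the author's actual script.
# Input: one community per line as TIER<TAB>BASEURL<TAB>NAME, where TIER is an
# assumed label: cf (Cloudflare node), big (oversized node) or ok.
red=$(printf '\033[31m'); yellow=$(printf '\033[33m')
cyan=$(printf '\033[36m'); reset=$(printf '\033[0m')
tab=$(printf '\t')

# Print one community line with the colour and symbol for its tier.
tag_community() {
  case $1 in
    cf)  printf '%s🌩 %s %s%s\n' "$red" "$2" "$3" "$reset" ;;
    big) printf '%s⚠ %s %s%s\n' "$yellow" "$2" "$3" "$reset" ;;
    *)   printf '%s%s %s%s\n' "$cyan" "$2" "$3" "$reset" ;;
  esac
}

# Rank tiers so Cloudflare results are printed first (and so scroll off the
# terminal first), then sort by baseurl and name within each tier.
print_results() {
  awk -F '\t' -v OFS='\t' '{ r = ($1 == "cf") ? 0 : ($1 == "big") ? 1 : 2; print r, $0 }' |
    sort -t "$tab" -k1,1n -k3,3 -k4,4 |
    while IFS="$tab" read -r rank tier base name; do
      tag_community "$tier" "$base" "$name"
    done
}

# Example with made-up hostnames.
printf 'ok\tsmall.example\tvegan\ncf\tcf.example\tvegan\nbig\tbig.example\tvegetarian\n' |
  print_results
```

Ranking in awk and sorting numerically keeps the tier order independent of the alphabetical order of the labels.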
Is my math decent?
My script began by filtering on total user count. Then I realised that dead or dormant users probably should not count, because they don’t really contribute to a node’s disproportionate power over a population. It’s active users that matter. But the number of users active in a single day is too volatile a figure for deciding where my post should live for a month, or however long it stays relevant. So I used the users_active_half_year count. Is that sensible?
What constitutes an “active” user: simply logging in, or commenting?
The line is drawn at 2 standard deviations above the average, after tossing outliers: nodes with fewer than 5 active users in half a year are likely 1-person nodes, so they are left out of the calculation (as are Cloudflare nodes). The average is around 320 half-year active users per node, and the standard deviation is ~702 users. My statistical competence is rusty for sure, but I’m a bit bothered by a standard deviation that’s more than double the mean. It seems like a variation so wild that it should perhaps be disregarded. Nonetheless, I opted to flag nodes that exceed ~1724 users_active_half_year (320 + 2 × 702).
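A minimal sketch of the cutoff in plain shell with awk, useful for checking the arithmetic on a small made-up sample. It drops counts below 5, takes the population mean and standard deviation, and reports the mean + 2σ cutoff alongside σ/mean (the coefficient of variation, i.e. the “standard deviation more than double the mean” ratio; for the figures above it is 702 / 320 ≈ 2.19). Reading one count per line from stdin and the `half_year_cutoff` name are assumptions for illustration; the script itself gets these numbers from SQLite.

```shell
#!/bin/sh
# Hypothetical helper: mean + 2*sigma cutoff over half-year active-user counts.
# Reads one count per line on stdin; counts below 5 are skipped as likely
# 1-person nodes. Uses the population (not sample) standard deviation.
half_year_cutoff() {
  awk '
    $1 >= 5 { n++; sum += $1; sumsq += $1 * $1 }
    END {
      if (n == 0) { print "no data"; exit 1 }
      mean = sum / n
      sd = sqrt(sumsq / n - mean * mean)   # Var(X) = E[X^2] - E[X]^2
      printf "n=%d mean=%.1f sd=%.1f cutoff=%.1f cv=%.2f\n", n, mean, sd, mean + 2 * sd, sd / mean
    }'
}

# Made-up sample: 1 and 2 are dropped, leaving 10 20 30 40 100.
printf '%s\n' 1 2 10 20 30 40 100 | half_year_cutoff
# prints: n=5 mean=40.0 sd=31.6 cutoff=103.2 cv=0.79
```

Note how a single large value (100) dominates the spread even in this tiny sample; with a coefficient of variation above 2, a handful of very large nodes drives both the mean and σ.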
The pseudocode looks like this:
avg=$(sqlite3 "$db" "select round(avg([counts.users_active_half_year])) from node_tbl where tags not like '%cloudflare%' and [counts.users_active_half_year] > 4")
variance=$(sqlite3 "$db" "select avg(x*x) - avg(x)*avg(x) from (select [counts.users_active_half_year] as x from node_tbl where tags not like '%cloudflare%' and [counts.users_active_half_year] > 4)")  # population variance via E[x^2] - E[x]^2
sqlite3 "$db" "select case when baseurl in (select baseurl from node_tbl where [counts.users_active_half_year] > $avg+sqrt($variance)*2) then '$yellow⚠' else '$cyan' end||baseurl||'$reset',name from community_tbl where (name like '%${1}%' or [desc] like '%${1}%') and baseurl not in (select baseurl from node_tbl where tags like '%cloudflare%') order by baseurl,name"
The code is ugly because SQLite has no built-in standard deviation (stdev) function.
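Without a built-in stdev, one common workaround is the population-variance identity, which needs only two ordinary aggregates (in SQL, roughly `avg(x*x) - avg(x)*avg(x)`) instead of a subquery that re-computes the mean. The identity can lose precision when the mean is large relative to the spread, which is not the case for counts whose standard deviation exceeds the mean. Plugging in the figures quoted earlier gives the threshold:

```latex
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
         = \frac{1}{n}\sum_{i=1}^{n} x_i^2 \;-\; \bar{x}^2,
\qquad
\text{cutoff} = \bar{x} + 2\sigma \approx 320 + 2 \times 702 = 1724
```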
My other thought is to cut slack for closed nodes, because at least they are expected to shrink. To show the figures available to filter on, here is the record for lemmy.ml (the biggest non-Cloudflare node):
url = https://lemmy.ml/
baseurl = lemmy.ml
name = Lemmy
desc = A community of privacy and FOSS enthusiasts, run by Lemmy’s developers
downvotes = 1
nsfw = 1
create_admin = 0
private = 0
fed = 1
version = 0.19.12
open = 1
usage.users.total = 54790
usage.users.activeHalfyear = 4201
usage.users.activeMonth = 2125
usage.localPosts = 167331
usage.localComments = 818559
counts.site_id = 1
counts.users = 54790
counts.posts = 167331
counts.comments = 818559
counts.communities = 4608
counts.users_active_day = 947
counts.users_active_week = 1496
counts.users_active_month = 2125
counts.users_active_half_year = 4201
icon = https://lemmy.ml/pictrs/image/fa6d9660-4f1f-4e90-ac73-b897216db6f3.png
banner =
langs = ["all"]
date = 2019-04-20T18:53:54.608882Z
published = 1555786434000
time = 1751974533970
score =
uptime.domain = lemmy.ml
uptime.latency = 0.034
uptime.countryname = France
uptime.uptime_alltime = 99.04
uptime.date_created =
uptime.date_updated = 2021-10-29 15:09:21
uptime.date_laststats = 2025-04-11 21:03:25
uptime.score = 100
uptime.status = 1
isSuspicious = 0
metrics.usersTotal = 54790
metrics.usersMonth = 2125
metrics.usersWeek = 1496
metrics.totalActivity = 985890
metrics.localPosts = 167331
metrics.localComments = 818559
metrics.averageUsers = 50720.8825256975
metrics.biggestJump = 225
metrics.averagePerMinute = 0.02475
metrics.userActivityScore = 0.055574151274483
metrics.activityUserScore = 17.9939770031028
metrics.userActiveMonthScore = 25.7835294117647
tags = []
susReason = []
trust.lastCrawled = 1751974533970
trust.baseurl = lemmy.ml
trust.metrics.usersTotal = 54790
trust.metrics.usersMonth = 2125
trust.metrics.usersWeek = 1496
trust.metrics.totalActivity = 985890
trust.metrics.localPosts = 167331
trust.metrics.localComments = 818559
trust.metrics.averageUsers = 50720.8825256975
trust.metrics.biggestJump = 225
trust.metrics.averagePerMinute = 0.02475
trust.metrics.userActivityScore = 0.055574151274483
trust.metrics.activityUserScore = 17.9939770031028
trust.metrics.userActiveMonthScore = 25.7835294117647
trust.users = 54790
trust.name = Lemmy
trust.base = lemmy.ml
trust.actor_id = https://lemmy.ml/
trust.tags = []
trust.guarantor = fediseer.com
trust.endorsements = 17
trust.score = 598.1875
trust.reasons = []
blocks.incoming = 0
blocks.outgoing = 0
blocked = []
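The idea of cutting slack for closed nodes could be sketched as a per-node check that raises the cutoff when registration is closed (`open = 0` in a record like the one above). The `is_oversized` helper and the `SLACK` factor of 1.5 are hypothetical, chosen only for illustration; the right amount of slack is a judgement call.

```shell
#!/bin/sh
# Hypothetical sketch: closed-registration nodes get a higher limit because
# they are expected to shrink. SLACK is an arbitrary illustrative factor.
SLACK=1.5

# is_oversized ACTIVE_HALF_YEAR OPEN CUTOFF
# Exit status 0 means the node should be flagged (yellow ⚠).
is_oversized() {
  awk -v a="$1" -v open="$2" -v c="$3" -v s="$SLACK" \
    'BEGIN { limit = (open == 1) ? c : c * s; exit !(a > limit) }'
}

# lemmy.ml: 4201 half-year active users, open = 1, against the ~1724 cutoff.
is_oversized 4201 1 1724 && echo 'lemmy.ml: flagged ⚠'
```

With these assumptions, a closed node with 2000 half-year active users would not be flagged, since its limit rises to 1724 × 1.5 = 2586, while an open node of the same size would be.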
Some communities missing from the Lemmyverse DB: why?
Does anyone know why some slrpnk.net communities are in the Lemmyverse DB and others are not? For example, why is !nolawns@slrpnk.net missing when many other communities from the same node are included?
More importantly, what’s the fix, apart from crawling all the nodes (which would probably be unwelcome)? Is there another open DB apart from Lemmyverse? There are fediverse.space and fediverse.observer, but they don’t appear to share their data.