1
2
submitted 11 months ago by ace@lemmy.ananace.dev to c/btrfs@lemmy.ml

The fscrypt work continues to plod steadily along. I'm really hoping there won't need to be many more versions of the patchset, especially since a bunch of the non-BTRFS-specific work has already landed.

2
3
submitted 11 months ago by Atemu@lemmy.ml to c/btrfs@lemmy.ml
3
6
submitted 11 months ago by Atemu@lemmy.ml to c/btrfs@lemmy.ml
4
3
submitted 11 months ago by Atemu@lemmy.ml to c/btrfs@lemmy.ml

cross-posted from: https://feddit.uk/post/4577666

Was looking at how to set up snapper on Fedora 39 and came across the ever-knowledgeable Stephen's Tech Talks video. It covers balance, setting up snapper, and subvolume management in a really cool GUI tool.

Edit: updated the link, as the GitHub page was apparently out of date, but the tool is in most repos.

5
1
duperemove speedups (trofi.github.io)
submitted 1 year ago by Atemu@lemmy.ml to c/btrfs@lemmy.ml
6
1
submitted 1 year ago by ace@lemmy.ananace.dev to c/btrfs@lemmy.ml

Looks like it's v2 time.

The btrfs-progs-side patch is here.

7
1
submitted 1 year ago by Atemu@lemmy.ml to c/btrfs@lemmy.ml
8
1
submitted 1 year ago* (last edited 1 year ago) by u202307011927@feddit.de to c/btrfs@lemmy.ml

Update:

With the native Manjaro installer I succeeded in making my disk encrypted. But it's below the btrfs layer (btrfs sits inside the encryption)

9
2
submitted 1 year ago by geoff@lemm.ee to c/btrfs@lemmy.ml

Just wanted to share some love for this filesystem.

I’ve been running a btrfs raid1 continuously for over ten years, on a motley assortment of near-garbage hard drives of all different shapes and sizes. None of the original drives are still in it, and that server is now on its fourth motherboard. The data has survived it all!

It’s grown to 6 drives now, and most recently survived the runtime failure of a SATA controller card that four of them were attached to. After replacing it, I was stunned to discover that the volume was uncorrupted and didn’t even require repair.

So knock on wood; I’m not trying to tempt fate here. I just want to say thank you to all the devs for their hard work, and add some positive feedback to the heap, since btrfs gets way more than its fair share of flak, which I personally find to be undeserved. Cheers!

10
1
Btrfs progs release 6.3.3 (lore.kernel.org)
submitted 1 year ago by Atemu@lemmy.ml to c/btrfs@lemmy.ml

Hi,

btrfs-progs version 6.3.3 has been released. This is a bugfix release.

There are two bug fixes, the rest is CI work, documentation updates and some preparatory work. Due to no other significant changes queued, the release 6.4 will be most likely skipped.

Changelog:

  • add btrfs-find-root to btrfs.box
  • replace: properly enqueue if there's another replace running
  • other:
    • CI updates, more tests enabled, code coverage, badges
    • documentation updates
    • build warning fixes
11
1
submitted 1 year ago by Atemu@lemmy.ml to c/btrfs@lemmy.ml
12
1
submitted 1 year ago by Atemu@lemmy.ml to c/btrfs@lemmy.ml
This is a changeset adding encryption to btrfs. It is not complete; it
does not support inline data or verity or authenticated encryption. It
is primarily intended as a proof that the fscrypt extent encryption
changeset it builds on works.

As per the design doc refined in the fall of last year [1], btrfs
encryption has several steps: first, adding extent encryption to fscrypt
and then btrfs; second, adding authenticated encryption support to the
block layer, fscrypt, and then btrfs; and later adding potentially the
ability to change the key used by a directory (either for all data or
just newly written data) and/or allowing use of inline extents and
verity items in combination with encryption and/or enabling send/receive
of encrypted volumes. As such, this change is only the first step and is
unsafe.

This change does not pass a couple of encryption xfstests, because of
different properties of extent encryption. It hasn't been tested with
direct IO or RAID. Because currently extent encryption always uses inline
encryption (i.e. IO-block-only) for data encryption, it does not support
encryption of inline extents; similarly, since btrfs stores verity items
in the tree instead of in inline encryptable blocks on disk as other
filesystems do, btrfs cannot currently encrypt verity items. Finally,
this is insecure; the checksums are calculated on the unencrypted data
and stored unencrypted, which is a potential information leak. (This
will be addressed by authenticated encryption).
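To make the checksum concern above concrete, here is a minimal sketch of how a checksum computed over plaintext and stored in the clear can let someone confirm a guess about the encrypted contents. It is only an illustration: `zlib.crc32` stands in for the filesystem checksum (btrfs defaults to crc32c, a different CRC variant), and the block contents and candidate list are made up.

```python
import zlib


def stored_checksum(plaintext: bytes) -> int:
    # Stand-in for the data checksum kept on disk. Assumption: plain CRC-32
    # from the stdlib, not the crc32c that btrfs actually uses by default.
    return zlib.crc32(plaintext)


def confirm_guess(checksum: int, candidates: list[bytes]) -> list[bytes]:
    # Nobody here has the encryption key; they only compare each guess's
    # checksum with the unencrypted checksum read from disk.
    return [c for c in candidates if zlib.crc32(c) == checksum]


# Hypothetical secret block and the checksum that would sit next to it.
secret_block = b"salary=120000\n"
on_disk_checksum = stored_checksum(secret_block)

# A small guess space is enough to recover the content without the key.
candidates = [b"salary=%d\n" % n for n in range(100000, 130000, 5000)]
matches = confirm_guess(on_disk_checksum, candidates)
print(matches)
```

The data itself never has to be decrypted for this to work, which is why the cover letter treats plaintext checksums as a potential information leak.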

This changeset is built on two prior changesets to fscrypt: [2] and [3]
and should have no effect on unencrypted usage.

[1] https://docs.google.com/document/d/1janjxewlewtVPqctkWOjSa7OhCgB8Gdx7iDaCDQQNZA/edit?usp=sharing
[2]
https://lore.kernel.org/linux-fscrypt/cover.1687988119.git.sweettea-kernel@dorminy.me/
[3]
https://lore.kernel.org/linux-fscrypt/cover.1687988246.git.sweettea-kernel@dorminy.me
13
1
submitted 1 year ago by Atemu@lemmy.ml to c/btrfs@lemmy.ml
This changeset adds extent-based data encryption to fscrypt.
Some filesystems need to encrypt data based on extents, rather than on
inodes, due to features incompatible with inode-based encryption. For
instance, btrfs can have multiple inodes referencing a single block of
data, and moves logical data blocks to different physical locations on
disk in the background. 

As per discussion last year in [1] and later in [2], we would like to
allow the use of fscrypt with btrfs, with authenticated encryption. This
is the first step of that work, adding extent-based encryption to
fscrypt; authenticated encryption is the next step. Extent-based
encryption should be usable by other filesystems which wish to support
snapshotting or background data rearrangement also, but btrfs is the
first user. 
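A rough way to see why per-inode keys clash with shared data is the toy model below. It is not how fscrypt derives keys or encrypts blocks; the SHA-256 based key derivation, the XOR "cipher", and the inode and extent numbers are all assumptions chosen only to keep the sketch self-contained.

```python
import hashlib


def derive_key(master: bytes, label: bytes, ident: int) -> bytes:
    # Toy key derivation: hash of master key, a label, and an identifier.
    return hashlib.sha256(master + label + ident.to_bytes(8, "little")).digest()


def xor_stream(key: bytes, data: bytes) -> bytes:
    # Toy stream cipher (NOT secure): XOR with a SHA-256 based keystream.
    # Applying it twice with the same key returns the original data.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "little")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


master = b"demo master key"
plaintext = b"shared block contents"

# Inode-based: the block is encrypted under inode 257's key, then a second
# inode (258) ends up referencing the same on-disk block, e.g. via reflink.
ct_inode = xor_stream(derive_key(master, b"inode", 257), plaintext)
read_by_owner = xor_stream(derive_key(master, b"inode", 257), ct_inode)
read_by_clone = xor_stream(derive_key(master, b"inode", 258), ct_inode)

# Extent-based: the key is tied to the extent (hypothetical id 9001), so any
# inode that references the extent decrypts it the same way.
ct_extent = xor_stream(derive_key(master, b"extent", 9001), plaintext)
read_via_any_inode = xor_stream(derive_key(master, b"extent", 9001), ct_extent)

print(read_by_owner == plaintext, read_by_clone == plaintext,
      read_via_any_inode == plaintext)
```

In this model the second inode cannot read the shared block under inode-based keys, while the extent-based key works regardless of which inode points at the data.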

This changeset requires extent encryption to use inlinecrypt, as
discussed previously. There are two questionable parts: the
forget_extent_info hook is not yet in use by btrfs, as I haven't yet
written a test exercising a race where it would be relevant; and saving
the session key credentials just to enable v1 session-based policies is
perhaps less good than 

This applies atop [3], which itself is based on kdave/misc-next. It
passes most encryption fstests with suitable changes to btrfs-progs, but
not generic/580 or generic/595 due to different timing involved in
extent encryption. Tests and btrfs progs updates to follow.


[1] https://docs.google.com/document/d/1janjxewlewtVPqctkWOjSa7OhCgB8Gdx7iDaCDQQNZA/edit?usp=sharing
[2] https://lore.kernel.org/linux-fscrypt/80496cfe-161d-fb0d-8230-93818b966b1b@dorminy.me/T/#t
[3]
https://lore.kernel.org/linux-fscrypt/cover.1687988119.git.sweettea-kernel@dorminy.me/

14
1
submitted 1 year ago* (last edited 1 year ago) by Atemu@lemmy.ml to c/btrfs@lemmy.ml
btrfs quota groups (qgroups) are a compelling feature of btrfs that
allow flexible control for limiting subvolume data and metadata usage.
However, due to btrfs's high-level decision to trade off snapshot
performance against ref-counting performance, qgroups suffer from
non-trivial performance issues that make them unattractive in certain
workloads. Particularly, frequent backref walking during writes and
during commits can make operations increasingly expensive as the number
of snapshots scales up. For that reason, we have never been able to
commit to using qgroups in production at Meta, despite significant
interest from people running container workloads, where we would benefit
from protecting the rest of the host from a buggy application in a
container running away with disk usage.

This patch series introduces a simplified version of qgroups called
simple quotas (squotas) which never computes global reference counts
for extents, and thus has similar performance characteristics to normal,
quotas disabled, btrfs. The "trick" is that in simple quotas mode, we
account all extents permanently to the subvolume in which they were
originally created. That allows us to make all accounting 1:1 with
extent item lifetime, removing the need to walk backrefs.

However, this sacrifices the ability to compute shared vs. exclusive
usage. It also
results in counter-intuitive, though still predictable and simple,
accounting in the cases where an original extent is removed while a
shared copy still exists. Qgroups is able to detect that case and count
the remaining copy as an exclusive owner, while squotas is not. As a
result, squotas works best when the original extent is immutable and
outlives any clones.
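The accounting difference described above can be sketched with a toy model. This is not the kernel's implementation: the `Extent` class, the set of referencing subvolumes, and the subvolume names are all hypothetical, and real qgroups derive sharing information by walking backrefs rather than from a ready-made set.

```python
from collections import defaultdict


class Extent:
    def __init__(self, size: int, creator: str):
        self.size = size
        self.creator = creator   # subvolume that originally wrote the extent
        self.refs = {creator}    # subvolumes currently referencing it


def squota_usage(extents):
    # Simple quotas: a live extent is always charged to its creator,
    # no matter who references it now.
    usage = defaultdict(int)
    for e in extents:
        if e.refs:
            usage[e.creator] += e.size
    return dict(usage)


def qgroup_exclusive(extents):
    # Qgroup-style exclusive usage: charged to the sole remaining referencer.
    usage = defaultdict(int)
    for e in extents:
        if len(e.refs) == 1:
            (owner,) = e.refs
            usage[owner] += e.size
    return dict(usage)


extent = Extent(size=4096, creator="subvol_a")
extent.refs.add("snap_b")         # a snapshot now shares the extent
extent.refs.discard("subvol_a")   # the original drops its copy

print(squota_usage([extent]))      # still charged to subvol_a
print(qgroup_exclusive([extent]))  # charged to snap_b as exclusive owner
```

In this model the extent stays on the creator's bill under squotas until the last reference goes away, which is the counter-intuitive but predictable case the cover letter mentions.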

==Format Change==
In order to track the original creating subvolume of a data extent in
the face of reflinks, it is necessary to add additional accounting to
the extent item. To save space, this is done with a new inline ref item.
However, the downside of this approach is that it makes enabling squota
an incompat change, denoted by the new incompat bit SIMPLE_QUOTA. When
this bit is set and quotas are enabled, new extent items get the extra
accounting, and freed extent items check for the accounting to find
their creating subvolume. In addition, 1:1 with this incompat bit,
the quota status item now tracks a "quota enablement generation" needed
for properly handling the deletion of extents which predate enablement.

==API==
Squotas reuses the api of qgroups. The only difference is that when you
enable quotas via `btrfs quota enable`, you pass the `--simple` flag.
Squotas will always report exclusive == shared for each qgroup. Squotas
deal with extent_item/metadata_item sizes and thus do not do anything
special with compression. Squotas also introduce auto inheritance for
nested subvols. The API is documented more fully in the documentation
patches in btrfs-progs.
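As a small illustration of the command-line difference, the sketch below only assembles the `btrfs quota enable` invocation with or without the `--simple` flag mentioned above. The mount point `/mnt/data` and the helper names are assumptions, and the default is a dry run so nothing is changed on a real filesystem.

```python
import subprocess


def quota_enable_cmd(mountpoint: str, simple: bool = True) -> list[str]:
    # Same subcommand either way; squotas only add the --simple flag.
    cmd = ["btrfs", "quota", "enable"]
    if simple:
        cmd.append("--simple")
    cmd.append(mountpoint)
    return cmd


def enable_quotas(mountpoint: str, simple: bool = True, dry_run: bool = True):
    cmd = quota_enable_cmd(mountpoint, simple)
    if not dry_run:
        # Needs root and a btrfs-progs build that knows about --simple.
        subprocess.run(cmd, check=True)
    return cmd


# Hypothetical mount point; prints the command instead of running it.
print(" ".join(enable_quotas("/mnt/data")))
```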

==Testing methodology==
Using updated btrfs-progs and fstests (relevant matching patch sets to
be sent ASAP)
btrfs-progs: https://github.com/boryas/btrfs-progs/tree/squota-progs
fstests: https://github.com/boryas/fstests/tree/squota-test

I ran '-g auto' on fstests on the following configurations:
1a) baseline kernel/progs/fstests.
1b) squota kernel, baseline progs/fstests.
2a) baseline kernel/progs/fstests. fstests configured to mkfs with quota
2b) squota kernel/progs/fstests. fstests configured to mkfs with squota

I compared 1a against 1b and 2a against 2b and detected no regressions.
2a/2b both exhibit regressions against 1a/1b which are largely issues
with quota reservations in various complicated cases. I intend to run
those down in the future, but they are not simple quota specific, as
they are already broken with plain qgroups.

==Performance Testing==
I measured the performance of the change using fsperf. I ran with 3
configurations using the squota kernel:
- plain mkfs
- qgroup mkfs
- squota mkfs
And added a new performance test which creates 1000 files in a subvol,
creates 100 snapshots of that subvol, then unshares extents in files in
the snapshots. I measured write performance with fio and btrfs commit
critical section performance side effects with bpftrace on
'wait_current_trans'.

The results for the test which measures unshare perf (unshare.py) with
qgroup and squota compared to the baseline:

qgroup test results
unshare results
          metric              baseline       current        stdev            diff
========================================================================================
avg_commit_ms                     162.13        285.75          3.14     76.24%
bg_count                              16            16             0      0.00%
commits                           378.20           379          1.92      0.21%
elapsed                           201.40        270.40          1.34     34.26%
end_state_mount_ns           26036211.60   26004593.60    2281065.40     -0.12%
end_state_umount_ns             2.45e+09      2.55e+09   20740154.41      3.93%
max_commit_ms                     425.80           594         53.34     39.50%
sys_cpu                             0.10          0.06          0.06    -42.15%
wait_current_trans_calls         2945.60       3405.20         47.08     15.60%
wait_current_trans_ns_max       1.56e+08      3.43e+08   32659393.25    120.07%
wait_current_trans_ns_mean    1974875.35   28588482.55    1557588.84   1347.61%
wait_current_trans_ns_min            232           232         25.88      0.00%
wait_current_trans_ns_p50            718           740         22.80      3.06%
wait_current_trans_ns_p95     7711770.20      2.21e+08   17241032.09   2761.19%
wait_current_trans_ns_p99    67744932.29      2.68e+08   41275815.87    295.16%
write_bw_bytes                 653008.80     486344.40       4209.91    -25.52%
write_clat_ns_mean            6251404.78    8406837.89      39779.15     34.48%
write_clat_ns_p50             1656422.40    1643315.20      27415.68     -0.79%
write_clat_ns_p99               1.90e+08      3.20e+08       2097152     68.62%
write_io_kbytes                   128000        128000             0      0.00%
write_iops                        159.43        118.74          1.03    -25.52%
write_lat_ns_max                7.06e+08      9.80e+08   47324816.61     38.88%
write_lat_ns_mean             6251503.06    8406936.06      39780.83     34.48%
write_lat_ns_min                    3354          4648        616.06     38.58%

squota test results
unshare results
          metric              baseline       current        stdev            diff
========================================================================================
avg_commit_ms                     162.13        164.16          3.14      1.25%
bg_count                              16             0             0   -100.00%
commits                           378.20        380.80          1.92      0.69%
elapsed                           201.40        208.20          1.34      3.38%
end_state_mount_ns           26036211.60   25840729.60    2281065.40     -0.75%
end_state_umount_ns             2.45e+09      3.01e+09   20740154.41     22.80%
max_commit_ms                     425.80        415.80         53.34     -2.35%
sys_cpu                             0.10          0.08          0.06    -23.36%
wait_current_trans_calls         2945.60       2981.60         47.08      1.22%
wait_current_trans_ns_max       1.56e+08      1.12e+08   32659393.25    -27.86%
wait_current_trans_ns_mean    1974875.35    1064734.76    1557588.84    -46.09%
wait_current_trans_ns_min            232           238         25.88      2.59%
wait_current_trans_ns_p50            718           746         22.80      3.90%
wait_current_trans_ns_p95     7711770.20       1567.60   17241032.09    -99.98%
wait_current_trans_ns_p99    67744932.29   49880514.27   41275815.87    -26.37%
write_bw_bytes                 653008.80        631256       4209.91     -3.33%
write_clat_ns_mean            6251404.78    6476816.06      39779.15      3.61%
write_clat_ns_p50             1656422.40       1581056      27415.68     -4.55%
write_clat_ns_p99               1.90e+08      1.94e+08       2097152      2.21%
write_io_kbytes                   128000        128000             0      0.00%
write_iops                        159.43        154.12          1.03     -3.33%
write_lat_ns_max                7.06e+08      7.65e+08   47324816.61      8.38%
write_lat_ns_mean             6251503.06    6476912.76      39780.83      3.61%
write_lat_ns_min                    3354          4062        616.06     21.11%

And the same, but only showing results where the deviation was outside
of a 95% confidence interval for the mean (default significance
highlighting in fsperf):
qgroup test results
unshare results
          metric              baseline       current        stdev            diff
========================================================================================
avg_commit_ms                     162.13        285.75          3.14     76.24%
elapsed                           201.40        270.40          1.34     34.26%
end_state_umount_ns             2.45e+09      2.55e+09   20740154.41      3.93%
max_commit_ms                     425.80           594         53.34     39.50%
wait_current_trans_calls         2945.60       3405.20         47.08     15.60%
wait_current_trans_ns_max       1.56e+08      3.43e+08   32659393.25    120.07%
wait_current_trans_ns_mean    1974875.35   28588482.55    1557588.84   1347.61%
wait_current_trans_ns_p95     7711770.20      2.21e+08   17241032.09   2761.19%
wait_current_trans_ns_p99    67744932.29      2.68e+08   41275815.87    295.16%
write_bw_bytes                 653008.80     486344.40       4209.91    -25.52%
write_clat_ns_mean            6251404.78    8406837.89      39779.15     34.48%
write_clat_ns_p99               1.90e+08      3.20e+08       2097152     68.62%
write_iops                        159.43        118.74          1.03    -25.52%
write_lat_ns_max                7.06e+08      9.80e+08   47324816.61     38.88%
write_lat_ns_mean             6251503.06    8406936.06      39780.83     34.48%
write_lat_ns_min                    3354          4648        616.06     38.58%

squota test results
unshare results
          metric              baseline       current        stdev            diff
========================================================================================
elapsed                           201.40        208.20          1.34      3.38%
end_state_umount_ns             2.45e+09      3.01e+09   20740154.41     22.80%
write_bw_bytes                 653008.80        631256       4209.91     -3.33%
write_clat_ns_mean            6251404.78    6476816.06      39779.15      3.61%
write_clat_ns_p50             1656422.40       1581056      27415.68     -4.55%
write_clat_ns_p99               1.90e+08      1.94e+08       2097152      2.21%
write_iops                        159.43        154.12          1.03     -3.33%
write_lat_ns_mean             6251503.06    6476912.76      39780.83      3.61%

Particularly noteworthy are the massive regressions to
wait_current_trans in qgroup mode as well as the solid regressions to
bandwidth, iops and write latency. The regressions/improvements in
squotas are modest in comparison, in line with expectations. I am
still investigating the squota umount regression, particularly whether
it is in the umount's final commit and represents a real performance
problem with squotas.
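For anyone reading the tables above, the diff column appears to be the relative change of the current value against the baseline. The sketch below assumes that formula (it is not taken from the fsperf source) and checks it against two rows from the qgroup run.

```python
def percent_diff(baseline: float, current: float) -> float:
    # Assumed formula: relative change versus the baseline, in percent.
    return (current - baseline) / baseline * 100.0


# (baseline, current) pairs copied from the qgroup table above.
rows = {
    "elapsed": (201.40, 270.40),
    "write_iops": (159.43, 118.74),
}

for metric, (baseline, current) in rows.items():
    print(f"{metric:12s} {percent_diff(baseline, current):8.2f}%")
```

Positive values mean the metric grew relative to the baseline, so whether that is a regression depends on the metric: higher elapsed time is worse, while higher write_iops would be better.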

Link: https://github.com/boryas/btrfs-progs/tree/squota-progs
Link: https://github.com/boryas/fstests/tree/squota-test
Link: https://github.com/boryas/fsperf/tree/unshare-victim
15
1
submitted 1 year ago by Atemu@lemmy.ml to c/btrfs@lemmy.ml

Hi,

there are mainly core changes, refactoring and optimizations. Performance is improved in some areas; overall there may be a cumulative improvement due to refactoring that removed lookups in the IO path or simplified IO submission tracking.

No merge conflicts. Please pull, thanks.

Core:

  • submit IO synchronously for fast checksums (crc32c and xxhash), remove high priority worker kthread

  • read extent buffer in one go, simplify IO tracking, bio submission and locking

  • remove additional tracking of redirtied extent buffers, originally added for zoned mode but actually not needed

  • track ordered extent pointer in bio to avoid rbtree lookups during IO

  • scrub, use recovered data stripes as cache to avoid unnecessary read

  • in zoned mode, optimize logical to physical mappings of extents

  • remove PageError handling, not set by VFS nor writeback

  • cleanups, refactoring, better structure packing

  • lots of error handling improvements

  • more assertions, lockdep annotations

  • print assertion failure with the exact line where it happens

  • tracepoint updates

  • more debugging prints

Performance:

  • speedup in fsync(), better tracking of inode logged status can avoid transaction commit

  • IO path structures track logical offsets in data structures and do not need to look them up

User visible changes:

  • don't commit transaction for every created subvolume, this can reduce time when many subvolumes are created in a batch

  • print affected files when relocation fails

  • trigger orphan file cleanup during START_SYNC ioctl

Notable fixes:

  • fix crash when disabling quota and relocation

  • fix crashes when removing roots from dirty list

  • fix transaction abort during relocation when converting from newer profiles not covered by fallback

  • in zoned mode, stop reclaiming block groups if filesystem becomes read-only

  • fix rare race condition in tree mod log rewind that can miss some btree node slots

  • with enabled fsverity, drop up-to-date page bit in case the verification fails

16
1
Btrfs progs release 6.3.2 (lore.kernel.org)
submitted 1 year ago* (last edited 1 year ago) by Atemu@lemmy.ml to c/btrfs@lemmy.ml

Changelog:

  • build: fix mkfs on big endian hosts
  • mkfs: don't print changed defaults notice with --quiet
  • scrub: fix wrong stats of processed bytes in background and foreground mode
  • convert: actually create free-space-tree instead of v1 space cache
  • print-tree: recognize and print CHANGING_FSID_V2 flag (for the metadata_uuid change in progress)
  • other:
    • documentation updates
