This was previously tried at 19-apr-2020 in bc70882bd3
in UnrealIRCd 5.0.5. Sadly it had to be reverted immediately with a quick 5.0.5.1
release, all because of a PCRE2 100% CPU usage. Since then that bug has been fixed,
plus another bug. I'm now readding it "as an option" that is marked experimental.
Hopefully people test it out and can report back if it works well and then we can
make it the default someday.
This makes it a runtime setting so makes it much easier to switch back/forth if
there are any issues without recompiling anything. Had to use a bit more code now
though to handle the recompiling of spamfilters if the setting is changed.
Original issue was https://bugs.unrealircd.org/view.php?id=5187
* [Spamfilter](https://www.unrealircd.org/docs/Spamfilter) can be made UTF8-aware.
* This is experimental, to enable: `set { spamfilter { utf8 yes; } }``
* Case insensitive matches will then work better. For example, with extended
Latin, a spamfilter on `ę` then also matches `Ę`.
* Other PCRE2 features such as [\p](https://www.pcre.org/current/doc/html/pcre2syntax.html#SEC5)
can then be used. For example you can then set a spamfilter with the regex
`\p{Arabic}` to block all Arabic script.
Please do use these new tools with care. Blocking an entire language
or script is quite a drastic measure.
* As a consequence of this we require PCRE2 10.36 or newer. If your system
PCRE2 is older than this will mean the UnrealIRCd-shipped-library version
will be compiled and `./Config` may take a little longer than usual.
This fixes the fix in 8d228f5dbe.
(--enable-opt in sodium enables additional CPU-specific optimization,
--enable-opt in jansson does not exist and raised a warning)
Clang 16 makes -Wimplicit-function-declaration error by default.
Unfortunately, this can lead to misconfiguration or miscompilation of software as configure
tests may then return the wrong result.
We also fix -Wstrict-prototypes while here as it's easy to do and it prepares us for C23.
This fixes a possible crash when using RPC with unix domain sockets,
reported by Valware.
This also adds a configure check so we use our own strlncat if the
C library does not have one, e.g. some non-Linux.
When you set this to 'yes' you get more options...
See next (modified) copy-paste from April 2020, which had to be reverted
because PCRE2 was broken. Now it's an opt-in and hopefully matured a bit.
This means:
* Case insensitive matches work better in UTF8 now, such as extended Latin.
For example, a spamfilter on "ę" now also matches "Ę", while previously
it did not catch this.
* Other PCRE2 features such as https://www.pcre.org/current/doc/html/pcre2syntax.html#SEC5
are now available. For example you can now set a spamfilter with the regex
\p{Arabic} to block all Arabic script, or
\p{Cyrillic} to block all Cyrillic script (such as Russian)
Use these new tools with care, of course. Blocking an entire language,
or script, is quite a drastic measure.
All of this was possible because of the new PCRE2_MATCH_INVALID_UTF
compile time option which was introduced in PCRE2 10.34. Now, that
version turned out to be buggy. As recent as PCRE 10.36 some major bugs
were fixed. This also means we now require at least PCRE2 10.36 version
so everyone can benefit from this new spamfilter UTF8 feature, IF they
enable set::spamfilter::utf8-support, that is.
Many systems come with older PCRE2 versions so this means we will
fall back to the shipped PCRE2 version in UnrealIRCd. This means
./Config will take a little longer to compile things.
For packagers (rpm/deb/ports): if you choose to patch configure to
not require such a recent PCRE2, then please do not allow enabling
of set::spamfilter::utf8-support since it will likely cause crashes
and misbehavior. Check PCRE2 changelog, CTRL+F at PCRE2_MATCH_INVALID_UTF
"./unrealircd reloadtls" and there is now also a "./unrealircd status"
The output is colorized if the terminal supports it (just like on the
boot screen) and also the exit status is 0 for success and non-0 for
failure. The purpose of all this is that you can easily detect rehash
errors on the command line.
These three commands communicate to UnrealIRCd via the new control
UNIX socket, which is in ~/data/unrealircd.ctl.
This also does a lot of other stuff because we now have an internal
tool called bin/unrealircdctl which is called by ./unrealircd for
some of the commands to communicate to the unrealircd.ctl socket.
Later on more of the existing functionality may be moved to that
tool and we may also provide it on Windows in CLI mode so people
have more of the same functionality as on *NIX.