mirror of https://github.com/unrealircd/unrealircd.git synced 2026-07-02 09:06:39 +02:00

59 Commits

Author SHA1 Message Date
Bram Matthys 6bbcdfd1b3 Add spamfilter::rule (preconditions), add context to crule parser,
and add the first functions: online_time() and reputation().

The more interesting stuff will follow later...
2023-07-06 16:14:26 +02:00
Bram Matthys 4b4562516c Another attempt at UTF8-aware spamfilter.
This was previously tried on 19 April 2020 in bc70882bd3
in UnrealIRCd 5.0.5. Sadly it had to be reverted immediately with a quick 5.0.5.1
release, all because of a PCRE2 bug that caused 100% CPU usage. Since then that bug has been fixed,
plus another bug. I'm now re-adding it "as an option" that is marked experimental.
Hopefully people test it out and report back if it works well, and then we can
make it the default someday.

This makes it a runtime setting, which makes it much easier to switch back and forth if
there are any issues, without recompiling anything. It did take a bit more code
to handle the recompiling of spamfilters when the setting is changed.

Original issue was https://bugs.unrealircd.org/view.php?id=5187

* [Spamfilter](https://www.unrealircd.org/docs/Spamfilter) can be made UTF8-aware.
  * This is experimental, to enable: `set { spamfilter { utf8 yes; } }`
  * Case insensitive matches will then work better. For example, with extended
    Latin, a spamfilter on `ę` then also matches `Ę`.
  * Other PCRE2 features such as [\p](https://www.pcre.org/current/doc/html/pcre2syntax.html#SEC5)
    can then be used. For example you can then set a spamfilter with the regex
    `\p{Arabic}` to block all Arabic script.
    Please do use these new tools with care. Blocking an entire language
    or script is quite a drastic measure.
  * As a consequence of this we require PCRE2 10.36 or newer. If your system
    PCRE2 is older than this, the PCRE2 version shipped with UnrealIRCd
    will be compiled instead and `./Config` may take a little longer than usual.
2023-03-22 09:00:31 +01:00
Bram Matthys 329fd07f3a Revert set::spamfilter::utf8-support from yesterday.
This will be for a later release, needs more thought and work.
2022-01-06 18:03:26 +01:00
Bram Matthys dedff543b5 Add option set::spamfilter::utf8-support which defaults to 'no' for now.
When you set this to 'yes' you get more options...
See next (modified) copy-paste from April 2020, which had to be reverted
because PCRE2 was broken. Now it's an opt-in and hopefully matured a bit.

This means:
* Case insensitive matches work better in UTF8 now, such as extended Latin.
  For example, a spamfilter on "ę" now also matches "Ę", while previously
  it did not catch this.
* Other PCRE2 features such as https://www.pcre.org/current/doc/html/pcre2syntax.html#SEC5
  are now available. For example you can now set a spamfilter with the regex
  \p{Arabic} to block all Arabic script, or
  \p{Cyrillic} to block all Cyrillic script (such as Russian)
  Use these new tools with care, of course. Blocking an entire language,
  or script, is quite a drastic measure.

All of this was possible because of the new PCRE2_MATCH_INVALID_UTF
compile time option which was introduced in PCRE2 10.34. Now, that
version turned out to be buggy. As recently as PCRE2 10.36 some major bugs
were fixed. This also means we now require at least PCRE2 version 10.36
so everyone can benefit from this new spamfilter UTF8 feature, IF they
enable set::spamfilter::utf8-support, that is.

Many systems come with older PCRE2 versions so this means we will
fall back to the shipped PCRE2 version in UnrealIRCd. This means
./Config will take a little longer to compile things.

For packagers (rpm/deb/ports): if you choose to patch configure to
not require such a recent PCRE2, then please do not allow enabling
of set::spamfilter::utf8-support since it will likely cause crashes
and misbehavior. Check PCRE2 changelog, CTRL+F at PCRE2_MATCH_INVALID_UTF
2022-01-05 18:08:52 +01:00
Bram Matthys fcf020b99e It's raining consts... 2021-09-11 09:56:22 +02:00
Bram Matthys f085173d46 More const char * stuff... mostly in conf.c but also elsewhere. 2021-09-10 15:01:23 +02:00
Bram Matthys 66a51fb659 Massive conversions from 'char *' to 'const char *' and 'char **' to 'const char **' 2021-09-10 12:46:31 +02:00
Bram Matthys 8d2f20ef41 Newlog: debug.c, match.c, module.c, random.c and then for
api-*.c log out of space in all circumstances.
2021-08-11 17:45:01 +02:00
Bram Matthys 05aeba9ba9 Get rid of Debug(()) function calls. I never use it anyway. 2021-07-12 18:54:38 +02:00
Bram Matthys d2efe01d9b Revert "UTF8 support in spamfilter. We now ship with PCRE2 10.34 and require this"
This reverts commit bc70882bd3.
2020-05-29 08:25:47 +02:00
Bram Matthys bc70882bd3 UTF8 support in spamfilter. We now ship with PCRE2 10.34 and require this
version or newer on the system, otherwise we fall back to shipped version.

This fixes https://bugs.unrealircd.org/view.php?id=5187 among others.
It means:
* Case insensitive matches work better in UTF8 now, such as extended Latin.
  For example, a spamfilter on "ę" now also matches "Ę", while previously
  it did not catch this.
* Other PCRE2 features such as https://www.pcre.org/current/doc/html/pcre2syntax.html#SEC5
  are now available. For example you can now set a spamfilter with the regex
  \p{Arabic} to block all Arabic script, or
  \p{Cyrillic} to block all Cyrillic script (such as Russian)
  Use these new tools with care, of course. Blocking an entire language,
  or script, is quite a drastic measure.

All of this was possible because of the new PCRE2_MATCH_INVALID_UTF
compile time option which was introduced in PCRE2 10.34.
This also means we now require at least that PCRE2 version so
everyone can benefit from this new spamfilter UTF8 feature.
Many systems come with older PCRE2 versions so this means we will
fall back to the shipped PCRE2 version in UnrealIRCd. This means
./Config will take a little longer to compile things.

Although there is no indication of problems so far, if this feature turns out to
break things badly then it might get reverted or made configurable.
This is also why it was added just after 5.0.4 release and not right
before it, it needs some heavy testing.
2020-04-19 17:45:38 +02:00
GottemHams fac16fe1c0 match_* functions actually return 1 on match and not 0 :D 2019-12-22 14:48:04 +01:00
Bram Matthys 24c60fd85e Fix some doxygen tags (eg @notes to @note) 2019-10-26 09:33:09 +02:00
Bram Matthys 33c176e59e Just in case pcre2_get_error_message() fails... 2019-10-11 11:17:29 +02:00
Bram Matthys f2e3712d62 Remove various if's and such that are now unneeded
This is part 5 of the memory function / caller changes.
2019-09-14 17:23:07 +02:00
Bram Matthys 9fc1e758ab Mass change of dst = strdup(str) to safe_strdup(dst,str) but with a manual
audit since 'dst' must now be initialized memory.
There's still a raw_strdup() if you insist.

This is step 2 of X of memory allocation changes
2019-09-14 16:58:01 +02:00
Bram Matthys de87b439b7 Update memory allocation routines. Step 1 of X. 2019-09-14 16:52:53 +02:00
Bram Matthys 7c6358024c Add 'natural order' string comparison to core: strnatcmp and strnatcasecmp
extern int strnatcmp(char const *a, char const *b);
extern int strnatcasecmp(char const *a, char const *b);
This will be handy for version comparisons. For example they will
return -1 (=lower) for things like ("1.4.9", "1.4.10"), unlike strcmp.

Also, some loosely related spelling fixes elsewhere.
2019-09-14 08:12:47 +02:00
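To illustrate what "natural order" means in the commit above, here is a minimal sketch in C. It is not the strnatcmp() code that was added to the core; `natural_compare` is a hypothetical name, and the handling of leading zeros (ignored) is an assumption made for this example. It only shows why "1.4.9" sorts below "1.4.10" once digit runs are compared as numbers.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (NOT UnrealIRCd's strnatcmp): compare two strings,
 * treating each run of digits as a number. Returns -1, 0 or 1.
 * Assumption for this sketch: leading zeros in a digit run are ignored. */
int natural_compare(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    while (*a && *b) {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            const char *sa, *sb;
            size_t la, lb, i;

            /* Skip leading zeros so "007" and "7" compare equal. */
            while (*a == '0') a++;
            while (*b == '0') b++;

            /* Measure both digit runs. */
            sa = a;
            sb = b;
            while (isdigit((unsigned char)*a)) a++;
            while (isdigit((unsigned char)*b)) b++;
            la = (size_t)(a - sa);
            lb = (size_t)(b - sb);

            /* A longer run (without leading zeros) is a larger number. */
            if (la != lb)
                return la < lb ? -1 : 1;

            /* Same length: the first differing digit decides. */
            for (i = 0; i < la; i++)
                if (sa[i] != sb[i])
                    return sa[i] < sb[i] ? -1 : 1;
        } else {
            /* Non-digit characters are compared byte by byte. */
            if (*a != *b)
                return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
            a++;
            b++;
        }
    }

    /* The string with characters left over sorts higher. */
    if (*a) return 1;
    if (*b) return -1;
    return 0;
}
```

With this sketch, `natural_compare("1.4.9", "1.4.10")` returns -1, whereas a plain strcmp() on the same arguments returns a positive value because '9' sorts after '1'.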
Bram Matthys 70410b3f33 Remove unused variables (67 files done, will do rest another time). 2019-09-12 17:57:01 +02:00
Bram Matthys 23116d344a Give structs the same name as the typedefs. Rename aClient to Client,
aChannel to Channel, and some more. Third party module coders will
love this. But.. it makes things more logical and the doxygen output
will look more clean and logical as well.
(More changes will follow)
2019-09-11 09:48:00 +02:00
Bram Matthys 5e4c481d93 Yes, strcasecmp is always available, configure. 2019-09-09 16:30:02 +02:00
Bram Matthys ca2239827e Get rid of NICK_GB2312/NICK_GBK/NICK_GBK_JAP in config.h. I am not aware
of anyone actually using these. So running with this was rather untested
(if it worked at all, which I doubt).
2019-09-09 16:20:26 +02:00
Bram Matthys 0d2d4d5bca Rename match() and _match() to match_simple() -AND- invert the return value
of match_simple() and match_esc(). So, developers, be aware, this is how
you should use the function in a correct way:
if (match_simple("*fun*", str))
    printf("It was fun\n");

Rationale:
I've always been annoyed by the inverted logic, even though it was similar
to strcmp. So I've flipped it.
I could have chosen to maintain match() rather than this match_simple()
name, but this way I force (3rd party module) devs to update their function,
while otherwise everything would mysteriously fail due to the inverted logic.
2019-08-17 09:20:49 +02:00
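The return-value convention described in the commit above (non-zero means "matched") can be sketched with a small stand-alone wildcard matcher. This is not the match_simple() implementation from src/match.c; `wildcard_match` is a hypothetical name, and case-insensitive comparison via tolower() is an assumption made for this example. It supports only `*` and `?`, with no escaping.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (NOT UnrealIRCd's match_simple): glob-style match
 * with '*' (any run of characters) and '?' (exactly one character).
 * Returns 1 on match and 0 on no match, i.e. the convention after the
 * inversion described above.
 * Assumption for this sketch: ASCII case-insensitive comparison. */
int wildcard_match(const char *mask, const char *str)
{
    const char *star = NULL;  /* mask position just after the last '*' */
    const char *retry = NULL; /* str position to resume from on backtrack */

    assert(mask != NULL && str != NULL);

    while (*str) {
        if (*mask == '*') {
            /* Remember the star; first try matching it against nothing. */
            star = ++mask;
            retry = str;
        } else if (*mask == '?' ||
                   tolower((unsigned char)*mask) == tolower((unsigned char)*str)) {
            mask++;
            str++;
        } else if (star != NULL) {
            /* Mismatch: let the last '*' swallow one more character. */
            mask = star;
            str = ++retry;
        } else {
            return 0;
        }
    }

    /* Trailing stars can match the empty remainder. */
    while (*mask == '*')
        mask++;
    return *mask == '\0';
}
```

Under these assumptions, `if (wildcard_match("*fun*", str))` reads the same way as the example in the commit message: the branch is taken when the string matches.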
Bram Matthys e1fcc3a667 Rename match() and _match() both to match_simple()
and get rid of the "bahamut optimized version".
Stage 1 of 2.
2019-08-17 09:15:34 +02:00
Bram Matthys c01c9248f5 Revert e428c77c47 (only to try again later) 2019-08-17 09:05:09 +02:00
Bram Matthys e428c77c47 match() -> match_nuh() and _match -> match_simple() 2019-08-17 08:56:18 +02:00
Bram Matthys 1108b58951 Remove old TRE regex engine. Hasn't been maintained since 2010
and has various outstanding crash and 100% CPU issues.
We have been encouraging the PCRE2 engine since the start of
UnrealIRCd 4 already.
TRE is being phased out of U4 by the end of the year, so we can
safely remove it in U5 already.
2019-05-25 10:42:46 +02:00
Bram Matthys 5c30d1af6d * Badword blocks now use PCRE2 if using regex at all (rare,
usually the fast badwords system is used instead)
* Code deduplication in src/modules/{chanmodes,usermodes}/censor.c
  to src/match.c -- which may be moved later again to efuncs.
* Add --without-tre:
  This means USE_TRE will be enabled by default right now
  but if using --without-tre it will be undef'ed. This so we
  can prepare for the TRE phase-out in 2020.
* Remove include/badwords.h, put contents in include/struct.h
2019-04-05 18:19:23 +02:00
Bram Matthys 704487e124 Fix numerous crash bugs in server to server code.
In 3.2.x we didn't fix these bugs since servers are trusted and
should send correct commands. In 4.0.x we changed this so we would
fix them when we come across such issues at normal priority (not
consider them security issues). I now took it a step further and
actively checked/looked for these issues and a bunch of them were
found. Almost all are NULL pointer dereferences, with some exceptions.
* S2S: MODE: check conv_param return value (NULL ptr crash)
* S2S: MODE: floodprot: More checks (NULL ptr crash)
* S2S: MODE: OOB write of NULL (write NULL past last element in an array)
* S2S: NICK: old compat fixes (NULL ptr crash)
* S2S: PROTOCTL: Check for double SID=
* S2S: SERVER: require at least 3 parameters (NULL ptr crash)
* S2S: SJOIN: require at least 3 parameters (NULL ptr crash)
* S2S: SJOIN: Fix OOB read (read 1 byte past buffer)
* S2S: TKL: validate set_at and expire_at (NULL ptr crash)
* S2S: TKL: require at least 9 parameters for spamf, not 8 (NULL ptr crash)
* S2S: TKL: ignore invalid spamfilter matching type (remove abort() call)
* S2S: TOPIC: querying for topic is not permitted (NULL ptr crash)
* S2S: UID: require 12 parameters (NULL ptr crash)
* S2S: WATCH: this is not a server command (NULL ptr crash)
* Fix OOB read (1 byte beyond string) for timevals. This was reachable
  from config code, TKL (S2S) and /*LINE (Oper). In practice no crash.
* MODE: make code less confusing (effectively no change)
* TRACE: remove strange output in case of 0 lines of output
* Fix unimportant memory leak on boot (#4713, reported by dg)
* Fix small memory leak upon 'DNS i' (oper only command)
* Always work on a copy in clean_ban_mask(). This fixes a bug that could
  result in a strlcpy(buf, buf, sizeof(buf)). So, overlapping strings,
  which is undefined behavior.
2017-10-29 11:20:52 +01:00
Bram Matthys ec9db8fd5f Move match_user() to module (efunc in m_tkl) 2017-03-18 15:00:34 +01:00
Bram Matthys 03b74f6163 Include string.h / silence warnings. 2016-12-30 15:30:59 +01:00
Bram Matthys 73ec3e3305 Fix IPv6 ban bug + fix a crash bug 2016-07-28 14:15:09 +02:00
Bram Matthys 67c998dc9f Adding a GLINE or KLINE on usermask@ did not have any effect. Reported by soretna (#4680).
Tizen, DBoyz and Valdebrick helped tracing the issue.
Removed MATCH_USE_IDENT since it had no useful purpose.. for all cases one has to check identd first and then non-identd anyway.
2016-05-22 15:44:28 +02:00
Bram Matthys 58b864edd5 Re-do CIDR and at the same time all the user matching stuff. Introducing match_user(mask, acptr, options): this should be used everywhere rather than the many DIY routines everywhere that create a nick!user@host and then run a match() on it.
The match_user() function has not been fully tested yet, at this point I'm happy we can compile again.
2015-07-28 13:26:03 +02:00
Bram Matthys 3cfee0f384 fix a number of /REHASH memleaks 2015-07-10 10:40:07 +02:00
Bram Matthys e1b7c34c96 Fix various warnings, including one reported by Adam: possible crash in aliases (introduced 1-2wks ago) 2015-06-07 22:07:00 +02:00
Bram Matthys 0eb9c9a36b PCRE2: enable JIT, free when no longer needed, fix & improve error message when an invalid regex is specified 2015-06-01 10:09:25 +02:00
Bram Matthys ecd06aa530 Now actually use PCRE2. 2015-06-01 09:51:33 +02:00
Travis McArthur 3b98eac4a9 Remove unnecessary gotos 2015-05-31 21:46:32 -04:00
Bram Matthys 58bd3cf60b Preparations for #4356 (experimental / on-going):
* add general matching framework (aMatch type, unreal_match_xxx functions)
* change spamfilter { } block syntax
* add support for simple wildcard matching (non-regex, just '?' and '*')
This is the initial commit so the new lib is not in yet, 'regex' is not
functional (but 'posix' and 'simple' are working), linking has not been
fully tested and no warnings are printed yet. IOTW: work in progress!
2015-05-30 21:11:11 +02:00
Bram Matthys fa9cf506e7 - The '?' wildcard was completely broken in 3.2.4, reported by tabrisnet (#0002797). 2006-02-05 17:20:36 +00:00
Bram Matthys dc19350c70 - Redid glob matching. Escaping is now ripped out for normal bans (as it should be), this
means no longer weird issues with +b *\* etc not banning nicks with \ in it.
  ExtBan ~c/~r get special treatment and will use our match_esc [match with escaping]
  routine, that way you can ban channels such as "#f*ck" via "+b ~c:#f\*ck".
  Fix triggered by bugreport of vonitsanet (#0002782).
2006-01-30 20:14:39 +00:00
Bram Matthys 6e70facb1e - Fixed(?) bug due to match() rewrite: we now use our old rules with escaping again, due to
the switchover we were accidentally using different ones which caused funny kill messages
  like "You were killed by a.b.c (a!a.b.c (SOMENICK[N\A](?) <- d.e.f))." This also broke
  some bans in pre2/rc1. Bug reported by HERZ (#0002772).
2006-01-26 14:02:21 +00:00
Bram Matthys 5f272b56e7 - Switched over to an older match() routine based on hybrid, this one is a bit less optimized
but is actually understandable and has fewer bugs. This fixes +b ~c:#c\*t not properly
  matching #c*t, reported by Jason (#0002752). Initial results look good, but this needs
  some good testing ;).
2006-01-23 22:05:50 +00:00
Bram Matthys 8a9bae11fa - Made '?*' work correctly in wildcard matches, reported by Bugz (#2585). 2005-07-05 20:26:18 +00:00
Bram Matthys 8650c97cd3 - No longer cut off nick upon illegal character -- just reject the whole nick. The nick is
still cut off if the nick is too long. Basically this is the same way as Hybrid does it
  so it should work ok :).
- Added nick character system. This allows you to choose which (additional) characters
  to allow in nicks via set::allowed-nickchars. See unreal32docs.html -> section 3.16
  for a list of available languages and more info on how to use it.
  Current list: dutch, french, german, italian, spanish, euro-west, chinese-trad,
  chinese-simp, chinese-ja, chinese.
  If you wonder why your language is not yet included or why a certain mistake is present,
  then please understand that we are most likely not experienced (at all) in your language.
  If you are a native of your language (or know the language well), and your language
  is not included yet or you have some corrections, then contact syzop@vulnscan.org or
  report it as a bug on http://bugs.unrealircd.org/
2005-02-19 20:47:41 +00:00
codemastr 2b3fda5a10 Documented the default behavior of snomasks when /mode nick +s is used and added 'const' to the functions in match.c 2004-11-05 21:26:38 +00:00
Bram Matthys 426fbd9663 - Added "extended bans". An idea from SorceryNet ircd.
These bans look like ~<type>:<stuff>. Currently the following bans are available:
  ~q: quiet bans (ex: ~q:*!*@blah.blah.com). People matching these bans can join
      but are unable to speak, unless they have +v or higher.
  ~c: channel bans (ex: ~c:#idiots). People in #idiots are unable to join the channel.
  ~r: gecos (realname) bans (ex: ~r:*Stupid_bot_script*). If the realname of a user
      matches this then (s)he is unable to join.
      NOTE: an underscore ('_') matches both a space (' ') and an underscore ('_'),
            so this ban would match 'Stupid bot script v1.4'.

  These bantypes can also be used in the channel exception list (+e).
  +e ~r:*w00t* makes anyone with 'w00t' in their realname able to join,
  and +e ~c:#admin makes anyone in #admin able to join, etc..

  This system allows modules to add extended bantypes too.

  This feature requires some additional testing, also the module interface will
  probably be changed in the next few weeks, and perhaps more extended bans will
  be added before next release.. we'll see...
2003-12-19 23:39:30 +00:00
Bram Matthys 45e2b69a07 - Fixed a match() bug
In case of a mask like '*\' it was trying to read out of bounds data.
2003-03-09 03:07:59 +00:00
codemastr 3095782cfd Various fixes 2003-01-14 21:25:04 +00:00