view contrib/unicode2nginx/unicode-to-nginx.pl @ 7914:9cf043a5d9ca

Request body: reading body buffering in filters. If a filter wants to buffer the request body during reading (for example, to check an external scanner), it can now do so. To make it possible, the code now checks rb->last_saved (introduced in the previous change) along with rb->rest == 0. Since in HTTP/2 this requires flow control to avoid overflowing the request body buffer, so filters which need buffering have to set the rb->filter_need_buffering flag on the first filter call. (Note that each filter is expected to call the next filter, so all filters will be able set the flag if needed.)
author Maxim Dounin <mdounin@mdounin.ru>
date Sun, 29 Aug 2021 22:22:02 +0300
parents 8752257e883f
children
line wrap: on
line source

#!/usr/bin/perl -w

# Convert unicode mappings to nginx configuration file format.

# You may find useful mappings in various places, including
# unicode.org official site:
#
# http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT
# http://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT

# Needs perl 5.6 or later.

# Written by Maxim Dounin, mdounin@mdounin.ru

###############################################################################

require 5.006;

while (<>) {
	# Skip comments and empty lines

	next if /^#/;
	next if /^\s*$/;
	chomp;

	# Convert mappings

	if (/^\s*0x(..)\s*0x(....)\s*(#.*)/) {
		# Mapping <from-code> <unicode-code> "#" <unicode-name>
		my $cs_code = $1;
		my $un_code = $2;
		my $un_name = $3;

		# Produce UTF-8 sequence from character code;

		my $un_utf8 = join('',
			map { sprintf("%02X", $_) }
			unpack("U0C*", pack("U", hex($un_code)))
		);

		print "    $cs_code  $un_utf8 ; $un_name\n";

	} else {
		warn "Unrecognized line: '$_'";
	}
}

###############################################################################