view contrib/unicode2nginx/unicode-to-nginx.pl @ 7604:7aa20af4ac00

Rewrite: fixed segfault with rewritten URI and "alias". The "alias" directive cannot be used in the same location where URI was rewritten. This has been detected in the "rewrite ... break" case, but not when the standalone "break" directive was used. This change also fixes proxy_pass with URI component in a similar case: location /aaa/ { rewrite ^ /xxx/yyy; break; proxy_pass http://localhost:8080/bbb/; } Previously, the "/bbb/yyy" would be sent to a backend instead of "/xxx/yyy". And if location's prefix was longer than the rewritten URI, a segmentation fault might occur.
author Ruslan Ermilov <ru@nginx.com>
date Mon, 16 Dec 2019 15:19:01 +0300
parents 8752257e883f
children
line wrap: on
line source

#!/usr/bin/perl -w

# Convert unicode mappings to nginx configuration file format.

# You may find useful mappings in various places, including
# unicode.org official site:
#
# http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT
# http://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT

# Needs perl 5.6 or later.

# Written by Maxim Dounin, mdounin@mdounin.ru

###############################################################################

require 5.006;

while (<>) {
	# Skip comments and empty lines

	next if /^#/;
	next if /^\s*$/;
	chomp;

	# Convert mappings

	if (/^\s*0x(..)\s*0x(....)\s*(#.*)/) {
		# Mapping <from-code> <unicode-code> "#" <unicode-name>
		my $cs_code = $1;
		my $un_code = $2;
		my $un_name = $3;

		# Produce UTF-8 sequence from character code;

		my $un_utf8 = join('',
			map { sprintf("%02X", $_) }
			unpack("U0C*", pack("U", hex($un_code)))
		);

		print "    $cs_code  $un_utf8 ; $un_name\n";

	} else {
		warn "Unrecognized line: '$_'";
	}
}

###############################################################################