NAME
    utf8::all - turn on Unicode - all of it

VERSION
    version 0.022

SYNOPSIS
        use utf8::all;                      # Turn on UTF-8, all of it.

        open my $in, '<', 'contains-utf8';  # UTF-8 already turned on here
        print length 'føø bār';             # 7 UTF-8 characters
        my $utf8_arg = shift @ARGV;         # @ARGV is UTF-8 too (only for main)

DESCRIPTION
    The use utf8 pragma tells the Perl parser to allow UTF-8 in the program
    text in the current lexical scope. This also means that you can now use
    literal Unicode characters as part of strings, variable names, and
    regular expressions.

    utf8::all goes further:

    *   charnames are imported so \N{...} sequences can be used to compile
        Unicode characters based on names.

    *   On Perl v5.11.0 or higher, the use feature 'unicode_strings' is
        enabled.

    *   use feature fc and use feature unicode_eval are enabled on Perl
        5.16.0 and higher.

    *   Filehandles are opened with UTF-8 encoding turned on by default
        (including STDIN, STDOUT, STDERR), meaning that they automatically
        convert UTF-8 octets to characters and vice versa. If you don't
        want UTF-8 for a particular filehandle, you'll have to set binmode
        $filehandle.

    *   @ARGV gets converted from UTF-8 octets to Unicode characters (when
        utf8::all is used from the main package). This is similar to the
        behaviour of the -CA perl command-line switch (see perlrun).

    *   readdir, readlink, readpipe (including the qx// and backtick
        operators), and glob (including the <> operator) now all work with
        and return Unicode characters instead of (UTF-8) octets.

  Lexical Scope
    The pragma is lexically-scoped, so you can do the following if you had
    some reason to:

        {
            use utf8::all;
            open my $out, '>', 'outfile';
            my $utf8_str = 'føø bār';
            print length $utf8_str, "\n"; # 7
            print $out $utf8_str;         # out as utf8
        }
        open my $in, '<', 'outfile';      # in as raw
        my $text = do { local $/; <$in>};
        print length $text, "\n";         # 10, not 7!

    Instead of lexical scoping, you can also use no utf8::all to turn off
    the effects.
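
The 7-versus-10 result in the Lexical Scope example is the difference between counting characters and counting UTF-8 octets. The following is a minimal sketch of that difference, not of utf8::all itself: it assumes only the core Encode module, and it spells the string with \x{...} escapes so the result does not depend on how the script file is saved.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Same string as 'føø bār', written with escapes (ø = U+00F8, ā = U+0101).
my $chars = "f\x{F8}\x{F8} b\x{101}r";

# Encoding yields the octets that a UTF-8 output layer writes to the file.
my $octets = encode('UTF-8', $chars);

# Decoding is what a UTF-8 input layer does on read; a raw read skips it.
my $decoded = decode('UTF-8', $octets);

printf "characters: %d\n", length($chars);    # 7
printf "octets:     %d\n", length($octets);   # 10
printf "decoded:    %d\n", length($decoded);  # 7
```

The two non-ASCII letters each take two octets in UTF-8, which is why a handle read without a UTF-8 layer reports three more units than there are characters.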

    Note that the effect on @ARGV and the STDIN, STDOUT, and STDERR file
    handles is always global!

  UTF-8 Errors
    utf8::all treats invalid code points (i.e., UTF-8 that does not map to
    a valid Unicode "character") as a fatal error.

    For glob, readdir, and readlink, you can change this behaviour by
    setting the attribute "$utf8::all::UTF8_CHECK".

ATTRIBUTES
  $utf8::all::UTF8_CHECK
    By default utf8::all marks decoding errors as fatal (the default value
    for this setting is Encode::FB_CROAK). You can change this by setting
    $utf8::all::UTF8_CHECK. The value Encode::FB_WARN reports decoding
    errors as warnings, and Encode::FB_DEFAULT ignores them completely.
    Please see Encode for details.

    Note: Encode::LEAVE_SRC is always enforced.

    Important: This setting only controls the handling of decoding errors
    in glob, readdir, and readlink.

INTERACTION WITH AUTODIE
    If you use autodie, which is a great idea, you need to use at least
    version 2.12, released on June 26, 2012. Otherwise, autodie obliterates
    the IO layers set by the open pragma. See RT #54777 and GH #7.

BUGS
    Please report any bugs or feature requests on the bugtracker website.

    When submitting a bug or request, please include a test-file or a patch
    to an existing test-file that illustrates the bug or desired feature.

COMPATIBILITY
    The filesystems of DOS, Windows, and OS/2 do not (fully) support UTF-8.
    The readlink and readdir functions and glob operators will therefore
    not be replaced on these systems.

SEE ALSO
    *   File::Find::utf8 for fully UTF-8 aware File::Find functions.

    *   Cwd::utf8 for fully UTF-8 aware Cwd functions.

AUTHORS
    *   Michael Schwern

    *   Mike Doherty

    *   Hayo Baan

COPYRIGHT AND LICENSE
    This software is copyright (c) 2009 by Michael Schwern; he originated
    it.

    This is free software; you can redistribute it and/or modify it under
    the same terms as the Perl 5 programming language system itself.