Manual Pages for UNIX Darwin command on man Tcl_UtfToUniCharDString

Utf(3) Tcl Library Procedures Utf(3)

NAME

Tcl_UniChar, Tcl_UniCharCaseMatch, Tcl_UniCharNcasecmp, Tcl_UniCharToUtf,
Tcl_UtfToUniChar, Tcl_UniCharToUtfDString, Tcl_UtfToUniCharDString,
Tcl_UniCharLen, Tcl_UniCharNcmp, Tcl_UtfCharComplete, Tcl_NumUtfChars,
Tcl_UtfFindFirst, Tcl_UtfFindLast, Tcl_UtfNext, Tcl_UtfPrev,
Tcl_UniCharAtIndex, Tcl_UtfAtIndex, Tcl_UtfBackslash - routines for
manipulating UTF-8 strings.

SYNOPSIS

#include <tcl.h>

typedef ... Tcl_UniChar;

int
Tcl_UniCharToUtf(ch, buf)

int
Tcl_UtfToUniChar(src, chPtr)

char *
Tcl_UniCharToUtfDString(uniStr, numChars, dstPtr)

Tcl_UniChar *
Tcl_UtfToUniCharDString(src, len, dstPtr)

int
Tcl_UniCharLen(uniStr)

int
Tcl_UniCharNcmp(uniStr, uniStr, num)

int
Tcl_UniCharNcasecmp(uniStr, uniStr, num)

int
Tcl_UniCharCaseMatch(uniStr, uniPattern, nocase)

int
Tcl_UtfNcmp(src, src, num)

int
Tcl_UtfNcasecmp(src, src, num)

int
Tcl_UtfCharComplete(src, len)

int
Tcl_NumUtfChars(src, len)

CONST char *
Tcl_UtfFindFirst(src, ch)

CONST char *
Tcl_UtfFindLast(src, ch)

CONST char *
Tcl_UtfNext(src)

CONST char *
Tcl_UtfPrev(src, start)

Tcl_UniChar
Tcl_UniCharAtIndex(src, index)

CONST char *
Tcl_UtfAtIndex(src, index)

int
Tcl_UtfBackslash(src, readPtr, dst)

ARGUMENTS

char *buf (out)
       Buffer in which the UTF-8 representation of the Tcl_UniChar is
       stored. At most TCL_UTF_MAX bytes are stored in the buffer.

int ch (in)
       The Tcl_UniChar to be converted or examined.

Tcl_UniChar *chPtr (out)
       Filled with the Tcl_UniChar represented by the head of the UTF-8
       string.

CONST char *src (in)
       Pointer to a UTF-8 string.

CONST Tcl_UniChar *uniStr (in)
       A null-terminated Unicode string.

CONST Tcl_UniChar *uniPattern (in)
       A null-terminated Unicode string.

int len (in)
       The length of the UTF-8 string in bytes (not UTF-8 characters).
       If negative, all bytes up to the first null byte are used.

int numChars (in)
       The length of the Unicode string in characters. Must be greater
       than or equal to 0.

Tcl_DString *dstPtr (in/out)
       A pointer to a previously-initialized Tcl_DString.

unsigned long num (in)
       The number of characters to compare.

CONST char *start (in)
       Pointer to the beginning of a UTF-8 string.

int index (in)
       The index of a character (not byte) in the UTF-8 string.

int *readPtr (out)
       If non-NULL, filled with the number of bytes in the backslash
       sequence, including the backslash character.

char *dst (out)
       Buffer in which the bytes represented by the backslash sequence
       are stored. At most TCL_UTF_MAX bytes are stored in the buffer.

int nocase (in)
       Specifies whether the match should be done case-sensitively (0)
       or case-insensitively (1).

DESCRIPTION

These routines convert between UTF-8 strings and Tcl_UniChars. A
Tcl_UniChar is a Unicode character represented as an unsigned,
fixed-size quantity. A UTF-8 character is a Unicode character
represented as a varying-length sequence of up to TCL_UTF_MAX bytes. A
multibyte UTF-8 sequence consists of a lead byte followed by some number
of trail bytes.

TCL_UTF_MAX is the maximum number of bytes that it takes to represent
one Unicode character in the UTF-8 representation.
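
The lead-byte and trail-byte structure described above can be illustrated
without linking against Tcl. The following is a minimal sketch in plain C;
the helper names utf8_sequence_length and utf8_is_trail_byte are
hypothetical and are not part of the Tcl API. It follows the standard UTF-8
bit patterns and does not reject overlong encodings.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the Tcl API. Looks at the byte that src
 * points to and returns the total length in bytes (1 to 4) of the UTF-8
 * sequence that the byte introduces, or 0 if the byte is a trail byte or
 * is not a valid lead byte.
 */
int utf8_sequence_length(const char *src)
{
    unsigned char byte;

    assert(src != NULL);
    byte = (unsigned char) *src;

    if (byte < 0x80) return 1;            /* 0xxxxxxx: single byte (ASCII) */
    if ((byte & 0xE0) == 0xC0) return 2;  /* 110xxxxx: lead + 1 trail byte */
    if ((byte & 0xF0) == 0xE0) return 3;  /* 1110xxxx: lead + 2 trail bytes */
    if ((byte & 0xF8) == 0xF0) return 4;  /* 11110xxx: lead + 3 trail bytes */
    return 0;                             /* 10xxxxxx trail byte, or invalid */
}

/* Returns 1 if byte has the 10xxxxxx trail-byte pattern, 0 otherwise. */
int utf8_is_trail_byte(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}
```

Whether a 4-byte sequence can be decoded by the Tcl routines depends on the
value of TCL_UTF_MAX in the Tcl build in use.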

Tcl_UniCharToUtf stores the Tcl_UniChar ch as a UTF-8 string starting at
buf. The return value is the number of bytes stored in buf.
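
The conversion described above can be sketched in plain C as follows. This
is an illustration of standard UTF-8 encoding for code points up to U+FFFF,
not Tcl's implementation; the name sketch_unichar_to_utf is hypothetical,
and the 3-byte buffer bound is an assumption of the sketch rather than a
statement about TCL_UTF_MAX.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in, not Tcl's implementation. Encodes a code point in
 * the range U+0000..U+FFFF as UTF-8 into buf and returns the number of
 * bytes written (1 to 3). The caller must supply room for at least 3
 * bytes. No terminating null byte is written.
 */
int sketch_unichar_to_utf(unsigned int ch, char *buf)
{
    assert(buf != NULL);
    assert(ch <= 0xFFFF);

    if (ch < 0x80) {
        buf[0] = (char) ch;                           /* 0xxxxxxx */
        return 1;
    }
    if (ch < 0x800) {
        buf[0] = (char) (0xC0 | (ch >> 6));           /* 110xxxxx */
        buf[1] = (char) (0x80 | (ch & 0x3F));         /* 10xxxxxx */
        return 2;
    }
    buf[0] = (char) (0xE0 | (ch >> 12));              /* 1110xxxx */
    buf[1] = (char) (0x80 | ((ch >> 6) & 0x3F));      /* 10xxxxxx */
    buf[2] = (char) (0x80 | (ch & 0x3F));             /* 10xxxxxx */
    return 3;
}
```

With this sketch, U+0041 is stored as the single byte 0x41 and U+20AC as
the three bytes 0xE2 0x82 0xAC.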

Tcl_UtfToUniChar reads one UTF-8 character starting at src and stores it
as a Tcl_UniChar in *chPtr. The return value is the number of bytes read
from src. The caller must ensure that the source buffer is long enough
that this routine does not run off the end and dereference non-existent
or random memory; if the source buffer is known to be null-terminated,
this will not happen. If the input is not in proper UTF-8 format,
Tcl_UtfToUniChar will store the first byte of src in *chPtr as a
Tcl_UniChar between 0x0000 and 0x00ff and return 1.
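
The decoding rule and the fallback for malformed input described above can
be modeled in plain C. The sketch below is not Tcl's implementation: the
name sketch_utf_to_unichar is hypothetical, unsigned short stands in for
Tcl_UniChar as an assumption, and only sequences of up to 3 bytes are
decoded. It relies on the input being null-terminated so that it never
reads past the end of the string.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in, not Tcl's implementation. Decodes one UTF-8
 * character of up to 3 bytes from the null-terminated string src, stores
 * the code point in *chPtr, and returns the number of bytes consumed.
 * For input that is not a well-formed 2- or 3-byte sequence, the first
 * byte is stored as a value between 0x00 and 0xFF and 1 is returned.
 */
int sketch_utf_to_unichar(const char *src, unsigned short *chPtr)
{
    const unsigned char *s = (const unsigned char *) src;

    assert(src != NULL && chPtr != NULL);

    /* Each trail byte is checked before the next one is read, so a null
     * terminator stops the scan. */
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *chPtr = (unsigned short) (((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80
            && (s[2] & 0xC0) == 0x80) {
        *chPtr = (unsigned short) (((s[0] & 0x0F) << 12)
                | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F));
        return 3;
    }

    /* ASCII, a stray trail byte, a truncated sequence, or a lead byte
     * this sketch does not handle: pass the byte through unchanged. */
    *chPtr = s[0];
    return 1;
}
```

Because the fallback always consumes exactly one byte, a loop that
repeatedly advances by the return value makes progress even on malformed
input.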

Tcl_UniCharToUtfDString converts the given Unicode string to UTF-8,
storing the result in a previously-initialized Tcl_DString. You must
specify the length of the given Unicode string. The return value is a
pointer to the UTF-8 representation of the Unicode string. Storage for
the return value is appended to the end of the Tcl_DString.

Tcl_UtfToUniCharDString converts the given UTF-8 string to Unicode,
storing the result in the previously-initialized Tcl_DString. You may
either specify the length of the given UTF-8 string or "-1", in which
case Tcl_UtfToUniCharDString uses strlen to calculate the length. The
return value is a pointer to the Unicode representation of the UTF-8
string. Storage for the return value is appended to the end of the
Tcl_DString. The Unicode string is terminated with a Unicode null
character.

Tcl_UniCharLen corresponds to strlen for Unicode characters. It accepts
a null-terminated Unicode string and returns the number of Unicode
characters (not bytes) in that string.

Tcl_UniCharNcmp and Tcl_UniCharNcasecmp correspond to strncmp and
strncasecmp, respectively, for Unicode characters. They accept two
null-terminated Unicode strings and the number of characters to compare.
Both strings are assumed to be at least num characters long.
Tcl_UniCharNcmp compares the two strings character-by-character
according to the Unicode character ordering. It returns an integer
greater than, equal to, or less than 0 if the first string is greater
than, equal to, or less than the second string respectively.
Tcl_UniCharNcasecmp is the Unicode case-insensitive version.

Tcl_UniCharCaseMatch is the Unicode equivalent to Tcl_StringCaseMatch.
It accepts a null-terminated Unicode string, a Unicode pattern, and a
boolean value specifying whether the match should ignore case, and
returns whether the string matches the pattern.
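
The length and comparison semantics described above for Tcl_UniCharLen
and Tcl_UniCharNcmp can be sketched in plain C. The function names below
are hypothetical, and unsigned short is used in place of Tcl_UniChar as an
assumption. As in the description, the comparison assumes both strings
hold at least num characters and does not stop at a null character.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the strlen-like behavior: counts the
 * characters before the terminating null character.
 */
int sketch_unichar_len(const unsigned short *uniStr)
{
    int n = 0;

    assert(uniStr != NULL);
    while (uniStr[n] != 0) {
        n++;
    }
    return n;
}

/*
 * Hypothetical stand-in for the strncmp-like behavior: compares the first
 * num characters by numeric value. Returns a negative, zero, or positive
 * integer. Both arrays must hold at least num characters.
 */
int sketch_unichar_ncmp(const unsigned short *a, const unsigned short *b,
                        unsigned long num)
{
    assert(a != NULL && b != NULL);
    for (; num > 0; num--, a++, b++) {
        if (*a != *b) {
            return (int) *a - (int) *b;
        }
    }
    return 0;
}
```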

Tcl_UtfNcmp corresponds to strncmp for UTF-8 strings. It accepts two
null-terminated UTF-8 strings and the number of characters to compare.
(Both strings are assumed to be at least num characters long.)
Tcl_UtfNcmp compares the two strings character-by-character according to
the Unicode character ordering. It returns an integer greater than,
equal to, or less than 0 if the first string is greater than, equal to,
or less than the second string respectively.

Tcl_UtfNcasecmp corresponds to strncasecmp for UTF-8 strings. It is
similar to Tcl_UtfNcmp except that comparisons ignore differences in case
when comparing upper, lower or title case characters.

Tcl_UtfCharComplete returns 1 if the source UTF-8 string src of length
len bytes is long enough to be decoded by Tcl_UtfToUniChar, or 0
otherwise. This function does not guarantee that the UTF-8 string is
properly formed. This routine is used by procedures that are operating
on a byte at a time and need to know if a full Tcl_UniChar has been seen.
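
The completeness check described above amounts to comparing the number of
available bytes with the length implied by the lead byte. The following
plain C sketch illustrates the idea; sketch_utf_char_complete is a
hypothetical name, the sketch only recognizes sequences of up to 3 bytes,
and, like the routine it models, it does not validate the trail bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in, not Tcl's implementation. Returns 1 if the first
 * len bytes of src contain enough bytes for the sequence announced by the
 * lead byte, 0 otherwise. A negative len means "up to the first null
 * byte". Trail bytes are not checked for validity.
 */
int sketch_utf_char_complete(const char *src, int len)
{
    unsigned char lead;
    int needed;

    assert(src != NULL);
    if (len < 0) {
        len = (int) strlen(src);
    }
    if (len == 0) {
        return 0;                 /* nothing to decode yet */
    }

    lead = (unsigned char) src[0];
    if ((lead & 0xE0) == 0xC0) {
        needed = 2;               /* 110xxxxx */
    } else if ((lead & 0xF0) == 0xE0) {
        needed = 3;               /* 1110xxxx */
    } else {
        needed = 1;               /* ASCII or a byte passed through as-is */
    }
    return len >= needed;
}
```

A reader that receives input in arbitrary chunks can call such a check
before decoding and wait for more bytes when it returns 0.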

Tcl_NumUtfChars corresponds to strlen for UTF-8 strings. It returns the
number of Tcl_UniChars that are represented by the UTF-8 string src. The
length of the source string is len bytes. If the length is negative, all
bytes up to the first null byte are used.
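
For well-formed UTF-8, the character count described above equals the
number of bytes that are not trail bytes. The plain C sketch below uses
that observation; sketch_num_utf_chars is a hypothetical name, and the
result for malformed input is an artifact of the sketch, not a statement
about what Tcl returns in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in, not Tcl's implementation. Counts the characters
 * in the first len bytes of src by counting every byte that is not a
 * 10xxxxxx trail byte. A negative len means "up to the first null byte".
 */
int sketch_num_utf_chars(const char *src, int len)
{
    int i;
    int count = 0;

    assert(src != NULL);
    if (len < 0) {
        len = (int) strlen(src);
    }
    for (i = 0; i < len; i++) {
        if (((unsigned char) src[i] & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

The byte length and the character length differ as soon as the string
contains a non-ASCII character: a 5-byte string holding "caf" followed by
the 2-byte encoding of U+00E9 counts as 4 characters.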

Tcl_UtfFindFirst corresponds to strchr for UTF-8 strings. It returns a
pointer to the first occurrence of the Tcl_UniChar ch in the
null-terminated UTF-8 string src. The null terminator is considered part
of the UTF-8 string.

Tcl_UtfFindLast corresponds to strrchr for UTF-8 strings. It returns a
pointer to the last occurrence of the Tcl_UniChar ch in the
null-terminated UTF-8 string src. The null terminator is considered part
of the UTF-8 string.

Given src, a pointer to some location in a UTF-8 string, Tcl_UtfNext
returns a pointer to the next UTF-8 character in the string. The caller
must not ask for the next character after the last character in the
string if the string is not terminated by a null character.

Given src, a pointer to some location in a UTF-8 string (or to a null
byte immediately following such a string), Tcl_UtfPrev returns a pointer
to the closest preceding byte that starts a UTF-8 character. This
function will not back up to a position before start, the start of the
UTF-8 string. If src was already at start, the return value will be
start.

Tcl_UniCharAtIndex corresponds to a C string array dereference or the
Pascal Ord() function. It returns the Tcl_UniChar represented at the
specified character (not byte) index in the UTF-8 string src. The source
string must contain at least index characters. Behavior is undefined if
a negative index is given.

Tcl_UtfAtIndex returns a pointer to the specified character (not byte)
index in the UTF-8 string src. The source string must contain at least
index characters. This is equivalent to calling Tcl_UtfNext index times.
If a negative index is given, the return pointer points to the first
character in the source string.
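
The pointer-stepping behavior described above for Tcl_UtfNext,
Tcl_UtfPrev and Tcl_UtfAtIndex can be sketched in plain C by skipping
trail bytes. The function names below are hypothetical and the code is not
Tcl's implementation. One deliberate difference, chosen to keep the sketch
safe: sketch_utf_next does not advance past a null terminator.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in: returns a pointer to the start of the next
 * character in a null-terminated UTF-8 string. Stays in place when src
 * already points at the null terminator.
 */
const char *sketch_utf_next(const char *src)
{
    assert(src != NULL);
    if (*src == '\0') {
        return src;
    }
    src++;
    while (((unsigned char) *src & 0xC0) == 0x80) {
        src++;                    /* skip 10xxxxxx trail bytes */
    }
    return src;
}

/*
 * Hypothetical stand-in: returns a pointer to the closest preceding byte
 * that starts a character, never moving before start.
 */
const char *sketch_utf_prev(const char *src, const char *start)
{
    assert(src != NULL && start != NULL && src >= start);
    if (src == start) {
        return start;
    }
    src--;
    while (src > start && ((unsigned char) *src & 0xC0) == 0x80) {
        src--;                    /* back up over trail bytes */
    }
    return src;
}

/*
 * Hypothetical stand-in: steps forward index characters. A negative or
 * zero index returns src unchanged.
 */
const char *sketch_utf_at_index(const char *src, int index)
{
    assert(src != NULL);
    while (index-- > 0) {
        src = sketch_utf_next(src);
    }
    return src;
}
```

Because every character must be stepped over in turn, locating a character
by index this way takes time proportional to the index, unlike indexing
into an array of fixed-size characters.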

Tcl_UtfBackslash is a utility procedure used by several of the Tcl
commands. It parses a backslash sequence and stores the properly formed
UTF-8 character represented by the backslash sequence in the output
buffer dst. At most TCL_UTF_MAX bytes are stored in the buffer.
Tcl_UtfBackslash modifies *readPtr to contain the number of bytes in the
backslash sequence, including the backslash character. The return value
is the number of bytes stored in the output buffer.

See the Tcl manual entry for information on the valid backslash
sequences. All of the sequences described in the Tcl manual entry are
supported by Tcl_UtfBackslash.

KEYWORDS

utf, unicode, backslash

Tcl 8.1 Utf(3)



