Value options for ASScript.
kASRomanScript | Roman.
|
kASJapaneseScript | Japanese.
|
kASTraditionalChineseScript | Traditional Chinese.
|
kASKoreanScript | Korean.
|
kASArabicScript | Arabic.
|
kASHebrewScript | Hebrew.
|
kASGreekScript | Greek.
|
kASCyrillicScript | Cyrillic.
|
kASRightLeftScript | RightLeft.
|
kASDevanagariScript | Devanagari.
|
kASGurmukhiScript | Gurmukhi.
|
kASGujaratiScript | Gujarati.
|
kASOriyaScript | Oriya.
|
kASBengaliScript | Bengali.
|
kASTamilScript | Tamil.
|
kASTeluguScript | Telugu.
|
kASKannadaScript | Kannada.
|
kASMalayalamScript | Malayalam.
|
kASSinhaleseScript | Sinhalese.
|
kASBurmeseScript | Burmese.
|
kASKhmerScript | Khmer.
|
kASThaiScript | Thai.
|
kASLaotianScript | Laotian.
|
kASGeorgianScript | Georgian.
|
kASArmenianScript | Armenian.
|
kASSimplifiedChineseScript | Simplified Chinese.
|
kASTibetanScript | Tibetan.
|
kASMongolianScript | Mongolian.
|
kASGeezScript | Ge'ez.
|
kASEastEuropeanRomanScript | East European Roman.
|
kASVietnameseScript | Vietnamese.
|
kASExtendedArabicScript | Extended Arabic.
|
kASEUnicodeScript | Unicode.
|
kASDontKnowScript = -1 | Unknown.
|
kASTextFilterIdentity | Does nothing.
|
kASTextFilterLineEndings | Normalizes line endings (equivalent to ASTextNormalizeEndOfLine()).
|
kASTextFilterUpperCase | Makes all text upper case. DEPRECATED: Case is not a reliably localizable concept. Do not use this.
|
kASTextFilterLowerCase | Makes all text lower case. DEPRECATED: Case is not a reliably localizable concept. Do not use this.
|
kASTextFilterXXXDebug | Changes any ASText to "XXX" (for debugging).
|
kASTextFilterUpperCaseDebug | Makes all text except scanf format strings upper case.
|
kASTextFilterLowerCaseDebug | Makes all text except scanf format strings lower case.
|
kASTextFilterRemoveAmpersands | Removes stand-alone ampersands, and turns && into &.
|
kASTextFilterNormalizeFullWidthASCIIVariants | Changes any full width ASCII variants to their lower-ASCII version. For example, 0xFF21 (full width 'A') becomes 0x0041 (ASCII 'A').
|
kASTextRemoveLineEndings | Removes line endings and replaces them with spaces.
|
kASTextFilterRsvd1 = 1000 | Reserved. Do not use.
|
kASTextFilterUnknown = -1 | An invalid filter type.
|
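The full width normalization described for kASTextFilterNormalizeFullWidthASCIIVariants can be illustrated outside the SDK. The following is a minimal sketch, assuming the text is available as an array of UTF-16 code units; fold_fullwidth_ascii is a hypothetical helper, not an Acrobat API call, and it relies only on the fact that the full width forms U+FF01 through U+FF5E mirror ASCII 0x21 through 0x7E at a fixed offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the Acrobat API): folds full width ASCII
 * variants (U+FF01..U+FF5E) in a UTF-16 buffer to U+0021..U+007E in place. */
void fold_fullwidth_ascii(uint16_t *text, size_t count)
{
    assert(text != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (text[i] >= 0xFF01 && text[i] <= 0xFF5E) {
            /* The full width block mirrors ASCII at a fixed offset of 0xFEE0. */
            text[i] = (uint16_t)(text[i] - 0xFEE0);
        }
    }
}
```

Code units outside the full width range, such as ordinary ASCII or kana, are left untouched.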
typedef const struct _t_ASTextRec *ASConstText;
typedef ASUns16 ASCountryCode;
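An integer specifying the host encoding for text. On Mac OS, it is a script code. On Windows, it is a CHARSET id. In UNIX, Acrobat currently only supports English, so the only valid ASHostEncoding is 0 (Roman). See ASScript.
typedef ASInt32 ASHostEncoding;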
typedef ASUns16 ASLanguageCode;
For value options see ASScripts.
typedef ASInt32 ASScript;
An opaque object holding encoded text.
An ASText object represents a Unicode string. ASText objects can also be used to convert between Unicode and various platform-specific text encodings, as well as conversions between various Unicode formats such as UTF-16 or UTF-8. Since it is common for a Unicode string to be repeatedly converted to or from the same platform-specific text encoding, ASText objects are optimized for this operation. For example, they can cache both the Unicode and platform-specific text strings.
There are several ways of creating an ASText object depending on the type and format of the original text data. The following terminology is used throughout this API to describe the various text formats:
Text Format | Description
---|---
Encoded | A multi-byte string terminated with a single 0 character and coupled with a specific host encoding indicator. On Mac OS, the text encoding is specified using a script code. On Windows, the text encoding is specified using a CHARSET code. On UNIX the only valid host encoding indicator is 0, which specifies text in the platform's default Roman encoding. On all platforms, Asian text is typically specified using multi-byte strings.
ScriptText | A multi-byte string terminated with a single 0 character and coupled with an ASScript code. This is merely another way of specifying the Encoded case; the ASScript code is converted to a host encoding using ASScriptToHostEncoding().
Unicode | Text specified using UTF-16 or UTF-8. In the UTF-16 case, the bytes can be in either big-endian format or the endian-ness that matches the platform, and are always terminated with a single ASUns16 0 value. In the UTF-8 case, the text is always terminated with a trailing 0 byte. Unicode usage in this case is straight Unicode without the 0xFE 0xFF prefix or language and country codes that can be encoded inside a PDF document.
PDText | A string of text pulled out of a PDF document. This will either be a big-endian Unicode string prepended with the bytes 0xFE 0xFF, or a string in PDFDocEncoding. In this case, the Unicode string may have embedded language and country identifiers. ASText objects strip language and country information out of the PDText string and track them separately. See below for more details.
ASText objects can also be used to accomplish encoding and format conversions; you can request a string in any of the formats specified above. In all cases the ASText code attempts to preserve all characters. For example, if you attempt to concatenate two strings in separate host encodings, the implementation may convert both to Unicode and perform the concatenation in Unicode space.
When creating a new ASText object or putting new data into an existing object, the implementation will always copy the supplied data into the ASText object. The original data is yours to do with as you wish (and release if necessary).
The size of ASText data is always specified in bytes. For example, the len argument to ASTextFromSizedUnicode() specifies the number of bytes in the string, not the number of Unicode characters.
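To make the bytes-versus-characters distinction concrete, here is a small sketch that computes the byte size of a 0-terminated UTF-16 buffer. utf16_byte_length is a hypothetical helper, not an SDK call, and the sketch assumes the byte count should exclude the terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: size in bytes of a 0-terminated UTF-16 string,
 * not counting the two-byte terminator (an assumption of this sketch). */
size_t utf16_byte_length(const uint16_t *s)
{
    size_t units = 0;
    assert(s != NULL);
    while (s[units] != 0) {
        units++;
    }
    /* Each UTF-16 code unit occupies two bytes. */
    return units * sizeof(uint16_t);
}
```

A single character outside the Basic Multilingual Plane occupies two code units (a surrogate pair), so it contributes four bytes even though it is one Unicode character.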
Host encoding and Unicode strings are always terminated with a NULL character (which consists of one NULL byte for host encoded strings and two NULL bytes for Unicode strings). You cannot create a string with an embedded NULL character, even using the calls which take an explicit length parameter.
The Getxxx calls return pointers to data held by the ASText object. You cannot free or manipulate this data directly. The GetxxxCopy calls return data you can manipulate and that you are responsible for freeing.
An ASText object can have language and country codes associated with it. A language code is a 2-character ISO 639 language code. A country code is a 2-character ISO 3166 country code. In both cases the 2-character codes are packed into an ASUns16 value: the first character is packed in bits 8-15, and the second character is packed in bits 0-7. These language and country codes can be encoded into a UTF-16 variant of PDText encoding using an escape sequence. See the description of "Common Data Structures" in ISO 32000-1:2008, Document Management-Portable Document Format-Part 1: PDF 1.7, section 7.9, page 84.
You can find this document on the web store of the International Organization for Standardization (ISO).
The ASText calls will automatically parse the language and country codes embedded inside a UTF-16 PDText object, and will also author appropriate escape sequences to embed the language and country codes (if present) when generating a UTF-16 PDText object.
typedef struct _t_ASTextRec *ASText;
typedef ASEnum16 ASTextFilterType;
Holds a single 16-bit value from a UTF-16 encoded Unicode string. It is typically used to point to the beginning of a UTF-16 string. For example: ASUTF16Val *utf16String = ...
This data type is not large enough to hold any arbitrary Unicode character. Use ASUnicodeChar to pass individual Unicode characters.
typedef ASUns16 ASUTF16Val;
typedef ASUns32 ASUTF32Val;
typedef ASUns8 ASUTF8Val;
typedef ASUTF16Val ASUniChar;
typedef ASUns32 ASUnicodeChar;
For value options see UTFOptions.
typedef ASEnum16 ASUnicodeFormat;
ASText ASTextEvalProc(ASCab params);
Determines whether the given byte is a lead byte of a multi-byte character, and how many tail bytes follow.
When parsing a string in a host encoding, you must keep in mind that the string could be in a variable length multi-byte encoding. In such an encoding (for example, Shift-JIS) the number of bytes required to represent a character varies on a character-by-character basis. To parse such a string you must start at the beginning and, for each byte, determine whether that byte represents a character or is the first byte of a multi-byte character. If the byte is a lead byte for a multi-byte character, you must also compute how many bytes will follow the lead byte to make up the entire character. Currently the API provides a call (PDHostMBLen()) that performs these computations, but only if the encoding in question is the operating system encoding (as returned by PDGetHostEncoding()). ASHostMBLen() allows you to determine this for any byte in any host encoding.
Note: ASHostMBLen() cannot confirm whether the required number of trailing bytes actually follow the first byte. If you are parsing a multi-byte string, make sure your code will stop at the first NULL (zero) byte even if it appears immediately after the lead byte of a multi-byte character.
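The parsing loop described above, including the early stop on a zero byte, might look like the following sketch. It does not call the SDK: sjis_tail_bytes is a hypothetical stand-in for ASHostMBLen() that only knows the Shift-JIS lead-byte ranges (as used by code page 932), and count_mb_chars is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for ASHostMBLen(), limited to Shift-JIS (CP932 lead-byte ranges):
 * returns the number of tail bytes that follow the given byte. */
int sjis_tail_bytes(unsigned char byte)
{
    if ((byte >= 0x81 && byte <= 0x9F) || (byte >= 0xE0 && byte <= 0xFC)) {
        return 1;
    }
    return 0;
}

/* Counts characters, stopping at the first zero byte even when it appears
 * where a tail byte was expected. A truncated character is not counted. */
size_t count_mb_chars(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t chars = 0;
    assert(s != NULL);
    while (*p != 0) {
        int tail = sjis_tail_bytes(*p);
        p++;
        while (tail > 0 && *p != 0) {
            p++;
            tail--;
        }
        if (tail > 0) {
            break; /* the string ended in the middle of a character */
        }
        chars++;
    }
    return chars;
}
```

The inner loop checks for the terminator before consuming each tail byte, which is the safeguard the note asks for.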
ASInt32 ASHostMBLen(ASHostEncoding encoding, ASUns8 byte);
encoding | The host encoding type.
|
byte | The first byte of a multi-byte character.
|
The number of tail bytes that follow the given byte: for example, 1 for a two-byte character and 0 for a one-byte character. For Roman encodings, the return value will always be 0.
NULL-terminated.
ASBool ASIsValidUTF8(const ASUns8 *cIn, ASCount cInLen);
cIn | The string.
|
cInLen | The length of the string in bytes, not including the NULL byte at the end.
|
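For readers who want to see what a UTF-8 validity check involves, here is a generic well-formedness test. It is a sketch of the standard rules (lead and continuation byte patterns, no overlong forms, no surrogates, nothing above U+10FFFF); how ASIsValidUTF8() is implemented is not documented here, and is_valid_utf8 is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Generic UTF-8 well-formedness check. Returns 1 if the len bytes at s form
 * valid UTF-8, 0 otherwise. This is not the SDK implementation. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char b = s[i];
        size_t tail;
        unsigned long cp, min;
        if (b < 0x80) { i++; continue; }
        else if ((b & 0xE0) == 0xC0) { tail = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { tail = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { tail = 3; cp = b & 0x07; min = 0x10000; }
        else return 0;                              /* stray continuation or invalid lead */
        if (len - i <= tail) return 0;              /* truncated sequence */
        for (size_t k = 1; k <= tail; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0; /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min) return 0;                      /* overlong encoding */
        if (cp > 0x10FFFF) return 0;                 /* beyond the Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* UTF-16 surrogate */
        i += tail + 1;
    }
    return 1;
}
```

As with cInLen, the length passed to this sketch is a byte count that excludes any terminator.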
Converts a host encoding to an ASScript value. On Windows, the host encoding is a CHARSET id. On Mac OS, the host encoding is a script code.
ASScript ASScriptFromHostEncoding(ASHostEncoding osScript);
osScript | The host encoding type.
|
Converts an ASScript value to a host encoding. On Windows, the host encoding is a CHARSET id. On Mac OS, the host encoding is a script code.
ASHostEncoding ASScriptToHostEncoding(ASScript asScript);
asScript | The script value.
|
Compares two ASConstText objects, ignoring language and country information. The comparison is case-sensitive.
Various exceptions may be raised.
ASInt32 ASTextCaseSensitiveCmp(ASConstText str1, ASConstText str2);
str1 | First text object.
|
str2 | Second text object.
|
Returns a negative number if str1 < str2, a positive number if str1 > str2, and 0 if they are equal.
Concatenates the from text to the end of the to text, altering to but not from. It does not change the language or country of to unless it has no language or country, in which case it acquires the language and country of from.
void ASTextCat(ASText to, ASConstText from);
to | IN/OUT The encoded text to which from is appended.
|
from | IN/OUT The encoded text to be appended to to.
|
void ASTextCatMany(ASText to, ...);
to |
Compares two ASText objects. This routine can be used to sort text objects using the default collating rules of the underlying operating system before presenting them to the user. The comparison is case-sensitive. The results are suitable for displaying a sorted list of strings to the user in his chosen language and according to the rules of the platform on which the application is running. The results can vary based on the platform and user locale. If you want to compare strings in a way that is consistent across locales and platforms (but not suitable for displaying sorted strings to a user) see ASTextCaseSensitiveCmp().
Various exceptions may be raised.
ASInt32 ASTextCmp(ASConstText str1, ASConstText str2);
str1 | The first text object.
|
str2 | The second text object.
|
Returns a negative number if str1 < str2, a positive number if str1 > str2, and 0 if they are equal.
Copies the text in from to to, along with the country and language.
void ASTextCopy(ASText to, ASConstText from);
to | IN/OUT The destination text object.
|
from | IN/OUT The source text object.
|
void ASTextDestroy(ASText str);
str | IN/OUT A text object.
|
ASText ASTextDup(ASConstText str);
str | A text object.
|
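An exception is raised if str is NULL.
|
Replaces percent-quoted expressions in the text with the values of the matching keys in the ASCab. For example, if the text is "%keyone%%keytwo%", the value is replaced with the concatenation of the values of the keys keyone and keytwo in the ASCab passed in.
void ASTextEval(ASText theText, ASCab params);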
theText | A text object containing percent-quoted expressions to replace.
|
params | The ASCab containing the key/value pairs to use for text replacement.
|
An exception is raised if theText is NULL.
|
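The percent-quoted replacement that ASTextEval() performs can be pictured with a standalone sketch over plain C strings. Everything here is hypothetical: KeyValue stands in for an ASCab, expand_percent_keys is not an SDK call, and the policy of copying unknown keys and unpaired percent signs unchanged is an assumption of the sketch rather than documented behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an ASCab: a flat list of key/value strings. */
typedef struct { const char *key; const char *value; } KeyValue;

static const char *lookup(const KeyValue *kv, size_t n, const char *key, size_t keyLen)
{
    for (size_t i = 0; i < n; i++) {
        if (strlen(kv[i].key) == keyLen && memcmp(kv[i].key, key, keyLen) == 0) {
            return kv[i].value;
        }
    }
    return NULL;
}

/* Expands %key% expressions from src into dst (capacity dstSize, always
 * 0-terminated). Returns 0 on success, -1 if dst is too small. */
int expand_percent_keys(const char *src, const KeyValue *kv, size_t n,
                        char *dst, size_t dstSize)
{
    size_t out = 0;
    assert(src != NULL && dst != NULL && dstSize > 0);
    while (*src != '\0') {
        const char *piece = src;   /* default: copy one character as-is */
        size_t pieceLen = 1;
        if (*src == '%') {
            const char *close = strchr(src + 1, '%');
            const char *value = close ? lookup(kv, n, src + 1, (size_t)(close - src - 1)) : NULL;
            if (value != NULL) {
                piece = value;
                pieceLen = strlen(value);
                src = close;       /* the increment below steps past the closing '%' */
            }
        }
        if (out + pieceLen >= dstSize) { dst[out] = '\0'; return -1; }
        memcpy(dst + out, piece, pieceLen);
        out += pieceLen;
        src++;
    }
    dst[out] = '\0';
    return 0;
}
```

With keyone mapped to "Hello, " and keytwo mapped to "world", the input "%keyone%%keytwo%" expands to the concatenation of the two values.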
void ASTextFilter(ASText text, ASTextFilterType filter);
text | A text object modified by the method.
|
filter | The filter to run on the text object.
|
An exception is raised if text is NULL or if an invalid filter is specified.
|
Creates a new text object from a NULL-terminated multi-byte string in the specified host encoding.
ASText ASTextFromEncoded(const char *str, ASHostEncoding encoding);
str | The input string.
|
encoding | The host encoding.
|
ASText ASTextFromInt32(ASInt32 num);
num | A number of type ASInt32.
|
Creates a new text object from PDText, which is either a big-endian UTF-16 string with 0xFEFF prepended to the front or a PDFDocEncoding string. In either case the string is expected to have the appropriate NULL termination. If the PDText is in UTF-16, it may have embedded language and country information; this will cause the ASText object to have its language and country codes set to the values found in the string.
ASText ASTextFromPDText(const char *str);
str | A string.
|
Creates a new text object from a NULL-terminated multi-byte string of the specified script. This is a wrapper around ASTextFromEncoded(); the script is converted to a host encoding using ASScriptToHostEncoding().
ASText ASTextFromScriptText(const char *str, ASScript script);
str | A string.
|
script | The specified script.
|
ASText ASTextFromSizedEncoded(const char *str, ASTArraySize len, ASHostEncoding encoding);
str | A string.
|
len | The length in bytes.
|
encoding | The specified host encoding.
|
An exception is raised if len < 0.
|
Creates a new text object from PDText, which is either a big-endian UTF-16 string with 0xFEFF prepended to the front or a PDFDocEncoding string. If the PDText is in UTF-16, it may have embedded language and country information; this will cause the ASText object to have its language and country codes set to the values found in the string. The length parameter specifies the size, in bytes, of the string. The string must not contain embedded NULL characters.
ASText ASTextFromSizedPDText(const char *str, ASTArraySize length);
str | A string.
|
length | The length in bytes.
|
ASText ASTextFromSizedScriptText(const char *str, ASTArraySize len, ASScript script);
str | A string.
|
len | The length in bytes.
|
script | The specified script.
|
Creates a new text object from the specified Unicode string. This string is not expected to have 0xFE 0xFF prepended, or country/language identifiers.
The string cannot contain an embedded NULL character.
ASText ASTextFromSizedUnicode(const ASUTF16Val *ucs, ASUnicodeFormat format, ASTArraySize len);
ucs | The Unicode string.
|
format | The Unicode format of ucs.
|
len | The length of ucs in bytes.
|
An exception is raised if len < 0.
|
Creates a new text object from a NULL-terminated Unicode string. This string is not expected to have 0xFE 0xFF prepended, or country/language identifiers.
ASText ASTextFromUnicode(const ASUTF16Val *ucs, ASUnicodeFormat format);
ucs | A Unicode string.
|
format | The Unicode format used by ucs.
|
ASText ASTextFromUns32(ASUns32 num);
num | IN/OUT A value of type ASUns32.
|
Returns the best host encoding for representing the text. The best host encoding is the one that is least likely to lose characters during the conversion from Unicode to host. If the string can be represented accurately in multiple encodings (for example, it is low-ASCII text that can be correctly represented in any host encoding), ASTextGetBestEncoding() returns the preferred encoding based on the preferredEncoding parameter.
Various exceptions may be raised.
ASHostEncoding ASTextGetBestEncoding(ASConstText str, ASHostEncoding preferredEncoding);
str | An ASText string.
|
preferredEncoding | The preferred encoding. There is no default.
|
    // If you prefer to use the application's language encoding:
    ASHostEncoding bestEncoding = ASTextGetBestEncoding(text, AVAppGetLanguageEncoding());

    // If you prefer to use the operating system encoding:
    ASHostEncoding bestEncoding = ASTextGetBestEncoding(text, (ASHostEncoding)PDGetHostEncoding());

    // If you want to favor Roman encodings:
    ASHostEncoding hostRoman = ASScriptToHostEncoding(kASRomanScript);
    ASHostEncoding bestEncoding = ASTextGetBestEncoding(text, hostRoman);
ASScript ASTextGetBestScript(ASConstText str, ASScript preferredScript);
str | IN/OUT An ASText string.
|
preferredScript | IN/OUT The preferred host script. There is no default.
|
ASCountryCode ASTextGetCountry(ASConstText text);
text | IN/OUT An ASText object.
|
const char *ASTextGetEncoded(ASConstText str, ASHostEncoding encoding);
str | IN/OUT An ASText object.
|
encoding | IN/OUT The specified host encoding.
|
A NULL-terminated string corresponding to the text in str.
char *ASTextGetEncodedCopy(ASConstText str, ASHostEncoding encoding);
str | An ASText object.
|
encoding | The specified encoding.
|
A NULL-terminated string corresponding to the text in str. The client owns the resulting information and is responsible for freeing it using ASfree(). An exception is raised if memory could not be allocated for the copy.
|
ASLanguageCode ASTextGetLanguage(ASConstText text);
text | An ASText object.
|
Returns the text in a form suitable for storage in a PDF file. If the text can be represented using PDFDocEncoding, it is; otherwise it is represented in big-endian UTF-16 format with 0xFE 0xFF prepended to the front and any country/language codes embedded in an escape sequence right after 0xFE 0xFF.
You can determine if the string is Unicode by inspecting the first two bytes. The Unicode case is used if the string has a language and country code set. The resulting string is NULL-terminated as appropriate. That is, one NULL byte is used for PDFDocEncoding, two are used for UTF-16.
Various exceptions may be raised.
char *ASTextGetPDTextCopy(ASConstText str, ASTArraySize *len);
str | A string.
|
len | The length in bytes of the resulting string, not counting the NULL bytes at the end.
|
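Inspecting the first two bytes of a PDText string, as suggested in the description of ASTextGetPDTextCopy(), could be done along these lines. PDTextKind and pdtext_kind are hypothetical names introduced for illustration; the sketch takes the string and its byte length as they would be returned through the len parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a PDText buffer. */
typedef enum { PDTEXT_PDFDOC, PDTEXT_UTF16BE } PDTextKind;

/* Reports whether a PDText buffer of len bytes starts with the 0xFE 0xFF
 * prefix that marks big-endian UTF-16; otherwise it is PDFDocEncoding. */
PDTextKind pdtext_kind(const char *str, size_t len)
{
    const unsigned char *p = (const unsigned char *)str;
    assert(str != NULL || len == 0);
    if (len >= 2 && p[0] == 0xFE && p[1] == 0xFF) {
        return PDTEXT_UTF16BE;
    }
    return PDTEXT_PDFDOC;
}
```

The cast to unsigned char matters: on platforms where char is signed, comparing a plain char against 0xFE would never match.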
Converts the Unicode string in the ASText object to the appropriate script, and returns a pointer to the converted text. The memory to which it points is owned by the ASText object and must not be altered or destroyed by the client. The memory may also become invalid after subsequent operations are applied to the ASText object.
Various exceptions may be raised.
const char *ASTextGetScriptText(ASConstText str, ASScript script);
str | IN/OUT A string.
|
script | IN/OUT The writing script.
|
char *ASTextGetScriptTextCopy(ASConstText str, ASScript script);
str | A string.
|
script | A writing script.
|
An exception is raised if memory could not be allocated for the copy.
|
Returns a pointer to a string in kUTF16HostEndian format (see ASUnicodeFormat). The memory to which this string points is owned by the ASText object, and may not be valid after additional operations are performed on the object.
The Unicode text returned will not have 0xFE 0xFF prepended or any language or country codes.
const ASUTF16Val *ASTextGetUnicode(ASConstText str);
str | A string.
|
Returns a pointer to a NULL-terminated string in the specified Unicode format. The memory to which this string points is owned by the client, which can modify it at will and is responsible for destroying it using ASfree.
The Unicode text returned will not have 0xFE 0xFF prepended or any language or country codes.
ASUTF16Val *ASTextGetUnicodeCopy(ASConstText str, ASUnicodeFormat format);
str | A string.
|
format | The Unicode format.
|
An exception is raised if memory could not be allocated for the copy.
|
Tests whether the text object holds a 0-length string.
ASBool ASTextIsEmpty(ASConstText str);
str | A string.
|
void ASTextMakeEmpty(ASText str);
Removes the contents of the ASText object (converts it into an empty string). It clears the released storage (for security strings).
void ASTextMakeEmptyClear(ASText str);
ASText ASTextNew(void);
Normalizes the line endings in the text object. For example, where the line ending is \r\n, \r and \n are replaced with \r\n.
void ASTextNormalizeEndOfLine(ASText text);
text | An object of type ASText.
|
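Line-ending normalization of the kind ASTextNormalizeEndOfLine() performs can be sketched on a plain C string. normalize_eol is a hypothetical helper, and the target line ending is passed in explicitly because this sketch makes no assumption about how the SDK chooses it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Rewrites every line ending in src (\r\n, lone \r, or lone \n) as eol.
 * Returns the output length, or (size_t)-1 if dst is too small. */
size_t normalize_eol(const char *src, const char *eol, char *dst, size_t dstSize)
{
    size_t out = 0, eolLen;
    assert(src != NULL && eol != NULL && dst != NULL && dstSize > 0);
    eolLen = strlen(eol);
    while (*src != '\0') {
        if (*src == '\r' || *src == '\n') {
            if (src[0] == '\r' && src[1] == '\n') {
                src++; /* treat \r\n as a single line ending */
            }
            if (out + eolLen >= dstSize) { dst[out] = '\0'; return (size_t)-1; }
            memcpy(dst + out, eol, eolLen);
            out += eolLen;
        } else {
            if (out + 1 >= dstSize) { dst[out] = '\0'; return (size_t)-1; }
            dst[out++] = *src;
        }
        src++;
    }
    dst[out] = '\0';
    return out;
}
```

Treating an existing \r\n pair as one ending keeps already-normalized text from gaining extra blank lines.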
Replaces all occurrences of toReplace in src with the text specified in replacement. This uses an ASText string to indicate the toReplace string; ASTextReplaceASCII() uses a low ASCII Roman string to indicate the text to replace.
Various exceptions may be raised.
void ASTextReplace(ASText src, ASConstText toReplace, ASConstText replacement);
src | Source text.
|
toReplace | Text in source text to replace.
|
replacement | Text used in replacement.
|
Replaces all occurrences of toReplace in src with the text specified in replacement. ASTextReplace() uses an ASText string to indicate the toReplace string; this uses a low-ASCII Roman string to indicate the text to replace.
This call is intended for formatting strings for the user interface. For example, it can be used for replacing a known sequence such as '%1' with other text. Be sure to use only low ASCII characters, which are safe on all platforms. Avoid using backslash and currency symbols.
Various exceptions may be raised.
void ASTextReplaceASCII(ASText src, const char *toReplace, ASConstText replacement);
src | The ASText object containing the text.
|
toReplace | The text to replace.
|
replacement | The replacement text.
|
Replaces all occurrences of characters contained in the list pszBadCharList in the text with the specified replacement character.
Various exceptions may be raised.
void ASTextReplaceBadChars(ASText str, const char *pszBadCharList, char replaceChar);
str | The text in which to replace characters.
|
pszBadCharList | A list of characters to replace, in sorted order with no duplicates.
|
replaceChar | The character with which to replace any character appearing in the list.
|
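One plausible reason for requiring pszBadCharList to be sorted and free of duplicates is that it allows a binary search per character; that motive is a guess, not something this reference states. The sketch below applies the idea to a plain single-byte C string. replace_bad_chars is a hypothetical helper, not the SDK implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Orders two bytes by their unsigned value, as bsearch() requires. */
static int cmp_char(const void *a, const void *b)
{
    unsigned char x = *(const unsigned char *)a;
    unsigned char y = *(const unsigned char *)b;
    return (x > y) - (x < y);
}

/* Replaces every character of text that appears in sortedBadChars with
 * replaceChar. sortedBadChars must be in ascending byte order. */
void replace_bad_chars(char *text, const char *sortedBadChars, char replaceChar)
{
    size_t n;
    assert(text != NULL && sortedBadChars != NULL);
    n = strlen(sortedBadChars);
    for (; *text != '\0'; text++) {
        if (bsearch(text, sortedBadChars, n, 1, cmp_char) != NULL) {
            *text = replaceChar;
        }
    }
}
```

In the list "*/:" the characters are already in ascending order (0x2A, 0x2F, 0x3A), which is what the binary search depends on.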
void ASTextSetCountry(ASText text, ASCountryCode country);
text | IN/OUT An ASText object.
|
country | IN/OUT Country code.
|
void ASTextSetEncoded(ASText str, const char *text, ASHostEncoding encoding);
str | IN/OUT An ASText object to hold the string.
|
text | IN/OUT A pointer to the text string.
|
encoding | IN/OUT The type of encoding.
|
An exception is raised if text is NULL.
|
void ASTextSetLanguage(ASText text, ASLanguageCode language);
text | IN/OUT An ASText object.
|
language | IN/OUT The language code.
|
Sets the contents of an existing text object from PDText, which is either a big-endian UTF-16 string with 0xFEFF prepended to the front or a PDFDocEncoding string. In either case the string is expected to have the appropriate NULL termination. If the PDText is in UTF-16, it may have embedded language and country information; this will cause the ASText object to have its language and country codes set to the values found in the string.
void ASTextSetPDText(ASText str, const char *text);
str | A string.
|
text | A text string.
|
Sets the contents of an existing text object from a NULL-terminated multi-byte string of the specified script. This is a wrapper around ASTextSetEncoded(); the script is converted to a host encoding using ASScriptToHostEncoding().
void ASTextSetScriptText(ASText str, const char *text, ASScript script);
str | IN/OUT A string.
|
text | IN/OUT A pointer to the text string.
|
script | IN/OUT The writing script.
|
void ASTextSetSizedEncoded(ASText str, const char *text, ASTArraySize len, ASHostEncoding encoding);
str | IN/OUT A string.
|
text | IN/OUT A pointer to the text string.
|
len | IN/OUT The length of the text string.
|
encoding | IN/OUT The host encoding type.
|
An exception is raised if text is NULL.
|
Sets the contents of an existing text object from PDText, which is either a big-endian UTF-16 string with 0xFEFF prepended to the front or a PDFDocEncoding string. In either case the length parameter indicates the number of bytes in the string. The string should not be NULL-terminated and must not contain any NULL characters. If the PDText is in UTF-16, it may have embedded language and country information; this will cause the ASText object to have its language and country codes set to the values found in the string.
void ASTextSetSizedPDText(ASText str, const char *text, ASTArraySize length);
str | A string.
|
text | A pointer to a text string.
|
length | The length of the text string.
|
void ASTextSetSizedScriptText(ASText str, const char *text, ASTArraySize len, ASScript script);
str | IN/OUT A string.
|
text | IN/OUT A pointer to the text string.
|
len | IN/OUT The length of the text string.
|
script | IN/OUT The writing script.
|
An exception is raised if text is NULL.
|
void ASTextSetSizedUnicode(ASText str, const ASUTF16Val *ucsValue, ASUnicodeFormat format, ASTArraySize len);
str | (Filled by the method) A string.
|
ucsValue | A Unicode string.
|
format | The Unicode format.
|
len | The length of the string in bytes.
|
Sets the contents of an existing text object from a NULL-terminated Unicode string. This string is not expected to have 0xFE 0xFF prepended or embedded country/language identifiers.
void ASTextSetUnicode(ASText str, const ASUTF16Val *ucsValue, ASUnicodeFormat format);
str | (Filled by the method) A string.
|
ucsValue | A Unicode string.
|
format | The Unicode format.
|
void ASUCS_GetPasswordFromUnicode(ASUTF16Val *inPassword, void **outPassword, ASBool useUTF);
inPassword | |
outPassword | |
useUTF | IN A flag for controlling the conversion. Prior to Acrobat 9.0, passwords were converted from host code-page encoding (8-bit mode) to PDFDocEncoding. If useUTF == false, this routine does the same, starting from 16-bit Unicode. With encryption, Acrobat 9.0 and later allows Unicode passwords, normalized and converted to UTF-8 encoding. If useUTF == true, such a Unicode password is what is returned.
|