Uniscribe: The Missing Documentation & Examples

Author: admin | Category: C++ | Published: 2013-03-14 09:52

Introduction

Microsoft created an extremely powerful API called Uniscribe that lets applications typeset scripts with complex rules for transforming the input string (a list of Unicode code points) into the glyphs that should actually be rendered on the screen.

Unfortunately, Microsoft did not document this library very well, gave no examples, and blessed it with an extremely complex API. I have attempted to document and give examples for some aspects of the Uniscribe library that I am familiar with in the hopes that it will be useful to other developers.

This document comes from my contribution to getting Uniscribe to work in Google Chrome. You can see the production versions of the code in UniscribeHelper.h and UniscribeHelper.cpp. This document was written from memory without referencing the Google Chrome code (to avoid leaks) before it was released, so there may be bugs or typos present here that are not in the production code. Chrome is the only browser other than IE to get Arabic justification using kashidas correct, and the only browser other than Safari to get extra character spacing in Hebrew correct. There are bugs, however. The main limitation is that the Chrome code doesn’t handle font (face, color, style, etc.) changes in the middle of shaped words.

Note: In the examples, I use Unicode characters rather than images, and count on your browser to be able to display complex scripts properly. This may not be the case for all systems. It is at least the case for the newest versions of Firefox and IE on Windows 2000 and above.

Why should you use Uniscribe?

You want to be able to render right-to-left and left-to-right languages in the same layout.
You want to be able to handle the complex ligaturization rules of languages like Arabic. The transformations can be quite dramatic, for example: Incorrect, left-to-right layout without ligatures: ة ي د و ع س ل ا Correct, right-to-left with ligatures: السعودية (assuming your browser can render Arabic properly).
You want to be able to handle combining characters like Hebrew vowel points or even combining accents that can be used in Latin-based languages. For example, é can be represented by the single code point U+00E9, or by the combination of U+0065 e and U+0301 ´ (combining acute accent).
You want to use nice ligatures or other decoration that your font has available. Many fonts have a ligature for “fi” (compare fi and ﬁ, assuming your font has this glyph) and “fl”, and some high-end OpenType fonts have fancier ligatures for pairs like “Th” that can look very nice (for example, see the Glyph Complement Sheet for Adobe Minion Pro [pdf]).

Overview

Uniscribe works in two modes. In the more basic mode, the programmer calls ScriptStringAnalyze on their input, calls any number of other ScriptString… functions to get information about the text, ScriptStringOut to draw the string, and ScriptStringFree to free the internal data for the string.

This basic mode is not discussed here. It is slightly easier to use than the do-it-yourself mode, but I have no experience with it, so I have nothing to add beyond the MSDN documentation for Uniscribe functions.

Instead, I document the parts of the more complex do-it-yourself API that I am familiar with. The important thing to know is that you are either in basic mode, in which case all your functions start with ScriptString…, or you are in do-it-yourself mode, where you cannot call these functions. The approximate outline for do-it-yourself mode is as follows:

Call ScriptItemize on your input string. This will identify the “runs” in the string that consist of a single direction of text. Most of the rest of the Uniscribe functions operate on these runs individually, so you’ll have to keep track of them yourself.
Call ScriptLayout to convert your list of runs to the order that they should appear on the screen, from left to right. This allows you to have a sequence of runs that are right to left, embedded in runs that are left to right.
Call ScriptShape with the text of each run to convert it to a series of glyph indices (these are internal references into the font you selected that identify the glyphs to use). One character in the input may produce zero, one, or several glyphs.
Call ScriptPlace with the glyph indices of each run to find out where they should be placed relative to each other. After this, you will also be able to measure the width of your run.
Optional: call ScriptJustify to fill the text out to a given width.
Optional: call ScriptCPtoX and ScriptXtoCP as needed to convert between character offsets and pixel positions in a run.
Call ScriptTextOut to draw the placed glyphs on the screen.

Disclaimer

I wrote this documentation and examples based on what I learned when using Uniscribe. It is likely I am incorrect about some aspects of the library, and there are surely errors in the examples, which have never been compiled. Use at your own risk! If something isn’t working the way you expect, don’t automatically assume my code is correct. If you do find errors, please email me and I’ll try to fix them.

ScriptItemize

MSDN Documentation for ScriptItemize

Parameters

HRESULT ScriptItemize(
const WCHAR*	pwcInChars,
int	cInChars,
int	cMaxItems,
const SCRIPT_CONTROL*	psControl,
const SCRIPT_STATE*	psState,
SCRIPT_ITEM*	pItems,
int*	pcItems);

pwcInChars, cInChars: Easy: the input string and its length.
cMaxItems: The number of slots in the pItems array that Uniscribe can write into. If this number is insufficient, it will return E_OUTOFMEMORY. It appears that pre-XP SP2 versions of Uniscribe have a buffer overflow in this function (see Mozilla bug 366643). The workaround is to always give it a buffer one item larger than you report. The documentation mysteriously says that you must give it one more byte than you would expect, which might be a workaround for this same bug. I am unsure whether this one extra byte is sufficient to work around the problem, so I would always just give it an additional full item.
psControl, psState: Pointers to a SCRIPT_CONTROL that sets application-level preferences for the type of formatting to do, and a SCRIPT_STATE that tells Uniscribe about the surrounding context of the input. Most times, the only important thing to do is set SCRIPT_STATE.uBidiLevel to the value indicating the direction of the surrounding text so that Uniscribe can handle your additional run correctly. The rest of the values will most often be 0 for normal use. Important: psControl must not be NULL. The MSDN documents say it can be if you don’t want to set any options, but it seems some RTL code inside Uniscribe doesn’t run if you don’t provide a SCRIPT_CONTROL pointer, and you will have some rendering bugs (for example, some punctuation will be rendered LTR even when it is inside an RTL run). Instead, provide a pointer to a structure initialized to all 0s when you don’t have any options.
pItems, pcItems: A pointer to an array of SCRIPT_ITEM structures that will receive the information about each run. On success, the number of items written to the list (one per run of text) will be in *pcItems. It seems that Uniscribe often requires many more item structures internally than it will actually return. For example, it may require an array of 10 items to not return E_OUTOFMEMORY, even though it actually returns 3 items in the end. Uniscribe will also write one more SCRIPT_ITEM structure to the array than it reports, meaning the reported item count will always be at least one less than the cMaxItems you pass in.

Example input and output

Here is an example of an input string for which ScriptItemize returns *pcItems = 3:

0	1	2	3	4	5	6	7	8	9	10	11	12	13	14
H	e	l	l	o		ا	ل	س	ع	و	د	ي	ة	!
Characters 0 to 5 (“Hello ”):	item[0].iCharPos = 0	item[0].a.fRTL = false
Characters 6 to 13 (the Arabic word):	item[1].iCharPos = 6	item[1].a.fRTL = true
Character 14 (“!”):	item[2].iCharPos = 14	item[2].a.fRTL = false

There will also be a magical item[3].iCharPos = 15 so you can tell that the last run has only one character in it. The first and the last run are left-to-right, but the middle run is Arabic so it will have item[1].a.fRTL = true.
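As a rough sketch of why the magic last item is useful, the snippet below computes the length of each run from consecutive iCharPos values. It is portable C++ that does not call Uniscribe: ItemInfo is a hypothetical stand-in for the two SCRIPT_ITEM fields used in the example, and runLength and exampleItems are names I made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the two SCRIPT_ITEM fields used in the example;
// the real structure is declared in usp10.h.
struct ItemInfo {
  int iCharPos;  // Index of the first character of the run.
  bool fRTL;     // True if the run is right-to-left.
};

// Returns the number of characters in run `i`. `items` must include the
// trailing sentinel item, so run i ends where item i + 1 begins.
int runLength(const std::vector<ItemInfo>& items, std::size_t i) {
  assert(i + 1 < items.size());
  return items[i + 1].iCharPos - items[i].iCharPos;
}

// The example above: three runs plus the sentinel at position 15.
std::vector<ItemInfo> exampleItems() {
  return {{0, false}, {6, true}, {14, false}, {15, false}};
}
```

For the example items, runLength gives 6, 8, and 1 characters for runs 0, 1, and 2. Without the sentinel you would need to pass the string length around separately to size the last run.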

Example code

#include <windows.h>
#include <usp10.h>  // Link with usp10.lib.
#include <vector>

// Fills the given items array with the items for the input text.
// Returns true on success.
bool callScriptItemize(const wchar_t* text, int text_len,
                       std::vector<SCRIPT_ITEM>* items)
{
  // Most applications won't need to set any control flags, but the pointer
  // must not be NULL (see psControl above).
  SCRIPT_CONTROL control;
  ZeroMemory(&control, sizeof(SCRIPT_CONTROL));

  // Initial state, you will probably want to keep this updated as you process
  // runs in order so that you can always give it the correct direction of the
  // surrounding text.
  SCRIPT_STATE state;
  ZeroMemory(&state, sizeof(SCRIPT_STATE));
  state.uBidiLevel = 0;  // 0 means that the surrounding text is left-to-right.

  int max_items = 16;
  while (true) {
    // Make enough room for the output.
    items->resize(max_items);

    // We subtract one from max_items to work around a buffer overflow on some
    // older versions of Windows.
    int generated_items = 0;
    HRESULT hr = ScriptItemize(text, text_len, max_items - 1, &control,
                               &state, &(*items)[0], &generated_items);
    if (SUCCEEDED(hr)) {
      // It generated some items, so resize the array. Note that we add
      // one to account for the magic last item.
      items->resize(generated_items + 1);
      return true;
    }
    if (hr != E_OUTOFMEMORY) {
      // Some kind of error.
      return false;
    }

    // The input array isn't big enough, double and loop again.
    max_items *= 2;
  }
}

ScriptLayout

MSDN Documentation for ScriptLayout

ScriptLayout tells you what order the runs returned by ScriptItemize should appear in on the screen. If you are only dealing with left-to-right text, then the order that the runs appear on the screen is the same order as the input text, so there is nothing that needs to be done. However, if one or more runs is right-to-left, there may be some shuffling that needs to happen to get things in the correct order. This function gives you the mapping from logical order (in your input text) to visual order (on the screen), and the reverse mapping.

ScriptLayout is fairly straightforward and is documented pretty well on MSDN, so I’ll skip the documentation and go to the example. I will use these mapping arrays in other examples below.

Example input and output

Logical order from `ScriptItemize`:	Run one, LTR	Run two, RTL	Run three, RTL	Run four, LTR
Desired screen order:	Run one, LTR	Run three, RTL	Run two, RTL	Run four, LTR
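To make the reordering concrete, here is a minimal sketch of rule L2 of the Unicode Bidirectional Algorithm applied to whole runs. I am assuming this is the kind of reordering ScriptLayout performs; it is an illustration in portable C++, not Uniscribe’s actual implementation, and the names visualToLogical and invertMapping are my own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// `levels[i]` is the embedding level of logical run i (even = LTR,
// odd = RTL). Returns visual_to_logical: element v is the logical index of
// the run drawn at visual position v, counting from the left.
std::vector<int> visualToLogical(const std::vector<unsigned char>& levels) {
  std::vector<int> order(levels.size());
  for (std::size_t i = 0; i < order.size(); i++)
    order[i] = static_cast<int>(i);

  int highest = 0;
  int lowest_odd = -1;
  for (unsigned char l : levels) {
    highest = std::max<int>(highest, l);
    if (l % 2 == 1 && (lowest_odd < 0 || l < lowest_odd))
      lowest_odd = l;
  }
  if (lowest_odd < 0)
    return order;  // No RTL runs, so nothing moves.

  // From the highest level down to the lowest odd level, reverse every
  // maximal sequence of runs whose level is at least `level`.
  for (int level = highest; level >= lowest_odd; level--) {
    std::size_t i = 0;
    while (i < order.size()) {
      if (levels[order[i]] < level) {
        i++;
        continue;
      }
      std::size_t j = i;
      while (j < order.size() && levels[order[j]] >= level)
        j++;
      std::reverse(order.begin() + i, order.begin() + j);
      i = j;
    }
  }
  return order;
}

// Builds logical_to_visual from visual_to_logical.
std::vector<int> invertMapping(const std::vector<int>& visual_to_logical) {
  std::vector<int> logical_to_visual(visual_to_logical.size());
  for (std::size_t v = 0; v < visual_to_logical.size(); v++) {
    int l = visual_to_logical[v];
    assert(l >= 0 && static_cast<std::size_t>(l) < logical_to_visual.size());
    logical_to_visual[l] = static_cast<int>(v);
  }
  return logical_to_visual;
}
```

With levels {0, 1, 1, 0}, matching the four runs in the table, visualToLogical returns {0, 2, 1, 3}: the two RTL runs swap places and the LTR runs stay where they are.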

Example code

Assuming we have our “input” array in items from ScriptItemize, we can construct two lookup tables that allow us to convert between logical and visual run indices:

// Output arrays
std::vector<int> visual_to_logical;
std::vector<int> logical_to_visual;

// Construct the "embedding level" array for our list of runs that tell
// Uniscribe what direction they are. Here, we do NOT count the magic last item
// that is empty, we manually add it to the end of the lookup tables to keep
// everything consistent (it is always at the end). I'm not sure what
// ScriptLayout does with this item, so I prefer to handle it myself.
// This assumes items holds at least one real run plus the magic last item.
int num_runs = static_cast<int>(items.size()) - 1;
std::vector<BYTE> directions(num_runs);
for (int i = 0; i < num_runs; i++)
  directions[i] = static_cast<BYTE>(items[i].a.s.uBidiLevel);

// All three arrays hold exactly num_runs entries, so pass num_runs (not
// items.size()) to avoid writing past the end of the buffers.
visual_to_logical.resize(num_runs);
logical_to_visual.resize(num_runs);
HRESULT hr = ScriptLayout(num_runs, &directions[0],
                          &visual_to_logical[0], &logical_to_visual[0]);
if (FAILED(hr)) {
  // Handle error...
}

// Now add the magic last item back
visual_to_logical.push_back(num_runs);
logical_to_visual.push_back(num_runs);

ScriptShape

MSDN Documentation for ScriptShape

ScriptShape computes which glyphs to use for a given run that has already been identified with ScriptItemize. With the output of this function, you can call ScriptPlace to compute how the glyphs should be arranged. You can’t do much with just the glyphs, so I treat ScriptShape and ScriptPlace as a pair of functions that are always called together.

Parameters

HRESULT ScriptShape(
HDC	hdc,
SCRIPT_CACHE*	psc,
const WCHAR*	pwcChars,
int	cChars,
int	cMaxGlyphs,
SCRIPT_ANALYSIS*	psa,
WORD*	pwOutGlyphs,
WORD*	pwLogClust,
SCRIPT_VISATTR*	psva,
int*	pcGlyphs);

hdc, psc: An optional HDC with the desired font selected into it, plus the SCRIPT_CACHE of that font from a previous call. See MSDN’s explanation of how caching works.
pwcChars, cChars: The characters and length of the run as identified by ScriptItemize.
cMaxGlyphs: The size of the per-glyph buffers that you are giving it, namely the pwOutGlyphs and psva arrays. If these buffers are not big enough, the function will return E_OUTOFMEMORY and you will need to call it again with bigger buffers.
psa: A pointer to the SCRIPT_ANALYSIS structure that ScriptItemize computed for this run. This structure is in SCRIPT_ITEM.a for each run. You can modify this SCRIPT_ANALYSIS structure after getting it from ScriptItemize but before giving it to ScriptShape to override run parameters. For example, Mozilla disables shaping by setting SCRIPT_ANALYSIS.eScript = SCRIPT_UNDEFINED if ScriptShape fails. This might be a good approach if you find that ScriptShape fails with USP_E_SCRIPT_NOT_IN_FONT and you don’t have a good alternative font to use that might support the script. If you do this, note Mozilla bug 341500. Apparently, Uniscribe occasionally crashes if you disable shaping and there is a UTF-16 surrogate pair (representing a character above U+FFFF). Mozilla’s solution is to generate a new input string with UTF-16 surrogates replaced by U+FFFD, the Unicode “replacement character” (search for GenerateAlternativeString in gfxWindowsFonts.cpp).
pwOutGlyphs: A pointer to an array that will receive the glyph indices. This should be cMaxGlyphs long.
pwLogClust: A pointer to an array that will receive the logical cluster information. It should be the same size as the input. It will tell you, for each character in the input, the index of the first glyph in pwOutGlyphs that was generated from it. Some input characters won’t generate a glyph or may share a glyph with another character. Other input characters will generate more than one glyph. Also be aware that for RTL runs, the indices in pwLogClust will count backwards, since the glyph indices will be put in screen order from left to right. See the example input and output below for more information.
psva: A pointer to an array that will receive SCRIPT_VISATTR structures, one for each glyph. These give you information about each glyph, such as whether it is the first glyph in a cluster (see the example below). There are other flags that may be interesting to you (see the MSDN documentation).
pcGlyphs: On success, indicates how many glyphs were actually written to pwOutGlyphs and psva.

Example input and output

Let’s say you are processing a run consisting of the word “fiancé”, and that the font you are using maps “fi” to a single-glyph ligature, and “é” to two glyphs, one for the “e” and one for the accent.

Per-character information: The log tells you, for each input character, which glyph is the first glyph it generated.

	0	1	2	3	4	5
Input `pwcChars`:	f	i	a	n	c	é
Output `pwLogClust`:	0	0	1	2	3	4

Per-glyph information: You can see that the fClusterStart flag is set whenever the glyph is the first glyph in a “cluster.” A cluster is something that the user would think of as one letter or logical unit. Clusters do not necessarily correspond one-to-one with input characters or with glyphs: here, the “fi” cluster covers two characters but only one glyph, and the “é” cluster covers one character but two glyphs. If the “é” had instead been entered as two code points (a base “e” followed by a combining accent), the glyphs and clusters would still have been the same as in this example.

	0	1	2	3	4	5
Output `pwOutGlyphs`:	ﬁ	a	n	c	e	´
`SCRIPT_VISATTR[x].fClusterStart`:	1	1	1	1	1	0
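One way to use the cluster log is to find which glyphs belong to a given input character. The sketch below does this for a left-to-right run only; RTL runs count backwards and would need the comparison flipped. It is portable C++ with no Uniscribe calls, and GlyphIndex and glyphRangeForChar are hypothetical names of my own.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef unsigned short GlyphIndex;  // Same width as the Win32 WORD type.

// For an LTR run, returns the half-open glyph range [first, last) of the
// cluster that character `char_index` belongs to. `log_clust` has one entry
// per input character; `glyph_count` is the number of glyphs in the run.
std::pair<int, int> glyphRangeForChar(const std::vector<GlyphIndex>& log_clust,
                                      int glyph_count,
                                      std::size_t char_index) {
  assert(char_index < log_clust.size());
  int first = log_clust[char_index];
  // The cluster ends where the next character with a different cluster
  // value begins, or at the end of the glyph array.
  int last = glyph_count;
  for (std::size_t i = char_index + 1; i < log_clust.size(); i++) {
    if (log_clust[i] != first) {
      last = log_clust[i];
      break;
    }
  }
  return std::make_pair(first, last);
}
```

For the “fiancé” example (cluster log 0 0 1 2 3 4 with 6 glyphs), both “f” and “i” map to the glyph range [0, 1), and “é” maps to [4, 6), covering the “e” and the accent.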

Example code

How to loop over the runs identified by callScriptItemize above.

HFONT hfont = /* initialize your font */;
SCRIPT_CACHE script_cache = NULL;  // Initialize to NULL, will be filled lazily.

// Don't use the last item because it is a dummy that points
// to the end of the string.
for (size_t i = 0; i + 1 < items.size(); i++) {
  std::vector<WORD> logs;
  std::vector<WORD> glyphs;
  std::vector<SCRIPT_VISATTR> visattr;
  callScriptShape(&input[items[i].iCharPos],                // Beginning of this run.
                  items[i+1].iCharPos - items[i].iCharPos,  // Length of this run.
                  hfont, &script_cache, &items[i].a,
                  &logs, &glyphs, &visattr);
}

// Need to tell Uniscribe to delete the cache we were using. If you are going
// to keep the HFONT around, you should probably also keep the cache.
ScriptFreeCache(&script_cache);
DeleteObject(hfont);

How to turn each of those runs into a list of glyphs with ScriptShape:

// Called with one run from the array output by callScriptItemize, this will
// fill the output arrays with the glyph information for that run.
bool callScriptShape(const wchar_t* input, int input_length,  // IN: characters
                     HFONT hfont, SCRIPT_CACHE* script_cache, // IN: font info
                     SCRIPT_ANALYSIS* analysis,               // IN: from ScriptItemize
                     std::vector<WORD>* logs,                 // OUT: one per character
                     std::vector<WORD>* glyphs,               // OUT: one per glyph
                     std::vector<SCRIPT_VISATTR>* visattr)    // OUT: one per glyph
{
  // Initial size guess for the number of glyphs recommended by Uniscribe
  glyphs->resize(input_length * 3 / 2 + 16);
  visattr->resize(glyphs->size());

  // The logs array is the same size as the input.
  logs->resize(input_length);

  HDC temp_dc = NULL;  // Don't give it a DC unless we have to.
  HFONT old_font = NULL;
  HRESULT hr;
  while (true) {
    int glyphs_used = 0;
    // cMaxGlyphs is the size of the glyph buffers, not of the logs array.
    hr = ScriptShape(temp_dc, script_cache, input, input_length,
                     static_cast<int>(glyphs->size()), analysis,
                     &(*glyphs)[0], &(*logs)[0], &(*visattr)[0],
                     &glyphs_used);

    if (SUCCEEDED(hr)) {
      // It worked, resize the output lists to the exact number it returned.
      glyphs->resize(glyphs_used);
      visattr->resize(glyphs_used);
      break;
    }

    // Different types of failure...
    if (hr == E_PENDING && temp_dc == NULL) {
      // Need to select the font for the call. Don't do this if we don't have to
      // since it may be slow.
      temp_dc = GetDC(NULL);
      old_font = static_cast<HFONT>(SelectObject(temp_dc, hfont));
      // Loop again...

    } else if (hr == E_OUTOFMEMORY) {
      // The glyph buffers need to be larger. Just double them every time,
      // keeping both the same size.
      glyphs->resize(glyphs->size() * 2);
      visattr->resize(glyphs->size());
      // Loop again...

    } else if (hr == USP_E_SCRIPT_NOT_IN_FONT) {
      // The font you selected doesn't have enough information to display
      // what you want. You'll have to pick another one somehow...
      // For our cases, we'll just return failure.
      break;

    } else {
      // Some other failure (including E_PENDING even though a DC was given).
      break;
    }
  }

  if (temp_dc) {
    SelectObject(temp_dc, old_font);  // Put back the previous font.
    ReleaseDC(NULL, temp_dc);
  }

  return SUCCEEDED(hr);
}

ScriptPlace

MSDN Documentation for ScriptPlace

ScriptPlace computes the positions (advance widths and offsets) of the glyphs in a run. It is called with the output of ScriptShape. With the output of this function, you can compute the width of the run for layout purposes, and draw the run using ScriptTextOut.

Parameters

HRESULT ScriptPlace(
HDC	hdc,
SCRIPT_CACHE*	psc,
const WORD*	pwGlyphs,
int	cGlyphs,
const SCRIPT_VISATTR*	psva,
SCRIPT_ANALYSIS*	psa,
int*	piAdvance,
GOFFSET*	pGoffset,
ABC*	pABC);

hdc, psc: An optional HDC with the desired font selected into it, plus the SCRIPT_CACHE of that font from a previous call. MSDN actually has a pretty good explanation of how caching works.
pwGlyphs, cGlyphs, psva: These are the things computed by ScriptShape: the glyph indices, the number of glyphs, and the SCRIPT_VISATTR array corresponding to each glyph.
psa: The SCRIPT_ANALYSIS object filled in by ScriptItemize for this run. MSDN says this structure will be modified, but it is not clear in what ways.
piAdvance: Receives an array of the advance widths, one per glyph. This is the amount to advance after drawing the corresponding glyph, to get to the next glyph. Some advance widths may be 0 to cause glyphs to overlap, for example, to combine a base glyph with a combining accent (see pGoffset below). See the “Example input and output” below.
pGoffset: This is an array of GOFFSET structures, one for each glyph. The GOFFSET indicates an offset amount (horizontally GOFFSET.du and vertically GOFFSET.dv) by which the associated glyph should be shifted when it is drawn. Generally, this shifting amount will be 0, but it will be nonzero to move combining accents, Hebrew vowel points, or other “decorations” to the correct position relative to the base glyph. The application generally doesn’t have to pay attention to these offsets at all. They are generated by ScriptPlace and used by ScriptTextOut, and all the application needs to do is keep track of the values in the meantime. For example: ế = e + ˆ + ́. In this example, there are three glyphs, the first two with an advance of 0, and the third with an advance of the width of the combination. This causes the glyphs to be drawn over the top of each other. Depending on the font, however, the position of the accents if rendered over the top of the “e” may not be correct (in this example, the top (acute) accent may not be high enough to fit over the circumflex). In this case, the GOFFSET indicates how they should be moved to produce the proper combination.
pABC: This is a pointer to one ABC structure that contains the width information for the entire run. Normally, the ABC structure tells you how to place a glyph in relation to the surrounding characters. Here, however, I’m not sure what the use is beyond giving you an easy way to compute the width of the run. It appears that the sum of the ABC widths abc.abcA + abc.abcB + abc.abcC of the run exactly equals the sum of the advance widths of each glyph in the run. It seems that one is not supposed to treat the A and C widths (normally extra space that one needs to apply to calculations) manually, as these are included in the advance width of the first and the last glyph, respectively. Since the application does not account for the advance itself, and the ABC width is not passed to any other Uniscribe functions, I’m not sure how ScriptTextOut knows how to place the first character in relation to the start of the run. Perhaps this is one of the “reserved” fields in the SCRIPT_VISATTR structure, or perhaps this information is stored elsewhere.

Example input and output

This example shows how “écrit” might be represented. Notice that the “e” has no advance, causing the next glyph, an accent, to be drawn over the top. The accent also has a small offset to move it into the appropriate place over the “e”. The advance for the accent takes us over the “e” and to where the “c” should begin.

Input glyph	Output advance	Output offset
e	0	(0,0)
´	16	(1,-2)
c	16	(0,0)
r	11	(0,0)
i	8	(0,0)
t	10	(0,0)
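To show how the advances add up, the following sketch computes the pen position before each glyph and the total width of a left-to-right run. It is plain C++ that works on a copy of the advances array rather than calling Uniscribe, and penPosition and runWidth are names I invented for the illustration. The offsets are deliberately left out, since as far as I can tell they only shift where a glyph is drawn and do not move the pen.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Pen x position just before glyph `glyph_index` is drawn, for an LTR run
// whose pen starts at 0. Passing advances.size() gives the position after
// the last glyph.
int penPosition(const std::vector<int>& advances, std::size_t glyph_index) {
  assert(glyph_index <= advances.size());
  return std::accumulate(
      advances.begin(),
      advances.begin() + static_cast<std::ptrdiff_t>(glyph_index), 0);
}

// Total width of the run: the sum of all advance widths.
int runWidth(const std::vector<int>& advances) {
  return penPosition(advances, advances.size());
}
```

For the “écrit” table above (advances 0, 16, 16, 11, 8, 10), the accent is drawn with the pen still at 0, the “c” starts at 16, and the run is 61 units wide.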

Example code

// This fragment assumes hdc, hfont, script_cache, glyphs, visattr, and
// analysis (the SCRIPT_ITEM.a for this run) come from the surrounding code.

// Outputs from this function, the two arrays should be same length as the number of glyphs.
std::vector<int> advances;
std::vector<GOFFSET> offsets;
advances.resize(glyphs.size());
offsets.resize(glyphs.size());
ABC abc;

HDC temp_dc = NULL;  // Don't give it a DC unless we have to.
HFONT old_font = NULL;
HRESULT hr;  // Declared outside the loop so it can be checked afterward.
while (true) {
    hr = ScriptPlace(temp_dc,
                     script_cache,
                     &glyphs[0],    // From previous call to ScriptShape
                     static_cast<int>(glyphs.size()),
                     &visattr[0],   // From previous call to ScriptShape
                     &analysis,     // From previous call to ScriptItemize
                     &advances[0],  // Output: glyph advances
                     &offsets[0],   // Output: glyph offsets
                     &abc);         // Output: size of run
    if (hr != E_PENDING || temp_dc != NULL)
        break;  // Done with the call (or it failed even with a DC).

    // Need to select the font for the call. Don't do this if we don't have to
    // since it may be slow.
    temp_dc = hdc;
    old_font = static_cast<HFONT>(SelectObject(hdc, hfont));
}

if (temp_dc)
    SelectObject(hdc, old_font);  // Put back the previous font.

if (FAILED(hr)) {
    // Handle error...
}

ScriptJustify

MSDN Documentation for ScriptJustify

ScriptJustify allows you to expand text to fit a column width. For most languages, justification is straightforward because one can just distribute the additional space between all the spaces in the line. Given input in English, for example, this is exactly what ScriptJustify will do. Arabic, however, is more complicated. Justification involves adding additional lines called kashidas between certain characters, and ScriptJustify will handle this properly.

ScriptJustify is therefore very good for justification of Latin or Arabic scripts. It will also be good for longer runs of Arabic that have a few Latin-based words in them. ScriptJustify also does not require that its input is only one run. As long as you can collapse your runs to form a single input array, it will distribute spaces appropriately between all the runs on the line. You will then have to expand these back into arrays that correspond to your runs so you can use the rest of the Uniscribe functions.

However, because it will favor adding kashidas rather than spaces between words, if you have some text that is mostly in a Latin script but with one or two Arabic words in it, ScriptJustify probably doesn’t do what you want. It will assign all the extra space to the Arabic words in the form of kashidas, which may make them look overly extended. If you want to handle this case, you may want to write your own algorithm. For example, one approach would be to count the number of space-separated words, and distribute the amount of space you are adding to each of the runs in individual calls to ScriptJustify in proportion to the number of words they have. This will distribute space evenly between Arabic kashidas and spaces between Latin-based words.
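Here is a minimal sketch of the proportional approach just described, assuming you have already counted the space-separated words in each run. It only does the arithmetic and does not call Uniscribe. The function name splitExtraSpace is hypothetical, and each returned share would be passed as iDx to a separate ScriptJustify call for that run.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits `extra` units of space between runs in proportion to their word
// counts. Cumulative rounding is used so the shares always add up to
// exactly `extra` (unless there are no words at all, in which case every
// share is 0).
std::vector<int> splitExtraSpace(const std::vector<int>& words_per_run,
                                 int extra) {
  std::vector<int> shares(words_per_run.size(), 0);
  long long total_words = 0;
  for (int w : words_per_run) {
    assert(w >= 0);
    total_words += w;
  }
  if (total_words == 0)
    return shares;

  long long words_so_far = 0;
  int assigned = 0;
  for (std::size_t i = 0; i < words_per_run.size(); i++) {
    words_so_far += words_per_run[i];
    // Space that should have been handed out by the end of run i.
    int target = static_cast<int>(static_cast<long long>(extra) *
                                  words_so_far / total_words);
    shares[i] = target - assigned;
    assigned = target;
  }
  return shares;
}
```

For runs with 6, 2, and 4 words and 30 extra units, the shares come out as 15, 5, and 10. Rounding error drifts toward later runs, which seemed like an acceptable trade-off for a sketch.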

Parameters

HRESULT ScriptJustify(
const SCRIPT_VISATTR*	psva,
const int*	piAdvance,
int	cGlyphs,
int	iDx,
int	iMinKashida,
int*	piJustify);

psva: An array of SCRIPT_VISATTR structures, one for each glyph, that was computed by ScriptShape.
piAdvance: An array of advance widths computed, one for each glyph, by ScriptPlace.
cGlyphs: The number of glyphs in the run.
iDx: The amount of space that should be added to the input. The MSDN documentation states that this is the length of the desired line, but that is incorrect. You need to measure the total advance width of all the glyphs in all the runs (this is also the sum of the ABC widths of the runs), and subtract that from your desired line width to get iDx.
iMinKashida: The minimum amount of space to assign to a kashida. MSDN offers little guidance on what this is for. In talking to somebody more informed than I am, I found that “it’s the width of the glyph used for the tatweel character U+0640 in the current font family and font size.” The reason it needs this is presumably that you haven’t given it an HFONT or an HDC at this point in layout, so it can’t retrieve the value itself. Michael Kaplan also wrote a blog post discussing kashidas which touches on this topic. For what it’s worth, Chrome hard-codes this to 2 and it works OK, though this may cause bugs at extreme font sizes.
piJustify: The output array, one entry for each input glyph, that contains the new widths that the glyphs take up. Each entry is the old advance width plus any additional width added for justification purposes. You should use this in place of the advance widths for many purposes, such as measuring the width of the text or calling ScriptXtoCP and ScriptCPtoX. This array is passed into ScriptTextOut in addition to the regular advance widths. I’m guessing, but I assume that ScriptTextOut compares the justified advances with the regular advances. Any extra space is treated according to the SCRIPT_VISATTR.uJustification field associated with each glyph, be it a kashida, a space or a number of other possibilities.
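As a rough sketch of the iDx computation described in the parameter list, assuming you kept the advance widths from ScriptPlace in one array per run (the name computeJustifyDelta and the nested-vector layout are my own assumptions for illustration):

```cpp
#include <numeric>
#include <vector>

// Returns the value to pass as iDx: the desired line width minus the current
// width of all glyphs on the line. |advances_per_run| holds, for each run,
// the advance widths that ScriptPlace produced for that run.
int computeJustifyDelta(const std::vector<std::vector<int>>& advances_per_run,
                        int desired_line_width)
{
    int current_width = 0;
    for (const std::vector<int>& advances : advances_per_run)
        current_width += std::accumulate(advances.begin(), advances.end(), 0);
    return desired_line_width - current_width;
}
```

With two runs whose advances are {10, 12} and {8} and a desired line width of 40, the current width is 30 and the function returns 10.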

ScriptXtoCP

MSDN Documentation for ScriptXtoCP

ScriptXtoCP converts a pixel offset to a character position given a whole lot of information computed by ScriptPlace (see above). Because ScriptXtoCP operates only on single runs, you will need to skip over whole runs yourself until you find the run with the given offset. Only then can you call this function. See the example on how to do this.

This function is not very difficult to call. You mostly just have to gather the information you computed in the earlier steps, so I’ll refer you to the MSDN documentation for the details. Note that if the X position occurs before any characters, the returned character position will be -1.

One tricky thing is that if you called ScriptJustify to expand the glyphs, the advances returned by ScriptPlace won’t represent where the glyphs actually are and you’ll get incorrect results. In this case, you should instead pass in the piJustify array computed by ScriptJustify for the piAdvance parameter.

Example code

This example assumes your input runs are in the order returned by ScriptItemize, which is the same order as the input. However, the presence of right-to-left text will mean that these items should actually be displayed in a different order. This example uses the visual_to_logical lookup table computed in the example for ScriptLayout above.

// Call with an X coordinate relative to the left-hand side of the list of runs.
int callScriptXtoCP(int x)
{
    // Stop before the last item to skip the dummy item at the end.
    // (See ScriptItemize above).
    for (int i = 0; i < static_cast<int>(items.size()) - 1; i++) {
        int item_index = visual_to_logical[i];

        // Look up the information you previously stored about run |item_index|.
        // This example assumes you have arrays called |logs|, |visattrs|,
        // |abcs|, etc., each holding the corresponding data indexed by the
        // logical run index.

        // See if the X position is in the current run. We assume that the text
        // has not been justified so that the ABC width of the run is the width
        // of it on the screen. If this is not the case, we'd need to add up
        // the justified advances as returned by ScriptJustify to find the
        // screen width of it.
        int run_width = abcs[item_index].abcA + abcs[item_index].abcB +
                abcs[item_index].abcC;
        if (x > run_width) {
            // Not in the current run, adjust the X position to account for this
            // run and go to the next one.
            x -= run_width;
            continue;
        }

        int item_char_length = items[item_index + 1].iCharPos - items[item_index].iCharPos;
        // Assume we have a glyphs[] array that stores the array of glyphs for each run.
        int item_glyph_length = static_cast<int>(glyphs[item_index].size());

        // This code assumes you haven't called ScriptJustify. If you did, you
        // should use the justified widths returned by that instead of the
        // advances returned by ScriptPlace.
        int cp, trailing;
        ScriptXtoCP(x, item_char_length, item_glyph_length, &logs[item_index][0],
                    &visattrs[item_index][0], &advances[item_index][0],
                    &items[item_index].a, &cp, &trailing);
        // |cp| is relative to the start of this run. Add
        // items[item_index].iCharPos if you need an offset into the whole input.
        return cp;
    }
    // X position is not in the text, you'll have to decide what to do here...
    return YO_MAMA;
}
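If the text has been justified, the run-skipping step has to use the justified widths instead of the ABC widths. A minimal sketch of that step, separated from the Uniscribe call so it can be tested on its own (RunHit and findRunAtX are hypothetical names, and the input is assumed to be already arranged in visual order):

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

struct RunHit {
    int visual_index;  // Position of the run in screen order, or -1 if none.
    int x_in_run;      // X offset relative to the left edge of that run.
};

// |justified| holds one array of justified widths (the piJustify output of
// ScriptJustify) per run, with the runs in visual (screen) order.
RunHit findRunAtX(const std::vector<std::vector<int>>& justified, int x)
{
    if (x < 0)
        return {-1, 0};
    for (std::size_t i = 0; i < justified.size(); i++) {
        const int run_width =
            std::accumulate(justified[i].begin(), justified[i].end(), 0);
        if (x < run_width)
            return {static_cast<int>(i), x};
        x -= run_width;  // Skip this run and keep looking to the right.
    }
    return {-1, 0};  // X is past the end of the line.
}
```

The x_in_run value is what you would pass to ScriptXtoCP, together with the justified widths for that run in place of the ScriptPlace advances. Unlike the example above, this sketch treats an X exactly on the boundary between two runs as belonging to the run on the right.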

ScriptCPtoX

MSDN Documentation for ScriptCPtoX

ScriptCPtoX converts character positions to offsets. Like ScriptXtoCP, the parameters, though numerous, are not very difficult to figure out, so I’ll mostly refer you to the MSDN documentation.

The key to calling this function is that it only handles one run, so you have to manually add up the widths of the runs that precede it to the left on the screen (the screen order of the runs is computed by ScriptLayout).

As with ScriptXtoCP, if you have justified the text, you should pass the justified advances returned by ScriptJustify for the advance parameter of ScriptCPtoX to get the correct results.

Example code

This example assumes you have the visual_to_logical lookup table computed in the example above for ScriptLayout.

int callScriptCPtoX(int cp)
{
    if (cp < 0 || cp >= input_char_length) {
        // Figure out what to return in this error case.
        return YO_MAMA;
    }

    // First, find the logical run that contains the given character.
    int run_index = -1;
    for (int i = 0; i < static_cast<int>(items.size()) - 1; i++) {
        // Check the starting point of the following run to see if the
        // character comes before it and is therefore contained in run |i|.
        // The magic last run has no length, so we don't need to worry about
        // checking for characters in it.
        if (cp < items[i + 1].iCharPos) {
            run_index = i;
            break;
        }
    }
    if (run_index < 0)
        return YO_MAMA;  // Some error

    // Figure out the X position within the run of the given character.
    int item_char_length = items[run_index + 1].iCharPos - items[run_index].iCharPos;
    int x_within_run;

    // We assume the glyphs have not been justified. If they have been, see above.
    ScriptCPtoX(cp - items[run_index].iCharPos,  // Offset within this run.
                FALSE,  // Use leading edge (normally what you want)
                item_char_length,
                static_cast<int>(glyphs[run_index].size()),  // Glyph length of run.
                &logs[run_index][0], &visattrs[run_index][0],
                &advances[run_index][0], &items[run_index].a,
                &x_within_run);

    // Now that we have the offset within that run, we need to compute the
    // total width of all runs to the left of it on the screen. We iterate all
    // runs in screen order, adding up their widths, until we find the one we
    // used above.
    int preceding_width = 0;
    for (int i = 0; i < static_cast<int>(items.size()) - 1; i++) {
        if (visual_to_logical[i] == run_index) {
            // Found the run, so everything to the left of it, plus the offset
            // within the run, is the final answer.
            return preceding_width + x_within_run;
        }

        // We assume that the text has not been justified so that the ABC width
        // of the run is the width of it on the screen. If this is not the
        // case, we'd need to add up the justified advances as returned by
        // ScriptJustify to find the screen width of it.
        const ABC& abc = abcs[visual_to_logical[i]];
        preceding_width += abc.abcA + abc.abcB + abc.abcC;
    }

    // Error, who knows what to do.
    return YO_MAMA;
}
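If you call this often, one possible variation is to compute the left edge of every run once per line and look it up afterwards, rather than re-adding the widths on every call. A sketch under that assumption (computeRunLeftEdges is a hypothetical helper; run_widths would hold either the ABC widths or the summed justified advances, indexed by logical run):

```cpp
#include <cstddef>
#include <vector>

// Returns, for each logical run, the X coordinate of its left edge on screen.
// |visual_to_logical[i]| is the logical index of the i-th run from the left;
// |run_widths| is indexed by logical run.
std::vector<int> computeRunLeftEdges(const std::vector<int>& visual_to_logical,
                                     const std::vector<int>& run_widths)
{
    std::vector<int> left_edges(run_widths.size(), 0);
    int x = 0;
    for (std::size_t i = 0; i < visual_to_logical.size(); i++) {
        const int logical = visual_to_logical[i];
        left_edges[logical] = x;  // Everything placed so far is to the left.
        x += run_widths[logical];
    }
    return left_edges;
}
```

The final answer for a character is then left_edges[run_index] + x_within_run.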

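This article comes from 学习日记 (Study Diary). When reposting, please credit the source and include the corresponding link.

Permanent link: https://www.softwareace.cn/?p=250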
