Locale Segmentation and Bidi

The library is designed for multilingual text, not just ASCII paragraphs.

Locale-aware segmentation

PretextLayout.SetLocale overrides the locale used for segmentation. On desktop targets, the implementation can use ICU word breaking when it is available, which improves token boundaries for scripts that simple whitespace splitting handles poorly.

This matters most for:

  • Thai and other scripts that do not rely on ordinary spaces between words
  • multilingual paragraphs where punctuation attachment differs by script
  • tests where you want stable expectations independent of machine culture
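To make the first point concrete, here is a minimal, language-neutral sketch in Python of why whitespace splitting falls short for Thai. It does not use Pretext or ICU. The helper name `whitespace_tokens` and the sample strings are illustrative assumptions, not part of the library.

```python
# Illustration only: plain whitespace splitting, with no dictionary or ICU support.

def whitespace_tokens(text: str) -> list[str]:
    """Split on runs of whitespace, the simplest possible tokenizer."""
    return text.split()

english = "hello world again"
# Two Thai phrases separated by one space. Each phrase contains more than
# one word, but Thai does not put spaces between those words.
thai = "สวัสดีครับ ยินดีต้อนรับ"

english_tokens = whitespace_tokens(english)
thai_tokens = whitespace_tokens(thai)

print(english_tokens)    # ['hello', 'world', 'again']
print(len(thai_tokens))  # 2, one token per phrase rather than one per word
```

A dictionary-based word breaker such as the one ICU provides would be expected to find additional boundaries inside each Thai phrase, which gives line fitting more places to wrap.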

If you do not call SetLocale, Pretext uses the current UI culture when available.

Bidirectional text

Prepared results can also carry per-segment levels for mixed-direction text. This matters when a single line mixes Latin, Arabic, and Hebrew text with punctuation and other content, and the line must still wrap predictably.

When bidi content is present:

  • PreparedTextWithSegments.SegmentLevels contains one level per segment
  • segments can still be line-fit deterministically
  • materialized line text remains aligned with the same cursor boundaries used by streamed layout
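The idea of "one level per segment" can be sketched outside the library. The Python example below is a deliberately simplified assumption, not Pretext's algorithm and not the full Unicode Bidirectional Algorithm: each segment takes level 0 or 1 from its first strongly directional character, and segments with no strong character fall back to the base level. The function name `first_strong_level` and the sample segments are hypothetical.

```python
import unicodedata

def first_strong_level(segment: str, base_level: int = 0) -> int:
    """Return 0 for a left-to-right segment, 1 for a right-to-left one.

    Simplification: only the first strong character is inspected, and
    neutral-only segments (spaces, punctuation) take the base level.
    """
    for ch in segment:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return 0
        if bidi_class in ("R", "AL"):  # Hebrew is R, Arabic letters are AL
            return 1
    return base_level

# Hypothetical pre-segmented input: Latin, space, Hebrew, space, Latin.
segments = ["Hello", " ", "שלום", " ", "world"]
levels = [first_strong_level(s) for s in segments]

print(levels)  # [0, 0, 1, 0, 0]
```

The property worth noting is that the level list has the same length as the segment list, so a level can be looked up by segment index after line fitting. A full implementation also resolves neutrals, numbers, and explicit embeddings, which this sketch ignores.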

Practical advice

  • Use the current UI culture unless the content itself calls for a more specific locale.
  • Call SetLocale in tests when you need stable expectations.
  • Validate sample strings from your actual product languages, not just English.

What the current implementation covers

The shipped tests explicitly cover:

  • Arabic punctuation attachment
  • Arabic punctuation-plus-mark clusters
  • Devanagari danda attachment
  • Myanmar punctuation and possessive markers
  • CJK punctuation and iteration marks
  • mixed-direction smoke tests
  • Thai locale-sensitive segmentation

Further reading

The behavior in this area is informed by Unicode guidance: