Locale Segmentation and Bidi

The library is designed for multilingual text, not just ASCII paragraphs.

Locale-aware segmentation

PretextLayout.SetLocale overrides the locale used for segmentation. On desktop targets, the implementation can use ICU word breaking when it is available. This gives better token boundaries for scripts that simple whitespace splitting handles poorly.
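
The sketch below contrasts naive whitespace splitting with ICU word breaking. It is written in TypeScript and uses the standard `Intl.Segmenter` API, which browsers and Node.js back with ICU. It illustrates the technique only and is not Pretext's API. The Thai sample string and the two helper names are illustrative assumptions.

```typescript
// Naive baseline: split on runs of whitespace.
function whitespaceTokens(text: string): string[] {
  return text.split(/\s+/).filter((t) => t.length > 0);
}

// ICU-backed word segmentation via the standard Intl.Segmenter API.
// Every segment (words, spaces, punctuation) is kept, so the pieces
// concatenate back to the input without loss.
function icuWordTokens(text: string, locale: string): string[] {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

// Roughly "Thai has no spaces between words"; note the string contains no spaces.
const thai = "ภาษาไทยไม่มีช่องว่างระหว่างคำ";

console.log(whitespaceTokens(thai).length); // 1: the whole sentence is one token
console.log(icuWordTokens(thai, "th")); // word-sized segments when Thai ICU data is present
```

Whitespace splitting treats the whole Thai sentence as a single unbreakable token. A dictionary-based ICU break iterator instead finds word boundaries that a line breaker can use.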

This matters most for:

Thai and other scripts that do not rely on ordinary spaces between words
multilingual paragraphs where punctuation attachment differs by script
tests where you want stable expectations independent of machine culture

If you do not call SetLocale, Pretext uses the current UI culture when available.
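
The fallback rule is simple: an explicit locale wins, and otherwise the host default applies. It can be sketched as follows. `resolveSegmentationLocale` is a hypothetical helper, and the runtime's default locale stands in for the current UI culture. This is not how Pretext resolves its locale internally.

```typescript
// Hypothetical helper: an explicit locale wins; otherwise Intl falls back
// to the runtime's default locale (the closest analogue to "current UI culture").
function resolveSegmentationLocale(override?: string): string {
  // Passing undefined asks Intl for the runtime default; resolvedOptions()
  // reports the locale actually chosen after negotiation.
  return new Intl.Segmenter(override, { granularity: "word" }).resolvedOptions().locale;
}

console.log(resolveSegmentationLocale());        // varies with the host machine
console.log(resolveSegmentationLocale("en-US")); // pinned, independent of the host
```

The first call can return a different locale on every machine that runs it. Any assertion that depends on segment boundaries should therefore use an explicit locale.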

Bidirectional text

Prepared results can also carry per-segment bidi levels for mixed-direction text. This matters when a line mixes Latin, Arabic, or Hebrew text with punctuation and other content that should still wrap predictably.
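
The sketch below shows what a per-segment level array can look like. It is deliberately simplified and rests on several assumptions:

- The paragraph is left-to-right (base level 0).
- There are no explicit embeddings.
- Arabic and Hebrew letters are the only right-to-left characters.

It approximates only the neutral-resolution rules of the Unicode Bidirectional Algorithm (UAX #9). It is not how Pretext computes `SegmentLevels`. `strongLevel` and `segmentLevels` are hypothetical names.

```typescript
// Simplified sketch, NOT a full UAX #9 implementation. Assumes an LTR
// paragraph (base level 0), no explicit embeddings, and digits as neutral.
const RTL_SCRIPT = /[\p{Script=Arabic}\p{Script=Hebrew}]/u;
const LETTER = /\p{L}/u;

// Direction of a segment's first letter: 1 = RTL, 0 = LTR, null = neutral.
function strongLevel(segment: string): number | null {
  for (const ch of segment) {
    if (LETTER.test(ch)) return RTL_SCRIPT.test(ch) ? 1 : 0;
  }
  return null;
}

function segmentLevels(segments: string[]): number[] {
  const BASE = 0;
  const strong = segments.map(strongLevel);
  return strong.map((level, i) => {
    if (level !== null) return level;
    // Neutral segment: find the nearest strong segment on each side
    // (paragraph edges count as the base direction).
    let before = BASE;
    for (let j = i - 1; j >= 0; j--) {
      const s = strong[j];
      if (typeof s === "number") { before = s; break; }
    }
    let after = BASE;
    for (let j = i + 1; j < strong.length; j++) {
      const s = strong[j];
      if (typeof s === "number") { after = s; break; }
    }
    // Same direction on both sides: take it; otherwise use the base level.
    return before === after ? before : BASE;
  });
}

console.log(segmentLevels(["Hello", " ", "مرحبا", " ", "بالعالم", "!"]));
// → [0, 0, 1, 1, 1, 0]
```

The space between the two Arabic words resolves to level 1. The trailing "!" sits between Arabic text and the paragraph end, so it falls back to the base level.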

When bidi content is present:

PreparedTextWithSegments.SegmentLevels contains one level per segment
segments can still be line-fit deterministically
materialized line text remains aligned with the same cursor boundaries used by streamed layout
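
The last two properties can be illustrated with a hypothetical greedy fitter in which a cursor is simply a segment index. Widths are code-point counts, standing in for real font measurement. `fitLines` and `materialize` are illustrative names, not Pretext APIs. Line text is built from the same index ranges the fitter produced, so the materialized lines cannot drift from the layout's boundaries.

```typescript
// A line is a half-open range of segment indices: [start, end).
interface LineRange {
  start: number;
  end: number;
}

// Greedy, deterministic fitting: the same widths and maxWidth always
// produce the same ranges. A segment wider than maxWidth gets a line
// of its own instead of being split.
function fitLines(widths: number[], maxWidth: number): LineRange[] {
  const result: LineRange[] = [];
  let start = 0;
  let used = 0;
  widths.forEach((w, i) => {
    // Break before segment i if it would overflow a non-empty line.
    if (i > start && used + w > maxWidth) {
      result.push({ start, end: i });
      start = i;
      used = 0;
    }
    used += w;
  });
  if (start < widths.length) result.push({ start, end: widths.length });
  return result;
}

// Build each line's text from exactly the cursor range the fitter chose.
function materialize(segments: string[], ranges: LineRange[]): string[] {
  return ranges.map(({ start, end }) => segments.slice(start, end).join(""));
}

const words = ["The", " ", "quick", " ", "brown", " ", "fox"];
// Code-point count as a stand-in for measured width.
const ranges = fitLines(words.map((s) => Array.from(s).length), 10);
console.log(materialize(words, ranges)); // ["The quick ", "brown fox"]
```

Unlike a production layout engine, this sketch counts trailing spaces toward line width instead of letting them hang past the margin.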

Practical advice

Rely on the current UI culture unless you know a more specific locale for the content.
Call SetLocale in tests when you need stable expectations.
Validate sample strings from your actual product languages, not just English.
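
One way to follow this advice is a small invariant check over sample strings, each with an explicitly pinned locale. The sketch uses the standard `Intl.Segmenter` as a stand-in for the library's segmenter. The sample strings and the `segmentWords` and `checkSample` helpers are illustrative assumptions. The invariants checked (lossless, no empty segments, deterministic) should hold for any locale, so the check does not depend on the machine's culture.

```typescript
// Illustrative samples; replace with strings from your product languages.
const samples: Array<{ locale: string; text: string }> = [
  { locale: "en", text: "Hello, world." },
  { locale: "th", text: "ภาษาไทยไม่มีช่องว่างระหว่างคำ" },
  { locale: "ar", text: "مرحبا بالعالم، كيف حالك؟" },
  { locale: "hi", text: "यह एक वाक्य है।" },
  { locale: "ja", text: "日本語の文章です。" },
];

function segmentWords(locale: string, text: string): string[] {
  // The locale is always explicit, so the host's default never leaks in.
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

// Throws if any invariant fails; returns the segments otherwise.
function checkSample(locale: string, text: string): string[] {
  const parts = segmentWords(locale, text);
  if (parts.join("") !== text) {
    throw new Error(`lossy segmentation for ${locale}`);
  }
  if (parts.some((p) => p.length === 0)) {
    throw new Error(`empty segment for ${locale}`);
  }
  if (JSON.stringify(segmentWords(locale, text)) !== JSON.stringify(parts)) {
    throw new Error(`non-deterministic segmentation for ${locale}`);
  }
  return parts;
}

for (const { locale, text } of samples) {
  console.log(locale, checkSample(locale, text).length, "segments");
}
```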

What the current implementation covers

The shipped tests explicitly cover:

Arabic punctuation attachment
Arabic punctuation-plus-mark clusters
Devanagari danda attachment
Myanmar punctuation and possessive markers
CJK punctuation and iteration marks
mixed-direction smoke tests
Thai locale-sensitive segmentation
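
Several of these cases involve punctuation attachment: keeping closing marks on the same line as the word before them. Examples are the Arabic comma (،), the Devanagari danda (।), and the CJK full stop (。). The sketch shows the general idea on pre-split segments. The `attachPunctuation` helper and its character class (Unicode general categories Po, Pe, Pf) are assumptions for illustration, not Pretext's actual rule set.

```typescript
// Hypothetical rule: closing punctuation joins the preceding segment so a
// line break can never strand it at the start of a line. This is a rough
// class: Po also matches marks like "¿" that should attach forward instead.
const ATTACH_TO_PREVIOUS = /^[\p{Po}\p{Pe}\p{Pf}]+$/u;

function attachPunctuation(segments: string[]): string[] {
  const out: string[] = [];
  for (const seg of segments) {
    const prev = out.length > 0 ? out[out.length - 1] : undefined;
    // Only attach to a preceding segment that does not end in whitespace.
    if (prev !== undefined && ATTACH_TO_PREVIOUS.test(seg) && !/\s$/u.test(prev)) {
      out[out.length - 1] = prev + seg;
    } else {
      out.push(seg);
    }
  }
  return out;
}

console.log(attachPunctuation(["كيف", " ", "حالك", "؟"])); // → ["كيف", " ", "حالك؟"]
console.log(attachPunctuation(["वाक्य", " ", "है", "।"]));  // → ["वाक्य", " ", "है।"]
console.log(attachPunctuation(["文章", "です", "。"]));      // → ["文章", "です。"]
```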
