Text Fragment Rendering

24 October 2025, 16:15

Text fragment rendering displays specific portions of document text (defined by character ranges) with all inline embeds resolved to their actual content.

Overview

When embedding a document fragment with a range (e.g., hm://account/doc#blockId[10:50]), the system needs to:

Extract code points 10 through 49 of the block's text (the end of the range is exclusive)
Resolve any inline embeds within that range
Display the resulting text with embed names
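
As a rough illustration of the first step, the range can be read out of the URL fragment before any text is fetched. The sketch below is an assumption about how that parsing might look: parseFragmentRef and the FragmentRef shape are hypothetical names and are not taken from the codebase.

```typescript
// Hypothetical helper: name and return shape are illustrative only.
interface FragmentRef {
  blockRef: string
  start: number
  end: number
}

function parseFragmentRef(url: string): FragmentRef | null {
  const hashIndex = url.indexOf('#')
  if (hashIndex === -1) return null
  const fragment = url.slice(hashIndex + 1)
  // Expect "<blockId>[<start>:<end>]" with non-negative integer bounds.
  const match = /^([^\[\]]+)\[(\d+):(\d+)\]$/.exec(fragment)
  if (!match) return null
  const start = Number(match[2])
  const end = Number(match[3])
  // Reject inverted ranges instead of guessing what was meant.
  if (end < start) return null
  return {blockRef: match[1], start, end}
}
```

A URL without a bracketed range yields null here, which a caller could treat as a request for normal block rendering.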

The Challenge

Fragment ranges are specified as Unicode code point positions in the original text. These positions:

Count actual text characters
Exclude invisible inline embed markers (U+FEFF)
Use Unicode code points (not UTF-16 code units)

But we want to display text where:

Inline embeds are replaced with document names
Character positions map correctly to the original range
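
To make the counting rules concrete, here is a minimal sketch of how positions in the original text could be counted under the rules listed above: by code point, skipping U+FEFF markers. The function name countTextCodePoints is hypothetical and only illustrates the rules; it is not part of the described component.

```typescript
const EMBED_MARKER = '\uFEFF'

// Count positions by Unicode code point, ignoring inline embed markers.
// Hypothetical helper for illustration only.
function countTextCodePoints(rawText: string): number {
  // Array.from splits a string by code point, not by UTF-16 code unit.
  return Array.from(rawText).filter((ch) => ch !== EMBED_MARKER).length
}
```

For the raw text "Check out \uFEFF post about AI!" this counts 25 positions, one fewer than the 26 code points in the string, because the marker is skipped.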

Solution: FragmentText Component

Component Location

frontend/packages/ui/src/document-content.tsx

How It Works

function FragmentText({
  documentId,
  blockRef,
  start,
  end,
}: {
  documentId: UnpackedHypermediaId
  blockRef: string
  start: number
  end: number
})

Process:

Fetch Full Text: Call getDocumentText with the blockRef and resolveInlineEmbeds: true

getDocumentText(
  {...documentId, blockRef, blockRange: null},
  {lineBreaks: false, resolveInlineEmbeds: true}
)

Extract Fragment: Use Array.from() to properly handle unicode code points

const codePoints = Array.from(fullText)
const fragment = codePoints.slice(start, end).join('')

Display: Render the extracted text

<Text className="whitespace-pre-wrap">{fragment}</Text>
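
The extraction step can be isolated into a small pure function, which makes it easy to test separately from fetching and rendering. The sketch below assumes out-of-range bounds should be clamped rather than rejected; that clamping behavior and the name sliceByCodePoints are assumptions, not something the component is known to do.

```typescript
// Slice a string by Unicode code point positions, end exclusive.
// Clamping out-of-range bounds is an assumption made for this sketch.
function sliceByCodePoints(text: string, start: number, end: number): string {
  const codePoints = Array.from(text)
  const from = Math.max(0, Math.min(start, codePoints.length))
  const to = Math.max(from, Math.min(end, codePoints.length))
  return codePoints.slice(from, to).join('')
}
```

With this shape, an inverted or oversized range produces an empty or shortened string instead of throwing.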

Integration with ContentEmbed

The ContentEmbed component detects text fragments and renders them appropriately:

// Check if this is a text fragment (blockRef with start/end range)
const isTextFragment =
  props.blockRef &&
  props.blockRange &&
  'start' in props.blockRange &&
  'end' in props.blockRange

if (isTextFragment && props.blockRef && props.blockRange &&
    'start' in props.blockRange && 'end' in props.blockRange) {
  // Render as plain text with resolved embeds
  content = (
    <FragmentText
      documentId={narrowHmId(props)}
      blockRef={props.blockRef}
      start={props.blockRange.start}
      end={props.blockRange.end}
    />
  )
} else {
  // Normal block rendering
  // ...
}
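
The repeated range checks in the condition above could be folded into a TypeScript type guard, so the narrowing is written once. This is only a sketch: the EmbedProps, BlockRange, and TextRange shapes are hypothetical stand-ins, and the real prop types may differ.

```typescript
// Hypothetical shapes for illustration; the real prop and range types may differ.
type TextRange = {start: number; end: number}
type BlockRange = TextRange | {expanded?: boolean}

interface EmbedProps {
  blockRef?: string | null
  blockRange?: BlockRange | null
}

type TextFragmentProps = EmbedProps & {blockRef: string; blockRange: TextRange}

// Type guard: when it returns true, TypeScript treats blockRef as a string
// and blockRange as a range with numeric start and end.
function isTextFragment(props: EmbedProps): props is TextFragmentProps {
  const range = props.blockRange
  return (
    typeof props.blockRef === 'string' &&
    props.blockRef.length > 0 &&
    range != null &&
    'start' in range &&
    'end' in range
  )
}

function describeEmbed(props: EmbedProps): string {
  if (isTextFragment(props)) {
    // No repeated checks here: the guard has already narrowed the type.
    return `fragment ${props.blockRef}[${props.blockRange.start}:${props.blockRange.end}]`
  }
  return 'block'
}
```

describeEmbed stands in for the rendering branch; in a component, the narrowed props would be passed to FragmentText instead of being formatted as a string.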

Example Scenarios

Example 1: Simple Text Fragment

Original block text:

"Hello world, this is a test paragraph with some content."

Fragment: #blockId[0:11]

Result: "Hello world"

Example 2: Fragment with Inline Embed

Original block text (with invisible markers):

"Check out \uFEFF post about AI!"
// Position: 0-9, [10 = embed], 11-25

With inline embed resolved:

"Check out @Alice's Guide post about AI!"

Fragment: #blockId[0:20]

Result: "Check out @Alice's G" (first 20 Unicode code points of the resolved text)

Example 3: Multiple Inline Embeds

Original text:

"Read \uFEFF and \uFEFF for more info"
// [Read ][embed1][ and ][embed2][ for more info]

With embeds resolved:

"Read @Getting Started and @Advanced Topics for more info"

Fragment: #blockId[0:25]

Result: "Read @Getting Started and" (first 25 code points of the resolved text; only the first embed name falls inside the range)

Character Position Mapping

Key Concepts

Original Positions: Defined in the blockRange, count actual text excluding embed markers
Resolved Text: After documentToText processes it, embeds become their document names
Unicode Code Points: Use Array.from() to count by code point, so a character that takes two UTF-16 code units (such as an emoji) counts once

Why Array.from()?

JavaScript strings are UTF-16 encoded. Emoji and other characters outside the Basic Multilingual Plane take two UTF-16 code units each:

// Wrong: UTF-16 code units
"Hello 👋".length // 8 (emoji uses 2 code units)

// Correct: Unicode code points
Array.from("Hello 👋").length // 7 (emoji is 1 code point)
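
The difference matters most when slicing, because a code-unit slice can cut an emoji in half. The short sketch below shows that, and also that code points are still not the same as user-perceived characters: an emoji joined with zero-width joiners spans several code points. The variable names are illustrative only.

```typescript
const wave = '\u{1F44B}' // waving hand: 1 code point, 2 UTF-16 code units
const greeting = wave + ' hi'

// Slicing by code unit keeps only half of the surrogate pair.
const byCodeUnit = greeting.slice(0, 1) // lone high surrogate '\uD83D'

// Slicing by code point keeps the emoji intact.
const byCodePoint = Array.from(greeting).slice(0, 1).join('')

// Family emoji: man + ZWJ + woman + ZWJ + girl.
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}'
const familyCodeUnits = family.length // 8
const familyCodePoints = Array.from(family).length // 5
```

A fragment range that ends in the middle of such a joined sequence would still split it visually, since ranges are defined in code points rather than grapheme clusters.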

API Endpoint Support

Fragment rendering is supported on both desktop and web:

Desktop

Direct grpcClient access
Synchronous document fetching
Immediate text resolution

Web

Server-side API: /hm/api/document-text
Accepts blockRef in URL parameters
Returns resolved text via JSON
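
On the web side, a request URL for this endpoint might be assembled as sketched below. Only the path /hm/api/document-text and the presence of a blockRef parameter come from the description above; the id parameter name and the buildDocumentTextUrl helper are assumptions made for illustration, and the actual route may expect different parameters.

```typescript
// Hypothetical URL builder. Only the path and the blockRef parameter are
// documented; the "id" parameter name is an assumption.
function buildDocumentTextUrl(docId: string, blockRef: string): string {
  const params = [
    `id=${encodeURIComponent(docId)}`, // assumed parameter name
    `blockRef=${encodeURIComponent(blockRef)}`,
  ]
  return `/hm/api/document-text?${params.join('&')}`
}
```

Encoding each value keeps characters such as ":" and "/" in the document ID from being misread as part of the URL structure.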

Component States

Loading

if (loading) {
  return (
    <div className="flex items-center justify-center p-2">
      <Spinner />
    </div>
  )
}

Error

if (error) {
  return <ErrorBlock message={`Failed to load fragment: ${error}`} />
}

Success

return (
  <Text className="whitespace-pre-wrap">
    {text}
  </Text>
)

Usage in Embeds

When a user creates an embed with a text range:

Editor: User selects text range in a block
Link Creation: System creates link like hm://account/doc#blockId[10:50]
Rendering:
- ContentEmbed detects the range
- FragmentText fetches and extracts text
- Displays resolved fragment

Performance Considerations

Caching

Consider caching getDocumentText results
Fragment extraction is fast (O(n) where n = text length)
Network requests may be slow on web
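
One way to cache results is to memoize the in-flight promise per key, so several fragments of the same block share a single request. The sketch below assumes the fetch can be keyed by a single string; TextFetcher and withCache are hypothetical names, and the real getDocumentText signature takes an ID object and options rather than a string.

```typescript
// Hypothetical fetcher type: the real getDocumentText takes an ID object and options.
type TextFetcher = (key: string) => Promise<string>

// Wrap a fetcher so repeated calls with the same key reuse one promise.
function withCache(fetchText: TextFetcher): TextFetcher {
  const cache = new Map<string, Promise<string>>()
  return (key) => {
    const cached = cache.get(key)
    if (cached) return cached
    const pending = fetchText(key).catch((err) => {
      // Drop failed requests so a later call can retry.
      cache.delete(key)
      throw err
    })
    cache.set(key, pending)
    return pending
  }
}
```

This sketch never evicts successful entries, so a real cache would also need an invalidation rule for when the underlying document changes.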

Optimization

Only fetch when the fragment changes
useEffect dependencies include all ID components
Loading state prevents UI jank

Dependencies

useEffect(() => {
  // ...
}, [
  getDocumentText,
  documentId.uid,
  documentId.path?.join('/'),
  documentId.version,
  blockRef,
  start,
  end
])

Testing

Test scenarios to verify:

Basic fragment extraction: Simple text without embeds
Single inline embed: Fragment includes embed name
Multiple inline embeds: All embeds resolved correctly
Unicode handling: Emojis and special characters
Boundary cases: start=0, end equal to the number of code points in the text (not text.length, which counts UTF-16 code units)
Error handling: Missing blocks, network errors
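
Several of these scenarios can be checked with a table of inputs and expected outputs against the extraction step alone. The sketch below uses a local extractFragment function that mirrors the Array.from slicing shown earlier; the case list and helper names are illustrative, and the error-handling scenarios are not covered because they depend on the fetch layer.

```typescript
// Local stand-in for the extraction step: code point slice, end exclusive.
function extractFragment(text: string, start: number, end: number): string {
  return Array.from(text).slice(start, end).join('')
}

interface FragmentCase {
  name: string
  text: string // text after inline embeds have been resolved
  start: number
  end: number
  expected: string
}

const fragmentCases: FragmentCase[] = [
  {name: 'basic', text: 'Hello world, this is a test.', start: 0, end: 11, expected: 'Hello world'},
  {name: 'embed name', text: 'Read @Getting Started now', start: 0, end: 21, expected: 'Read @Getting Started'},
  {name: 'emoji', text: 'Hello \u{1F44B} world', start: 6, end: 7, expected: '\u{1F44B}'},
  {name: 'full range', text: 'abc', start: 0, end: 3, expected: 'abc'},
  {name: 'empty range', text: 'abc', start: 2, end: 2, expected: ''},
]

// Returns the names of failing cases; an empty array means all passed.
function failingCases(): string[] {
  return fragmentCases
    .filter((c) => extractFragment(c.text, c.start, c.end) !== c.expected)
    .map((c) => c.name)
}
```

The same table could be fed to whatever test runner the project uses, with one assertion per row.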

Related Files

frontend/packages/ui/src/document-content.tsx - FragmentText component
frontend/packages/shared/src/document-to-text.ts - Text resolution
frontend/packages/shared/src/document-content-types.ts - Type definitions
frontend/apps/web/app/routes/hm.api.document-text.tsx - Web API