
Elements Between

Internal utility elements_between() used by Annotation, Reference, TrackedChange.

Functions:

Name | Description
elements_between | Return elements located between two specified marker elements (start and end).

_clean_inner_list

_clean_inner_list(inner: list[Element]) -> list[Element]

Internal helper to clean a list of elements by removing unwanted tags.

Specifically targets tags related to tracked changes and reference marks.

Parameters:

Name | Type | Description | Default
inner | list[Element] | The list of elements to clean. | required

Returns:

Type | Description
list[Element] | A new list with unwanted elements removed or stripped.

Source code in odfdo/elements_between.py
def _clean_inner_list(inner: list[Element]) -> list[Element]:
    """Internal helper to clean a list of elements by removing unwanted tags.

    Specifically targets tags related to tracked changes and reference marks.

    Args:
        inner: The list of elements to clean.

    Returns:
        list[Element]: A new list with unwanted elements removed or stripped.
    """
    CLEAN_TAGS = (
        "text:change",
        "text:change-start",
        "text:change-end",
        "text:reference-mark",
        "text:reference-mark-start",
        "text:reference-mark-end",
    )
    request_self = " | ".join([f"self::{tag}" for tag in CLEAN_TAGS])
    result: list[Element] = [e for e in inner if not e.xpath(request_self)]
    request = " | ".join([f"descendant::{tag}" for tag in CLEAN_TAGS])
    for element in result:
        to_del = element.xpath(request)
        for elem in to_del:
            if isinstance(elem, Element):
                element.delete(elem)
    return result
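
The two passes above (drop list items that are themselves markers, then strip markers nested inside the survivors) can be illustrated with the standard library alone. This is only a sketch under assumptions: it uses `xml.etree.ElementTree` rather than odfdo, the un-namespaced tag names and the `clean_list` helper are hypothetical, and keeping the text that follows a removed marker is a choice made here, not a statement about odfdo's behavior.

```python
import xml.etree.ElementTree as ET

# Hypothetical, un-namespaced stand-ins for the ODF marker tags.
CLEAN_TAGS = {"change", "change-start", "change-end", "reference-mark"}


def clean_list(inner: list[ET.Element]) -> list[ET.Element]:
    """Drop marker elements from the list, then strip nested markers."""
    # Pass 1: discard list items that are themselves markers.
    result = [e for e in inner if e.tag not in CLEAN_TAGS]
    # Pass 2: remove markers nested anywhere below the survivors.
    for top in result:
        for parent in list(top.iter()):
            previous = None
            for child in list(parent):
                if child.tag in CLEAN_TAGS:
                    # Keep the text that followed the marker (its tail).
                    if child.tail:
                        if previous is None:
                            parent.text = (parent.text or "") + child.tail
                        else:
                            previous.tail = (previous.tail or "") + child.tail
                    parent.remove(child)
                else:
                    previous = child
    return result


para = ET.fromstring("<p>Hello <change-start/>big <span>wide</span> world</p>")
marker = ET.fromstring("<change-end/>")
cleaned = clean_list([para, marker])
print(ET.tostring(cleaned[0], encoding="unicode"))
# <p>Hello big <span>wide</span> world</p>
```

The top-level `change-end` item is dropped in the first pass, and the nested `change-start` is removed in the second while the text after it is kept.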

_common_ancestor

_common_ancestor(
    root: Element,
    tag1: str,
    attr1: str,
    val1: str,
    tag2: str,
    attr2: str,
    val2: str,
) -> Element | None

Internal helper to find the common ancestor of two elements in the XML tree.

The elements are identified by their tag, attribute, and value.

Parameters:

Name | Type | Description | Default
root | Element | The root element from which to start the search. | required
tag1 | str | The tag name of the first element. | required
attr1 | str | The attribute name of the first element. | required
val1 | str | The value of the attribute for the first element. | required
tag2 | str | The tag name of the second element. | required
attr2 | str | The attribute name of the second element. | required
val2 | str | The value of the attribute for the second element. | required

Returns:

Type | Description
Element | None | The common ancestor element, or None if not found.

Source code in odfdo/elements_between.py
def _common_ancestor(
    root: Element,
    tag1: str,
    attr1: str,
    val1: str,
    tag2: str,
    attr2: str,
    val2: str,
) -> Element | None:
    """Internal helper to find the common ancestor of two elements in the XML tree.

    The elements are identified by their tag, attribute, and value.

    Args:
        root: The root element from which to start the search.
        tag1: The tag name of the first element.
        attr1: The attribute name of the first element.
        val1: The value of the attribute for the first element.
        tag2: The tag name of the second element.
        attr2: The attribute name of the second element.
        val2: The value of the attribute for the second element.

    Returns:
        Element | None: The common ancestor element, or `None` if not found.
    """
    request1 = f'descendant::{tag1}[@{attr1}="{val1}"]'
    request2 = f'descendant::{tag2}[@{attr2}="{val2}"]'
    found = root.xpath(request1)
    if not found:
        # First marker is absent from the tree: indexing [0] would raise
        # IndexError, so return None as the docstring promises.
        return None
    ancestor = found[0]
    while True:
        # Climb one level at a time until the subtree also holds marker 2.
        ancestor = ancestor.parent
        if ancestor is None:
            return None
        if ancestor.xpath(request2):
            return ancestor
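
The climb described above (start at the first marker, move up one parent at a time, stop when the subtree also contains the second marker) can be sketched with the standard library. This is a minimal sketch, not odfdo code: it assumes `xml.etree.ElementTree`, un-namespaced hypothetical tags `start` and `end`, and a parent lookup table, because ElementTree elements do not expose their parent.

```python
import xml.etree.ElementTree as ET


def common_ancestor(root, tag1, attr1, val1, tag2, attr2, val2):
    """Return the closest ancestor of marker 1 whose subtree holds marker 2."""
    # ElementTree elements do not know their parent, so build a lookup table.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    first = root.find(f".//{tag1}[@{attr1}='{val1}']")
    if first is None:
        return None
    ancestor = parent_of.get(first)
    while ancestor is not None:
        # Stop at the first level whose descendants include marker 2.
        if ancestor.find(f".//{tag2}[@{attr2}='{val2}']") is not None:
            return ancestor
        ancestor = parent_of.get(ancestor)
    return None


doc = ET.fromstring(
    "<body><section><p><start id='a'/>one</p><p>two<end id='a'/></p></section>"
    "<p>three</p></body>"
)
found = common_ancestor(doc, "start", "id", "a", "end", "id", "a")
print(found.tag)  # section
```

Here the first paragraph does not contain the end marker, so the search climbs to `section`, which does.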

_find_any_id

_find_any_id(element: Element) -> tuple[str, str, str]

Internal helper to find any ID attribute and its value for a given element.

It iterates through a predefined list of common ODF ID attributes.

Parameters:

Name | Type | Description | Default
element | Element | The element to search for an ID. | required

Returns:

Type | Description
tuple[str, str, str] | A tuple containing the element's tag, the name of the found ID attribute, and its string value.

Raises:

Type | Description
ValueError | If no recognized ID attribute is found on the element.

Source code in odfdo/elements_between.py
def _find_any_id(element: Element) -> tuple[str, str, str]:
    """Internal helper to find any ID attribute and its value for a given element.

    It iterates through a predefined list of common ODF ID attributes.

    Args:
        element: The element to search for an ID.

    Returns:
        tuple[str, str, str]: A tuple containing the element's tag, the name
            of the found ID attribute, and its string value.

    Raises:
        ValueError: If no recognized ID attribute is found on the element.
    """
    for attribute in (
        "text:id",
        "text:change-id",
        "text:name",
        "office:name",
        "text:ref-name",
        "xml:id",
    ):
        idx = element.get_attribute(attribute)
        if idx is not None:
            return element.tag, attribute, str(idx)
    raise ValueError(f"No Id found in {element.serialize()}")
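
The priority scan above, and the way its result feeds the XPath requests built in `_common_ancestor`, can be sketched without odfdo. In this sketch the element is replaced by a plain tag string and an attribute dictionary, and the `find_any_id` and `marker_xpath` helpers are hypothetical names used only for illustration.

```python
# Attribute names in the same priority order as the listing above.
ID_ATTRIBUTES = (
    "text:id",
    "text:change-id",
    "text:name",
    "office:name",
    "text:ref-name",
    "xml:id",
)


def find_any_id(tag: str, attributes: dict[str, str]) -> tuple[str, str, str]:
    """Return (tag, attribute name, value) for the first ID attribute present."""
    for name in ID_ATTRIBUTES:
        value = attributes.get(name)
        if value is not None:
            return tag, name, str(value)
    raise ValueError(f"No Id found in <{tag}>")


def marker_xpath(tag: str, attr: str, value: str) -> str:
    """Build the descendant request used to locate a marker again."""
    return f'descendant::{tag}[@{attr}="{value}"]'


triple = find_any_id("text:change-start", {"xml:id": "x9", "text:change-id": "ct1"})
print(triple)                 # ('text:change-start', 'text:change-id', 'ct1')
print(marker_xpath(*triple))  # descendant::text:change-start[@text:change-id="ct1"]
```

Because the tuple order decides which attribute wins, `text:change-id` is chosen over `xml:id` even though both are present.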

_get_between_base

_get_between_base(
    element: Element, tag1: Element, tag2: Element
) -> list[Element]

Internal helper to extract elements between two specified markers (tag1, tag2).

This function finds the common ancestor of tag1 and tag2, then traverses the XML tree between them, collecting all elements.

Parameters:

Name | Type | Description | Default
element | Element | The starting element for the search (usually the document root). | required
tag1 | Element | The starting marker element. | required
tag2 | Element | The ending marker element. | required

Returns:

Type | Description
list[Element] | A list of elements found between tag1 and tag2.

Raises:

Type | Description
RuntimeError | If no common ancestor is found, or if the traversal fails to find an expected element.

Source code in odfdo/elements_between.py
def _get_between_base(
    element: Element,
    tag1: Element,
    tag2: Element,
) -> list[Element]:
    """Internal helper to extract elements between two specified markers (`tag1`, `tag2`).

    This function finds the common ancestor of `tag1` and `tag2`, then traverses
    the XML tree between them, collecting all elements.

    Args:
        element: The starting element for the search (usually the document root).
        tag1: The starting marker element.
        tag2: The ending marker element.

    Returns:
        list[Element]: A list of elements found between `tag1` and `tag2`.

    Raises:
        RuntimeError: If no common ancestor is found, or if the traversal fails
            to find an expected element.
    """
    elem1_tag, elem1_attr, elem1_val = _find_any_id(tag1)
    elem2_tag, elem2_attr, elem2_val = _find_any_id(tag2)
    ancestor_result = _common_ancestor(
        element.root,
        elem1_tag,
        elem1_attr,
        elem1_val,
        elem2_tag,
        elem2_attr,
        elem2_val,
    )
    if ancestor_result is None:
        raise RuntimeError(f"No common ancestor for {elem1_tag!r} and {elem2_tag!r}")
    ancestor = ancestor_result.clone
    path1 = f'{elem1_tag}[@{elem1_attr}="{elem1_val}"]'
    path2 = f'{elem2_tag}[@{elem2_attr}="{elem2_val}"]'
    result = ancestor.clone
    for child in result.children:
        result.delete(child)
    result.text = ""
    result.tail = ""
    target = result
    current = ancestor.children[0]

    state = 0
    while True:
        if current is None:
            raise RuntimeError(
                f"No current ancestor for {elem1_tag!r} and {elem2_tag!r}"
            )
        # print 'current', state, current.serialize()
        if state == 0:  # before tag 1
            if current.xpath(f"descendant-or-self::{path1}"):
                if current.xpath(f"self::{path1}"):
                    tail = current.tail
                    if tail:
                        # got a tail => the parent should be either text:p or text:h
                        target.text = tail
                    current, target = _get_successor(current, target)  # type: ignore
                    state = 1
                    continue
                # got T1 in children, need further analysis
                new_target = current.clone
                for child in new_target.children:
                    new_target.delete(child)
                new_target.text = ""
                new_target.tail = ""
                target._Element__append(new_target)  # type: ignore[attr-defined]
                target = new_target
                current = current.children[0]
                continue
            else:
                # before tag1 : forget element, go to next one
                current, target = _get_successor(current, target)  # type: ignore
                continue
        else:  # collect elements
            further = False
            if current.xpath(f"descendant-or-self::{path2}"):
                if current.xpath(f"self::{path2}"):
                    # end of trip
                    break
                # got T2 in children, need further analysis
                further = True
            # further analysis needed :
            if further:
                new_target = current.clone
                for child in new_target.children:
                    new_target.delete(child)
                new_target.text = ""
                new_target.tail = ""
                target._Element__append(new_target)  # type: ignore[attr-defined]
                target = new_target
                current = current.children[0]
                continue
            # collect
            target._Element__append(current.clone)  # type: ignore[attr-defined]
            current, target = _get_successor(current, target)  # type: ignore
            continue
    # Now result should be the "parent" of the inserted parts:
    # - a single text:h or text:p item (simple case)
    # - an upper element containing some text:p, text:h => it needs to be
    #   stripped to obtain a list of text:p, text:h
    if result.tag in {"text:p", "text:h"}:
        inner = [result]
    else:
        inner = result.children
    return inner
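
The core of the loop above is a two-state scan: ignore everything until the start marker is met, then collect until the end marker is met. The following sketch shows only that idea, under clear simplifications: it uses `xml.etree.ElementTree`, hypothetical un-namespaced `start` and `end` tags, and it gathers plain text instead of rebuilding a pruned copy of the tree as the real helper does.

```python
import xml.etree.ElementTree as ET


def walk(element):
    """Yield ("element", node) and ("text", string) events in document order."""
    yield "element", element
    if element.text:
        yield "text", element.text
    for child in element:
        yield from walk(child)
        if child.tail:
            yield "text", child.tail


def text_between(root, start_id, end_id):
    """Return the text located between the start and end markers."""
    collecting = False  # state 0: still before the start marker
    pieces = []
    for kind, value in walk(root):
        if kind == "element":
            if value.tag == "start" and value.get("id") == start_id:
                collecting = True  # state 1: collect what follows
            elif value.tag == "end" and value.get("id") == end_id:
                if not collecting:
                    raise RuntimeError("end marker met before start marker")
                return "".join(pieces)
        elif collecting:
            pieces.append(value)
    raise RuntimeError("end marker not found")


doc = ET.fromstring(
    "<body><p>before <start id='a'/>one</p><p>two<end id='a'/> after</p></body>"
)
print(text_between(doc, "a", "a"))  # onetwo
```

The text "before " is skipped in state 0, and " after" is never reached because the scan stops at the end marker.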

_get_successor

_get_successor(
    element: Element, target: Element
) -> tuple[Element | None, Element | None]

Internal helper to find the logical successor of an element in the XML tree.

This function attempts to find the next sibling. If no next sibling exists, it traverses up to the parent and tries to find the successor of the parent.

Parameters:

Name | Type | Description | Default
element | Element | The current element to find the successor for. | required
target | Element | The corresponding element in the target structure, used to maintain context during recursion. | required

Returns:

Type | Description
tuple[Element | None, Element | None] | A tuple containing the successor element and its corresponding target element, or (None, None) if no successor is found.

Source code in odfdo/elements_between.py
def _get_successor(
    element: Element, target: Element
) -> tuple[Element | None, Element | None]:
    """Internal helper to find the logical successor of an element in the XML tree.

    This function attempts to find the next sibling. If no next sibling exists,
    it traverses up to the parent and tries to find the successor of the parent.

    Args:
        element: The current element to find the successor for.
        target: The corresponding element in the target structure,
            used to maintain context during recursion.

    Returns:
        tuple[Element | None, Element | None]: A tuple containing the successor
            element and its corresponding target element, or (None, None) if no
            successor is found.
    """
    next_u_element = element._xml_element.getnext()
    if next_u_element is not None:
        return Element.from_tag(next_u_element), target
    parent = element.parent
    if parent is None:
        return None, None
    return _get_successor(parent, target.parent)  # type: ignore[arg-type]
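
The "next sibling, else climb to the parent and retry" rule can be sketched with the standard library. This sketch assumes `xml.etree.ElementTree` and a hypothetical `parent_of` lookup table, and it leaves out the parallel `target` bookkeeping that the real helper carries along.

```python
import xml.etree.ElementTree as ET


def successor(element, parent_of):
    """Return the next sibling, or the successor of the nearest ancestor."""
    parent = parent_of.get(element)
    if parent is None:
        return None  # reached the root: nothing follows
    siblings = list(parent)
    position = siblings.index(element)
    if position + 1 < len(siblings):
        return siblings[position + 1]
    # No sibling left at this level: retry one level up.
    return successor(parent, parent_of)


doc = ET.fromstring("<body><p><a/><b/></p><q/></body>")
parent_of = {child: parent for parent in doc.iter() for child in parent}
a, b, q = doc.find(".//a"), doc.find(".//b"), doc.find("q")

print(successor(a, parent_of).tag)  # b
print(successor(b, parent_of).tag)  # q
print(successor(q, parent_of))      # None
```

`b` is the last child of `p`, so its successor is found by climbing to `p` and taking `p`'s next sibling.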

_no_header_inner_list

_no_header_inner_list(
    inner: list[Element],
) -> list[Element]

Internal helper to convert header elements (text:h) to paragraph elements (text:p).

Parameters:

Name | Type | Description | Default
inner | list[Element] | The list of elements to process. | required

Returns:

Type | Description
list[Element] | A new list where text:h elements are replaced by text:p elements, preserving their content.

Source code in odfdo/elements_between.py
def _no_header_inner_list(inner: list[Element]) -> list[Element]:
    """Internal helper to convert header elements (`text:h`) to paragraph elements (`text:p`).

    Args:
        inner: The list of elements to process.

    Returns:
        list[Element]: A new list where `text:h` elements are replaced by `text:p`
            elements, preserving their content.
    """
    result: list[Element] = []
    for element in inner:
        if element.tag == "text:h":
            children = element.children
            text = element._xml_element.text
            para = Element.from_tag("text:p")
            para.text = text or ""
            for child in children:
                para._xml_append(child)
            result.append(para)
        else:
            result.append(element)
    return result
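
The conversion above builds a fresh paragraph, copies the heading's leading text, and moves the children across. A minimal sketch of the same idea, assuming `xml.etree.ElementTree` and hypothetical un-namespaced `h` and `p` tags instead of odfdo elements:

```python
import xml.etree.ElementTree as ET


def headings_to_paragraphs(inner):
    """Return a new list where each <h> is replaced by an equivalent <p>."""
    result = []
    for element in inner:
        if element.tag == "h":
            para = ET.Element("p")
            para.text = element.text or ""
            # Children are appended as-is, so their tail text comes along.
            para.extend(list(element))
            result.append(para)
        else:
            result.append(element)
    return result


items = [
    ET.fromstring("<h level='1'>Title <b>bold</b> end</h>"),
    ET.fromstring("<p>Body</p>"),
]
converted = headings_to_paragraphs(items)
print(ET.tostring(converted[0], encoding="unicode"))
# <p>Title <b>bold</b> end</p>
```

As in the listing above, the new paragraph is created empty, so attributes of the heading (here `level`) are not carried over.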

elements_between

elements_between(
    base: Element,
    start: Element,
    end: Element,
    as_text: bool = False,
    clean: bool = True,
    no_header: bool = True,
) -> list | str

Return elements located between two specified marker elements (start and end).

This function extracts a segment of the XML tree defined by the start and end elements. These markers should be unique and possess an ID attribute.

Parameters:

Name | Type | Description | Default
base | Element | The base element to search within (e.g., the document body). | required
start | Element | The starting marker element. | required
end | Element | The ending marker element. | required
as_text | bool | If True, returns the concatenated text content of the elements. Otherwise, returns a list of Element objects. | False
clean | bool | If True, cleans the extracted elements by removing unwanted tags (e.g., tracked changes marks). | True
no_header | bool | If True, converts any text:h (header) elements within the extracted content to text:p (paragraph) elements. | True

Returns:

Type | Description
list | str | A list of Element objects between the markers, or a string if as_text is True.

Raises:

Type | Description
RuntimeError | If start or end elements are not found or if no common ancestor can be determined (propagated from internal helpers).
ValueError | If start or end carries no recognized ID attribute (propagated from _find_any_id).

Source code in odfdo/elements_between.py
def elements_between(
    base: Element,
    start: Element,
    end: Element,
    as_text: bool = False,
    clean: bool = True,
    no_header: bool = True,
) -> list | str:
    """Return elements located between two specified marker elements (`start` and `end`).

    This function extracts a segment of the XML tree defined by the `start` and `end`
    elements. These markers should be unique and possess an ID attribute.

    Args:
        base: The base element to search within (e.g., the document body).
        start: The starting marker element.
        end: The ending marker element.
        as_text: If True, returns the concatenated text content of the
            elements. Otherwise, returns a list of `Element` objects.
        clean: If True, cleans the extracted elements by removing unwanted
            tags (e.g., tracked changes marks).
        no_header: If True, converts any `text:h` (header) elements
            within the extracted content to `text:p` (paragraph) elements.

    Returns:
        list | str: A list of `Element` objects between the markers,
            or a string if `as_text` is True.

    Raises:
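        RuntimeError: If `start` or `end` elements are not found or if no
            common ancestor can be determined (propagated from internal helpers).
        ValueError: If `start` or `end` carries no recognized ID attribute
            (propagated from `_find_any_id`).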
    """
    inner = _get_between_base(base, start, end)

    if clean:
        inner = _clean_inner_list(inner)
    if no_header:  # crude replace text:h by text:p
        inner = _no_header_inner_list(inner)
    if as_text:
        return "\n".join([e.get_formatted_text() for e in inner])
    else:
        return inner