Make WordPress Core

Opened 3 months ago

Closed 8 weeks ago

Last modified 5 weeks ago

#61072 closed enhancement (fixed)

HTML API: Add custom text decoder

Reported by: dmsnell
Owned by: dmsnell
Milestone: 6.6
Priority: normal
Severity: normal
Version: 6.6
Component: HTML API
Keywords: has-patch needs-dev-note
Focuses:
Cc:

Description

Provide a custom decoder for strings coming from HTML attributes and
markup. This custom decoder is necessary because of deficiencies in
PHP's html_entity_decode() function:

  • It isn't aware of 720 of the named character references defined in HTML, so many references that should be decoded are left untouched.
  • It isn't aware of the ambiguous ampersand rule, which allows character references to be decoded in certain contexts even when they are missing their closing ;.
  • It doesn't distinguish between attribute values and markup (text) values when applying the ambiguous ampersand rule, although the two contexts decode differently.

This decoder will also provide some conveniences, such as making a
single-pass, interruptible decode operation possible. This will
provide a number of opportunities to optimize detection and decoding
of things like value prefixes, and checks for whether a value contains
a given substring.
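
The limitations listed above can be observed with plain PHP. The snippet below is a minimal sketch, not part of the patch: it uses only html_entity_decode() itself, and the sample strings and variable names are made up for illustration. It assumes UTF-8 output, which is passed explicitly.

```php
<?php
// Illustrative sketch: plain PHP only, no WordPress code is used here.
// These non-default flags must be passed explicitly for HTML5-style decoding.
$flags = ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5;

// 1. Missing semicolon: html_entity_decode() only recognizes named
//    references that end in ";", so "&not" comes back untouched, whereas
//    an HTML parser would decode it in text content.
$without_semicolon = html_entity_decode( 'x &not y', $flags, 'UTF-8' );
var_dump( $without_semicolon === 'x &not y' ); // bool(true)

// With the semicolon present, the reference decodes to U+00AC (NOT SIGN).
$with_semicolon = html_entity_decode( 'x &not; y', $flags, 'UTF-8' );
var_dump( $with_semicolon === "x \u{AC} y" ); // bool(true)

// 2. Flags matter: &apos; is not in the HTML 4.01 entity table, so it is
//    only decoded when ENT_HTML5 (or an XML/XHTML doctype flag) is given.
$html401 = html_entity_decode( '&apos;', ENT_QUOTES | ENT_HTML401, 'UTF-8' );
$html5   = html_entity_decode( '&apos;', ENT_QUOTES | ENT_HTML5, 'UTF-8' );
var_dump( $html401 === '&apos;' ); // bool(true)
var_dump( $html5 === "'" );        // bool(true)

// 3. No context: the function has no way to be told whether the input is
//    an attribute value or text. Per the HTML spec, "&not=" stays as-is
//    inside an attribute but decodes in text; here it is never decoded.
$query = '?a=1&not=2';
var_dump( html_entity_decode( $query, $flags, 'UTF-8' ) === $query ); // bool(true)
```

The third case is the one a context-aware decoder has to handle differently for attributes and text, which is why a single context-free function cannot cover both.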

Change History (7)

#1 @dmsnell
3 months ago

  • Milestone changed from Awaiting Review to 6.6

I'm not sure why the PR isn't auto-linking: https://github.com/WordPress/wordpress-develop/pull/6387

This ticket was mentioned in Slack in #core-performance by dmsnell.
2 months ago

This ticket was mentioned in Slack in #core by dmsnell.
2 months ago

This ticket was mentioned in Slack in #core by oglekler.
2 months ago

#5 @dmsnell
8 weeks ago

  • Owner set to dmsnell
  • Resolution set to fixed
  • Status changed from new to closed

In 58281:

HTML API: Add custom text decoder.

Provides a custom decoder for strings coming from HTML attributes and
markup. This custom decoder is necessary because of deficiencies in
PHP's html_entity_decode() function:

  • It isn't aware of 720 of the named character references defined in HTML, so many references that should be decoded are left untouched.
  • It isn't aware of the ambiguous ampersand rule, which allows character references to be decoded in certain contexts even when they are missing their closing ;.
  • It doesn't distinguish between attribute values and markup (text) values when applying the ambiguous ampersand rule, although the two contexts decode differently.
  • Use of html_entity_decode() requires manually passing non-default parameter values to ensure it decodes properly.

This decoder also provides some conveniences, such as making a
single-pass, interruptible decode operation possible. This will
provide a number of opportunities to optimize detection and decoding
of things like value prefixes, and checks for whether a value contains
a given substring.

Developed in https://github.com/WordPress/wordpress-develop/pull/6387
Discussed in https://core.trac.wordpress.org/ticket/61072

Props dmsnell, gziolo, jonsurrell, jorbin, westonruter, zieladam.
Fixes #61072.

#6 @dmsnell
5 weeks ago

  • Keywords needs-dev-note added

#7 @dmsnell
5 weeks ago

Dev note incorporated into "Updates to the HTML API".
