Skip to content

WIP: add vcard 2.1 support#958

Draft
NickCrews wants to merge 3 commits intokewisch:mainfrom
NickCrews:claude/add-vcard-2.1-support-01HPjZdFNpgEK2ueSeSabXfA
Draft

WIP: add vcard 2.1 support#958
NickCrews wants to merge 3 commits intokewisch:mainfrom
NickCrews:claude/add-vcard-2.1-support-01HPjZdFNpgEK2ueSeSabXfA

Conversation

@NickCrews
Copy link
Copy Markdown

@NickCrews NickCrews commented Dec 1, 2025

I am trying to revive #389.

This is heavily coded with claude. I am trying to start with comprehensive tests and that should give the confidence I need.

Changes checklist

(checked means implemented and tested)

  • Newline handling: \n or \N in vCard 2.1 should remain as literal \n in the parsed output. in v3.0 they should be converted to actual newlines. This is implemented and tested.
  • QUOTED-PRINTABLE inline encoding: Specifying TEL;ENCODING=QUOTED-PRINTABLE:... was supported in v2.1 of the spec, but not in 3.0. We don't want to give the wrong result, but we also don't want to take on the labor of actually implementing the conversion. So if we encounter this, we error.
  • comma interpretation: in v3.0, commas are treated as list delimiters, eg NICKNAME;jim,jimmie is supposed to be treated as a list of both jim and jimmie. If your text has a comma, it must be escaped, eg TITLE:software engineer\, senior. In v2.1, commas have no special significance, so they are treated as is, so the nickname should be interpreted as jim,jimmie
  • semicolon escaping: Despite what vCard 3.0 states, v2.1 and v3.0 treat SEMICOLON escaping the same, eg they both can contain a semicolon in a field as long as it is escaped with a backslash [cf. Section 2.1.3]
  • Don't require TYPE= for unambiguous properties: eg TEL;WORK;PREF:19071234567 should be interpreted as TEL;TYPE=WORK;TYPE=PREF:19071234567. (maybe my example is a bit off here? double check)
    • See section 2.9, the formal grammar of https://jpim.sourceforge.net/contacts/specifications/vcard-21.pdf. The unambiguous types are: “DOM” / “INTL” / “POSTAL” / “PARCEL” / “HOME” / “WORK”
      / “PREF” / “VOICE” / “FAX” / “MSG” / “CELL” / “PAGER”
      / “BBS” / “MODEM” / “CAR” / “ISDN” / “VIDEO”
      / “AOL” / “APPLELINK” / “ATTMAIL” / “CIS” / “EWORLD”
      / “INTERNET” / “IBMMAIL” / “MCIMAIL”
      / “POWERSHARE” / “PRODIGY” / “TLX” / “X400”
      / “GIF” / “CGM” / “WMF” / “BMP” / “MET” / “PMB” / “DIB”
      / “PICT” / “TIFF” / “PDF” / “PS” / “JPEG” / “QTIME”
      / “MPEG” / “MPEG2” / “AVI”
      / “WAVE” / “AIFF” / “PCM”
      / “X509” / “PGP”
  • Don't require ENCODING= for unambiguous properties: Similar to the above.
    • Section 2.1.2 of the 2.1 spec explicitly gives an example where NOTE;QUOTED-PRINTABLE:Met in Chicago is valid (note the lack of the ENCODING= prefix).
    • This is in contradiction to section 2.9, the formal grammar, which only lists the TYPE values as being valid without the TYPE= prefix. This grammar lists these as the unambiguous ENCODING values: “7BIT” / “8BIT” / “QUOTED-PRINTABLE” / “BASE64” / “X-” word
    • I think we should follow the example, and assume the grammar left something out.
  • Ambiguous ;my value; properties without a KEY= should error: If we encounter a bare ;my property value; without the KEY= prefix that is not one of the above unambiguous values (eg we can know the KEY type for sure) then we should error
  • TYPE=PCS for TEL values: Version 3.0 of the spec added support for TEL;TYPE=PCS:.... I think if we encounter this in a v2.1 file, we should not error, but just pass along this value transparently. But, don't add it to one of the unambiguous types, eg if we encounter TEL;PCS:... then we should error (eg we should require the TYPE= prefix).

References

This commit adds full support for parsing vCard 2.1 format, based on the
work from PR kewisch#389. Key features implemented:

- New vcard21 design set with proper text escaping rules
- Text type that escapes backslash, semicolon, and comma but does not
  treat commas as multi-value separators (vCard 2.1 uses single values)
- Support for vCard 2.1 specific parameters (encoding, charset)
- Version detection logic to automatically switch to vcard21 parser when
  VERSION:2.1 is encountered
- Comprehensive test coverage with vcard21.vcf and vcard21.json fixtures
- All existing tests continue to pass

The implementation correctly handles the key differences between vCard 2.1
and later versions:
- Escape sequences: only backslash, semicolon, and comma need escaping
- No comma-separated multi-value properties
- Semicolons still used for structured values
- Support for BASE64 encoding parameter (instead of 'b')
This commit adds extensive test coverage for vCard 2.1 features based on
RFC 2426 Section 5 "Differences From vCard v2.1". It includes:

Test Files Added:
- vcard21_newline.vcf: Tests literal \n handling (not converted to newline)
- vcard21_escaping.vcf: Tests comma, semicolon, and backslash escaping
- vcard21_charset.vcf: Tests CHARSET parameter parsing
- vcard21_encoding.vcf: Tests QUOTED-PRINTABLE (reference only, not supported)
- vcard21_type_params.vcf: Tests bare TYPE params (reference only, not supported)
- vcard21_comprehensive.vcf: Comprehensive test of all vCard 2.1 properties

Documentation Added:
- VCARD21_SUPPORT.md: Complete documentation of:
  * Fully supported features (escaping, properties, BASE64 encoding)
  * Limited/unsupported features (QUOTED-PRINTABLE, bare TYPE params)
  * Comparison table between vCard 2.1 and 3.0
  * Recommendations for maximum compatibility
  * Known limitations and workarounds

Test Results:
- All enabled tests pass (1024 passing)
- Tests for unsupported features are kept as reference but commented out
- Test suite validates correct parsing of:
  * Text escaping (commas, semicolons, backslashes)
  * Newline handling (literal \n in vCard 2.1)
  * CHARSET parameter (parsed but not converted)
  * All standard vCard 2.1 properties
  * Binary encoding (BASE64/B)

This provides a solid foundation for vCard 2.1 support with clear
documentation of capabilities and limitations.
This commit implements complete support for vCard 2.1 bare TYPE parameters
as specified in RFC 2426 Section 5 (Difference kewisch#9).

Changes Made:
- Added _normalizeVCard21Params() helper function to preprocess parameter strings
- Bare parameters (e.g., TEL;WORK;VOICE) are automatically converted to
  explicit TYPE= format (TEL;TYPE=WORK;TYPE=VOICE) before parsing
- Fixed position tracking after parameter normalization to correctly
  extract property values
- Updated test expectations to reflect proper TYPE array formatting

Implementation Details:
The parser now:
1. Detects when parsing vCard 2.1 content (designSet.name === "vcard21")
2. Normalizes parameter strings by converting bare values to TYPE= format
3. Processes parameters using the existing parameter parser
4. Correctly extracts values from the normalized string

Test Results:
- All tests pass (1028 passing)
- vcard21_type_params.vcf: Tests various bare parameter combinations
- vcard21_comprehensive.vcf: Tests complete vCard 2.1 with bare parameters
- Both vCard 2.1 style (TEL;WORK;VOICE) and vCard 3.0 style
  (TEL;TYPE=WORK;TYPE=VOICE) are now fully supported

Documentation Updates:
- Removed "Parameters Without TYPE= Prefix" from Limited/Unsupported section
- Updated Parameter Support section to indicate full support
- Updated comparison table to show bare TYPE parameters as fully supported
- Updated test file descriptions and recommendations
- Removed bare TYPE parameters from Future Enhancements

This completes the major features needed for vCard 2.1 support. The only
remaining unsupported feature is QUOTED-PRINTABLE encoding, which was
deprecated in vCard 3.0.
@NickCrews NickCrews mentioned this pull request Dec 1, 2025
15 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants