Update data tables to Unicode 7.0.0 #6

Closed
jiahao wants to merge 5 commits into JuliaStrings:master from jiahao:cjh/markdata

Conversation

jiahao (Collaborator) commented Jul 17, 2014

Updates:

  1. Updates the data_generator.rb script. It now runs on modern Ruby (> 1.8), and the hard-coded data tables are replaced with reads from the appropriate Unicode data (UNIDATA) files.
  2. Adds a new Makefile target, update, which downloads the relevant UNIDATA files and runs data_generator.rb to produce the file utf8proc_data.c.new.
  3. Updates utf8proc_data.c to the output generated by running make update against UNIDATA v7.0.0.
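
To illustrate what replacing hard-coded tables with file reads can look like, here is a minimal sketch of parsing UnicodeData.txt rows, which are semicolon-separated. It is not the code in data_generator.rb; the names UnicodeChar, parse_unicode_data_line, and read_unicode_data are hypothetical and exist only for this example.

```ruby
# Hypothetical helpers for illustration; not taken from data_generator.rb.
# Each UnicodeData.txt row is semicolon-separated. The first four fields are
# the code point (hex), name, general category, and canonical combining class.
UnicodeChar = Struct.new(:code, :name, :category, :combining_class)

def parse_unicode_data_line(line)
  # The -1 limit keeps trailing empty fields instead of dropping them.
  fields = line.chomp.split(';', -1)
  UnicodeChar.new(fields[0].hex, fields[1], fields[2], fields[3].to_i)
end

def read_unicode_data(path)
  # Assumes path points at a locally downloaded UnicodeData.txt.
  File.foreach(path).map { |line| parse_unicode_data_line(line) }
end

# A single row, parsed without touching the filesystem.
sample = parse_unicode_data_line("0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;")
puts format('U+%04X %s (%s)', sample.code, sample.name, sample.category)
```

Reading the properties from the file this way means a new Unicode version only requires fetching new data files, not editing tables embedded in the script.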

Observations:

  1. utf8proc.c contains #defined constants that may in principle have changed from v5.0 to v7.0, such as the constants marking the locations of Hangul, Unihan, etc. I haven't checked them, and it's probably not worth recomputing them for each new Unicode version.
  2. utf8proc appears to implement an internal processing mode called LUMP, which is briefly described in lump.txt. As far as I can tell, this is a custom normalization mode that is not part of the Unicode standard, but I think we'll want to use it.
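
One way to check the block-location constants without recomputing them by hand is to read the ranges straight out of UnicodeData.txt, which lists large blocks such as Hangul syllables as a pair of rows named "<..., First>" and "<..., Last>". The sketch below is hypothetical and not part of utf8proc; unicode_ranges is an illustrative name, and comparing its output against the #defined values in utf8proc.c is left as a manual step.

```ruby
# Hypothetical check, not part of utf8proc: collect the code point ranges that
# UnicodeData.txt encodes as "<Name, First>" / "<Name, Last>" row pairs.
def unicode_ranges(lines)
  ranges = {}
  start = nil
  lines.each do |line|
    code, name = line.split(';', 3)
    next if name.nil?
    if name.end_with?(', First>')
      start = code.hex
    elsif start && (m = name.match(/\A<(.+), Last>\z/))
      ranges[m[1]] = (start..code.hex)
      start = nil
    end
  end
  ranges
end

# The two rows that delimit the Hangul syllable block.
rows = [
  "AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;",
  "D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;"
]
hangul = unicode_ranges(rows)["Hangul Syllable"]
puts format('U+%04X..U+%04X (%d code points)', hangul.first, hangul.last, hangul.size)
```

Passing File.foreach on a downloaded UnicodeData.txt in place of rows would list every such range in that Unicode version.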

Ref: #1

jiahao changed the title from "XXX Marking data locations" to "Update data tables to Unicode 7.0.0" on Jul 18, 2014

jiahao (Collaborator, Author) commented Jul 18, 2014

I managed to bork this PR.

jiahao closed this on Jul 18, 2014

jiahao (Collaborator, Author) commented Jul 18, 2014

Replaced by #9.
