Update data tables to Unicode 7.0.0 by jiahao · Pull Request #6 · JuliaStrings/utf8proc

jiahao · 2014-07-17T22:37:26Z

Updates:

Updates the data_generator.rb script. The script now runs on a modern version of Ruby (>1.8), and the hard-coded data tables are replaced with file reads from the appropriate Unicode data (UNIDATA) files.
Provides a new Makefile target, update, which automatically downloads the relevant UNIDATA files and runs data_generator.rb to produce the file utf8proc_data.c.new.
Updates utf8proc_data.c to the output generated by running make update against UNIDATA v7.0.0.
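
To illustrate what "file reads from the UNIDATA files" can look like, here is a minimal sketch of parsing UnicodeData.txt-style records in Ruby. It is not taken from data_generator.rb: the helper name `parse_unicode_data_line` and the returned hash keys are hypothetical, and the two sample records are embedded inline so the snippet runs without downloading anything. Only the semicolon-separated field layout is taken from the UnicodeData.txt format itself.

```ruby
# Hypothetical helper (not from data_generator.rb): parse one record of
# UnicodeData.txt into a Hash. Fields are semicolon-separated; field 0 is the
# code point, 2 the general category, 3 the combining class, 5 the
# decomposition mapping, 12 and 13 the simple upper/lowercase mappings.
def parse_unicode_data_line(line)
  fields = line.chomp.split(';', -1) # limit -1 keeps trailing empty fields
  {
    code: fields[0].hex,
    name: fields[1],
    category: fields[2],
    combining_class: fields[3].to_i,
    decomposition: fields[5].empty? ? nil : fields[5],
    uppercase: fields[12].empty? ? nil : fields[12].hex,
    lowercase: fields[13].empty? ? nil : fields[13].hex
  }
end

# Two sample records in UnicodeData.txt format. With the real file, something
# like File.foreach("UnicodeData.txt") could replace SAMPLE.each_line.
SAMPLE = <<~DATA
  0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
  00C5;LATIN CAPITAL LETTER A WITH RING ABOVE;Lu;0;L;0041 030A;;;;N;LATIN CAPITAL LETTER A RING;;;00E5;
DATA

ENTRIES = SAMPLE.each_line.map { |line| parse_unicode_data_line(line) }

ENTRIES.each do |e|
  puts format('U+%04X %s (%s)', e[:code], e[:name], e[:category])
end
```

Reading the tables from the data files, rather than hard-coding them, is what lets a single script be rerun against each new Unicode release.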

Observations:

There are #defined constants in utf8proc.c which may in principle have changed from v5.0 to v7.0, such as the constants marking the location of Hangul, Unihan, etc. I haven't checked them and it's probably not worth recomputing for each new Unicode version.
It looks like utf8proc implements an internal processing mode called LUMP, which is briefly described in lump.txt. As far as I can tell, this is a custom normalization mode which is separate from the Unicode standard, but I think we'll want to use these.
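
For context on the first observation, the Hangul constants are the kind of value that feeds an arithmetic decomposition rather than a table lookup. The sketch below uses the base and count values from the Unicode Standard's Hangul syllable decomposition algorithm. That the #defined constants in utf8proc.c carry these same values is an assumption that was not checked here, and the Ruby names (`S_BASE`, `decompose_hangul`, and so on) are hypothetical.

```ruby
# Constants from the Unicode Standard's Hangul syllable decomposition
# algorithm. Assumption: the #defines in utf8proc.c encode the same values.
S_BASE  = 0xAC00 # first precomposed Hangul syllable
L_BASE  = 0x1100 # first leading consonant (choseong)
V_BASE  = 0x1161 # first vowel (jungseong)
T_BASE  = 0x11A7 # one before the first trailing consonant (jongseong)
L_COUNT = 19
V_COUNT = 21
T_COUNT = 28
N_COUNT = V_COUNT * T_COUNT # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT # 11172 precomposed syllables in total

# Hypothetical helper: decompose a precomposed Hangul syllable into its
# conjoining jamo code points. Code points outside the syllable block are
# returned unchanged.
def decompose_hangul(codepoint)
  s_index = codepoint - S_BASE
  return [codepoint] unless (0...S_COUNT).cover?(s_index)

  l = L_BASE + s_index / N_COUNT
  v = V_BASE + (s_index % N_COUNT) / T_COUNT
  t = T_BASE + s_index % T_COUNT
  t == T_BASE ? [l, v] : [l, v, t] # no trailing consonant when t == T_BASE
end

# U+D55C decomposes to U+1112 U+1161 U+11AB
puts decompose_hangul(0xD55C).map { |c| format('U+%04X', c) }.join(' ')
```

Because the decomposition is pure arithmetic over a fixed block, these constants only need revisiting if the block boundaries themselves move between Unicode versions.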

Ref: #1

jiahao · 2014-07-18T14:16:27Z

I managed to bork this PR.

jiahao · 2014-07-18T14:17:24Z

Replaced by #9.

jiahao added 5 commits July 17, 2014 15:32

Mark location of CaseFolding.txt data

f0943b4

Remove utf8proc_data.c (generated by data_generator.rb)

76b96f1

Mark Default_Ignorable_Code_Point data

ba5d970

Mark Grapheme_Extend data

d78ced6

Mark composition exclusion characters

e55defc

jiahao mentioned this pull request Jul 18, 2014

Update data_generator script #8

Closed

jiahao changed the title ~~XXX Marking data locations~~ Update data tables to Unicode 7.0.0 Jul 18, 2014

jiahao closed this Jul 18, 2014

jiahao wants to merge 5 commits into JuliaStrings:master from jiahao:cjh/markdata
