113 changes: 74 additions & 39 deletions README.md
@@ -25,45 +25,50 @@ middleman spellcheck source/blog/

## Usage

You can spellcheck only some resources using a regex with the URL:
There are several ways to select what content will be checked.

```ruby
activate :spellcheck, page: "documentation/*" # you can use regexes, too, e.g. /post_[1-9]/
```
1. To spellcheck only some resources using a regex with the URL:

You can limit which tags the spell checker will only run through:
```ruby
activate :spellcheck, page: "documentation/*" # you can use regexes, too, e.g. /post_[1-9]/
```

```ruby
activate :spellcheck, tags: :p # pass an array of tags if you have more!
```
2. To limit which tags the spell checker runs on:

If there are some words that you would like to be allowed
```ruby
activate :spellcheck, tags: :p # pass an array of tags if you have more!
```

```ruby
activate :spellcheck, allow: ["Gooby", "pls"]
```
3. To ignore sections by using CSS selectors.
For example, to ignore all sections with a class of `CodeRay`:

You can also add allowed words to the front-matter through the
`spellcheck-allowed` keyword. Example:
```ruby
activate :spellcheck, ignore_selector: '.CodeRay'
```

```
title: "Some time ago"
...
spellcheck-allowed:
- GitHub
- Linux
```
Or to ignore all tables in a document:

```ruby
activate :spellcheck, ignore_selector: 'table'
```

Or to ignore all `<p class="technical-jargon">`:
```ruby
activate :spellcheck, ignore_selector: 'p.technical-jargon'
```

Look into section "Fixing spelling mistakes" to help yourself with fixing
spelling problems in already existing articles.
To ignore multiple selectors, separate them with a comma:
```ruby
activate :spellcheck, ignore_selector: 'p.technical-jargon, .CodeRay'
```

Middleman-spellcheck automatically ignores `.css`, `.js`, & `.coffee` file
4. Middleman-spellcheck automatically ignores `.css`, `.js`, & `.coffee` file
extensions. If there are some additional file type extensions that you would
like to skip:

```ruby
activate :spellcheck, ignored_exts: [".xml", ".png"]
```
```ruby
activate :spellcheck, ignored_exts: [".xml", ".png"]
```

To select the dictionary used by the spellchecker, use the `lang:` option. For
example, to use the Polish dictionary:
@@ -74,12 +79,15 @@ activate :spellcheck, lang: "pl"

If you define the ``lang`` metadata in your pages / articles, then spellcheck will use that language.

Middleman-spellcheck can issue many warnings if you run it over a new
content. If you want to give yourself a chance to fix mistakes gradually and
not fail each time you build, use :dontfail flag:
## Options


For warnings only (allowing the build to pass), use the `dontfail` option.
This is helpful when you want to give yourself a chance to fix mistakes or false hits gradually and
not fail each time you build.

```ruby
activate :spellcheck, lang: "en", dontfail: 1
activate :spellcheck, dontfail: 1
```
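
As a rough illustration of how this flag decides the build outcome, the sketch below models the check in plain Ruby. `build_fails?` is a hypothetical helper, not part of the extension; it assumes `dontfail` is tested as a plain truthy/falsy value, which is how the updated `after_build` in this change reads it.

```ruby
# Hypothetical stand-in for the failure check; not the extension's own code.
# misspelled: array of misspelled entries collected during the build
# dontfail:   the value passed to `activate :spellcheck, dontfail: ...`
def build_fails?(misspelled, dontfail)
  return false if misspelled.empty? # nothing misspelled, nothing to fail on
  !dontfail                         # any truthy value turns failures into warnings
end

puts build_fails?([], false)       # no misspellings
puts build_fails?(["teh"], false)  # misspellings, flag unset
puts build_fails?(["teh"], 1)      # misspellings, flag set
puts build_fails?(["teh"], 0)      # 0 is truthy in Ruby, so this also counts as set
```

Only `false` and `nil` are falsy in Ruby, so under this assumption `dontfail: 0` behaves like `dontfail: 1`. Passing `true` or `false` avoids the ambiguity.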

You can also disable the automatic spellcheck after build (and only run manual checks from the command line):
@@ -103,10 +111,22 @@ who encountered issues, useful might be debug: option, which will turn on
extensive amount of debugging.

```ruby
activate :spellcheck, debug: 1
activate :spellcheck, lang: "en", debug: 1
```

If there are some words that you would like to be allowed, you can pass them to the `allow` option as an array. **Deprecated:** please see the "Fixing spelling mistakes" section for the now-preferred way to include allowed words.

```ruby
activate :spellcheck, allow: ["Gooby", "pls"]
```

You can also pass a regex to the `ignore_regex` option. Any match will be ignored.
For example, to ignore words in quotes:
```ruby
activate :spellcheck, lang: "en", ignore_regex: /\s('|")\w*('|")(\s|\.|,)/
```
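
To see what that pattern actually strips, the sketch below applies it to a sample sentence in plain Ruby. The method mirrors the `regex_filter_content` helper added in this change, but takes the regex as an argument so it can run outside Middleman; the sample text is made up for illustration.

```ruby
# Same pattern as in the README example above.
IGNORE_REGEX = /\s('|")\w*('|")(\s|\.|,)/

# Standalone version of the filter: every match is replaced by a single space.
def regex_filter_content(text, ignore_regex)
  return text unless ignore_regex
  text.to_s.gsub(ignore_regex, ' ')
end

sample   = %q(He typed "sudo" and then 'grep' again.)
filtered = regex_filter_content(sample, IGNORE_REGEX)

puts filtered # => He typed and then again.
```

The pattern consumes the whitespace on both sides of a quoted word, so the second of two directly adjacent quoted words (`"a" "b"`) is left in place, and `\w*` only covers single-word quotes.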

## Fixing spelling mistakes
## Fixing spelling mistakes & false positives

The `middleman-spellchecker` extension is likely to generate a large number
of false positives, e.g.: words which the spellchecker will consider
@@ -118,25 +138,40 @@ names. To solve this, `middleman-spellcheck` offers two solutions:
containing words considered correct. Author of the website may decide which
words are allowed to be used site-wide. Example: if you write a lot about
IBM products, this file would have names such as "IBM", "AIX" or "DB/2".
Add the words one word per line without quotes.

To set the global file, use the following clause in your `config.rb`:

```set :spellcheck_allow_file, "./data/words_allowed.txt"```

2. The `spellcheck-allow` keyword in a frontmatter, which will work in the
context of this particular article, but not other articles. Example: your
blog is about IBM, but 1 article is about AirBnB. You'd put `AirBnB` into
your front-matter.

To set the global file, use the following clause in your `config.rb`:

set :spellcheck_allow_file, "./data/words_allowed.txt"

To use 2nd solution, add the following to your frontmatter:

spellcheck-allow:
- "AirBnB"
```
title: "Blog about IBM"
...
spellcheck-allow:
- "AirBnB"
```

Another example:

```
title: "Some time ago"
...
spellcheck-allowed:
- GitHub
- Linux
```

`middleman-spellcheck` also comes with a simple CLI for fixing many
problems in your articles. To invoke:

middleman spellcheck source/blog/2015-11-01-nginx-on-travis-ci.md --fix
```middleman spellcheck source/blog/2015-11-01-nginx-on-travis-ci.md --fix```

This will pull up a simple CLI menu, and for each misspelled word you'll have
the following choices
30 changes: 23 additions & 7 deletions lib/middleman-spellcheck/extension.rb
@@ -11,10 +11,12 @@ class SpellcheckExtension < Extension
option :tags, [], "Run spellcheck only on some tags from the output"
option :allow, [], "Allow specific words to be misspelled"
option :ignored_exts, [], "Ignore specific extensions (ex: '.xml')"
option :ignore_regex, false, "Ignore regex matches"
option :ignore_selector, false, "Ignore nodes with a css selector"
option :lang, "en", "Language for spellchecking"
option :cmdargs, "", "Pass alternative command line arguments"
option :debug, 0, "Enable debugging (for developers only)"
option :dontfail, 0, "Don't fail when misspelled words are found"
option :dontfail, false, "Don't fail because misspelled words are found"
option :run_after_build, true, "Run Spellcheck after build"

def after_build(builder)
@@ -33,13 +35,14 @@ def after_build(builder)
total_misspelled += current_misspelled
end

unless total_misspelled.empty?
estr = "Build failed. There are spelling errors."
if options.dontfail != 0
print "== :dontfail set! Will issue warning only, but not fail.\n"
print estr, "\n"
builder.say_status :spellcheck, "Spellchecks done. #{total_misspelled.length} misspelling(s) found.", :blue

unless total_misspelled.empty?
if options.dontfail
builder.say_status :spellcheck, "dontfail is set! Builder will ignore misspellings.", :yellow
else
raise Thor::Error, estr
desc = "Build failed. There are spelling errors."
raise Thor::Error, desc
end
end
end
@@ -49,13 +52,25 @@ def select_content(resource)
doc = Nokogiri::HTML.fragment(rendered_resource)
doc.search('code,style,script').each(&:remove)

if options.ignore_selector
doc.css(options.ignore_selector).each(&:remove)
end

if options.tags.empty?
doc.text
else
select_tagged_content(doc, option_tags)
end
end

def regex_filter_content(text)
if options.ignore_regex
text.to_s.gsub(options.ignore_regex, ' ')
else
text
end
end

def option_tags
if options.tags.is_a? Array
options.tags
@@ -91,6 +106,7 @@ def spellcheck_resource(resource)

def run_check(resource, lang)
text = select_content(resource)
text = regex_filter_content(text)
results = Spellchecker.check(text, lang)
results = exclude_allowed(resource, results)
results.reject { |entry| entry[:correct] }
5 changes: 4 additions & 1 deletion lib/middleman-spellcheck/spellchecker.rb
@@ -1,3 +1,5 @@
# encoding: utf-8

class Spellchecker
@@aspell_path = "aspell"
@@aspell_cmdargs = ""
@@ -76,7 +78,8 @@ def self.check(text, lang)
text.gsub! '’', '\''
sdbg "self.check got raw text:\n#{text}\n"

words = text.split(/[^\p{L}']+/).select { |s|
# Split into words, also consuming an apostrophe directly before or after each separator run
words = text.split(/'?[^\p{L}']+'?/).select { |s|
s != "" and s != "'s" and s != "'"
}.uniq
sdbg "self.check word array:\n#{words}\n"
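
The effect of the split change can be checked in isolation. The sketch below runs the old and new patterns from this hunk over a made-up sentence; `keep_words` is a hypothetical helper that wraps the same `select`/`uniq` filter shown above.

```ruby
# Filter shared by both versions, taken from the hunk above.
def keep_words(parts)
  parts.select { |s| s != "" and s != "'s" and s != "'" }.uniq
end

OLD_SPLIT = /[^\p{L}']+/      # pattern before this change
NEW_SPLIT = /'?[^\p{L}']+'?/  # pattern after this change

text = "the 'quoted' word isn't misspelled"

old_words = keep_words(text.split(OLD_SPLIT))
new_words = keep_words(text.split(NEW_SPLIT))

p old_words # => ["the", "'quoted'", "word", "isn't", "misspelled"]
p new_words # => ["the", "quoted", "word", "isn't", "misspelled"]
```

With the new pattern, apostrophes used as quotation marks are absorbed into the separator, while the apostrophe inside `isn't` is kept because it has letters on both sides.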