Empirical Variogram: Update and Refactoring#106
Conversation
* implemented a prototype for the unstructured function with angles. * minor fix with variable name * added docstring and arguments for angle estimation * added a working version which tests the basic functionality * docstring adaption * changed for automatic testcase data generation and split up the test cases * changed default angle tolerance to 25deg * added option to also return the counts (number of pairs) from unstructured * implemented a meaningful test for 2d variogram estimation * bugfix for 3d case when elevation is 90° or 270° * implemented some basic 3d test cases * vario: cleanup cython routines; use greate-circle for tolerance in 3D; check both directions between point pairs * vario: doc update; correct intervals for angles; general formatting of angles array * vario: better handling of angle ranges * vario: fix wrong assumption about hemisphere for angles Co-authored-by: MuellerSeb <mueller.seb@posteo.de>
…es (pos array; nD dist; angle checks)
…ariant to vario_estimate
…eter to provide an external mask
|
@LSchueler one question: should a mask, that masks all values raise an ValueError, our should we return a 0 valued variogram and 0 counts for each bin? ATM it raises an ValueError. EDIT: in order to make it consistent (if field is a masked array and the mask there is entirely True, a 0 valued variogram is returned), I removed the raised Error and return a 0 valued variogram without further calculations. |
| if not (isnan(f[m,k]) or isnan(f[m,j])): | ||
| counts[d, i] += 1 | ||
| variogram[d, i] += estimator_func(f[m,k] - f[m,j]) | ||
|
|
There was a problem hiding this comment.
One could argue to incorporate a break here (for d), since if a point pair matches one direction, it (maybe) shouldn't be used in another direction as well. This only happens if the angels_tol is big enough, so two directions have an overlapping search area. @LSchueler your opinion?
There was a problem hiding this comment.
Every hardcore Bayesian would probably cry "heresy!", but maybe it is exactly the intention of someone using a large tolerance to do a preliminary test without any knowledge about the main directions of the field. Then, I guess it would be helpful to use as many data points as possible, even if they overlap and are redundant.
There was a problem hiding this comment.
For speed up reasons we could check all angles between directions and if their minimum is greater than 2 * angels_tol, we could break ;-)
There was a problem hiding this comment.
Fixed that by checking if the directions are separated. If the search bands overlap, point pairs are counted repeatedly.
With what would this be consistent? - I'm not sure how meaningful a variogram of a completely invalid field is... |
LSchueler
left a comment
There was a problem hiding this comment.
Once again a tremendous effort! Thanks for that.
I think I like the renaming of the estimation functions, but it could be that users get even more confused about the struct, unstruct stuff, let's see.
I hope you don't mind that I pushed some small typo-fixes directly onto this branch.
I think if you update the changelog, I'd be happy about a merge.
| vec[:, i] *= np.cos(angles[:, (i - 1)]) | ||
| if dim in [2, 3]: | ||
| vec[:, [0, 1]] = vec[:, [1, 0]] # to match convention in 2D and 3D | ||
| return vec |
There was a problem hiding this comment.
I hope I will never have to fix a bug in here ;-)
There was a problem hiding this comment.
As I said: rotation in higher dimensions (>1) is a pain in the neck!
| if not (isnan(f[m,k]) or isnan(f[m,j])): | ||
| counts[d, i] += 1 | ||
| variogram[d, i] += estimator_func(f[m,k] - f[m,j]) | ||
|
|
There was a problem hiding this comment.
Every hardcore Bayesian would probably cry "heresy!", but maybe it is exactly the intention of someone using a large tolerance to do a preliminary test without any knowledge about the main directions of the field. Then, I guess it would be helpful to use as many data points as possible, even if they overlap and are redundant.
If the field is a masked array and the mask there is entirely True, a 0 valued variogram is returned. If an external mask was given that is enterly true with an ordinary field, an Error was raised. That was the inconsistency. The now returned variogram is constantly 0 and the counts are 0 as well. I think this is meaningful, since it says: No data! |
Thanks ❤️
I think it is now in line with all other routines, since the
Of course not. My english is not se goodest.
I'll do a changelog later in develop, so I can mention the PRs. |
…es in cython; some code refactoring
This PR reworks the whole variogram estimation subpackage:
vario_estimateinstead ofvario_estimate_unstructured(old kept for legacy code) for simplicityvario_estimate_axisinstead ofvario_estimate_structured(old kept for legacy code) for simplicityvario_estimateno_dataoption added to allow missing valuesmaskkeyword was added to provide an external maskvario_estimate_axisno_dataoption added to allow missing values (sovles vario_estimate_structured returns only nan if nan points in the field #83)"x","y","z") or axis number (0,1,2,3, ...)Thanks to @TobiasGlaubach for starting this (#87) and to @LSchueler for providing the first implementation of the new distance calculation in cython (#82). This also re-implements #104, which makes to original PR obsolete.