Hey Xavier,

I just discovered the library and really like it so far, great job! While tinkering a bit, I tried to work with the SCAT dataset and copied this test case (traffic/tests/test_datasets.py, line 5 in e8cabd6) into my notebook. To my surprise, I got this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[1], line 3
1 from traffic.data.datasets.scat import SCAT
----> 3 s = SCAT("scat20161112_20161118.zip", nflights=10)
File ~/miniconda3/envs/traffic/lib/python3.10/site-packages/traffic/data/datasets/scat.py:121, in SCAT.__init__(self, ident, nflights)
118 if "grib_meteo" in file_info.filename:
119 continue
--> 121 entry = self.parse_zipinfo(zf, file_info)
122 flights.append(entry.flight)
123 flight_plans.append(entry.flight_plan)
File ~/miniconda3/envs/traffic/lib/python3.10/site-packages/traffic/data/datasets/scat.py:64, in SCAT.parse_zipinfo(self, zf, file_info)
58 decoded = json.loads(content_bytes.decode())
59 flight_id = str(decoded["id"]) # noqa: F841
61 flight_plan = (
62 pd.json_normalize(decoded["fpl"]["fpl_plan_update"])
63 .rename(columns=rename_columns)
---> 64 .eval(
65 """
66 timestamp = @pd.to_datetime(timestamp, utc=True, format="mixed", errors="coerce")
67 flight_id = @flight_id
68 """
69 )
70 )
72 clearance = (
73 pd.json_normalize(decoded["fpl"]["fpl_clearance"])
74 .rename(columns=rename_columns)
(...)
80 )
81 )
83 fpl_base, *_ = decoded["fpl"]["fpl_base"]
File ~/miniconda3/envs/traffic/lib/python3.10/site-packages/pandas/core/frame.py:4937, in DataFrame.eval(self, expr, inplace, **kwargs)
4934 kwargs["target"] = self
4935 kwargs["resolvers"] = tuple(kwargs.get("resolvers", ())) + resolvers
-> 4937 return _eval(expr, inplace=inplace, **kwargs)
File ~/miniconda3/envs/traffic/lib/python3.10/site-packages/pandas/core/computation/eval.py:357, in eval(expr, parser, engine, local_dict, global_dict, resolvers, level, target, inplace)
355 eng = ENGINES[engine]
356 eng_inst = eng(parsed_expr)
--> 357 ret = eng_inst.evaluate()
359 if parsed_expr.assigner is None:
360 if multi_line:
File ~/miniconda3/envs/traffic/lib/python3.10/site-packages/pandas/core/computation/engines.py:81, in AbstractEngine.evaluate(self)
78 self.result_type, self.aligned_axes = align_terms(self.expr.terms)
80 # make sure no names in resolvers and locals/globals clash
---> 81 res = self._evaluate()
82 return reconstruct_object(
83 self.result_type, res, self.aligned_axes, self.expr.terms.return_type
84 )
File ~/miniconda3/envs/traffic/lib/python3.10/site-packages/pandas/core/computation/engines.py:121, in NumExprEngine._evaluate(self)
119 scope = env.full_scope
120 _check_ne_builtin_clash(self.expr)
--> 121 return ne.evaluate(s, local_dict=scope)
File ~/miniconda3/envs/traffic/lib/python3.10/site-packages/numexpr/necompiler.py:975, in evaluate(ex, local_dict, global_dict, out, order, casting, sanitize, _frame_depth, **kwargs)
973 return re_evaluate(local_dict=local_dict, _frame_depth=_frame_depth)
974 else:
--> 975 raise e
File ~/miniconda3/envs/traffic/lib/python3.10/site-packages/numexpr/necompiler.py:877, in validate(ex, local_dict, global_dict, out, order, casting, _frame_depth, sanitize, **kwargs)
874 arguments = getArguments(names, local_dict, global_dict, _frame_depth=_frame_depth)
876 # Create a signature
--> 877 signature = [(name, getType(arg)) for (name, arg) in
878 zip(names, arguments)]
880 # Look up numexpr if possible.
881 numexpr_key = expr_key + (tuple(signature),)
File ~/miniconda3/envs/traffic/lib/python3.10/site-packages/numexpr/necompiler.py:877, in <listcomp>(.0)
874 arguments = getArguments(names, local_dict, global_dict, _frame_depth=_frame_depth)
876 # Create a signature
--> 877 signature = [(name, getType(arg)) for (name, arg) in
878 zip(names, arguments)]
880 # Look up numexpr if possible.
881 numexpr_key = expr_key + (tuple(signature),)
File ~/miniconda3/envs/traffic/lib/python3.10/site-packages/numexpr/necompiler.py:717, in getType(a)
715 if kind == 'U':
716 raise ValueError('NumExpr 2 does not support Unicode as a dtype.')
--> 717 raise ValueError("unknown type %s" % a.dtype.name)
I tried again in a clean conda environment and the error disappeared, so it was probably caused by another package that pulled in "numexpr" as a dependency. When "numexpr" is installed, it is the default engine for .eval() (see: pandas docs); pandas only falls back to the "python" engine when "numexpr" is not available.

Some expressions do not seem to be supported by numexpr, so to avoid similar issues I would suggest explicitly setting the engine to "python" wherever the numexpr engine would fail (see also this discussion). This is what fixed the error for me:
flight_plan = (
    pd.json_normalize(decoded["fpl"]["fpl_plan_update"])
    .rename(columns=rename_columns)
    .eval(
        """
        timestamp = @pd.to_datetime(timestamp, utc=True, format="mixed")
        flight_id = @flight_id
        """,
        engine="python",
    )
)
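A minimal, self-contained sketch of the same fix on toy data. The DataFrame and flight_id value below are made-up stand-ins for the SCAT records, not the real file contents, and format="mixed" is left out for brevity. Passing engine="python" explicitly should make the result independent of whether numexpr happens to be installed; the first print only reports which situation applies in the current environment.

```python
import importlib.util

import pandas as pd

# Without an explicit engine, pandas picks "numexpr" whenever it is importable.
print("numexpr installed:", importlib.util.find_spec("numexpr") is not None)

# Hypothetical stand-in for pd.json_normalize(decoded["fpl"]["fpl_plan_update"])
df = pd.DataFrame(
    {
        "timestamp": ["2016-11-12T08:00:00", "2016-11-12T08:05:30"],
        "callsign": ["ABC123", "ABC123"],
    }
)
flight_id = "100000"  # hypothetical value, referenced below as @flight_id

result = df.eval(
    """
    timestamp = @pd.to_datetime(timestamp, utc=True)
    flight_id = @flight_id
    """,
    engine="python",  # force the pure-Python engine
)
print(result.dtypes)
```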
Alternatively, one could rewrite these cases in plain Python instead of passing expression strings to .eval(). This would be more work to implement, but it is probably the best option in the long run, since it also improves readability and makes errors easier to debug. What do you think?
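If the .eval() strings were replaced with plain pandas code, the flight-plan step might look roughly like the sketch below. parse_plan_updates is a hypothetical helper name, the record layout and the rename_columns mapping used to exercise it are made up, and format="mixed" requires pandas 2.0 or later. Column references become ordinary df["..."] lookups, so a dtype problem would surface directly in the to_datetime call rather than deep inside numexpr.

```python
from typing import Any

import pandas as pd


def parse_plan_updates(
    records: list[dict[str, Any]],
    rename_columns: dict[str, str],
    flight_id: str,
) -> pd.DataFrame:
    """Hypothetical helper: the flight_plan .eval() step rewritten without eval."""
    df = pd.json_normalize(records).rename(columns=rename_columns)
    return df.assign(
        # Parse heterogeneous timestamp strings; unparseable values become NaT.
        timestamp=pd.to_datetime(
            df["timestamp"], utc=True, format="mixed", errors="coerce"
        ),
        # A scalar is broadcast to every row.
        flight_id=flight_id,
    )


# Made-up records standing in for decoded["fpl"]["fpl_plan_update"].
records = [
    {"time_stamp": "2016-11-12T08:00:00", "fl": 350},
    {"time_stamp": "2016-11-12 08:05:30.500", "fl": 360},
    {"time_stamp": "not a date", "fl": 370},
]
plan = parse_plan_updates(records, {"time_stamp": "timestamp"}, "100000")
print(plan)
```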