Ticket #156 (new defect)
python: support '|S1' for string data
| Reported by: | Cpascual | Owned by: | Paul Kienzle |
|---|---|---|---|
| Priority: | minor | Milestone: | NeXus 4.2 Release |
| Component: | other bindings | Version: | 4.2rc2 |
| Keywords: | numpy dtype char | Cc: | |
Description
It would be nice if '|S1' was accepted as a synonym of 'char' when passing dtype to napi functions. See attached patch to 4.2rc2 napi.py
Rationale: Currently one can use numpy.dtype objects wherever a napi function requires dtype except for the case of string data. This is because the numpy-->nexus data type conversion is done through
_nxtype_code[str(type)]
In the case of numpy 1-character arrays, str(dtype) returns '|S1', so at least this case can be covered pretty easily by using the proposed patch.
Note that this only works for numpy 1-character arrays, not for python strings. But at least one can convert to an array using:
numpy.array(tuple(string))
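The aliasing idea can be sketched as follows. This is only an illustration, not the attached patch: the contents of `_nxtype_code` below are placeholder assumptions (the real table in napi.py may use different keys and values), and `nxtype_for` and `to_char_array` are hypothetical helper names. Note also that under Python 3 `numpy.array(tuple(string))` produces a unicode dtype (`'<U1'`) rather than `'|S1'`, so the sketch passes `dtype="S1"` explicitly.

```python
import numpy as np

# Placeholder stand-in for the lookup table in napi.py; the real
# keys and integer codes are assumptions made for this sketch.
_nxtype_code = {
    "char": 4,
    "float32": 5,
    "float64": 6,
    "int32": 24,
}

# The proposed alias: treat the string form of a 1-character
# numpy dtype as a synonym of 'char'.
_nxtype_code["|S1"] = _nxtype_code["char"]


def nxtype_for(dtype):
    """Look up a type code from a name or a numpy dtype (hypothetical helper)."""
    return _nxtype_code[str(dtype)]


def to_char_array(text):
    """Convert an ASCII Python string to a 1-D array of single characters."""
    return np.array(list(text), dtype="S1")


chars = to_char_array("abc")
print(str(chars.dtype))          # string form of the array's dtype
print(nxtype_for(chars.dtype))   # same code as nxtype_for("char")
```

With the alias in place, a dtype taken directly from a character array resolves through the same lookup as the other numpy dtypes.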
Attachments
Change History
comment:1 Changed 3 years ago by Freddie Akeroyd
- Owner changed from Unassigned to Paul Kienzle
- Milestone set to NeXus 4.2 Release
comment:2 Changed 3 years ago by Cpascual
On further testing, I see that my patch is not sufficient: some other functions, such as _pinput, do comparisons looking for "char", but _is_string_like() then evaluates to False when using numpy arrays of characters.
Still, it would be nice if we could somehow get rid of the inconsistent handling of "char" versus the other data types, which are numpy types.
comment:3 Changed 3 years ago by Paul Kienzle
Before trying to fix this, I would like to understand why you are passing numpy character arrays instead of python strings.
Getting the interface to act consistently so that you get the same value back on read that you put in on write could be tricky.
comment:4 Changed 3 years ago by Cpascual
The ability to deal with numpy string arrays as the input would be interesting for two reasons:
1) It makes the API more consistent (one would not need to act differently depending on whether one is using 'char' or other types).
2) It paves the way for multidimensional character arrays.
Regarding the output: IMHO, the output should always be the same regardless of the input, and dependent only on the data shape (i.e., on what is returned by NXgetinfo()). I think that the most consistent option is to always return a numpy array of strings:
If the data shape is [m], then the output should be a zero-dimensional numpy array containing a string of length m
If the data shape is [n,m], then the output should be an n-long 1D numpy array of strings of length m.
If the data shape is [k,n,m] then the output should be a 2D array [with shape=(k,n)] of strings of length m.
...
As an exception, it could be reasonable to implement it in such a way that data with shape [m] returns a python string instead of a numpy array of shape=(). But, IMHO, always returning numpy makes it more self-consistent. The user can always convert the output (or slices of it) to strings using the .tostring() method of the numpy arrays.
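The shape rule described above can be sketched with plain numpy. This is a minimal illustration under assumptions, not the implementation being worked on: `chars_to_strings` is a hypothetical helper name, the input is assumed to be an array of single characters whose last axis has length m >= 1, and no NeXus calls are involved.

```python
import numpy as np


def chars_to_strings(chars):
    """Collapse the last axis of an 'S1' array into strings of length m.

    Data shape [m]     -> zero-dimensional array holding one string
    Data shape [n,m]   -> 1-D array of n strings
    Data shape [k,n,m] -> 2-D array with shape (k, n)
    """
    # A contiguous buffer is needed so the last axis can be reinterpreted.
    chars = np.ascontiguousarray(chars, dtype="S1")
    m = chars.shape[-1]
    # Viewing as 'S<m>' shrinks the last axis to length 1; drop it.
    return chars.view("S%d" % m).reshape(chars.shape[:-1])


grid = np.frombuffer(b"abcdef", dtype="S1").reshape(2, 3)
print(chars_to_strings(grid).shape)      # shape of the [n,m] case

single = np.frombuffer(b"abc", dtype="S1")
print(chars_to_strings(single).shape)    # shape of the [m] case
```

For the zero-dimensional case, calling `.item()` on the result gives back a plain Python bytes object, which covers the exception mentioned above without changing the return type of the function.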
I am currently working on my own implementation of this. I will post it as soon as it is ready.
Changed 3 years ago by Cpascual
- attachment patch-napi.py_4rc2.diff added
proposed patch to support arrays of strings in the python bindings
