Ticket #156 (new defect)

Opened 3 years ago

Last modified 3 years ago

python: support '|S1' for string data

Reported by: Cpascual
Owned by: Paul Kienzle
Priority: minor
Milestone: NeXus 4.2 Release
Component: other bindings
Version: 4.2rc2
Keywords: numpy dtype char
Cc:

Description

It would be nice if '|S1' were accepted as a synonym for 'char' when passing dtype to napi functions. See the attached patch to the 4.2rc2 napi.py.

Rationale: Currently one can use numpy.dtype objects wherever a napi function requires a dtype, except in the case of string data. This is because the numpy-to-NeXus data type conversion is done through:

_nxtype_code[str(type)]

In the case of numpy 1-character arrays, str(dtype) returns '|S1', so at least this case can be covered pretty easily by using the proposed patch.

Note that this only works for numpy 1-character arrays, not for Python strings. But at least one can convert a string to such an array using:

numpy.array(tuple(string))
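As a hedged illustration of that conversion: the sketch below passes an explicit dtype, which is an addition not present in the line above. On Python 3, numpy infers a unicode dtype (such as '<U1') for str input, so the explicit "S1" is assumed to be needed there to obtain the '|S1' form discussed in this ticket.

```python
import numpy as np

text = "abc"

# Split the string into single characters and store them as 1-byte strings.
# The explicit dtype is an assumption for Python 3, where the default for
# str input would be a unicode dtype instead.
chars = np.array(tuple(text), dtype="S1")

print(chars.shape)       # (3,)
print(str(chars.dtype))  # |S1
```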

Attachments

patch.diff (532 bytes) - added by Cpascual 3 years ago.
patch-napi.py_4rc2.diff (3.0 KB) - added by Cpascual 3 years ago.
proposed patch to support arrays of strings in the python bindings

Change History

Changed 3 years ago by Cpascual

comment:1 Changed 3 years ago by Freddie Akeroyd

  • Owner changed from Unassigned to Paul Kienzle
  • Milestone set to NeXus 4.2 Release

comment:2 Changed 3 years ago by Cpascual

On further testing, I see that my patch is not sufficient. Some other functions, such as _pinput, check for "char", but then _is_string_like() evaluates to False when given numpy arrays of characters.

Still, it would be nice if we could somehow get rid of the inconsistent handling of "char" versus the other data types, which are numpy types.

comment:3 Changed 3 years ago by Paul Kienzle

Before trying to fix this, I would like to understand why you are passing numpy character arrays instead of python strings.

Getting the interface to act consistently so that you get the same value back on read that you put in on write could be tricky.

comment:4 Changed 3 years ago by Cpascual

The ability to deal with numpy string arrays as the input would be interesting for two reasons:

1) It makes the API more consistent (one would not need to act differently depending on whether one is using 'char' or other types).

2) It paves the way for multidimensional character arrays.

Regarding the output: IMHO, the output should always be the same regardless of the input, and depend only on the data shape (i.e., on what is returned by NXgetinfo()). I think that the most consistent option is to always return a numpy array of strings:

If the data shape is [m], then the output should be a zero-dimensional numpy array containing a string of length m.

If the data shape is [n,m], then the output should be an n-long 1D numpy array of strings of length m.

If the data shape is [k,n,m], then the output should be a 2D array [with shape=(k,n)] of strings of length m.

...
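The shape rule above can be sketched as follows. This is only an illustration of the proposal, not the implementation from the attached patch; chars_to_strings is a hypothetical helper name, and the input is assumed to be character data with a non-empty last axis.

```python
import numpy as np


def chars_to_strings(raw):
    """Collapse the last axis of 1-character data into fixed-length strings.

    Shape [m] -> shape (), [n, m] -> (n,), [k, n, m] -> (k, n).
    """
    raw = np.ascontiguousarray(raw, dtype="S1")
    m = raw.shape[-1]
    # Reinterpret each run of m single bytes as one m-byte string, then drop
    # the leftover length-1 axis.
    return raw.view("S%d" % m).reshape(raw.shape[:-1])


one_d = np.frombuffer(b"abc", dtype="S1")                   # data shape [3]
two_d = np.frombuffer(b"abcdef", dtype="S1").reshape(2, 3)  # data shape [2, 3]

print(chars_to_strings(one_d).shape)     # ()
print(chars_to_strings(two_d).tolist())  # [b'abc', b'def']
```

The result depends only on the shape of the stored data, which is the property argued for above.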

As an exception, it could be reasonable to implement it in such a way that data with shape [m] returns a Python string instead of a numpy array of shape=(). But, IMHO, always returning numpy makes it more self-consistent. The user can always convert the output (or slices of it) to strings using the .tostring() method of the numpy arrays.
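For illustration, converting a zero-dimensional string array back to a Python value might look like the sketch below. It uses .item() and .tobytes() because newer NumPy releases deprecate .tostring() in favour of .tobytes(); the ASCII decoding step is an assumption about the stored data.

```python
import numpy as np

value = np.array(b"abc")      # zero-dimensional array of dtype S3

as_bytes = value.item()       # Python bytes object
raw_bytes = value.tobytes()   # same bytes, via the raw buffer
as_text = as_bytes.decode("ascii")  # assumes the data is ASCII

print(value.shape, as_bytes, as_text)  # () b'abc' abc
```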

I am currently working on my own implementation of this. I will post it as soon as it is ready.

Changed 3 years ago by Cpascual

proposed patch to support arrays of strings in the python bindings

comment:5 Changed 3 years ago by Paul Kienzle

I posted a fix to the handling of character storage which should allow the above. Please check that it works on the desired use cases. If not, post code showing what you want to support.
