Re: Indexing, encoding, transformations and processing with PyArrow - GitHub 6284

Athanassios I. Hatzis Mon, 27 Jan 2020 23:36:53 -0800

On Mon, 2020-01-27 at 10:25 -0600, Wes McKinney wrote:

> I asked to move this discussion here because we use the dev@ and user@
> mailing list for discussions (this is explained in the GitHub issue
> template
> https://github.com/apache/arrow/blob/master/.github/ISSUE_TEMPLATE.md)


Sure, I noticed this, but then I can hardly find any reason for opening an
issue at GitHub. As a user I find it a lot easier to open an issue at GitHub
and track the replies there than to register and search in email lists. In my
opinion it is also a lot easier and far more efficient for other users,
especially newcomers, to search and find relevant answers. By the way, how am
I supposed to search and view this user list online from a web browser, with
an interface like the one at GitHub? Is there a web link?

> treated as a valid floating point value in algorithms like dictionary_encode

Hi Wes, I was not aware that np.nan and None are not treated equivalently;
thanks for illustrating this with your Notebook. I can understand the logic
behind this, but it has serious flaws that originate from SQL, an
implementation of Codd's relational theory.

This is one of the reasons that I am promoting the Associative Semiotic
Hypergraph as an alternative data model for processing data in queries.
Associations (hyperedges, each connecting a set of n data items) are the
equivalent of table records, but null values are excluded. Therefore, in my
system the dictionary should always be free of missing values. Anyway, as you
suggest, I need to maintain some custom code for this.

There was also the following question in my email that was not answered.
> > I also noticed that there is NumPy integration and you can convert easily
> > from NumPy to Arrow but the reverse direction has several limitations. For
> > example I cannot create view for StringArray (NotImplementedError: NumPy
> > array view is only supported for primitive types). But string() (utf8) is
> > in the list of your primitive types. Any plans for supporting this type
> > with NumPy soon ?

Could you please suggest or point to a piece of code showing how to convert a
pyarrow.StringArray to NumPy for further processing? Do I have to give up on
getting a view from the to_numpy() method and make a copy in order to process
and modify it in NumPy?


Thank you for your time

Athan
