Re: [DISCUSS][JAVA] Avoid setting reader/writer indices in FieldVector#getFieldBuffers

2020-08-04 Thread Micah Kornfield
FWIW, I lack historical context on how these methods evolved, so I'd appreciate insight from anyone who has worked on the Java codebase for a longer period of time. The current situation seems less than ideal. On Tue, Aug 4, 2020 at 12:55 AM Ji Liu wrote: > Hi all, > When I worked on

Re: [DISCUSS] How to extend the time value range for the Timestamp type?

2020-08-04 Thread Micah Kornfield
I think a stronger case needs to be made for adding a new built-in type to support this. Can you provide concrete use cases? Why can't dates outside the range representable by int64 be truncated (even at nanosecond precision, the maximum int64 value is over 200 years in the future)? It seems like in
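To make the int64 bound concrete, the arithmetic below computes how many years an int64 can span at a given TimeUnit and the latest instant an int64 nanosecond timestamp can encode. It is a plain-Python sketch with no Arrow dependency; the 365.2425-day average Gregorian year is an assumption used only for the approximate year count.

```python
from datetime import datetime, timedelta, timezone

INT64_MAX = 2**63 - 1

# Units per second for the TimeUnits mentioned in the thread.
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def max_span_years(unit: str) -> float:
    """Approximate number of years an int64 covers on one side of the epoch."""
    seconds = INT64_MAX / UNITS_PER_SECOND[unit]
    return seconds / (365.2425 * 24 * 3600)


def max_timestamp_ns() -> datetime:
    """Latest instant an int64 nanosecond timestamp can encode (UTC)."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # timedelta has microsecond resolution, so the last three digits are dropped.
    return epoch + timedelta(microseconds=INT64_MAX // 1000)


print(round(max_span_years("ns")))  # 292
print(max_timestamp_ns())           # 2262-04-11 23:47:16.854775+00:00
```

So nanosecond timestamps run out in April 2262, roughly 240 years after this thread, while coarser units push the limit much further out.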

Re: [DISCUSS] How to extend the time value range for the Timestamp type?

2020-08-04 Thread Fan Liya
Hi Ji, This sounds like a universal requirement, as 64 bits are not sufficient for nanosecond precision. For the extension type, we have two choices: 1. Extending struct(int64, int32), which follows the SoA (Struct of Arrays) design. 2. Extending fixed-width binary(12), which
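The two options differ mainly in memory layout. The plain-Python sketch below (no Arrow dependency) lays out the same hypothetical values both ways. Reading the fields as int64 seconds since the epoch plus int32 nanosecond-of-second is an assumption, since the message is cut off after option 2; the function names are illustrative only.

```python
import struct
from typing import List, Tuple

# Hypothetical (seconds since epoch, nanosecond-of-second) pairs.
Value = Tuple[int, int]


def to_struct_of_arrays(values: List[Value]) -> Tuple[List[int], List[int]]:
    """Option 1, struct(int64, int32): one child column per field."""
    seconds = [s for s, _ in values]
    nanos = [n for _, n in values]
    return seconds, nanos


def to_fixed_size_binary(values: List[Value]) -> List[bytes]:
    """Option 2, fixed-width binary(12): each value packed into 12 bytes."""
    # "<qi" = little-endian int64 then int32 with no padding: 8 + 4 = 12 bytes.
    return [struct.pack("<qi", s, n) for s, n in values]


def from_fixed_size_binary(blob: bytes) -> Value:
    """Unpack one 12-byte value back into (seconds, nanos)."""
    return struct.unpack("<qi", blob)


values = [(0, 0), (2**63 - 1, 999_999_999), (-62_135_596_800, 1)]
seconds, nanos = to_struct_of_arrays(values)
packed = to_fixed_size_binary(values)
print(seconds, nanos)
print([len(b) for b in packed])  # [12, 12, 12]
```

The struct form keeps each field in its own contiguous child array, while the 12-byte binary form interleaves both fields inside every value (an array of structs).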

[DISCUSS] How to extend the time value range for the Timestamp type?

2020-08-04 Thread Ji Liu
Hi all, Currently the Arrow Timestamp type supports different TimeUnits (seconds, milliseconds, microseconds, nanoseconds) with int64 storage. In most cases this is enough, but if the timestamp value range of an external system exceeds int64_t::max, then it's impossible to directly convert to

Arrow sync call August 5 at 12:00 US/Eastern, 16:00 UTC

2020-08-04 Thread Neal Richardson
Hi all, Reminder that our biweekly call is tomorrow at https://meet.google.com/vtm-teks-phx. All are welcome to join. Notes will be sent out to the mailing list afterward. Neal

Re: [DISCUSS] Execute dataset scan tasks in distributed system

2020-08-04 Thread Joris Van den Bossche
Hi Hongze, I am not too familiar with distributed systems in general, but I did work on using the Arrow Dataset API in the Python Dask library, which can run in a distributed way (https://dask.org/). For Dask, we used the second idea of sending serialized data to the workers, but on the level of

[NIGHTLY] Arrow Build Report for Job nightly-2020-08-04-0

2020-08-04 Thread Crossbow
Arrow Build Report for Job nightly-2020-08-04-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-04-0 Failed Tasks: - conda-linux-gcc-py36-cpu: URL:

[DISCUSS][JAVA] Avoid setting reader/writer indices in FieldVector#getFieldBuffers

2020-08-04 Thread Ji Liu
Hi all, When I worked on ARROW-7539[1], I ran into some problems and am not sure what the proper way to solve them is. The issue is about avoiding setting reader/writer indices in FieldVector#getFieldBuffers, for the following reasons: i. getBuffers sets reader/writer indices, and that's right for the
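As a rough illustration of the side effect under discussion, here is a toy Python model with two accessors: one fixes up the reader/writer indices before handing out buffers (as getBuffers is described as doing) and one returns the buffers untouched (the behavior the thread title proposes for getFieldBuffers). ToyBuffer, ToyIntVector and their methods are hypothetical stand-ins, not the Arrow Java API, and Ji's full list of reasons is truncated here.

```python
class ToyBuffer:
    """Hypothetical buffer carrying reader/writer indices (not ArrowBuf)."""

    def __init__(self, capacity: int):
        self.data = bytearray(capacity)
        self.reader_index = 0
        self.writer_index = 0


class ToyIntVector:
    """Hypothetical fixed-width vector of int32 values."""

    WIDTH = 4  # bytes per value

    def __init__(self, capacity: int):
        self.value_buffer = ToyBuffer(capacity * self.WIDTH)
        self.value_count = 0

    def set(self, index: int, value: int) -> None:
        start = index * self.WIDTH
        self.value_buffer.data[start:start + self.WIDTH] = value.to_bytes(
            self.WIDTH, "little", signed=True)

    def set_value_count(self, count: int) -> None:
        self.value_count = count

    def _set_reader_and_writer_index(self) -> None:
        # Make the indices describe exactly the bytes holding valid values.
        self.value_buffer.reader_index = 0
        self.value_buffer.writer_index = self.value_count * self.WIDTH

    def get_buffers(self):
        # Export-style accessor: mutates the indices as a side effect.
        self._set_reader_and_writer_index()
        return [self.value_buffer]

    def get_field_buffers(self):
        # Side-effect-free accessor: returns the buffers as they are.
        return [self.value_buffer]


vec = ToyIntVector(capacity=8)
for i, x in enumerate([10, 20, 30]):
    vec.set(i, x)
vec.set_value_count(3)
print(vec.get_field_buffers()[0].writer_index)  # 0: indices left alone
print(vec.get_buffers()[0].writer_index)        # 12: 3 values * 4 bytes
```

Because both accessors hand out the same buffer object, any index change made by one is visible through the other, which is why a getter with this side effect can surprise callers.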

Re: [DISCUSS][C++] MakeBuilder with a DictionaryType ignores the bit-width of the index type

2020-08-04 Thread Kenta Murata
Agreed. I made ARROW-9642 and its pull request: https://github.com/apache/arrow/pull/7898 On Tue, Aug 4, 2020 at 6:32 Wes McKinney wrote: > It seems useful to use the index type to set the starting bit width of the builder. I guess we can preserve the behavior of expanding to the next bit width when
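The proposed behavior can be sketched as a toy adaptive dictionary builder whose starting index width comes from the requested index type and which still widens on demand. This is a hypothetical Python model, not Arrow's C++ MakeBuilder or dictionary builder API; it assumes signed index types and that widening happens when the dictionary outgrows the current width, since the quoted sentence is cut off before saying so.

```python
# Signed integer index widths, in bits, that the toy builder can use.
WIDTHS = [8, 16, 32, 64]


def max_index(bits: int) -> int:
    """Largest value a signed integer of the given width can hold."""
    return 2 ** (bits - 1) - 1


class ToyDictionaryBuilder:
    """Hypothetical adaptive dictionary builder (not Arrow's C++ API)."""

    def __init__(self, start_bits: int = 8):
        if start_bits not in WIDTHS:
            raise ValueError(f"unsupported index width: {start_bits}")
        self.index_bits = start_bits  # taken from the requested index type
        self.memo = {}                # value -> dictionary index
        self.indices = []

    def append(self, value) -> None:
        idx = self.memo.setdefault(value, len(self.memo))
        # Widen to the next bit width once the new index no longer fits.
        while idx > max_index(self.index_bits):
            self.index_bits = WIDTHS[WIDTHS.index(self.index_bits) + 1]
        self.indices.append(idx)


default = ToyDictionaryBuilder()
for i in range(200):
    default.append(f"v{i}")
print(default.index_bits)  # 16: 200 distinct values overflow int8

wide = ToyDictionaryBuilder(start_bits=32)
for v in ["a", "b", "a"]:
    wide.append(v)
print(wide.index_bits, wide.indices)  # 32 [0, 1, 0]
```

Starting at the requested width avoids a later cast when the caller already knows the dictionary will be large, while keeping the widening path means small starting widths remain safe.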