Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]

2025-09-06 Thread via GitHub


lemire commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-3262366875

   We will be presenting our recent work at CppCon 2025: 
https://cppcon2025.sched.com/event/27bQx/reflection-based-json-in-c++-at-gigabytes-per-second


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]

2025-07-24 Thread via GitHub


WillAyd commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-3114230912

   I've started looking at this, but found that simdjson has very little 
support for serialization (see the upstream 
[discussion](https://github.com/simdjson/simdjson/discussions/2086)).
   
   It looks like some initial work to implement serialization support was 
merged just [last week](https://github.com/simdjson/simdjson/pull/2282), but 
it requires C++26 and hasn't made its way into an official release yet.



Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]

2025-04-16 Thread via GitHub


lemire commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-2810028915

   @pitrou See PR https://github.com/simdjson/simdjson/pull/2365



Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]

2025-04-16 Thread via GitHub


lemire commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-2809987618

   @pitrou Point taken. You are correct.



Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]

2025-04-16 Thread via GitHub


pitrou commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-2809985297

   > The simdjson library is used by major systems like the Node.js JavaScript 
runtime, [ClickHouse](https://github.com/ClickHouse/ClickHouse), 
[StarRocks](https://github.com/StarRocks/starrocks), 
[Velox](https://velox-lib.io), and so forth. It supports 32-bit systems, big 
endian systems, and so forth.
   
   I see, thanks. You might want to document platform support, because the 
[requirements section](https://github.com/simdjson/simdjson/blob/master/doc/basics.md#requirements) 
of the documentation seems a bit more restrictive.



Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]

2025-04-16 Thread via GitHub


lemire commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-2809976587

   @pitrou 
   
   >  it doesn't reduce the current platform compatibility (...) we currently 
need to be compatible with 32-bit systems. It's not obvious that simdjson 
allows that.
   
   The simdjson library is used by major systems like the Node.js JavaScript 
runtime,  [ClickHouse](https://github.com/ClickHouse/ClickHouse), 
[StarRocks](https://github.com/StarRocks/starrocks), 
[Velox](https://velox-lib.io), and so forth. It supports 32-bit systems, big 
endian systems, and so forth.
   
   
   Node.js code sample:
   
   
https://github.com/nodejs/node/blob/af85f3f169b5ce151ad4662cc74775e8c2973e1d/src/node_config_file.cc#L168-L188
   
   Should you want to consider simdjson, the core simdjson team will be happy 
to provide support.



Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]

2025-04-16 Thread via GitHub


pitrou commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-2809866592

   And, by the way, we currently need to be compatible with 32-bit systems. 
It's not obvious that simdjson allows that.



Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]

2025-04-16 Thread via GitHub


pitrou commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-2809862168

   [This part](https://github.com/simdjson/simdjson/blob/master/doc/performance.md#free-padding), 
however, will require additional care:
   > For performance reasons, the simdjson library requires that the JSON input 
contain at least `simdjson::SIMDJSON_PADDING` bytes at the end of the stream. 
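   For illustration only, here is a minimal standard-C++ sketch of what that 
"additional care" could look like: copying a JSON buffer into storage that has 
extra zeroed bytes after the document. It does not call simdjson. The 64-byte 
figure is an assumption standing in for `simdjson::SIMDJSON_PADDING` (real code 
should use the library constant, or simdjson's own `simdjson::padded_string`), 
and `MakePaddedCopy` and `HasRoomForPadding` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

// Assumption: 64 bytes, standing in for simdjson::SIMDJSON_PADDING.
// Real code should use the library constant instead of hard-coding it.
constexpr std::size_t kAssumedPadding = 64;

// Hypothetical helper: copies `json` into a new buffer followed by
// kAssumedPadding zero bytes, so a parser that reads past the end of the
// document in fixed-width chunks stays inside memory we own.
std::vector<char> MakePaddedCopy(std::string_view json) {
  std::vector<char> buffer(json.size() + kAssumedPadding, '\0');
  std::copy(json.begin(), json.end(), buffer.begin());
  return buffer;
}

// Hypothetical helper: true when a caller-owned buffer of `capacity` bytes
// holding `length` bytes of JSON already has room for the padding, so the
// copy above could be skipped.
bool HasRoomForPadding(std::size_t length, std::size_t capacity) {
  return capacity >= length && capacity - length >= kAssumedPadding;
}
```

   The copy costs an extra pass over the input; the linked "free padding" 
section describes ways to avoid it when bytes past the end of the document are 
already available, which is the situation `HasRoomForPadding` checks for.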
   



Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]

2025-04-16 Thread via GitHub


pitrou commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-2809849645

   I agree we need to do something about our RapidJSON dependency. simdjson is 
a reasonable contender. We just need to check that it doesn't reduce the 
current platform compatibility (especially when SIMD isn't available or isn't 
supported by simdjson).



Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]

2025-04-11 Thread via GitHub


lemire commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-2797380908

   >  I seem to remember some past issues with Homebrew packaging with respect 
to build-time vs. runtime SIMD support (although I imagine simdjson has 
considered this as well!).
   
   The simdjson library uses runtime dispatching. It is currently used by 
Node.js, ClickHouse and other important systems. In turn, Node.js is part of 
several important systems.
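   To make the build-time vs. runtime distinction concrete: with runtime 
dispatching, a binary contains several kernels and picks one by querying the 
CPU when the program runs, so a binary package need not assume the build 
machine's instruction set. Below is a minimal sketch of that general pattern 
using GCC/Clang builtins; it is not simdjson's code. The kernel names are 
hypothetical, and the "AVX2" kernel is a placeholder that reuses the scalar 
loop so the example compiles everywhere.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical baseline kernel: counts '"' characters with a scalar loop.
std::size_t CountQuotesGeneric(std::string_view s) {
  std::size_t n = 0;
  for (char c : s) n += (c == '"') ? 1 : 0;
  return n;
}

// Hypothetical "fast" kernel. Placeholder only: a real AVX2 version would be
// compiled with AVX2 enabled; here it forwards to the generic kernel.
std::size_t CountQuotesAvx2(std::string_view s) {
  return CountQuotesGeneric(s);
}

using CountFn = std::size_t (*)(std::string_view);

// Chooses an implementation based on what the *running* CPU supports,
// not on what the build machine supported.
CountFn SelectCountQuotes() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  if (__builtin_cpu_supports("avx2")) return CountQuotesAvx2;
#endif
  return CountQuotesGeneric;  // portable fallback on every other platform
}

// Public entry point: the choice is made once, on first call
// (function-local statics are initialized thread-safely in C++11+).
std::size_t CountQuotes(std::string_view s) {
  static const CountFn impl = SelectCountQuotes();
  return impl(s);
}
```

   In a real library, the AVX2 kernel would live in a function or translation 
unit compiled with AVX2 enabled (for example via a target attribute), while 
the rest of the binary stays at the baseline instruction set.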
   
   Optionally, we also support fancy C++20 features, and we will be integrating 
C++26 features (e.g., static reflection and the like) as soon as possible.
   
   
   Suggested reference: 
   - [On-Demand JSON: A Better Way to Parse Documents?](https://arxiv.org/abs/2312.17149), 
Software: Practice and Experience 54 (6), 2024
   
   Documentation: https://github.com/simdjson/simdjson/blob/master/doc/basics.md



Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]

2025-04-11 Thread via GitHub


paleolimbot commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-2797190801

   For what it's worth, I think DuckDB uses yyjson, although I don't know if 
simdjson was considered. I seem to remember some past issues with Homebrew 
packaging with respect to build-time vs. runtime SIMD support (although I 
imagine simdjson has considered this as well!).



Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]

2025-04-10 Thread via GitHub


kou commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-2795848983

   Let's try simdjson for better maintainability!



Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]

2025-04-10 Thread via GitHub


wgtmac commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-2795796444

   I just want to revive this discussion. 
https://github.com/apache/arrow/pull/45459 will make `RapidJSON` a required 
dependency of Parquet. However, RapidJSON has been poorly maintained since 
2016: https://github.com/Tencent/rapidjson/issues/2321. It also seems to be 
blocking support for CMake 4: https://github.com/apache/arrow/issues/45985

