Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]
lemire commented on issue #35460: URL: https://github.com/apache/arrow/issues/35460#issuecomment-3262366875 We will be presenting our recent work at cppcon 2025: https://cppcon2025.sched.com/event/27bQx/reflection-based-json-in-c++-at-gigabytes-per-second -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]
WillAyd commented on issue #35460: URL: https://github.com/apache/arrow/issues/35460#issuecomment-3114230912 I've started looking at this, but found that simdjson has very little support for serialization (see the upstream [discussion](https://github.com/simdjson/simdjson/discussions/2086)). It looks like some initial work to implement serialization support was merged just [last week](https://github.com/simdjson/simdjson/pull/2282), but that requires C++26 and hasn't made its way into an official release.
Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]
lemire commented on issue #35460: URL: https://github.com/apache/arrow/issues/35460#issuecomment-2810028915 @pitrou See PR https://github.com/simdjson/simdjson/pull/2365
Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]
lemire commented on issue #35460: URL: https://github.com/apache/arrow/issues/35460#issuecomment-2809987618 @pitrou Point taken. You are correct.
Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]
pitrou commented on issue #35460: URL: https://github.com/apache/arrow/issues/35460#issuecomment-2809985297

> The simdjson library is used by major systems like the Node.js JavaScript runtime, [ClickHouse](https://github.com/ClickHouse/ClickHouse), [StarRocks](https://github.com/StarRocks/starrocks), [Velox](https://velox-lib.io), and so forth. It supports 32-bit systems, big endian systems, and so forth.

I see, thanks. You might want to document platform support, because https://github.com/simdjson/simdjson/blob/master/doc/basics.md#requirements seems a bit more restrictive.
Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]
lemire commented on issue #35460: URL: https://github.com/apache/arrow/issues/35460#issuecomment-2809976587

@pitrou

> it doesn't reduce the current platform compatibility (...) we currently need to be compatible with 32-bit systems. It's not obvious that simdjson allows that.

The simdjson library is used by major systems like the Node.js JavaScript runtime, [ClickHouse](https://github.com/ClickHouse/ClickHouse), [StarRocks](https://github.com/StarRocks/starrocks), [Velox](https://velox-lib.io), and so forth. It supports 32-bit systems, big endian systems, and so forth.

Node.js code sample: https://github.com/nodejs/node/blob/af85f3f169b5ce151ad4662cc74775e8c2973e1d/src/node_config_file.cc#L168-L188

Should you want to consider simdjson, the core simdjson team will be happy to provide support.
Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]
pitrou commented on issue #35460: URL: https://github.com/apache/arrow/issues/35460#issuecomment-2809866592 And, by the way, we currently need to be compatible with 32-bit systems. It's not obvious that simdjson allows that.
Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]
pitrou commented on issue #35460: URL: https://github.com/apache/arrow/issues/35460#issuecomment-2809862168

[This part](https://github.com/simdjson/simdjson/blob/master/doc/performance.md#free-padding), however, will require additional care:

> For performance reasons, the simdjson library requires that the JSON input contain at least simdjson::SIMDJSON_PADDING bytes at the end of the stream.
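
To make the padding concern concrete, here is a minimal sketch of what a caller might have to do when its input buffer has no spare bytes after the JSON text. It deliberately avoids the simdjson API: `kAssumedPadding` is a stand-in for `simdjson::SIMDJSON_PADDING` (its value here is an assumption), and `CopyWithPadding`, `PayloadView` and `HasEnoughPadding` are hypothetical helper names, not functions from simdjson or Arrow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>
#include <vector>

// Stand-in for simdjson::SIMDJSON_PADDING. The value is an assumption for
// illustration only; real code should use the library's constant.
constexpr std::size_t kAssumedPadding = 64;

// Hypothetical helper: copies `json` into a new buffer that has
// kAssumedPadding zeroed bytes after the payload.
std::vector<char> CopyWithPadding(std::string_view json) {
  std::vector<char> buffer(json.size() + kAssumedPadding, '\0');
  if (!json.empty()) {
    std::memcpy(buffer.data(), json.data(), json.size());
  }
  return buffer;
}

// Hypothetical helper: returns a view of the JSON payload only, excluding
// the trailing padding bytes of a buffer made by CopyWithPadding.
std::string_view PayloadView(const std::vector<char>& padded) {
  assert(padded.size() >= kAssumedPadding);
  return std::string_view(padded.data(), padded.size() - kAssumedPadding);
}

// Hypothetical check: true when a buffer of `capacity` bytes that holds
// `length` bytes of JSON already has enough trailing room, so the copy
// above could be skipped.
constexpr bool HasEnoughPadding(std::size_t length, std::size_t capacity) {
  return capacity >= length && capacity - length >= kAssumedPadding;
}
```

The trade-off this illustrates is an extra allocation and copy whenever the input arrives in a tightly sized buffer, unless the reader that produced the buffer can over-allocate up front.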
Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]
pitrou commented on issue #35460: URL: https://github.com/apache/arrow/issues/35460#issuecomment-2809849645 I agree we need to do something about our RapidJSON dependency. simdjson is a reasonable contender. We just have to check that it doesn't reduce the current platform compatibility (especially when SIMD isn't available/supported by simdjson).
Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]
lemire commented on issue #35460: URL: https://github.com/apache/arrow/issues/35460#issuecomment-2797380908

> I seem to remember some past issues with homebrew packaging with respect to buildtime vs. runtime SIMD support (although I imagine simdjson has considered this as well!).

The simdjson library uses runtime dispatching. It is currently used by Node.js, ClickHouse and other important systems. In turn, Node.js is part of several important systems.

Optionally, we also support fancy C++20 features, and we will be integrating C++26 features (e.g., static reflection and the like) as soon as possible.

Suggested reference:

- [On-Demand JSON: A Better Way to Parse Documents?](https://arxiv.org/abs/2312.17149) Software: Practice and Experience 54 (6), 2024

Documentation: https://github.com/simdjson/simdjson/blob/master/doc/basics.md
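
For readers less familiar with the term, runtime dispatching means the binary ships several implementations of a kernel and picks one when the program runs, based on what the CPU supports, instead of fixing the choice at build time. The sketch below shows only the general pattern and is not simdjson's actual code: `CpuFeatures`, `CountQuotesScalar`, `CountQuotesWide` and `SelectCountQuotes` are hypothetical names, and the "wide" kernel is plain C++ standing in for a real SIMD routine.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical description of the running CPU. A real library would fill
// this in at startup from CPUID or an operating-system query.
struct CpuFeatures {
  bool has_wide_simd = false;
};

using CountFn = std::size_t (*)(std::string_view);

// Portable fallback that works on any platform.
std::size_t CountQuotesScalar(std::string_view s) {
  std::size_t n = 0;
  for (char c : s) n += (c == '"');
  return n;
}

// Placeholder for a SIMD kernel: handles 8 bytes per step, then the tail.
// It is ordinary C++ here so that the sketch stays portable.
std::size_t CountQuotesWide(std::string_view s) {
  std::size_t n = 0;
  std::size_t i = 0;
  for (; i + 8 <= s.size(); i += 8) {
    for (std::size_t j = 0; j < 8; ++j) n += (s[i + j] == '"');
  }
  for (; i < s.size(); ++i) n += (s[i] == '"');
  return n;
}

// The dispatch step: choose an implementation once, based on runtime
// features, and call through the returned pointer afterwards.
CountFn SelectCountQuotes(const CpuFeatures& features) {
  return features.has_wide_simd ? &CountQuotesWide : &CountQuotesScalar;
}
```

With this pattern a single packaged binary can run on machines without the wide instructions, which is the buildtime vs. runtime distinction raised in the quoted comment.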
Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]
paleolimbot commented on issue #35460: URL: https://github.com/apache/arrow/issues/35460#issuecomment-2797190801 For what it's worth, I think DuckDB uses yyjson, although I don't know if simdjson was considered. I seem to remember some past issues with homebrew packaging with respect to buildtime vs. runtime SIMD support (although I imagine simdjson has considered this as well!).
Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]
kou commented on issue #35460: URL: https://github.com/apache/arrow/issues/35460#issuecomment-2795848983 Let's try simdjson for better maintainability!
Re: [I] [C++] can we use simdjson to replace rapidjson [arrow]
wgtmac commented on issue #35460: URL: https://github.com/apache/arrow/issues/35460#issuecomment-2795796444 I just want to revive this discussion. https://github.com/apache/arrow/pull/45459 will make `RapidJson` a required dependency of Parquet. However, RapidJson has been poorly maintained since 2016: https://github.com/Tencent/rapidjson/issues/2321. It also seems to be a blocker for supporting CMake 4: https://github.com/apache/arrow/issues/45985
