Re: [DISCUSS] SPIP: Add llms.txt files to Spark Documentation

2025-09-10 Thread Jules Damji
Yes, indeed, one or two llms.txt index manifests wouldn’t hurt, especially if they facilitate LLM searches. It is not a standard yet, but it’s gaining attention: https://directory.llmstxt.cloud/ Cheers, Jules On Sep 10, 2025, at 4:11 PM, Hyukjin Kwon

Re: [DISCUSS] SPIP: Add llms.txt files to Spark Documentation

2025-09-10 Thread Dongjoon Hyun
Thank you, Allison and Hyukjin. IIUC, this proposal is not about a single file. The SPIP already exposes multiple files, which may double the size of our documentation and website (or more in the worst case) because they simply duplicate the content. If we start to use AI tools to generate thes

Re: [DISCUSS] SPIP: Add llms.txt files to Spark Documentation

2025-09-10 Thread Hyukjin Kwon
I am +1 if we're sure that it's adding one or only a few files. On Thu, 11 Sept 2025 at 06:53, Denny Lee wrote: > While it is not standard per se, it is quickly becoming a common approach. And as you noted per MCP site, they have the llms-full.txt, they also have https://modelcontextproto

Re: [DISCUSS] SPIP: Add llms.txt files to Spark Documentation

2025-09-10 Thread Bjørn Jørgensen
The protocol for this llms.txt is not a standard yet. "*To clarify, llms.txt is not meant to be a duplication of the full documentation.*" Some, like the Model Context Protocol (MCP) site, have their full web page in the llms page. h

Re: [DISCUSS] SPIP: Add llms.txt files to Spark Documentation

2025-09-10 Thread Allison Wang
Thanks Dongjoon for raising these concerns. I agree with your point that it’s worth making the lightweight manifest scope explicit in the SPIP so we have a systematic guarantee it stays small (under 10MB). To clarify, llms.txt is not meant to be a duplication of the full documentation. Instead, it

Re: [DISCUSS] SPIP: Add llms.txt files to Spark Documentation

2025-09-10 Thread Wenchen Fan
This should just be an LLM-facing index page of Spark docs? Given the number of APIs Spark provides today, I think this index page should be useful to humans as well. On Wed, Sep 10, 2025 at 10:46 PM Dongjoon Hyun wrote: > Thank you, Allison and Hyukjin. IIUC, this proposal is not about a sin

Re: [DISCUSS] Data Type framework

2025-09-10 Thread serge rielau . com
I think this is a great idea. There is a significant backlog of types which should be added, e.g. TIMESTAMP(9), TIMESTAMP WITH TIME ZONE, TIME WITH TIME ZONE, and some sort of big decimal, to name a few. Making these more "plug and play" is goodness. +1 On Sep 10, 2025, at 1:22 PM, Max Gekk wrote: H

[DISCUSS] Data Type framework

2025-09-10 Thread Max Gekk
Hi All, I would like to propose a refactoring of internal operations over Catalyst's data types. In the current implementation, data types are handled in an ad hoc manner, and processing logic is dispersed across the entire code base. There are more than 100 places where every data type is pattern m

Re: [DISCUSS] Release Apache Spark 3.5.7

2025-09-10 Thread Max Gekk
+1 On Wed, Sep 10, 2025 at 8:13 AM Shaoyun Chen wrote: > +1 > > SPARK-46941[1] also fixed an issue with incorrect results. > > 1. https://issues.apache.org/jira/browse/SPARK-46941 > > On Wed, Sep 10, 2025 at 11:49, Yang Jie wrote: > > > > +1 > > > > On 2025/09/10 02:32:29 Wenchen Fan wrote: > > > +1 > > > > >