Does the new ores-legacy support the same feature set, e.g. features
output, feature injection, and threshold optimizations, or is it just
prediction? This will affect some of the systems I need to migrate.
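
For concreteness, here is roughly what I mean, as a sketch against the
v3 API (parameter spellings are from memory, so double-check them
against the API docs):

    import requests

    BASE = "https://ores.wikimedia.org/v3/scores/enwiki"

    # Plain prediction, which I assume ores-legacy supports either way.
    requests.get(f"{BASE}/123456/damaging")

    # Feature values output via ?features.
    requests.get(f"{BASE}/123456/damaging", params={"features": "true"})

    # Feature injection: override a feature value and re-score.
    requests.get(f"{BASE}/123456/damaging",
                 params={"feature.wikitext.revision.chars": "1000"})

    # Threshold optimizations via model_info.
    requests.get(BASE, params={
        "models": "damaging",
        "model_info": 'statistics.thresholds."true"'
                      '."maximum recall @ precision >= 0.9"',
    })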

On Fri, Sep 22, 2023, 06:21 Ilias Sarantopoulos <
isarantopou...@wikimedia.org> wrote:

> Hello!
>
>
> As a next step in the deprecation process of ORES
> (https://wikitech.wikimedia.org/wiki/ORES), the Machine Learning team will
> switch the backend of ores.wikimedia.org to ores-legacy, a k8s
> application that provides a compatibility layer between ORES and Lift
> Wing, so that users who have not yet migrated to Lift Wing are
> transparently migrated. Ores-legacy exposes the same API as ORES but
> makes requests to Lift Wing in the background, allowing us to
> decommission the ORES servers before all clients have moved.
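>
> To illustrate (a sketch, not official documentation; double-check the
> model names for your wiki): an ORES-style call served by ores-legacy
> and the equivalent native Lift Wing call look roughly like this:
>
>     import requests
>
>     # ORES-style call; after the switch, ores-legacy answers this
>     # and fetches the score from Lift Wing behind the scenes.
>     requests.get(
>         "https://ores.wikimedia.org/v3/scores/enwiki/123456/damaging")
>
>     # Native Lift Wing call for the same model.
>     requests.post(
>         "https://api.wikimedia.org/service/lw/inference/v1/models/"
>         "enwiki-damaging:predict",
>         json={"rev_id": 123456})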
>
> This change is planned to take place on Monday, September 25th. If you
> have a client or application that is still using ORES, we expect this
> switch to be transparent for you.
>
> However, keep in mind that ores-legacy is not a 100% replacement for ORES,
> as some old and unused features are no longer supported.
>
> If you see anything out of the ordinary, feel free to contact the Machine
> Learning team:
>
> IRC (Libera): #wikimedia-ml
>
> Phabricator: #Machine-Learning-Team tag
>
> Thank you!
>
>
> On Wed, Aug 9, 2023 at 1:22 PM Chaloemphon Praphuchakang <
> yoshrakpra...@gmail.com> wrote:
>
>>
>> On Tue, 8 Aug 2023, 10:45 Tilman Bayer <haebw...@gmail.com> wrote:
>>
>>>
>>> Hi Chris,
>>>
>>> On Mon, Aug 7, 2023 at 11:51 AM Chris Albon <cal...@wikimedia.org>
>>> wrote:
>>>
>>>> Hi Tilman,
>>>>
>>>> Most of the work is still very experimental. We have hosted a few LLMs
>>>> on Lift Wing already (StarCoder, for example), but they were just running
>>>> on CPU, far too slow for real use cases. Still, it proves that we can
>>>> easily host LLMs on Lift Wing. We have been pretty quiet about it while
>>>> we focus on the ORES migration, but it is our next big project. More
>>>> soon, hopefully!
>>>>
>>> Understood. Looking forward to learning more later!
>>>
>>>
>>>> Where we are now is that we have budget for a big GPU purchase (~10-20
>>>> GPUs depending on cost). The question we will try to answer after the
>>>> ORES migration is complete is: which GPUs should we purchase? We are
>>>> trying to balance our strong preference to stay open source (i.e. AMD
>>>> ROCm) against a world dominated by a single closed-source vendor (i.e.
>>>> Nvidia). In addition, do we go for a few expensive GPUs better suited to
>>>> LLMs (A100, H100, etc.) or a mix of big and small? We will need to
>>>> figure out all this.
>>>>
>>> I see. On that matter, what do you folks make of the recent
>>> announcements of AMD's partnerships with Hugging Face and Pytorch[5]?
>>> (which, I understand, came after the ML team had already launched the
>>> aforementioned new AMD explorations)
>>>
>>> "Open-source AI: AMD looks to Hugging Face and Meta spinoff PyTorch to
>>> take on Nvidia [...]
>>> Both partnerships involve AMD’s ROCm AI software stack, the company’s
>>> answer to Nvidia’s proprietary CUDA platform and application-programming
>>> interface. AMD called ROCm an open and portable AI system with
>>> out-of-the-box support that can port to existing AI models. [...B]oth AMD
>>> and Hugging Face are dedicating engineering resources to each other and
>>> sharing data to ensure that the constantly updated AI models from Hugging
>>> Face, which might not otherwise run well on AMD hardware, would be
>>> “guaranteed” to work on hardware like the MI300X. [...] AMD said PyTorch
>>> will fully upstream the ROCm software stack and “provide immediate ‘day
>>> zero’ support for PyTorch 2.0 with ROCm release 5.4.2 on all AMD Instinct
>>> accelerators,” which is meant to appeal to those customers looking to
>>> switch from Nvidia’s software ecosystem."
>>>
>>>
>>> In their own announcement, Hugging Face offered further details,
>>> including a pretty impressive list of models to be supported:[6]
>>>
>>>
>>> "We intend to support state-of-the-art transformer architectures for
>>> natural language processing, computer vision, and speech, such as BERT,
>>> DistilBERT, ROBERTA, Vision Transformer, CLIP, and Wav2Vec2. Of course,
>>> generative AI models will be available too (e.g., GPT2, GPT-NeoX, T5, OPT,
>>> LLaMA), including our own BLOOM and StarCoder models. Lastly, we will also
>>> support more traditional computer vision models, like ResNet and ResNext,
>>> and deep learning recommendation models, a first for us. [..] We'll do our
>>> best to test and validate these models for PyTorch, TensorFlow, and ONNX
>>> Runtime for the above platforms. [...] We will integrate the AMD ROCm SDK
>>> seamlessly in our open-source libraries, starting with the transformers
>>> library."
>>>
>>>
>>> Do you think this may promise too much, or could it point to a possible
>>> solution to the Foundation's conundrum?
>>> In any case, this seems to be an interesting moment where many in AI are
>>> trying to move away from Nvidia's proprietary CUDA platform. Most of them
>>> probably more for financial and availability reasons though, given the
>>> current GPU shortages[7] (which the ML team is undoubtedly aware of
>>> already; mentioning this as context for others on this list. See also
>>> Marketwatch's remarks about current margins[5]).
>>>
>>> Regards, Tilman
>>>
>>>
>>> [5]
>>> https://archive.ph/2023.06.15-173527/https://www.marketwatch.com/amp/story/open-source-ai-amd-looks-to-hugging-face-and-meta-spinoff-pytorch-to-take-on-nvidia-e4738f87
>>> [6] https://huggingface.co/blog/huggingface-and-amd
>>> [7] See e.g.
>>> https://gpus.llm-utils.org/nvidia-h100-gpus-supply-and-demand/ (avoid
>>> playing the song though. Don't say I didn't warn you)
>>>
>>>
>>>> I wouldn't characterize the WMF Language team's use of CPUs as being
>>>> because of AMD; rather, at the time we didn't have the budget for GPUs,
>>>> so Lift Wing didn't have any. Since then we have moved two GPUs onto
>>>> Lift Wing for testing, but they are pretty old (2017ish). Once we make
>>>> the big GPU purchase, Lift Wing will gain a lot of functionality for
>>>> LLMs and similar models.
>>>>
>>>> Chris
>>>>
>>>> On Sun, Aug 6, 2023 at 9:57 PM Tilman Bayer <haebw...@gmail.com> wrote:
>>>>
>>>>> On Thu, Aug 3, 2023 at 7:16 AM Chris Albon <cal...@wikimedia.org>
>>>>> wrote:
>>>>>
>>>>>> Hi everybody,
>>>>>>
>>>>>> TL;DR We would like users of ORES models to migrate to our new open
>>>>>> source ML infrastructure, Lift Wing, within the next five months. We are
>>>>>> available to help you do that, from advice to making code commits. It is
>>>>>> important to note: All ML models currently accessible on ORES are also
>>>>>> currently accessible on Lift Wing.
>>>>>>
>>>>>> As part of the Machine Learning Modernization Project (
>>>>>> https://www.mediawiki.org/wiki/Machine_Learning/Modernization), the
>>>>>> Machine Learning team has deployed Wikimedia’s new machine learning
>>>>>> inference infrastructure, called Lift Wing (
>>>>>> https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing). Lift
>>>>>> Wing brings a lot of new features, such as support for GPU-based
>>>>>> models, open source LLM hosting, auto-scaling, stability, and the
>>>>>> ability to host a larger number of models.
>>>>>>
>>>>>
>>>>> This sounds quite exciting! What's the best place to read up on that
>>>>> planned support for GPU-based models and open source LLMs? (I also saw
>>>>> in the recent NYT article[1] that the team is "in the process of
>>>>> adapting A.I. models that are 'off the shelf' — essentially models that
>>>>> have been made available by researchers for anyone to freely customize
>>>>> — so that Wikipedia’s editors can use them for their work.")
>>>>>
>>>>> I'm aware of the history[2] of not being able to use NVIDIA GPUs due
>>>>> to their CUDA drivers being proprietary. It was mentioned recently in the
>>>>> Wikimedia AI Telegram group that this is still a serious limitation,
>>>>> despite some new explorations with AMD GPUs[3] - to the point that e.g. 
>>>>> the
>>>>> WMF's Language team has resorted to using models without GPU support (CPU
>>>>> only).[4]
>>>>> It sounds like there is reasonable hope that this situation could
>>>>> change fairly soon? Would it also mean both at the same time, i.e. open
>>>>> source LLMs running with GPU support (considering that at least some
>>>>> well-known ones appear to require torch.cuda.is_available() == True for
>>>>> that)?
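>>>>>
>>>>> (For anyone following along, the check in question is just the one
>>>>> below. My understanding is that PyTorch's ROCm builds also report True
>>>>> through the same torch.cuda interface on supported AMD GPUs, so such
>>>>> models may not strictly require Nvidia hardware; happy to be
>>>>> corrected.)
>>>>>
>>>>>     import torch
>>>>>
>>>>>     # True on CUDA builds with an Nvidia GPU, and also on ROCm builds
>>>>>     # with a supported AMD GPU (ROCm reuses the torch.cuda namespace).
>>>>>     print(torch.cuda.is_available())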
>>>>>
>>>>> Regards, Tilman
>>>>>
>>>>> [1]
>>>>> https://www.nytimes.com/2023/07/18/magazine/wikipedia-ai-chatgpt.html
>>>>> [2]
>>>>> https://techblog.wikimedia.org/2020/04/06/saying-no-to-proprietary-code-in-production-is-hard-work-the-gpu-chapter/
>>>>> [3] https://phabricator.wikimedia.org/T334583 etc.
>>>>> [4]
>>>>> https://diff.wikimedia.org/2023/06/13/mint-supporting-underserved-languages-with-open-machine-translation/
>>>>> or https://thottingal.in/blog/2023/07/21/wikiqa/ (experimental but, I
>>>>> understand, written to be deployable on WMF infrastructure)
>>>>>
>>>>>
>>>>>>
>>>>>> With the creation of Lift Wing, the team is turning its attention to
>>>>>> deprecating the current machine learning infrastructure, ORES. ORES
>>>>>> served us really well over the years; it was a successful project, but
>>>>>> it came before radical changes in technology like Docker, Kubernetes,
>>>>>> and more recently MLOps. The servers that run ORES are at the end of
>>>>>> their planned lifespan, so to save costs we are going to shut them
>>>>>> down in early 2024.
>>>>>>
>>>>>> We have outlined a deprecation path on Wikitech (
>>>>>> https://wikitech.wikimedia.org/wiki/ORES); please read the page if
>>>>>> you are a maintainer of a tool or code that uses the ORES endpoint (
>>>>>> https://ores.wikimedia.org/). If you have any doubts or if you need
>>>>>> assistance in migrating to Lift Wing, feel free to contact the ML team
>>>>>> via:
>>>>>>
>>>>>> - Email: m...@wikimedia.org
>>>>>> - Phabricator: #Machine-Learning-Team tag
>>>>>> - IRC (Libera): #wikimedia-ml
>>>>>>
>>>>>> The Machine Learning team is available to help projects migrate, from
>>>>>> offering advice to making code commits. We want to make this as easy as
>>>>>> possible for folks.
>>>>>>
>>>>>> High-level timeline:
>>>>>>
>>>>>> *By September 30th, 2023:* Infrastructure powering the ORES API
>>>>>> endpoint will be migrated from ORES to Lift Wing. For users, the API
>>>>>> endpoint will remain the same, and most users won’t notice any change.
>>>>>> Rather, just the backend services powering the endpoint will change.
>>>>>>
>>>>>> Details: We'd like to add a DNS CNAME that points ores.wikimedia.org
>>>>>> to ores-legacy.wikimedia.org, a new endpoint that offers an almost
>>>>>> complete replacement of the ORES API, calling Lift Wing behind the
>>>>>> scenes. In an ideal world we'd migrate all tools to Lift Wing before
>>>>>> decommissioning the infrastructure behind ores.wikimedia.org, but that
>>>>>> turned out to be really challenging, so to avoid disrupting users we
>>>>>> chose to implement a transition layer/API.
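>>>>>>
>>>>>> Once the CNAME is in place, you can confirm it yourself, for example
>>>>>> with the third-party dnspython package (a quick sketch; dig works
>>>>>> just as well):
>>>>>>
>>>>>>     import dns.resolver  # pip install dnspython
>>>>>>
>>>>>>     # After the switch, ores.wikimedia.org should alias to
>>>>>>     # ores-legacy.wikimedia.org.
>>>>>>     for rr in dns.resolver.resolve("ores.wikimedia.org", "CNAME"):
>>>>>>         print(rr.target)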
>>>>>>
>>>>>> To summarize: if you don't have time to migrate to Lift Wing before
>>>>>> September, your code/tool should work just fine on
>>>>>> ores-legacy.wikimedia.org, and you won't have to change a line of your
>>>>>> code thanks to the DNS CNAME. The ores-legacy endpoint is not a 100%
>>>>>> replacement for ORES, as we removed some very old and unused features,
>>>>>> so we highly recommend at least testing the new endpoint for your use
>>>>>> case to avoid surprises when we make the switch. If you find anything
>>>>>> weird, please report it to us using the aforementioned channels.
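>>>>>>
>>>>>> A minimal smoke test along these lines (a sketch with hypothetical
>>>>>> revision IDs; substitute the wikis, models, and calls your tool
>>>>>> actually makes) is usually enough to catch a missing feature before
>>>>>> the switch:
>>>>>>
>>>>>>     import requests
>>>>>>
>>>>>>     # Compare current ORES responses with ores-legacy for the calls
>>>>>>     # your tool makes. Revision IDs below are hypothetical examples.
>>>>>>     for rev_id in (123456, 234567):
>>>>>>         path = f"/v3/scores/enwiki/{rev_id}/damaging"
>>>>>>         old = requests.get("https://ores.wikimedia.org" + path).json()
>>>>>>         new = requests.get(
>>>>>>             "https://ores-legacy.wikimedia.org" + path).json()
>>>>>>         assert new == old, f"mismatch for rev {rev_id}"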
>>>>>>
>>>>>> *September to January:* We will be reaching out to every user of
>>>>>> ORES we can identify and working with them to make the migration
>>>>>> process as easy as possible.
>>>>>>
>>>>>> *By January 2024:* If all goes well, we would like zero traffic on
>>>>>> the ORES API endpoint so we can turn off the ores-legacy API.
>>>>>>
>>>>>> If you want more information about Lift Wing, please check
>>>>>> https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing
>>>>>>
>>>>>> Thanks in advance for your patience and help!
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> The Machine Learning Team
_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
