Does the new ores-legacy support the same feature set, e.g. features output, injection, and threshold optimizations? Or is it just prediction? This will affect some of the systems I need to migrate.
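For concreteness, these are the kinds of calls I mean. A sketch only: the query-parameter conventions below (`features`, `feature.<name>` injection) are how I remember the ORES v3 API, and the revision IDs are made up, so it's worth double-checking against what ores-legacy actually accepts:

```python
from urllib.parse import urlencode

BASE = "https://ores.wikimedia.org/v3/scores/enwiki/"

# Plain prediction: just the damaging/goodfaith scores for a revision.
predict_url = BASE + "?" + urlencode({"models": "damaging|goodfaith", "revids": "123456"})

# Features output: same request, but also return the extracted feature values.
features_url = predict_url + "&features"

# Feature injection: override a feature value to probe the model's behavior
# (the "feature.<name>" parameter convention is as I recall it from the ORES docs).
inject_url = predict_url + "&" + urlencode({"feature.revision.user.is_anon": "true"})

print(predict_url)
print(features_url)
print(inject_url)
```

Threshold optimization queries (`model_info`) are a further case I'd want confirmed, but I haven't sketched them here.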
On Fri, Sep 22, 2023, 06:21 Ilias Sarantopoulos < isarantopou...@wikimedia.org> wrote: > Hello! > > > As a next step in the deprecation process of ORES > https://wikitech.wikimedia.org/wiki/ORES the Machine Learning team will > switch the backend of ores.wikimedia.org to ores-legacy, a k8s > application meant to provide a compatibility layer between ORES and Lift > Wing, so that users who have not yet migrated to Lift Wing will be > transparently migrated. Ores-legacy is an application that has the same API > as ORES but in the background makes requests to Lift Wing, allowing us to > decommission the ORES servers before all clients have moved. > > This change is planned to take place on Monday, September 25th. If you > have a client/application that is still using ORES, we expect that this > switch will be transparent for you. > > However, keep in mind that ores-legacy is not a 100% replacement for ORES, > as some old and unused features are no longer supported. > > If you see anything out of the ordinary, feel free to contact the Machine > Learning team: > > IRC (Libera): #wikimedia-ml > > Phabricator: Machine-Learning-team tag > > Thank you! > > > On Wed, Aug 9, 2023 at 1:22 PM Chaloemphon Praphuchakang < > yoshrakpra...@gmail.com> wrote: > >> >> On Tue, 8 Aug 2023, 10:45 Tilman Bayer, <haebw...@gmail.com> wrote: >> >>> >>> Hi Chris, >>> >>> On Mon, Aug 7, 2023 at 11:51 AM Chris Albon <cal...@wikimedia.org> >>> wrote: >>> >>>> Hi Tilman, >>>> >>>> Most of the work is still very experimental. We have hosted a few LLMs >>>> on Lift Wing already (StarCoder, for example), but they were just running on >>>> CPU, far too slow for real use cases. Still, it proves that we can easily host >>>> LLMs on Lift Wing. We have been pretty quiet about it while we focus on the >>>> ORES migration, but it is our next big project. More soon hopefully! >>>> >>> Understood. Looking forward to learning more later! 
>>> >>> >>>> Where we are now is that we have budget for a big GPU purchase (~10-20 >>>> GPUs depending on cost); the question we will try to answer after the ORES >>>> migration is complete is: what GPUs should we purchase? We are trying to >>>> balance our strong preference to stay open source (i.e. AMD ROCm) in a >>>> world dominated by a single closed source vendor (i.e. Nvidia). In >>>> addition, do we go for a few expensive GPUs better suited to LLMs (A100, >>>> H100, etc.) or a mix of big and small? We will need to figure all this out. >>>> >>> I see. On that matter, what do you folks make of the recent >>> announcements of AMD's partnerships with Hugging Face and PyTorch[5]? >>> (which, I understand, came after the ML team had already launched the >>> aforementioned new AMD explorations) >>> >>> "Open-source AI: AMD looks to Hugging Face and Meta spinoff PyTorch to >>> take on Nvidia [...] >>> Both partnerships involve AMD’s ROCm AI software stack, the company’s >>> answer to Nvidia’s proprietary CUDA platform and application-programming >>> interface. AMD called ROCm an open and portable AI system with >>> out-of-the-box support that can port to existing AI models. [...B]oth AMD >>> and Hugging Face are dedicating engineering resources to each other and >>> sharing data to ensure that the constantly updated AI models from Hugging >>> Face, which might not otherwise run well on AMD hardware, would be >>> “guaranteed” to work on hardware like the MI300X. [...] AMD said PyTorch >>> will fully upstream the ROCm software stack and “provide immediate ‘day >>> zero’ support for PyTorch 2.0 with ROCm release 5.4.2 on all AMD Instinct >>> accelerators,” which is meant to appeal to those customers looking to >>> switch from Nvidia’s software ecosystem." 
>>> >>> >>> In their own announcement, Hugging Face offered further details, >>> including a pretty impressive list of models to be supported:[6] >>> >>> >>> "We intend to support state-of-the-art transformer architectures for >>> natural language processing, computer vision, and speech, such as BERT, >>> DistilBERT, RoBERTa, Vision Transformer, CLIP, and Wav2Vec2. Of course, >>> generative AI models will be available too (e.g., GPT2, GPT-NeoX, T5, OPT, >>> LLaMA), including our own BLOOM and StarCoder models. Lastly, we will also >>> support more traditional computer vision models, like ResNet and ResNext, >>> and deep learning recommendation models, a first for us. [...] We'll do our >>> best to test and validate these models for PyTorch, TensorFlow, and ONNX >>> Runtime for the above platforms. [...] We will integrate the AMD ROCm SDK >>> seamlessly in our open-source libraries, starting with the transformers >>> library." >>> >>> >>> Do you think this may promise too much, or could it point to a possible >>> solution to the Foundation's conundrum? >>> In any case, this seems to be an interesting moment where many in AI are >>> trying to move away from Nvidia's proprietary CUDA platform. Most of them >>> probably more for financial and availability reasons though, given the >>> current GPU shortages[7] (which the ML team is undoubtedly aware of >>> already; mentioning this as context for others on this list. See also >>> Marketwatch's remarks about current margins[5]). >>> >>> Regards, Tilman >>> >>> >>> [5] >>> https://archive.ph/2023.06.15-173527/https://www.marketwatch.com/amp/story/open-source-ai-amd-looks-to-hugging-face-and-meta-spinoff-pytorch-to-take-on-nvidia-e4738f87 >>> [6] https://huggingface.co/blog/huggingface-and-amd >>> [7] See e.g. >>> https://gpus.llm-utils.org/nvidia-h100-gpus-supply-and-demand/ (avoid >>> playing the song though. 
Don't say I didn't warn you) >>> >>> >>>> I wouldn't characterize the WMF Language Team's use of CPU as being because of >>>> AMD; rather, at the time we didn't have the budget for GPUs, so Lift Wing >>>> didn't have any. Since then we have moved two GPUs onto Lift Wing for >>>> testing, but they are pretty old (2017ish). Once we make the big GPU >>>> purchase Lift Wing will gain a lot of functionality for LLM and similar >>>> models. >>>> >>>> Chris >>>> >>>> On Sun, Aug 6, 2023 at 9:57 PM Tilman Bayer <haebw...@gmail.com> wrote: >>>> >>>>> On Thu, Aug 3, 2023 at 7:16 AM Chris Albon <cal...@wikimedia.org> >>>>> wrote: >>>>> >>>>>> Hi everybody, >>>>>> >>>>>> TL;DR We would like users of ORES models to migrate to our new open >>>>>> source ML infrastructure, Lift Wing, within the next five months. We are >>>>>> available to help you do that, from advice to making code commits. It is >>>>>> important to note: All ML models currently accessible on ORES are also >>>>>> currently accessible on Lift Wing. >>>>>> >>>>>> As part of the Machine Learning Modernization Project ( >>>>>> https://www.mediawiki.org/wiki/Machine_Learning/Modernization), the >>>>>> Machine Learning team has deployed Wikimedia’s new machine learning >>>>>> inference infrastructure, called Lift Wing ( >>>>>> https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing). Lift >>>>>> Wing brings a lot of new features such as support for GPU-based models, >>>>>> open source LLM hosting, auto-scaling, stability, and the ability to host a >>>>>> larger number of models. >>>>>> >>>>> >>>>> This sounds quite exciting! What's the best place to read up on that >>>>> planned support for GPU-based models and open source LLMs? (I also saw in >>>>> the recent NYT article[1] that the team is "in the process of adapting >>>>> A.I. 
>>>>> models that are 'off the shelf' — essentially models that have been made >>>>> available by researchers for anyone to freely customize — so that >>>>> Wikipedia’s editors can use them for their work.") >>>>> >>>>> I'm aware of the history[2] of not being able to use NVIDIA GPUs due >>>>> to their CUDA drivers being proprietary. It was mentioned recently in the >>>>> Wikimedia AI Telegram group that this is still a serious limitation, >>>>> despite some new explorations with AMD GPUs[3] - to the point that e.g. the >>>>> WMF's Language team has resorted to using models without GPU support (CPU >>>>> only).[4] >>>>> It sounds like there is reasonable hope that this situation could >>>>> change fairly soon? Would it also mean both at the same time, i.e. open >>>>> source LLMs running with GPU support (considering that at least some >>>>> well-known ones appear to require torch.cuda.is_available() == True for >>>>> that)? >>>>> >>>>> Regards, Tilman >>>>> >>>>> [1] >>>>> https://www.nytimes.com/2023/07/18/magazine/wikipedia-ai-chatgpt.html >>>>> [2] >>>>> https://techblog.wikimedia.org/2020/04/06/saying-no-to-proprietary-code-in-production-is-hard-work-the-gpu-chapter/ >>>>> [3] https://phabricator.wikimedia.org/T334583 etc. >>>>> [4] >>>>> https://diff.wikimedia.org/2023/06/13/mint-supporting-underserved-languages-with-open-machine-translation/ >>>>> or https://thottingal.in/blog/2023/07/21/wikiqa/ (experimental but, I >>>>> understand, written to be deployable on WMF infrastructure) >>>>> >>>>> >>>>>> >>>>>> With the creation of Lift Wing, the team is turning its attention to >>>>>> deprecating the current machine learning infrastructure, ORES. ORES served >>>>>> us really well over the years; it was a successful project, but it came >>>>>> before radical changes in technology like Docker, Kubernetes, and more >>>>>> recently MLOps. 
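(An aside on the torch.cuda.is_available() point above: the check is less CUDA-specific than its name suggests, since ROCm builds of PyTorch report AMD GPUs through the same call, keeping the "cuda" device name for API compatibility. A minimal sketch of the usual device-selection idiom, written to degrade gracefully when torch isn't installed at all:)

```python
# ROCm builds of PyTorch keep the "cuda" device name for API compatibility,
# so torch.cuda.is_available() also returns True when an AMD GPU is visible.
try:
    import torch
    gpu_available = torch.cuda.is_available()
except ImportError:  # torch not installed; fall back to CPU
    gpu_available = False

device = "cuda" if gpu_available else "cpu"
print(f"selected device: {device}")
```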
The servers that run ORES are at the end of their planned >>>>>> lifespan, so to save costs we are going to shut them down in early >>>>>> 2024. >>>>>> >>>>>> We have outlined a deprecation path on Wikitech ( >>>>>> https://wikitech.wikimedia.org/wiki/ORES); please read the page if >>>>>> you are a maintainer of a tool or code that uses the ORES endpoint >>>>>> (https://ores.wikimedia.org/). If you have any doubts or if you need >>>>>> assistance in migrating to Lift Wing, feel free to contact the ML team >>>>>> via: >>>>>> >>>>>> - Email: m...@wikimedia.org >>>>>> - Phabricator: #Machine-Learning-Team tag >>>>>> - IRC (Libera): #wikimedia-ml >>>>>> >>>>>> The Machine Learning team is available to help projects migrate, from >>>>>> offering advice to making code commits. We want to make this as easy as >>>>>> possible for folks. >>>>>> >>>>>> High-level timeline: >>>>>> >>>>>> *By September 30th, 2023:* Infrastructure powering the ORES API >>>>>> endpoint will be migrated from ORES to Lift Wing. For users, the API >>>>>> endpoint will remain the same, and most users won’t notice any change. >>>>>> Rather, just the backend services powering the endpoint will change. >>>>>> >>>>>> Details: We'd like to add a DNS CNAME that points ores.wikimedia.org >>>>>> to ores-legacy.wikimedia.org, a new endpoint that offers an almost >>>>>> complete replacement of the ORES API, calling Lift Wing behind the scenes. >>>>>> In an ideal world we'd migrate all tools to Lift Wing before >>>>>> decommissioning the infrastructure behind ores.wikimedia.org, but it >>>>>> turned out to be really challenging, so to avoid disrupting users we chose >>>>>> to implement a transition layer/API. >>>>>> >>>>>> To summarize, if you don't have time to migrate to Lift Wing before >>>>>> September, your code/tool should work just fine on >>>>>> ores-legacy.wikimedia.org, and you won't have to change a line of >>>>>> your code, thanks to the DNS CNAME. 
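(The transition layer described above can be pictured as a thin request translator. A sketch only: the URL shape follows the public api.wikimedia.org Lift Wing inference endpoint, but the function and variable names are made up for illustration, and the internal implementation of ores-legacy may well differ:)

```python
# Hypothetical sketch of the kind of translation ores-legacy performs:
# one ORES-style multi-model, multi-revision query is fanned out into
# individual Lift Wing prediction calls.

LIFTWING_URL = "https://api.wikimedia.org/service/lw/inference/v1/models/{model}:predict"

def ores_to_liftwing(context: str, models: str, revids: str):
    """Expand one ORES-style query into a list of (url, json_payload) calls."""
    calls = []
    for model in models.split("|"):
        # Lift Wing model names are context-qualified, e.g. "enwiki-damaging".
        lw_model = f"{context}-{model}"
        for revid in revids.split("|"):
            calls.append((LIFTWING_URL.format(model=lw_model), {"rev_id": int(revid)}))
    return calls

calls = ores_to_liftwing("enwiki", "damaging|goodfaith", "123|456")
for url, payload in calls:
    print(url, payload)
```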
The ores-legacy endpoint is not a 100% >>>>>> replacement for ORES: we removed some very old and unused features, so we >>>>>> highly recommend at least testing the new endpoint for your use case to avoid >>>>>> surprises when we make the switch. In case you find anything weird, >>>>>> please report it to us using the aforementioned channels. >>>>>> >>>>>> *September to January:* We will be reaching out to every user of >>>>>> ORES we can identify and working with them to make the migration process as >>>>>> easy as possible. >>>>>> >>>>>> *By January 2024:* If all goes well, we would like zero traffic on >>>>>> the ORES API endpoint so we can turn off the ores-legacy API. >>>>>> >>>>>> If you want more information about Lift Wing, please check >>>>>> https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing >>>>>> >>>>>> Thanks in advance for your patience and help! >>>>>> >>>>>> Regards, >>>>>> >>>>>> The Machine Learning Team >>>>>> _______________________________________________ >>>>>> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org >>>>>> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org >>>>>> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/