[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091901#comment-14091901 ]
Ivan Mitic commented on YARN-2198: ---------------------------------- Thanks again Remus for working on this big and important piece of work. I went ~70% thru the patch and below is my first pass of comments. Will review the rest in a day or two. 1. nativeio.c: {code} #ifdef UNIX THROW(env, "java/io/IOException", "The function createTaskAsUser is not supported on Unix"); return -1; #endif {code} Should we return null here? 2. {code} LPCWSTR lpszCwd = NULL, lpszJobName = NULL, lpszUser = NULL, lpszPidFile = NULL, lpszCmdLine = NULL; {code} Nit: nativeio code uses different naming convention for local variables. Please try to be consistent with the rest of the file. 3. nativeio.c: {code} done: if (lpszCwd) (*env)->ReleaseStringChars(env, cwd, lpszCwd); if (lpszJobName) (*env)->ReleaseStringChars(env, jobName, lpszJobName); if (lpszUser) (*env)->ReleaseStringChars(env, user, lpszUser); if (lpszPidFile) (*env)->ReleaseStringChars(env, pidFile, lpszPidFile); if (lpszCmdLine) (*env)->ReleaseStringChars(env, cmdLine, lpszCmdLine); if (dwError != ERROR_SUCCESS) { throw_ioe (env, dwError); } {code} Nit: I would move {{throw_ioe}} if check before done:, the code flow will be less error prone :) 4. winutils_process_stub.c: Can {{env->NewGlobalRef())) return null/throw? Should we handle this? 5. winutils_process_stub.c: You should properly handle the {{GetExitCodeProcess()}} failure case. 6. winutils_process_stub.c: {code} HANDLE hProcess, hThread; {code} Init to INVALID_HANDLE_VALUE? 7. client.c: Are RPC_STATUS error codes compatible with winerror codes? (semantic around checking for error) 8. config.cpp: {code} #define YARN_SITE_XML_PATH L"%HADOOP_CONF_DIR%\\yarn-site.xml" #define YARN_DEFAULT_XML_PATH L"%HADOOP_CONF_DIR%\\yarn-default.xml" {code} Wondering if there is a way to get to config files without adding a dependency on env variables? We are looking into how to to make hadoop xcopy deployable and evn variables are a pain to handle. One way to mitigate is to try to resolve the path relatively and if that fails fall back to HADOOP_CONF_DIR. 9. config.cpp: GetConfigValue: {code} if (*len) { goto done; } {code} This error check is unintuitive. Can you please be more explicit? 10. config.cpp: {code} DWORD GetConfigValue(__in LPCWSTR keyName, __out size_t* len, __out_bcount(len) LPCWSTR* value) { {code} Are SAL annotations correct? For strings one would usually use __out_ecount()? 11. config.cpp: {code} DWORD GetConfigValueFromXmlFile(__in LPCWSTR xmlFile, __in LPCWSTR keyName, __out size_t* outLen, __out_bcount(len) LPCWSTR* outValue) { {code} SAL annotation {{__out_bcount}}? Also outLen->len in the annotation. 11. config.cpp: GetConfigValueFromXmlFile: {code} hr = CoInitialize(NULL); ERROR_CHECK_HRESULT_DONE(hr, L"CoInitialize"); {code} This should be before StringCbPrintf to guarantee that CoInit and CoUninit are balanced. 12. hdpwinutilsvc.idl: Name does not seem appropriate for apache... possibly name it just winutilsvc.idl. Should we use spaces in this file for consistency? 13. winutils.h: {code} DWORD SplitStringIgnoreSpaceW(__in size_t len, __in_bcount(len) LPCWSTR source, __in WCHAR deli, __out size_t* count, __out_ecount(count) WCHAR*** out); {code} __in_bcount(len) -> __in_ecount(len) 14. libwinutils.c: SplitStirngIgnoreSpaceW: Looking at this function (and some other xml parsing functions) I'm wondering if this is good opportunity to introduce unittests for our C code, as the complexity started increasing beyond just windows OS calls, where there is little value in unittesting. 15. libwinutils.c: BuildServiceSecurityDescriptor: {code} eas = (EXPLICIT_ACCESS*) alloca(sizeof(EXPLICIT_ACCESS) * (grantSidCount + denySidCount)); {code} Should we deallocate this when BuildSecurityDescriptor fails? I don't think it is required to do this now, just wanted to bring it up: if our native codebase continues to grow at this pace we should consider introducing smart pointers. It is becoming impossibly hard to properly manage the memory in all success/failure cases. This becomes more important now that we have long running NM native client and winutils service. 16. What is the behaviour of calling {{winutils service}}. Will this command install and start a winutils.exe service under SYSTEM account, and exit? > Remove the need to run NodeManager as privileged account for Windows Secure > Container Executor > ---------------------------------------------------------------------------------------------- > > Key: YARN-2198 > URL: https://issues.apache.org/jira/browse/YARN-2198 > Project: Hadoop YARN > Issue Type: Improvement > Reporter: Remus Rusanu > Assignee: Remus Rusanu > Labels: security, windows > Attachments: YARN-2198.1.patch, YARN-2198.2.patch > > > YARN-1972 introduces a Secure Windows Container Executor. However this > executor requires a the process launching the container to be LocalSystem or > a member of the a local Administrators group. Since the process in question > is the NodeManager, the requirement translates to the entire NM to run as a > privileged account, a very large surface area to review and protect. > This proposal is to move the privileged operations into a dedicated NT > service. The NM can run as a low privilege account and communicate with the > privileged NT service when it needs to launch a container. This would reduce > the surface exposed to the high privileges. > There has to exist a secure, authenticated and authorized channel of > communication between the NM and the privileged NT service. Possible > alternatives are a new TCP endpoint, Java RPC etc. My proposal though would > be to use Windows LPC (Local Procedure Calls), which is a Windows platform > specific inter-process communication channel that satisfies all requirements > and is easy to deploy. The privileged NT service would register and listen on > an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop > with libwinutils which would host the LPC client code. The client would > connect to the LPC port (NtConnectPort) and send a message requesting a > container launch (NtRequestWaitReplyPort). LPC provides authentication and > the privileged NT service can use authorization API (AuthZ) to validate the > caller. -- This message was sent by Atlassian JIRA (v6.2#6252)