TutorialsPoint Unix/Linux System Call Reference Guide
Source: 易百教程 (Yiibai)
Unix/Linux System Calls™
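accept() function
NAME
accept - accept a connection on a socket
SYNOPSIS
#include <sys/types.h>
#include <sys/socket.h>
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
DESCRIPTION
The accept() system call is used with connection-based socket types (SOCK_STREAM, SOCK_SEQPACKET). It extracts the first connection request on the queue of pending connections, creates a new connected socket, and returns a new file descriptor referring to that socket. The newly created socket is not in the listening state. The original socket sockfd is unaffected by this call.
The argument sockfd is a socket that has been created with socket(2), bound to a local address with bind(2), and is listening for connections after a listen(2).
The argument addr is a pointer to a sockaddr structure. This structure is filled in with the address of the peer socket, as known to the communications layer. The exact format of the address returned in addr is determined by the socket's address family (see socket(2) and the respective protocol man pages).
The addrlen argument is a value-result argument: initially it should contain the size of the structure pointed to by addr; on return it will contain the actual length (in bytes) of the address returned. When addr is NULL, nothing is filled in.
If no pending connections are present on the queue and the socket is not marked as non-blocking, accept() blocks the caller until a connection is present. If the socket is marked non-blocking and no pending connections are present on the queue, accept() fails with the error EAGAIN.
In order to be notified of incoming connections on a socket, you can use select(2) or poll(2). A readable event will be delivered when a new connection is attempted, and you may then call accept() to get a socket for that connection. Alternatively, you can set the socket to deliver SIGIO when activity occurs on a socket; see socket(7) for details.
For certain protocols which require an explicit confirmation, such as DECNet, accept() can be thought of as merely dequeuing the next connection request, not implying confirmation. Confirmation can be implied by a normal read or write on the new file descriptor, and rejection can be implied by closing the new socket. Currently only DECNet has these semantics on Linux.
NOTES
There may not always be a connection waiting after a SIGIO is delivered or select(2) or poll(2) return a readability event, because the connection might have been removed by an asynchronous network error or another thread before accept() is called. If this happens then the call will block waiting for the next connection to arrive.
To ensure that accept() never blocks, the passed socket sockfd needs to have the O_NONBLOCK flag set (see socket(7)).
RETURN VALUE
On success, accept() returns a non-negative integer that is a descriptor for the accepted socket. On error, -1 is returned, and errno is set appropriately.
ERROR HANDLING
Linux accept() passes already-pending network errors on the new socket as an error code from accept(). This behaviour differs from other BSD socket implementations. For reliable operation the application should detect the network errors defined for the protocol after accept() and treat them like EAGAIN by retrying. In the case of TCP/IP these are ENETDOWN, EPROTO, ENOPROTOOPT, EHOSTDOWN, ENONET, EHOSTUNREACH, EOPNOTSUPP, and ENETUNREACH.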
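The retry advice above can be sketched as a small wrapper around accept(). This is a minimal sketch under stated assumptions, not a definitive implementation: the helper names is_transient_accept_error() and accept_with_retry() and the max_retries bound are hypothetical and introduced only for illustration. EINTR and ECONNABORTED are added to the retry set as an assumption; EAGAIN is deliberately left to the caller, who would normally return to select(2) or poll(2).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if err is one of the network errors the
 * text says to retry after accept(), 0 otherwise. EINTR and ECONNABORTED
 * are included as an assumption. */
int is_transient_accept_error(int err)
{
    switch (err) {
    case EINTR:
    case ECONNABORTED:
    case ENETDOWN:
    case EPROTO:
    case ENOPROTOOPT:
    case EHOSTDOWN:
#ifdef ENONET
    case ENONET:            /* not defined on every system */
#endif
    case EHOSTUNREACH:
    case EOPNOTSUPP:
    case ENETUNREACH:
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical helper: call accept() on listen_fd, retrying transient
 * errors at most max_retries times. Returns the connected descriptor,
 * or -1 with errno set by the last accept() call. */
int accept_with_retry(int listen_fd, int max_retries)
{
    int attempt;

    for (attempt = 0; attempt <= max_retries; attempt++) {
        int conn_fd = accept(listen_fd, NULL, NULL);
        if (conn_fd >= 0)
            return conn_fd;
        if (!is_transient_accept_error(errno))
            return -1;      /* e.g. EAGAIN, EBADF: let the caller decide */
    }
    return -1;              /* retries exhausted; errno holds the last error */
}
```

The retry bound is there because EOPNOTSUPP is also returned for a socket that is not of type SOCK_STREAM, in which case retrying forever would never succeed.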
ERRORS
accept() shall fail if:
Tag | Description |
EAGAIN or EWOULDBLOCK | The socket is marked non-blocking and no connections are present to be accepted. |
EBADF | The descriptor is invalid. |
ECONNABORTED | A connection has been aborted. |
EINTR | The system call was interrupted by a signal that was caught before a valid connection arrived. |
EINVAL | Socket is not listening for connections, or addrlen is invalid (e.g., is negative). |
EMFILE | The per-process limit of open file descriptors has been reached. |
ENFILE | The system limit on the total number of open files has been reached. |
ENOTSOCK | The descriptor references a file, not a socket. |
EOPNOTSUPP | The referenced socket is not of type SOCK_STREAM. |
accept() may fail if:
Tag | Description |
EFAULT | The addr argument is not in a writable part of the user address space. |
ENOBUFS, ENOMEM | Not enough free memory. This often means that the memory allocation is limited by the socket buffer limits, not by the system memory. |
EPROTO | Protocol error. |
Linux accept() may fail if:
Tag | Description |
EPERM | Firewall rules forbid connection. |
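In addition, network errors for the new socket and as defined for the protocol may be returned. Various Linux kernels can return other errors such as ENOSR, ESOCKTNOSUPPORT, EPROTONOSUPPORT, ETIMEDOUT. The value ERESTARTSYS may be seen during a trace.
CONFORMING TO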
SVr4, 4.4BSD (accept() first appeared in 4.2BSD).
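NOTES
The third argument of accept() was originally declared as an 'int *' (and is that under libc4 and libc5 and on many other systems like 4.x BSD, SunOS 4, SGI); a POSIX.1g draft standard wanted to change it into a 'size_t *', and that is what it is on SunOS 5. Later POSIX drafts have 'socklen_t *', and so do the Single Unix Specification and glibc2.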
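access() function
NAME
access - check user's permissions for a file
SYNOPSIS
#include <unistd.h>
int access(const char *pathname, int mode);
DESCRIPTION
access() checks whether the process would be allowed to read, write or test for existence of the file (or other file system object) whose name is pathname. If pathname is a symbolic link, the permissions of the file referred to by this symbolic link are tested.
mode is a mask consisting of one or more of R_OK, W_OK, X_OK and F_OK.
R_OK, W_OK and X_OK request checking whether the file exists and has read, write and execute permissions, respectively. F_OK just requests checking for the existence of the file.
The tests depend on the permissions of the directories occurring in the path to the file, as given in pathname, and on the permissions of directories and files referred to by symbolic links encountered on the way.
The check is done with the process's real UID and GID, rather than with the effective IDs as is done when actually attempting an operation. This is to allow set-user-ID programs to easily determine the invoking user's authority.
Only access bits are checked, not the file type or contents. Therefore, if a directory is found to be "writable," it probably means that files can be created in the directory, and not that the directory can be written as a file. Similarly, a DOS file may be found to be "executable," but the execve(2) call will still fail.
If the process has appropriate privileges, an implementation may indicate success for X_OK even if none of the execute file permission bits are set.
RETURN VALUE
On success (all requested permissions granted), zero is returned. On error (at least one bit in mode asked for a permission that is denied, or some other error occurred), -1 is returned, and errno is set appropriately.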
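The mode mask and return convention described above can be illustrated with two thin wrappers. This is a minimal sketch; the helper names path_exists() and can_read_and_write() are hypothetical and not part of any library.

```c
#include <unistd.h>

/* Hypothetical helper: 1 if pathname exists (F_OK), 0 otherwise. */
int path_exists(const char *pathname)
{
    return access(pathname, F_OK) == 0;
}

/* Hypothetical helper: 1 only if both read and write permission would be
 * granted to the real UID/GID, 0 if either is denied or an error occurs. */
int can_read_and_write(const char *pathname)
{
    return access(pathname, R_OK | W_OK) == 0;
}
```

Because access() fails when any one of the requested bits is denied, can_read_and_write() cannot tell the caller which permission was missing; separate calls would be needed for that.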
ERRORS
access() shall fail if:
Tag | Description |
EACCES | The requested access would be denied to the file or search permission is denied for one of the directories in the path prefix of pathname. (See also path_resolution(2).) |
ELOOP | Too many symbolic links were encountered in resolving pathname. |
ENAMETOOLONG | pathname is too long. |
ENOENT | A directory component in pathname would have been accessible but does not exist or was a dangling symbolic link. |
ENOTDIR | A component used as a directory in pathname is not, in fact, a directory. |
EROFS | Write permission was requested for a file on a read-only filesystem. |
access() may fail if:
Tag | Description |
EFAULT | pathname points outside your accessible address space. |
EINVAL | mode was incorrectly specified. |
EIO | An I/O error occurred. |
ENOMEM | Insufficient kernel memory was available. |
ETXTBSY | Write access was requested to an executable which is being executed. |
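RESTRICTIONS
access() returns an error if any of the access types in the requested call fails, even if other types might be successful.
access() may not work correctly on NFS file systems with UID mapping enabled, because UID mapping is done on the server and hidden from the client, which checks permissions.
Using access() to check whether a user is authorized to, for example, open a file before actually doing so with open(2) creates a security hole, because the user might exploit the short time interval between checking and opening the file to manipulate it.
CONFORMING TO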
SVr4, POSIX.1-2001, 4.3BSD
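acct() function
NAME
acct - switch process accounting on or off
SYNOPSIS
#include <unistd.h>
int acct(const char *filename);
DESCRIPTION
When called with the name of an existing file as argument, accounting is turned on, and a record for each terminating process is appended to filename as it terminates. An argument of NULL causes accounting to be turned off.
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |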
EACCES | Write permission is denied for the specified file, or search permission is denied for one of the directories in the path prefix of filename (see also path_resolution(2)), or filename is not a regular file. |
EFAULT | filename points outside your accessible address space. |
EIO | Error writing to the file filename. |
EISDIR | filename is a directory. |
ELOOP | Too many symbolic links were encountered in resolving filename. |
ENAMETOOLONG | filename was too long. |
ENFILE | The system limit on the total number of open files has been reached. |
ENOENT | The specified filename does not exist. |
ENOMEM | Out of memory. |
ENOSYS | BSD process accounting has not been enabled when the operating system kernel was compiled. The kernel configuration parameter controlling this feature is CONFIG_BSD_PROCESS_ACCT. |
ENOTDIR | A component used as a directory in filename is not in fact a directory. |
EPERM | The calling process has insufficient privilege to enable process accounting. On Linux the CAP_SYS_PACCT capability is required. |
EROFS | filename refers to a file on a read-only file system. |
EUSERS | There are no more free file structures or we ran out of memory. |
CONFORMING TO
SVr4, 4.3BSD (but not POSIX).
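NOTES
No accounting is produced for programs running when a crash occurs. In particular, non-terminating processes are never accounted for.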
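add_key() function
NAME
add_key - add a key to the kernel's key management facility
SYNOPSIS
#include <keyutils.h>
key_serial_t add_key(const char *type, const char *description, const void *payload, size_t plen, key_serial_t keyring);
DESCRIPTION
add_key() asks the kernel to create or update a key of the given type and description, instantiate it with the payload of length plen, attach it to the nominated keyring, and return its serial number.
The key type may reject the data if it is in the wrong format or in some other way invalid.
If the destination keyring already contains a key that matches the specified type and description then, if the key type supports it, that key will be updated rather than a new key being created; if not, a new key will be created and it will displace the link to the extant key from the keyring.
The destination keyring serial number may be that of a valid keyring to which the caller has write permission, or it may be a special keyring ID:
Tag | Description |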
KEY_SPEC_THREAD_KEYRING | This specifies the caller’s thread-specific keyring. |
KEY_SPEC_PROCESS_KEYRING | This specifies the caller’s process-specific keyring. |
KEY_SPEC_SESSION_KEYRING | This specifies the caller’s session-specific keyring. |
KEY_SPEC_USER_KEYRING | This specifies the caller’s UID-specific keyring. |
KEY_SPEC_USER_SESSION_KEYRING | This specifies the caller’s UID-session keyring. |
KEY TYPES
There are a number of key types available in the core key management code, and these can be specified to this function:
Tag | Description |
“user” | Keys of the user-defined key type may contain a blob of arbitrary data, and the description may be any valid string, though it is preferred that the description be prefixed with a string representing the service to which the key is of interest and a colon (for instance “afs:mykey”). The payload may be empty or NULL for keys of this type. |
“keyring” | Keyrings are special key types that may contain links to sequences of other keys of any type. If this interface is used to create a keyring, then a NULL payload should be specified, and plen should be zero. |
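RETURN VALUE
On success add_key() returns the serial number of the key it created or updated. On error, the value -1 will be returned and errno will have been set to an appropriate error.
ERRORS
Tag | Description |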
ENOKEY | The keyring doesn’t exist. |
EKEYEXPIRED | The keyring has expired. |
EKEYREVOKED | The keyring has been revoked. |
EINVAL | The payload data was invalid. |
ENOMEM | Insufficient memory to create a key. |
EDQUOT | The key quota for this user would be exceeded by creating this key or linking it to the keyring. |
EACCES | The keyring wasn’t available for modification by the user. |
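LINKING
Although this is a Linux system call, it is not present in libc but can be found rather in libkeyutils. When linking, -lkeyutils should be specified to the linker.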
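adjtimex() function
NAME
adjtimex - tune kernel clock
SYNOPSIS
#include <sys/timex.h>
int adjtimex(struct timex *buf);
DESCRIPTION
Linux uses David L. Mills' clock adjustment algorithm (see RFC 1305). The system call adjtimex() reads and optionally sets adjustment parameters for this algorithm. It takes a pointer to a timex structure, updates kernel parameters from field values, and returns the same structure with current kernel values. This structure is declared as follows: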
struct timex { |
The modes field determines which parameters, if any, to set. It may contain a bitwise-or combination of zero or more of the following bits:
#define ADJ_OFFSET 0x0001 /* time offset */ |
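Ordinary users are restricted to a zero value for modes. Only the superuser may set any parameters.
RETURN VALUE
On success, adjtimex() returns the clock state: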
#define TIME_OK 0 /* clock synchronized */ |
On failure, adjtimex() returns -1 and sets errno.
ERRORS
Tag | Description |
EFAULT | buf does not point to writable memory. |
EINVAL | An attempt is made to set buf.offset to a value outside the range -131071 to +131071, or to set buf.status to a value other than those listed above, or to set buf.tick to a value outside the range 900000/HZ to 1100000/HZ, where HZ is the system timer interrupt frequency. |
EPERM | buf.modes is non-zero and the caller does not have sufficient privilege. Under Linux the CAP_SYS_TIME capability is required. |
CONFORMING TO
adjtimex() is Linux specific and should not be used in programs intended to be portable. See adjtime(3) for a more portable, but less flexible, method of adjusting the system clock.
afs_syscall() function
NAME
The following is a list of Unix/Linux system calls that were not implemented at the time this page was written:
afs_syscall, |
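SYNOPSIS
Unimplemented system calls.
DESCRIPTION
These system calls are not implemented in the Linux 2.4 kernel.
RETURN VALUE
These system calls always return -1 and set errno to ENOSYS.
NOTES
Note that ftime(3), profil(3) and ulimit(3) are implemented as library functions.
Some system calls, like alloc_hugepages(2), free_hugepages(2), ioperm(2), iopl(2), and vm86(2) only exist on certain architectures.
Some system calls, like ipc(2), create_module(2), init_module(2), and delete_module(2) only exist when the Linux kernel was built with support for them.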
alarm() function
NAME
alarm - set an alarm clock for delivery of a signal
SYNOPSIS
#include <unistd.h>
unsigned int alarm(unsigned int seconds);
DESCRIPTION
alarm() arranges for a SIGALRM signal to be delivered to the process in seconds seconds.
If seconds is zero, no new alarm() is scheduled.
In any event any previously set alarm() is cancelled.
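RETURN VALUE
alarm() returns the number of seconds remaining until any previously scheduled alarm was due to be delivered, or zero if there was no previously scheduled alarm.
NOTES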
alarm() and setitimer() share the same timer; calls to one will interfere with use of the other.
sleep() may be implemented using SIGALRM; mixing calls to alarm() and sleep() is a bad idea.
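Scheduling delays can, as ever, cause the execution of the process to be delayed by an arbitrary amount of time.
Every process in the system has a private alarm clock. It behaves much like a timer that can be set to go off after a given number of seconds. When the time is up, the clock sends the signal SIGALRM to the process.
alarm() arranges for SIGALRM to be delivered to the current process after the number of seconds given by the seconds argument. If seconds is 0, any previously set alarm is cancelled and its remaining time is returned.
If the process had already set an alarm before this call, alarm() returns the time remaining on that earlier alarm; otherwise it returns 0. alarm() never fails, so there is no error return.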
Example 1:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    unsigned int timeleft;

    printf("Set the alarm and sleep\n");
    alarm(10);
    sleep(5);

    timeleft = alarm(0);   /* cancel the alarm and get its remaining time: 5 seconds */
    printf("Time left before cancel, and rearm: %u\n", timeleft);
    alarm(timeleft);       /* rearm with the remaining time */

    printf("Hanging around, waiting to die\n");
    pause();               /* suspend the process until a signal arrives */
    return EXIT_SUCCESS;
}
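Output:
First it prints: Set the alarm and sleep
After 5 seconds it prints: Time left before cancel, and rearm: 5
Hanging around, waiting to die
After another 5 seconds SIGALRM arrives and the program ends.
Unless the process has installed a handler for SIGALRM, the signal kills the process. Compare the behaviour of the next example with the signal(SIGALRM, timer); statement enabled and with it commented out.
Example 2: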
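#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* SIGALRM handler: counts ticks and rearms the alarm until five have fired.
   printf() is not async-signal-safe; it is used here only for demonstration. */
static void timer(int sig) {
    static int count = 0;

    count++;
    printf("\ncount = %d\n", count);
    if (sig == SIGALRM) {
        printf("timer\n");
    }
    signal(SIGALRM, timer);   /* reinstall the handler */
    alarm(1);                 /* schedule the next tick */
    if (count == 5)
        alarm(0);             /* cancel the pending alarm after five ticks */
}

int main(int argc, char *argv[]) {
    signal(SIGALRM, timer);
    alarm(1);
    while (1)
        pause();              /* wait for signals without spinning */
}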
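Another use of the timer is to schedule an action for some future moment while doing other things in the meantime. Scheduling the action is simple: call alarm() to set the timer, then continue with other work. When the timer counts down to zero, the signal is sent and the handler is invoked.
CONFORMING TO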
SVr4, POSIX.1-2001, 4.3BSD
alloc_hugepages() function
NAME
alloc_hugepages, free_hugepages - allocate or free huge pages
SYNOPSIS
void *alloc_hugepages(int key, void *addr, size_t len, int prot, int flag);
int free_hugepages(void *addr);
DESCRIPTION
The system calls alloc_hugepages() and free_hugepages() were introduced in Linux 2.5.36 and removed again in 2.5.54. They existed only on i386 and ia64 (when built with CONFIG_HUGETLB_PAGE). In Linux 2.4.20 the syscall numbers exist, but the calls return ENOSYS.
On i386 the memory management hardware knows about ordinary pages (4 KiB) and huge pages (2 or 4 MiB). Similarly ia64 knows about huge pages of several sizes. These system calls serve to map huge pages into the process’ memory or to free them again. Huge pages are locked into memory, and are not swapped.
The key parameter is an identifier. When zero the pages are private, and not inherited by children. When positive the pages are shared with other applications using the same key, and inherited by child processes.
The addr parameter of free_hugepages() tells which page is being freed: it was the return value of a call to alloc_hugepages(). (The memory is first actually freed when all users have released it.) The addr parameter of alloc_hugepages() is a hint, that the kernel may or may not follow. Addresses must be properly aligned.
The len parameter is the length of the required segment. It must be a multiple of the huge page size.
The prot parameter specifies the memory protection of the segment. It is one of PROT_READ, PROT_WRITE, PROT_EXEC.
The flag parameter is ignored, unless key is positive. In that case, if flag is IPC_CREAT, then a new huge page segment is created when none with the given key existed. If this flag is not set, then ENOENT is returned when no segment with the given key exists.
RETURN VALUE
On success, alloc_hugepages() returns the allocated virtual address, and free_hugepages() returns zero. On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
ENOSYS | The system call is not supported on this kernel. |
CONFORMING TO
These calls existed only in Linux 2.5.36 through to 2.5.54. These calls are specific to Linux on Intel processors, and should not be used in programs intended to be portable. Indeed, the system call numbers are marked for reuse, so programs using these may do something random on a future kernel.
FILES
/proc/sys/vm/nr_hugepages Number of configured hugetlb pages. This can be read and written.
/proc/meminfo Gives info on the number of configured hugetlb pages and on their size in the three variables HugePages_Total, HugePages_Free, Hugepagesize.
NOTES
The system calls are gone. Now the hugetlbfs filesystem can be used instead. Memory backed by huge pages (if the CPU supports them) is obtained by using mmap() to map files in this virtual filesystem.
The maximal number of huge pages can be specified using the hugepages= boot parameter.
arch_prctl() function
NAME
arch_prctl - set architecture specific thread state
SYNOPSIS
#include <asm/prctl.h>
int arch_prctl(int code, unsigned long addr);
DESCRIPTION
The arch_prctl() function sets architecture specific process or thread state. code selects a subfunction and passes the argument addr to it.
Subfunctions for x86-64 are:
Tag | Description |
ARCH_SET_FS | Set the 64bit base for the FS register to addr. |
ARCH_GET_FS | Return the 64bit base value for the FS register of the current thread in the unsigned long pointed to by the address parameter. |
ARCH_SET_GS | Set the 64bit base for the GS register to addr. |
ARCH_GET_GS | Return the 64bit base value for the GS register of the current thread in the unsigned long pointed to by the address parameter. |
ERRORS
Tag | Description |
EFAULT | addr points to an unmapped address or is outside the process address space. |
EINVAL | code is not a valid subcommand. |
EPERM | addr is outside the process address space. |
AUTHOR
Man page written by Andi Kleen.
CONFORMING TO
arch_prctl() is a Linux/x86-64 extension and should not be used in programs intended to be portable.
bdflush() function
NAME
bdflush - start, flush, or tune the buffer-dirty-flush daemon
SYNOPSIS
int bdflush(int func, long *address);
int bdflush(int func, long data);
DESCRIPTION
bdflush() starts, flushes, or tunes the buffer-dirty-flush daemon. Only a privileged process (one with the CAP_SYS_ADMIN capability) may call bdflush().
If func is negative or 0, and no daemon has been started, then bdflush() enters the daemon code and never returns.
If func is 1, some dirty buffers are written to disk.
If func is 2 or more and is even (low bit is 0), then address is the address of a long word, and the tuning parameter numbered (func-2)/2 is returned to the caller in that address.
If func is 3 or more and is odd (low bit is 1), then data is a long word, and the kernel sets tuning parameter numbered (func-3)/2 to that value.
The set of parameters, their values, and their legal ranges are defined in the kernel source file fs/buffer.c.
RETURN VALUE
If func is negative or 0 and the daemon successfully starts, bdflush() never returns. Otherwise, the return value is 0 on success and -1 on failure, with errno set to indicate the error.
ERRORS
Tag | Description |
EBUSY | An attempt was made to enter the daemon code after another process has already entered. |
EFAULT | address points outside your accessible address space. |
EINVAL | An attempt was made to read or write an invalid parameter number, or to write an invalid value to a parameter. |
EPERM | Caller does not have the CAP_SYS_ADMIN capability. |
CONFORMING TO
bdflush() is Linux specific and should not be used in programs intended to be portable.
bind() function
NAME
bind - bind a name to a socket
SYNOPSIS
#include <sys/types.h>
#include <sys/socket.h>
int bind(int sockfd, const struct sockaddr *my_addr, socklen_t addrlen);
DESCRIPTION
bind() gives the socket sockfd the local address my_addr. my_addr is addrlen bytes long. Traditionally, this is called "assigning a name to a socket." When a socket is created with socket(2), it exists in a name space (address family) but has no name assigned.
It is normally necessary to assign a local address using bind() before a SOCK_STREAM socket may receive connections (see accept(2)).
The rules used in name binding vary between address families. Consult the manual entries in Section 7 for detailed information. For AF_INET see ip(7), for AF_INET6 see ipv6(7), for AF_UNIX see unix(7), for AF_APPLETALK see ddp(7), for AF_PACKET see packet(7), for AF_X25 see x25(7) and for AF_NETLINK see netlink(7).
The actual structure passed for the my_addr argument will depend on the address family. The sockaddr structure is defined as something like:
struct sockaddr {
    sa_family_t sa_family;
    char        sa_data[14];
};
The only purpose of this structure is to cast the structure pointer passed in my_addr in order to avoid compiler warnings. The following example shows how this is done when binding a socket in the Unix (AF_UNIX) domain:
#include <sys/socket.h>
#include <sys/un.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define MY_SOCK_PATH "/somepath"
int
main(int argc, char *argv[])
{
    int sfd;
    struct sockaddr_un addr;
    sfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (sfd == -1) {
        perror("socket");
        exit(EXIT_FAILURE);
    }
    memset(&addr, 0, sizeof(struct sockaddr_un));
    /* Clear structure */
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, MY_SOCK_PATH,
            sizeof(addr.sun_path) - 1);
    if (bind(sfd, (struct sockaddr *) &addr,
             sizeof(struct sockaddr_un)) == -1) {
        perror("bind");
        exit(EXIT_FAILURE);
    }
    ...
}
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Error Code | Description |
EACCES | The address is protected, and the user is not the superuser. |
EADDRINUSE | The given address is already in use. |
EBADF | sockfd is not a valid descriptor. |
EINVAL | The socket is already bound to an address. |
ENOTSOCK | sockfd is a descriptor for a file, not a socket. |
The following errors are specific to UNIX domain (AF_UNIX) sockets: | |
EACCES | Search permission is denied on a component of the path prefix. (See also path_resolution(2).) |
EADDRNOTAVAIL | A non-existent interface was requested or the requested address was not local. |
EFAULT | my_addr points outside the user’s accessible address space. |
EINVAL | The addrlen is wrong, or the socket was not in the AF_UNIX family. |
ELOOP | Too many symbolic links were encountered in resolving my_addr. |
ENAMETOOLONG | my_addr is too long. |
ENOENT | The file does not exist. |
ENOMEM | Insufficient kernel memory was available. |
ENOTDIR | A component of the path prefix is not a directory. |
EROFS | The socket inode would reside on a read-only file system. |
BUGS
The transparent proxy options are not described.
CONFORMING TO
SVr4, 4.4BSD (the bind() function first appeared in 4.2BSD).
NOTES
The third argument of bind() is in reality an int (and this is what 4.x BSD and libc4 and libc5 have). Some POSIX confusion resulted in the present socklen_t, also used by glibc. See also accept(2).
break (unimplemented)
NAME
afs_syscall, break, fattach, fdetach, ftime, getmsg, getpmsg, gtty, isastream, lock, mpx, multiplexer, prof, profil, putmsg, putpmsg, security, stty, ulimit, vserver - unimplemented system calls
SYNOPSIS
Unimplemented system calls.
DESCRIPTION
These system calls are not implemented in the Linux 2.4 kernel.
RETURN VALUE
These system calls always return -1 and set errno to ENOSYS.
NOTES
Note that ftime(3), profil(3) and ulimit(3) are implemented as library functions.
Some system calls, like alloc_hugepages(2), free_hugepages(2), ioperm(2), iopl(2), and vm86(2) only exist on certain architectures.
Some system calls, like ipc(2), create_module(2), init_module(2), and delete_module(2) only exist when the Linux kernel was built with support for them.
brk() function
NAME
brk, sbrk - change data segment size
SYNOPSIS
#include <unistd.h>
int brk(void *end_data_segment);
void *sbrk(intptr_t increment);
DESCRIPTION
brk() sets the end of the data segment to the value specified by end_data_segment, when that value is reasonable, the system does have enough memory and the process does not exceed its max data size (see setrlimit(2)).
sbrk() increments the program’s data space by increment bytes. sbrk() isn’t a system call, it is just a C library wrapper. Calling sbrk() with an increment of 0 can be used to find the current location of the program break.
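The two uses of sbrk() described above (querying the break with an increment of 0, and growing the data segment) can be sketched as follows. The helper names current_break() and grow_data_segment() are hypothetical, and the _DEFAULT_SOURCE definition is an assumption about glibc, where it may be needed to expose the sbrk() prototype under strict compiler modes.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* assumption: glibc feature-test macro exposing sbrk() */
#endif
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: report the current program break without moving it. */
void *current_break(void)
{
    return sbrk(0);
}

/* Hypothetical helper: grow the data segment by increment bytes and return
 * the start of the new area, or NULL if sbrk() reported failure. */
void *grow_data_segment(intptr_t increment)
{
    void *start = sbrk(increment);

    return (start == (void *)-1) ? NULL : start;
}
```

Programs that also use malloc(3) should generally leave the break alone, since the allocator may manage the same region.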
RETURN VALUE
On success, brk() returns zero, and sbrk() returns a pointer to the start of the new area. On error, -1 is returned, and errno is set to ENOMEM.
CONFORMING TO
4.3BSD; SUSv1, marked LEGACY in SUSv2, removed in POSIX.1-2001.
brk() and sbrk() are not defined in the C Standard and are deliberately excluded from the POSIX.1 standard (see paragraphs B.1.1.1.3 and B.8.3.3).
NOTES
Various systems use various types for the parameter of sbrk(). Common are int, ssize_t, ptrdiff_t, intptr_t.
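cacheflush() function
NAME
cacheflush - flush contents of instruction and/or data cache
SYNOPSIS
#include <asm/cachectl.h>
int cacheflush(char *addr, int nbytes, int cache);
DESCRIPTION
cacheflush() flushes the contents of the indicated cache(s) for user addresses in the range addr to (addr+nbytes-1). cache may be one of:
Tag | Description |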
ICACHE | Flush the instruction cache. |
DCACHE | Write back to memory and invalidate the affected valid cache lines. |
BCACHE | Same as (ICACHE|DCACHE). |
RETURN VALUE
cacheflush() returns 0 on success or -1 on error. If errors are detected, errno will indicate the error.
ERRORS
Error Code | Description |
EFAULT | Some or all of the address range addr to (addr+nbytes-1) is not accessible. |
EINVAL | cache parameter is not one of ICACHE, DCACHE, or BCACHE. |
BUGS
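The current implementation ignores the addr and nbytes arguments. Therefore the whole cache is always flushed.
NOTES
This system call is only available on MIPS based systems. It should not be used in programs intended to be portable.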
chdir() function
NAME
chdir, fchdir - change working directory
SYNOPSIS
#include <unistd.h>
int chdir(const char *path);
int fchdir(int fd);
DESCRIPTION
chdir() changes the current working directory to that specified in path. fchdir() is identical to chdir(); the only difference is that the directory is given as an open file descriptor.
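One common pairing of the two calls is to remember the current directory as an open descriptor, change directory with chdir(), and later return with fchdir(). The following is a minimal sketch of that pattern; the helper name visit_directory() is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: change into path, then return to the original
 * working directory. Returns 0 on success, -1 on failure. */
int visit_directory(const char *path)
{
    int saved_cwd = open(".", O_RDONLY);   /* remember where we are */
    int result = -1;

    if (saved_cwd == -1)
        return -1;
    if (chdir(path) == 0) {
        /* ... work relative to path would go here ... */
        if (fchdir(saved_cwd) == 0)        /* go back via the descriptor */
            result = 0;
    }
    close(saved_cwd);
    return result;
}
```

Returning through a descriptor rather than a saved pathname still works if the original directory is renamed in the meantime.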
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Depending on the file system, other errors can be returned. The more general errors for chdir() are listed below:
Error Code | Description |
EACCES | Search permission is denied for one of the directories in the path prefix of path. (See also path_resolution(2).) |
EFAULT | path points outside your accessible address space. |
EIO | An I/O error occurred. |
ELOOP | Too many symbolic links were encountered in resolving path. |
ENAMETOOLONG | path is too long. |
ENOENT | The file does not exist. |
ENOMEM | Insufficient kernel memory was available. |
ENOTDIR | A component of path is not a directory. |
The general errors for fchdir() are listed below: | |
EACCES | Search permission was denied on the directory open on fd. |
EBADF | fd is not a valid file descriptor. |
NOTES
A child process created via fork(2) inherits its parent’s current working directory. The current working directory is left unchanged by execve(2).
The prototype for fchdir() is only available if _BSD_SOURCE is defined, or _XOPEN_SOURCE is defined with the value 500.
CONFORMING TO
SVr4, 4.4BSD, POSIX.1-2001.
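chmod() function
NAME
chmod, fchmod - change permissions of a file
SYNOPSIS
#include <sys/types.h>
#include <sys/stat.h>
int chmod(const char *path, mode_t mode);
int fchmod(int fildes, mode_t mode);
DESCRIPTION
The mode of the file given by path or referenced by fildes is changed.
Modes are specified by or'ing the following:
Tag | Description |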
S_ISUID | 04000 set user ID on execution |
S_ISGID | 02000 set group ID on execution |
S_ISVTX | 01000 sticky bit |
S_IRUSR | 00400 read by owner |
S_IWUSR | 00200 write by owner |
S_IXUSR | 00100 execute/search by owner |
S_IRGRP | 00040 read by group |
S_IWGRP | 00020 write by group |
S_IXGRP | 00010 execute/search by group |
S_IROTH | 00004 read by others |
S_IWOTH | 00002 write by others |
S_IXOTH | 00001 execute/search by others |
The effective UID of the calling process must match the owner of the file, or the process must be privileged (Linux: it must have the CAP_FOWNER capability).
If the calling process is not privileged (Linux: does not have the CAP_FSETID capability), and the group of the file does not match the effective group ID of the process or one of its supplementary group IDs, the S_ISGID bit will be turned off, but this will not cause an error to be returned.
As a security measure, depending on the file system, the set-user-ID and set-group-ID execution bits may be turned off if a file is written. (On Linux this occurs if the writing process does not have the CAP_FSETID capability.) On some file systems, only the superuser can set the sticky bit, which may have a special meaning. For the sticky bit, and for set-user-ID and set-group-ID bits on directories, see stat(2).
On NFS file systems, restricting the permissions will immediately influence already open files, because the access control is done on the server, but open files are maintained by the client. Widening the permissions may be delayed for other clients if attribute caching is enabled on them.
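As an illustration of or'ing the mode constants listed above, the sketch below restricts a file to owner read and write (S_IRUSR | S_IWUSR, octal 0600). The helper name make_owner_only() is hypothetical and the error message format is an arbitrary choice.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: set the permissions of path to owner read/write
 * only. Returns 0 on success, -1 on failure with errno preserved. */
int make_owner_only(const char *path)
{
    mode_t mode = S_IRUSR | S_IWUSR;   /* 00400 | 00200 = 00600 */

    if (chmod(path, mode) == -1) {
        int saved_errno = errno;       /* fprintf() may overwrite errno */

        fprintf(stderr, "chmod %s: %s\n", path, strerror(saved_errno));
        errno = saved_errno;
        return -1;
    }
    return 0;
}
```

A call such as make_owner_only("notes.txt") would succeed only if the effective UID owns the file or the process is privileged, as described above.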
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Depending on the file system, other errors can be returned. The more general errors for chmod() are listed below:
Error Code | Description |
EACCES | Search permission is denied on a component of the path prefix. (See also path_resolution(2).) |
EFAULT | path points outside your accessible address space. |
EIO | An I/O error occurred. |
ELOOP | Too many symbolic links were encountered in resolving path. |
ENAMETOOLONG | path is too long. |
ENOENT | The file does not exist. |
ENOMEM | Insufficient kernel memory was available. |
ENOTDIR | A component of the path prefix is not a directory. |
EPERM | The effective UID does not match the owner of the file, and the process is not privileged (Linux: it does not have the CAP_FOWNER capability). |
EROFS | The named file resides on a read-only file system. |
The general errors for fchmod() are listed below: | |
EBADF | The file descriptor fildes is not valid. |
EIO | See above. |
EPERM | See above. |
EROFS | See above. |
CONFORMING TO
4.4BSD, SVr4, POSIX.1-2001.
chown() function
NAME
chown, fchown, lchown - change ownership of a file
SYNOPSIS
#include <sys/types.h>
#include <unistd.h>
int chown(const char *path, uid_t owner, gid_t group);
int fchown(int fd, uid_t owner, gid_t group);
int lchown(const char *path, uid_t owner, gid_t group);
DESCRIPTION
These system calls change the owner and group of the file specified by path or by fd. Only a privileged process (Linux: one with the CAP_CHOWN capability) may change the owner of a file. The owner of a file may change the group of the file to any group of which that owner is a member. A privileged process (Linux: with CAP_CHOWN) may change the group arbitrarily.
If the owner or group is specified as -1, then that ID is not changed. When the owner or group of an executable file are changed by a non-superuser, the S_ISUID and S_ISGID mode bits are cleared. POSIX does not specify whether this also should happen when root does the chown(); the Linux behaviour depends on the kernel version. In case of a non-group-executable file (with clear S_IXGRP bit) the S_ISGID bit indicates mandatory locking, and is not cleared by a chown().
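The rule that an ID of -1 leaves that ID unchanged can be used to change only the group of a file. This is a minimal sketch; the helper name change_group_only() is hypothetical.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: change only the group of path. Passing (uid_t)-1
 * as the owner asks chown() to leave the owner as it is. Returns the
 * chown() result: 0 on success, -1 on error with errno set. */
int change_group_only(const char *path, gid_t group)
{
    return chown(path, (uid_t)-1, group);
}
```

An unprivileged owner could call this with any group of which that owner is a member, as the description above states.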
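RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Depending on the file system, other errors can be returned. The more general errors for chown() are listed below:
Error Code | Description |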
EACCES | Search permission is denied on a component of the path prefix. (See also path_resolution(2).) |
EFAULT | path points outside your accessible address space. |
ELOOP | Too many symbolic links were encountered in resolving path. |
ENAMETOOLONG | path is too long. |
ENOENT | The file does not exist. |
ENOMEM | Insufficient kernel memory was available. |
ENOTDIR | A component of the path prefix is not a directory. |
EPERM | The calling process did not have the required permissions (see above) to change owner and/or group. |
EROFS | The named file resides on a read-only file system. |
The general errors for fchown() are listed below: | |
EBADF | The descriptor is not valid. |
EIO | A low-level I/O error occurred while modifying the inode. |
ENOENT | See above. |
EPERM | See above. |
EROFS | See above. |
NOTES
In versions of Linux prior to 2.1.81 (and distinct from 2.1.46), chown() did not follow symbolic links. Since Linux 2.1.81, chown() does follow symbolic links, and there is a new system call lchown() that does not follow symbolic links. Since Linux 2.1.86, this new call (that has the same semantics as the old chown()) has got the same syscall number, and chown() got the newly introduced number.
The prototype for fchown() is only available if _BSD_SOURCE is defined.
CONFORMING TO
4.4BSD, SVr4, POSIX.1-2001. The 4.4BSD version can only be used by the superuser (that is, ordinary users cannot give away files).
RESTRICTIONS
The chown() semantics are deliberately violated on NFS file systems which have UID mapping enabled. Additionally, the semantics of all system calls which access the file contents are violated, because chown() may cause immediate access revocation on already open files. Client side caching may lead to a delay between the time where ownership have been changed to allow access for a user and the time where the file can actually be accessed by the user on other clients.
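chroot() function
NAME
chroot - change root directory
SYNOPSIS
#include <unistd.h>
int chroot(const char *path);
DESCRIPTION
chroot() changes the root directory to that specified in path. This directory will be used for pathnames beginning with /. The root directory is inherited by all children of the current process.
Only a privileged process (Linux: one with the CAP_SYS_CHROOT capability) may call chroot(2). This call changes an ingredient in the pathname resolution process and does nothing else.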
This call does not change the current working directory, so that after the call ‘.’ can be outside the tree rooted at ‘/’. In particular, the superuser can escape from a ‘chroot jail’ by doing ‘mkdir foo; chroot foo; cd ..’.
This call does not close open file descriptors, and such file descriptors may allow access to files outside the chroot tree.
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Depending on the file system, other errors can be returned. The more general errors are listed below:
Error Code | Description |
EACCES | Search permission is denied on a component of the path prefix. (See also path_resolution(2).) |
EFAULT | path points outside your accessible address space. |
EIO | An I/O error occurred. |
ELOOP | Too many symbolic links were encountered in resolving path. |
ENAMETOOLONG | path is too long. |
ENOENT | The file does not exist. |
ENOMEM | Insufficient kernel memory was available. |
ENOTDIR | A component of path is not a directory. |
EPERM | The caller has insufficient privilege. |
CONFORMING TO
SVr4, 4.4BSD, SUSv2 (marked LEGACY). This function is not part of POSIX.1-2001.
NOTES
A child process created via fork(2) inherits its parent’s root directory. The root directory is left unchanged by execve(2).
FreeBSD has a stronger jail() system call.
clone() function
NAME
clone, __clone2 - create a child process
SYNOPSIS
#include <sched.h>
int clone(int (*fn)(void *), void *child_stack,
          int flags, void *arg, ...
          /* pid_t *pid, struct user_desc *tls, pid_t *ctid */ );
int __clone2(int (*fn)(void *), void *child_stack_base,
             size_t stack_size, int flags, void *arg, ...
             /* pid_t *pid, struct user_desc *tls, pid_t *ctid */ );
DESCRIPTION
clone() creates a new process, in a manner similar to fork(2). It is actually a library function layered on top of the underlying clone() system call, hereinafter referred to as sys_clone. A description of sys_clone is given towards the end of this page.
Unlike fork(2), these calls allow the child process to share parts of its execution context with the calling process, such as the memory space, the table of file descriptors, and the table of signal handlers. (Note that on this manual page, "calling process" normally corresponds to "parent process". But see the description of CLONE_PARENT below.)
The main use of clone() is to implement threads: multiple threads of control in a program that run concurrently in a shared memory space.
When the child process is created with clone(), it executes the function application fn(arg). (This differs from fork(2), where execution continues in the child from the point of the fork(2) call.) The fn argument is a pointer to a function that is called by the child process at the beginning of its execution. The arg argument is passed to the fn function.
When the fn(arg) function application returns, the child process terminates. The integer returned by fn is the exit code for the child process. The child process may also terminate explicitly by calling exit(2) or after receiving a fatal signal.
The child_stack argument specifies the location of the stack used by the child process. Since the child and calling process may share memory, it is not possible for the child process to execute in the same stack as the calling process. The calling process must therefore set up memory space for the child stack and pass a pointer to this space to clone(). Stacks grow downwards on all processors that run Linux (except the HP PA processors), so child_stack usually points to the topmost address of the memory space set up for the child stack.
The low byte of flags contains the number of the termination signal sent to the parent when the child dies. If this signal is specified as anything other than SIGCHLD, then the parent process must specify the __WALL or __WCLONE options when waiting for the child with wait(2). If no signal is specified, then the parent process is not signaled when the child terminates.
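The calling convention described above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the helper name run_child_with_exit_code, the 1 MiB stack size, and the use of malloc(3) for the child stack are assumptions made for the example, and it assumes an architecture whose stack grows downwards.

```c
#define _GNU_SOURCE          /* exposes clone() in <sched.h> on glibc */
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (1024 * 1024)   /* assumed size: 1 MiB */

/* Entry point of the child; its return value becomes the exit code. */
static int child_fn(void *arg)
{
    return *(int *)arg;
}

/* Hypothetical helper: runs child_fn(&code) in a clone()d child and
 * returns the child's exit status, or -1 on any failure. */
int run_child_with_exit_code(int code)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* Assuming a downward-growing stack, pass the topmost address. */
    char *stack_top = stack + CHILD_STACK_SIZE;

    /* SIGCHLD in the low byte of flags lets a plain waitpid() work. */
    pid_t pid = clone(child_fn, stack_top, SIGCHLD, &code);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because no sharing flags are passed here, the child works on its own copy of the parent's memory, so the pointer to code remains valid in the child even though it refers to the parent's stack frame.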
flags may also be bitwise-or’ed with zero or more of the following constants, in order to specify what is shared between the calling process and the child process:
Flag | Description |
CLONE_PARENT (since Linux 2.3.12) | If CLONE_PARENT is set, then the parent of the new child (as returned by getppid(2)) will be the same as that of the calling process. If CLONE_PARENT is not set, then (as with fork(2)) the child’s parent is the calling process. Note that it is the parent process, as returned by getppid(2), which is signaled when the child terminates, so that if CLONE_PARENT is set, then the parent of the calling process, rather than the calling process itself, will be signaled. |
CLONE_FS | If CLONE_FS is set, the caller and the child processes share the same file system information. This includes the root of the file system, the current working directory, and the umask. Any call to chroot(2), chdir(2), or umask(2) performed by the calling process or the child process also affects the other process. If CLONE_FS is not set, the child process works on a copy of the file system information of the calling process at the time of the clone() call. Calls to chroot(2), chdir(2), umask(2) performed later by one of the processes do not affect the other process. |
CLONE_FILES | If CLONE_FILES is set, the calling process and the child processes share the same file descriptor table. Any file descriptor created by the calling process or by the child process is also valid in the other process. Similarly, if one of the processes closes a file descriptor, or changes its associated flags (using the fcntl(2) F_SETFD operation), the other process is also affected. If CLONE_FILES is not set, the child process inherits a copy of all file descriptors opened in the calling process at the time of clone(). (The duplicated file descriptors in the child refer to the same open file descriptions (see open(2)) as the corresponding file descriptors in the calling process.) Subsequent operations that open or close file descriptors, or change file descriptor flags, performed by either the calling process or the child process do not affect the other process. |
CLONE_NEWNS (since Linux 2.4.19) | Start the child in a new namespace. Every process lives in a namespace. The namespace of a process is the data (the set of mounts) describing the file hierarchy as seen by that process. After a fork(2) or clone(2) where the CLONE_NEWNS flag is not set, the child lives in the same namespace as the parent. The system calls mount(2) and umount(2) change the namespace of the calling process, and hence affect all processes that live in the same namespace, but do not affect processes in a different namespace. After a clone(2) where the CLONE_NEWNS flag is set, the cloned child is started in a new namespace, initialized with a copy of the namespace of the parent. Only a privileged process (one having the CAP_SYS_ADMIN capability) may specify the CLONE_NEWNS flag. It is not permitted to specify both CLONE_NEWNS and CLONE_FS in the same clone() call. |
CLONE_SIGHAND | If CLONE_SIGHAND is set, the calling process and the child processes share the same table of signal handlers. If the calling process or child process calls sigaction(2) to change the behavior associated with a signal, the behavior is changed in the other process as well. However, the calling process and child processes still have distinct signal masks and sets of pending signals. So, one of them may block or unblock some signals using sigprocmask(2) without affecting the other process. If CLONE_SIGHAND is not set, the child process inherits a copy of the signal handlers of the calling process at the time clone() is called. Calls to sigaction(2) performed later by one of the processes have no effect on the other process. Since Linux 2.6.0-test6, flags must also include CLONE_VM if CLONE_SIGHAND is specified. |
CLONE_PTRACE | If CLONE_PTRACE is specified, and the calling process is being traced, then trace the child also (see ptrace(2)). |
CLONE_UNTRACED (since Linux 2.5.46) | If CLONE_UNTRACED is specified, then a tracing process cannot force CLONE_PTRACE on this child process. |
CLONE_STOPPED (since Linux 2.6.0-test2) | If CLONE_STOPPED is set, then the child is initially stopped (as though it was sent a SIGSTOP signal), and must be resumed by sending it a SIGCONT signal. |
CLONE_VFORK | If CLONE_VFORK is set, the execution of the calling process is suspended until the child releases its virtual memory resources via a call to execve(2) or _exit(2) (as with vfork(2)). If CLONE_VFORK is not set then both the calling process and the child are schedulable after the call, and an application should not rely on execution occurring in any particular order. |
CLONE_VM | If CLONE_VM is set, the calling process and the child processes run in the same memory space. In particular, memory writes performed by the calling process or by the child process are also visible in the other process. Moreover, any memory mapping or unmapping performed with mmap(2) or munmap(2) by the child or calling process also affects the other process. If CLONE_VM is not set, the child process runs in a separate copy of the memory space of the calling process at the time of clone(). Memory writes or file mappings/unmappings performed by one of the processes do not affect the other, as with fork(2). |
CLONE_PID (obsolete) | If CLONE_PID is set, the child process is created with the same process ID as the calling process. This is good for hacking the system, but otherwise of not much use. Since 2.3.21 this flag can be specified only by the system boot process (PID 0). It disappeared in Linux 2.5.16. |
CLONE_THREAD (since Linux 2.4.0-test8) | If CLONE_THREAD is set, the child is placed in the same thread group as the calling process. To make the remainder of the discussion of CLONE_THREAD more readable, the term "thread" is used to refer to the processes within a thread group. Thread groups were a feature added in Linux 2.4 to support the POSIX threads notion of a set of threads that share a single PID. Internally, this shared PID is the so-called thread group identifier (TGID) for the thread group. Since Linux 2.4, calls to getpid(2) return the TGID of the caller. The threads within a group can be distinguished by their (system-wide) unique thread IDs (TID). A new thread’s TID is available as the function result returned to the caller of clone(), and a thread can obtain its own TID using gettid(2). When a call is made to clone() without specifying CLONE_THREAD, then the resulting thread is placed in a new thread group whose TGID is the same as the thread’s TID. This thread is the leader of the new thread group. A new thread created with CLONE_THREAD has the same parent process as the caller of clone() (i.e., like CLONE_PARENT), so that calls to getppid(2) return the same value for all of the threads in a thread group. When a CLONE_THREAD thread terminates, the thread that created it using clone() is not sent a SIGCHLD (or other termination) signal; nor can the status of such a thread be obtained using wait(2). (The thread is said to be detached.) After all of the threads in a thread group terminate, the parent process of the thread group is sent a SIGCHLD (or other termination) signal. If any of the threads in a thread group performs an execve(2), then all threads other than the thread group leader are terminated, and the new program is executed in the thread group leader. If one of the threads in a thread group creates a child using fork(2), then any thread in the group can wait(2) for that child. Since Linux 2.5.35, flags must also include CLONE_SIGHAND if CLONE_THREAD is specified. Signals may be sent to a thread group as a whole (i.e., a TGID) using kill(2), or to a specific thread (i.e., TID) using tgkill(2). Signal dispositions and actions are process-wide: if an unhandled signal is delivered to a thread, then it will affect (terminate, stop, continue, be ignored in) all members of the thread group. Each thread has its own signal mask, as set by sigprocmask(2), but signals can be pending either: for the whole process (i.e., deliverable to any member of the thread group), when sent with kill(2); or for an individual thread, when sent with tgkill(2). A call to sigpending(2) returns a signal set that is the union of the signals pending for the whole process and the signals that are pending for the calling thread. If kill(2) is used to send a signal to a thread group, and the thread group has installed a handler for the signal, then the handler will be invoked in exactly one, arbitrarily selected member of the thread group that has not blocked the signal. If multiple threads in a group are waiting to accept the same signal using sigwaitinfo(2), the kernel will arbitrarily select one of these threads to receive a signal sent using kill(2). |
CLONE_SYSVSEM (since Linux 2.5.10) | If CLONE_SYSVSEM is set, then the child and the calling process share a single list of System V semaphore undo values (see semop(2)). If this flag is not set, then the child has a separate undo list, which is initially empty. |
CLONE_SETTLS (since Linux 2.5.32) | The newtls parameter is the new TLS (Thread Local Storage) descriptor. (See set_thread_area(2).) |
CLONE_PARENT_SETTID (since Linux 2.5.49) | Store child thread ID at location parent_tidptr in parent and child memory. (In Linux 2.5.32-2.5.48 there was a flag CLONE_SETTID that did this.) |
CLONE_CHILD_SETTID (since Linux 2.5.49) | Store child thread ID at location child_tidptr in child memory. |
CLONE_CHILD_CLEARTID (since Linux 2.5.49) | Erase child thread ID at location child_tidptr in child memory when the child exits, and do a wakeup on the futex at that address. The address involved may be changed by the set_tid_address(2) system call. This is used by threading libraries. |
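The effect of one sharing flag from the table above, CLONE_VM, can be observed with a small experiment. The sketch below is illustrative only: the names shared_counter, bump_counter and counter_after_child are hypothetical, the 1 MiB malloc(3)-allocated stack is an assumption, and a downward-growing stack is assumed.

```c
#define _GNU_SOURCE          /* exposes clone() and CLONE_* in <sched.h> */
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define DEMO_STACK_SIZE (1024 * 1024)    /* assumed size: 1 MiB */

/* Written by the child; read by the parent after the child has exited. */
static volatile int shared_counter;

static int bump_counter(void *arg)
{
    (void)arg;
    shared_counter = 42;
    return 0;
}

/* Hypothetical helper: clones a child with extra_flags (for example
 * CLONE_VM or 0), waits for it, and returns the value of shared_counter
 * as seen by the parent, or -1 on failure. */
int counter_after_child(int extra_flags)
{
    char *stack = malloc(DEMO_STACK_SIZE);
    if (stack == NULL)
        return -1;

    shared_counter = 0;
    pid_t pid = clone(bump_counter, stack + DEMO_STACK_SIZE,
                      extra_flags | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);             /* safe: the child has already terminated */
    if (waited != pid)
        return -1;
    return shared_counter;
}
```

With CLONE_VM the parent sees the child's write (the helper returns 42); without it the child modifies only its private copy and the parent still sees 0.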
sys_clone
The sys_clone system call corresponds more closely to fork(2) in that execution in the child continues from the point of the call. Thus, sys_clone only requires the flags and child_stack arguments, which have the same meaning as for clone(). (Note that the order of these arguments differs from clone().)
Another difference for sys_clone is that the child_stack argument may be zero, in which case copy-on-write semantics ensure that the child gets separate copies of stack pages when either process modifies the stack. In this case, for correct operation, the CLONE_VM option should not be specified.
Since Linux 2.5.49 the system call has five parameters. The two new parameters are parent_tidptr, which points to the location (in parent and child memory) where the child thread ID will be written in case CLONE_PARENT_SETTID was specified, and child_tidptr, which points to the location (in child memory) where the child thread ID will be written in case CLONE_CHILD_SETTID was specified.
RETURN VALUE
On success, the thread ID of the child process is returned in the caller’s thread of execution. On failure, a -1 will be returned in the caller’s context, no child process will be created, and errno will be set appropriately.
ERRORS
Error Code | Description |
EAGAIN | Too many processes are already running. |
EINVAL | CLONE_SIGHAND was specified, but CLONE_VM was not. (Since Linux 2.6.0-test6.) |
EINVAL | CLONE_THREAD was specified, but CLONE_SIGHAND was not. (Since Linux 2.5.35.) |
EINVAL | Both CLONE_FS and CLONE_NEWNS were specified in flags. |
EINVAL | Returned by clone() when a zero value is specified for child_stack. |
ENOMEM | Cannot allocate sufficient memory to allocate a task structure for the child, or to copy those parts of the caller’s context that need to be copied. |
EPERM | CLONE_NEWNS was specified by a non-root process (process without CAP_SYS_ADMIN). |
EPERM | CLONE_PID was specified by a process other than process 0. |
VERSIONS
There is no entry for clone() in libc5. glibc2 provides clone() as described in this manual page.
CONFORMING TO
The clone() and sys_clone calls are Linux specific and should not be used in programs intended to be portable.
NOTES
In the kernel 2.4.x series, CLONE_THREAD generally does not make the parent of the new thread the same as the parent of the calling process. However, for kernel versions 2.4.7 to 2.4.18 the CLONE_THREAD flag implied the CLONE_PARENT flag (as in kernel 2.6).
For a while there was CLONE_DETACHED (introduced in 2.5.32): parent wants no child-exit signal. In 2.6.2 the need to give this together with CLONE_THREAD disappeared. This flag is still defined, but has no effect. On x86, clone() should not be called through vsyscall, but directly through int $0x80. On IA-64, a different system call is used:
int __clone2(int (*fn)(void *), void *child_stack_base,
             size_t stack_size, int flags, void *arg, ...
             /* pid_t *pid, struct user_desc *tls, pid_t *ctid */ );
The __clone2() system call operates in the same way as clone(), except that child_stack_base points to the lowest address of the child’s stack area, and stack_size specifies the size of the stack pointed to by child_stack_base.
BUGS
Versions of the GNU C library that include the NPTL threading library contain a wrapper function for getpid(2) that performs caching of PIDs. In programs linked against such libraries, calls to getpid(2) may return the same value, even when the threads were not created using CLONE_THREAD (and thus are not in the same thread group). To get the truth, it may be necessary to use code such as the following:
#include <sys/syscall.h>   /* SYS_getpid */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* syscall() */
pid_t mypid;
mypid = syscall(SYS_getpid);
SEE ALSO
- fork (2)
- futex (2)
- getpid (2)
- gettid (2)
- set_thread_area (2)
- set_tid_address (2)
- tkill (2)
- unshare (2)
- wait (2)
close() function
close - close a file descriptor
SYNOPSIS
#include <unistd.h>
int close(int fd);
DESCRIPTION
close() closes a file descriptor, so that it no longer refers to any file and may be reused. Any record locks (see fcntl(2)) held on the file it was associated with, and owned by the process, are removed (regardless of the file descriptor that was used to obtain the lock).
If fd is the last copy of a particular file descriptor the resources associated with it are freed; if the descriptor was the last reference to a file which has been removed using unlink(2) the file is deleted.
RETURN VALUE
close() returns zero on success. On error, -1 is returned, and errno is set appropriately.
ERRORS
Error Code | Description |
EBADF | fd isn’t a valid open file descriptor. |
EINTR | The close() call was interrupted by a signal. |
EIO | An I/O error occurred. |
CONFORMING TO
SVr4, 4.3BSD, POSIX.1-2001.
NOTES
Not checking the return value of close() is a common but nevertheless serious programming error. It is quite possible that errors on a previous write(2) operation are first reported at the final close(). Not checking the return value when closing the file may lead to silent loss of data. This can especially be observed with NFS and with disk quota.
A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a filesystem to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2). (It will depend on the disk hardware at this point.)
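The advice in these notes can be sketched as a small helper that treats a failing write(2), fsync(2), or close() as a failed save. The function name save_buffer and the 0644 creation mode are assumptions chosen for illustration, not part of any standard interface.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Closes fd while preserving the errno of the earlier failure. */
static int fail_after_close(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Hypothetical helper: writes len bytes to path, forces them to storage
 * with fsync(), and reports any error, including one that is first
 * reported by close(). Returns 0 on success, -1 with errno set on error. */
int save_buffer(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;            /* interrupted before writing: retry */
            return fail_after_close(fd);
        }
        p += n;                      /* handle short writes */
        len -= (size_t)n;
    }

    /* close() alone does not guarantee the data reached the disk. */
    if (fsync(fd) == -1)
        return fail_after_close(fd);

    /* The return value of close() is checked, not discarded. */
    if (close(fd) == -1)
        return -1;
    return 0;
}
```

A caller would treat a -1 result as "data possibly lost" and inspect errno, rather than assuming that a sequence of successful write(2) calls implies a successful save.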
SEE ALSO
connect() function
connect - initiate a connection on a socket
SYNOPSIS
#include <sys/types.h>
#include <sys/socket.h>
int connect(int sockfd,
            const struct sockaddr *serv_addr,
            socklen_t addrlen);
DESCRIPTION
The connect() system call connects the socket referred to by the file descriptor sockfd to the address specified by serv_addr. The addrlen argument specifies the size of serv_addr. The format of the address in serv_addr is determined by the address space of the socket sockfd; see socket(2) for further details.
If the socket sockfd is of type SOCK_DGRAM then serv_addr is the address to which datagrams are sent by default, and the only address from which datagrams are received. If the socket is of type SOCK_STREAM or SOCK_SEQPACKET, this call attempts to make a connection to the socket that is bound to the address specified by serv_addr.
Generally, connection-based protocol sockets may successfully connect() only once; connectionless protocol sockets may use connect() multiple times to change their association. Connectionless sockets may dissolve the association by connecting to an address with the sa_family member of sockaddr set to AF_UNSPEC.
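RETURN VALUE
If the connection or binding succeeds, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
The following are general socket errors only. There may be other domain-specific error codes.
Error Code | Description |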
EACCES | For Unix domain sockets, which are identified by pathname: Write permission is denied on the socket file, or search permission is denied for one of the directories in the path prefix. (See also path_resolution(2).) |
EACCES, EPERM | The user tried to connect to a broadcast address without having the socket broadcast flag enabled or the connection request failed because of a local firewall rule. |
EADDRINUSE | Local address is already in use. |
EAFNOSUPPORT | The passed address didn’t have the correct address family in its sa_family field. |
EADDRNOTAVAIL | Non-existent interface was requested or the requested address was not local. |
EALREADY | The socket is non-blocking and a previous connection attempt has not yet been completed. |
EBADF | The file descriptor is not a valid index in the descriptor table. |
ECONNREFUSED | No one listening on the remote address. |
EFAULT | The socket structure address is outside the user’s address space. |
EINPROGRESS | The socket is non-blocking and the connection cannot be completed immediately. It is possible to select(2) or poll(2) for completion by selecting the socket for writing. After select(2) indicates writability, use getsockopt(2) to read the SO_ERROR option at level SOL_SOCKET to determine whether connect() completed successfully (SO_ERROR is zero) or unsuccessfully (SO_ERROR is one of the usual error codes listed here, explaining the reason for the failure). |
EINTR | The system call was interrupted by a signal that was caught. |
EISCONN | The socket is already connected. |
ENETUNREACH | Network is unreachable. |
ENOTSOCK | The file descriptor is not associated with a socket. |
ETIMEDOUT | Timeout while attempting connection. The server may be too busy to accept new connections. Note that for IP sockets the timeout may be very long when syncookies are enabled on the server. |
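The EINPROGRESS procedure from the table above (wait for writability, then read SO_ERROR) might look like the following sketch. It uses poll(2) rather than select(2); the helper names connect_with_timeout and connect_loopback are hypothetical, and restarting the full timeout after EINTR is a simplification.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Closes fd while preserving errno, then reports failure. */
static int fail_and_close(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Hypothetical helper: connects a new stream socket to addr, waiting at
 * most timeout_ms. Returns the connected (still non-blocking) descriptor,
 * or -1 with errno set. */
int connect_with_timeout(const struct sockaddr *addr, socklen_t addrlen,
                         int timeout_ms)
{
    int fd = socket(addr->sa_family, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    int fl = fcntl(fd, F_GETFL, 0);
    if (fl == -1 || fcntl(fd, F_SETFL, fl | O_NONBLOCK) == -1)
        return fail_and_close(fd);

    if (connect(fd, addr, addrlen) == 0)
        return fd;                       /* completed immediately */
    if (errno != EINPROGRESS)
        return fail_and_close(fd);

    /* Wait until the socket becomes writable or the timeout expires. */
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int ready;
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready == -1 && errno == EINTR);
    if (ready == 0)
        errno = ETIMEDOUT;
    if (ready <= 0)
        return fail_and_close(fd);

    /* Writability alone is not success: SO_ERROR holds the outcome. */
    int so_error = 0;
    socklen_t len = sizeof so_error;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) == -1)
        return fail_and_close(fd);
    if (so_error != 0) {
        errno = so_error;
        return fail_and_close(fd);
    }
    return fd;
}

/* Convenience wrapper for the example: TCP to 127.0.0.1:port. */
int connect_loopback(uint16_t port, int timeout_ms)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return connect_with_timeout((const struct sockaddr *)&sa, sizeof sa,
                                timeout_ms);
}
```

On failure the helper leaves the reason in errno, so a refused connection surfaces as ECONNREFUSED whether connect() reported it directly or it was retrieved through SO_ERROR.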
CONFORMING TO
SVr4, 4.4BSD (the connect() function first appeared in 4.2BSD).
NOTES
The third argument of connect() is in reality an int (and this is what 4.x BSD and libc4 and libc5 have). Some POSIX confusion resulted in the present socklen_t, also used by glibc. See also accept(2).
BUGS
Unconnecting a socket by calling connect() with an AF_UNSPEC address is not yet implemented.
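SEE ALSO
create_module() function
create_module - create a loadable module entry
SYNOPSIS
#include <linux/module.h>
caddr_t create_module(const char *name, size_t size);
DESCRIPTION
create_module() attempts to create a loadable module entry and reserve the kernel memory that will be needed to hold the module. This system call requires privilege.
RETURN VALUE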
On success, returns the kernel address at which the module will reside. On error -1 is returned and errno is set appropriately.
ERRORS
Error Code | Description |
EEXIST | A module by that name already exists. |
EFAULT | name is outside the program’s accessible address space. |
EINVAL | The requested size is too small even for the module header information. |
ENOMEM | The kernel could not allocate a contiguous block of memory large enough for the module. |
EPERM | The caller was not privileged (did not have the CAP_SYS_MODULE capability). |
CONFORMING TO
create_module() is Linux specific.
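NOTES
This system call is present on Linux only up until kernel 2.4; it was removed in Linux 2.6.
SEE ALSO
open() function
open, creat - open and possibly create a file or device
SYNOPSIS
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
int creat(const char *pathname, mode_t mode);
DESCRIPTION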
Given a pathname for a file, open() returns a file descriptor, a small, non-negative integer for use in subsequent system calls (read(2), write(2), lseek(2), fcntl(2), etc.). The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process.
The new file descriptor is set to remain open across an execve(2) (i.e., the FD_CLOEXEC file descriptor flag described in fcntl(2) is initially disabled). The file offset is set to the beginning of the file (see lseek(2)).
A call to open() creates a new open file description, an entry in the system-wide table of open files. This entry records the file offset and the file status flags (modifiable via the fcntl() F_SETFL operation). A file descriptor is a reference to one of these entries; this reference is unaffected if pathname is subsequently removed or modified to refer to a different file. The new open file description is initially not shared with any other process, but sharing may arise via fork(2).
The parameter flags must include one of the following access modes: O_RDONLY, O_WRONLY, or O_RDWR. These request opening the file read-only, write-only, or read/write, respectively.
In addition, zero or more file creation flags and file status flags can be bitwise-or’d in flags. The file creation flags are O_CREAT, O_EXCL, O_NOCTTY, and O_TRUNC. The file status flags are all of the remaining flags listed below. The distinction between these two groups of flags is that the file status flags can be retrieved and (in some cases) modified using fcntl(2).
The full list of file creation flags and file status flags is as follows:
Flag | Description |
O_APPEND | The file is opened in append mode. Before each write(), the file offset is positioned at the end of the file, as if with lseek(). O_APPEND may lead to corrupted files on NFS file systems if more than one process appends data to a file at once. This is because NFS does not support appending to a file, so the client kernel has to simulate it, which can’t be done without a race condition. |
O_ASYNC | Enable signal-driven I/O: generate a signal (SIGIO by default, but this can be changed via fcntl(2)) when input or output becomes possible on this file descriptor. This feature is only available for terminals, pseudo-terminals, sockets, and (since Linux 2.6) pipes and FIFOs. See fcntl(2) for further details. |
O_CREAT | If the file does not exist it will be created. The owner (user ID) of the file is set to the effective user ID of the process. The group ownership (group ID) is set either to the effective group ID of the process or to the group ID of the parent directory (depending on filesystem type and mount options, and the mode of the parent directory, see, e.g., the mount options bsdgroups and sysvgroups of the ext2 filesystem, as described in mount(8)). |
O_DIRECT | Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user space buffers. The I/O is synchronous, i.e., at the completion of a read(2) or write(2), data is guaranteed to have been transferred. Under Linux 2.4 transfer sizes, and the alignment of user buffer and file offset must all be multiples of the logical block size of the file system. Under Linux 2.6 alignment must fit the block size of the device. A semantically similar (but deprecated) interface for block devices is described in raw(8). |
O_DIRECTORY | If pathname is not a directory, cause the open to fail. This flag is Linux-specific, and was added in kernel version 2.1.126, to avoid denial-of-service problems if opendir(3) is called on a FIFO or tape device, but should not be used outside of the implementation of opendir. |
O_EXCL | When used with O_CREAT, if the file already exists it is an error and the open() will fail. In this context, a symbolic link exists, regardless of where it points to. O_EXCL is broken on NFS file systems; programs which rely on it for performing locking tasks will contain a race condition. The solution for performing atomic file locking using a lockfile is to create a unique file on the same file system (e.g., incorporating hostname and pid), use link(2) to make a link to the lockfile. If link() returns 0, the lock is successful. Otherwise, use stat(2) on the unique file to check if its link count has increased to 2, in which case the lock is also successful. |
O_LARGEFILE | (LFS) Allow files whose sizes cannot be represented in an off_t (but can be represented in an off64_t) to be opened. |
O_NOATIME | (Since Linux 2.6.8) Do not update the file last access time (st_atime in the inode) when the file is read(2). This flag is intended for use by indexing or backup programs, where its use can significantly reduce the amount of disk activity. This flag may not be effective on all filesystems. One example is NFS, where the server maintains the access time. |
O_NOCTTY | If pathname refers to a terminal device — see tty(4) — it will not become the process’s controlling terminal even if the process does not have one. |
O_NOFOLLOW | If pathname is a symbolic link, then the open fails. This is a FreeBSD extension, which was added to Linux in version 2.1.126. Symbolic links in earlier components of the pathname will still be followed. |
O_NONBLOCK or O_NDELAY | When possible, the file is opened in non-blocking mode. Neither the open() nor any subsequent operations on the file descriptor which is returned will cause the calling process to wait. For the handling of FIFOs (named pipes), see also fifo(7). For a discussion of the effect of O_NONBLOCK in conjunction with mandatory file locks and with file leases, see fcntl(2). |
O_SYNC | The file is opened for synchronous I/O. Any write()s on the resulting file descriptor will block the calling process until the data has been physically written to the underlying hardware. But see RESTRICTIONS below. |
O_TRUNC | If the file already exists and is a regular file and the open mode allows writing (i.e., is O_RDWR or O_WRONLY) it will be truncated to length 0. If the file is a FIFO or terminal device file, the O_TRUNC flag is ignored. Otherwise the effect of O_TRUNC is unspecified. |
Some of these optional flags can be altered using fcntl() after the file has been opened.
The argument mode specifies the permissions to use in case a new file is created. It is modified by the process’s umask in the usual way: the permissions of the created file are (mode & ~umask). Note that this mode only applies to future accesses of the newly created file; the open() call that creates a read-only file may well return a read/write file descriptor.
The following symbolic constants are provided for mode:
Constant | Value | Meaning |
S_IRWXU | 00700 | user (file owner) has read, write and execute permission |
S_IRUSR | 00400 | user has read permission |
S_IWUSR | 00200 | user has write permission |
S_IXUSR | 00100 | user has execute permission |
S_IRWXG | 00070 | group has read, write and execute permission |
S_IRGRP | 00040 | group has read permission |
S_IWGRP | 00020 | group has write permission |
S_IXGRP | 00010 | group has execute permission |
S_IRWXO | 00007 | others have read, write and execute permission |
S_IROTH | 00004 | others have read permission |
S_IWOTH | 00002 | others have write permission |
S_IXOTH | 00001 | others have execute permission |
mode must be specified when O_CREAT is in the flags, and is ignored otherwise.
creat() is equivalent to open() with flags equal to O_CREAT|O_WRONLY|O_TRUNC.
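As an illustration of O_CREAT, O_EXCL and the (mode & ~umask) rule, the sketch below creates a file only if it does not already exist and reads back its permission bits. The helper names create_exclusive and permission_bits are hypothetical, and the example assumes a local (non-NFS) file system, given the O_EXCL caveat above.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: creates path for writing only if it does not
 * exist yet. Returns the new descriptor, or -1 with errno set
 * (EEXIST if the file, or a symbolic link of that name, already exists). */
int create_exclusive(const char *path, mode_t mode)
{
    /* mode is required because O_CREAT is present in flags. */
    return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
}

/* Hypothetical helper: returns the permission bits of path,
 * or (mode_t)-1 if stat() fails. */
mode_t permission_bits(const char *path)
{
    struct stat st;
    if (stat(path, &st) == -1)
        return (mode_t)-1;
    return st.st_mode & 07777;
}
```

For example, with a umask of 022, requesting S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH (0666) yields a file with permissions 0644, and a second create_exclusive() call on the same path fails with EEXIST.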
RETURN VALUE
open() and creat() return the new file descriptor, or -1 if an error occurred (in which case, errno is set appropriately).
NOTES
Note that open() can open device special files, but creat() cannot create them; use mknod(2) instead.
On NFS file systems with UID mapping enabled, open() may return a file descriptor but e.g. read(2) requests are denied with EACCES. This is because the client performs open() by checking the permissions, but UID mapping is performed by the server upon read and write requests.
If the file is newly created, its st_atime, st_ctime, st_mtime fields (respectively, time of last access, time of last status change, and time of last modification; see stat(2)) are set to the current time, and so are the st_ctime and st_mtime fields of the parent directory. Otherwise, if the file is modified because of the O_TRUNC flag, its st_ctime and st_mtime fields are set to the current time.
ERRORS
Error Code | Description |
EACCES | The requested access to the file is not allowed, or search permission is denied for one of the directories in the path prefix of pathname, or the file did not exist yet and write access to the parent directory is not allowed. (See also path_resolution(2).) |
EEXIST | pathname already exists and O_CREAT and O_EXCL were used. |
EFAULT | pathname points outside your accessible address space. |
EISDIR | pathname refers to a directory and the access requested involved writing (that is, O_WRONLY or O_RDWR is set). |
ELOOP | Too many symbolic links were encountered in resolving pathname, or O_NOFOLLOW was specified but pathname was a symbolic link. |
EMFILE | The process already has the maximum number of files open. |
ENAMETOOLONG | pathname was too long. |
ENFILE | The system limit on the total number of open files has been reached. |
ENODEV | pathname refers to a device special file and no corresponding device exists. (This is a Linux kernel bug; in this situation ENXIO must be returned.) |
ENOENT | O_CREAT is not set and the named file does not exist. Or, a directory component in pathname does not exist or is a dangling symbolic link. |
ENOMEM | Insufficient kernel memory was available. |
ENOSPC | pathname was to be created but the device containing pathname has no room for the new file. |
ENOTDIR | A component used as a directory in pathname is not, in fact, a directory, or O_DIRECTORY was specified and pathname was not a directory. |
ENXIO | O_NONBLOCK | O_WRONLY is set, the named file is a FIFO and no process has the file open for reading. Or, the file is a device special file and no corresponding device exists. |
EOVERFLOW | pathname refers to a regular file, too large to be opened; see O_LARGEFILE above. |
EPERM | The O_NOATIME flag was specified, but the effective user ID of the caller did not match the owner of the file and the caller was not privileged (CAP_FOWNER). |
EROFS | pathname refers to a file on a read-only filesystem and write access was requested. |
ETXTBSY | pathname refers to an executable image which is currently being executed and write access was requested. |
EWOULDBLOCK | The O_NONBLOCK flag was specified, and an incompatible lease was held on the file (see fcntl(2)). |
NOTES
Under Linux, the O_NONBLOCK flag indicates that one wants to open but does not necessarily have the intention to read or write. This is typically used to open devices in order to get a file descriptor for use with ioctl(2).
CONFORMING TO
SVr4, 4.3BSD, POSIX.1-2001. The O_NOATIME, O_NOFOLLOW, and O_DIRECTORY flags are Linux-specific. One may have to define the _GNU_SOURCE macro to get their definitions.
The (undefined) effect of O_RDONLY | O_TRUNC varies among implementations. On many systems the file is actually truncated.
The O_DIRECT flag was introduced in SGI IRIX, where it has alignment restrictions similar to those of Linux 2.4. IRIX also has a fcntl(2) call to query appropriate alignments and sizes.
FreeBSD 4.x introduced a flag of the same name, but without alignment restrictions. Support was added under Linux in kernel version 2.4.10. Older Linux kernels simply ignore this flag. One may have to define the _GNU_SOURCE macro to get its definition.
BUGS
"The thing that has always disturbed me about O_DIRECT is that the whole interface is just stupid, and was probably designed by a deranged monkey on some serious mind-controlling substances." — Linus
Currently, it is not possible to enable signal-driven I/O by specifying O_ASYNC when calling open(); use fcntl(2) to enable this flag.
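The fcntl(2) route mentioned above can be sketched as follows. This is a minimal illustration, not a complete signal-driven I/O setup: enable_async() is a hypothetical helper name, and the sketch assumes a glibc-style <fcntl.h> that exposes O_ASYNC.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* assumption: needed on some systems to expose O_ASYNC */
#endif
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: turn on O_ASYNC for a descriptor that is already open.
 * Returns 0 on success, -1 on error (errno is set by fcntl). */
int enable_async(int fd)
{
    int flags = fcntl(fd, F_GETFL);        /* read the current status flags */
    if (flags == -1)
        return -1;
    /* Write the flags back with O_ASYNC added, leaving the others untouched. */
    return fcntl(fd, F_SETFL, flags | O_ASYNC) == -1 ? -1 : 0;
}
```

A real program would normally also choose the process that receives the signal with fcntl(fd, F_SETOWN, getpid()); that step is left out here to keep the sketch short.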
Restrictions
There are many infelicities in the protocol underlying NFS, affecting amongst others O_SYNC and O_NDELAY.
POSIX provides for three different variants of synchronised I/O, corresponding to the flags O_SYNC, O_DSYNC and O_RSYNC. Currently (2.1.130) these are all synonymous under Linux.
See also
- close (2)
- dup (2)
- fcntl (2)
- link (2)
- lseek (2)
- mknod (2)
- mount (2)
- mmap (2)
- openat (2)
- path_resolution (2)
- read (2)
- socket (2)
- stat (2)
- umask (2)
- unlink (2)
- write (2)
dup2() function
dup, dup2 - duplicate a file descriptor
Synopsis
#include <unistd.h>
int dup(int oldfd);
int dup2(int oldfd, int newfd);
Description
dup() and dup2() create a copy of the file descriptor oldfd.
After a successful return from dup() or dup2(), the old and new file descriptors may be used interchangeably. They refer to the same open file description (see open(2)) and thus share file offset and file status flags; for example, if the file offset is modified by using lseek(2) on one of the descriptors, the offset is also changed for the other.
The two descriptors do not share file descriptor flags (the close-on-exec flag). The close-on-exec flag (FD_CLOEXEC; see fcntl(2)) for the duplicate descriptor is off.
dup() uses the lowest-numbered unused descriptor for the new descriptor.
dup2() makes newfd be the copy of oldfd, closing newfd first if necessary.
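The shared file offset described above can be observed with a short sketch. The helper name dup_shares_offset() and the /tmp path template are assumptions made for this illustration only.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a descriptor and its dup() copy share
 * one file offset, 0 if they do not, and -1 if a system call failed. */
int dup_shares_offset(void)
{
    char path[] = "/tmp/dup_demo_XXXXXX";   /* assumed writable temp location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                 /* the file goes away once both descriptors close */

    int copy = dup(fd);           /* lowest-numbered unused descriptor */
    if (copy == -1) {
        close(fd);
        return -1;
    }

    int result = -1;
    if (write(fd, "hello", 5) == 5) {
        /* The write moved the offset of the open file description,
         * so querying it through the copy reports the same position. */
        off_t seen = lseek(copy, 0, SEEK_CUR);
        if (seen != (off_t)-1)
            result = (seen == 5);
    }
    close(copy);
    close(fd);
    return result;
}
```

The same sharing applies to dup2(); for instance, dup2(fd, STDOUT_FILENO) would make standard output refer to the file opened on fd.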
Return value
dup() and dup2() return the new descriptor, or -1 if an error occurred (in which case, errno is set appropriately).
Errors
Tag | Description |
EBADF | oldfd isn’t an open file descriptor, or newfd is out of the allowed range for file descriptors. |
EBUSY | (Linux only) This may be returned by dup2() during a race condition with open() and dup(). |
EINTR | The dup2() call was interrupted by a signal. |
EMFILE | The process already has the maximum number of file descriptors open and tried to open a new one. |
Warnings
The error returned by dup2() is different from that returned by fcntl(..., F_DUPFD, ...) when newfd is out of range. On some systems dup2() also sometimes returns EINVAL like F_DUPFD.
If newfd was open, any errors that would have been reported at close() time are lost. A careful programmer will not use dup2() without closing newfd first.
Conforming to
SVr4, 4.3BSD, POSIX.1-2001.
epoll_create() function
epoll_create - open an epoll file descriptor
Synopsis
#include <sys/epoll.h>
int epoll_create(int size);
Description
Open an epoll file descriptor by requesting the kernel allocate an event backing store dimensioned for size descriptors. The size is not the maximum size of the backing store but just a hint to the kernel about how to dimension internal structures. The returned file descriptor will be used for all the subsequent calls to the epoll interface. The file descriptor returned by epoll_create(2) must be closed by using close(2).
Return value
When successful, epoll_create(2) returns a non-negative integer identifying the descriptor. When an error occurs, epoll_create(2) returns -1 and errno is set appropriately.
Errors
Error Code | Description |
EINVAL | size is not positive. |
ENFILE | The system limit on the total number of open files has been reached. |
ENOMEM | There was insufficient memory to create the kernel object. |
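Conforming to
epoll_create(2) is a new API introduced in Linux kernel 2.5.44. The interface should be finalized by Linux kernel 2.5.66.
epoll_ctl() function
epoll_ctl - control interface for an epoll descriptor
Synopsis
#include <sys/epoll.h>
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
Description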
Control an epoll descriptor, epfd, by requesting that the operation op be performed on the target file descriptor, fd. The event describes the object linked to the file descriptor fd. The struct epoll_event is defined as:
typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;
struct epoll_event {
    __uint32_t events; /* Epoll events */
    epoll_data_t data; /* User data variable */
};
The events member is a bit set composed using the following available event types:
Event | Description |
EPOLLIN | The associated file is available for read(2) operations. |
EPOLLOUT | The associated file is available for write(2) operations. |
EPOLLRDHUP | Stream socket peer closed connection, or shut down writing half of connection. (This flag is especially useful for writing simple code to detect peer shutdown when using Edge Triggered monitoring.) |
EPOLLPRI | There is urgent data available for read(2) operations. |
EPOLLERR | Error condition happened on the associated file descriptor. epoll_wait(2) will always wait for this event; it is not necessary to set it in events. |
EPOLLHUP | Hang up happened on the associated file descriptor. epoll_wait(2) will always wait for this event; it is not necessary to set it in events. |
EPOLLET | Sets the Edge Triggered behaviour for the associated file descriptor. The default behaviour for epoll is Level Triggered. See epoll(7) for more detailed information about Edge and Level Triggered event distribution architectures. |
EPOLLONESHOT (since kernel 2.6.2) | Sets the one-shot behaviour for the associated file descriptor. This means that after an event is pulled out with epoll_wait(2) the associated file descriptor is internally disabled and no other events will be reported by the epoll interface. The user must call epoll_ctl(2) with EPOLL_CTL_MOD to re-enable the file descriptor with a new event mask. |
The epoll interface supports all file descriptors that support poll(2). Valid values for the op parameter are:
Operation | Description |
EPOLL_CTL_ADD | Add the target file descriptor fd to the epoll descriptor epfd and associate the event event with it. |
EPOLL_CTL_MOD | Change the event event associated with the target file descriptor fd. |
EPOLL_CTL_DEL | Remove the target file descriptor fd from the epoll descriptor epfd. |
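As a rough illustration of the interface described above, the following sketch registers a descriptor for EPOLLIN and later removes it. The helper names watch_for_input() and stop_watching() are hypothetical and not part of the epoll API.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: ask epfd to report when fd becomes readable.
 * Returns 0 on success, -1 on error (errno is set by epoll_ctl). */
int watch_for_input(int epfd, int fd)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN;      /* level-triggered unless EPOLLET is added */
    ev.data.fd = fd;          /* user data, handed back by epoll_wait() */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: stop watching fd. A dummy event is passed instead of
 * NULL because kernels before 2.6.9 required a non-NULL pointer here. */
int stop_watching(int epfd, int fd)
{
    struct epoll_event unused = {0};
    return epoll_ctl(epfd, EPOLL_CTL_DEL, fd, &unused);
}
```

Registering the same descriptor twice fails with EEXIST, and removing a descriptor that was never registered fails with ENOENT, so callers that track registrations loosely may want to check errno.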
Return value
When successful, epoll_ctl(2) returns zero. When an error occurs, epoll_ctl(2) returns -1 and errno is set appropriately.
Errors
Error Code | Description |
EBADF | epfd or fd is not a valid file descriptor. |
EEXIST | op was EPOLL_CTL_ADD, and the supplied file descriptor fd is already in epfd. |
EINVAL | epfd is not an epoll file descriptor, or fd is the same as epfd, or the requested operation op is not supported by this interface. |
ENOENT | op was EPOLL_CTL_MOD or EPOLL_CTL_DEL, and fd is not in epfd. |
ENOMEM | There was insufficient memory to handle the requested op control operation. |
EPERM | The target file fd does not support epoll. |
Conforming to
epoll_ctl(2) is a new API introduced in Linux kernel 2.5.44. The interface should be finalized by Linux kernel 2.5.66.
Bugs
In kernel versions before 2.6.9, the EPOLL_CTL_DEL operation required a non-NULL pointer in event, even though this argument is ignored. Since kernel 2.6.9, event can be specified as NULL when using EPOLL_CTL_DEL.
epoll_wait() function
epoll_wait - wait for an I/O event on an epoll file descriptor
Synopsis
#include <sys/epoll.h>
int epoll_wait(int epfd, struct epoll_event *events,
               int maxevents, int timeout);
Description
Wait for events on the epoll file descriptor epfd for a maximum time of timeout milliseconds. The memory area pointed to by events will contain the events that will be available for the caller. Up to maxevents are returned by epoll_wait(2).
The maxevents parameter must be greater than zero. Specifying a timeout of -1 makes epoll_wait(2) wait indefinitely, while specifying a timeout equal to zero makes epoll_wait(2) return immediately even if no events are available (return code equal to zero).
struct epoll_event is defined as:
typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;
struct epoll_event {
    __uint32_t events; /* Epoll events */
    epoll_data_t data; /* User data variable */
};
The data of each returned structure will contain the same data the user set with epoll_ctl(2) (EPOLL_CTL_ADD, EPOLL_CTL_MOD), while the events member will contain the returned event bit field.
Return value
When successful, epoll_wait(2) returns the number of file descriptors ready for the requested I/O, or zero if no file descriptor became ready during the requested timeout milliseconds. When an error occurs, epoll_wait(2) returns -1 and errno is set appropriately.
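The three possible outcomes (ready descriptors, timeout, error) can be told apart as in the sketch below. wait_for_one() is a hypothetical helper, and it assumes the caller stored the descriptor number in data.fd when registering it with epoll_ctl(2).

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for a single ready descriptor.
 * Returns the descriptor stored in data.fd at registration time,
 * -1 if the timeout expired, or -2 if epoll_wait() failed. */
int wait_for_one(int epfd, int timeout_ms)
{
    struct epoll_event ready[1];
    int n = epoll_wait(epfd, ready, 1, timeout_ms);   /* maxevents must be > 0 */
    if (n < 0)
        return -2;            /* errno holds EBADF, EINTR, EINVAL, ... */
    if (n == 0)
        return -1;            /* nothing became ready in time */
    return ready[0].data.fd;  /* same user data that was set via epoll_ctl() */
}
```

A production loop would usually pass a larger array and retry when errno is EINTR; the single-event form is used here only to keep the return-value handling visible.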
Errors
Tag | Description |
EBADF | epfd is not a valid file descriptor. |
EFAULT | The memory area pointed to by events is not accessible with write permissions. |
EINTR | The call was interrupted by a signal handler before any of the requested events occurred or the timeout expired. |
EINVAL | epfd is not an epoll file descriptor, or maxevents is less than or equal to zero. |
Conforming to
epoll_wait(2) is a new API introduced in Linux kernel 2.5.44. The interface should be finalized by Linux kernel 2.5.66.
execve() function
execve - execute a program
Synopsis
#include <unistd.h>
int execve(const char *filename, char *const argv[],
           char *const envp[]);
Description
execve() executes the program pointed to by filename. filename must be either a binary executable, or a script starting with a line of the form "#! interpreter [arg]". In the latter case, the interpreter must be a valid pathname for an executable which is not itself a script, which will be invoked as interpreter [arg] filename.
argv is an array of argument strings passed to the new program. envp is an array of strings, conventionally of the form key=value, which are passed as environment to the new program. Both argv and envp must be terminated by a null pointer. The argument vector and environment can be accessed by the called program’s main function, when it is defined as int main(int argc, char *argv[], char *envp[]).
execve() does not return on success, and the text, data, bss, and stack of the calling process are overwritten by that of the program loaded. The program invoked inherits the calling process’s PID, and any open file descriptors that are not set to close-on-exec. Signals pending on the calling process are cleared. Any signals set to be caught by the calling process are reset to their default behaviour. The SIGCHLD signal (when set to SIG_IGN) may or may not be reset to SIG_DFL.
If the current program is being ptraced, a SIGTRAP is sent to it after a successful execve().
If the set-user-ID bit is set on the program file pointed to by filename, and the calling process is not being ptraced, then the effective user ID of the calling process is changed to that of the owner of the program file. Similarly, when the set-group-ID bit of the program file is set, the effective group ID of the calling process is set to the group of the program file.
The effective user ID of the process is copied to the saved set-user-ID; similarly, the effective group ID is copied to the saved set-group-ID. This copying takes place after any effective ID changes that occur because of the set-user-ID and set-group-ID permission bits.
If the executable is an a.out dynamically-linked binary executable containing shared-library stubs, the Linux dynamic linker ld.so(8) is called at the start of execution to bring needed shared libraries into memory and link the executable with them.
If the executable is a dynamically-linked ELF executable, the interpreter named in the PT_INTERP segment is used to load the needed shared libraries. This interpreter is typically /lib/ld-linux.so.1 for binaries linked with the Linux libc version 5, or /lib/ld-linux.so.2 for binaries linked with the GNU libc version 2.
Return value
On success, execve() does not return. On error, -1 is returned and errno is set appropriately.
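Because a successful execve() replaces the calling program, it is commonly paired with fork(2) and waitpid(2). The sketch below shows that pattern; run_and_wait() is a hypothetical helper, and the exit code 127 for a failed execve() is borrowed from shell convention as an assumption.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a program in a child process and return its exit
 * status. Returns 127 if execve() failed in the child, or -1 if fork() or
 * waitpid() failed or the child did not exit normally. */
int run_and_wait(const char *path, char *const argv[], char *const envp[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        execve(path, argv, envp);
        /* Reached only when execve() returned, which means it failed. */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Both argv and envp must end with a null pointer, for example { "sh", "-c", "exit 7", NULL } and { "PATH=/bin:/usr/bin", NULL }.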
Errors
Error Code | Description |
E2BIG | The total number of bytes in the environment (envp) and argument list (argv) is too large. |
EACCES | Search permission is denied on a component of the path prefix of filename or the name of a script interpreter. (See also path_resolution(2).) |
EACCES | The file or a script interpreter is not a regular file. |
EACCES | Execute permission is denied for the file or a script or ELF interpreter. |
EACCES | The file system is mounted noexec. |
EFAULT | filename points outside your accessible address space. |
EINVAL | An ELF executable had more than one PT_INTERP segment (i.e., tried to name more than one interpreter). |
EIO | An I/O error occurred. |
EISDIR | An ELF interpreter was a directory. |
ELIBBAD | An ELF interpreter was not in a recognised format. |
ELOOP | Too many symbolic links were encountered in resolving filename or the name of a script or ELF interpreter. |
EMFILE | The process has the maximum number of files open. |
ENAMETOOLONG | filename is too long. |
ENFILE | The system limit on the total number of open files has been reached. |
ENOENT | The file filename or a script or ELF interpreter does not exist, or a shared library needed for file or interpreter cannot be found. |
ENOEXEC | An executable is not in a recognised format, is for the wrong architecture, or has some other format error that means it cannot be executed. |
ENOMEM | Insufficient kernel memory was available. |
ENOTDIR | A component of the path prefix of filename or a script or ELF interpreter is not a directory. |
EPERM | The file system is mounted nosuid, the user is not the superuser, and the file has an SUID or SGID bit set. |
EPERM | The process is being traced, the user is not the superuser and the file has an SUID or SGID bit set. |
ETXTBSY | Executable was open for writing by one or more processes. |
Conforming to
SVr4, 4.3BSD, POSIX.1-2001. POSIX.1-2001 does not document the #! behaviour, but is otherwise compatible.
Notes
SUID and SGID processes can not be ptrace()d. Linux ignores the SUID and SGID bits on scripts.
The result of mounting a filesystem nosuid varies between Linux kernel versions: some will refuse execution of SUID/SGID executables when this would give the user powers she did not have already (and return EPERM), some will just ignore the SUID/SGID bits and exec() successfully.
A maximum line length of 127 characters is allowed for the first line in a #! executable shell script.
History
With Unix V6 the argument list of an exec() call was ended by 0, while the argument list of main was ended by -1. Thus, this argument list was not directly usable in a further exec() call. Since Unix V7 both are NULL.
exit_group() function
exit_group - exit all threads in a process
Synopsis
#include <linux/unistd.h>
void exit_group(int status);
Description
This system call is equivalent to exit(2) except that it terminates not only the present thread, but all threads in the current thread group.
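A minimal sketch of invoking this call follows. It assumes the C library offers no exit_group() wrapper function (glibc does not), so the call goes through syscall(2) with the SYS_exit_group number; exit_group_in_child() is a hypothetical helper that runs the call in a forked child so the caller survives.

```c
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that terminates through exit_group(status)
 * and return the exit status collected by the parent, or -1 on error. */
int exit_group_in_child(int status)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        syscall(SYS_exit_group, status);   /* ends every thread in the child */
        _exit(255);                        /* not reached on Linux */
    }

    int wstatus;
    if (waitpid(pid, &wstatus, 0) == -1)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

In a single-threaded child the effect is indistinguishable from _exit(2); the difference only shows when other threads of the same thread group are still running.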
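Return value
This system call does not return.
History
This call is present since Linux 2.5.35.
Conforming to
This call is Linux-specific.
_exit() function
_exit, _Exit - terminate the current process
Synopsis
#include <unistd.h>
void _exit(int status);
#include <stdlib.h>
void _Exit(int status);
Description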
The function _exit() terminates the calling process "immediately". Any open file descriptors belonging to the process are closed; any children of the process are inherited by process 1, init, and the process’s parent is sent a SIGCHLD signal.
The value status is returned to the parent process as the process’s exit status, and can be collected using one of the wait() family of calls.
The function _Exit() is equivalent to _exit().
Return value
These functions do not return.
Conforming to
SVr4, POSIX.1-2001, 4.3BSD. The function _Exit() was introduced by C99.
Notes
For a discussion on the effects of an exit, the transmission of exit status, zombie processes, signals sent, etc., see exit(3).
The function _exit() is like exit(), but does not call any functions registered with atexit() or on_exit(). Whether it flushes standard I/O buffers and removes temporary files created with tmpfile(3) is implementation dependent. On the other hand, _exit() does close open file descriptors, and this may cause an unknown delay, waiting for pending output to finish. If the delay is undesired, it may be useful to call functions like tcflush() before calling _exit(). Whether any pending I/O is cancelled, and which pending I/O may be cancelled upon _exit(), is implementation-dependent.
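The difference between exit() and _exit() with respect to atexit() handlers can be checked with a small experiment. In this sketch, which is an illustration rather than a recommended pattern, a forked child registers a handler that writes one byte to a pipe; all names (report_fd, report_exit, bytes_from_atexit_handler) are hypothetical.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;   /* write end of the pipe, set in the child only */

/* atexit() handler: leaves one byte in the pipe as evidence that it ran. */
static void report_exit(void)
{
    if (report_fd != -1) {
        ssize_t ignored = write(report_fd, "x", 1);
        (void)ignored;
    }
}

/* Hypothetical helper: fork a child that registers report_exit() and then
 * leaves through exit() (use_exit != 0) or _exit() (use_exit == 0).
 * Returns the number of bytes the handler wrote, or -1 on error. */
int bytes_from_atexit_handler(int use_exit)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        close(p[0]);
        report_fd = p[1];
        atexit(report_exit);
        if (use_exit)
            exit(0);         /* runs registered handlers first */
        _exit(0);            /* skips them */
    }

    close(p[1]);
    char buf[8];
    ssize_t n = read(p[0], buf, sizeof buf);   /* 0 means end of file */
    close(p[0]);
    waitpid(pid, NULL, 0);
    return (int)n;
}
```

With exit() the parent reads one byte; with _exit() it reads end of file, since the kernel closes the child's pipe end without the handler ever running.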
faccessat() function
faccessat - check user's permissions for a file relative to a directory file descriptor
Synopsis
#include <unistd.h>
int faccessat(int dirfd, const char *path, int mode, int flags);
Description
The faccessat() system call operates in exactly the same way as access(2), except for the differences described in this manual page.
If the pathname given in path is relative, then it is interpreted relative to the directory referred to by the file descriptor dirfd (rather than relative to the current working directory of the calling process, as is done by access(2) for a relative pathname).
If the pathname given in path is relative and dirfd is the special value AT_FDCWD, then path is interpreted relative to the current working directory of the calling process (like access(2)).
If the pathname given in path is absolute, then dirfd is ignored.
flags is constructed by ORing together zero or more of the following values:
Flag | Description |
AT_EACCESS | Perform access checks using the effective user and group IDs. By default, faccessat() uses the real IDs (like access(2)). |
AT_SYMLINK_NOFOLLOW | If path is a symbolic link, do not dereference it: instead return information about the link itself. |
Return value
On success, faccessat() returns 0. On error, -1 is returned and errno is set to indicate the error.
Errors
The same errors that occur for access(2) can also occur for faccessat(). The following additional errors can occur for faccessat():
Tag | Description |
EBADF | dirfd is not a valid file descriptor. |
EINVAL | Invalid flag specified in flags. |
ENOTDIR | path is a relative path and dirfd is a file descriptor referring to a file other than a directory. |
Notes
See openat(2) for an explanation of the need for faccessat().
Conforming to
This system call is non-standard but is proposed for inclusion in a future revision of POSIX.1.
glibc notes
The AT_EACCESS and AT_SYMLINK_NOFOLLOW flags are actually implemented within the glibc wrapper function for faccessat(). If either of these flags are specified, then the wrapper function employs fstatat(2) to determine access permissions.
Versions
faccessat() was added to Linux in kernel 2.6.16.
fattach() function
afs_syscall, break, fattach, fdetach, ftime, getmsg, getpmsg, gtty, isastream, lock, mpx, multiplexer, prof, profil, putmsg, putpmsg, security, stty, ulimit, vserver - unimplemented system calls
Synopsis
Unimplemented system calls.
Description
These system calls are not implemented in the Linux 2.4 kernel.
Return value
These system calls always return -1 and set errno to ENOSYS.
Notes
Note that ftime(3), profil(3) and ulimit(3) are implemented as library functions.
Some system calls, like alloc_hugepages(2), free_hugepages(2), ioperm(2), iopl(2), and vm86(2) only exist on certain architectures.
Some system calls, like ipc(2), create_module(2), init_module(2), and delete_module(2) only exist when the Linux kernel was built with support for them.
fchdir() function
chdir, fchdir - change working directory
Synopsis
#include <unistd.h>
int chdir(const char *path);
int fchdir(int fd);
Description
chdir() changes the current working directory to that specified in path. fchdir() is identical to chdir(); the only difference is that the directory is given as an open file descriptor.
Return value
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
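One common use of fchdir() is returning to a starting directory after a temporary chdir(). The sketch below illustrates that idea; visit_and_return() is a hypothetical helper, and it assumes the current directory can be opened for reading.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: enter `dir`, record the working directory seen there
 * in `seen`, then return to the original directory through a saved
 * descriptor. Returns 0 on success, -1 on error. */
int visit_and_return(const char *dir, char *seen, size_t len)
{
    int saved = open(".", O_RDONLY);   /* remember where we started */
    if (saved == -1)
        return -1;

    int rc = -1;
    if (chdir(dir) == 0) {
        if (getcwd(seen, len) != NULL)
            rc = 0;
        if (fchdir(saved) == -1)       /* go back even if getcwd() failed */
            rc = -1;
    }
    close(saved);
    return rc;
}
```

Saving a descriptor rather than a path string keeps working even if the original directory is renamed while the process is elsewhere.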
Errors
Depending on the file system, other errors can be returned. The more general errors for chdir() are listed below:
Error Code | Description |
EACCES | Search permission is denied for one of the directories in the path prefix of path. (See also path_resolution(2).) |
EFAULT | path points outside your accessible address space. |
EIO | An I/O error occurred. |
ELOOP | Too many symbolic links were encountered in resolving path. |
ENAMETOOLONG | path is too long. |
ENOENT | The file does not exist. |
ENOMEM | Insufficient kernel memory was available. |
ENOTDIR | A component of path is not a directory. |
The general errors for fchdir() are listed below:
EACCES | Search permission was denied on the directory open on fd. |
EBADF | fd is not a valid file descriptor. |
Notes
A child process created via fork(2) inherits its parent’s current working directory. The current working directory is left unchanged by execve(2).
The prototype for fchdir() is only available if _BSD_SOURCE is defined, or _XOPEN_SOURCE is defined with the value 500.
Conforming to
SVr4, 4.4BSD, POSIX.1-2001.
fchmodat() function
fchmodat - change permissions of a file relative to a directory file descriptor
Synopsis
#include <sys/stat.h>
int fchmodat(int dirfd, const char *path, mode_t mode, int flags);
Description
The fchmodat() system call operates in exactly the same way as chmod(2), except for the differences described in this manual page.
If the pathname given in path is relative, then it is interpreted relative to the directory referred to by the file descriptor dirfd (rather than relative to the current working directory of the calling process, as is done by chmod(2) for a relative pathname).
If the pathname given in path is relative and dirfd is the special value AT_FDCWD, then path is interpreted relative to the current working directory of the calling process (like chmod(2)).
If the pathname given in path is absolute, then dirfd is ignored.
flags can either be 0, or include the following flag:
Flag | Description |
AT_SYMLINK_NOFOLLOW | If path is a symbolic link, do not dereference it: instead operate on the link itself. This flag is not currently implemented. |
Return value
On success, fchmodat() returns 0. On error, -1 is returned and errno is set to indicate the error.
Errors
The same errors that occur for chmod(2) can also occur for fchmodat(). The following additional errors can occur for fchmodat():
Tag | Description |
EBADF | dirfd is not a valid file descriptor. |
EINVAL | Invalid flag specified in flags. |
ENOTDIR | path is a relative path and dirfd is a file descriptor referring to a file other than a directory. |
ENOTSUP | flags specified AT_SYMLINK_NOFOLLOW, which is not supported. |
Notes
See openat(2) for an explanation of the need for fchmodat().
Conforming to
This system call is non-standard but is proposed for inclusion in a future revision of POSIX.1.
Versions
fchmodat() was added to Linux in kernel 2.6.16.
fchmod() function
chmod, fchmod - change permissions of a file
Synopsis
#include <sys/types.h>
#include <sys/stat.h>
int chmod(const char *path, mode_t mode);
int fchmod(int fildes, mode_t mode);
Description
The mode of the file given by path or referenced by fildes is changed.
Modes are specified by OR-ing the following:
Mode | Description |
S_ISUID | 04000 set user ID on execution |
S_ISGID | 02000 set group ID on execution |
S_ISVTX | 01000 sticky bit |
S_IRUSR | 00400 read by owner |
S_IWUSR | 00200 write by owner |
S_IXUSR | 00100 execute/search by owner |
S_IRGRP | 00040 read by group |
S_IWGRP | 00020 write by group |
S_IXGRP | 00010 execute/search by group |
S_IROTH | 00004 read by others |
S_IWOTH | 00002 write by others |
S_IXOTH | 00001 execute/search by others |
The effective UID of the calling process must match the owner of the file, or the process must be privileged (Linux: it must have the CAP_FOWNER capability).
If the calling process is not privileged (Linux: does not have the CAP_FSETID capability), and the group of the file does not match the effective group ID of the process or one of its supplementary group IDs, the S_ISGID bit will be turned off, but this will not cause an error to be returned.
As a security measure, depending on the file system, the set-user-ID and set-group-ID execution bits may be turned off if a file is written. (On Linux this occurs if the writing process does not have the CAP_FSETID capability.) On some file systems, only the superuser can set the sticky bit, which may have a special meaning. For the sticky bit, and for set-user-ID and set-group-ID bits on directories, see stat(2).
On NFS file systems, restricting the permissions will immediately influence already open files, because the access control is done on the server, but open files are maintained by the client. Widening the permissions may be delayed for other clients if attribute caching is enabled on them.
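Combining the mode bits listed above can be sketched as follows. chmod_demo() is a hypothetical helper and the /tmp path template is an assumption; the sketch sets owner read/write plus group read, which corresponds to octal 0640.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a scratch file, set its mode to owner
 * read/write plus group read, and return the permission bits that fstat()
 * reports afterwards, or (mode_t)-1 on error. */
mode_t chmod_demo(void)
{
    char path[] = "/tmp/fchmod_demo_XXXXXX";   /* assumed writable temp location */
    int fd = mkstemp(path);
    if (fd == -1)
        return (mode_t)-1;
    unlink(path);                /* the descriptor keeps the file alive */

    mode_t result = (mode_t)-1;
    struct stat st;
    /* 00400 | 00200 | 00040 == 0640 */
    if (fchmod(fd, S_IRUSR | S_IWUSR | S_IRGRP) == 0 && fstat(fd, &st) == 0)
        result = st.st_mode & 07777;           /* strip the file-type bits */
    close(fd);
    return result;
}
```

Using the symbolic constants instead of a bare octal literal keeps the intent readable, though both forms denote the same value.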
Return value
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
Errors
Depending on the file system, other errors can be returned. The more general errors for chmod() are listed below:
Error Code | Description |
EACCES | Search permission is denied on a component of the path prefix. (See also path_resolution(2).) |
EFAULT | path points outside your accessible address space. |
EIO | An I/O error occurred. |
ELOOP | Too many symbolic links were encountered in resolving path. |
ENAMETOOLONG | path is too long. |
ENOENT | The file does not exist. |
ENOMEM | Insufficient kernel memory was available. |
ENOTDIR | A component of the path prefix is not a directory. |
EPERM | The effective UID does not match the owner of the file, and the process is not privileged (Linux: it does not have the CAP_FOWNER capability). |
EROFS | The named file resides on a read-only file system. |
The general errors for fchmod() are listed below:
EBADF | The file descriptor fildes is not valid. |
EIO | See above. |
EPERM | See above. |
EROFS | See above. |
Conforming to
4.4BSD, SVr4, POSIX.1-2001.
fchownat() function
fchownat - change ownership of a file relative to a directory file descriptor
Synopsis
#include <unistd.h>
int fchownat(int dirfd, const char *path,
             uid_t owner, gid_t group, int flags);
Description
The fchownat() system call operates in exactly the same way as chown(2), except for the differences described in this manual page.
If the pathname given in path is relative, then it is interpreted relative to the directory referred to by the file descriptor dirfd (rather than relative to the current working directory of the calling process, as is done by chown(2) for a relative pathname).
If the pathname given in path is relative and dirfd is the special value AT_FDCWD, then path is interpreted relative to the current working directory of the calling process (like chown(2)).
If the pathname given in path is absolute, then dirfd is ignored.
flags can either be 0, or include the following flag:
Flag | Description |
AT_SYMLINK_NOFOLLOW | If path is a symbolic link, do not dereference it: instead operate on the link itself, like lchown(2). (By default, fchownat() dereferences symbolic links, like chown(2).) |
Return value
On success, fchownat() returns 0. On error, -1 is returned and errno is set to indicate the error.
Errors
The same errors that occur for chown(2) can also occur for fchownat(). The following additional errors can occur for fchownat():
Tag | Description |
EBADF | dirfd is not a valid file descriptor. |
EINVAL | Invalid flag specified in flags. |
ENOTDIR | path is a relative path and dirfd is a file descriptor referring to a file other than a directory. |
Notes
See openat(2) for an explanation of the need for fchownat().
Conforming to
This system call is non-standard but is proposed for inclusion in a future revision of POSIX.1. A similar system call exists on Solaris.
Versions
fchownat() was added to Linux in kernel 2.6.16.
fchown() function
chown, fchown, lchown - change ownership of a file
Synopsis
#include <sys/types.h>
#include <unistd.h>
int chown(const char *path, uid_t owner, gid_t group);
int fchown(int fd, uid_t owner, gid_t group);
int lchown(const char *path, uid_t owner, gid_t group);
Description
These system calls change the owner and group of the file specified by path or by fd. Only a privileged process (Linux: one with the CAP_CHOWN capability) may change the owner of a file. The owner of a file may change the group of the file to any group of which that owner is a member. A privileged process (Linux: with CAP_CHOWN) may change the group arbitrarily.
If the owner or group is specified as -1, then that ID is not changed.
When the owner or group of an executable file is changed by a non-superuser, the S_ISUID and S_ISGID mode bits are cleared. POSIX does not specify whether this also should happen when root does the chown(); the Linux behaviour depends on the kernel version. In case of a non-group-executable file (with clear S_IXGRP bit) the S_ISGID bit indicates mandatory locking, and is not cleared by a chown().
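The rule that an ID given as -1 is left unchanged can be illustrated with fchown(). In this sketch chgrp_keeping_owner() is a hypothetical helper and the /tmp path template is an assumption; it sets only the group, to the caller's effective group ID, which an unprivileged file owner is normally permitted to do.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a scratch file and change only its group.
 * Returns 1 if the owner stayed the same and the group now equals the
 * caller's effective group ID, 0 if not, and -1 if a system call failed. */
int chgrp_keeping_owner(void)
{
    char path[] = "/tmp/fchown_demo_XXXXXX";   /* assumed writable temp location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);

    int result = -1;
    struct stat before, after;
    if (fstat(fd, &before) == 0 &&
        fchown(fd, (uid_t)-1, getegid()) == 0 &&   /* -1: leave the owner alone */
        fstat(fd, &after) == 0)
        result = (after.st_uid == before.st_uid &&
                  after.st_gid == getegid());
    close(fd);
    return result;
}
```

Passing (gid_t)-1 as the group and a real user ID as the owner works the same way in the other direction, but changing the owner generally requires privilege.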
Return value
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
Errors
Depending on the file system, other errors can be returned. The more general errors for chown() are listed below.
Tag | Description |
EACCES | Search permission is denied on a component of the path prefix. (See also path_resolution(2).) |
EFAULT | path points outside your accessible address space. |
ELOOP | Too many symbolic links were encountered in resolving path. |
ENAMETOOLONG | path is too long. |
ENOENT | The file does not exist. |
ENOMEM | Insufficient kernel memory was available. |
ENOTDIR | A component of the path prefix is not a directory. |
EPERM | The calling process did not have the required permissions (see above) to change owner and/or group. |
EROFS | The named file resides on a read-only file system. |
The general errors for fchown() are listed below:
EBADF | The descriptor is not valid. |
EIO | A low-level I/O error occurred while modifying the inode. |
ENOENT | See above. |
EPERM | See above. |
EROFS | See above. |
Notes
In versions of Linux prior to 2.1.81 (and distinct from 2.1.46), chown() did not follow symbolic links. Since Linux 2.1.81, chown() does follow symbolic links, and there is a new system call lchown() that does not follow symbolic links. Since Linux 2.1.86, this new call (that has the same semantics as the old chown()) has got the same syscall number, and chown() got the newly introduced number.
The prototype for fchown() is only available if _BSD_SOURCE is defined.
CONFORMING TO
4.4BSD, SVr4, POSIX.1-2001.
The 4.4BSD version can only be used by the superuser (that is, ordinary users cannot give away files).
RESTRICTIONS
The chown() semantics are deliberately violated on NFS file systems which have UID mapping enabled. Additionally, the semantics of all system calls which access the file contents are violated, because chown() may cause immediate access revocation on already open files. Client side caching may lead to a delay between the time where ownership has been changed to allow access for a user and the time where the file can actually be accessed by the user on other clients.
SEE ALSO
fcntl() function
fcntl - manipulate file descriptor
SYNOPSIS
#include <unistd.h>
#include <fcntl.h>
int fcntl(int fd, int cmd);
int fcntl(int fd, int cmd, long arg);
int fcntl(int fd, int cmd, struct flock *lock);
DESCRIPTION
fcntl() performs one of the operations described below on the open file descriptor fd. The operation is determined by cmd.
Duplicating a file descriptor
Tag | Description |
F_DUPFD | Find the lowest numbered available file descriptor greater than or equal to arg and make it be a copy of fd. This is different from dup2(2) which uses exactly the descriptor specified. On success, the new descriptor is returned. See dup(2) for further details. |
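As a minimal sketch of F_DUPFD, the hypothetical helper dup_at_least() below asks for a copy of a descriptor at or above a chosen floor. The helper names, the floor of 20, and the use of /dev/null are assumptions made for this example.

```c
#include <fcntl.h>
#include <unistd.h>

/* Duplicate `fd` onto the lowest free descriptor that is >= `min_fd`.
 * Unlike dup2(), the exact number is picked by the kernel.
 * Returns the new descriptor, or -1 with errno set. */
int dup_at_least(int fd, int min_fd)
{
    return fcntl(fd, F_DUPFD, min_fd);
}

/* Hypothetical self-check: returns 0 if the duplicate landed at or
 * above the requested floor, -1 otherwise. */
int dupfd_demo(void)
{
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1;

    int copy = dup_at_least(fd, 20);
    int ok = copy >= 20;

    if (copy >= 0)
        close(copy);
    close(fd);
    return ok ? 0 : -1;
}
```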
File descriptor flags
The following commands manipulate the flags associated with a file descriptor. Currently, only one such flag is defined: FD_CLOEXEC, the close-on-exec flag. If the FD_CLOEXEC bit is 0, the file descriptor will remain open across an execve(2), otherwise it will be closed.
Tag | Description |
F_GETFD | Read the file descriptor flags. |
F_SETFD | Set the file descriptor flags to the value specified by arg. |
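A common pattern is to read the current descriptor flags with F_GETFD and write them back with FD_CLOEXEC added, so that any other flag bits are preserved. The sketch below assumes hypothetical helper names set_cloexec() and cloexec_demo().

```c
#include <fcntl.h>
#include <unistd.h>

/* Turn on the close-on-exec flag for `fd`, keeping other flag bits.
 * Returns 0 on success, -1 with errno set on failure. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Hypothetical self-check: a descriptor from a plain open() starts
 * without FD_CLOEXEC and has it after set_cloexec(). Returns 0 or -1. */
int cloexec_demo(void)
{
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1;

    int ok = (fcntl(fd, F_GETFD) & FD_CLOEXEC) == 0
          && set_cloexec(fd) == 0
          && (fcntl(fd, F_GETFD) & FD_CLOEXEC) != 0;

    close(fd);
    return ok ? 0 : -1;
}
```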
File status flags
Each open file description has certain associated status flags, initialized by open(2) and possibly modified by fcntl(2). Duplicated file descriptors (made with dup(), fcntl(F_DUPFD), fork(), etc.) refer to the same open file description, and thus share the same file status flags.
The file status flags and their semantics are described in open(2).
Tag | Description |
F_GETFL | Read the file status flags. |
F_SETFL | Set the file status flags to the value specified by arg. File access mode (O_RDONLY, O_WRONLY, O_RDWR) and file creation flags (i.e., O_CREAT, O_EXCL, O_NOCTTY, O_TRUNC) in arg are ignored. On Linux this command can only change the O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME, and O_NONBLOCK flags. |
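The usual way to change one status flag is a read-modify-write with F_GETFL and F_SETFL. The following sketch, with hypothetical helper names set_nonblocking() and nonblock_demo(), switches a pipe's read end to non-blocking mode and checks that reading the empty pipe then fails with EAGAIN instead of blocking.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Add O_NONBLOCK to the status flags of `fd`, keeping the others.
 * Returns 0 on success, -1 with errno set on failure. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical self-check: an empty pipe whose read end is
 * non-blocking makes read() fail with EAGAIN. Returns 0 or -1. */
int nonblock_demo(void)
{
    int p[2];
    char c;

    if (pipe(p) != 0)
        return -1;

    int ok = set_nonblocking(p[0]) == 0
          && read(p[0], &c, 1) == -1
          && errno == EAGAIN;

    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```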
Advisory locking
F_GETLK, F_SETLK and F_SETLKW are used to acquire, release, and test for the existence of record locks (also known as file-segment or file-region locks). The third argument lock is a pointer to a structure that has at least the following fields (in unspecified order).
struct flock {
    ...
    short l_type;    /* Type of lock: F_RDLCK,
                        F_WRLCK, F_UNLCK */
    short l_whence;  /* How to interpret l_start:
                        SEEK_SET, SEEK_CUR, SEEK_END */
    off_t l_start;   /* Starting offset for lock */
    off_t l_len;     /* Number of bytes to lock */
    pid_t l_pid;     /* PID of process blocking our lock
                        (F_GETLK only) */
    ...
};
The l_whence, l_start, and l_len fields of this structure specify the range of bytes we wish to lock. l_start is the starting offset for the lock, and is interpreted relative to either: the start of the file (if l_whence is SEEK_SET); the current file offset (if l_whence is SEEK_CUR); or the end of the file (if l_whence is SEEK_END). In the final two cases, l_start can be a negative number provided the offset does not lie before the start of the file. l_len is a non-negative integer (but see the NOTES below) specifying the number of bytes to be locked. Bytes past the end of the file may be locked, but not bytes before the start of the file. Specifying 0 for l_len has the special meaning: lock all bytes starting at the location specified by l_whence and l_start through to the end of file, no matter how large the file grows.
The l_type field can be used to place a read (F_RDLCK) or a write (F_WRLCK) lock on a file. Any number of processes may hold a read lock (shared lock) on a file region, but only one process may hold a write lock (exclusive lock). An exclusive lock excludes all other locks, both shared and exclusive. A single process can hold only one type of lock on a file region; if a new lock is applied to an already-locked region, then the existing lock is converted to the new lock type. (Such conversions may involve splitting, shrinking, or coalescing with an existing lock if the byte range specified by the new lock does not precisely coincide with the range of the existing lock.)
Tag | Description |
F_SETLK | Acquire a lock (when l_type is F_RDLCK or F_WRLCK) or release a lock (when l_type is F_UNLCK) on the bytes specified by the l_whence, l_start, and l_len fields of lock. If a conflicting lock is held by another process, this call returns -1 and sets errno to EACCES or EAGAIN. |
F_SETLKW | As for F_SETLK, but if a conflicting lock is held on the file, then wait for that lock to be released. If a signal is caught while waiting, then the call is interrupted and (after the signal handler has returned) returns immediately (with return value -1 and errno set to EINTR). |
F_GETLK | On input to this call, lock describes a lock we would like to place on the file. If the lock could be placed, fcntl() does not actually place it, but returns F_UNLCK in the l_type field of lock and leaves the other fields of the structure unchanged. If one or more incompatible locks would prevent this lock being placed, then fcntl() returns details about one of these locks in the l_type, l_whence, l_start, and l_len fields of lock and sets l_pid to be the PID of the process holding that lock. |
In order to place a read lock, fd must be open for reading. In order to place a write lock, fd must be open for writing. To place both types of lock, open a file read-write.
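The three locking commands can be sketched as follows. The helpers set_region_lock(), region_is_lockable() and record_lock_demo() are hypothetical names introduced for this example, and the scratch file under /tmp is an assumption. Because a process never conflicts with its own record locks, the self-check only exercises the single-process path; observing a real conflict would need a second process.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Apply `type` (F_RDLCK, F_WRLCK or F_UNLCK) to `len` bytes starting at
 * absolute offset `start`, without waiting. len == 0 means "through to
 * end of file". Returns 0 on success, -1 with errno set on failure. */
int set_region_lock(int fd, short type, off_t start, off_t len)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLK, &fl);
}

/* Ask with F_GETLK whether a write lock could be placed on the region.
 * Returns 1 if it could, 0 if another process holds a conflicting lock
 * (its PID is stored in *holder when holder is non-NULL), -1 on error. */
int region_is_lockable(int fd, off_t start, off_t len, pid_t *holder)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;
    if (fl.l_type == F_UNLCK)
        return 1;
    if (holder != NULL)
        *holder = fl.l_pid;
    return 0;
}

/* Hypothetical self-check: lock bytes 0..9 of a scratch file, confirm
 * our own lock does not block us, then release it. Returns 0 or -1. */
int record_lock_demo(void)
{
    char path[] = "/tmp/fcntl_lock_demo_XXXXXX";
    int fd = mkstemp(path);   /* opened read-write */
    if (fd < 0)
        return -1;

    int ok = set_region_lock(fd, F_WRLCK, 0, 10) == 0
          && region_is_lockable(fd, 0, 10, NULL) == 1
          && set_region_lock(fd, F_UNLCK, 0, 10) == 0;

    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```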
As well as being removed by an explicit F_UNLCK, record locks are automatically released when the process terminates or if it closes any file descriptor referring to a file on which locks are held. This is bad: it means that a process can lose the locks on a file like /etc/passwd or /etc/mtab when for some reason a library function decides to open, read and close it.
Record locks are not inherited by a child created via fork(2), but are preserved across an execve(2).
Because of the buffering performed by the stdio(3) library, the use of record locking with routines in that package should be avoided; use read(2) and write(2) instead.
Mandatory locking
(Non-POSIX.) The above record locks may be either advisory or mandatory, and are advisory by default.
Advisory locks are not enforced and are useful only between cooperating processes.
Mandatory locks are enforced for all processes. If a process tries to perform an incompatible access (e.g., read(2) or write(2)) on a file region that has an incompatible mandatory lock, then the result depends upon whether the O_NONBLOCK flag is enabled for its open file description. If the O_NONBLOCK flag is not enabled, then the system call is blocked until the lock is removed or converted to a mode that is compatible with the access. If the O_NONBLOCK flag is enabled, then the system call fails with the error EAGAIN or EWOULDBLOCK.
To make use of mandatory locks, mandatory locking must be enabled both on the file system that contains the file to be locked, and on the file itself. Mandatory locking is enabled on a file system using the "-o mand" option to mount(8), or the MS_MANDLOCK flag for mount(2). Mandatory locking is enabled on a file by disabling group execute permission on the file and enabling the set-group-ID permission bit (see chmod(1) and chmod(2)).
Managing signals
F_GETOWN, F_SETOWN, F_GETSIG and F_SETSIG are used to manage I/O availability signals:
Tag | Description |
F_GETOWN | Get the process ID or process group currently receiving SIGIO and SIGURG signals for events on file descriptor fd. Process IDs are returned as positive values; process group IDs are returned as negative values (but see BUGS below). |
F_SETOWN | Set the process ID or process group ID that will receive SIGIO and SIGURG signals for events on file descriptor fd. A process ID is specified as a positive value; a process group ID is specified as a negative value. Most commonly, the calling process specifies itself as the owner (that is, arg is specified as getpid()). If you set the O_ASYNC status flag on a file descriptor (either by providing this flag with the open(2) call, or by using the F_SETFL command of fcntl()), a SIGIO signal is sent whenever input or output becomes possible on that file descriptor. F_SETSIG can be used to obtain delivery of a signal other than SIGIO. Sending a signal to the owner process (group) specified by F_SETOWN is subject to the same permissions checks as are described for kill(2), where the sending process is the one that employs F_SETOWN (but see BUGS below). If this permission check fails, then the signal is silently discarded. If the file descriptor fd refers to a socket, F_SETOWN also selects the recipient of SIGURG signals that are delivered when out-of-band data arrives on that socket. (SIGURG is sent in any situation where select(2) would report the socket as having an "exceptional condition".) If a non-zero value is given to F_SETSIG in a multi-threaded process running with a threading library that supports thread groups (e.g., NPTL), then a positive value given to F_SETOWN has a different meaning: instead of being a process ID identifying a whole process, it is a thread ID identifying a specific thread within a process. Consequently, it may be necessary to pass F_SETOWN the result of gettid() instead of getpid() to get sensible results when F_SETSIG is used. (In current Linux threading implementations, a main thread’s thread ID is the same as its process ID. This means that a single-threaded program can equally use gettid() or getpid() in this scenario.) Note, however, that the statements in this paragraph do not apply to the SIGURG signal generated for out-of-band data on a socket: this signal is always sent to either a process or a process group, depending on the value given to F_SETOWN. Note also that Linux imposes a limit on the number of real-time signals that may be queued to a process (see getrlimit(2) and signal(7)) and if this limit is reached, then the kernel reverts to delivering SIGIO, and this signal is delivered to the entire process rather than to a specific thread. |
F_GETSIG | Get the signal sent when input or output becomes possible. A value of zero means SIGIO is sent. Any other value (including SIGIO) is the signal sent instead, and in this case additional info is available to the signal handler if installed with SA_SIGINFO. |
F_SETSIG | Sets the signal sent when input or output becomes possible. A value of zero means to send the default SIGIO signal. Any other value (including SIGIO) is the signal to send instead, and in this case additional info is available to the signal handler if installed with SA_SIGINFO. Additionally, passing a non-zero value to F_SETSIG changes the signal recipient from a whole process to a specific thread within a process. See the description of F_SETOWN for more details. By using F_SETSIG with a non-zero value, and setting SA_SIGINFO for the signal handler (see sigaction(2)), extra information about I/O events is passed to the handler in a siginfo_t structure. If the si_code field indicates the source is SI_SIGIO, the si_fd field gives the file descriptor associated with the event. Otherwise, there is no indication which file descriptors are pending, and you should use the usual mechanisms (select(2), poll(2), read(2) with O_NONBLOCK set etc.) to determine which file descriptors are available for I/O. By selecting a real time signal (value >= SIGRTMIN), multiple I/O events may be queued using the same signal numbers. (Queuing is dependent on available memory.) Extra information is available if SA_SIGINFO is set for the signal handler, as above. |
Using these mechanisms, a program can implement fully asynchronous I/O without using select(2) or poll(2) most of the time.
The use of O_ASYNC, F_GETOWN, F_SETOWN is specific to BSD and Linux. F_GETSIG and F_SETSIG are Linux-specific. POSIX has asynchronous I/O and the aio_sigevent structure to achieve similar things; these are also available in Linux as part of the GNU C Library (Glibc).
Leases
F_SETLEASE and F_GETLEASE (Linux 2.4 onwards) are used (respectively) to establish and retrieve the current setting of the calling process’s lease on the file referred to by fd. A file lease provides a mechanism whereby the process holding the lease (the "lease holder") is notified (via delivery of a signal) when a process (the "lease breaker") tries to open(2) or truncate(2) that file.
Tag | Description |
F_SETLEASE | Set or remove a file lease according to which of the following values is specified in the integer arg: F_RDLCK (take out a read lease), F_WRLCK (take out a write lease), or F_UNLCK (remove our lease from the file). A process may hold only one type of lease on a file. Leases may only be taken out on regular files. An unprivileged process may only take out a lease on a file whose UID matches the file system UID of the process. A process with the CAP_LEASE capability may take out leases on arbitrary files. |
F_GETLEASE | Indicates what type of lease we hold on the file referred to by fd by returning either F_RDLCK, F_WRLCK, or F_UNLCK, indicating, respectively, that the calling process holds a read, a write, or no lease on the file. (The third argument to fcntl() is omitted.) |
When a process (the "lease breaker") performs an open() or truncate() that conflicts with a lease established via F_SETLEASE, the system call is blocked by the kernel and the kernel notifies the lease holder by sending it a signal (SIGIO by default). The lease holder should respond to receipt of this signal by doing whatever cleanup is required in preparation for the file to be accessed by another process (e.g., flushing cached buffers) and then either remove or downgrade its lease. A lease is removed by performing an F_SETLEASE command specifying arg as F_UNLCK. If we currently hold a write lease on the file, and the lease breaker is opening the file for reading, then it is sufficient to downgrade the lease to a read lease. This is done by performing an F_SETLEASE command specifying arg as F_RDLCK.
If the lease holder fails to downgrade or remove the lease within the number of seconds specified in /proc/sys/fs/lease-break-time then the kernel forcibly removes or downgrades the lease holder’s lease.
Once the lease has been voluntarily or forcibly removed or downgraded, and assuming the lease breaker has not unblocked its system call, the kernel permits the lease breaker’s system call to proceed.
If the lease breaker’s blocked open() or truncate() is interrupted by a signal handler, then the system call fails with the error EINTR, but the other steps still occur as described above. If the lease breaker is killed by a signal while blocked in open() or truncate(), then the other steps still occur as described above. If the lease breaker specifies the O_NONBLOCK flag when calling open(), then the call immediately fails with the error EWOULDBLOCK, but the other steps still occur as described above.
The default signal used to notify the lease holder is SIGIO, but this can be changed using the F_SETSIG command to fcntl(). If an F_SETSIG command is performed (even one specifying SIGIO), and the signal handler is established using SA_SIGINFO, then the handler will receive a siginfo_t structure as its second argument, and the si_fd field of this argument will hold the descriptor of the leased file that has been accessed by another process. (This is useful if the caller holds leases against multiple files.)
File and directory change notification
Tag | Description |
F_NOTIFY | (Linux 2.4 onwards) Provide notification when the directory referred to by fd or any of the files that it contains is changed. The events to be notified are specified in arg, which is a bit mask specified by ORing together zero or more of the DN_* event bits. (In order to obtain these definitions, the _GNU_SOURCE feature test macro must be defined.) Directory notifications are normally "one-shot", and the application must re-register to receive further notifications. Alternatively, if DN_MULTISHOT is included in arg, then notification will remain in effect until explicitly removed. A series of F_NOTIFY requests is cumulative, with the events in arg being added to the set already monitored. To disable notification of all events, make an F_NOTIFY call specifying arg as 0. Notification occurs via delivery of a signal. The default signal is SIGIO, but this can be changed using the F_SETSIG command to fcntl(). In the latter case, the signal handler receives a siginfo_t structure as its second argument (if the handler was established using SA_SIGINFO) and the si_fd field of this structure contains the file descriptor which generated the notification (useful when establishing notification on multiple directories). Especially when using DN_MULTISHOT, a real time signal should be used for notification, so that multiple notifications can be queued. NOTE: New applications should consider using the inotify interface (available since kernel 2.6.13), which provides a superior interface for obtaining notifications of file system events. See inotify(7). |
RETURN VALUE
For a successful call, the return value depends on the operation:
Tag | Description |
F_DUPFD | The new descriptor. |
F_GETFD | Value of flags. |
F_GETFL | Value of flags. |
F_GETOWN | Value of descriptor owner. |
F_GETSIG | Value of signal sent when read or write becomes possible, or zero for traditional SIGIO behaviour. |
All other commands | Zero. |
On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
EACCES or EAGAIN | Operation is prohibited by locks held by other processes. |
EAGAIN | The operation is prohibited because the file has been memory-mapped by another process. |
EBADF | fd is not an open file descriptor, or the command was F_SETLK or F_SETLKW and the file descriptor open mode doesn’t match with the type of lock requested. |
EDEADLK | It was detected that the specified F_SETLKW command would cause a deadlock. |
EFAULT | lock is outside your accessible address space. |
EINTR | For F_SETLKW, the command was interrupted by a signal. For F_GETLK and F_SETLK, the command was interrupted by a signal before the lock was checked or acquired. Most likely when locking a remote file (e.g. locking over NFS), but can sometimes happen locally. |
EINVAL | For F_DUPFD, arg is negative or is greater than the maximum allowable value. For F_SETSIG, arg is not an allowable signal number. |
EMFILE | For F_DUPFD, the process already has the maximum number of file descriptors open. |
ENOLCK | Too many segment locks open, lock table is full, or a remote locking protocol failed (e.g. locking over NFS). |
EPERM | Attempted to clear the O_APPEND flag on a file that has the append-only attribute set. |
NOTES
The errors returned by dup2() are different from those returned by F_DUPFD.
Since kernel 2.0, there is no interaction between the types of lock placed by flock(2) and fcntl(2).
POSIX.1-2001 allows l_len to be negative. (And if it is, the interval described by the lock covers bytes l_start+l_len up to and including l_start-1.) This is supported by Linux since Linux 2.4.21 and 2.5.49.
Several systems have more fields in struct flock, such as l_sysid. Clearly, l_pid alone is not going to be very useful if the process holding the lock may live on a different machine.
BUGS
A limitation of the Linux system call conventions on some architectures (notably x86) means that if a (negative) process group ID to be returned by F_GETOWN falls in the range -1 to -4095, then the return value is wrongly interpreted by glibc as an error in the system call; that is, the return value of fcntl() will be -1, and errno will contain the (positive) process group ID.
In Linux 2.4 and earlier, there is a bug that can occur when an unprivileged process uses F_SETOWN to specify the owner of a socket file descriptor as a process (group) other than the caller. In this case, fcntl() can return -1 with errno set to EPERM, even when the owner process (group) is one that the caller has permission to send signals to. Despite this error return, the file descriptor owner is set, and signals will be sent to the owner.
CONFORMING TO
SVr4, 4.3BSD, POSIX.1-2001. Only the operations F_DUPFD, F_GETFD, F_SETFD, F_GETFL, F_SETFL, F_GETLK, F_SETLK, F_SETLKW, F_GETOWN, and F_SETOWN are specified in POSIX.1-2001.
F_GETSIG, F_SETSIG, F_NOTIFY, F_GETLEASE, and F_SETLEASE are Linux-specific. (Define the _GNU_SOURCE macro to obtain these definitions.)
SEE ALSO
fdatasync() function
fdatasync - synchronize a file's in-core data with that on disk
SYNOPSIS
#include <unistd.h>
int fdatasync(int fd);
DESCRIPTION
fdatasync() flushes all data buffers of a file to disk (before the system call returns). It resembles fsync() but is not required to update the metadata such as access time.
Applications that access databases or log files often write a tiny data fragment (e.g., one line in a log file) and then call fsync() immediately in order to ensure that the written data is physically stored on the hard disk. Unfortunately, fsync() will always initiate two write operations: one for the newly written data and another one in order to update the modification time stored in the inode.
If the modification time is not a part of the transaction concept, fdatasync() can be used to avoid unnecessary inode disk write operations.
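The log-file pattern described above might be sketched like this. append_durable() and fdatasync_demo() are hypothetical helpers, and the scratch file under /tmp is an assumption for the example.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write the whole string `line` to `fd`, then flush the file's data
 * (not necessarily its timestamps) to disk with fdatasync().
 * Returns 0 on success, -1 with errno set on failure. */
int append_durable(int fd, const char *line)
{
    size_t left = strlen(line);

    while (left > 0) {
        ssize_t n = write(fd, line, left);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;
        }
        line += n;
        left -= (size_t)n;
    }
    return fdatasync(fd);
}

/* Hypothetical self-check: append one 8-byte record to a scratch file
 * and confirm the file now holds 8 bytes. Returns 0 or -1. */
int fdatasync_demo(void)
{
    char path[] = "/tmp/fdatasync_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    int ok = append_durable(fd, "entry 1\n") == 0
          && lseek(fd, 0, SEEK_END) == 8;

    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```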
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Error Code | Description |
EBADF | fd is not a valid file descriptor open for writing. |
EIO | An error occurred during synchronization. |
EROFS, EINVAL | fd is bound to a special file which does not support synchronization. |
BUGS
Currently (Linux 2.2) fdatasync() is equivalent to fsync().
AVAILABILITY
On POSIX systems on which fdatasync() is available, _POSIX_SYNCHRONIZED_IO is defined in <unistd.h> to a value greater than 0. (See also sysconf(3).)
CONFORMING TO
POSIX.1-2001.
SEE ALSO
fdetach() function
afs_syscall, break, fattach, fdetach, ftime, getmsg, getpmsg, gtty, isastream, lock, mpx, multiplexer, prof, profil, putmsg, putpmsg, security, stty, ulimit, vserver - unimplemented system calls
SYNOPSIS
Unimplemented system calls.
DESCRIPTION
These system calls are not implemented in the Linux 2.4 kernel.
RETURN VALUE
These system calls always return -1 and set errno to ENOSYS.
NOTES
Note that ftime(3), profil(3) and ulimit(3) are implemented as library functions.
Some system calls, like alloc_hugepages(2), free_hugepages(2), ioperm(2), iopl(2), and vm86(2) only exist on certain architectures.
Some system calls, like ipc(2), create_module(2), init_module(2), and delete_module(2) only exist when the Linux kernel was built with support for them.
SEE ALSO
flock() function
flock - apply or remove an advisory lock on an open file
SYNOPSIS
#include <sys/file.h>
int flock(int fd, int operation);
DESCRIPTION
Apply or remove an advisory lock on the open file specified by fd. The argument operation is one of the following:
Tag | Description |
LOCK_SH | Place a shared lock. More than one process may hold a shared lock for a given file at a given time. |
LOCK_EX | Place an exclusive lock. Only one process may hold an exclusive lock for a given file at a given time. |
LOCK_UN | Remove an existing lock held by this process. |
A call to flock() may block if an incompatible lock is held by another process. To make a non-blocking request, include LOCK_NB (by ORing) with any of the above operations.
A single file may not simultaneously have both shared and exclusive locks.
Locks created by flock() are associated with an open file table entry. This means that duplicate file descriptors (created by, for example, fork(2) or dup(2)) refer to the same lock, and this lock may be modified or released using any of these descriptors. Furthermore, the lock is released either by an explicit LOCK_UN operation on any of these duplicate descriptors, or when all such descriptors have been closed.
If a process uses open(2) (or similar) to obtain more than one descriptor for the same file, these descriptors are treated independently by flock(). An attempt to lock the file using one of these file descriptors may be denied by a lock that the calling process has already placed via another descriptor.
A process may only hold one type of lock (shared or exclusive) on a file. Subsequent flock() calls on an already locked file will convert an existing lock to the new lock mode.
Locks created by flock() are preserved across an execve(2).
A shared or exclusive lock can be placed on a file regardless of the mode in which the file was opened.
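The difference between duplicated descriptors and independently opened ones can be sketched as follows. try_lock_exclusive() and flock_demo() are hypothetical helpers, and the scratch file under /tmp is an assumption for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Try to take an exclusive lock without blocking.
 * Returns 1 if the lock is held on return, 0 if it is held through
 * another open file table entry, -1 on any other error. */
int try_lock_exclusive(int fd)
{
    if (flock(fd, LOCK_EX | LOCK_NB) == 0)
        return 1;
    return errno == EWOULDBLOCK ? 0 : -1;
}

/* Hypothetical self-check. `a` and `c` share one open file table entry
 * (c is a dup of a); `b` comes from a separate open() of the same file.
 * Returns 0 if the locks behave as described above, -1 otherwise. */
int flock_demo(void)
{
    char path[] = "/tmp/flock_demo_XXXXXX";
    int a = mkstemp(path);
    if (a < 0)
        return -1;
    int b = open(path, O_RDWR);
    int c = dup(a);

    int ok = b >= 0 && c >= 0
          && try_lock_exclusive(a) == 1   /* lock taken via a          */
          && try_lock_exclusive(b) == 0   /* independent fd is denied  */
          && try_lock_exclusive(c) == 1   /* duplicate sees same lock  */
          && flock(c, LOCK_UN) == 0       /* released via duplicate    */
          && try_lock_exclusive(b) == 1;  /* now b can take the lock   */

    if (c >= 0)
        close(c);
    if (b >= 0)
        close(b);
    close(a);
    unlink(path);
    return ok ? 0 : -1;
}
```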
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Error Code | Description |
EBADF | fd is not an open file descriptor. |
EINTR | While waiting to acquire a lock, the call was interrupted by delivery of a signal caught by a handler. |
EINVAL | operation is invalid. |
ENOLCK | The kernel ran out of memory for allocating lock records. |
EWOULDBLOCK | The file is locked and the LOCK_NB flag was selected. |
CONFORMING TO
4.4BSD (the flock(2) call first appeared in 4.2BSD). A version of flock(2), possibly implemented in terms of fcntl(2), appears on most Unices.
NOTES
flock(2) does not lock files over NFS. Use fcntl(2) instead: that does work over NFS, given a sufficiently recent version of Linux and a server which supports locking.
Since kernel 2.0, flock(2) is implemented as a system call in its own right rather than being emulated in the GNU C library as a call to fcntl(2). This yields true BSD semantics: there is no interaction between the types of lock placed by flock(2) and fcntl(2), and flock(2) does not detect deadlock.
flock(2) places advisory locks only; given suitable permissions on a file, a process is free to ignore the use of flock(2) and perform I/O on the file.
flock(2) and fcntl(2) locks have different semantics with respect to forked processes and dup(2). On systems that implement flock() using fcntl(), the semantics of flock() will be different from those described in this manual page.
Converting a lock (shared to exclusive, or vice versa) is not guaranteed to be atomic: the existing lock is first removed, and then a new lock is established. Between these two steps, a pending lock request by another process may be granted, with the result that the conversion either blocks, or fails if LOCK_NB was specified. (This is the original BSD behaviour, and occurs on many other implementations.)
SEE ALSO
fork() function
fork - create a child process
SYNOPSIS
#include <sys/types.h>
#include <unistd.h>
pid_t fork(void);
DESCRIPTION
fork() creates a child process that differs from the parent process only in its PID and PPID, and in the fact that resource utilizations are set to 0. File locks and pending signals are not inherited.
Under Linux, fork() is implemented using copy-on-write pages, so the only penalty that it incurs is the time and memory required to duplicate the parent’s page tables, and to create a unique task structure for the child.
RETURN VALUE
On success, the PID of the child process is returned in the parent’s thread of execution, and a 0 is returned in the child’s thread of execution. On failure, a -1 will be returned in the parent’s context, no child process will be created, and errno will be set appropriately.
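The three possible return values can be handled as in the sketch below. fork_and_collect() is a hypothetical helper: the child exits immediately with a given code and the parent waits for it with waitpid(2).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits at once with `code` (0..255).
 * Returns the exit status collected by the parent, or -1 if fork() or
 * waitpid() failed or the child did not exit normally. */
int fork_and_collect(int code)
{
    pid_t pid = fork();

    if (pid == -1)
        return -1;              /* failure: no child was created */

    if (pid == 0)
        _exit(code);            /* child: fork() returned 0 */

    /* parent: fork() returned the child's PID */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child uses _exit() rather than exit() so that stdio buffers inherited from the parent are not flushed twice.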
ERRORS
Error Code | Description |
EAGAIN | fork() cannot allocate sufficient memory to copy the parent’s page tables and allocate a task structure for the child. |
EAGAIN | It was not possible to create a new process because the caller’s RLIMIT_NPROC resource limit was encountered. To exceed this limit, the process must have either the CAP_SYS_ADMIN or the CAP_SYS_RESOURCE capability. |
ENOMEM | fork() failed to allocate the necessary kernel structures because memory is tight. |
CONFORMING TO
SVr4, 4.3BSD, POSIX.1-2001.
SEE ALSO
alloc_hugepages() function
alloc_hugepages, free_hugepages - allocate or free huge pages
SYNOPSIS
void *alloc_hugepages(int key, void *addr, size_t len, int prot, int flag);
int free_hugepages(void *addr);
DESCRIPTION
The system calls alloc_hugepages() and free_hugepages() were introduced in Linux 2.5.36 and removed again in 2.5.54. They existed only on i386 and ia64 (when built with CONFIG_HUGETLB_PAGE). In Linux 2.4.20 the syscall numbers exist, but the calls return ENOSYS.
On i386 the memory management hardware knows about ordinary pages (4 KiB) and huge pages (2 or 4 MiB). Similarly ia64 knows about huge pages of several sizes. These system calls serve to map huge pages into the process’ memory or to free them again. Huge pages are locked into memory, and are not swapped.
The key parameter is an identifier. When zero the pages are private, and not inherited by children. When positive the pages are shared with other applications using the same key, and inherited by child processes.
The addr parameter of free_hugepages() tells which page is being freed: it was the return value of a call to alloc_hugepages(). (The memory is first actually freed when all users have released it.) The addr parameter of alloc_hugepages() is a hint, that the kernel may or may not follow. Addresses must be properly aligned.
The len parameter is the length of the required segment. It must be a multiple of the huge page size.
The prot parameter specifies the memory protection of the segment. It is one of PROT_READ, PROT_WRITE, PROT_EXEC.
The flag parameter is ignored, unless key is positive. In that case, if flag is IPC_CREAT, then a new huge page segment is created when none with the given key existed. If this flag is not set, then ENOENT is returned when no segment with the given key exists.
RETURN VALUE
On success, alloc_hugepages() returns the allocated virtual address, and free_hugepages() returns zero. On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
ENOSYS | The system call is not supported on this kernel. |
CONFORMING TO
These calls existed only in Linux 2.5.36 through to 2.5.54. These calls are specific to Linux on Intel processors, and should not be used in programs intended to be portable. Indeed, the system call numbers are marked for reuse, so programs using these may do something random on a future kernel.
FILES
/proc/sys/vm/nr_hugepages Number of configured hugetlb pages. This can be read and written.
/proc/meminfo Gives info on the number of configured hugetlb pages and on their size in the three variables HugePages_Total, HugePages_Free, Hugepagesize.
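The three /proc/meminfo variables can be read with a small parser. The sketch below assumes each entry sits on its own line in the form "Name: value" (optionally followed by a unit such as kB); meminfo_value() and read_meminfo_value() are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>

/* Look up `key` (e.g. "HugePages_Total") in meminfo-style text and
 * return its numeric value, or -1 if the key is absent or malformed.
 * Assumes one "Name: value" entry per line. */
long meminfo_value(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long v;
            if (sscanf(line + klen + 1, "%ld", &v) == 1)
                return v;
            return -1;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;             /* step past the newline */
    }
    return -1;
}

/* Read /proc/meminfo and look up `key` in it.
 * Returns the value, or -1 if the file or the key is unavailable. */
long read_meminfo_value(const char *key)
{
    char buf[8192];
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;

    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return meminfo_value(buf, key);
}
```

For example, read_meminfo_value("Hugepagesize") would yield the huge page size in the unit the kernel prints, on kernels that expose that field.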
NOTES
The system calls are gone. Now the hugetlbfs filesystem can be used instead. Memory backed by huge pages (if the CPU supports them) is obtained by using mmap() to map files in this virtual filesystem.
The maximal number of huge pages can be specified using the hugepages= boot parameter.
fstatat() function
fstatat - get file status relative to a directory file descriptor
SYNOPSIS
#include <sys/stat.h>
int fstatat(int dirfd, const char *path, struct stat *buf, int flags);
DESCRIPTION
The fstatat() system call operates in exactly the same way as stat(2), except for the differences described in this manual page.
If the pathname given in path is relative, then it is interpreted relative to the directory referred to by the file descriptor dirfd (rather than relative to the current working directory of the calling process, as is done by stat(2) for a relative pathname).
If the pathname given in path is relative and dirfd is the special value AT_FDCWD, then path is interpreted relative to the current working directory of the calling process (like stat(2)).
If the pathname given in path is absolute, then dirfd is ignored.
flags can either be 0, or include the following flag:
Tag | Description |
AT_SYMLINK_NOFOLLOW | If path is a symbolic link, do not dereference it: instead return information about the link itself, like lstat(2). (By default, fstatat() dereferences symbolic links, like stat(2).) |
RETURN VALUE
On success, fstatat() returns 0. On error, -1 is returned and errno is set to indicate the error.
ERRORS
The same errors that occur for stat(2) can also occur for fstatat(). The following additional errors can occur for fstatat():
Tag | Description |
EBADF | dirfd is not a valid file descriptor. |
EINVAL | Invalid flag specified in flags. |
ENOTDIR | path is a relative path and dirfd is a file descriptor referring to a file other than a directory. |
NOTES
See openat(2) for an explanation of the need for fstatat().
CONFORMING TO
This system call is non-standard but is proposed for inclusion in a future revision of POSIX.1. A similar system call exists on Solaris.
VERSIONS
fstatat() was added to Linux in kernel 2.6.16.
另请参阅
statfs() function
statfs, fstatfs - get file system statistics
Synopsis
#include <sys/vfs.h>    /* or <sys/statfs.h> */
int statfs(const char *path, struct statfs *buf);
int fstatfs(int fd, struct statfs *buf);
Description
The function statfs() returns information about a mounted file system. path is the pathname of any file within the mounted filesystem. buf is a pointer to a statfs structure defined approximately as follows:
struct statfs { |
File system types:
ADFS_SUPER_MAGIC 0xadf5 |
Nobody knows what f_fsid is supposed to contain (but see below).
Fields that are undefined for a particular file system are set to 0. fstatfs() returns the same information about an open file referenced by descriptor fd.
Return Value
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
Errors
Error Code | Description |
EACCES | (statfs()) Search permission is denied for a component of the path prefix of path. (See also path_resolution(2).) |
EBADF | (fstatfs()) fd is not a valid open file descriptor. |
EFAULT | buf or path points to an invalid address. |
EINTR | This call was interrupted by a signal. |
EIO | An I/O error occurred while reading from the file system. |
ELOOP | (statfs()) Too many symbolic links were encountered in translating path. |
ENAMETOOLONG | (statfs()) path is too long. |
ENOENT | (statfs()) The file referred to by path does not exist. |
ENOMEM | Insufficient kernel memory was available. |
ENOSYS | The file system does not support this call. |
ENOTDIR | (statfs()) A component of the path prefix of path is not a directory. |
EOVERFLOW | Some values were too large to be represented in the returned struct. |
Conforming To
The Linux statfs() was inspired by the 4.4BSD one (but they do not use the same structure).
Notes
The kernel has system calls statfs(), fstatfs(), statfs64(), and fstatfs64() to support this library call.
Some systems only have <sys/vfs.h>, other systems also have <sys/statfs.h>, where the former includes the latter. So it seems including the former is the best choice.
LSB has deprecated the library calls statfs() and fstatfs() and tells us to use statvfs() and fstatvfs() instead.
The f_fsid field
Solaris, Irix and POSIX have a system call statvfs(2) that returns a struct statvfs (defined in <sys/statvfs.h>) containing an unsigned long f_fsid. Linux, SunOS, HP-UX, 4.4BSD have a system call statfs() that returns a struct statfs (defined in <sys/vfs.h>) containing a fsid_t f_fsid, where fsid_t is defined as struct { int val[2]; }. The same holds for FreeBSD, except that it uses the include file <sys/mount.h>.
The general idea is that f_fsid contains some random stuff such that the pair (f_fsid,ino) uniquely determines a file. Some OSes use (a variation on) the device number, or the device number combined with the filesystem type. Several OSes restrict giving out the f_fsid field to the superuser only (and zero it for unprivileged users), because this field is used in the filehandle of the filesystem when NFS-exported, and giving it out is a security concern.
Under some OSes the fsid can be used as second parameter to the sysfs() system call.
See Also
stat() function
stat, fstat, lstat - get file status
Synopsis
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
int stat(const char *path, struct stat *buf);
int fstat(int filedes, struct stat *buf);
int lstat(const char *path, struct stat *buf);
Description
These functions return information about a file. No permissions are required on the file itself, but — in the case of stat() and lstat() — execute (search) permission is required on all of the directories in path that lead to the file.
stat() stats the file pointed to by path and fills in buf. lstat() is identical to stat(), except that if path is a symbolic link, then the link itself is stat-ed, not the file that it refers to.
fstat() is identical to stat(), except that the file to be stat-ed is specified by the file descriptor filedes.
All of these system calls return a stat structure, which contains the following fields:
struct stat { |
The st_dev field describes the device on which this file resides.
The st_rdev field describes the device that this file (inode) represents.
The st_size field gives the size of the file (if it is a regular file or a symbolic link) in bytes. The size of a symlink is the length of the pathname it contains, without a trailing null byte.
The st_blocks field indicates the number of blocks allocated to the file, in 512-byte units. (This may be smaller than st_size/512, for example, when the file has holes.)
The st_blksize field gives the "preferred" blocksize for efficient file system I/O. (Writing to a file in smaller chunks may cause an inefficient read-modify-rewrite.)
Not all of the Linux filesystems implement all of the time fields. Some file system types allow mounting in such a way that file accesses do not cause an update of the st_atime field. (See 'noatime' in mount(8).)
The field st_atime is changed by file accesses, e.g. by execve(2), mknod(2), pipe(2), utime(2) and read(2) (of more than zero bytes). Other routines, like mmap(2), may or may not update st_atime.
The field st_mtime is changed by file modifications, e.g. by mknod(2), truncate(2), utime(2) and write(2) (of more than zero bytes). Moreover, st_mtime of a directory is changed by the creation or deletion of files in that directory. The st_mtime field is not changed for changes in owner, group, hard link count, or mode.
The field st_ctime is changed by writing or by setting inode information (i.e., owner, group, link count, mode, etc.).
The following POSIX macros are defined to check the file type using the st_mode field:
Tag | Description |
S_ISREG(m) | is it a regular file? |
S_ISDIR(m) | directory? |
S_ISCHR(m) | character device? |
S_ISBLK(m) | block device? |
S_ISFIFO(m) | FIFO (named pipe)? |
S_ISLNK(m) | symbolic link? (Not in POSIX.1-1996.) |
S_ISSOCK(m) | socket? (Not in POSIX.1-1996.) |
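The type-test macros in the table above can be applied to st_mode as in the sketch below. The function names file_type_name() and path_type_name() are hypothetical helpers chosen for this illustration, and the returned strings are arbitrary labels.

```c
#define _GNU_SOURCE
#include <stddef.h>     /* NULL */
#include <sys/stat.h>   /* lstat(), S_ISREG() and friends */

/* Maps an st_mode value to a short human-readable label. */
const char *file_type_name(mode_t m)
{
    if (S_ISREG(m))  return "regular file";
    if (S_ISDIR(m))  return "directory";
    if (S_ISCHR(m))  return "character device";
    if (S_ISBLK(m))  return "block device";
    if (S_ISFIFO(m)) return "FIFO";
    if (S_ISLNK(m))  return "symbolic link";
    if (S_ISSOCK(m)) return "socket";
    return "unknown";
}

/* Uses lstat() so that a symbolic link is reported as a link rather
 * than as the file it refers to. Returns NULL if lstat() fails. */
const char *path_type_name(const char *path)
{
    struct stat st;
    if (lstat(path, &st) == -1)
        return NULL;
    return file_type_name(st.st_mode);
}
```

Replacing lstat() with stat() in path_type_name() would classify the target of a symbolic link instead of the link itself.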
The following flags are defined for the st_mode field:
S_IFMT | 0170000 | bitmask for the file type bitfields |
S_IFSOCK | 0140000 | socket |
S_IFLNK | 0120000 | symbolic link |
S_IFREG | 0100000 | regular file |
S_IFBLK | 0060000 | block device |
S_IFDIR | 0040000 | directory |
S_IFCHR | 0020000 | character device |
S_IFIFO | 0010000 | FIFO |
S_ISUID | 0004000 | set UID bit |
S_ISGID | 0002000 | set-group-ID bit (see below) |
S_ISVTX | 0001000 | sticky bit (see below) |
S_IRWXU | 00700 | mask for file owner permissions |
S_IRUSR | 00400 | owner has read permission |
S_IWUSR | 00200 | owner has write permission |
S_IXUSR | 00100 | owner has execute permission |
S_IRWXG | 00070 | mask for group permissions |
S_IRGRP | 00040 | group has read permission |
S_IWGRP | 00020 | group has write permission |
S_IXGRP | 00010 | group has execute permission |
S_IRWXO | 00007 | mask for permissions for others (not in group) |
S_IROTH | 00004 | others have read permission |
S_IWOTH | 00002 | others have write permission |
S_IXOTH | 00001 | others have execute permission |
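As an illustration of how the permission masks in the table combine, the sketch below renders the nine permission bits in the style of ls -l and isolates the file type bits with S_IFMT. Both function names are hypothetical; the set-UID, set-GID and sticky bits are deliberately ignored to keep the example short.

```c
#define _GNU_SOURCE
#include <sys/stat.h>   /* S_IRUSR ... S_IXOTH, S_IFMT */

/* Writes a string such as "rwxr-xr-x" into out (at least 10 bytes). */
void mode_to_perm_string(mode_t m, char out[10])
{
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,   /* owner  */
        S_IRGRP, S_IWGRP, S_IXGRP,   /* group  */
        S_IROTH, S_IWOTH, S_IXOTH    /* others */
    };
    static const char letters[] = "rwxrwxrwx";

    for (int i = 0; i < 9; i++)
        out[i] = (m & bits[i]) ? letters[i] : '-';
    out[9] = '\0';
}

/* Keeps only the file type bits, for comparison with S_IFREG, S_IFDIR, etc. */
mode_t file_type_bits(mode_t m)
{
    return m & S_IFMT;
}
```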
The set-group-ID bit (S_ISGID) has several special uses. For a directory it indicates that BSD semantics is to be used for that directory: files created there inherit their group ID from the directory, not from the effective group ID of the creating process, and directories created there will also get the S_ISGID bit set. For a file that does not have the group execution bit (S_IXGRP) set, the set-group-ID bit indicates mandatory file/record locking.
The ‘sticky’ bit (S_ISVTX) on a directory means that a file in that directory can be renamed or deleted only by the owner of the file, by the owner of the directory, and by a privileged process.
Linux Notes
Since kernel 2.5.48, the stat structure supports nanosecond resolution for the three file timestamp fields. Glibc exposes the nanosecond component of each field using names either of the form st_atim.tv_nsec, if the _BSD_SOURCE or _SVID_SOURCE feature test macro is defined, or of the form st_atimensec, if neither of these macros is defined. On file systems that do not support sub-second timestamps, these nanosecond fields are returned with the value 0.
For most files under the /proc directory, stat() does not return the file size in the st_size field; instead the field is returned with the value 0.
Return Value
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
Errors
Tag | Description |
EACCES | Search permission is denied for one of the directories in the path prefix of path. (See also path_resolution(2).) |
EBADF | filedes is bad. |
EFAULT | Bad address. |
ELOOP | Too many symbolic links encountered while traversing the path. |
ENAMETOOLONG | File name too long. |
ENOENT | A component of the path path does not exist, or the path is an empty string. |
ENOMEM | Out of memory (i.e. kernel memory). |
ENOTDIR | A component of the path is not a directory. |
Conforming To
These system calls conform to SVr4, 4.3BSD, POSIX.1-2001.
Use of the st_blocks and st_blksize fields may be less portable. (They were introduced in BSD. The interpretation differs between systems, and possibly on a single system when NFS mounts are involved.)
POSIX does not describe the S_IFMT, S_IFSOCK, S_IFLNK, S_IFREG, S_IFBLK, S_IFDIR, S_IFCHR, S_IFIFO, S_ISVTX bits, but instead demands the use of the macros S_ISDIR(), etc. The S_ISLNK and S_ISSOCK macros are not in POSIX.1-1996, but both are present in POSIX.1-2001; the former is from SVID 4, the latter from SUSv2.
Unix V7 (and later systems) had S_IREAD, S_IWRITE, S_IEXEC, where POSIX prescribes the synonyms S_IRUSR, S_IWUSR, S_IXUSR.
Other Systems
Values that have been (or are) in use on various systems:
hex | name | ls | octal | description |
f000 | S_IFMT | | 170000 | mask for file type |
0000 | | | 000000 | SCO out-of-service inode; BSD unknown type; SVID-v2 and XPG2 have both 0 and 0100000 for ordinary file |
1000 | S_IFIFO | p| | 010000 | FIFO (named pipe) |
2000 | S_IFCHR | c | 020000 | character special (V7) |
3000 | S_IFMPC | | 030000 | multiplexed character special (V7) |
4000 | S_IFDIR | d/ | 040000 | directory (V7) |
5000 | S_IFNAM | | 050000 | XENIX named special file, with two subtypes distinguished by st_rdev values 1, 2 |
0001 | S_INSEM | s | 000001 | XENIX semaphore subtype of IFNAM |
0002 | S_INSHD | m | 000002 | XENIX shared data subtype of IFNAM |
6000 | S_IFBLK | b | 060000 | block special (V7) |
7000 | S_IFMPB | | 070000 | multiplexed block special (V7) |
8000 | S_IFREG | - | 100000 | regular (V7) |
9000 | S_IFCMP | | 110000 | VxFS compressed |
9000 | S_IFNWK | n | 110000 | network special (HP-UX) |
a000 | S_IFLNK | l@ | 120000 | symbolic link (BSD) |
b000 | S_IFSHAD | | 130000 | Solaris shadow inode for ACL (not seen by userspace) |
c000 | S_IFSOCK | s= | 140000 | socket (BSD; also "S_IFSOC" on VxFS) |
d000 | S_IFDOOR | D> | 150000 | Solaris door |
e000 | S_IFWHT | w% | 160000 | BSD whiteout (not used for inode) |
0200 | S_ISVTX | | 001000 | 'sticky bit': save swapped text even after use (V7); reserved (SVID-v2); on non-directories: don't cache this file (SunOS); on directories: restricted deletion flag (SVID-v4.2) |
0400 | S_ISGID | | 002000 | set-group-ID on execution (V7); for directories: use BSD semantics for propagation of GID |
0400 | S_ENFMT | | 002000 | SysV file locking enforcement (shared with S_ISGID) |
0800 | S_ISUID | | 004000 | set-user-ID on execution (V7) |
0800 | S_CDF | | 004000 | directory is a context dependent file (HP-UX) |
A sticky command appeared in Version 32V AT&T UNIX.
See Also
statvfs() function
statvfs, fstatvfs - get file system statistics
Synopsis
#include <sys/statvfs.h>
int statvfs(const char *path, struct statvfs *buf);
int fstatvfs(int fd, struct statvfs *buf);
Description
The function statvfs() returns information about a mounted file system. path is the pathname of any file within the mounted filesystem. buf is a pointer to a statvfs structure defined approximately as follows:
struct statvfs { |
Here the types fsblkcnt_t and fsfilcnt_t are defined in <sys/types.h>. Both used to be unsigned long.
The field f_flag is a bit mask (of mount flags, see mount(8)). Bits defined by POSIX are:
Tag | Description |
ST_RDONLY | Read-only file system. |
ST_NOSUID | Set-user-ID/set-group-ID bits are ignored by exec(2). |
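It is unspecified whether all members of the returned struct have meaningful values on all file systems.
fstatvfs() returns the same information about an open file referenced by descriptor fd.
Return Value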
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
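A minimal sketch of a statvfs() call follows. The structure listing above is cut short, so the member names f_blocks and f_bavail are taken from the POSIX definition of struct statvfs rather than from this page; the helper name fs_space() is hypothetical.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/statvfs.h>   /* statvfs(), struct statvfs, ST_RDONLY */

/* Reports the total size and the space available to unprivileged users,
 * in bytes, for the file system containing `path`, and whether it is
 * mounted read-only. Returns 0 on success, -1 with errno set on error. */
int fs_space(const char *path, uint64_t *total, uint64_t *avail, int *read_only)
{
    struct statvfs vfs;

    if (statvfs(path, &vfs) == -1)
        return -1;

    /* Block counts are expressed in units of f_frsize. */
    *total = (uint64_t)vfs.f_blocks * vfs.f_frsize;
    *avail = (uint64_t)vfs.f_bavail * vfs.f_frsize;
    *read_only = (vfs.f_flag & ST_RDONLY) != 0;
    return 0;
}
```

Because some members may not be meaningful on every file system, the results are best treated as estimates.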
Errors
Error Code | Description |
EACCES | (statvfs()) Search permission is denied for a component of the path prefix of path. (See also path_resolution(2).) |
EBADF | (fstatvfs()) fd is not a valid open file descriptor. |
EFAULT | buf or path points to an invalid address. |
EINTR | This call was interrupted by a signal. |
EIO | An I/O error occurred while reading from the file system. |
ELOOP | (statvfs()) Too many symbolic links were encountered in translating path. |
ENAMETOOLONG | (statvfs()) path is too long. |
ENOENT | (statvfs()) The file referred to by path does not exist. |
ENOMEM | Insufficient kernel memory was available. |
ENOSYS | The file system does not support this call. |
ENOTDIR | (statvfs()) A component of the path prefix of path is not a directory. |
EOVERFLOW | Some values were too large to be represented in the returned struct. |
Conforming To
Solaris, Irix, POSIX.1-2001
Notes
The Linux kernel has system calls statfs() and fstatfs() to support this library call.
The current glibc implementation of
pathconf(path, _PC_REC_XFER_ALIGN); |
uses the f_frsize, f_frsize, and f_bsize fields of the return value of statvfs(path,buf).
See Also
fsync() function
fsync, fdatasync - synchronize a file's in-core state with the storage device
Synopsis
#include <unistd.h>
int fsync(int fd);
int fdatasync(int fd);
Description
fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) where that file resides. The call blocks until the device reports that the transfer has completed. It also flushes metadata information associated with the file (see stat(2)).
Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed.
fdatasync() is similar to fsync(), but does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled. For example, changes to st_atime or st_mtime (respectively, time of last access and time of last modification; see stat(2)) do not require flushing because they are not necessary for a subsequent data read to be handled correctly. On the other hand, a change to the file size (st_size, as made by, say, ftruncate(2)) would require a metadata flush.
The aim of fdatasync(2) is to reduce disk activity for applications that do not require all metadata to be synchronised with the disk.
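The point about also syncing the containing directory can be sketched as below. This is an illustration only: write_durably() is a hypothetical helper, openat(2) is used for convenience although this page does not describe it, and tolerating EINVAL from the directory fsync() is an assumption made for file systems that do not support it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>    /* open(), openat(), O_* flags */
#include <stddef.h>
#include <unistd.h>   /* write(), fsync(), close() */

/* Creates or overwrites dir/name with `len` bytes from `data`, then
 * flushes the file and the directory entry. Returns 0 or -1. */
int write_durably(const char *dir, const char *name, const void *data, size_t len)
{
    int dirfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dirfd == -1)
        return -1;

    int fd = openat(dirfd, name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        close(dirfd);
        return -1;
    }

    const char *p = data;
    int rc = 0;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;          /* interrupted: retry the write */
            rc = -1;
            break;
        }
        p += n;
        len -= (size_t)n;
    }

    if (rc == 0 && fsync(fd) == -1)   /* flush file data and metadata */
        rc = -1;
    if (close(fd) == -1)
        rc = -1;

    /* Flush the directory so the new entry itself reaches the disk. */
    if (rc == 0 && fsync(dirfd) == -1 && errno != EINVAL)
        rc = -1;
    close(dirfd);
    return rc;
}
```

Using fdatasync(fd) in place of the first fsync() would skip metadata that is not needed to read the data back, at the cost of possibly stale timestamps after a crash.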
Return Value
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
Errors
Tag | Description |
EBADF | fd is not a valid file descriptor open for writing. |
EIO | An error occurred during synchronization. |
EROFS, EINVAL | fd is bound to a special file which does not support synchronization. |
Notes
If the underlying hard disk has write caching enabled, then the data may not really be on permanent storage when fsync() / fdatasync() return.
When an ext2 file system is mounted with the sync option, directory entries are also implicitly synced by fsync().
On kernels before 2.4, fsync() on big files can be inefficient. An alternative might be to use the O_SYNC flag to open(2).
Conforming To
POSIX.1-2001
See Also
truncate() function
truncate, ftruncate - truncate a file to a specified length
Synopsis
#include <unistd.h>
#include <sys/types.h>
int truncate(const char *path, off_t length);
int ftruncate(int fd, off_t length);
Description
The truncate() and ftruncate() functions cause the regular file named by path or referenced by fd to be truncated to a size of precisely length bytes.
If the file previously was larger than this size, the extra data is lost. If the file previously was shorter, it is extended, and the extended part reads as null bytes ('\0'). The file offset is not changed.
If the size changed, then the st_ctime and st_mtime fields (respectively, time of last status change and time of last modification; see stat(2)) for the file are updated, and the set-user-ID and set-group-ID permission bits may be cleared.
With ftruncate(), the file must be open for writing; with truncate(), the file must be writable.
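The shrink and extend behaviour described above can be illustrated with a short sketch. The helper name shrink_then_extend() is hypothetical; it assumes fd refers to a regular file opened for writing on a file system that permits extension.

```c
#define _GNU_SOURCE
#include <sys/types.h>   /* off_t */
#include <unistd.h>      /* ftruncate() */

/* Cuts the open file down to `keep` bytes, then grows it to `final`
 * bytes. Data beyond `keep` is lost, and the bytes between `keep` and
 * `final` read back as '\0'. Returns 0 on success, -1 with errno set. */
int shrink_then_extend(int fd, off_t keep, off_t final)
{
    if (ftruncate(fd, keep) == -1)
        return -1;
    if (ftruncate(fd, final) == -1)
        return -1;
    return 0;
}
```

Neither call moves the file offset, so a subsequent write() continues from wherever the descriptor was positioned before.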
Return Value
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
Errors
For truncate():
Error Code | Description |
EACCES | Search permission is denied for a component of the path prefix, or the named file is not writable by the user. (See also path_resolution(2).) |
EFAULT | path points outside the process's allocated address space. |
EFBIG | The argument length is larger than the maximum file size. (XSI) |
EINTR | A signal was caught during execution. |
EINVAL | The argument length is negative or larger than the maximum file size. |
EIO | An I/O error occurred updating the inode. |
EISDIR | The named file is a directory. |
ELOOP | Too many symbolic links were encountered in translating the pathname. |
ENAMETOOLONG | A component of a pathname exceeded 255 characters, or an entire pathname exceeded 1023 characters. |
ENOENT | The named file does not exist. |
ENOTDIR | A component of the path prefix is not a directory. |
EPERM | The underlying file system does not support extending a file beyond its current size. |
EROFS | The named file resides on a read-only file system. |
ETXTBSY | The file is a pure procedure (shared text) file that is being executed. |
For ftruncate() the same errors apply, but instead of things that can be wrong with path, we now have things that can be wrong with fd:
EBADF | The fd is not a valid descriptor. |
EBADF or EINVAL | The fd is not open for writing. |
EINVAL | The fd does not reference a regular file. |
Conforming To
4.4BSD, SVr4, POSIX.1-2001 (these calls first appeared in 4.2BSD).
Notes
The above description is for XSI-compliant systems. For non-XSI-compliant systems, the POSIX standard allows two behaviours for ftruncate() when length exceeds the file length (note that truncate() is not specified at all in such an environment): either returning an error, or extending the file.
Like most Unix implementations, Linux follows the XSI requirement when dealing with native file systems. However, some non-native file systems do not permit truncate() and ftruncate() to be used to extend a file beyond its current length: a notable example on Linux is VFAT.
See Also
futex() function
futex - fast userspace locking system call
Synopsis
#include <linux/futex.h>
#include <sys/time.h>
int futex(int *uaddr, int op, int val, const struct timespec *timeout, int *uaddr2, int val3);
Description
The futex() system call provides a method for a program to wait for a value at a given address to change, and a method to wake up anyone waiting on a particular address (while the addresses for the same memory in separate processes may not be equal, the kernel maps them internally so the same memory mapped in different locations will correspond for futex() calls). It is typically used to implement the contended case of a lock in shared memory, as described in futex(7).
When a futex(7) operation did not finish uncontended in userspace, a call needs to be made to the kernel to arbitrate. Arbitration can either mean putting the calling process to sleep or, conversely, waking a waiting process.
Callers of this function are expected to adhere to the semantics as set out in futex(7). As these semantics involve writing non-portable assembly instructions, this in turn probably means that most users will in fact be library authors and not general application developers.
The uaddr argument needs to point to an aligned integer which stores the counter. The operation to execute is passed via the op parameter, along with a value val.
Five operations are currently defined:
Tag | Description |
FUTEX_WAIT | This operation atomically verifies that the futex address uaddr still contains the value val, and sleeps awaiting FUTEX_WAKE on this futex address. If the timeout argument is non-NULL, its contents describe the maximum duration of the wait, which is infinite otherwise. The arguments uaddr2 and val3 are ignored. For futex(7), this call is executed if decrementing the count gave a negative value (indicating contention), and will sleep until another process releases the futex and executes the FUTEX_WAKE operation. |
FUTEX_WAKE | This operation wakes at most val processes waiting on this futex address (i.e. inside FUTEX_WAIT). The arguments timeout, uaddr2 and val3 are ignored. For futex(7), this is executed if incrementing the count showed that there were waiters, once the futex value has been set to 1 (indicating that it is available). |
FUTEX_FD | To support asynchronous wakeups, this operation associates a file descriptor with a futex. If another process executes a FUTEX_WAKE, the process will receive the signal number that was passed in val. The calling process must close the returned file descriptor after use. The arguments timeout, uaddr2 and val3 are ignored. To prevent race conditions, the caller should test if the futex has been upped after FUTEX_FD returns. |
FUTEX_REQUEUE (since Linux 2.5.70) | This operation was introduced in order to avoid a "thundering herd" effect when FUTEX_WAKE is used and all processes woken up need to acquire another futex. This call wakes up val processes, and requeues all other waiters on the futex at address uaddr2. The arguments timeout and val3 are ignored. |
FUTEX_CMP_REQUEUE (since Linux 2.6.7) | There was a race in the intended use of FUTEX_REQUEUE, so FUTEX_CMP_REQUEUE was introduced. This is similar to FUTEX_REQUEUE, but first checks whether the location uaddr still contains the value val3. If not, an error EAGAIN is returned. The argument timeout is ignored. |
Return Value
Depending on which operation was executed, the returned value can have differing meanings.
Tag | Description |
FUTEX_WAIT | Returns 0 if the process was woken by a FUTEX_WAKE call. In case of timeout, ETIMEDOUT is returned. If the futex was not equal to the expected value, the operation returns EWOULDBLOCK. Signals (or other spurious wakeups) cause FUTEX_WAIT to return EINTR. |
FUTEX_WAKE | Returns the number of processes woken up. |
FUTEX_FD | Returns the new file descriptor associated with the futex. |
FUTEX_REQUEUE | Returns the number of processes woken up. |
FUTEX_CMP_REQUEUE | Returns the number of processes woken up. |
Errors
Error Code | Description |
EACCES | No read access to futex memory. |
EAGAIN | FUTEX_CMP_REQUEUE found an unexpected futex value. (This probably indicates a race; use the safe FUTEX_WAKE now.) |
EFAULT | Error in getting timeout information from userspace. |
EINVAL | An operation was not defined or error in page alignment. |
ENFILE | The system limit on the total number of open files has been reached. |
Notes
To reiterate, bare futexes are not intended as an easy-to-use abstraction for end-users. Implementors are expected to be assembly literate and to have read the sources of the futex userspace library referenced below.
Versions
Initial futex support was merged in Linux 2.5.7 but with different semantics from what was described above. A 4-parameter system call with the semantics given here was introduced in Linux 2.5.40. In Linux 2.5.70 one parameter was added. In Linux 2.6.7 a sixth parameter was added (messy, especially on the s390 architecture).
Conforming To
This system call is Linux-specific.
futimesat() function
futimesat - change timestamps of a file relative to a directory file descriptor
Synopsis
#include <fcntl.h>
int futimesat(int dirfd, const char *path, const struct timeval times[2]);
Description
The futimesat() system call operates in exactly the same way as utimes(2), except for the differences described in this manual page.
If the pathname given in path is relative, then it is interpreted relative to the directory referred to by the file descriptor dirfd (rather than relative to the current working directory of the calling process, as is done by utimes(2) for a relative pathname).
If the pathname given in path is relative and dirfd is the special value AT_FDCWD, then path is interpreted relative to the current working directory of the calling process (like utimes(2)).
If the pathname given in path is absolute, then dirfd is ignored.
Return Value
On success, futimesat() returns 0. On error, -1 is returned and errno is set to indicate the error.
Errors
The same errors that occur for utimes(2) can also occur for futimesat(). The following additional errors can occur for futimesat():
Tag | Description |
EBADF | dirfd is not a valid file descriptor. |
ENOTDIR | path is a relative path and dirfd is a file descriptor referring to a file other than a directory. |
Conforming To
This system call is non-standard. It was implemented from a specification proposed for POSIX.1, but that proposal was replaced by the one for utimensat(2). A similar system call exists on Solaris.
Glibc Notes
If the path argument is NULL, then the glibc futimesat() wrapper function updates the times for the file referred to by dirfd.
Versions
futimesat() was added to Linux in kernel 2.6.16.
See Also
getcontext() function
getcontext, setcontext - get or set the user context
Synopsis
#include <ucontext.h>
int getcontext(ucontext_t *ucp);
int setcontext(const ucontext_t *ucp);
where:
Tag | Description |
ucp | points to a structure defined in <ucontext.h> containing the signal mask, execution stack, and machine registers. |
Description
getcontext(2) gets the current context of the calling process, storing it in the ucontext struct pointed to by ucp.
setcontext(2) sets the context of the calling process to the state stored in the ucontext struct pointed to by ucp. The struct must either have been created by getcontext(2) or have been passed as the third parameter of the sigaction(2) signal handler.
The ucontext struct created by getcontext(2) is defined in <ucontext.h> as follows:
typedef struct ucontext |
Return Value
getcontext(2) returns 0 on success and -1 on failure. setcontext(2) does not return a value on success and returns -1 on failure.
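The way setcontext() resumes execution at the point where getcontext() stored the context can be seen in a small loop. This is a sketch only; loop_with_context() is a hypothetical name. The counter is declared volatile so that its value is kept in memory and survives the jump back.

```c
#define _GNU_SOURCE
#include <ucontext.h>   /* getcontext(), setcontext(), ucontext_t */

/* Re-enters the code after getcontext() until it has run `iterations`
 * times. Returns the final count, or -1 if either call fails. */
int loop_with_context(int iterations)
{
    ucontext_t ctx;
    volatile int count = 0;

    if (getcontext(&ctx) == -1)
        return -1;

    /* Execution continues here on the first pass and again after
     * every successful setcontext(&ctx). */
    count++;
    if (count < iterations) {
        setcontext(&ctx);
        return -1;   /* reached only if setcontext() failed */
    }
    return count;
}
```

The context must only be resumed while the function that called getcontext() is still active, which is why the whole loop lives in one function here.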
Conforming To
These functions conform to: XPG4-UNIX.
Notes
When a signal handler executes, the current user context is saved and a new context is created by the kernel. If the calling process leaves the signal handler using longjmp(2), the original context cannot be restored, and the result of future calls to getcontext(2) is unpredictable. To avoid this problem, use siglongjmp(2) or setcontext(2) in signal handlers instead of longjmp(2).
See Also
sigaltstack(2), sigprocmask(2), sigsetjmp(3), setjmp(3).
getcwd() function
Synopsis
long getcwd(char *buf, unsigned long size);
Description
The getcwd() function copies an absolute pathname of the current working directory to the array pointed to by buf, which is of length size.
If the current absolute path name would require a buffer longer than size elements, -1 is returned, and errno is set to ERANGE; an application should check for this error, and allocate a larger buffer if necessary.
If buf is NULL, the behaviour of getcwd() is undefined.
Return Value
-1 on failure (for example, if the current directory is not readable), with errno set accordingly, and the number of characters stored in buf on success. The contents of the array pointed to by buf is undefined on error.
Note that this return value differs from the getcwd(3) library function, which returns NULL on failure and the address of buf on success.
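The ERANGE handling recommended above can be sketched with the raw system call. This example assumes the call is reached through syscall(2) with SYS_getcwd, since no header declares this variant; cwd_via_syscall() is a hypothetical name and the starting size of 64 bytes is arbitrary.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>        /* malloc(), free() */
#include <sys/syscall.h>   /* SYS_getcwd */
#include <unistd.h>        /* syscall() */

/* Returns a malloc()ed string holding the current working directory,
 * or NULL with errno set. The caller frees the result. */
char *cwd_via_syscall(void)
{
    unsigned long size = 64;

    for (;;) {
        char *buf = malloc(size);
        if (buf == NULL)
            return NULL;

        long n = syscall(SYS_getcwd, buf, size);
        if (n != -1)
            return buf;          /* success: buf holds the path */

        int saved = errno;
        free(buf);
        if (saved != ERANGE) {   /* a real failure, not a short buffer */
            errno = saved;
            return NULL;
        }
        size *= 2;               /* buffer too small: retry with more room */
    }
}
```

Portable programs would call the getcwd(3) library function instead and apply the same retry loop to its NULL and ERANGE result.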
Errors
Tag | Description |
ENOMEM | if user memory cannot be mapped |
ENOENT | if directory does not exist (i.e. it has been deleted) |
ERANGE | if not enough space available for storing the path |
EFAULT | if memory access violation occurs while copying |
Conforming To
The getcwd system call is Linux-specific; use the getcwd C library function for portability.
See Also
getdents() function
Synopsis
#include <unistd.h>
int getdents(unsigned int fd, struct dirent *dirp, unsigned int count);
Description
This is not the function you are interested in. Look at readdir(3) for the POSIX conforming C library interface. This page documents the bare kernel system call interface.
The system call getdents() reads several dirent structures from the directory pointed at by fd into the memory area pointed to by dirp. The parameter count is the size of the memory area.
The dirent structure is declared as follows:
struct dirent { |
d_ino is an inode number. d_off is the distance from the start of the directory to the start of the next dirent. d_reclen is the size of this entire dirent. d_name is a null-terminated filename.
This call supersedes readdir(2).
Return Value
On success, the number of bytes read is returned. On end of directory, 0 is returned. On error, -1 is returned, and errno is set appropriately.
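The read loop implied by these return values can be sketched as follows. Note the substitution: the example calls the newer getdents64 variant through syscall(2), because SYS_getdents is not available on every architecture, and it declares the record layout itself following the getdents(2) manual page. The names linux_dirent64 usage and count_dir_entries() are assumptions of this sketch, not part of the page above.

```c
#define _GNU_SOURCE
#include <fcntl.h>         /* open(), O_DIRECTORY */
#include <stdint.h>
#include <sys/syscall.h>   /* SYS_getdents64 */
#include <unistd.h>        /* syscall(), close() */

/* Record layout returned by getdents64 (declared here by hand). */
struct linux_dirent64 {
    uint64_t       d_ino;     /* inode number */
    int64_t        d_off;     /* offset to the next record */
    unsigned short d_reclen;  /* size of this whole record */
    unsigned char  d_type;    /* file type */
    char           d_name[];  /* null-terminated filename */
};

/* Counts the entries the kernel reports for a directory (usually
 * including "." and ".."). Returns -1 on error. */
long count_dir_entries(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        return -1;

    _Alignas(8) char buf[4096];
    long count = 0;

    for (;;) {
        long nread = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (nread == -1) {
            close(fd);
            return -1;
        }
        if (nread == 0)
            break;                       /* end of directory */

        for (long pos = 0; pos < nread; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + pos);
            count++;
            pos += d->d_reclen;          /* step to the next record */
        }
    }
    close(fd);
    return count;
}
```

Application code would normally use readdir(3), which hides the buffer management shown here.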
Errors
Tag | Description |
EBADF | Invalid file descriptor fd. |
EFAULT | Argument points outside the calling process's address space. |
EINVAL | Result buffer is too small. |
ENOENT | No such directory. |
ENOTDIR | File descriptor does not refer to a directory. |
Conforming To
SVr4.
Notes
Glibc does not provide a wrapper for this system call; call it using syscall(2).
See Also
getdomainname() function
getdomainname, setdomainname - get/set domain name
Synopsis
#include <unistd.h>
int getdomainname(char *name, size_t len);
int setdomainname(const char *name, size_t len);
Description
These functions are used to access or to change the domain name of the current processor. If the null-terminated domain name requires more than len bytes, getdomainname() returns the first len bytes (glibc) or returns an error (libc).
Return Value
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
Errors
Tag | Description |
EFAULT | For setdomainname(): name pointed outside of user address space. |
EINVAL | For getdomainname() under libc: name is NULL or name is longer than len bytes. |
EINVAL | For setdomainname(): len was negative or too large. |
EPERM | For setdomainname(): the caller is unprivileged (Linux: does not have the CAP_SYS_ADMIN capability). |
Conforming To
POSIX does not specify these calls.
See Also
getdtablesize() function
Synopsis
#include <unistd.h>
int getdtablesize(void);
Description
getdtablesize() returns the maximum number of files a process can have open, one more than the largest possible value for a file descriptor.
Return Value
The current limit on the number of open files per process.
Notes
getdtablesize() is implemented as a libc library function. The glibc version calls getrlimit(2) and returns the current RLIMIT_NOFILE limit, or OPEN_MAX when that fails. The libc4 and libc5 versions return OPEN_MAX (set to 256 since Linux 0.98.4).
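The getrlimit(2) lookup that the glibc version is described as performing can be written directly. This is a sketch, not the glibc source; max_open_files() is a hypothetical name, and mapping an unlimited value to LONG_MAX is a choice made for this example.

```c
#define _GNU_SOURCE
#include <limits.h>         /* LONG_MAX */
#include <sys/resource.h>   /* getrlimit(), RLIMIT_NOFILE */

/* Returns the soft limit on open file descriptors for this process,
 * LONG_MAX if the limit is unlimited, or -1 if getrlimit() fails. */
long max_open_files(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return LONG_MAX;
    return (long)rl.rlim_cur;
}
```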
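Conforming To
SVr4, 4.4BSD (the getdtablesize() function first appeared in 4.2BSD).
See Also
getgid() function
Synopsis
#include <unistd.h>
#include <sys/types.h>
gid_t getgid(void);
gid_t getegid(void);
Description
getgid() returns the real group ID of the current process.
getegid() returns the effective group ID of the current process.
Errors
These functions are always successful.
Conforming To
POSIX.1-2001, 4.3BSD
See Also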
getuid() function
Synopsis
#include <unistd.h>
#include <sys/types.h>
uid_t getuid(void);
uid_t geteuid(void);
Description
getuid() returns the real user ID of the current process.
geteuid() returns the effective user ID of the current process.
Errors
These functions are always successful.
Conforming To
POSIX.1-2001, 4.3BSD.
History
In Unix V6 the getuid() call returned (euid << 8) + uid. Unix V7 introduced separate calls getuid() and geteuid().
See Also
getgroups() function
getgroups, setgroups - get/set list of supplementary group IDs
Synopsis
#include <sys/types.h>
#include <unistd.h>
int getgroups(int size, gid_t list[]);
#include <grp.h>
int setgroups(size_t size, const gid_t *list);
Description
Tag | Description |
getgroups() | Up to size supplementary group IDs (of the calling process) are returned in list. It is unspecified whether the effective group ID of the calling process is included in the returned list. (Thus, an application should also call getegid(2) and add or remove the resulting value.) If size is zero, list is not modified, but the total number of supplementary group IDs for the process is returned. |
setgroups() | Sets the supplementary group IDs for the process. Appropriate privileges (Linux: the CAP_SETGID capability) are required. |
Return Value
Tag | Description |
getgroups() | On success, the number of supplementary group IDs is returned. On error, -1 is returned, and errno is set appropriately. |
setgroups() | On success, zero is returned. On error, -1 is returned, and errno is set appropriately. |
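The size-zero query described above gives a simple way to size the buffer before fetching the list. The sketch below is an illustration; supplementary_groups() is a hypothetical helper, and the retry on EINVAL covers the case where the group set grows between the two calls.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>      /* malloc(), free() */
#include <sys/types.h>   /* gid_t */
#include <unistd.h>      /* getgroups() */

/* Stores a malloc()ed array of supplementary group IDs in *out and
 * returns its length. Returns 0 with *out == NULL if there are none,
 * or -1 on error. The caller frees *out. */
int supplementary_groups(gid_t **out)
{
    *out = NULL;

    for (;;) {
        int n = getgroups(0, NULL);      /* size 0: ask only for the count */
        if (n <= 0)
            return n;

        gid_t *list = malloc((size_t)n * sizeof *list);
        if (list == NULL)
            return -1;

        int got = getgroups(n, list);
        if (got >= 0) {
            *out = list;
            return got;
        }

        int saved = errno;
        free(list);
        if (saved != EINVAL) {           /* EINVAL: list grew, try again */
            errno = saved;
            return -1;
        }
    }
}
```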
Errors
Tag | Description |
EFAULT | list has an invalid address. |
EINVAL | For setgroups(), size is greater than NGROUPS (32 for Linux 2.0.32). For getgroups(), size is less than the number of supplementary group IDs, but is not zero. |
EPERM | The calling process has insufficient privilege to call setgroups(). |
注意
A process can have up to at least NGROUPS_MAX supplementary group IDs in addition to the effective group ID. The set of supplementary group IDs is inherited from the parent process and may be changed using setgroups(). The maximum number of supplementary group IDs can be found using sysconf(3):
long ngroups_max;
ngroups_max = sysconf(_SC_NGROUPS_MAX);
The maximal return value of getgroups() cannot be larger than one more than the value obtained this way.
The prototype for setgroups() is only available if _BSD_SOURCE is defined.
Conforming to
SVr4, 4.3BSD. The getgroups() function is in POSIX.1-2001. Since setgroups() requires privilege, it is not covered by POSIX.1-2001.
gethostid() function
gethostid, sethostid - get or set the unique identifier of the current host
Synopsis
#include <unistd.h>
long gethostid(void);
int sethostid(long hostid);
Description
Get or set a unique 32-bit identifier for the current machine. The 32-bit identifier is intended to be unique among all UNIX systems in existence. This normally resembles the Internet address for the local machine, as returned by gethostbyname(3), and thus usually never needs to be set.
The sethostid() call is restricted to the superuser.
The hostid argument is stored in the file /etc/hostid.
Return value
gethostid() returns the 32-bit identifier for the current host as set by sethostid(2).
Conforming to
4.2BSD; these functions were dropped in 4.4BSD. SVr4 includes gethostid() but not sethostid(). POSIX.1-2001 specifies gethostid() but not sethostid().
Files
/etc/hostid
Example
id = gethostid();
/* This is a no-op unless unsigned int is wider than 32 bits. */
id &= 0xffffffff;
gethostname() function
gethostname, sethostname - get/set host name
Synopsis
#include <unistd.h>
int gethostname(char *name, size_t len);
int sethostname(const char *name, size_t len);
Description
These system calls are used to access or to change the host name of the current processor. The gethostname() system call returns a null-terminated hostname (set earlier by sethostname()) in the array name that has a length of len bytes. In case the null-terminated hostname does not fit, no error is returned, but the hostname is truncated. It is unspecified whether the truncated hostname will be null-terminated.
Return value
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
Errors
Tag | Description |
EFAULT | name is an invalid address. |
EINVAL | len is negative or, for sethostname(), len is larger than the maximum allowed size, or, for gethostname() on Linux/i386, len is smaller than the actual size. (In this last case glibc 2.1 uses ENAMETOOLONG.) |
EPERM | For sethostname(), the caller did not have the CAP_SYS_ADMIN capability. |
Conforming to
SVr4, 4.4BSD (these interfaces first appeared in 4.2BSD). POSIX.1-2001 specifies gethostname() but not sethostname().
Notes
SUSv2 guarantees that ‘Host names are limited to 255 bytes’. POSIX.1-2001 guarantees that ‘Host names (not including the terminating null byte) are limited to HOST_NAME_MAX bytes’.
glibc notes
The GNU C library implements gethostname() as a library function that calls uname(2) and copies up to len bytes from the returned nodename field into name. Having performed the copy, the function then checks if the length of the nodename was greater than or equal to len, and if it is, then the function returns -1 with errno set to ENAMETOOLONG. Versions of glibc before 2.2 handle the case where the length of the nodename was greater than or equal to len differently: nothing is copied into name and the function returns -1 with errno set to ENAMETOOLONG.
getitimer() function
getitimer, setitimer - get or set the value of an interval timer
Synopsis
#include <sys/time.h>
int getitimer(int which, struct itimerval *value);
int setitimer(int which, const struct itimerval *value, struct itimerval *ovalue);
Description
The system provides each process with three interval timers, each decrementing in a distinct time domain. When any timer expires, a signal is sent to the process, and the timer (potentially) restarts.
Tag | Description |
ITIMER_REAL | decrements in real time, and delivers SIGALRM upon expiration. |
ITIMER_VIRTUAL | decrements only when the process is executing, and delivers SIGVTALRM upon expiration. |
ITIMER_PROF | decrements both when the process executes and when the system is executing on behalf of the process. Coupled with ITIMER_VIRTUAL, this timer is usually used to profile the time spent by the application in user and kernel space. SIGPROF is delivered upon expiration. |
Timer values are defined by the following structures:
struct itimerval {
struct timeval it_interval; /* next value */
struct timeval it_value; /* current value */
};
struct timeval {
long tv_sec; /* seconds */
long tv_usec; /* microseconds */
};
The function getitimer() fills the structure indicated by value with the current setting for the timer indicated by which (one of ITIMER_REAL, ITIMER_VIRTUAL, or ITIMER_PROF). The element it_value is set to the amount of time remaining on the timer, or zero if the timer is disabled. Similarly, it_interval is set to the reset value. The function setitimer() sets the indicated timer to the value in value. If ovalue is non-zero, the old value of the timer is stored there.
Timers decrement from it_value to zero, generate a signal, and reset to it_interval. A timer which is set to zero (it_value is zero or the timer expires and it_interval is zero) stops.
Both tv_sec and tv_usec are significant in determining the duration of a timer.
Timers will never expire before the requested time, but may expire some (short) time afterwards, which depends on the system timer resolution and on the system load. (But see BUGS below.) Upon expiration, a signal will be generated and the timer reset. If the timer expires while the process is active (always true for ITIMER_VIRTUAL) the signal will be delivered immediately when generated. Otherwise the delivery will be offset by a small time dependent on the system loading.
Return value
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
Errors
Tag | Description |
EFAULT | value or ovalue are not valid pointers. |
EINVAL | which is not one of ITIMER_REAL, ITIMER_VIRTUAL, or ITIMER_PROF. |
Notes
A child created via fork(2) does not inherit its parent’s interval timers. Interval timers are preserved across an execve(2).
Conforming to
POSIX.1-2001, SVr4, 4.4BSD (this call first appeared in 4.2BSD).
BUGS
The generation and delivery of a signal are distinct, and only one instance of each of the signals listed above may be pending for a process. Under very heavy loading, an ITIMER_REAL timer may expire before the signal from a previous expiration has been delivered. The second signal in such an event will be lost.
On Linux, timer values are represented in jiffies. If a request is made to set a timer with a value whose jiffies representation exceeds MAX_SEC_IN_JIFFIES (defined in include/linux/jiffies.h), then the timer is silently truncated to this ceiling value. On Linux/x86 (where, since kernel 2.6.13, the default jiffy is 0.004 seconds), this means that the ceiling value for a timer is approximately 99.42 days.
On certain systems (including x86), Linux kernels before version 2.6.12 have a bug which will produce premature timer expirations of up to one jiffy under some circumstances. This bug is fixed in kernel 2.6.12.
POSIX.1-2001 says that setitimer() should fail if a tv_usec value is specified that is outside of the range 0 to 999999. However, Linux does not give an error, but instead silently adjusts the corresponding seconds value for the timer. In the future (scheduled for March 2007), this non-conformance will be repaired: existing applications should be fixed now to ensure that they supply a properly formed tv_usec value.
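get_kernel_syms() function
get_kernel_syms - retrieve exported kernel and module symbols
Synopsis
#include <linux/module.h>
int get_kernel_syms(struct kernel_sym *table);
Description
If table is NULL, get_kernel_syms() returns the number of symbols available for query. Otherwise it fills in a table of structures:
struct kernel_sym {
    unsigned long value;
    char name[60];
};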
The symbols are interspersed with magic symbols of the form #module-name with the kernel having an empty name. The value associated with a symbol of this form is the address at which the module is loaded.
The symbols exported from each module follow their magic module tag and the modules are returned in the reverse of the order in which they were loaded.
Return value
Returns the number of symbols copied to table. There is no possible error return.
Conforming to
get_kernel_syms() is Linux specific.
BUGS
There is no way to indicate the size of the buffer allocated for table. If symbols have been added to the kernel since the program queried for the symbol table size, memory will be corrupted.
The length of exported symbol names is limited to 59 characters.
Because of these limitations, this system call is deprecated in favor of query_module(2) (which is itself nowadays deprecated in favor of other interfaces described on its manual page).
Notes
This system call is only present on Linux up until kernel 2.4; it was removed in Linux 2.6.
unimplemented() function
afs_syscall, break, fattach, fdetach, ftime, getmsg, getpmsg, gtty, isastream, lock, mpx, multiplexer, prof, profil, putmsg, putpmsg, security, stty, ulimit, vserver - unimplemented system calls
Synopsis
Unimplemented system calls.
Description
These system calls are not implemented in the Linux 2.4 kernel.
Return value
These system calls always return -1 and set errno to ENOSYS.
Notes
Note that ftime(3), profil(3) and ulimit(3) are implemented as library functions.
Some system calls, like alloc_hugepages(2), free_hugepages(2), ioperm(2), iopl(2), and vm86(2) only exist on certain architectures.
Some system calls, like ipc(2), create_module(2), init_module(2), and delete_module(2) only exist when the Linux kernel was built with support for them.
getpagesize() function
Synopsis
#include <unistd.h>
int getpagesize(void);
Description
The function getpagesize() returns the number of bytes in a page, where a "page" is the thing used where it says in the description of mmap(2) that files are mapped in page-sized units.
The size of the kind of pages that mmap() uses is found using
#include <unistd.h>
long sz = sysconf(_SC_PAGESIZE);
(where some systems also allow the synonym _SC_PAGE_SIZE for _SC_PAGESIZE), or
#include <unistd.h>
int sz = getpagesize();
HISTORY
This call first appeared in 4.2BSD.
Conforming to
SVr4, 4.4BSD, SUSv2. In SUSv2 the getpagesize() call is labeled LEGACY, and in POSIX.1-2001 it has been dropped. HP-UX does not have this call.
Notes
Whether getpagesize() is present as a Linux system call depends on the architecture. If it is, it returns the kernel symbol PAGE_SIZE, which is architecture and machine model dependent. Generally, one uses binaries that are architecture but not machine model dependent, in order to have a single binary distribution per architecture. This means that a user program should not find PAGE_SIZE at compile time from a header file, but use an actual system call, at least for those architectures (like sun4) where this dependency exists. Here libc4, libc5, glibc 2.0 fail because their getpagesize() returns a statically derived value, and does not use a system call. Things are OK in glibc 2.1.
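getpeername() function
Synopsis
#include <sys/socket.h>
int getpeername(int s, struct sockaddr *name, socklen_t *namelen);
Description
getpeername() returns the name of the peer connected to socket s. The namelen argument should be initialized to indicate the amount of space pointed to by name. On return it contains the actual size (in bytes) of the name returned. The name is truncated if the buffer provided is too small.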
Return value
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
Errors
Tag | Description |
EBADF | The argument s is not a valid descriptor. |
EFAULT | The name parameter points to memory not in a valid part of the process address space. |
EINVAL | namelen is invalid (e.g., is negative). |
ENOBUFS | Insufficient resources were available in the system to perform the operation. |
ENOTCONN | The socket is not connected. |
ENOTSOCK | The argument s is a file, not a socket. |
Conforming to
SVr4, 4.4BSD (the getpeername() function call first appeared in 4.2BSD), POSIX.1-2001.
Notes
The third argument of getpeername() is in reality an int * (and this is what 4.x BSD and libc4 and libc5 have). Some POSIX confusion resulted in the present socklen_t, also used by glibc. See also accept(2).
setpgid() function
setpgid, getpgid, setpgrp, getpgrp - set/get process group
Synopsis
#include <unistd.h>
int setpgid(pid_t pid, pid_t pgid);
pid_t getpgid(pid_t pid);
int setpgrp(void);
pid_t getpgrp(void);
Description
setpgid() sets the process group ID of the process specified by pid to pgid. If pid is zero, the process ID of the current process is used. If pgid is zero, the process ID of the process specified by pid is used. If setpgid() is used to move a process from one process group to another (as is done by some shells when creating pipelines), both process groups must be part of the same session. In this case, the pgid specifies an existing process group to be joined and the session ID of that group must match the session ID of the joining process.
getpgid() returns the process group ID of the process specified by pid. If pid is zero, the process ID of the current process is used.
The call setpgrp() is equivalent to setpgid(0,0).
Similarly, getpgrp() is equivalent to getpgid(0) . Each process group is a member of a session and each process is a member of the session of which its process group is a member.
Process groups are used for distribution of signals, and by terminals to arbitrate requests for their input: processes that have the same process group as the terminal are foreground and may read, while others will block with a signal if they attempt to read. These calls are thus used by programs such as csh(1) to create process groups in implementing job control. The TIOCGPGRP and TIOCSPGRP calls described in termios(3) are used to get/set the process group of the controlling terminal.
If a session has a controlling terminal, CLOCAL is not set and a hangup occurs, then the session leader is sent a SIGHUP. If the session leader exits, the SIGHUP signal will be sent to each process in the foreground process group of the controlling terminal.
If the exit of the process causes a process group to become orphaned, and if any member of the newly-orphaned process group is stopped, then a SIGHUP signal followed by a SIGCONT signal will be sent to each process in the newly-orphaned process group.
Return value
On success, setpgid() and setpgrp() return zero. On error, -1 is returned, and errno is set appropriately.
getpgid() returns a process group on success. On error, -1 is returned, and errno is set appropriately.
getpgrp() always returns the current process group.
Errors
Tag | Description |
EACCES | An attempt was made to change the process group ID of one of the children of the calling process and the child had already performed an execve() (setpgid(), setpgrp()). |
EINVAL | pgid is less than 0 (setpgid(), setpgrp()). |
EPERM | An attempt was made to move a process into a process group in a different session, or to change the process group ID of one of the children of the calling process and the child was in a different session, or to change the process group ID of a session leader (setpgid(), setpgrp()). |
ESRCH | For getpgid(): pid does not match any process. For setpgid(): pid is not the current process and not a child of the current process. |
Conforming to
The functions setpgid() and getpgrp() conform to POSIX.1-2001. The function setpgrp() is from 4.2BSD. The function getpgid() conforms to SVr4.
Notes
A child created via fork(2) inherits its parent’s process group ID. The process group ID is preserved across an execve(2).
POSIX took setpgid() from the BSD function setpgrp(). Also System V has a function with the same name, but it is identical to setsid(2).
To get the prototypes under glibc, define both _XOPEN_SOURCE and _XOPEN_SOURCE_EXTENDED, or use "#define _XOPEN_SOURCE n" for some integer n larger than or equal to 500.
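getpid() function
Synopsis
#include <sys/types.h>
#include <unistd.h>
pid_t getpid(void);
pid_t getppid(void);
Description
getpid() returns the process ID of the current process. (This is often used by routines that generate unique temporary filenames.)
getppid() returns the process ID of the parent of the current process.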
Conforming to
POSIX.1-2001, 4.3BSD, SVr4
getpriority() function
getpriority, setpriority - get/set program scheduling priority
Synopsis
#include <sys/time.h>
#include <sys/resource.h>
int getpriority(int which, int who);
int setpriority(int which, int who, int prio);
Description
The scheduling priority of the process, process group, or user, as indicated by which and who, is obtained with the getpriority() call and set with the setpriority() call.
The value which is one of PRIO_PROCESS, PRIO_PGRP, or PRIO_USER, and who is interpreted relative to which (a process identifier for PRIO_PROCESS, process group identifier for PRIO_PGRP, and a user ID for PRIO_USER). A zero value for who denotes (respectively) the calling process, the process group of the calling process, or the real user ID of the calling process. Prio is a value in the range -20 to 19 (but see the Notes below). The default priority is 0; lower priorities cause more favorable scheduling.
The getpriority() call returns the highest priority (lowest numerical value) enjoyed by any of the specified processes. The setpriority() call sets the priorities of all of the specified processes to the specified value. Only the superuser may lower priorities.
Return value
Since getpriority() can legitimately return the value -1, it is necessary to clear the external variable errno prior to the call, then check it afterwards to determine if a -1 is an error or a legitimate value. The setpriority() call returns 0 if there is no error, or -1 if there is.
Errors
Tag | Description |
EINVAL | which was not one of PRIO_PROCESS, PRIO_PGRP, or PRIO_USER. |
ESRCH | No process was located using the which and who values specified. |
In addition to the errors indicated above, setpriority() may fail if: | |
EPERM | A process was located, but its effective user ID did not match either the effective or the real user ID of the caller, and was not privileged (on Linux: did not have the CAP_SYS_NICE capability). But see NOTES below. |
EACCES | The caller attempted to lower a process priority, but did not have the required privilege (on Linux: did not have the CAP_SYS_NICE capability). Since Linux 2.6.12, this error only occurs if the caller attempts to set a process priority outside the range of the RLIMIT_NICE soft resource limit of the target process; see getrlimit(2) for details. |
Notes
A child created by fork(2) inherits its parent’s nice value. The nice value is preserved across execve(2).
The details on the condition for EPERM depend on the system. The above description is what POSIX.1-2001 says, and seems to be followed on all System V-like systems. Linux kernels before 2.6.12 required the real or effective user ID of the caller to match the real user of the process who (instead of its effective user ID). Linux 2.6.12 and later require the effective user ID of the caller to match the real or effective user ID of the process who. All BSD-like systems (SunOS 4.1.3, Ultrix 4.2, 4.3BSD, FreeBSD 4.3, OpenBSD-2.5, ...) behave in the same manner as Linux >= 2.6.12.
The actual priority range varies between kernel versions. Linux before 1.3.36 had -infinity..15. Since kernel 1.3.43 Linux has the range -20..19. Within the kernel, nice values are actually represented using the corresponding range 40..1 (since negative numbers are error codes) and these are the values employed by the setpriority() and getpriority() system calls. The glibc wrapper functions for these system calls handle the translations between the user-land and kernel representations of the nice value according to the formula unice = 20 - knice.
On some systems, the range of nice values is -20..20.
Including <sys/time.h> is not required these days, but increases portability. (Indeed, <sys/resource.h> defines the rusage structure with fields of type struct timeval defined in <sys/time.h>.)
Conforming to
SVr4, 4.4BSD (these function calls first appeared in 4.2BSD), POSIX.1-2001.
getresuid() function
getresuid, getresgid - get real, effective and saved user or group ID
Synopsis
#define _GNU_SOURCE
#include <unistd.h>
int getresuid(uid_t *ruid, uid_t *euid, uid_t *suid);
int getresgid(gid_t *rgid, gid_t *egid, gid_t *sgid);
Description
getresuid() and getresgid() (both introduced in Linux 2.1.44) get the real UID, effective UID, and saved set-user-ID (resp. group ID’s) of the current process.
Return value
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
Errors
Tag | Description |
EFAULT | One of the arguments specified an address outside the calling program’s address space. |
Conforming to
These calls are non-standard; they also appear on HP-UX and some of the BSDs.
The prototype is given by glibc since version 2.3.2 provided _GNU_SOURCE is defined.
getrlimit() function
getrlimit, setrlimit - get/set resource limits
Synopsis
#include <sys/time.h>
#include <sys/resource.h>
int getrlimit(int resource, struct rlimit *rlim);
int setrlimit(int resource, const struct rlimit *rlim);
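Description
getrlimit() and setrlimit() get and set resource limits respectively. Each resource has an associated soft and hard limit, as defined by the rlimit structure (the rlim argument to both getrlimit() and setrlimit()):
struct rlimit {
    rlim_t rlim_cur;  /* Soft limit */
    rlim_t rlim_max;  /* Hard limit (ceiling for rlim_cur) */
};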
The soft limit is the value that the kernel enforces for the corresponding resource. The hard limit acts as a ceiling for the soft limit: an unprivileged process may only set its soft limit to a value in the range from 0 up to the hard limit, and (irreversibly) lower its hard limit. A privileged process (under Linux: one with the CAP_SYS_RESOURCE capability) may make arbitrary changes to either limit value.
The value RLIM_INFINITY denotes no limit on a resource (both in the structure returned by getrlimit() and in the structure passed to setrlimit()).
resource must be one of:
Tag | Description |
RLIMIT_AS | The maximum size of the process’s virtual memory (address space) in bytes. This limit affects calls to brk(2), mmap(2) and mremap(2), which fail with the error ENOMEM upon exceeding this limit. Also automatic stack expansion will fail (and generate a SIGSEGV that kills the process if no alternate stack has been made available via sigaltstack(2)). Since the value is a long, on machines with a 32-bit long either this limit is at most 2 GiB, or this resource is unlimited. |
RLIMIT_CORE | Maximum size of core file. When 0, no core dump files are created. When non-zero, larger dumps are truncated to this size. |
RLIMIT_CPU | CPU time limit in seconds. When the process reaches the soft limit, it is sent a SIGXCPU signal. The default action for this signal is to terminate the process. However, the signal can be caught, and the handler can return control to the main program. If the process continues to consume CPU time, it will be sent SIGXCPU once per second until the hard limit is reached, at which time it is sent SIGKILL. (This latter point describes Linux 2.2 through 2.6 behaviour. Implementations vary in how they treat processes which continue to consume CPU time after reaching the soft limit. Portable applications that need to catch this signal should perform an orderly termination upon first receipt of SIGXCPU.) |
RLIMIT_DATA | The maximum size of the process’s data segment (initialized data, uninitialized data, and heap). This limit affects calls to brk() and sbrk(), which fail with the error ENOMEM upon encountering the soft limit of this resource. |
RLIMIT_FSIZE | The maximum size of files that the process may create. Attempts to extend a file beyond this limit result in delivery of a SIGXFSZ signal. By default, this signal terminates a process, but a process can catch this signal instead, in which case the relevant system call (e.g., write(), truncate()) fails with the error EFBIG. |
RLIMIT_LOCKS (Early Linux 2.4 only) | A limit on the combined number of flock() locks and fcntl() leases that this process may establish. |
RLIMIT_MEMLOCK | The maximum number of bytes of memory that may be locked into RAM. In effect this limit is rounded down to the nearest multiple of the system page size. This limit affects mlock(2) and mlockall(2) and the mmap(2) MAP_LOCKED operation. Since Linux 2.6.9 it also affects the shmctl(2) SHM_LOCK operation, where it sets a maximum on the total bytes in shared memory segments (see shmget(2)) that may be locked by the real user ID of the calling process. The shmctl(2) SHM_LOCK locks are accounted for separately from the per-process memory locks established by mlock(2), mlockall(2), and mmap(2) MAP_LOCKED; a process can lock bytes up to this limit in each of these two categories. In Linux kernels before 2.6.9, this limit controlled the amount of memory that could be locked by a privileged process. Since Linux 2.6.9, no limits are placed on the amount of memory that a privileged process may lock, and this limit instead governs the amount of memory that an unprivileged process may lock. |
RLIMIT_MSGQUEUE (Since Linux 2.6.8) | Specifies the limit on the number of bytes that can be allocated for POSIX message queues for the real user ID of the calling process. This limit is enforced for mq_open(3). Each message queue that the user creates counts (until it is removed) against this limit according to the formula: bytes = attr.mq_maxmsg * sizeof(struct msg_msg *) + attr.mq_maxmsg * attr.mq_msgsize, where attr is the mq_attr structure specified as the fourth argument to mq_open(). The first addend in the formula, which includes sizeof(struct msg_msg *) (4 bytes on Linux/x86), ensures that the user cannot create an unlimited number of zero-length messages (such messages nevertheless each consume some system memory for bookkeeping overhead). |
RLIMIT_NICE (since kernel 2.6.12, but see BUGS below) | Specifies a ceiling to which the process’s nice value can be raised using setpriority(2) or nice(2). The actual ceiling for the nice value is calculated as 20 - rlim_cur. (This strangeness occurs because negative numbers cannot be specified as resource limit values, since they typically have special meanings. For example, RLIM_INFINITY typically is the same as -1.) |
RLIMIT_NOFILE | Specifies a value one greater than the maximum file descriptor number that can be opened by this process. Attempts (open(), pipe(), dup(), etc.) to exceed this limit yield the error EMFILE. |
RLIMIT_NPROC | The maximum number of threads that can be created for the real user ID of the calling process. Upon encountering this limit, fork() fails with the error EAGAIN. |
RLIMIT_RSS | Specifies the limit (in pages) of the process’s resident set (the number of virtual pages resident in RAM). This limit only has effect in Linux 2.4.x, x < 30, and there only affects calls to madvise() specifying MADV_WILLNEED. |
RLIMIT_RTPRIO (Since Linux 2.6.12, but see BUGS) | Specifies a ceiling on the real-time priority that may be set for this process using sched_setscheduler(2) and sched_setparam(2). |
RLIMIT_SIGPENDING (Since Linux 2.6.8) | Specifies the limit on the number of signals that may be queued for the real user ID of the calling process. Both standard and real-time signals are counted for the purpose of checking this limit. However, the limit is only enforced for sigqueue(2); it is always possible to use kill(2) to queue one instance of any of the signals that are not already queued to the process. |
RLIMIT_STACK | The maximum size of the process stack, in bytes. Upon reaching this limit, a SIGSEGV signal is generated. To handle this signal, a process must employ an alternate signal stack (sigaltstack(2)). |
RLIMIT_OFILE is the BSD name for RLIMIT_NOFILE.
Return value
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
Errors
Tag | Description |
EFAULT | rlim points outside the accessible address space. |
EINVAL | resource is not valid; or, for setrlimit(): rlim->rlim_cur was greater than rlim->rlim_max. |
EPERM | An unprivileged process tried to use setrlimit() to increase a soft or hard limit above the current hard limit; the CAP_SYS_RESOURCE capability is required to do this. Or, the process tried to use setrlimit() to increase the soft or hard RLIMIT_NOFILE limit above the current kernel maximum (NR_OPEN). |
BUGS
In older Linux kernels, the SIGXCPU and SIGKILL signals delivered when a process encountered the soft and hard RLIMIT_CPU limits were delivered one (CPU) second later than they should have been. This was fixed in kernel 2.6.8.
In 2.6.x kernels before 2.6.17, a RLIMIT_CPU limit of 0 is wrongly treated as "no limit" (like RLIM_INFINITY). Since kernel 2.6.17, setting a limit of 0 does have an effect, but is actually treated as a limit of 1 second.
A kernel bug means that RLIMIT_RTPRIO does not work in kernel 2.6.12; the problem is fixed in kernel 2.6.13.
In kernel 2.6.12, there was an off-by-one mismatch between the priority ranges returned by getpriority(2) and RLIMIT_NICE. This had the effect that the actual ceiling for the nice value was calculated as 19 - rlim_cur. This was fixed in kernel 2.6.13.
Kernels before 2.4.22 did not diagnose the error EINVAL for setrlimit() when rlim->rlim_cur was greater than rlim->rlim_max.
NOTES
A child process created via fork(2) inherits its parent’s resource limits. Resource limits are preserved across execve(2).
CONFORMING TO
SVr4, 4.3BSD, POSIX.1-2001. RLIMIT_MEMLOCK and RLIMIT_NPROC derive from BSD and are not specified in POSIX.1-2001; they are present on the BSDs and Linux, but on few other implementations. RLIMIT_RSS derives from BSD and is not specified in POSIX.1-2001; it is nevertheless present on most implementations. RLIMIT_MSGQUEUE, RLIMIT_NICE, RLIMIT_RTPRIO, and RLIMIT_SIGPENDING are Linux specific.
SEE ALSO
- dup (2)
- fcntl (2)
- fork (2)
- getrusage (2)
- mlock (2)
- mmap (2)
- open (2)
- quotactl (2)
- sbrk (2)
- shmctl (2)
- sigqueue (2)
get_robust_list() function
get_robust_list, set_robust_list - get/set the list of robust futexes
SYNOPSIS
#include <linux/futex.h>
long get_robust_list(int pid, struct robust_list_head **head_ptr, size_t *len_ptr);
long set_robust_list(struct robust_list_head *head, size_t len);
DESCRIPTION
The robust futex implementation needs to maintain per-thread lists of robust futexes which are unlocked when the thread exits. These lists are managed in user space; the kernel is only notified about the location of the head of the list.
get_robust_list returns the head of the robust futex list of the thread with TID defined by the pid argument. If pid is 0, the returned head belongs to the current thread. head_ptr is the pointer to the head of the list of robust futexes. The get_robust_list function stores the address of the head of the list here. len_ptr is the pointer to the length variable. get_robust_list stores sizeof(**head_ptr) here.
set_robust_list sets the head of the list of robust futexes owned by the current thread to head. len is the size of *head.
RETURN VALUE
The set_robust_list and get_robust_list functions return zero when the operation is successful, an error code otherwise.
ERRORS
The set_robust_list function fails with EINVAL if the len value does not match the size of structure struct robust_list_head expected by kernel.
The get_robust_list function fails with EPERM if the current process does not have permission to see the robust futex list of the thread with the TID pid, ESRCH if a thread with the TID pid does not exist, or EFAULT if the head of the robust futex list can’t be stored in the space specified by the head argument.
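NOTES
A thread can have only one robust futex list; applications that wish to use this functionality should therefore use the robust mutexes provided by glibc.
The system call is only available for debugging purposes and is not needed for normal operations.
Both system calls are not available to application programs as functions; they can be called using the syscall(3) function.
SEE ALSO
getrusage() function
SYNOPSIS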
#include <sys/time.h>
#include <sys/resource.h>
int getrusage(int who, struct rusage *usage);
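DESCRIPTION
getrusage() returns the current resource usages, for a who of either RUSAGE_SELF or RUSAGE_CHILDREN. The former asks for the resources used by the current process, the latter for the resources used by those of its children that have terminated and have been waited for.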
struct rusage { |
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
EFAULT | usage points outside the accessible address space. |
EINVAL | who is invalid. |
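As a minimal sketch of reading the two fields that POSIX.1-2001 guarantees (ru_utime and ru_stime), the following reports the CPU time consumed by the calling process. The helper names tv_seconds and cpu_time_used are hypothetical.

```c
#include <sys/time.h>
#include <sys/resource.h>

/* Convert a struct timeval to seconds as a double. */
static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Hypothetical helper: store the user and system CPU time consumed so far
 * by the calling process. Returns 0 on success, -1 on error. */
int cpu_time_used(double *user_s, double *system_s)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) == -1)
        return -1;

    *user_s = tv_seconds(ru.ru_utime);
    *system_s = tv_seconds(ru.ru_stime);
    return 0;
}
```

Passing RUSAGE_CHILDREN instead of RUSAGE_SELF would report the same fields for terminated, waited-for children.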
CONFORMING TO
SVr4, 4.3BSD. POSIX.1-2001 specifies getrusage(), but only specifies the fields ru_utime and ru_stime.
NOTES
Including <sys/time.h> is not required these days, but increases portability. (Indeed, struct timeval is defined in <sys/time.h>.)
In Linux kernel versions before 2.6.9, if the disposition of SIGCHLD is set to SIG_IGN then the resource usages of child processes are automatically included in the value returned by RUSAGE_CHILDREN, although POSIX.1-2001 explicitly prohibits this. This non-conformance is rectified in Linux 2.6.9 and later.
The above struct was taken from 4.3BSD Reno. Not all fields are meaningful under Linux. In Linux 2.4 only the fields ru_utime, ru_stime, ru_minflt, and ru_majflt are maintained. Since Linux 2.6, ru_nvcsw and ru_nivcsw are also maintained.
SEE ALSO
getsid() function
SYNOPSIS
#include <unistd.h>
pid_t getsid(pid_t pid);
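DESCRIPTION
getsid(0) returns the session ID of the calling process. getsid(p) returns the session ID of the process with process ID p. (The session ID of a process is the process group ID of the session leader.) On error, (pid_t) -1 will be returned, and errno is set appropriately.
ERRORS
Tag | Description |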
EPERM | A process with process ID p exists, but it is not in the same session as the current process, and the implementation considers this an error. |
ESRCH | No process with process ID p was found. |
CONFORMING TO
SVr4, POSIX.1-2001.
NOTES
Linux does not return EPERM.
Linux has this system call since Linux 1.3.44. There is libc support since libc 5.2.19.
To get the prototype under glibc, define both _XOPEN_SOURCE and _XOPEN_SOURCE_EXTENDED, or use "#define _XOPEN_SOURCE n" for some integer n larger than or equal to 500.
SEE ALSO
getsockname() function
SYNOPSIS
#include <sys/socket.h>
int getsockname(int s, struct sockaddr *name, socklen_t *namelen);
DESCRIPTION
getsockname() returns the current name for the specified socket. The namelen parameter should be initialized to indicate the amount of space pointed to by name. On return it contains the actual size of the name returned (in bytes).
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
EBADF | The argument s is not a valid descriptor. |
EFAULT | The name parameter points to memory not in a valid part of the process address space. |
EINVAL | namelen is invalid (e.g., is negative). |
ENOBUFS | Insufficient resources were available in the system to perform the operation. |
ENOTSOCK | The argument s is a file, not a socket. |
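A common use of getsockname() is to discover which port the kernel assigned after binding to port 0. The sketch below assumes an IPv4 TCP socket on the loopback address; the helper names open_loopback_socket and local_port are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket bound to 127.0.0.1 with port 0,
 * which asks the kernel to pick a free port. Returns the descriptor or -1. */
int open_loopback_socket(void)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Return the local port of an AF_INET socket in host byte order, or -1. */
int local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;   /* value-result: in = buffer size */

    memset(&addr, 0, sizeof addr);
    if (getsockname(fd, (struct sockaddr *)&addr, &len) == -1)
        return -1;
    if (addr.sin_family != AF_INET)
        return -1;
    return ntohs(addr.sin_port);
}
```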
CONFORMING TO
SVr4, 4.4BSD (the getsockname() function call appeared in 4.2BSD), POSIX.1-2001.
NOTES
The third argument of getsockname() is in reality an ‘int *’ (and this is what 4.x BSD and libc4 and libc5 have). Some POSIX confusion resulted in the present socklen_t, also used by glibc. See also accept(2).
SEE ALSO
getsockopt() function
getsockopt, setsockopt - get and set options on sockets
SYNOPSIS
#include <sys/types.h>
#include <sys/socket.h>
int getsockopt(int s, int level, int optname, void *optval, socklen_t *optlen);
int setsockopt(int s, int level, int optname, const void *optval, socklen_t optlen);
DESCRIPTION
getsockopt() and setsockopt() manipulate the options associated with a socket. Options may exist at multiple protocol levels; they are always present at the uppermost socket level.
When manipulating socket options the level at which the option resides and the name of the option must be specified. To manipulate options at the socket level, level is specified as SOL_SOCKET. To manipulate options at any other level the protocol number of the appropriate protocol controlling the option is supplied. For example, to indicate that an option is to be interpreted by the TCP protocol, level should be set to the protocol number of TCP; see getprotoent(3).
The parameters optval and optlen are used to access option values for setsockopt(). For getsockopt() they identify a buffer in which the value for the requested option(s) are to be returned. For getsockopt(), optlen is a value-result parameter, initially containing the size of the buffer pointed to by optval, and modified on return to indicate the actual size of the value returned. If no option value is to be supplied or returned, optval may be NULL.
Optname and any specified options are passed uninterpreted to the appropriate protocol module for interpretation. The include file <sys/socket.h> contains definitions for socket level options, described below. Options at other protocol levels vary in format and name; consult the appropriate entries in section 4 of the manual.
Most socket-level options utilize an int parameter for optval. For setsockopt(), the parameter should be non-zero to enable a boolean option, or zero if the option is to be disabled.
For a description of the available socket options see socket(7) and the appropriate protocol man pages.
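The int-valued, socket-level case described above can be sketched as follows. This is only an illustration for SOL_SOCKET options that take an int; the helper names set_socket_flag and get_socket_int are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Enable (on != 0) or disable (on == 0) a boolean SOL_SOCKET option. */
int set_socket_flag(int fd, int optname, int on)
{
    int value = on ? 1 : 0;

    return setsockopt(fd, SOL_SOCKET, optname, &value, sizeof value);
}

/* Read an int-valued SOL_SOCKET option into *value. Returns 0 or -1. */
int get_socket_int(int fd, int optname, int *value)
{
    socklen_t len = sizeof *value;   /* in: buffer size, out: bytes written */

    return getsockopt(fd, SOL_SOCKET, optname, value, &len);
}
```

For example, set_socket_flag(fd, SO_REUSEADDR, 1) enables address reuse, and get_socket_int(fd, SO_TYPE, &type) retrieves the socket type.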
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
EBADF | The argument s is not a valid descriptor. |
EFAULT | The address pointed to by optval is not in a valid part of the process address space. For getsockopt(), this error may also be returned if optlen is not in a valid part of the process address space. |
EINVAL | optlen invalid in setsockopt(). |
ENOPROTOOPT | The option is unknown at the level indicated. |
ENOTSOCK | The argument s is a file, not a socket. |
CONFORMING TO
SVr4, 4.4BSD (these system calls first appeared in 4.2BSD), POSIX.1-2001.
NOTES
The optlen argument of getsockopt() and setsockopt() is in reality an int (an int * for getsockopt()), and this is what 4.x BSD and libc4 and libc5 have. Some POSIX confusion resulted in the present socklen_t, also used by glibc. See also accept(2).
BUGS
Several of the socket options should be handled at lower levels of the system.
SEE ALSO
get_thread_area() function
get_thread_area - get a Thread Local Storage (TLS) area
SYNOPSIS
#include <linux/unistd.h>
#include <asm/ldt.h>
int get_thread_area(struct user_desc *u_info);
DESCRIPTION
get_thread_area() returns an entry in the current thread’s Thread Local Storage (TLS) array. The index of the entry corresponds to the value of u_info->entry_number, passed in by the user. If the value is in bounds, get_thread_area() copies the corresponding TLS entry into the area pointed to by u_info.
RETURN VALUE
get_thread_area() returns 0 on success. Otherwise, it returns -1 and sets errno appropriately.
ERRORS
Tag | Description |
EFAULT | u_info is an invalid pointer. |
EINVAL | u_info->entry_number is out of bounds. |
CONFORMING TO
get_thread_area() is Linux specific and should not be used in programs that are intended to be portable.
AVAILABILITY
A version of get_thread_area() first appeared in Linux 2.5.32.
SEE ALSO
gettid() function
SYNOPSIS
#include <sys/types.h>
pid_t gettid(void);
DESCRIPTION
gettid() returns the thread ID of the current process. This is equal to the process ID (as returned by getpid(2)), unless the process is part of a thread group (created by specifying the CLONE_THREAD flag to the clone(2) system call). All processes in the same thread group have the same PID, but each one has a unique TID.
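RETURN VALUE
On success, returns the thread ID of the current process.
ERRORS
This call is always successful.
CONFORMING TO
gettid() is Linux specific and should not be used in programs that are intended to be portable.
NOTES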
Glibc does not provide a wrapper for this system call; call it using syscall(2).
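Calling the system call through syscall(2) might look like the sketch below. The wrapper name current_tid is hypothetical; it is chosen to avoid clashing with the gettid() wrapper that newer glibc releases (2.30 and later) do provide.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: return the caller's kernel thread ID via syscall(). */
pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}
```

In a single-threaded process the value returned equals getpid(); in a multithreaded process only the thread group leader sees that equality.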
SEE ALSO
gettimeofday() function
gettimeofday, settimeofday - get / set time
SYNOPSIS
#include <sys/time.h>
int gettimeofday(struct timeval *tv, struct timezone *tz);
int settimeofday(const struct timeval *tv, const struct timezone *tz);
DESCRIPTION
The functions gettimeofday() and settimeofday() can get and set the time as well as a timezone. The tv argument is a struct timeval (as specified in <sys/time.h>):
struct timeval { |
and gives the number of seconds and microseconds since the Epoch (see time(2)). The tz argument is a struct timezone:
struct timezone { |
If either tv or tz is NULL, the corresponding structure is not set or returned.
The use of the timezone structure is obsolete; the tz argument should normally be specified as NULL. The tz_dsttime field has never been used under Linux; it has not been and will not be supported by libc or glibc. Each and every occurrence of this field in the kernel source (other than the declaration) is a bug. Thus, the following is purely of historic interest.
The field tz_dsttime contains a symbolic constant (values are given below) that indicates in which part of the year Daylight Saving Time is in force. (Note: its value is constant throughout the year: it does not indicate that DST is in force, it just selects an algorithm.) The daylight saving time algorithms defined are as follows :
DST_NONE /* not on dst */
DST_USA /* USA style dst */
DST_AUST /* Australian style dst */
DST_WET /* Western European dst */
DST_MET /* Middle European dst */
DST_EET /* Eastern European dst */
DST_CAN /* Canada */
DST_GB /* Great Britain and Eire */
DST_RUM /* Rumania */
DST_TUR /* Turkey */
DST_AUSTALT /* Australian style with shift in 1986 */
Of course it turned out that the period in which Daylight Saving Time is in force cannot be given by a simple algorithm, one per country; indeed, this period is determined by unpredictable political decisions. So this method of representing time zones has been abandoned. Under Linux, in a call to settimeofday() the tz_dsttime field should be zero.
Under Linux there are some peculiar ‘warp clock’ semantics associated with the settimeofday() system call if on the very first call (after booting) that has a non-NULL tz argument, the tv argument is NULL and the tz_minuteswest field is non-zero. In such a case it is assumed that the CMOS clock is on local time, and that it has to be incremented by this amount to get UTC system time. No doubt it is a bad idea to use this feature.
The following macros are defined to operate on a struct timeval:
#define timerisset(tvp)\ |
RETURN VALUE
gettimeofday() and settimeofday() return 0 for success, or -1 for failure (in which case errno is set appropriately).
ERRORS
Tag | Description |
EFAULT | One of tv or tz pointed outside the accessible address space. |
EINVAL | Timezone (or something else) is invalid. |
EPERM | The calling process has insufficient privilege to call settimeofday(); under Linux the CAP_SYS_TIME capability is required. |
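A minimal sketch of the usual pattern: call gettimeofday() with tz set to NULL and do arithmetic on the seconds and microseconds fields. The helper names timeval_diff_us and now_us are hypothetical.

```c
#include <stddef.h>
#include <sys/time.h>

/* Microseconds from *start to *end (negative if end precedes start). */
long long timeval_diff_us(const struct timeval *start, const struct timeval *end)
{
    return (long long)(end->tv_sec - start->tv_sec) * 1000000LL
         + (long long)(end->tv_usec - start->tv_usec);
}

/* Hypothetical helper: current time since the Epoch in microseconds, or -1. */
long long now_us(void)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) == -1)   /* tz is obsolete: pass NULL */
        return -1;
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}
```

Because the wall clock can be set with settimeofday(), differences between two readings are not guaranteed to be positive.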
NOTES
The prototype for settimeofday() and the defines for timercmp, timerisset, timerclear, timeradd, timersub are (since glibc 2.2.2) only available if _BSD_SOURCE is defined.
Traditionally, the fields of struct timeval were longs.
CONFORMING TO
SVr4, 4.3BSD. POSIX.1-2001 describes gettimeofday() but not settimeofday().
SEE ALSO
getuid() function
SYNOPSIS
#include <unistd.h>
#include <sys/types.h>
uid_t getuid(void);
uid_t geteuid(void);
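DESCRIPTION
getuid() returns the real user ID of the current process.
geteuid() returns the effective user ID of the current process.
ERRORS
These functions are always successful.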
CONFORMING TO
POSIX.1-2001, 4.3BSD.
HISTORY
In Unix V6 the getuid() call returned (euid << 8) + uid. Unix V7 introduced separate calls getuid() and geteuid().
SEE ALSO
getunwind() function
SYNOPSIS
#include <syscall.h>
long getunwind (void *buf, size_t buf_size);
DESCRIPTION
The sys_getunwind function returns the size of the unwind table, which describes the gate page (kernel code that is mapped into user space).
The unwind data is copied to the buffer buf, which has size buf_size. The data is copied only if buf_size is greater than or equal to the size of the unwind data and buf is not NULL. The system call returns the size of the unwind data in both cases.
The first part of the unwind data contains an unwind table. The rest contains the associated unwind info in random order. The unwind table contains a table looking like:
u64 start; (64-bit address of start of function) |
An entry with a START address of zero is the end of table. For more information about the format you can see the IA-64 Software Conventions and Runtime Architecture.
RETURN VALUE
The sys_getunwind function returns the size of the unwind table.
ERRORS
The sys_getunwind function fails with EFAULT if the unwind info can’t be stored in the space specified by the buf argument.
AVAILABILITY
This system call is available only on the IA-64 architecture.
NOTES
This system call has been deprecated. It’s highly recommended to get at the kernel’s unwind info by the gate DSO. The address of the ELF header for this DSO is passed to user level via AT_SYSINFO_EHDR.
The system call is not available to application programs as a function; it can be called using the syscall(2) function.
SEE ALSO
gtty() function
afs_syscall, break, fattach, fdetach, ftime, getmsg, getpmsg, gtty, isastream, lock, mpx, multiplexer, prof, profil, putmsg, putpmsg, security, stty, ulimit, vserver - unimplemented system calls
SYNOPSIS
Unimplemented system calls.
DESCRIPTION
These system calls are not implemented in the Linux 2.4 kernel.
RETURN VALUE
These system calls always return -1 and set errno to ENOSYS.
NOTES
Note that ftime(3), profil(3) and ulimit(3) are implemented as library functions.
Some system calls, like alloc_hugepages(2), free_hugepages(2), ioperm(2), iopl(2), and vm86(2) only exist on certain architectures.
Some system calls, like ipc(2), create_module(2), init_module(2), and delete_module(2) only exist when the Linux kernel was built with support for them.
SEE ALSO
idle() function
SYNOPSIS
#include <unistd.h>
int idle(void);
DESCRIPTION
idle() is an internal system call used during bootstrap. It marks the process’s pages as swappable, lowers its priority, and enters the main scheduling loop. idle() never returns.
Only process 0 may call idle(). Any user process, even a process with superuser permission, will receive EPERM.
RETURN VALUE
idle() never returns for process 0, and always returns -1 for a user process.
ERRORS
Tag | Description |
EPERM | Always, for a user process. |
CONFORMING TO
This function is Linux-specific, and should not be used in programs intended to be portable.
NOTES
Since 2.3.13 this system call does not exist anymore.
outb() function
outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O
DESCRIPTION
This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.
They are primarily designed for internal kernel use, but can be used from user space.
You compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.
CONFORMING TO
outb() and friends are hardware specific. The value argument is passed first and the port argument is passed second, which is the opposite order from most DOS implementations.
SEE ALSO
inb_p() function
outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O
DESCRIPTION
This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.
They are primarily designed for internal kernel use, but can be used from user space.
You compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.
CONFORMING TO
outb() and friends are hardware specific. The value argument is passed first and the port argument is passed second, which is the opposite order from most DOS implementations.
SEE ALSO
inl() function
outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O
DESCRIPTION
This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.
They are primarily designed for internal kernel use, but can be used from user space.
You compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.
CONFORMING TO
outb() and friends are hardware specific. The value argument is passed first and the port argument is passed second, which is the opposite order from most DOS implementations.
SEE ALSO
inl_p() function
outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O
DESCRIPTION
This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.
They are primarily designed for internal kernel use, but can be used from user space.
You compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.
CONFORMING TO
outb() and friends are hardware specific. The value argument is passed first and the port argument is passed second, which is the opposite order from most DOS implementations.
SEE ALSO
inotify_add_watch() function
inotify_add_watch - add a watch to an initialized inotify instance
SYNOPSIS
#include <sys/inotify.h>
int inotify_add_watch(int fd, const char *pathname, uint32_t mask);
DESCRIPTION
inotify_add_watch() adds a new watch, or modifies an existing watch, for the file whose location is specified in pathname; the caller must have read permission for this file. The fd argument is a file descriptor referring to the inotify instance whose watch list is to be modified. The events to be monitored for pathname are specified in the mask bit-mask argument. See inotify(7) for a description of the bits that can be set in mask.
A successful call to inotify_add_watch() returns the unique watch descriptor associated with pathname for this inotify instance. If pathname was not previously being watched by this inotify instance, then the watch descriptor is newly allocated. If pathname was already being watched, then the descriptor for the existing watch is returned.
The watch descriptor is returned by later read(2)s from the inotify file descriptor. These reads fetch inotify_event structures indicating file system events; the returned watch descriptor identifies the object for which the event occurred.
RETURN VALUE
On success, inotify_add_watch() returns a non-negative watch descriptor. On error -1 is returned and errno is set appropriately.
ERRORS
Tag | Description |
EACCES | Read access to the given file is not permitted. |
EBADF | The given file descriptor is not valid. |
EFAULT | pathname points outside of the process’s accessible address space. |
EINVAL | The given event mask contains no legal events; or fd is not an inotify file descriptor. |
ENOMEM | Insufficient kernel memory was available. |
ENOSPC | The user limit on the total number of inotify watches was reached or the kernel failed to allocate a needed resource. |
HISTORY
Inotify was merged into the 2.6.13 Linux kernel.
CONFORMING TO
This system call is Linux specific.
SEE ALSO
inotify_init() function
SYNOPSIS
#include <sys/inotify.h>
int inotify_init(void);
DESCRIPTION
inotify_init() initializes a new inotify instance and returns a file descriptor associated with a new inotify event queue.
RETURN VALUE
On success, inotify_init() returns a new file descriptor, or -1 if an error occurred (in which case, errno is set appropriately).
ERRORS
Tag | Description |
EMFILE | The user limit on the total number of inotify instances has been reached. |
ENFILE | The system limit on the total number of file descriptors has been reached. |
ENOMEM | Insufficient kernel memory is available. |
HISTORY
Inotify was merged into the 2.6.13 Linux kernel.
CONFORMING TO
This system call is Linux specific.
SEE ALSO
inotify_rm_watch() function
inotify_rm_watch - remove an existing watch from an inotify instance
SYNOPSIS
#include <sys/inotify.h>
int inotify_rm_watch(int fd, uint32_t wd);
DESCRIPTION
inotify_rm_watch() removes the watch associated with the watch descriptor wd from the inotify instance associated with the file descriptor fd.
Removing a watch causes an IN_IGNORED event to be generated for this watch descriptor. (See inotify(7).)
RETURN VALUE
On success, inotify_rm_watch() returns zero, or -1 if an error occurred (in which case, errno is set appropriately).
ERRORS
Tag | Description |
EBADF | fd is not a valid file descriptor. |
EINVAL | The watch descriptor wd is not valid; or fd is not an inotify file descriptor. |
HISTORY
Inotify was merged into the 2.6.13 Linux kernel.
CONFORMING TO
This system call is Linux specific.
SEE ALSO
outb() function
outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O
DESCRIPTION
This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.
They are primarily designed for internal kernel use, but can be used from user space.
You compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.
CONFORMING TO
outb() and friends are hardware specific. The value argument is passed first and the port argument is passed second, which is the opposite order from most DOS implementations.
SEE ALSO
insl() function
outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O
DESCRIPTION
This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.
They are primarily designed for internal kernel use, but can be used from user space.
You compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.
CONFORMING TO
outb() and friends are hardware specific. The value argument is passed first and the port argument is passed second, which is the opposite order from most DOS implementations.
SEE ALSO
insw() function
outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O
DESCRIPTION
This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.
They are primarily designed for internal kernel use, but can be used from user space.
You compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.
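CONFORMING TO
outb() and friends are hardware specific. The value argument is passed first and the port argument is passed second, which is the opposite order from most DOS implementations.
SEE ALSO
intro() function
DESCRIPTION
Section 2 of the manual describes the Linux system calls. A system call is an entry point into the Linux kernel. Usually, system calls are not invoked directly: instead, most system calls have corresponding C library wrapper functions which perform the steps required (e.g., trapping to kernel mode) in order to invoke the system call. Thus, making a system call looks the same as invoking a normal library function.
For a list of the Linux system calls, see syscalls(2).
RETURN VALUE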
On error, most system calls return a negative error number (i.e., the negated value of one of the constants described in errno(3)). The C library wrapper hides this detail from the caller: when a system call returns a negative value, the wrapper copies the absolute value into the errno variable, and returns -1 as the return value of the wrapper.
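The value returned by a successful system call depends on the call. Many system calls return 0 on success, but some can return nonzero values from a successful call. The details are described in the individual manual pages.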
In some cases, the programmer must define a feature test macro in order to obtain the declaration of a system call from the header file specified in the man page SYNOPSIS section. In such cases, the required macro is described in the man page. For further information on feature test macros, see feature_test_macros(7).
CONFORMING TO
Certain terms and abbreviations are used to indicate Unix variants and standards to which calls in this section conform. See standards(7).
NOTES
Calling Directly
In most cases, it is unnecessary to invoke a system call directly, but there are times when the Standard C library does not implement a nice wrapper function for you. In this case, the programmer must manually invoke the system call using syscall(2). Historically, this was also possible using one of the _syscall macros described in _syscall(2).
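A small sketch of direct invocation through syscall(2), which follows the same convention as the library wrappers: on failure it returns -1 and places the error number in errno. The helper names raw_getpid and raw_close are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke getpid through the generic syscall() entry point. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* Close a descriptor with a raw system call. Returns 0 on success or the
 * errno value on failure. */
int raw_close(int fd)
{
    if (syscall(SYS_close, fd) == -1)
        return errno;
    return 0;
}
```

For instance, raw_close(-1) reports EBADF, the same error the close(2) wrapper would leave in errno.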
Authors and Copyright Conditions
Look at the header of the manual page source for the author(s) and copyright conditions. Note that these can be different from page to page!
SEE ALSO
This page is part of release 3.00 of the Linux man-pages project. A description of the project, and information about reporting bugs, can be found at http://www.kernel.org/doc/man-pages/.
inw() function
outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O
DESCRIPTION
This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.
They are primarily designed for internal kernel use, but can be used from user space.
You compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.
CONFORMING TO
outb() and friends are hardware specific. The value argument is passed first and the port argument is passed second, which is the opposite order from most DOS implementations.
SEE ALSO
inw_p() function
outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O
DESCRIPTION
This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.
They are primarily designed for internal kernel use, but can be used from user space.
You compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.
CONFORMING TO
outb() and friends are hardware specific. The value argument is passed first and the port argument is passed second, which is the opposite order from most DOS implementations.
SEE ALSO
io_cancel() function
SYNOPSIS
#include <libaio.h>
long io_cancel (aio_context_t ctx_id, struct iocb *iocb, struct io_event *result);
DESCRIPTION
io_cancel() attempts to cancel an asynchronous I/O operation previously submitted with the io_submit system call. ctx_id is the AIO context ID of the operation to be cancelled. If the AIO context is found, the event will be cancelled and then copied into the memory pointed to by result without being placed into the completion queue.
RETURN VALUE
io_cancel() returns 0 on success; otherwise, it returns one of the errors listed in the "Errors" section.
ERRORS
Tag | Description |
EINVAL | The AIO context specified by ctx_id is invalid. |
EFAULT | One of the data structures points to invalid data. |
EAGAIN | The iocb specified was not cancelled. |
ENOSYS | io_cancel() is not implemented on this architecture. |
VERSIONS
The asynchronous I/O system calls first appeared in Linux 2.5, August 2002.
CONFORMING TO
io_cancel() is Linux specific and should not be used in programs that are intended to be portable.
SEE ALSO
io_setup(2), io_destroy(2), io_getevents(2), io_submit(2).
NOTES
The asynchronous I/O system calls were written by Benjamin LaHaise.
AUTHOR
Kent Yoder.
ioctl() function
SYNOPSIS
#include <sys/ioctl.h>
int ioctl(int d, int request, ...);
Description
The ioctl() function manipulates the underlying device parameters of special files. In particular, many operating characteristics of character special files (e.g. terminals) may be controlled with ioctl() requests. The argument d must be an open file descriptor.
The second argument is a device-dependent request code. The third argument is an untyped pointer to memory. It is traditionally char *argp (from the days before void * was valid C), and will be so named for this discussion.
An ioctl() request has encoded in it whether the argument is an in parameter or an out parameter, and the size of the argument argp in bytes. Macros and defines used in specifying an ioctl() request are located in the file <sys/ioctl.h>.
Return value
Usually, on success zero is returned. A few ioctl() requests use the return value as an output parameter and return a nonnegative value on success. On error, -1 is returned, and errno is set appropriately.
Errors
Tag | Description |
EBADF | d is not a valid descriptor. |
EFAULT | argp references an inaccessible memory area. |
EINVAL | Request or argp is not valid. |
ENOTTY | d is not associated with a character special device. |
ENOTTY | The specified request does not apply to the kind of object that the descriptor d references. |
Notes
In order to use this call, one needs an open file descriptor. Often the open(2) call has unwanted side effects, which can be avoided under Linux by giving it the O_NONBLOCK flag.
Conforming to
No single standard. Arguments, returns, and semantics of ioctl(2) vary according to the device driver in question (the call is used as a catch-all for operations that don’t cleanly fit the Unix stream I/O model). See ioctl_list(2) for a list of many of the known ioctl() calls. The ioctl() function call appeared in Version 7 AT&T Unix.
ioctl_list() function
ioctl_list - list of ioctl calls in the Linux/i386 kernel
Description
This is Ioctl List 1.3.27, a list of ioctl calls in Linux/i386 kernel 1.3.27. It contains 421 ioctls from /usr/include/{asm,linux}/*.h. For each ioctl, its numerical value, its name, and its argument type are given.
An argument type of ’const struct foo *’ means the argument is input to the kernel. ’struct foo *’ means the kernel outputs the argument. If the kernel uses the argument for both input and output, this is marked with // I-O.
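Some ioctls take more arguments or return more values than a single structure. These are marked // MORE and documented further in a separate section.
This list is very incomplete. Please e-mail changes and comments to <mec@duracef.shout.net>.
ioctl structure
Ioctl command values are 32-bit constants. In principle these constants are completely arbitrary, but people have tried to build some structure into them.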
The old Linux situation was that of mostly 16-bit constants, where the last byte is a serial number, and the preceding byte(s) give a type indicating the driver. Sometimes the major number was used: 0x03 for the HDIO_* ioctls, 0x06 for the LP* ioctls. And sometimes one or more ASCII letters were used. For example, TCGETS has value 0x00005401, with 0x54 = ’T’ indicating the terminal driver, and CYGETTIMEOUT has value 0x00435906, with 0x43 0x59 = ’C’ ’Y’ indicating the cyclades driver.
Later (0.98p5) some more information was built into the number. One has 2 direction bits (00: none, 01: write, 10: read, 11: read/write) followed by 14 size bits (giving the size of the argument), followed by an 8-bit type (collecting the ioctls in groups for a common purpose or a common driver), and an 8-bit serial number.
The macros describing this structure live in <asm/ioctl.h> and are _IO(type,nr) and {_IOR,_IOW,_IOWR}(type,nr,size). They use sizeof(size) so that size is a misnomer here: this third parameter is a data type.
Note that the size bits are very unreliable: in lots of cases they are wrong, either because of buggy macros using sizeof(sizeof(struct)), or because of legacy values.
Thus, it seems that the new structure only gave disadvantages: it does not help in checking, but it causes varying values for the various architectures.
Return value
Decent ioctls return 0 on success and -1 on error, while any output value is stored via the argument. However, quite a few ioctls in fact return an output value. This is not yet indicated below.
// Main table.
//
0x00008901 FIOSETOWN const int *
0x00008902 SIOCSPGRP const int *
0x00008903 FIOGETOWN int *
0x00008904 SIOCGPGRP int *
0x00008905 SIOCATMARK int *
0x00008906 SIOCGSTAMP timeval *
//
0x00005401 TCGETS struct termios *
0x00005402 TCSETS const struct termios *
0x00005403 TCSETSW const struct termios *
0x00005404 TCSETSF const struct termios *
0x00005405 TCGETA struct termio *
0x00005406 TCSETA const struct termio *
0x00005407 TCSETAW const struct termio *
0x00005408 TCSETAF const struct termio *
0x00005409 TCSBRK int
0x0000540A TCXONC int
0x0000540B TCFLSH int
0x0000540C TIOCEXCL void
0x0000540D TIOCNXCL void
0x0000540E TIOCSCTTY int
0x0000540F TIOCGPGRP pid_t *
0x00005410 TIOCSPGRP const pid_t *
0x00005411 TIOCOUTQ int *
0x00005412 TIOCSTI const char *
0x00005413 TIOCGWINSZ struct winsize *
0x00005414 TIOCSWINSZ const struct winsize *
0x00005415 TIOCMGET int *
0x00005416 TIOCMBIS const int *
0x00005417 TIOCMBIC const int *
0x00005418 TIOCMSET const int *
0x00005419 TIOCGSOFTCAR int *
0x0000541A TIOCSSOFTCAR const int *
0x0000541B FIONREAD int *
0x0000541B TIOCINQ int *
0x0000541C TIOCLINUX const char * // MORE
0x0000541D TIOCCONS void
0x0000541E TIOCGSERIAL struct serial_struct *
0x0000541F TIOCSSERIAL const struct serial_struct *
0x00005420 TIOCPKT const int *
0x00005421 FIONBIO const int *
0x00005422 TIOCNOTTY void
0x00005423 TIOCSETD const int *
0x00005424 TIOCGETD int *
0x00005425 TCSBRKP int
0x00005426 TIOCTTYGSTRUCT struct tty_struct *
0x00005450 FIONCLEX void
0x00005451 FIOCLEX void
0x00005452 FIOASYNC const int *
0x00005453 TIOCSERCONFIG void
0x00005454 TIOCSERGWILD int *
0x00005455 TIOCSERSWILD const int *
0x00005456 TIOCGLCKTRMIOS struct termios *
0x00005457 TIOCSLCKTRMIOS const struct termios *
0x00005458 TIOCSERGSTRUCT struct async_struct *
0x00005459 TIOCSERGETLSR int *
0x0000545A TIOCSERGETMULTI struct serial_multiport_struct *
0x0000545B TIOCSERSETMULTI const struct serial_multiport_struct *
//
0x000089E0 SIOCAX25GETUID const struct sockaddr_ax25 *
0x000089E1 SIOCAX25ADDUID const struct sockaddr_ax25 *
0x000089E2 SIOCAX25DELUID const struct sockaddr_ax25 *
0x000089E3 SIOCAX25NOUID const int *
0x000089E4 SIOCAX25DIGCTL const int *
0x000089E5 SIOCAX25GETPARMS struct ax25_parms_struct * // I-O
0x000089E6 SIOCAX25SETPARMS const struct ax25_parms_struct *
//
0x00007314 STL_BINTR void
0x00007315 STL_BSTART void
0x00007316 STL_BSTOP void
0x00007317 STL_BRESET void
//
0x00005301 CDROMPAUSE void
0x00005302 CDROMRESUME void
0x00005303 CDROMPLAYMSF const struct cdrom_msf *
0x00005304 CDROMPLAYTRKIND const struct cdrom_ti *
0x00005305 CDROMREADTOCHDR struct cdrom_tochdr *
0x00005306 CDROMREADTOCENTRY struct cdrom_tocentry * // I-O
0x00005307 CDROMSTOP void
0x00005308 CDROMSTART void
0x00005309 CDROMEJECT void
0x0000530A CDROMVOLCTRL const struct cdrom_volctrl *
0x0000530B CDROMSUBCHNL struct cdrom_subchnl * // I-O
0x0000530C CDROMREADMODE2 const struct cdrom_msf * // MORE
0x0000530D CDROMREADMODE1 const struct cdrom_msf * // MORE
0x0000530E CDROMREADAUDIO const struct cdrom_read_audio * // MORE
0x0000530F CDROMEJECT_SW int
0x00005310 CDROMMULTISESSION struct cdrom_multisession * // I-O
0x00005311 CDROM_GET_UPC struct { char [8]; } *
0x00005312 CDROMRESET void
0x00005313 CDROMVOLREAD struct cdrom_volctrl *
0x00005314 CDROMREADRAW const struct cdrom_msf * // MORE
0x00005315 CDROMREADCOOKED const struct cdrom_msf * // MORE
0x00005316 CDROMSEEK const struct cdrom_msf *
//
0x00002000 CM206CTL_GET_STAT int
0x00002001 CM206CTL_GET_LAST_STAT int
//
0x00435901 CYGETMON struct cyclades_monitor *
0x00435902 CYGETTHRESH int *
0x00435903 CYSETTHRESH int
0x00435904 CYGETDEFTHRESH int *
0x00435905 CYSETDEFTHRESH int
0x00435906 CYGETTIMEOUT int *
0x00435907 CYSETTIMEOUT int
0x00435908 CYGETDEFTIMEOUT int *
0x00435909 CYSETDEFTIMEOUT int
//
0x80046601 EXT2_IOC_GETFLAGS int *
0x40046602 EXT2_IOC_SETFLAGS const int *
0x80047601 EXT2_IOC_GETVERSION int *
0x40047602 EXT2_IOC_SETVERSION const int *
//
0x00000000 FDCLRPRM void
0x00000001 FDSETPRM const struct floppy_struct *
0x00000002 FDDEFPRM const struct floppy_struct *
0x00000003 FDGETPRM struct floppy_struct *
0x00000004 FDMSGON void
0x00000005 FDMSGOFF void
0x00000006 FDFMTBEG void
0x00000007 FDFMTTRK const struct format_descr *
0x00000008 FDFMTEND void
0x0000000A FDSETEMSGTRESH int
0x0000000B FDFLUSH void
0x0000000C FDSETMAXERRS const struct floppy_max_errors *
0x0000000E FDGETMAXERRS struct floppy_max_errors *
0x00000010 FDGETDRVTYP struct { char [16]; } *
0x00000014 FDSETDRVPRM const struct floppy_drive_params *
0x00000015 FDGETDRVPRM struct floppy_drive_params *
0x00000016 FDGETDRVSTAT struct floppy_drive_struct *
0x00000017 FDPOLLDRVSTAT struct floppy_drive_struct *
0x00000018 FDRESET int
0x00000019 FDGETFDCSTAT struct floppy_fdc_state *
0x0000001B FDWERRORCLR void
0x0000001C FDWERRORGET struct floppy_write_errors *
0x0000001E FDRAWCMD struct floppy_raw_cmd * // MORE // I-O
0x00000028 FDTWADDLE void
//
0x0000125D BLKROSET const int *
0x0000125E BLKROGET int *
0x0000125F BLKRRPART void
0x00001260 BLKGETSIZE int *
0x00001261 BLKFLSBUF void
0x00001262 BLKRASET int
0x00001263 BLKRAGET int *
0x00000001 FIBMAP int * // I-O
0x00000002 FIGETBSZ int *
//
0x00000301 HDIO_GETGEO struct hd_geometry *
0x00000302 HDIO_GET_UNMASKINTR int *
0x00000304 HDIO_GET_MULTCOUNT int *
0x00000307 HDIO_GET_IDENTITY struct hd_driveid *
0x00000308 HDIO_GET_KEEPSETTINGS int *
0x00000309 HDIO_GET_CHIPSET int *
0x0000030A HDIO_GET_NOWERR int *
0x0000030B HDIO_GET_DMA int *
0x0000031F HDIO_DRIVE_CMD int * // I-O
0x00000321 HDIO_SET_MULTCOUNT int
0x00000322 HDIO_SET_UNMASKINTR int
0x00000323 HDIO_SET_KEEPSETTINGS int
0x00000324 HDIO_SET_CHIPSET int
0x00000325 HDIO_SET_NOWERR int
0x00000326 HDIO_SET_DMA int
//
0x000089F0 EQL_ENSLAVE struct ifreq * // MORE // I-O
0x000089F1 EQL_EMANCIPATE struct ifreq * // MORE // I-O
0x000089F2 EQL_GETSLAVECFG struct ifreq * // MORE // I-O
0x000089F3 EQL_SETSLAVECFG struct ifreq * // MORE // I-O
0x000089F4 EQL_GETMASTRCFG struct ifreq * // MORE // I-O
0x000089F5 EQL_SETMASTRCFG struct ifreq * // MORE // I-O
//
0x000089F0 SIOCDEVPLIP struct ifreq * // I-O
//
0x00005490 PPPIOCGFLAGS int *
0x00005491 PPPIOCSFLAGS const int *
0x00005492 PPPIOCGASYNCMAP int *
0x00005493 PPPIOCSASYNCMAP const int *
0x00005494 PPPIOCGUNIT int *
0x00005495 PPPIOCSINPSIG const int *
0x00005497 PPPIOCSDEBUG const int *
0x00005498 PPPIOCGDEBUG int *
0x00005499 PPPIOCGSTAT struct ppp_stats *
0x0000549A PPPIOCGTIME struct ppp_ddinfo *
0x0000549B PPPIOCGXASYNCMAP struct { int [8]; } *
0x0000549C PPPIOCSXASYNCMAP const struct { int [8]; } *
0x0000549D PPPIOCSMRU const int *
0x0000549E PPPIOCRASYNCMAP const int *
0x0000549F PPPIOCSMAXCID const int *
//
0x000089E0 SIOCAIPXITFCRT const char *
0x000089E1 SIOCAIPXPRISLT const char *
0x000089E2 SIOCIPXCFGDATA struct ipx_config_data *
//
0x00004B60 GIO_FONT struct { char [8192]; } *
0x00004B61 PIO_FONT const struct { char [8192]; } *
0x00004B6B GIO_FONTX struct console_font_desc * // MORE // I-O
0x00004B6C PIO_FONTX const struct console_font_desc * // MORE
0x00004B70 GIO_CMAP struct { char [48]; } *
0x00004B71 PIO_CMAP const struct { char [48]; } *
0x00004B2F KIOCSOUND int
0x00004B30 KDMKTONE int
0x00004B31 KDGETLED char *
0x00004B32 KDSETLED int
0x00004B33 KDGKBTYPE char *
0x00004B34 KDADDIO int // MORE
0x00004B35 KDDELIO int // MORE
0x00004B36 KDENABIO void // MORE
0x00004B37 KDDISABIO void // MORE
0x00004B3A KDSETMODE int
0x00004B3B KDGETMODE int *
0x00004B3C KDMAPDISP void // MORE
0x00004B3D KDUNMAPDISP void // MORE
0x00004B40 GIO_SCRNMAP struct { char [E_TABSZ]; } *
0x00004B41 PIO_SCRNMAP const struct { char [E_TABSZ]; } *
0x00004B69 GIO_UNISCRNMAP struct { short [E_TABSZ]; } *
0x00004B6A PIO_UNISCRNMAP const struct { short [E_TABSZ]; } *
0x00004B66 GIO_UNIMAP struct unimapdesc * // MORE // I-O
0x00004B67 PIO_UNIMAP const struct unimapdesc * // MORE
0x00004B68 PIO_UNIMAPCLR const struct unimapinit *
0x00004B44 KDGKBMODE int *
0x00004B45 KDSKBMODE int
0x00004B62 KDGKBMETA int *
0x00004B63 KDSKBMETA int
0x00004B64 KDGKBLED int *
0x00004B65 KDSKBLED int
0x00004B46 KDGKBENT struct kbentry * // I-O
0x00004B47 KDSKBENT const struct kbentry *
0x00004B48 KDGKBSENT struct kbsentry * // I-O
0x00004B49 KDSKBSENT const struct kbsentry *
0x00004B4A KDGKBDIACR struct kbdiacrs *
0x00004B4B KDSKBDIACR const struct kbdiacrs *
0x00004B4C KDGETKEYCODE struct kbkeycode * // I-O
0x00004B4D KDSETKEYCODE const struct kbkeycode *
0x00004B4E KDSIGACCEPT int
//
0x00000601 LPCHAR int
0x00000602 LPTIME int
0x00000604 LPABORT int
0x00000605 LPSETIRQ int
0x00000606 LPGETIRQ int *
0x00000608 LPWAIT int
0x00000609 LPCAREFUL int
0x0000060A LPABORTOPEN int
0x0000060B LPGETSTATUS int *
0x0000060C LPRESET void
0x0000060D LPGETSTATS struct lp_stats *
//
0x000089E0 SIOCGETVIFCNT struct sioc_vif_req * // I-O
0x000089E1 SIOCGETSGCNT struct sioc_sg_req * // I-O
//
0x40086D01 MTIOCTOP const struct mtop *
0x801C6D02 MTIOCGET struct mtget *
0x80046D03 MTIOCPOS struct mtpos *
0x80206D04 MTIOCGETCONFIG struct mtconfiginfo *
0x40206D05 MTIOCSETCONFIG const struct mtconfiginfo *
//
0x000089E0 SIOCNRGETPARMS struct nr_parms_struct * // I-O
0x000089E1 SIOCNRSETPARMS const struct nr_parms_struct *
0x000089E2 SIOCNRDECOBS void
0x000089E3 SIOCNRRTCTL const int *
//
0x00009000 DDIOCSDBG const int *
0x00005382 CDROMAUDIOBUFSIZ int
//
0x00005470 TIOCSCCINI void
0x00005471 TIOCCHANINI const struct scc_modem *
0x00005472 TIOCGKISS struct ioctl_command * // I-O
0x00005473 TIOCSKISS const struct ioctl_command *
0x00005474 TIOCSCCSTAT struct scc_stat *
//
0x00005382 SCSI_IOCTL_GET_IDLUN struct { int [2]; } *
0x00005383 SCSI_IOCTL_TAGGED_ENABLE void
0x00005384 SCSI_IOCTL_TAGGED_DISABLE void
0x00005385 SCSI_IOCTL_PROBE_HOST const int * // MORE
//
0x80027501 SMB_IOC_GETMOUNTUID uid_t *
//
0x0000890B SIOCADDRT const struct rtentry * // MORE
0x0000890C SIOCDELRT const struct rtentry * // MORE
0x00008910 SIOCGIFNAME char []
0x00008911 SIOCSIFLINK void
0x00008912 SIOCGIFCONF struct ifconf * // MORE // I-O
0x00008913 SIOCGIFFLAGS struct ifreq * // I-O
0x00008914 SIOCSIFFLAGS const struct ifreq *
0x00008915 SIOCGIFADDR struct ifreq * // I-O
0x00008916 SIOCSIFADDR const struct ifreq *
0x00008917 SIOCGIFDSTADDR struct ifreq * // I-O
0x00008918 SIOCSIFDSTADDR const struct ifreq *
0x00008919 SIOCGIFBRDADDR struct ifreq * // I-O
0x0000891A SIOCSIFBRDADDR const struct ifreq *
0x0000891B SIOCGIFNETMASK struct ifreq * // I-O
0x0000891C SIOCSIFNETMASK const struct ifreq *
0x0000891D SIOCGIFMETRIC struct ifreq * // I-O
0x0000891E SIOCSIFMETRIC const struct ifreq *
0x0000891F SIOCGIFMEM struct ifreq * // I-O
0x00008920 SIOCSIFMEM const struct ifreq *
0x00008921 SIOCGIFMTU struct ifreq * // I-O
0x00008922 SIOCSIFMTU const struct ifreq *
0x00008923 OLD_SIOCGIFHWADDR struct ifreq * // I-O
0x00008924 SIOCSIFHWADDR const struct ifreq * // MORE
0x00008925 SIOCGIFENCAP int *
0x00008926 SIOCSIFENCAP const int *
0x00008927 SIOCGIFHWADDR struct ifreq * // I-O
0x00008929 SIOCGIFSLAVE void
0x00008930 SIOCSIFSLAVE void
0x00008931 SIOCADDMULTI const struct ifreq *
0x00008932 SIOCDELMULTI const struct ifreq *
0x00008940 SIOCADDRTOLD void
0x00008941 SIOCDELRTOLD void
0x00008950 SIOCDARP const struct arpreq *
0x00008951 SIOCGARP struct arpreq * // I-O
0x00008952 SIOCSARP const struct arpreq *
0x00008960 SIOCDRARP const struct arpreq *
0x00008961 SIOCGRARP struct arpreq * // I-O
0x00008962 SIOCSRARP const struct arpreq *
0x00008970 SIOCGIFMAP struct ifreq * // I-O
0x00008971 SIOCSIFMAP const struct ifreq *
//
0x00005100 SNDCTL_SEQ_RESET void
0x00005101 SNDCTL_SEQ_SYNC void
0xC08C5102 SNDCTL_SYNTH_INFO struct synth_info * // I-O
0xC0045103 SNDCTL_SEQ_CTRLRATE int * // I-O
0x80045104 SNDCTL_SEQ_GETOUTCOUNT int *
0x80045105 SNDCTL_SEQ_GETINCOUNT int *
0x40045106 SNDCTL_SEQ_PERCMODE void
0x40285107 SNDCTL_FM_LOAD_INSTR const struct sbi_instrument *
0x40045108 SNDCTL_SEQ_TESTMIDI const int *
0x40045109 SNDCTL_SEQ_RESETSAMPLES const int *
0x8004510A SNDCTL_SEQ_NRSYNTHS int *
0x8004510B SNDCTL_SEQ_NRMIDIS int *
0xC074510C SNDCTL_MIDI_INFO struct midi_info * // I-O
0x4004510D SNDCTL_SEQ_THRESHOLD const int *
0xC004510E SNDCTL_SYNTH_MEMAVL int * // I-O
0x4004510F SNDCTL_FM_4OP_ENABLE const int *
0xCFB85110 SNDCTL_PMGR_ACCESS struct patmgr_info * // I-O
0x00005111 SNDCTL_SEQ_PANIC void
0x40085112 SNDCTL_SEQ_OUTOFBAND const struct seq_event_rec *
0xC0045401 SNDCTL_TMR_TIMEBASE int * // I-O
0x00005402 SNDCTL_TMR_START void
0x00005403 SNDCTL_TMR_STOP void
0x00005404 SNDCTL_TMR_CONTINUE void
0xC0045405 SNDCTL_TMR_TEMPO int * // I-O
0xC0045406 SNDCTL_TMR_SOURCE int * // I-O
0x40045407 SNDCTL_TMR_METRONOME const int *
0x40045408 SNDCTL_TMR_SELECT int * // I-O
0xCFB85001 SNDCTL_PMGR_IFACE struct patmgr_info * // I-O
0xC0046D00 SNDCTL_MIDI_PRETIME int * // I-O
0xC0046D01 SNDCTL_MIDI_MPUMODE const int *
0xC0216D02 SNDCTL_MIDI_MPUCMD struct mpu_command_rec * // I-O
0x00005000 SNDCTL_DSP_RESET void
0x00005001 SNDCTL_DSP_SYNC void
0xC0045002 SNDCTL_DSP_SPEED int * // I-O
0xC0045003 SNDCTL_DSP_STEREO int * // I-O
0xC0045004 SNDCTL_DSP_GETBLKSIZE int * // I-O
0xC0045006 SOUND_PCM_WRITE_CHANNELS int * // I-O
0xC0045007 SOUND_PCM_WRITE_FILTER int * // I-O
0x00005008 SNDCTL_DSP_POST void
0xC0045009 SNDCTL_DSP_SUBDIVIDE int * // I-O
0xC004500A SNDCTL_DSP_SETFRAGMENT int * // I-O
0x8004500B SNDCTL_DSP_GETFMTS int *
0xC0045005 SNDCTL_DSP_SETFMT int * // I-O
0x800C500C SNDCTL_DSP_GETOSPACE struct audio_buf_info *
0x800C500D SNDCTL_DSP_GETISPACE struct audio_buf_info *
0x0000500E SNDCTL_DSP_NONBLOCK void
0x80045002 SOUND_PCM_READ_RATE int *
0x80045006 SOUND_PCM_READ_CHANNELS int *
0x80045005 SOUND_PCM_READ_BITS int *
0x80045007 SOUND_PCM_READ_FILTER int *
0x00004300 SNDCTL_COPR_RESET void
0xCFB04301 SNDCTL_COPR_LOAD const struct copr_buffer *
0xC0144302 SNDCTL_COPR_RDATA struct copr_debug_buf * // I-O
0xC0144303 SNDCTL_COPR_RCODE struct copr_debug_buf * // I-O
0x40144304 SNDCTL_COPR_WDATA const struct copr_debug_buf *
0x40144305 SNDCTL_COPR_WCODE const struct copr_debug_buf *
0xC0144306 SNDCTL_COPR_RUN struct copr_debug_buf * // I-O
0xC0144307 SNDCTL_COPR_HALT struct copr_debug_buf * // I-O
0x4FA44308 SNDCTL_COPR_SENDMSG const struct copr_msg *
0x8FA44309 SNDCTL_COPR_RCVMSG struct copr_msg *
0x80044D00 SOUND_MIXER_READ_VOLUME int *
0x80044D01 SOUND_MIXER_READ_BASS int *
0x80044D02 SOUND_MIXER_READ_TREBLE int *
0x80044D03 SOUND_MIXER_READ_SYNTH int *
0x80044D04 SOUND_MIXER_READ_PCM int *
0x80044D05 SOUND_MIXER_READ_SPEAKER int *
0x80044D06 SOUND_MIXER_READ_LINE int *
0x80044D07 SOUND_MIXER_READ_MIC int *
0x80044D08 SOUND_MIXER_READ_CD int *
0x80044D09 SOUND_MIXER_READ_IMIX int *
0x80044D0A SOUND_MIXER_READ_ALTPCM int *
0x80044D0B SOUND_MIXER_READ_RECLEV int *
0x80044D0C SOUND_MIXER_READ_IGAIN int *
0x80044D0D SOUND_MIXER_READ_OGAIN int *
0x80044D0E SOUND_MIXER_READ_LINE1 int *
0x80044D0F SOUND_MIXER_READ_LINE2 int *
0x80044D10 SOUND_MIXER_READ_LINE3 int *
0x80044D1C SOUND_MIXER_READ_MUTE int *
0x80044D1D SOUND_MIXER_READ_ENHANCE int *
0x80044D1E SOUND_MIXER_READ_LOUD int *
0x80044DFF SOUND_MIXER_READ_RECSRC int *
0x80044DFE SOUND_MIXER_READ_DEVMASK int *
0x80044DFD SOUND_MIXER_READ_RECMASK int *
0x80044DFB SOUND_MIXER_READ_STEREODEVS int *
0x80044DFC SOUND_MIXER_READ_CAPS int *
0xC0044D00 SOUND_MIXER_WRITE_VOLUME int * // I-O
0xC0044D01 SOUND_MIXER_WRITE_BASS int * // I-O
0xC0044D02 SOUND_MIXER_WRITE_TREBLE int * // I-O
0xC0044D03 SOUND_MIXER_WRITE_SYNTH int * // I-O
0xC0044D04 SOUND_MIXER_WRITE_PCM int * // I-O
0xC0044D05 SOUND_MIXER_WRITE_SPEAKER int * // I-O
0xC0044D06 SOUND_MIXER_WRITE_LINE int * // I-O
0xC0044D07 SOUND_MIXER_WRITE_MIC int * // I-O
0xC0044D08 SOUND_MIXER_WRITE_CD int * // I-O
0xC0044D09 SOUND_MIXER_WRITE_IMIX int * // I-O
0xC0044D0A SOUND_MIXER_WRITE_ALTPCM int * // I-O
0xC0044D0B SOUND_MIXER_WRITE_RECLEV int * // I-O
0xC0044D0C SOUND_MIXER_WRITE_IGAIN int * // I-O
0xC0044D0D SOUND_MIXER_WRITE_OGAIN int * // I-O
0xC0044D0E SOUND_MIXER_WRITE_LINE1 int * // I-O
0xC0044D0F SOUND_MIXER_WRITE_LINE2 int * // I-O
0xC0044D10 SOUND_MIXER_WRITE_LINE3 int * // I-O
0xC0044D1C SOUND_MIXER_WRITE_MUTE int * // I-O
0xC0044D1D SOUND_MIXER_WRITE_ENHANCE int * // I-O
0xC0044D1E SOUND_MIXER_WRITE_LOUD int * // I-O
0xC0044DFF SOUND_MIXER_WRITE_RECSRC int * // I-O
//
0x000004D2 UMSDOS_READDIR_DOS struct umsdos_ioctl * // I-O
0x000004D3 UMSDOS_UNLINK_DOS const struct umsdos_ioctl *
0x000004D4 UMSDOS_RMDIR_DOS const struct umsdos_ioctl *
0x000004D5 UMSDOS_STAT_DOS struct umsdos_ioctl * // I-O
0x000004D6 UMSDOS_CREAT_EMD const struct umsdos_ioctl *
0x000004D7 UMSDOS_UNLINK_EMD const struct umsdos_ioctl *
0x000004D8 UMSDOS_READDIR_EMD struct umsdos_ioctl * // I-O
0x000004D9 UMSDOS_GETVERSION struct umsdos_ioctl *
0x000004DA UMSDOS_INIT_EMD void
0x000004DB UMSDOS_DOS_SETUP const struct umsdos_ioctl *
0x000004DC UMSDOS_RENAME_DOS const struct umsdos_ioctl *
//
0x00005600 VT_OPENQRY int *
0x00005601 VT_GETMODE struct vt_mode *
0x00005602 VT_SETMODE const struct vt_mode *
0x00005603 VT_GETSTATE struct vt_stat *
0x00005604 VT_SENDSIG void
0x00005605 VT_RELDISP int
0x00005606 VT_ACTIVATE int
0x00005607 VT_WAITACTIVE int
0x00005608 VT_DISALLOCATE int
0x00005609 VT_RESIZE const struct vt_sizes *
0x0000560A VT_RESIZEX const struct vt_consize *
// More arguments.
Some ioctl’s take a pointer to a structure which contains additional
pointers. These are documented here in alphabetical order.
CDROMREADAUDIO takes an input pointer ’const struct cdrom_read_audio *’.
The ’buf’ field points to an output buffer
of length ’nframes * CD_FRAMESIZE_RAW’.
CDROMREADCOOKED, CDROMREADMODE1, CDROMREADMODE2, and CDROMREADRAW take
an input pointer ’const struct cdrom_msf *’. They use the same pointer
as an output pointer to ’char []’. The length varies by request. For
CDROMREADMODE1, most drivers use ’CD_FRAMESIZE’, but the Optics Storage
driver uses ’OPT_BLOCKSIZE’ instead (both have the numerical value
2048).
CDROMREADCOOKED char [CD_FRAMESIZE]
CDROMREADMODE1 char [CD_FRAMESIZE or OPT_BLOCKSIZE]
CDROMREADMODE2 char [CD_FRAMESIZE_RAW0]
CDROMREADRAW char [CD_FRAMESIZE_RAW]
EQL_ENSLAVE, EQL_EMANCIPATE, EQL_GETSLAVECFG, EQL_SETSLAVECFG,
EQL_GETMASTRCFG, and EQL_SETMASTRCFG take a ’struct ifreq *’.
The ’ifr_data’ field is a pointer to another structure as follows:
EQL_ENSLAVE const struct slaving_request *
EQL_EMANCIPATE const struct slaving_request *
EQL_GETSLAVECFG struct slave_config * // I-O
EQL_SETSLAVECFG const struct slave_config *
EQL_GETMASTRCFG struct master_config *
EQL_SETMASTRCFG const struct master_config *
FDRAWCMD takes a ’struct floppy_raw_cmd *’. If ’flags & FD_RAW_WRITE’
is non-zero, then ’data’ points to an input buffer of length ’length’.
If ’flags & FD_RAW_READ’ is non-zero, then ’data’ points to an output
buffer of length ’length’.
GIO_FONTX and PIO_FONTX take a ’struct console_font_desc *’ or
a ’const struct console_font_desc *’, respectively. ’chardata’ points to
a buffer of ’char [charcount]’. This is an output buffer for GIO_FONTX
and an input buffer for PIO_FONTX.
GIO_UNIMAP and PIO_UNIMAP take a ’struct unimapdesc *’ or
a ’const struct unimapdesc *’, respectively. ’entries’ points to a buffer
of ’struct unipair [entry_ct]’. This is an output buffer for GIO_UNIMAP
and an input buffer for PIO_UNIMAP.
KDADDIO, KDDELIO, KDDISABIO, and KDENABIO enable or disable access to
I/O ports. They are essentially alternate interfaces to ’ioperm’.
KDMAPDISP and KDUNMAPDISP enable or disable memory mappings or I/O port
access. They are not implemented in the kernel.
SCSI_IOCTL_PROBE_HOST takes an input pointer ’const int *’, which is a
length. It uses the same pointer as an output pointer to a ’char []’
buffer of this length.
SIOCADDRT and SIOCDELRT take an input pointer whose type depends on
the protocol:
Most protocols const struct rtentry *
AX.25 const struct ax25_route *
NET/ROM const struct nr_route_struct *
SIOCGIFCONF takes a ’struct ifconf *’. The ’ifc_buf’ field points to a
buffer of length ’ifc_len’ bytes, into which the kernel writes a list of
type ’struct ifreq []’.
SIOCSIFHWADDR takes an input pointer whose type depends on the protocol:
Most protocols const struct ifreq *
AX.25 const char [AX25_ADDR_LEN]
TIOCLINUX takes a ’const char *’. It uses this to distinguish several
independent sub-cases. In the table below, ’N + foo’ means ’foo’ after
an N-byte pad. ’struct selection’ is implicitly defined
in ’drivers/char/selection.c’
TIOCLINUX-2 1 + const struct selection *
TIOCLINUX-3 void
TIOCLINUX-4 void
TIOCLINUX-5 4 + const struct { long [8]; } *
TIOCLINUX-6 char *
TIOCLINUX-7 char *
TIOCLINUX-10 1 + const char *
// Duplicate ioctls
This list does not include ioctls in the range SIOCDEVPRIVATE and
SIOCPROTOPRIVATE.
0x00000001 FDSETPRM FIBMAP
0x00000002 FDDEFPRM FIGETBSZ
0x00005382 CDROMAUDIOBUFSIZ SCSI_IOCTL_GET_IDLUN
0x00005402 SNDCTL_TMR_START TCSETS
0x00005403 SNDCTL_TMR_STOP TCSETSW
0x00005404 SNDCTL_TMR_CONTINUE TCSETSF
io_destroy() function
Synopsis
#include <libaio.h>
int io_destroy(io_context_t ctx);
Description
io_destroy() removes the asynchronous I/O context from the list of I/O contexts and then destroys it. io_destroy() can also cancel any outstanding asynchronous I/O actions on ctx and block on completion.
Return value
io_destroy() returns 0 on success.
Errors
Tag | Description |
EINVAL | The AIO context specified by ctx is invalid. |
EFAULT | The context pointed to is invalid. |
ENOSYS | io_destroy() is not implemented on this architecture. |
Conforming to
io_destroy() is Linux specific and should not be used in programs that are intended to be portable.
Versions
The asynchronous I/O system calls first appeared in Linux 2.5, August 2002.
See also
io_setup(2), io_submit(2), io_getevents(2), io_cancel(2).
Notes
The asynchronous I/O system calls were written by Benjamin LaHaise.
Author
Kent Yoder.
io_getevents() function
io_getevents - read asynchronous I/O events from the completion queue
Synopsis
#include <linux/time.h>
#include <libaio.h>
long io_getevents(aio_context_t ctx_id, long min_nr, long nr, struct io_event *events, struct timespec *timeout);
Description
io_getevents() attempts to read at least min_nr events and up to nr events from the completion queue of the AIO context specified by ctx_id. timeout specifies the amount of time to wait for events, where a NULL timeout waits until at least min_nr events have been seen. Note that timeout is relative and will be updated if not NULL and the operation blocks.
Return value
io_getevents() returns the number of events read: 0 if no events are available or <min_nr if the timeout has elapsed.
Errors
Tag | Description |
EINVAL | ctx_id is invalid, or min_nr or nr is out of range. |
EFAULT | Either events or timeout is an invalid pointer. |
ENOSYS | io_getevents() is not implemented on this architecture. |
Conforming to
io_getevents() is Linux specific and should not be used in programs that are intended to be portable.
Versions
The asynchronous I/O system calls first appeared in Linux 2.5, August 2002.
See also
io_setup(2), io_submit(2), io_cancel(2), io_destroy(2).
Notes
The asynchronous I/O system calls were written by Benjamin LaHaise.
Author
Kent Yoder.
ioperm() function
Synopsis
#include <unistd.h> /* for libc5 */
#include <sys/io.h> /* for glibc */
int ioperm(unsigned long from, unsigned long num, int turn_on);
Description
ioperm() sets the port access permission bits for the process for num bytes starting from port address from to the value turn_on. The use of ioperm() requires root privileges.
Only the first 0x3ff I/O ports can be specified in this manner. For more ports, the iopl() function must be used. Permissions are not inherited on fork(), but on exec() they are. This is useful for giving port access permissions to non-privileged tasks.
This call is mostly for the i386 architecture. On many other architectures it does not exist or will always return an error.
Return value
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
Errors
Tag | Description |
EINVAL | Invalid values for from or num. |
EIO | (on ppc) This call is not supported. |
EPERM | The calling process has insufficient privilege to call ioperm(); the CAP_SYS_RAWIO capability is required. |
Conforming to
ioperm() is Linux specific and should not be used in programs intended to be portable.
Notes
Libc5 treats it as a system call and has a prototype in <unistd.h>. Glibc1 does not have a prototype. Glibc2 has a prototype both in <sys/io.h> and in <sys/perm.h>. Avoid the latter; it is available on i386 only.
iopl() function
Synopsis
#include <sys/io.h>
int iopl(int level);
Description
iopl() changes the I/O privilege level of the current process, as specified in level.
This call is necessary to allow 8514-compatible X servers to run under Linux. Since these X servers require access to all 65536 I/O ports, the ioperm() call is not sufficient.
In addition to granting unrestricted I/O port access, running at a higher I/O privilege level also allows the process to disable interrupts. This will probably crash the system, and is not recommended.
Permissions are inherited by fork() and exec().
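The I/O privilege level for a normal process is 0.
This call is mostly for the i386 architecture. On many other architectures it does not exist or will always return an error.
Return value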
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
Errors
Tag | Description |
EINVAL | level is greater than 3. |
ENOSYS | This call is unimplemented. |
EPERM | The calling process has insufficient privilege to call iopl(); the CAP_SYS_RAWIO capability is required. |
Conforming to
iopl() is Linux specific and should not be used in processes intended to be portable.
Notes
Libc5 treats it as a system call and has a prototype in <unistd.h>. Glibc1 does not have a prototype. Glibc2 has a prototype both in <sys/io.h> and in <sys/perm.h>. Avoid the latter; it is available on i386 only.
ioprio_set() function
ioprio_get, ioprio_set - get/set I/O scheduling class and priority
Synopsis
int ioprio_get(int |
Description
The ioprio_get() and ioprio_set() system calls respectively get and set the I/O scheduling class and priority of one or more processes.
The which and who arguments identify the process(es) on which the system calls operate. The which argument determines how who is interpreted, and has one of the following values:
Tag | Description |
IOPRIO_WHO_PROCESS | who is a process ID identifying a single process. |
IOPRIO_WHO_PGRP | who is a process group ID identifying all the members of a process group. |
IOPRIO_WHO_USER | who is a user ID identifying all of the processes that have a matching real UID. |
If which is specified as IOPRIO_WHO_PGRP or IOPRIO_WHO_USER when calling ioprio_get(), and more than one process matches who, then the returned priority will be the highest one found among all of the matching processes. One priority is said to be higher than another one if it belongs to a higher priority class (IOPRIO_CLASS_RT is the highest priority class; IOPRIO_CLASS_IDLE is the lowest) or if it belongs to the same priority class as the other process but has a higher priority level (a lower priority number means a higher priority level).
The ioprio argument given to ioprio_set() is a bit mask that specifies both the scheduling class and the priority to be assigned to the target process(es). The following macros are used for assembling and dissecting ioprio values:
Tag | Description |
IOPRIO_PRIO_VALUE(class, data) | Given a scheduling class and priority (data), this macro combines the two values to produce an ioprio value, which is returned as the result of the macro. |
IOPRIO_PRIO_CLASS(mask) | Given mask (an ioprio value), this macro returns its I/O class component, that is, one of the values IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, or IOPRIO_CLASS_IDLE. |
IOPRIO_PRIO_DATA(mask) | Given mask (an ioprio value), this macro returns its priority (data) component. |
See the NOTES section for more information on scheduling classes and priorities.
I/O priorities are supported for reads and for synchronous (O_DIRECT, O_SYNC) writes. I/O priorities are not supported for asynchronous writes because they are issued outside the context of the program dirtying the memory, and thus program-specific priorities do not apply.
Return value
On success, ioprio_get() returns the ioprio value of the process with highest I/O priority of any of the processes that match the criteria specified in which and who. On error, -1 is returned, and errno is set to indicate the error.
On success, ioprio_set() returns 0. On error, -1 is returned, and errno is set to indicate the error.
ERRORS
Tag | Description |
EPERM | The calling process does not have the privilege needed to assign this ioprio to the specified process(es). See the NOTES section for more information on required privileges for ioprio_set(). |
ESRCH | No process(es) could be found that matched the specification in which and who. |
EINVAL | Invalid value for which or ioprio. Refer to the NOTES section for available scheduler classes and priority levels for ioprio. |
VERSIONS
These system calls have been available on Linux since kernel 2.6.13.
CONFORMING TO
These system calls are Linux-specific.
NOTES
Glibc does not provide wrappers for these system calls; call them using syscall(2).
These system calls only have an effect when used in conjunction with an I/O scheduler that supports I/O priorities. As of kernel 2.6.17, the only such scheduler is the Completely Fair Queuing (CFQ) I/O scheduler.
Selecting an I/O Scheduler
I/O schedulers are selected on a per-device basis via the special file /sys/block/<device>/queue/scheduler.
One can view the current I/O scheduler via the /sys file system. For example, the following command displays a list of all schedulers currently loaded in the kernel:
$ cat /sys/block/hda/queue/scheduler
The scheduler surrounded by brackets is the one actually in use for the device (hda in the example). Setting another scheduler is done by writing the name of the new scheduler to this file. For example, the following command will set the scheduler for the hda device to cfq:
$ su
The Completely Fair Queuing (CFQ) I/O Scheduler
Since v3 (aka CFQ Time Sliced) CFQ implements I/O nice levels similar to those of CPU scheduling. These nice levels are grouped in three scheduling classes each one containing one or more priority levels:
标签 | 描述 |
IOPRIO_CLASS_RT (1) | |
This is the real-time I/O class. This scheduling class is given higher priority than any other class: processes from this class are given first access to the disk every time. Thus this I/O class needs to be used with some care: one I/O real-time process can starve the entire system. Within the real-time class, there are 8 levels of class data (priority) that determine exactly how much time this process needs the disk for on each service. The highest real-time priority level is 0; the lowest is 7. In the future this might change to be more directly mappable to performance, by passing in a desired data rate instead. | |
IOPRIO_CLASS_BE (2) | |
This is the best-effort scheduling class, which is the default for any process that hasn’t set a specific I/O priority. The class data (priority) determines how much I/O bandwidth the process will get. Best-effort priority levels are analogous to CPU nice values (see getpriority(2)). The priority level determines a priority relative to other processes in the best-effort scheduling class. Priority levels range from 0 (highest) to 7 (lowest). | |
IOPRIO_CLASS_IDLE (3) | |
This is the idle scheduling class. Processes running at this level only get I/O time when no one else needs the disk. The idle class has no class data. Attention is required when assigning this priority class to a process, since it may become starved if higher priority processes are constantly accessing the disk. |
Refer to Documentation/block/ioprio.txt for more information on the CFQ I/O Scheduler and an example program.
Required permissions to set I/O priorities
Permission to change a process's priority is granted or denied based on two assertions:
Tag | Description |
Process ownership | An unprivileged process may only set the I/O priority of a process whose real UID matches the real or effective UID of the calling process. A process which has the CAP_SYS_NICE capability can change the priority of any process. |
What is the desired priority | Attempts to set very high priorities (IOPRIO_CLASS_RT) or very low ones (IOPRIO_CLASS_IDLE) require the CAP_SYS_ADMIN capability. |
A call to ioprio_set() must follow both rules, or the call will fail with the error EPERM.
BUGS
Glibc does not yet provide a suitable header file defining the function prototypes and macros described on this page. Suitable definitions can be found in linux/ioprio.h.
SEE ALSO
Documentation/block/ioprio.txt in the kernel source tree.
ioprio_set() function
ioprio_get, ioprio_set - get/set I/O scheduling class and priority
SYNOPSIS
int ioprio_get(int |
DESCRIPTION
The ioprio_get() and ioprio_set() system calls respectively get and set the I/O scheduling class and priority of one or more processes.
The which and who arguments identify the process(es) on which the system calls operate. The which argument determines how who is interpreted, and has one of the following values:
Tag | Description |
IOPRIO_WHO_PROCESS | who is a process ID identifying a single process. |
IOPRIO_WHO_PGRP | who is a process group ID identifying all the members of a process group. |
IOPRIO_WHO_USER | who is a user ID identifying all of the processes that have a matching real UID. |
If which is specified as IOPRIO_WHO_PGRP or IOPRIO_WHO_USER when calling ioprio_get(), and more than one process matches who, then the returned priority will be the highest one found among all of the matching processes. One priority is said to be higher than another one if it belongs to a higher priority class (IOPRIO_CLASS_RT is the highest priority class; IOPRIO_CLASS_IDLE is the lowest) or if it belongs to the same priority class as the other process but has a higher priority level (a lower priority number means a higher priority level).
The ioprio argument given to ioprio_set() is a bit mask that specifies both the scheduling class and the priority to be assigned to the target process(es). The following macros are used for assembling and dissecting ioprio values:
Tag | Description |
IOPRIO_PRIO_VALUE(class, data) | Given a scheduling class and priority (data), this macro combines the two values to produce an ioprio value, which is returned as the result of the macro. |
IOPRIO_PRIO_CLASS(mask) | Given mask (an ioprio value), this macro returns its I/O class component, that is, one of the values IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, or IOPRIO_CLASS_IDLE. |
IOPRIO_PRIO_DATA(mask) | Given mask (an ioprio value), this macro returns its priority (data) component. |
See the NOTES section for more information on scheduling classes and priorities.
I/O priorities are supported for reads and for synchronous (O_DIRECT, O_SYNC) writes. I/O priorities are not supported for asynchronous writes because they are issued outside the context of the program dirtying the memory, and thus program-specific priorities do not apply.
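The macro descriptions above can be sketched in C as follows. This is a minimal illustration, not the kernel's header: the shift value of 13 and the class numbering are assumptions that mirror the kernel's linux/ioprio.h, and ioprio_is_higher() is a hypothetical helper that applies the "higher priority" rule stated in this section.

```c
/* Assumed definitions, mirroring the kernel's linux/ioprio.h. */
#define IOPRIO_CLASS_SHIFT 13
#define IOPRIO_PRIO_MASK ((1UL << IOPRIO_CLASS_SHIFT) - 1)
#define IOPRIO_PRIO_CLASS(mask) ((mask) >> IOPRIO_CLASS_SHIFT)
#define IOPRIO_PRIO_DATA(mask) ((mask) & IOPRIO_PRIO_MASK)
#define IOPRIO_PRIO_VALUE(class, data) (((class) << IOPRIO_CLASS_SHIFT) | (data))

enum {
    IOPRIO_CLASS_NONE, /* 0: no class set */
    IOPRIO_CLASS_RT,   /* 1: real-time */
    IOPRIO_CLASS_BE,   /* 2: best-effort */
    IOPRIO_CLASS_IDLE  /* 3: idle */
};

/*
 * Hypothetical helper: returns nonzero if ioprio value a ranks higher
 * than b. A lower class number wins; within one class, a lower priority
 * number wins. Assumes neither value uses IOPRIO_CLASS_NONE.
 */
int ioprio_is_higher(int a, int b)
{
    int class_a = (int)IOPRIO_PRIO_CLASS(a);
    int class_b = (int)IOPRIO_PRIO_CLASS(b);

    if (class_a != class_b)
        return class_a < class_b;
    return (int)IOPRIO_PRIO_DATA(a) < (int)IOPRIO_PRIO_DATA(b);
}
```

For example, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 4) packs class 2 and level 4 into one integer, and the two dissecting macros recover those components from it.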
RETURN VALUE
On success, ioprio_get() returns the ioprio value of the process with highest I/O priority of any of the processes that match the criteria specified in which and who. On error, -1 is returned, and errno is set to indicate the error.
On success, ioprio_set() returns 0. On error, -1 is returned, and errno is set to indicate the error.
ERRORS
Tag | Description |
EPERM | The calling process does not have the privilege needed to assign this ioprio to the specified process(es). See the NOTES section for more information on required privileges for ioprio_set(). |
ESRCH | No process(es) could be found that matched the specification in which and who. |
EINVAL | Invalid value for which or ioprio. Refer to the NOTES section for available scheduler classes and priority levels for ioprio. |
VERSIONS
These system calls have been available on Linux since kernel 2.6.13.
CONFORMING TO
These system calls are Linux-specific.
NOTES
Glibc does not provide wrappers for these system calls; call them using syscall(2).
These system calls only have an effect when used in conjunction with an I/O scheduler that supports I/O priorities. As of kernel 2.6.17, the only such scheduler is the Completely Fair Queuing (CFQ) I/O scheduler.
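Because glibc offers no wrappers, a program has to go through syscall(2) itself. The following is a minimal sketch under stated assumptions: the helper names ioprio_get_raw() and ioprio_set_raw() are hypothetical, and the value of IOPRIO_WHO_PROCESS is copied by hand from the kernel's linux/ioprio.h rather than included from a header.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed value, mirroring the kernel's linux/ioprio.h. */
#define IOPRIO_WHO_PROCESS 1

/* Hypothetical wrapper: returns the ioprio value, or -1 with errno set. */
int ioprio_get_raw(int which, int who)
{
    return (int)syscall(SYS_ioprio_get, which, who);
}

/* Hypothetical wrapper: returns 0 on success, or -1 with errno set. */
int ioprio_set_raw(int which, int who, int ioprio)
{
    return (int)syscall(SYS_ioprio_set, which, who, ioprio);
}
```

A caller would check the result against -1 and inspect errno (EPERM, ESRCH, or EINVAL as listed under ERRORS) before using the value.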
Selecting an I/O Scheduler
I/O schedulers are selected on a per-device basis via the special file /sys/block/<device>/queue/scheduler.
One can view the current I/O scheduler via the /sys file system. For example, the following command displays a list of all schedulers currently loaded in the kernel:
$ cat /sys/block/hda/queue/scheduler
The scheduler surrounded by brackets is the one actually in use for the device (hda in the example). Setting another scheduler is done by writing the name of the new scheduler to this file. For example, the following command will set the scheduler for the hda device to cfq:
$ su
The Completely Fair Queuing (CFQ) I/O Scheduler
Since v3 (aka CFQ Time Sliced) CFQ implements I/O nice levels similar to those of CPU scheduling. These nice levels are grouped in three scheduling classes each one containing one or more priority levels:
标签 | 描述 |
IOPRIO_CLASS_RT (1) | |
This is the real-time I/O class. This scheduling class is given higher priority than any other class: processes from this class are given first access to the disk every time. Thus this I/O class needs to be used with some care: one I/O real-time process can starve the entire system. Within the real-time class, there are 8 levels of class data (priority) that determine exactly how much time this process needs the disk for on each service. The highest real-time priority level is 0; the lowest is 7. In the future this might change to be more directly mappable to performance, by passing in a desired data rate instead. | |
IOPRIO_CLASS_BE (2) | |
This is the best-effort scheduling class, which is the default for any process that hasn’t set a specific I/O priority. The class data (priority) determines how much I/O bandwidth the process will get. Best-effort priority levels are analogous to CPU nice values (see getpriority(2)). The priority level determines a priority relative to other processes in the best-effort scheduling class. Priority levels range from 0 (highest) to 7 (lowest). | |
IOPRIO_CLASS_IDLE (3) | |
This is the idle scheduling class. Processes running at this level only get I/O time when no one else needs the disk. The idle class has no class data. Attention is required when assigning this priority class to a process, since it may become starved if higher priority processes are constantly accessing the disk. |
Refer to Documentation/block/ioprio.txt for more information on the CFQ I/O Scheduler and an example program.
Required permissions to set I/O priorities
Permission to change a process's priority is granted or denied based on two assertions:
Tag | Description |
Process ownership | An unprivileged process may only set the I/O priority of a process whose real UID matches the real or effective UID of the calling process. A process which has the CAP_SYS_NICE capability can change the priority of any process. |
What is the desired priority | Attempts to set very high priorities (IOPRIO_CLASS_RT) or very low ones (IOPRIO_CLASS_IDLE) require the CAP_SYS_ADMIN capability. |
A call to ioprio_set() must follow both rules, or the call will fail with the error EPERM.
BUGS
Glibc does not yet provide a suitable header file defining the function prototypes and macros described on this page. Suitable definitions can be found in linux/ioprio.h.
SEE ALSO
Documentation/block/ioprio.txt in the kernel source tree.
io_setup() function
SYNOPSIS
#include <libaio.h>
int io_setup (int maxevents, io_context_t *ctxp);
DESCRIPTION
io_setup() creates an asynchronous I/O context capable of receiving at least maxevents events. ctxp must not point to an AIO context that already exists, and must be initialized to 0 prior to the call. On successful creation of the AIO context, *ctxp is filled in with the resulting handle.
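The zero-initialization requirement can be illustrated with a short sketch. To avoid depending on libaio, this example assumes the raw system call is invoked through syscall(2) with the aio_context_t type from linux/aio_abi.h; the helper names are hypothetical. Note that a raw syscall(2) invocation reports failure as -1 with errno set, which may differ from the wrapper's return convention described on this page.

```c
#define _GNU_SOURCE
#include <linux/aio_abi.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: create an AIO context able to hold nr_events. */
int aio_ctx_create(unsigned int nr_events, aio_context_t *ctx)
{
    *ctx = 0; /* the handle must be zero before the call */
    return (int)syscall(SYS_io_setup, nr_events, ctx);
}

/* Hypothetical helper: release a context created above. */
int aio_ctx_destroy(aio_context_t ctx)
{
    return (int)syscall(SYS_io_destroy, ctx);
}
```

A context created this way should eventually be released with io_destroy(2), which the second helper wraps.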
RETURN VALUE
io_setup() returns 0 on success; otherwise, one of the errors listed in the "Errors" section is returned.
ERRORS
Tag | Description |
EINVAL | ctxp is not initialized, or the specified maxevents exceeds internal limits. maxevents should be greater than 0. |
EFAULT | An invalid pointer is passed for ctxp. |
ENOMEM | Insufficient kernel resources are available. |
EAGAIN | The specified maxevents exceeds the user’s limit of available events. |
ENOSYS | io_setup() is not implemented on this architecture. |
CONFORMING TO
io_setup() is Linux-specific and should not be used in programs that are intended to be portable.
VERSIONS
The asynchronous I/O system calls first appeared in Linux 2.5, August 2002.
SEE ALSO
io_destroy(2), io_getevents(2), io_submit(2), io_cancel(2).
NOTES
The asynchronous I/O system calls were written by Benjamin LaHaise.
AUTHOR
Kent Yoder.
io_submit() function
SYNOPSIS
#include <libaio.h>
long io_submit (aio_context_t ctx_id, long nr, struct iocb **iocbpp);
DESCRIPTION
io_submit() queues nr I/O request blocks for processing in the AIO context ctx_id. iocbpp should be an array of nr AIO request blocks, which will be submitted to context ctx_id.
RETURN VALUE
io_submit() returns the number of iocbs submitted, or 0 if nr is zero.
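As a rough sketch of what "an array of nr AIO request blocks" looks like in practice, the code below prepares a single read request and submits it. It assumes the kernel's struct iocb layout from linux/aio_abi.h and a raw syscall(2) invocation instead of libaio; iocb_prep_pread() and submit_one() are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill in one request block describing a positioned read. */
void iocb_prep_pread(struct iocb *cb, int fd, void *buf, size_t len, off_t off)
{
    memset(cb, 0, sizeof *cb);          /* unused fields must be zero */
    cb->aio_lio_opcode = IOCB_CMD_PREAD;
    cb->aio_fildes = (uint32_t)fd;
    cb->aio_buf = (uint64_t)(uintptr_t)buf;
    cb->aio_nbytes = len;
    cb->aio_offset = off;
}

/* Hypothetical helper: submit a single request block (nr == 1). */
long submit_one(aio_context_t ctx, struct iocb *cb)
{
    struct iocb *list[1] = { cb };      /* array of pointers to request blocks */
    return syscall(SYS_io_submit, ctx, 1L, list);
}
```

On success submit_one() is expected to return 1, the number of request blocks queued; completion would then be collected with io_getevents(2).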
ERRORS
Tag | Description |
EINVAL | The aio_context specified by ctx_id is invalid. nr is less than 0. The iocb at *iocbpp[0] is not properly initialized, or the operation specified is invalid for the file descriptor in the iocb. |
EFAULT | One of the data structures points to invalid data. |
EBADF | The file descriptor specified in the first iocb is invalid. |
EAGAIN | Insufficient resources are available to queue any iocbs. |
ENOSYS | io_submit() is not implemented on this architecture. |
CONFORMING TO
io_submit() is Linux-specific and should not be used in programs that are intended to be portable.
VERSIONS
The asynchronous I/O system calls first appeared in Linux 2.5, August 2002.
SEE ALSO
io_setup(2), io_destroy(2), io_getevents(2), io_cancel(2).
NOTES
The asynchronous I/O system calls were written by Benjamin LaHaise.
AUTHOR
Kent Yoder.
ipc() function
SYNOPSIS
int ipc(unsigned int |
DESCRIPTION
ipc() is a common kernel entry point for the System V IPC calls for messages, semaphores, and shared memory. call determines which IPC function to invoke; the other arguments are passed through to the appropriate call.
User programs should call the appropriate functions by their usual names. Only standard library implementors and kernel hackers need to know about ipc().
CONFORMING TO
ipc() is Linux-specific and should not be used in programs intended to be portable.
SEE ALSO
- msgctl (2)
- msgget (2)
- msgrcv (2)
- msgsnd (2)
- semctl (2)
- semget (2)
- semop (2)
- shmat (2)
- shmctl (2)
- shmdt (2)
- shmget (2)
isastream() function
afs_syscall, break, fattach, fdetach, ftime, getmsg, getpmsg, gtty, isastream, lock, mpx, multiplexer, prof, profil, putmsg, putpmsg, security, stty, ulimit, vserver - unimplemented system calls
SYNOPSIS
Unimplemented system calls.
DESCRIPTION
These system calls are not implemented in the Linux 2.4 kernel.
RETURN VALUE
These system calls always return -1 and set errno to ENOSYS.
NOTES
Note that ftime(3), profil(3) and ulimit(3) are implemented as library functions.
Some system calls, like alloc_hugepages(2), free_hugepages(2), ioperm(2), iopl(2), and vm86(2) only exist on certain architectures.
Some system calls, like ipc(2), create_module(2), init_module(2), and delete_module(2) only exist when the Linux kernel was built with support for them.
SEE ALSO
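kexec_load() function
SYNOPSIS
#include <syscall.h> long kexec_load(unsigned long entry, unsigned long nr_segments, |
DESCRIPTION
kexec_load loads a new kernel from the current address space. This system call can only be used by root.
entry is a pointer to the entry point of the newly loaded executable image. It is the memory location to which the kernel will jump to start executing the instructions of the newly loaded image.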
nr_segments denotes the number of segments which will be passed to kexec_load. The value must not be greater than KEXEC_SEGMENT_MAX.
segments denotes a pointer to the first element of an array of kexec_segment elements. A kexec_segment element contains the details of a segment to be loaded in memory.
flags: the sixteen most significant bits of flags are used to communicate the architecture information (KEXEC_ARCH_*). The values for the various architectures are the same as those defined by the ELF specifications. The lower sixteen bits are reserved for miscellaneous information. Currently only one bit is used and the remaining fifteen are reserved for future use. The least significant bit (KEXEC_ON_CRASH) can be set to inform the kernel that the memory image being loaded is to be executed upon a system crash and not on a regular boot. For a regular boot, this bit is cleared.
RETURN VALUE
On success, zero is returned. On error, a nonzero value is returned, and errno is set appropriately.
ERRORS
EPERM the calling process does not have sufficient permissions (is not root).
EINVAL the flags argument contains an invalid combination of flags, or nr_segments is greater than KEXEC_SEGMENT_MAX.
ENOMEM there is not enough memory to store the kernel image.
EBUSY the memory location which should be written to is not available now.
AVAILABILITY
This syscall is implemented only since kernel 2.6.1
keyctl() function
SYNOPSIS
#include <keyutils.h>
long keyctl(int cmd, ...);
DESCRIPTION
keyctl() has a number of functions available:
Tag | Description |
KEYCTL_GET_KEYRING_ID | |
Ask for a keyring’s ID. | |
KEYCTL_JOIN_SESSION_KEYRING | |
Join or start named session keyring. | |
KEYCTL_UPDATE | |
Update a key. | |
KEYCTL_REVOKE | |
Revoke a key. | |
KEYCTL_CHOWN | |
Set ownership of a key. | |
KEYCTL_SETPERM | |
Set perms on a key. | |
KEYCTL_DESCRIBE | |
Describe a key. | |
KEYCTL_CLEAR | |
Clear contents of a keyring. | |
KEYCTL_LINK | |
Link a key into a keyring. | |
KEYCTL_UNLINK | |
Unlink a key from a keyring. | |
KEYCTL_SEARCH | |
Search for a key in a keyring. | |
KEYCTL_READ | |
Read a key or keyring’s contents. | |
KEYCTL_INSTANTIATE | |
Instantiate a partially constructed key. | |
KEYCTL_NEGATE | |
Negate a partially constructed key. | |
KEYCTL_SET_REQKEY_KEYRING | |
Set default request-key keyring. | |
KEYCTL_SET_TIMEOUT | |
Set timeout on a key. | |
KEYCTL_ASSUME_AUTHORITY | |
Assume authority to instantiate key. |
These are wrapped by libkeyutils into individual functions to permit the compiler to check types. See the See Also section at the bottom.
RETURN VALUE
On success keyctl() returns the serial number of the key it found. On error, the value -1 will be returned and errno will have been set to an appropriate error.
ERRORS
Tag | Description |
ENOKEY | No matching key was found or an invalid key was specified. |
EKEYEXPIRED | An expired key was found or specified. |
EKEYREVOKED | A revoked key was found or specified. |
EKEYREJECTED | A rejected key was found or specified. |
EDQUOT | The key quota for the caller’s user would be exceeded by creating a key or linking it to the keyring. |
EACCES | A key operation wasn’t permitted. |
LINKING
Although this is a Linux system call, it is not present in libc but can be found rather in libkeyutils. When linking, -lkeyutils should be specified to the linker.
SEE ALSO
add_key(2), request_key(2), keyctl_get_keyring_ID(3), keyctl_join_session_keyring(3), keyctl_update(3), keyctl_revoke(3), keyctl_chown(3), keyctl_setperm(3), keyctl_describe(3), keyctl_clear(3), keyctl_link(3), keyctl_unlink(3), keyctl_search(3), keyctl_read(3), keyctl_instantiate(3), keyctl_negate(3), keyctl_set_reqkey_keyring(3), keyctl_set_timeout(3), keyctl_assume_authority(3), keyctl_describe_alloc(3), keyctl_read_alloc(3), request-key(8)
kill() function
SYNOPSIS
#include <sys/types.h>
#include <signal.h>
int kill(pid_t pid, int sig);
DESCRIPTION
The kill() system call can be used to send any signal to any process group or process.
If pid is positive, then signal sig is sent to pid.
If pid equals 0, then sig is sent to every process in the process group of the current process.
If pid equals -1, then sig is sent to every process for which the calling process has permission to send signals, except for process 1 (init), but see below.
If pid is less than -1, then sig is sent to every process in the process group -pid.
If sig is 0, then no signal is sent, but error checking is still performed.
For a process to have permission to send a signal it must either be privileged (under Linux: have the CAP_KILL capability), or the real or effective user ID of the sending process must equal the real or saved set-user-ID of the target process. In the case of SIGCONT it suffices when the sending and receiving processes belong to the same session.
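The sig == 0 case described above is commonly used to probe whether a process exists without disturbing it. The following is a minimal sketch; process_exists() is a hypothetical helper name, and treating EPERM as "exists" is a design choice that follows from the permission rule above (the target is there, but the caller may not signal it).

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/*
 * Hypothetical helper: returns 1 if a process with this PID exists,
 * 0 otherwise. No signal is delivered because sig is 0.
 */
int process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;           /* exists and we may signal it */
    return errno == EPERM;  /* exists, but we lack permission */
}
```

Note that a zombie still counts as existing for this check, and that the PID may be reused between the probe and any later action.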
RETURN VALUE
On success (at least one signal was sent), zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
EINVAL | An invalid signal was specified. |
EPERM | The process does not have permission to send the signal to any of the target processes. |
ESRCH | The pid or process group does not exist. Note that an existing process might be a zombie, a process which already committed termination, but has not yet been wait()ed for. |
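NOTES
The only signals that can be sent to task number one, the init process, are those for which init has explicitly installed signal handlers. This is done to assure the system is not brought down accidentally.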
POSIX.1-2001 requires that kill(-1,sig) send sig to all processes that the current process may send signals to, except possibly for some implementation-defined system processes. Linux allows a process to signal itself, but on Linux the call kill(-1,sig) does not signal the current process.
POSIX.1-2001 requires that if a process sends a signal to itself, and the sending thread does not have the signal blocked, and no other thread has it unblocked or is waiting for it in sigwait(), at least one unblocked signal must be delivered to the sending thread before the kill() call returns.
BUGS
In 2.6 kernels up to and including 2.6.7, there was a bug that meant that when sending signals to a process group, kill() failed with the error EPERM if the caller did not have permission to send the signal to any (rather than all) of the members of the process group. Notwithstanding this error return, the signal was still delivered to all of the processes for which the caller had permission to signal.
LINUX HISTORY
Across different kernel versions, Linux has enforced different rules for the permissions required for an unprivileged process to send a signal to another process. In kernels 1.0 to 1.2.2, a signal could be sent if the effective user ID of the sender matched that of the receiver, or the real user ID of the sender matched that of the receiver. From kernel 1.2.3 until 1.3.77, a signal could be sent if the effective user ID of the sender matched either the real or effective user ID of the receiver. The current rules, which conform to POSIX.1-2001, were adopted in kernel 1.3.78.
CONFORMING TO
SVr4, 4.3BSD, POSIX.1-2001
SEE ALSO
killpg() function
SYNOPSIS
#include <signal.h>
int killpg(int pgrp, int sig);
DESCRIPTION
killpg() sends the signal sig to the process group pgrp. See signal(7) for a list of signals. If pgrp is 0, killpg() sends the signal to the sending process’s process group.
(POSIX says: If pgrp is less than or equal to 1, the behaviour is undefined.)
For a process to have permission to send a signal it must either be privileged (under Linux: have the CAP_KILL capability), or the real or effective user ID of the sending process must equal the real or saved set-user-ID of the target process. In the case of SIGCONT it suffices when the sending and receiving processes belong to the same session.
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
EINVAL | sig is not a valid signal number. |
EPERM | The process does not have permission to send the signal to any of the target processes. |
ESRCH | No process can be found in the process group specified by pgrp. |
ESRCH | The process group was given as 0 but the sending process does not have a process group. |
NOTES
There are various differences between the permission checking in BSD-type systems and System V-type systems. See the POSIX rationale for kill(). A difference not mentioned by POSIX concerns the return value EPERM: BSD documents that no signal is sent and EPERM returned when the permission check failed for at least one target process, while POSIX documents EPERM only when the permission check failed for all target processes.
CONFORMING TO
SVr4, 4.4BSD (The killpg() function call first appeared in 4BSD), POSIX.1-2001.
SEE ALSO
lchown() function
chown, fchown, lchown - change ownership of a file
SYNOPSIS
#include <sys/types.h>
#include <unistd.h>
int chown(const char *path, uid_t owner, gid_t group);
int fchown(int fd, uid_t owner, gid_t group);
int lchown(const char *path, uid_t owner, gid_t group);
DESCRIPTION
These system calls change the owner and group of the file specified by path or by fd. Only a privileged process (Linux: one with the CAP_CHOWN capability) may change the owner of a file. The owner of a file may change the group of the file to any group of which that owner is a member. A privileged process (Linux: with CAP_CHOWN) may change the group arbitrarily.
If the owner or group is specified as -1, then that ID is not changed.
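The -1 convention makes it possible to change only one of the two IDs. A minimal sketch, assuming a hypothetical helper named change_group_only(); the cast to uid_t is needed because uid_t is typically unsigned:

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: change the group of path and leave the owner
 * untouched by passing -1 for the owner argument.
 */
int change_group_only(const char *path, gid_t group)
{
    return chown(path, (uid_t)-1, group);
}
```

As the description notes, an unprivileged owner can only pass a group of which it is a member; otherwise the call is expected to fail with EPERM.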
When the owner or group of an executable file are changed by a non-superuser, the S_ISUID and S_ISGID mode bits are cleared. POSIX does not specify whether this also should happen when root does the chown(); the Linux behaviour depends on the kernel version. In case of a non-group-executable file (with clear S_IXGRP bit) the S_ISGID bit indicates mandatory locking, and is not cleared by a chown().
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Depending on the file system, other errors can be returned. The more general errors for chown() are listed below.
Tag | Description |
EACCES | Search permission is denied on a component of the path prefix. (See also path_resolution(2).) |
EFAULT | path points outside your accessible address space. |
ELOOP | Too many symbolic links were encountered in resolving path. |
ENAMETOOLONG | path is too long. |
ENOENT | The file does not exist. |
ENOMEM | Insufficient kernel memory was available. |
ENOTDIR | A component of the path prefix is not a directory. |
EPERM | The calling process did not have the required permissions (see above) to change owner and/or group. |
EROFS | The named file resides on a read-only file system. |
The general errors for fchown() are listed below:
EBADF | The descriptor is not valid. |
EIO | A low-level I/O error occurred while modifying the inode. |
ENOENT | See above. |
EPERM | See above. |
EROFS | See above. |
NOTES
In versions of Linux prior to 2.1.81 (and distinct from 2.1.46), chown() did not follow symbolic links. Since Linux 2.1.81, chown() does follow symbolic links, and there is a new system call lchown() that does not follow symbolic links. Since Linux 2.1.86, this new call (that has the same semantics as the old chown()) has got the same syscall number, and chown() got the newly introduced number.
The prototype for fchown() is only available if _BSD_SOURCE is defined.
CONFORMING TO
4.4BSD, SVr4, POSIX.1-2001.
The 4.4BSD version can only be used by the superuser (that is, ordinary users cannot give away files).
RESTRICTIONS
The chown() semantics are deliberately violated on NFS file systems which have UID mapping enabled. Additionally, the semantics of all system calls which access the file contents are violated, because chown() may cause immediate access revocation on already open files. Client-side caching may lead to a delay between the time when ownership has been changed to allow access for a user and the time when the file can actually be accessed by the user on other clients.
SEE ALSO
linkat() function
SYNOPSIS
#include <unistd.h>
int linkat(int olddirfd, const char *oldpath, int newdirfd, const char *newpath, int flags);
DESCRIPTION
The linkat() system call operates in exactly the same way as link(2), except for the differences described in this manual page.
If the pathname given in oldpath is relative, then it is interpreted relative to the directory referred to by the file descriptor olddirfd (rather than relative to the current working directory of the calling process, as is done by link(2) for a relative pathname).
If the pathname given in oldpath is relative and olddirfd is the special value AT_FDCWD, then oldpath is interpreted relative to the current working directory of the calling process (like link(2)).
If the pathname given in oldpath is absolute, then olddirfd is ignored.
The interpretation of newpath is as for oldpath, except that a relative pathname is interpreted relative to the directory referred to by the file descriptor newdirfd.
The flags argument is currently unused, and must be specified as 0.
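The directory-relative resolution described above can be sketched as follows. This is an illustration under stated assumptions: link_in_dir() is a hypothetical helper that opens one directory and uses its descriptor for both the old and the new name, with flags passed as 0 as this page requires.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: create newname as a hard link to oldname, with
 * both relative names resolved against the directory dir.
 */
int link_in_dir(const char *dir, const char *oldname, const char *newname)
{
    int dirfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        return -1;

    int rc = linkat(dirfd, oldname, dirfd, newname, 0);

    int saved = errno;  /* keep linkat()'s errno across close() */
    close(dirfd);
    errno = saved;
    return rc;
}
```

Passing AT_FDCWD for both descriptors instead would make the call behave like link(2) for relative pathnames.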
RETURN VALUE
On success, linkat() returns 0. On error, -1 is returned and errno is set to indicate the error.
ERRORS
The same errors that occur for link(2) can also occur for linkat(). The following additional errors can occur for linkat():
Tag | Description |
EBADF | olddirfd or newdirfd is not a valid file descriptor. |
ENOTDIR | oldpath is a relative path and olddirfd is a file descriptor referring to a file other than a directory; or similar for newpath and newdirfd. |
NOTES
See openat(2) for an explanation of the need for linkat().
CONFORMING TO
This system call is nonstandard but is proposed for inclusion in a future revision of POSIX.1.
VERSIONS
linkat() was added to Linux in kernel 2.6.16.
SEE ALSO
link() function
SYNOPSIS
#include <unistd.h>
int link(const char *oldpath, const char *newpath);
DESCRIPTION
link() creates a new link (also known as a hard link) to an existing file.
If newpath exists it will not be overwritten.
This new name may be used exactly as the old one for any operation; both names refer to the same file (and so have the same permissions and ownership) and it is impossible to tell which name was the 'original'.
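That both names refer to the same file can be checked by comparing device and inode numbers from stat(2). A minimal sketch with hypothetical helper names same_file() and link_and_verify():

```c
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: 1 if a and b name the same file, 0 if not, -1 on error. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Hypothetical helper: create the hard link, then confirm the names match. */
int link_and_verify(const char *oldpath, const char *newpath)
{
    if (link(oldpath, newpath) != 0)
        return -1;
    return same_file(oldpath, newpath);
}
```

After a successful call, removing either name with unlink(2) leaves the file reachable through the other.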
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
EACCES | Write access to the directory containing newpath is denied, or search permission is denied for one of the directories in the path prefix of oldpath or newpath. (See also path_resolution(2).) |
EEXIST | newpath already exists. |
EFAULT | oldpath or newpath points outside your accessible address space. |
EIO | An I/O error occurred. |
ELOOP | Too many symbolic links were encountered in resolving oldpath or newpath. |
EMLINK | The file referred to by oldpath already has the maximum number of links to it. |
ENAMETOOLONG | oldpath or newpath was too long. |
ENOENT | A directory component in oldpath or newpath does not exist or is a dangling symbolic link. |
ENOMEM | Insufficient kernel memory was available. |
ENOSPC | The device containing the file has no room for the new directory entry. |
ENOTDIR | A component used as a directory in oldpath or newpath is not, in fact, a directory. |
EPERM | oldpath is a directory. |
EPERM | The filesystem containing oldpath and newpath does not support the creation of hard links. |
EROFS | The file is on a read-only filesystem. |
EXDEV | oldpath and newpath are not on the same mounted filesystem. (Linux permits a filesystem to be mounted at multiple points, but link(2) does not work across different mount points, even if the same filesystem is mounted on both.) |
NOTES
Hard links, as created by link(), cannot span filesystems. Use symlink() if this is required.
POSIX.1-2001 says that link() should dereference oldpath if it is a symbolic link. However, Linux does not do so: if oldpath is a symbolic link, then newpath is created as a (hard) link to the same symbolic link file (i.e., newpath becomes a symbolic link to the same file that oldpath refers to). Some other implementations behave in the same manner as Linux.
CONFORMING TO
SVr4, 4.3BSD, POSIX.1-2001 (except as noted above).
BUGS
On NFS file systems, the return code may be wrong in case the NFS server performs the link creation and dies before it can say so. Use stat(2) to find out if the link got created.
SEE ALSO
listen() function
SYNOPSIS
#include <sys/socket.h>
int listen(int sockfd, int backlog);
DESCRIPTION
To accept connections, a socket is first created with socket(2), a willingness to accept incoming connections and a queue limit for incoming connections are specified with listen(), and then the connections are accepted with accept(2). The listen() call applies only to sockets of type SOCK_STREAM or SOCK_SEQPACKET.
The backlog parameter defines the maximum length the queue of pending connections may grow to. If a connection request arrives with the queue full the client may receive an error with an indication of ECONNREFUSED or, if the underlying protocol supports retransmission, the request may be ignored so that retries succeed.
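The socket(2), listen(), accept(2) sequence described above can be sketched up to the listen() step as follows. The helper name make_listener() is hypothetical, and binding to the loopback address with port 0 (letting the kernel pick a free port) is an assumption made only to keep the example self-contained.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns a listening TCP socket bound to
 * 127.0.0.1 on a kernel-chosen port, or -1 on error.
 */
int make_listener(int backlog)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* 0 asks the kernel for any free port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(fd, backlog) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The returned descriptor can then be passed to accept(2); the backlog argument bounds the queue of pending connections as described above.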
NOTES
The behaviour of the backlog parameter on TCP sockets changed with Linux 2.2. Now it specifies the queue length for completely established sockets waiting to be accepted, instead of the number of incomplete connection requests. The maximum length of the queue for incomplete sockets can be set using the tcp_max_syn_backlog sysctl. When syncookies are enabled there is no logical maximum length and this sysctl setting is ignored. See tcp(7) for more information.
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
EADDRINUSE | Another socket is already listening on the same port. |
EBADF | The argument sockfd is not a valid descriptor. |
ENOTSOCK | The argument sockfd is not a socket. |
EOPNOTSUPP | The socket is not of a type that supports the listen() operation. |
CONFORMING TO
4.4BSD, POSIX.1-2001. The listen() function call first appeared in 4.2BSD.
BUGS
If the socket is of type AF_INET, and the backlog argument is greater than the constantSOMAXCONN (128 in Linux 2.0 & 2.2), it is silently truncated to SOMAXCONN.
See also
_llseek() function
_llseek - reposition read/write file offset
Synopsis
#include <sys/types.h>
int _llseek(unsigned int fd, unsigned long offset_high, unsigned long offset_low, loff_t *result, unsigned int whence);
Description
The _llseek() function repositions the offset of the open file associated with the file descriptor fd to (offset_high<<32) | offset_low bytes relative to the beginning of the file, the current position in the file, or the end of the file, depending on whether whence is SEEK_SET, SEEK_CUR, or SEEK_END, respectively.
It returns the resulting file position in the argument result.
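The (offset_high<<32) | offset_low arithmetic can be sketched as below. The helper names split_offset, join_offset and seek64 are hypothetical. Because SYS__llseek is only defined on some (typically 32-bit) architectures, the sketch assumes a fallback to lseek(2) where it is absent.

```c
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Split a 64-bit offset into the two 32-bit halves _llseek() takes. */
static void split_offset(int64_t off, unsigned long *high, unsigned long *low)
{
    uint64_t u = (uint64_t)off;
    *high = (unsigned long)(u >> 32);
    *low  = (unsigned long)(u & 0xffffffffu);
}

/* Recombine the halves: (offset_high << 32) | offset_low. */
static int64_t join_offset(unsigned long high, unsigned long low)
{
    return (int64_t)(((uint64_t)high << 32) | (uint64_t)(low & 0xffffffffUL));
}

/* Hypothetical wrapper: stores the new position in *result, returns 0 or -1. */
static int seek64(int fd, int64_t off, int whence, int64_t *result)
{
#ifdef SYS__llseek
    unsigned long high, low;
    loff_t res;

    split_offset(off, &high, &low);
    if (syscall(SYS__llseek, fd, high, low, &res, whence) == -1)
        return -1;
    *result = (int64_t)res;
#else
    /* Assumption: off_t is already 64 bits wide on this architecture. */
    off_t res = lseek(fd, (off_t)off, whence);
    if (res == (off_t)-1)
        return -1;
    *result = (int64_t)res;
#endif
    return 0;
}
```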
Return value
Upon successful completion, _llseek() returns 0. Otherwise, a value of -1 is returned and errno is set to indicate the error.
Errors
Tag | Description |
EBADF | fd is not an open file descriptor. |
EFAULT | Problem with copying results to user space. |
EINVAL | whence is invalid. |
Conforming to
This function is Linux-specific and should not be used in programs intended to be portable.
Notes
Glibc does not provide a wrapper for this system call; call it using syscall(2).
See also
llseek() function
Synopsis
#include <sys/types.h>
int _llseek(unsigned int fd, unsigned long offset_high, unsigned long offset_low, loff_t *result, unsigned int whence);
Description
The _llseek() function repositions the offset of the open file associated with the file descriptor fd to (offset_high<<32) | offset_low bytes relative to the beginning of the file, the current position in the file, or the end of the file, depending on whether whence is SEEK_SET, SEEK_CUR, or SEEK_END, respectively. It returns the resulting file position in the argument result.
Return value
Upon successful completion, _llseek() returns 0. Otherwise, a value of -1 is returned and errno is set to indicate the error.
Errors
Tag | Description |
EBADF | fd is not an open file descriptor. |
EFAULT | Problem with copying results to user space. |
EINVAL | whence is invalid. |
Conforming to
This function is Linux specific, and should not be used in programs intended to be portable.
Notes
Glibc does not provide a wrapper for this system call; call it using syscall(2).
See also
lock() function
afs_syscall, break, fattach, fdetach, ftime, getmsg, getpmsg, gtty, isastream, lock, mpx, multiplexer, prof, profil, putmsg, putpmsg, security, stty, ulimit, vserver - unimplemented system calls.
Synopsis
Unimplemented system calls.
Description
These system calls are not implemented in the Linux 2.4 kernel.
Return value
These system calls always return -1 and set errno to ENOSYS.
Notes
Note that ftime(3), profil(3) and ulimit(3) are implemented as library functions.
Some system calls, like alloc_hugepages(2), free_hugepages(2), ioperm(2), iopl(2), and vm86(2) only exist on certain architectures.
Some system calls, like ipc(2), create_module(2), init_module(2), and delete_module(2) only exist when the Linux kernel was built with support for them.
See also
lookup_dcookie() function
Synopsis
int lookup_dcookie(u64 cookie, char * buffer, size_t len);
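Description
Look up the full path of the directory entry specified by the value cookie. The cookie is an opaque identifier uniquely identifying a particular directory entry. The buffer given is filled in with the full path of the directory entry.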
For lookup_dcookie() to return successfully, the kernel must still hold a cookie reference to the directory entry.
Notes
lookup_dcookie() is a special-purpose system call, currently used only by the oprofile profiler. It relies on a kernel driver to register cookies for directory entries.
The path returned may be suffixed by the string " (deleted)" if the directory entry has been removed.
Return value
On success, lookup_dcookie() returns the length of the path string copied into the buffer. On error, -1 is returned, and errno is set appropriately.
Errors
Tag | Description |
EFAULT | The buffer was not valid. |
EINVAL | The kernel has no registered cookie/directory entry mappings at the time of lookup, or the cookie does not refer to a valid directory entry. |
ENAMETOOLONG | The name could not fit in the buffer. |
ENOMEM | The kernel could not allocate memory for the temporary buffer holding the path. |
EPERM | The process does not have the capability CAP_SYS_ADMIN required to look up cookie values. |
ERANGE | The buffer was not large enough to hold the path of the directory entry. |
Conforming to
lookup_dcookie() is Linux-specific.
Availability
Since Linux 2.5.43. The ENAMETOOLONG error return was added in 2.5.70.
lseek() function
Synopsis
#include <sys/types.h>
#include <unistd.h>
off_t lseek(int fildes, off_t offset, int whence);
Description
The lseek() function repositions the offset of the open file associated with the file descriptor fildes to the argument offset according to the directive whence as follows:
Tag | Description |
SEEK_SET | The offset is set to offset bytes. |
SEEK_CUR | The offset is set to its current location plus offset bytes. |
SEEK_END | The offset is set to the size of the file plus offset bytes. |
The lseek() function allows the file offset to be set beyond the end of the file (but this does not change the size of the file). If data is later written at this point, subsequent reads of the data in the gap (a "hole") return null bytes ('\0') until data is actually written into the gap.
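The hole behaviour can be checked with a short sketch. The helper name hole_reads_as_zeros is hypothetical, and the sketch assumes fd refers to a newly created, empty regular file opened for reading and writing.

```c
#include <sys/types.h>
#include <unistd.h>

/* Writes one byte, seeks `gap` bytes past it, writes another byte, then
 * reads the gap back. Returns 1 if every gap byte is '\0', 0 if not,
 * and -1 on an I/O error. */
static int hole_reads_as_zeros(int fd, off_t gap)
{
    char buf[256];
    off_t left;

    if (write(fd, "A", 1) != 1)
        return -1;
    /* Seeking past the end does not change the file size by itself. */
    if (lseek(fd, gap, SEEK_CUR) == (off_t)-1)
        return -1;
    /* Writing here extends the file and leaves a hole behind. */
    if (write(fd, "B", 1) != 1)
        return -1;
    if (lseek(fd, 1, SEEK_SET) == (off_t)-1)
        return -1;

    for (left = gap; left > 0; ) {
        size_t want = left < (off_t)sizeof buf ? (size_t)left : sizeof buf;
        ssize_t n = read(fd, buf, want);
        if (n <= 0)
            return -1;
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] != '\0')
                return 0;
        left -= n;
    }
    return 1;
}
```

After the call, lseek(fd, 0, SEEK_END) reports a size of gap + 2 bytes: the two bytes written plus the hole.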
Return value
Upon successful completion, lseek() returns the resulting offset location as measured in bytes from the beginning of the file. Otherwise, a value of (off_t)-1 is returned and errno is set to indicate the error.
Errors
Tag | Description |
EBADF | fildes is not an open file descriptor. |
EINVAL | whence is not one of SEEK_SET, SEEK_CUR, SEEK_END; or the resulting file offset would be negative, or beyond the end of a seekable device. |
EOVERFLOW | The resulting file offset cannot be represented in an off_t. |
ESPIPE | fildes is associated with a pipe, socket, or FIFO. |
Conforming to
SVr4, 4.3BSD, POSIX.1-2001.
RESTRICTIONS
Some devices are incapable of seeking and POSIX does not specify which devices must support lseek().
Linux specific restrictions: using lseek() on a tty device returns ESPIPE.
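Notes
This document's use of whence is incorrect English, but maintained for historical reasons.
When converting old code, substitute values for whence with the following macros: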
old | new |
0 | SEEK_SET |
1 | SEEK_CUR |
2 | SEEK_END |
L_SET | SEEK_SET |
L_INCR | SEEK_CUR |
L_XTND | SEEK_END |
SVr1-3 returns long instead of off_t, BSD returns int.
Note that file descriptors created by dup(2) or fork(2) share the current file position pointer, so seeking on such files may be subject to race conditions.
See also
lstat() function
Synopsis
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
int stat(const char *path, struct stat *buf);
int fstat(int filedes, struct stat *buf);
int lstat(const char *path, struct stat *buf);
Description
These functions return information about a file. No permissions are required on the file itself, but — in the case of stat() and lstat() — execute (search) permission is required on all of the directories in path that lead to the file.
stat() stats the file pointed to by path and fills in buf.
lstat() is identical to stat(), except that if path is a symbolic link, then the link itself is stat-ed, not the file that it refers to.
fstat() is identical to stat(), except that the file to be stat-ed is specified by the file descriptor filedes.
All of these system calls return a stat structure, which contains the following fields:
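struct stat {
    dev_t     st_dev;     /* ID of device containing file */
    ino_t     st_ino;     /* inode number */
    mode_t    st_mode;    /* protection */
    nlink_t   st_nlink;   /* number of hard links */
    uid_t     st_uid;     /* user ID of owner */
    gid_t     st_gid;     /* group ID of owner */
    dev_t     st_rdev;    /* device ID (if special file) */
    off_t     st_size;    /* total size, in bytes */
    blksize_t st_blksize; /* blocksize for filesystem I/O */
    blkcnt_t  st_blocks;  /* number of blocks allocated */
    time_t    st_atime;   /* time of last access */
    time_t    st_mtime;   /* time of last modification */
    time_t    st_ctime;   /* time of last status change */
};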
The st_dev field describes the device on which this file resides.
The st_rdev field describes the device that this file (inode) represents.
The st_size field gives the size of the file (if it is a regular file or a symbolic link) in bytes. The size of a symlink is the length of the pathname it contains, without a trailing null byte.
The st_blocks field indicates the number of blocks allocated to the file, in 512-byte units. (This may be smaller than st_size/512, for example, when the file has holes.)
The st_blksize field gives the "preferred" blocksize for efficient file system I/O. (Writing to a file in smaller chunks may cause an inefficient read-modify-rewrite.)
Not all of the Linux filesystems implement all of the time fields. Some file system types allow mounting in such a way that file accesses do not cause an update of the st_atime field. (See ‘noatime’ in mount(8).)
The field st_atime is changed by file accesses, e.g. by execve(2), mknod(2), pipe(2), utime(2) and read(2) (of more than zero bytes). Other routines, like mmap(2), may or may not update st_atime.
The field st_mtime is changed by file modifications, e.g. by mknod(2), truncate(2), utime(2) and write(2) (of more than zero bytes). Moreover, st_mtime of a directory is changed by the creation or deletion of files in that directory. The st_mtime field is not changed for changes in owner, group, hard link count, or mode.
The field st_ctime is changed by writing or by setting inode information (i.e., owner, group, link count, mode, etc.).
The following POSIX macros are defined to check the file type using the st_mode field:
Tag | Description |
S_ISREG(m) | is it a regular file? |
S_ISDIR(m) | directory? |
S_ISCHR(m) | character device? |
S_ISBLK(m) | block device? |
S_ISFIFO(m) | FIFO (named pipe)? |
S_ISLNK(m) | symbolic link? (Not in POSIX.1-1996.) |
S_ISSOCK(m) | socket? (Not in POSIX.1-1996.) |
The following flags are defined for the st_mode field:
S_IFMT | 0170000 | bitmask for the file type bitfields |
S_IFSOCK | 0140000 | socket |
S_IFLNK | 0120000 | symbolic link |
S_IFREG | 0100000 | regular file |
S_IFBLK | 0060000 | block device |
S_IFDIR | 0040000 | directory |
S_IFCHR | 0020000 | character device |
S_IFIFO | 0010000 | FIFO |
S_ISUID | 0004000 | set UID bit |
S_ISGID | 0002000 | set-group-ID bit (see below) |
S_ISVTX | 0001000 | sticky bit (see below) |
S_IRWXU | 00700 | mask for file owner permissions |
S_IRUSR | 00400 | owner has read permission |
S_IWUSR | 00200 | owner has write permission |
S_IXUSR | 00100 | owner has execute permission |
S_IRWXG | 00070 | mask for group permissions |
S_IRGRP | 00040 | group has read permission |
S_IWGRP | 00020 | group has write permission |
S_IXGRP | 00010 | group has execute permission |
S_IRWXO | 00007 | mask for permissions for others (not in group) |
S_IROTH | 00004 | others have read permission |
S_IWOTH | 00002 | others have write permission |
S_IXOTH | 00001 | others have execute permission |
The set-group-ID bit (S_ISGID) has several special uses. For a directory it indicates that BSD semantics is to be used for that directory: files created there inherit their group ID from the directory, not from the effective group ID of the creating process, and directories created there will also get the S_ISGID bit set. For a file that does not have the group execution bit (S_IXGRP) set, the set-group-ID bit indicates mandatory file/record locking.
The ‘sticky’ bit (S_ISVTX) on a directory means that a file in that directory can be renamed or deleted only by the owner of the file, by the owner of the directory, and by a privileged process.
Linux notes
Since kernel 2.5.48, the stat structure supports nanosecond resolution for the three file timestamp fields. Glibc exposes the nanosecond component of each field using names either of the form st_atim.tv_nsec, if the _BSD_SOURCE or _SVID_SOURCE feature test macro is defined, or of the form st_atimensec, if neither of these macros is defined. On file systems that do not support sub-second timestamps, these nanosecond fields are returned with the value 0.
For most files under the /proc directory, stat() does not return the file size in the st_size field; instead the field is returned with the value 0.
Return value
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
Errors
Tag | Description |
EACCES | Search permission is denied for one of the directories in the path prefix of path. (See also path_resolution(2).) |
EBADF | filedes is bad. |
EFAULT | Bad address. |
ELOOP | Too many symbolic links encountered while traversing the path. |
ENAMETOOLONG | File name too long. |
ENOENT | A component of the path path does not exist, or the path is an empty string. |
ENOMEM | Out of memory (i.e. kernel memory). |
ENOTDIR | A component of the path is not a directory. |
Conforming to
These system calls conform to SVr4, 4.3BSD, POSIX.1-2001.
Use of the st_blocks and st_blksize fields may be less portable. (They were introduced in BSD. The interpretation differs between systems, and possibly on a single system when NFS mounts are involved.)
POSIX does not describe the S_IFMT, S_IFSOCK, S_IFLNK, S_IFREG, S_IFBLK, S_IFDIR, S_IFCHR, S_IFIFO, S_ISVTX bits, but instead demands the use of the macros S_ISDIR(), etc. The S_ISLNK and S_ISSOCK macros are not in POSIX.1-1996, but both are present in POSIX.1-2001; the former is from SVID 4, the latter from SUSv2.
Unix V7 (and later systems) had S_IREAD, S_IWRITE, S_IEXEC, where POSIX prescribes the synonyms S_IRUSR, S_IWUSR, S_IXUSR.
Other systems
Values that have been (or are) in use on various systems:
hex | name | ls | octal | description |
f000 | S_IFMT | | 170000 | mask for file type |
0000 | | | 000000 | SCO out-of-service inode, BSD unknown type; SVID-v2 and XPG2 have both 0 and 0100000 for ordinary file |
1000 | S_IFIFO | p| | 010000 | FIFO (named pipe) |
2000 | S_IFCHR | c | 020000 | character special (V7) |
3000 | S_IFMPC | | 030000 | multiplexed character special (V7) |
4000 | S_IFDIR | d/ | 040000 | directory (V7) |
5000 | S_IFNAM | | 050000 | XENIX named special file with two subtypes, distinguished by st_rdev values 1, 2 |
0001 | S_INSEM | s | 000001 | XENIX semaphore subtype of IFNAM |
0002 | S_INSHD | m | 000002 | XENIX shared data subtype of IFNAM |
6000 | S_IFBLK | b | 060000 | block special (V7) |
7000 | S_IFMPB | | 070000 | multiplexed block special (V7) |
8000 | S_IFREG | - | 100000 | regular (V7) |
9000 | S_IFCMP | | 110000 | VxFS compressed |
9000 | S_IFNWK | n | 110000 | network special (HP-UX) |
a000 | S_IFLNK | l@ | 120000 | symbolic link (BSD) |
b000 | S_IFSHAD | | 130000 | Solaris shadow inode for ACL (not seen by userspace) |
c000 | S_IFSOCK | s= | 140000 | socket (BSD; also "S_IFSOC" on VxFS) |
d000 | S_IFDOOR | D> | 150000 | Solaris door |
e000 | S_IFWHT | w% | 160000 | BSD whiteout (not used for inode) |
0200 | S_ISVTX | | 001000 | ‘sticky bit’: save swapped text even after use (V7); reserved (SVID-v2); on non-directories: don’t cache this file (SunOS); on directories: restricted deletion flag (SVID-v4.2) |
0400 | S_ISGID | | 002000 | set-group-ID on execution (V7); for directories: use BSD semantics for propagation of GID |
0400 | S_ENFMT | | 002000 | SysV file locking enforcement (shared with S_ISGID) |
0800 | S_ISUID | | 004000 | set-user-ID on execution (V7) |
0800 | S_CDF | | 004000 | directory is a context dependent file (HP-UX) |
A sticky command appeared in Version 32V AT&T UNIX.
See also
madvise() function
Synopsis
#include <sys/mman.h>
int madvise(void *start, size_t length, int advice);
Description
The madvise() system call advises the kernel about how to handle paging input/output in the address range beginning at address start and with size length bytes. It allows an application to tell the kernel how it expects to use some mapped or shared memory areas, so that the kernel can choose appropriate read-ahead and caching techniques. This call does not influence the semantics of the application (except in the case of MADV_DONTNEED), but may influence its performance. The kernel is free to ignore the advice.
The advice is indicated in the advice parameter, which can be one of the following:
Tag | Description |
MADV_NORMAL | No special treatment. This is the default. |
MADV_RANDOM | Expect page references in random order. (Hence, read ahead may be less useful than normally.) |
MADV_SEQUENTIAL | Expect page references in sequential order. (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.) |
MADV_WILLNEED | Expect access in the near future. (Hence, it might be a good idea to read some pages ahead.) |
MADV_DONTNEED | Do not expect access in the near future. (For the time being, the application is finished with the given range, so the kernel can free resources associated with it.) Subsequent accesses of pages in this range will succeed, but will result either in re-loading of the memory contents from the underlying mapped file (see mmap()) or zero-fill-on-demand pages for mappings without an underlying file. |
Return value
On success madvise() returns zero. On error, it returns -1 and errno is set appropriately.
Errors
Tag | Description |
EAGAIN | A kernel resource was temporarily unavailable. |
EBADF | The map exists, but the area maps something that isn’t a file. |
EINVAL | The value len is negative, start is not page-aligned, advice is not a valid value, or the application is attempting to release locked or shared pages (with MADV_DONTNEED). |
EIO | (for MADV_WILLNEED) Paging in this area would exceed the process’s maximum resident set size. |
ENOMEM | (for MADV_WILLNEED) Not enough memory: paging in failed. |
ENOMEM | Addresses in the specified range are not currently mapped, or are outside the address space of the process. |
Linux notes
The current Linux implementation (2.4.0) views this system call more as a command than as advice and hence may return an error when it cannot do what it usually would do in response to this advice. (See the ERRORS description above.) This is nonstandard behaviour.
The Linux implementation requires that the address start be page-aligned, and allows length to be zero. If there are some parts of the specified address range that are not mapped, the Linux version of madvise() ignores them and applies the call to the rest (but returns ENOMEM from the system call, as it should).
History
The madvise() function first appeared in 4.4BSD.
Conforming to
POSIX.1b. POSIX.1-2001 describes posix_madvise() with constants POSIX_MADV_NORMAL, etc., with a behaviour close to that described here. There is a similar posix_fadvise() for file access.
See also
mincore() function
Synopsis
#include <unistd.h>
#include <sys/mman.h>
int mincore(void *start, size_t length, unsigned char *vec);
Description
The mincore() function requests a vector describing which pages of a file are in core and can be read without disk access. The kernel will supply data for length bytes following the start address. On return, the kernel will have filled vec with bytes, of which the least significant bit indicates if a page is core resident. (The other bits are undefined, reserved for possible later use.) Of course this is only a snapshot: pages that are not locked in core can come and go any moment, and the contents of vec may be stale already when this call returns.
For mincore() to return successfully, start must lie on a page boundary. It is the caller’s responsibility to round up to the nearest page. The length parameter need not be a multiple of the page size. The vector vec must be large enough to contain (length+PAGE_SIZE-1) / PAGE_SIZE bytes. One may obtain the page size from getpagesize(2).
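The vector sizing rule, (length+PAGE_SIZE-1) / PAGE_SIZE, can be sketched as follows. The helper names mincore_vec_len and count_resident_pages are hypothetical, and the sketch uses sysconf(_SC_PAGESIZE) as an assumed equivalent of getpagesize(2).

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Number of bytes vec must hold: one per page, rounding length up. */
static size_t mincore_vec_len(size_t length, size_t page_size)
{
    return (length + page_size - 1) / page_size;
}

/* Returns how many pages of [start, start + length) are resident, or -1
 * on error. start must be page-aligned. The result is only a snapshot. */
static long count_resident_pages(void *start, size_t length)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t n = mincore_vec_len(length, page);
    unsigned char *vec = malloc(n ? n : 1);
    long resident = 0;

    if (vec == NULL)
        return -1;
    if (mincore(start, length, vec) == -1) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < n; i++)
        resident += vec[i] & 1;   /* only the least significant bit is defined */
    free(vec);
    return resident;
}
```

With a 4096-byte page, a length of 4097 bytes needs a 2-byte vector.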
Return value
On success, mincore() returns zero. On error, -1 is returned, and errno is set appropriately.
Errors
Tag | Description |
EAGAIN | The kernel is temporarily out of resources. |
EFAULT | vec points to an invalid address. |
EINVAL | start is not a multiple of the page size. |
ENOMEM | len is greater than (TASK_SIZE - start). (This could occur if a negative value is specified for len, since that value will be interpreted as a large unsigned integer.) In Linux 2.6.11 and earlier, the error EINVAL was returned for this condition. |
ENOMEM | address to address + length contained unmapped memory, or memory not part of a file. |
BUGS
Up to now (Linux 2.6.5), mincore() does not return correct information for MAP_PRIVATE mappings.
Conforming to
mincore() is not specified in POSIX.1-2001, and it is not available on all Unix implementations.
History
The mincore() function first appeared in 4.4BSD.
Availability
Since Linux 2.3.99pre1 and glibc 2.2.
See also
mkdirat() function
Synopsis
#include <sys/stat.h>
int mkdirat(int dirfd, const char *pathname, mode_t mode);
Description
The mkdirat() system call operates in exactly the same way as mkdir(2), except for the differences described in this manual page.
If the pathname given in pathname is relative, then it is interpreted relative to the directory referred to by the file descriptor dirfd (rather than relative to the current working directory of the calling process, as is done by mkdir(2) for a relative pathname).
If the pathname given in pathname is relative and dirfd is the special value AT_FDCWD, then pathname is interpreted relative to the current working directory of the calling process (like mkdir(2)).
If the pathname given in pathname is absolute, then dirfd is ignored.
Return value
On success, mkdirat() returns 0. On error, -1 is returned and errno is set to indicate the error.
Errors
The same errors that occur for mkdir(2) can also occur for mkdirat(). The following additional errors can occur for mkdirat():
Tag | Description |
EBADF | dirfd is not a valid file descriptor. |
ENOTDIR | pathname is a relative path and dirfd is a file descriptor referring to a file other than a directory. |
Notes
See openat(2) for an explanation of the need for mkdirat().
Conforming to
This system call is non-standard but is proposed for inclusion in a future revision of POSIX.1.
Versions
mkdirat() was added to Linux in kernel 2.6.16.
See also
mkdir() function
Synopsis
#include <sys/stat.h>
int mkdir(const char *pathname, mode_t mode);
Description
mkdir() attempts to create a directory named pathname.
The parameter mode specifies the permissions to use. It is modified by the process’s umask in the usual way: the permissions of the created directory are (mode & ~umask & 0777). Other mode bits of the created directory depend on the operating system. For Linux, see below.
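The (mode & ~umask & 0777) rule can be sketched as follows. The helper names effective_dir_mode and mkdir_with_umask are hypothetical. The second helper temporarily changes the process-wide umask, which is an assumption that suits a single-threaded demonstration only.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Permission bits a new directory receives for a given mode and umask. */
static mode_t effective_dir_mode(mode_t mode, mode_t mask)
{
    return mode & ~mask & 0777;
}

/* Create `path` with `mode` under a temporary umask `mask` and return the
 * permission bits actually set, or (mode_t)-1 on failure. */
static mode_t mkdir_with_umask(const char *path, mode_t mode, mode_t mask)
{
    struct stat sb;
    mode_t old = umask(mask);     /* umask() returns the previous mask */
    int rc = mkdir(path, mode);

    umask(old);                   /* restore the caller's mask */
    if (rc == -1 || stat(path, &sb) == -1)
        return (mode_t)-1;
    return sb.st_mode & 0777;
}
```

For example, a mode of 0777 with a umask of 022 gives 0755.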
The newly created directory will be owned by the effective user ID of the process. If the directory containing the file has the set-group-ID bit set, or if the filesystem is mounted with BSD group semantics, the new directory will inherit the group ownership from its parent; otherwise it will be owned by the effective group ID of the process.
If the parent directory has the set-group-ID bit set then so will the newly created directory.
Return value
mkdir() returns zero on success, or -1 if an error occurred (in which case, errno is set appropriately).
Errors
Tag | Description |
EACCES | The parent directory does not allow write permission to the process, or one of the directories in pathname did not allow search permission. (See also path_resolution(2).) |
EEXIST | pathname already exists (not necessarily as a directory). This includes the case where pathname is a symbolic link, dangling or not. |
EFAULT | pathname points outside your accessible address space. |
ELOOP | Too many symbolic links were encountered in resolving pathname. |
ENAMETOOLONG | pathname was too long. |
ENOENT | A directory component in pathname does not exist or is a dangling symbolic link. |
ENOMEM | Insufficient kernel memory was available. |
ENOSPC | The device containing pathname has no room for the new directory. |
ENOSPC | The new directory cannot be created because the user’s disk quota is exhausted. |
ENOTDIR | A component used as a directory in pathname is not, in fact, a directory. |
EPERM | The filesystem containing pathname does not support the creation of directories. |
EROFS | pathname refers to a file on a read-only filesystem. |
Conforming to
SVr4, BSD, POSIX.1-2001.
Notes
Under Linux apart from the permission bits, only the S_ISVTX mode bit is honored. That is, under Linux the created directory actually gets mode (mode & ~umask & 01777). See also stat(2).
There are many infelicities in the protocol underlying NFS. Some of these affect mkdir().
See also
- mkdir (1)
- chmod (2)
- mkdirat (2)
- mknod (2)
- mount (2)
- path_resolution (2)
- rmdir (2)
- stat (2)
- umask (2)
- unlink (2)
mknod() function
Synopsis
#include <sys/types.h>
int mknod(const char *pathname, mode_t mode, dev_t dev);
Description
The system call mknod() creates a filesystem node (file, device special file or named pipe) named pathname, with attributes specified by mode and dev.
The mode argument specifies both the permissions to use and the type of node to be created. It should be a combination (using bitwise OR) of one of the file types listed below and the permissions for the new node.
The permissions are modified by the process’s umask in the usual way: the permissions of the created node are (mode & ~umask).
The file type must be one of S_IFREG, S_IFCHR, S_IFBLK, S_IFIFO or S_IFSOCK to specify a normal file (which will be created empty), character special file, block special file, FIFO (named pipe), or Unix domain socket, respectively. (Zero file type is equivalent to type S_IFREG.)
If the file type is S_IFCHR or S_IFBLK then dev specifies the major and minor numbers of the newly created device special file; otherwise it is ignored.
If pathname already exists, or is a symbolic link, this call fails with an EEXIST error.
The newly created node will be owned by the effective user ID of the process. If the directory containing the node has the set-group-ID bit set, or if the filesystem is mounted with BSD group semantics, the new node will inherit the group ownership from its parent directory; otherwise it will be owned by the effective group ID of the process.
Return value
mknod() returns zero on success, or -1 if an error occurred (in which case, errno is set appropriately).
Errors
Tag | Description |
EACCES | The parent directory does not allow write permission to the process, or one of the directories in the path prefix of pathname did not allow search permission. (See also path_resolution(2).) |
EEXIST | pathname already exists. |
EFAULT | pathname points outside your accessible address space. |
EINVAL | mode requested creation of something other than a normal file, device special file, FIFO or socket. |
ELOOP | Too many symbolic links were encountered in resolving pathname. |
ENAMETOOLONG | pathname was too long. |
ENOENT | A directory component in pathname does not exist or is a dangling symbolic link. |
ENOMEM | Insufficient kernel memory was available. |
ENOSPC | The device containing pathname has no room for the new node. |
ENOTDIR | A component used as a directory in pathname is not, in fact, a directory. |
EPERM | mode requested creation of something other than a regular file, FIFO (named pipe), or Unix domain socket, and the caller is not privileged (Linux: does not have the CAP_MKNOD capability); also returned if the filesystem containing pathname does not support the type of node requested. |
EROFS | pathname refers to a file on a read-only filesystem. |
Conforming to
SVr4, 4.4BSD, POSIX.1-2001 (but see below).
Notes
POSIX.1-2001 says: "The only portable use of mknod() is to create a FIFO-special file. If mode is not S_IFIFO or dev is not 0, the behavior of mknod() is unspecified."
Under Linux, this call cannot be used to create directories. One should make directories with mkdir(2), and FIFOs with mkfifo(2).
There are many infelicities in the protocol underlying NFS. Some of these affect mknod().
See also
mlockall() function
mlock, munlock, mlockall, munlockall - lock and unlock memory
Synopsis
#include <sys/mman.h>
int mlock(const void *addr, size_t len);
int munlock(const void *addr, size_t len);
int mlockall(int flags);
int munlockall(void);
Description
mlock() and mlockall() respectively lock part or all of the calling process’s virtual address space into RAM, preventing that memory from being paged to the swap area. munlock() and munlockall() perform the converse operation, respectively unlocking part or all of the calling process’s virtual address space, so that pages in the specified virtual address range may once more be swapped out if required by the kernel memory manager. Memory locking and unlocking are performed in units of whole pages.
mlock() and munlock()
mlock() locks pages in the address range starting at addr and continuing for len bytes. All pages that contain a part of the specified address range are guaranteed to be resident in RAM when the call returns successfully; the pages are guaranteed to stay in RAM until later unlocked.
munlock() unlocks pages in the address range starting at addr and continuing for len bytes. After this call, all pages that contain a part of the specified memory range can be moved to external swap space again by the kernel.
mlockall() and munlockall()
mlockall() locks all pages mapped into the address space of the calling process. This includes the pages of the code, data and stack segment, as well as shared libraries, user space kernel data, shared memory, and memory-mapped files. All mapped pages are guaranteed to be resident in RAM when the call returns successfully; the pages are guaranteed to stay in RAM until later unlocked.
The flags argument is constructed as the bitwise OR of one or more of the following constants:
Tag | Description |
MCL_CURRENT | Lock all pages which are currently mapped into the address space of the process. |
MCL_FUTURE | Lock all pages which will become mapped into the address space of the process in the future. These could be for instance new pages required by a growing heap and stack as well as new memory mapped files or shared memory regions. |
If MCL_FUTURE has been specified, then a later system call (e.g., mmap(2), sbrk(2), malloc(3)), may fail if it would cause the number of locked bytes to exceed the permitted maximum (see below). In the same circumstances, stack growth may likewise fail: the kernel will deny stack expansion and deliver a SIGSEGV signal to the process.
munlockall() unlocks all pages mapped into the address space of the calling process.
Notes
Memory locking has two main applications: real-time algorithms and high-security data processing. Real-time applications require deterministic timing, and, like scheduling, paging is one major cause of unexpected program execution delays. Real-time applications will usually also switch to a real-time scheduler with sched_setscheduler(2). Cryptographic security software often handles critical bytes like passwords or secret keys as data structures. As a result of paging, these secrets could be transferred onto a persistent swap store medium, where they might be accessible to the enemy long after the security software has erased the secrets in RAM and terminated. (But be aware that the suspend mode on laptops and some desktop computers will save a copy of the system’s RAM to disk, regardless of memory locks.)
Real-time processes that are using mlockall() to prevent delays on page faults should reserve enough locked stack pages before entering the time-critical section, so that no page fault can be caused by function calls. This can be achieved by calling a function that allocates a sufficiently large automatic variable (an array) and writes to the memory occupied by this array in order to touch these stack pages. This way, enough pages will be mapped for the stack and can be locked into RAM. The dummy writes ensure that not even copy-on-write page faults can occur in the critical section.
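The stack pre-faulting technique described above can be sketched as follows. The helper name touch_stack_pages and the 64 KiB STACK_RESERVE figure are hypothetical. A real application would choose the reserve from its own worst-case stack usage.

```c
#include <stddef.h>
#include <unistd.h>

/* Assumed worst-case stack usage of the time-critical section. */
#define STACK_RESERVE (64 * 1024)

/* Write to every page of a large automatic array so that the stack pages
 * are mapped (and, under mlockall(), locked) before the critical section.
 * Returns the number of dummy writes performed. */
static size_t touch_stack_pages(void)
{
    volatile unsigned char dummy[STACK_RESERVE];
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t writes = 0;

    for (size_t i = 0; i < sizeof dummy; i += page) {
        dummy[i] = 0;             /* volatile keeps the write from being elided */
        writes++;
    }
    /* The array need not be page-aligned, so touch its last byte as well. */
    dummy[sizeof dummy - 1] = 0;
    writes++;
    return writes;
}
```

A typical order would be to call mlockall(MCL_CURRENT | MCL_FUTURE) first, then touch_stack_pages(), and only then enter the time-critical section.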
Memory locks are not inherited by a child created via fork(2) and are automatically removed (unlocked) during an execve(2) or when the process terminates.
The memory lock on an address range is automatically removed if the address range is unmapped via munmap(2).
Memory locks do not stack, i.e., pages which have been locked several times by calls to mlock() or mlockall() will be unlocked by a single call to munlock() for the corresponding range or by munlockall(). Pages which are mapped to several locations or by several processes stay locked into RAM as long as they are locked at least at one location or by at least one process.
Linux notes
Under Linux, mlock() and munlock() automatically round addr down to the nearest page boundary. However, POSIX.1-2001 allows an implementation to require that addr is page aligned, so portable applications should ensure this.
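Because locking works on whole pages, the range actually affected can be larger than [addr, addr+len). The arithmetic can be sketched as below. The helper name covering_pages is hypothetical, and a portable program could use the computed start as the page-aligned address it passes to mlock().

```c
#include <stddef.h>
#include <stdint.h>

/* Compute the page-aligned range covering [addr, addr + len):
 * *start is addr rounded down to a page boundary, and *span is the
 * number of bytes up to the end rounded up to a page boundary. */
static void covering_pages(uintptr_t addr, size_t len, size_t page,
                           uintptr_t *start, size_t *span)
{
    uintptr_t first = addr - (addr % page);              /* round down */
    uintptr_t end   = addr + len;
    uintptr_t last  = end + (page - end % page) % page;  /* round up */

    *start = first;
    *span  = (size_t)(last - first);
}
```

With 4096-byte pages, a 10-byte buffer at address 4090 straddles two pages, so the covering range starts at 0 and spans 8192 bytes.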
Limits and permissions
In Linux 2.6.8 and earlier, a process must be privileged (CAP_IPC_LOCK) in order to lock memory and the RLIMIT_MEMLOCK soft resource limit defines a limit on how much memory the process may lock.
Since Linux 2.6.9, no limits are placed on the amount of memory that a privileged process can lock and the RLIMIT_MEMLOCK soft resource limit instead defines a limit on how much memory an unprivileged process may lock.
Return value
On success these system calls return 0. On error, -1 is returned, errno is set appropriately, and no changes are made to any locks in the address space of the process.
Errors
Tag | Description |
ENOMEM | (Linux 2.6.9 and later) the caller had a non-zero RLIMIT_MEMLOCK soft resource limit, but tried to lock more memory than the limit permitted. This limit is not enforced if the process is privileged (CAP_IPC_LOCK). |
ENOMEM | (Linux 2.4 and earlier) the calling process tried to lock more than half of RAM. |
EPERM | (Linux 2.6.9 and later) the caller was not privileged (CAP_IPC_LOCK) and its RLIMIT_MEMLOCK soft resource limit was 0. |
EPERM | (Linux 2.6.8 and earlier) The calling process has insufficient privilege to call munlockall(). Under Linux the CAP_IPC_LOCKcapability is required. |
For mlock() and munlock(): | |
EINVAL | len was negative. |
EINVAL | (Not on Linux) addr was not a multiple of the page size. |
ENOMEM | Some of the specified address range does not correspond to mapped pages in the address space of the process. |
For mlockall(): | |
EINVAL | Unknown flags were specified. |
For munlockall(): | |
EPERM | (Linux 2.6.8 and earlier) The caller was not privileged (CAP_IPC_LOCK). |
BUGS
In the 2.4 series Linux kernels up to and including 2.4.17, a bug caused the mlockall() MCL_FUTURE flag to be inherited across a fork(2). This was rectified in kernel 2.4.18.
Since kernel 2.6.9, if a privileged process calls mlockall(MCL_FUTURE) and later drops privileges (loses the CAP_IPC_LOCK capability by, for example, setting its effective UID to a non-zero value), then subsequent memory allocations (e.g., mmap(2), brk(2)) will fail if the RLIMIT_MEMLOCK resource limit is encountered.
Availability
On POSIX systems on which mlock() and munlock() are available, _POSIX_MEMLOCK_RANGE is defined in <unistd.h> and the number of bytes in a page can be determined from the constant PAGESIZE (if defined) in <limits.h> or by calling sysconf(_SC_PAGESIZE).
On POSIX systems on which mlockall() and munlockall() are available, _POSIX_MEMLOCK is defined in <unistd.h> to a value greater than 0. (See also sysconf(3).)
Conforming to
POSIX.1-2001, SVr4
See also
mlock() function
mlock, munlock, mlockall, munlockall - lock and unlock memory
Synopsis
#include <sys/mman.h>
int mlock(const void *addr, size_t len);
int munlock(const void *addr, size_t len);
int mlockall(int flags);
int munlockall(void);
Description
mlock() and mlockall() respectively lock part or all of the calling process’s virtual address space into RAM, preventing that memory from being paged to the swap area. munlock() and munlockall() perform the converse operation, respectively unlocking part or all of the calling process’s virtual address space, so that pages in the specified virtual address range may once more be swapped out if required by the kernel memory manager. Memory locking and unlocking are performed in units of whole pages.
mlock() and munlock()
mlock() locks pages in the address range starting at addr and continuing for len bytes. All pages that contain a part of the specified address range are guaranteed to be resident in RAM when the call returns successfully; the pages are guaranteed to stay in RAM until later unlocked.
munlock() unlocks pages in the address range starting at addr and continuing for len bytes. After this call, all pages that contain a part of the specified memory range can be moved to external swap space again by the kernel.
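The lock, use, wipe, unlock sequence described above can be sketched as follows. This is a minimal illustration, not a hardened implementation: the helper name with_locked_secret() is hypothetical, and the page-aligned buffer from posix_memalign(3) is an assumption chosen so that the locked range covers exactly one page.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/* Hypothetical helper: keep a short secret in one locked page.
   Returns 0 on success, -1 on failure with errno set. */
int with_locked_secret(const char *secret)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf = NULL;
    int rc;

    if (page <= 0 || strlen(secret) >= (size_t)page) {
        errno = EINVAL;
        return -1;
    }
    /* Page-aligned allocation, so the lock covers exactly this page. */
    rc = posix_memalign(&buf, (size_t)page, (size_t)page);
    if (rc != 0) {
        errno = rc;
        return -1;
    }
    if (mlock(buf, (size_t)page) == -1) {
        int saved = errno;          /* e.g. ENOMEM or EPERM */
        free(buf);
        errno = saved;
        return -1;
    }
    strcpy(buf, secret);            /* ... work with the secret here ... */

    /* Wipe through a volatile pointer so the stores are not optimized away. */
    volatile unsigned char *v = buf;
    for (long i = 0; i < page; i++)
        v[i] = 0;

    rc = munlock(buf, (size_t)page);
    int saved = errno;
    free(buf);
    errno = saved;
    return rc;
}
```

A caller would typically treat a -1 return with errno set to ENOMEM or EPERM as "locking is not permitted here" and decide whether to continue without the lock.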
mlockall() and munlockall()
mlockall() locks all pages mapped into the address space of the calling process. This includes the pages of the code, data and stack segment, as well as shared libraries, user space kernel data, shared memory, and memory-mapped files. All mapped pages are guaranteed to be resident in RAM when the call returns successfully; the pages are guaranteed to stay in RAM until later unlocked.
The flags argument is constructed as the bitwise OR of one or more of the following constants:
Tag | Description |
MCL_CURRENT | Lock all pages which are currently mapped into the address space of the process. |
MCL_FUTURE | Lock all pages which will become mapped into the address space of the process in the future. These could be for instance new pages required by a growing heap and stack as well as new memory mapped files or shared memory regions. |
If MCL_FUTURE has been specified, then a later system call (e.g., mmap(2), sbrk(2), malloc(3)) may fail if it would cause the number of locked bytes to exceed the permitted maximum (see below). In the same circumstances, stack growth may likewise fail: the kernel will deny stack expansion and deliver a SIGSEGV signal to the process.
munlockall() unlocks all pages mapped into the address space of the calling process.
Notes
Memory locking has two main applications: real-time algorithms and high-security data processing. Real-time applications require deterministic timing, and, like scheduling, paging is one major cause of unexpected program execution delays. Real-time applications will usually also switch to a real-time scheduler with sched_setscheduler(2). Cryptographic security software often handles critical bytes like passwords or secret keys as data structures. As a result of paging, these secrets could be transferred onto a persistent swap store medium, where they might be accessible to the enemy long after the security software has erased the secrets in RAM and terminated. (But be aware that the suspend mode on laptops and some desktop computers will save a copy of the system’s RAM to disk, regardless of memory locks.)
Real-time processes that are using mlockall() to prevent delays on page faults should reserve enough locked stack pages before entering the time-critical section, so that no page fault can be caused by function calls. This can be achieved by calling a function that allocates a sufficiently large automatic variable (an array) and writes to the memory occupied by this array in order to touch these stack pages. This way, enough pages will be mapped for the stack and can be locked into RAM. The dummy writes ensure that not even copy-on-write page faults can occur in the critical section.
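The stack pre-faulting technique described above might look like the sketch below. The names stack_prefault() and enter_realtime_section(), and the 8 kB figure for MAX_SAFE_STACK, are illustrative assumptions; a real program would size the array to its own worst-case stack use.

```c
#include <stddef.h>
#include <sys/mman.h>

/* Assumed worst-case stack use of the time-critical code; hypothetical figure. */
#define MAX_SAFE_STACK (8 * 1024)

/* Write to every byte of a large automatic array so that the stack pages
   it occupies are mapped before the critical section. Returns bytes touched. */
size_t stack_prefault(void)
{
    volatile unsigned char dummy[MAX_SAFE_STACK];
    size_t i;

    for (i = 0; i < sizeof dummy; i++)
        dummy[i] = 0;               /* volatile: the writes cannot be elided */
    return i;
}

/* Lock current and future pages, then pre-fault the stack.
   Returns 0 on success, -1 if mlockall() failed (errno set). */
int enter_realtime_section(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
        return -1;
    stack_prefault();
    return 0;
}
```

Calling enter_realtime_section() once during start-up, before the time-critical loop, is the intended usage; an unprivileged process with a small RLIMIT_MEMLOCK should expect it to fail.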
Memory locks are not inherited by a child created via fork(2) and are automatically removed (unlocked) during an execve(2) or when the process terminates.
The memory lock on an address range is automatically removed if the address range is unmapped via munmap(2).
Memory locks do not stack, i.e., pages which have been locked several times by calls to mlock() or mlockall() will be unlocked by a single call to munlock() for the corresponding range or by munlockall(). Pages which are mapped to several locations or by several processes stay locked into RAM as long as they are locked at least at one location or by at least one process.
Linux notes
Under Linux, mlock() and munlock() automatically round addr down to the nearest page boundary. However, POSIX.1-2001 allows an implementation to require that addr is page aligned, so portable applications should ensure this.
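For portability, the start address can be aligned by the caller instead of relying on the Linux rounding behaviour. The following is a sketch under the assumption that the page size is a power of two; page_floor() and mlock_portable() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>

/* Round a down to a multiple of page (page must be a power of two). */
uintptr_t page_floor(uintptr_t a, uintptr_t page)
{
    return a & ~(page - 1);
}

/* Hypothetical wrapper: lock [addr, addr+len) after aligning the start
   address ourselves rather than relying on the Linux rounding behaviour. */
int mlock_portable(const void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    uintptr_t start;

    if (page <= 0)
        return -1;
    start = page_floor((uintptr_t)addr, (uintptr_t)page);
    /* Extend len by the bytes we stepped back so the end stays the same. */
    return mlock((const void *)start, len + ((uintptr_t)addr - start));
}
```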
Limits and permissions
In Linux 2.6.8 and earlier, a process must be privileged (CAP_IPC_LOCK) in order to lock memory and the RLIMIT_MEMLOCK soft resource limit defines a limit on how much memory the process may lock.
Since Linux 2.6.9, no limits are placed on the amount of memory that a privileged process can lock and the RLIMIT_MEMLOCK soft resource limit instead defines a limit on how much memory an unprivileged process may lock.
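A process can inspect the soft limit discussed above with getrlimit(2). The sketch below is illustrative only; the helper names memlock_soft_limit() and print_memlock_limit() are assumptions.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Store the RLIMIT_MEMLOCK soft limit in *out.
   Returns 0 on success, -1 on failure (errno set by getrlimit). */
int memlock_soft_limit(rlim_t *out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) == -1)
        return -1;
    *out = rl.rlim_cur;
    return 0;
}

/* Print the limit in bytes, or "unlimited" for RLIM_INFINITY. */
void print_memlock_limit(void)
{
    rlim_t lim;

    if (memlock_soft_limit(&lim) == -1) {
        perror("getrlimit");
        return;
    }
    if (lim == RLIM_INFINITY)
        puts("RLIMIT_MEMLOCK: unlimited");
    else
        printf("RLIMIT_MEMLOCK: %llu bytes\n", (unsigned long long)lim);
}
```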
Return value
On success these system calls return 0. On error, -1 is returned, errno is set appropriately, and no changes are made to any locks in the address space of the process.
Errors
Tag | Description |
ENOMEM | (Linux 2.6.9 and later) The caller had a non-zero RLIMIT_MEMLOCK soft resource limit, but tried to lock more memory than the limit permitted. This limit is not enforced if the process is privileged (CAP_IPC_LOCK). |
ENOMEM | (Linux 2.4 and earlier) The calling process tried to lock more than half of RAM. |
EPERM | (Linux 2.6.9 and later) The caller was not privileged (CAP_IPC_LOCK) and its RLIMIT_MEMLOCK soft resource limit was 0. |
EPERM | (Linux 2.6.8 and earlier) The calling process has insufficient privilege to call munlockall(). Under Linux the CAP_IPC_LOCK capability is required. |
For mlock() and munlock(): | |
EINVAL | len was negative. |
EINVAL | (Not on Linux) addr was not a multiple of the page size. |
ENOMEM | Some of the specified address range does not correspond to mapped pages in the address space of the process. |
For mlockall(): | |
EINVAL | Unknown flags were specified. |
For munlockall(): | |
EPERM | (Linux 2.6.8 and earlier) The caller was not privileged (CAP_IPC_LOCK). |
BUGS
In the 2.4 series Linux kernels up to and including 2.4.17, a bug caused the mlockall() MCL_FUTURE flag to be inherited across a fork(2). This was rectified in kernel 2.4.18.
Since kernel 2.6.9, if a privileged process calls mlockall(MCL_FUTURE) and later drops privileges (loses the CAP_IPC_LOCK capability by, for example, setting its effective UID to a non-zero value), then subsequent memory allocations (e.g., mmap(2), brk(2)) will fail if the RLIMIT_MEMLOCK resource limit is encountered.
Availability
On POSIX systems on which mlock() and munlock() are available, _POSIX_MEMLOCK_RANGE is defined in <unistd.h> and the number of bytes in a page can be determined from the constant PAGESIZE (if defined) in <limits.h> or by calling sysconf(_SC_PAGESIZE).
On POSIX systems on which mlockall() and munlockall() are available, _POSIX_MEMLOCK is defined in <unistd.h> to a value greater than 0. (See also sysconf(3).)
Conforming to
POSIX.1-2001, SVr4
See also
mmap2() function
Synopsis
#include <sys/mman.h>
void *mmap2(void *start, size_t length, int prot, int flags, int fd, off_t pgoffset);
Description
The mmap2() system call operates in exactly the same way as mmap(2), except that the final argument specifies the offset into the file in 4kB units (instead of bytes). This enables applications that use a 32-bit off_t to map larger files (typically up to 2^44 bytes).
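The unit conversion described above is simple arithmetic: a byte offset must be a multiple of 4096 and is divided by 4096 to obtain the final argument. The sketch below assumes the offset argument carries 32 unsigned bits, which is what yields the 2^44 byte reach (2^32 units of 2^12 bytes); the helper name byte_offset_to_pgoffset() is hypothetical, and no system call is made.

```c
#include <stdint.h>

/* mmap2() counts offsets in 4 kB units, per the description above. */
#define MMAP2_UNIT 4096ULL

/* Convert a byte offset into a 4 kB-unit offset.
   Returns 0 on success, -1 if the offset is not a multiple of the unit
   or does not fit in 32 bits (assumed width of the offset argument). */
int byte_offset_to_pgoffset(uint64_t byte_offset, uint32_t *pgoffset)
{
    uint64_t units;

    if (byte_offset % MMAP2_UNIT != 0)
        return -1;
    units = byte_offset / MMAP2_UNIT;
    if (units > UINT32_MAX)
        return -1;
    *pgoffset = (uint32_t)units;
    return 0;
}
```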
Return value
On success, mmap2() returns a pointer to the mapped area. On error, -1 is returned and errno is set appropriately.
Errors
Tag | Description |
EFAULT | Problem with getting the data from userspace. |
Conforming to
This system call is Linux-specific.
Notes
Glibc does not provide a wrapper for this system call; call it using syscall(2).
mmap2() is available since Linux 2.3.31. It is Linux specific, and should be avoided in portable applications. On 32-bit systems, mmap2() is used to implement the mmap64() function that is part of the LFS (Large File Summit).
See also
mmap() function
mmap, munmap - map or unmap files or devices into memory
Synopsis
#include <sys/mman.h>
void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);
int munmap(void *start, size_t length);
Description
The mmap() function asks to map length bytes starting at offset offset from the file (or other object) specified by the file descriptor fd into memory, preferably at address start. This latter address is a hint only, and is usually specified as 0. The actual place where the object is mapped is returned by mmap().
The prot argument describes the desired memory protection (and must not conflict with the open mode of the file). It is either PROT_NONE or is the bitwise OR of one or more of the other PROT_* flags.
Tag | Description |
PROT_EXEC | Pages may be executed. |
PROT_READ | Pages may be read. |
PROT_WRITE | Pages may be written. |
PROT_NONE | Pages may not be accessed. |
The flags argument specifies the type of the mapped object, mapping options, and whether modifications made to the mapped copy of the page are private to the process or are to be shared with other references. It has the following bits:
Tag | Description |
MAP_FIXED | Do not select a different address than the one specified. If the memory region specified by start and length overlaps pages of any existing mapping(s), then the overlapped part of the existing mapping(s) will be discarded. If the specified address cannot be used, mmap() will fail. If MAP_FIXED is specified, start must be a multiple of the page size. Use of this option is discouraged. |
MAP_SHARED | Share this mapping with all other processes that map this object. Storing to the region is equivalent to writing to the file. The file may not actually be updated until msync(2) or munmap(2) is called. |
MAP_PRIVATE | Create a private copy-on-write mapping. Stores to the region do not affect the original file. It is unspecified whether changes made to the file after the mmap() call are visible in the mapped region. |
You must specify exactly one of MAP_SHARED and MAP_PRIVATE.
The above three flags are described in POSIX.1-2001. Linux also knows about the following non-standard flags:
Tag | Description |
MAP_DENYWRITE | This flag is ignored. (Long ago, it signalled that attempts to write to the underlying file should fail with ETXTBSY. But this was a source of denial-of-service attacks.) |
MAP_EXECUTABLE | This flag is ignored. |
MAP_NORESERVE | Do not reserve swap space for this mapping. When swap space is reserved, one has the guarantee that it is possible to modify the mapping. When swap space is not reserved one might get SIGSEGV upon a write if no physical memory is available. See also the discussion of the file /proc/sys/vm/overcommit_memory in proc(5). In kernels before 2.6, this flag only had effect for private writable mappings. |
MAP_LOCKED (since Linux 2.5.37) | Lock the pages of the mapped region into memory in the manner of mlock(). This flag is ignored in older kernels. |
MAP_GROWSDOWN | Used for stacks. Indicates to the kernel VM system that the mapping should extend downwards in memory. |
MAP_ANONYMOUS | The mapping is not backed by any file; the fd and offset arguments are ignored. The use of this flag in conjunction with MAP_SHARED is only supported on Linux since kernel 2.4. |
MAP_ANON | Alias for MAP_ANONYMOUS. Deprecated. |
MAP_FILE | Compatibility flag. Ignored. |
MAP_32BIT | Put the mapping into the first 2GB of the process address space. Ignored when MAP_FIXED is set. This flag is currently only supported on x86-64 for 64-bit programs. |
MAP_POPULATE (since Linux 2.5.46) | Populate (prefault) page tables for a file mapping, by performing read-ahead on the file. Later accesses to the mapping will not be blocked by page faults. |
MAP_NONBLOCK (since Linux 2.5.46) | Only meaningful in conjunction with MAP_POPULATE. Don’t perform read-ahead: only create page table entries for pages that are already present in RAM. |
Some systems document the additional flags MAP_AUTOGROW, MAP_AUTORESRV, MAP_COPY, and MAP_LOCAL.
fd should be a valid file descriptor, unless MAP_ANONYMOUS is set. If MAP_ANONYMOUS is set, then fd is ignored on Linux. However, some implementations require fd to be -1 if MAP_ANONYMOUS (or MAP_ANON) is specified, and portable applications should ensure this.
offset should be a multiple of the page size as returned by getpagesize(2).
Memory mapped by mmap() is preserved across fork(2), with the same attributes.
A file is mapped in multiples of the page size. For a file that is not a multiple of the page size, the remaining memory is zeroed when mapped, and writes to that region are not written out to the file. The effect of changing the size of the underlying file of a mapping on the pages that correspond to added or removed regions of the file is unspecified.
The munmap() system call deletes the mappings for the specified address range, and causes further references to addresses within the range to generate invalid memory references. The region is also automatically unmapped when the process is terminated. On the other hand, closing the file descriptor does not unmap the region.
The address start must be a multiple of the page size. All pages containing a part of the indicated range are unmapped, and subsequent references to these pages will generate SIGSEGV. It is not an error if the indicated range does not contain any mapped pages.
For file-backed mappings, the st_atime field for the mapped file may be updated at any time between the mmap() and the corresponding unmapping; the first reference to a mapped page will update the field if it has not been already.
The st_ctime and st_mtime fields for a file mapped with PROT_WRITE and MAP_SHARED will be updated after a write to the mapped region, and before a subsequent msync() with the MS_SYNC or MS_ASYNC flag, if one occurs.
Return value
On success, mmap() returns a pointer to the mapped area. On error, the value MAP_FAILED (that is, (void *) -1) is returned, and errno is set appropriately. On success, munmap() returns 0; on failure it returns -1, and errno is set (probably to EINVAL).
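The sketch below puts the file-mapping rules together: it maps a whole file read-only, checks for MAP_FAILED rather than NULL, closes the descriptor early (which does not unmap the region), and unmaps when done. The helper count_newlines() is a hypothetical example task, not part of any library.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Hypothetical helper: count '\n' bytes in a file through a read-only
   mapping. Returns the count, or -1 on error (errno set). */
long count_newlines(const char *path)
{
    struct stat st;
    long n = 0;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    if (st.st_size == 0) {          /* nothing to map; a zero length is rejected */
        close(fd);
        return 0;
    }
    const char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {          /* compare against MAP_FAILED, not NULL */
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    close(fd);                      /* closing fd does not unmap the region */
    for (off_t i = 0; i < st.st_size; i++)
        if (p[i] == '\n')
            n++;
    munmap((void *)p, (size_t)st.st_size);
    return n;
}
```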
Notes
It is architecture dependent whether PROT_READ includes PROT_EXEC or not. Portable programs should always set PROT_EXEC if they intend to execute code in the new mapping.
Errors
Tag | Description |
EACCES | A file descriptor refers to a non-regular file. Or MAP_PRIVATE was requested, but fd is not open for reading. Or MAP_SHARED was requested and PROT_WRITE is set, but fd is not open in read/write (O_RDWR) mode. Or PROT_WRITE is set, but the file is append-only. |
EAGAIN | The file has been locked, or too much memory has been locked (see setrlimit(2)). |
EBADF | fd is not a valid file descriptor (and MAP_ANONYMOUS was not set). |
EINVAL | We don’t like start or length or offset. (E.g., they are too large, or not aligned on a page boundary.) |
ENFILE | The system limit on the total number of open files has been reached. |
ENODEV | The underlying filesystem of the specified file does not support memory mapping. |
ENOMEM | No memory is available, or the process’s maximum number of mappings would have been exceeded. |
EPERM | The prot argument asks for PROT_EXEC but the mapped area belongs to a file on a filesystem that was mounted no-exec. |
ETXTBSY | MAP_DENYWRITE was set but the object specified by fd is open for writing. |
Use of a mapped region can result in these signals: | |
SIGSEGV | Attempted write into a region mapped as read-only. |
SIGBUS | Attempted access to a portion of the buffer that does not correspond to the file (for example, beyond the end of the file, including the case where another process has truncated the file). |
Availability
On POSIX systems on which mmap(), msync() and munmap() are available, _POSIX_MAPPED_FILES is defined in <unistd.h> to a value greater than 0. (See also sysconf(3).)
Conforming to
SVr4, 4.4BSD, POSIX.1-2001.
BUGS
On Linux there are no guarantees like those suggested above under MAP_NORESERVE. By default, any process can be killed at any moment when the system runs out of memory.
In kernels before 2.6.7, the MAP_POPULATE flag only has effect if prot is specified as PROT_NONE.
See also
B.O. Gallmeister, POSIX.4, O’Reilly, pp. 128-129 and 389-391.
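modify_ldt() function
Synopsis
#include <sys/types.h>
int modify_ldt(int func, void *ptr, unsigned long bytecount);
Description
modify_ldt() reads or writes the local descriptor table (ldt) for a process. The ldt is a per-process memory management table used by the i386 processor. For more information on this table, see an Intel 386 processor handbook.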
When func is 0, modify_ldt() reads the ldt into the memory pointed to by ptr. The number of bytes read is the smaller of bytecount and the actual size of the ldt.
When func is 1, modify_ldt() modifies one ldt entry. ptr points to a modify_ldt_ldt_s structure and bytecount must equal the size of this structure.
Return value
On success, modify_ldt() returns either the actual number of bytes read (for reading) or 0 (for writing). On failure, modify_ldt() returns -1 and sets errno.
Errors
Tag | Description |
EFAULT | ptr points outside the address space. |
EINVAL | ptr is 0, or func is 1 and bytecount is not equal to the size of the structure modify_ldt_ldt_s, or func is 1 and the new ldt entry has invalid values. |
ENOSYS | func is neither 0 nor 1. |
Conforming to
This call is Linux-specific and should not be used in programs intended to be portable.
Notes
Glibc does not provide a wrapper for this system call; call it using syscall(2).
See also
mount() function
Synopsis
#include <sys/mount.h>
int mount(const char *source, const char *target, const char *filesystemtype, unsigned long mountflags, const void *data);
int umount(const char *target);
int umount2(const char *target, int flags);
Description
mount() attaches the filesystem specified by source (which is often a device name, but can also be a directory name or a dummy) to the directory specified by target.
umount() and umount2() remove the attachment of the (topmost) filesystem mounted on target.
Appropriate privilege (Linux: the CAP_SYS_ADMIN capability) is required to mount and unmount filesystems.
Since Linux 2.4 a single filesystem can be visible at multiple mount points, and multiple mounts can be stacked on the same mount point.
Values for the filesystemtype argument supported by the kernel are listed in /proc/filesystems (like "minix", "ext2", "msdos", "proc", "nfs", "iso9660" etc.). Further types may become available when the appropriate modules are loaded.
The mountflags argument may have the magic number 0xC0ED (MS_MGC_VAL) in the top 16 bits (this was required in kernel versions prior to 2.4, but is no longer required and ignored if specified), and various mount flags (as defined in <linux/fs.h> for libc4 and libc5 and in <sys/mount.h> for glibc2) in the low order 16 bits:
Tag | Description |
MS_BIND | (Linux 2.4 onwards) Perform a bind mount, making a file or a directory subtree visible at another point within a file system. Bind mounts may cross file system boundaries and span chroot(2) jails. The filesystemtype, mountflags, and data arguments are ignored. |
MS_DIRSYNC (since Linux 2.5.19) | Make directory changes on this file system synchronous. (This property can be obtained for individual directories or subtrees using chattr(8).) |
MS_MANDLOCK | Permit mandatory locking on files in this file system. (Mandatory locking must still be enabled on a per-file basis, as described in fcntl(2).) |
MS_MOVE | Move a subtree. source specifies an existing mount point and target specifies the new location. The move is atomic: at no point is the subtree unmounted. The filesystemtype, mountflags, and data arguments are ignored. |
MS_NOATIME | Do not update access times for (all types of) files on this file system. |
MS_NODEV | Do not allow access to devices (special files) on this file system. |
MS_NODIRATIME | Do not update access times for directories on this file system. |
MS_NOEXEC | Do not allow programs to be executed from this file system. |
MS_NOSUID | Do not honour set-user-ID and set-group-ID bits when executing programs from this file system. |
MS_RDONLY | Mount file system read-only. |
MS_REMOUNT | Remount an existing mount. This allows you to change the mountflags and data of an existing mount without having to unmount and remount the file system. source and target should be the same values specified in the initial mount() call; filesystemtype is ignored. The following mountflags can be changed: MS_RDONLY, MS_SYNCHRONOUS, MS_MANDLOCK; before kernel 2.6.16, the following could also be changed: MS_NOATIME and MS_NODIRATIME; and, additionally, before kernel 2.4, the following could also be changed: MS_NOSUID, MS_NODEV, MS_NOEXEC. |
MS_SYNCHRONOUS | Make writes on this file system synchronous (as though the O_SYNC flag to open(2) was specified for all file opens to this file system). |
From Linux 2.4 onwards, the MS_NODEV, MS_NOEXEC, and MS_NOSUID flags are settable on a per-mount-point basis. From kernel 2.6.16 onwards, MS_NOATIME and MS_NODIRATIME are also settable on a per-mount-point basis.
The data argument is interpreted by the different file systems. Typically it is a string of comma-separated options understood by this file system. See mount(8) for details of the options available for each filesystem type.
Linux 2.1.116 added the umount2() system call, which, like umount(), unmounts a target, but allows additional flags controlling the behaviour of the operation:
Tag | Description |
MNT_FORCE (since Linux 2.1.116) | Force unmount even if busy. (Only for NFS mounts.) |
MNT_DETACH (since Linux 2.4.11) | Perform a lazy unmount: make the mount point unavailable for new accesses, and actually perform the unmount when the mount point ceases to be busy. |
MNT_EXPIRE (since Linux 2.6.8) | Mark the mount point as expired. If a mount point is not currently in use, then an initial call to umount2() with this flag fails with the error EAGAIN, but marks the mount point as expired. The mount point remains expired as long as it isn’t accessed by any process. A second umount2() call specifying MNT_EXPIRE unmounts an expired mount point. This flag cannot be specified with either MNT_FORCE or MNT_DETACH. |
Return value
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
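A call that combines several of the mount flags above, followed by a lazy unmount, might look like the sketch below. The "tmpfs" filesystem type and its "size=1m" data option are assumptions chosen for illustration (neither appears in the text above), the helper names are hypothetical, and the calls need CAP_SYS_ADMIN to succeed.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical helper: mount a small tmpfs on target with no set-user-ID
   programs, no device files, and no executables.
   Returns 0 on success, -1 on failure (errno preserved). */
int mount_small_tmpfs(const char *target)
{
    if (mount("tmpfs", target, "tmpfs",
              MS_NOSUID | MS_NODEV | MS_NOEXEC, "size=1m") == -1) {
        int saved = errno;
        fprintf(stderr, "mount %s: %s\n", target, strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}

/* Detach the mount now; the kernel finishes the unmount once it is idle. */
int unmount_lazily(const char *target)
{
    return umount2(target, MNT_DETACH);
}
```

An unprivileged caller should expect EPERM, and a target that does not exist yields ENOENT.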
Errors
The error values given below result from filesystem type independent errors. Each filesystem type may have its own special errors and its own special behavior. See the kernel source code for details.
Tag | Description |
EACCES | A component of a path was not searchable. (See also path_resolution(2).) Or, mounting a read-only filesystem was attempted without giving the MS_RDONLY flag. Or, the block device source is located on a filesystem mounted with the MS_NODEV option. |
EAGAIN | A call to umount2() specifying MNT_EXPIRE successfully marked an unbusy file system as expired. |
EBUSY | source is already mounted. Or, it cannot be remounted read-only, because it still holds files open for writing. Or, it cannot be mounted on target because target is still busy (it is the working directory of some task, the mount point of another device, has open files, etc.). Or, it could not be unmounted because it is busy. |
EFAULT | One of the pointer arguments points outside the user address space. |
EINVAL | source had an invalid superblock. Or, a remount (MS_REMOUNT) was attempted, but source was not already mounted on target. Or, a move (MS_MOVE) was attempted, but source was not a mount point, or was '/'. Or, an unmount was attempted, but target was not a mount point. Or, umount2() was called with MNT_EXPIRE and either MNT_DETACH or MNT_FORCE. |
ELOOP | Too many links encountered during pathname resolution. Or, a move was attempted, while target is a descendant of source. |
EMFILE | (In case no block device is required:) Table of dummy devices is full. |
ENAMETOOLONG | A pathname was longer than MAXPATHLEN. |
ENODEV | filesystemtype not configured in the kernel. |
ENOENT | A pathname was empty or had a nonexistent component. |
ENOMEM | The kernel could not allocate a free page to copy filenames or data into. |
ENOTBLK | source is not a block device (and a device was required). |
ENOTDIR | The second argument, or a prefix of the first argument, is not a directory. |
ENXIO | The major number of the block device source is out of range. |
EPERM | The caller does not have the required privileges. |
Conforming to
These functions are Linux-specific and should not be used in programs intended to be portable.
History
The original umount() function was called as umount(device) and would return ENOTBLK when called with something other than a block device. In Linux 0.98p4 a call umount(dir) was added, in order to support anonymous devices. In Linux 2.3.99-pre7 the call umount(device) was removed, leaving only umount(dir) (since now devices can be mounted in more than one place, so specifying the device does not suffice).
The original MS_SYNC flag was renamed MS_SYNCHRONOUS in 1.1.69 when a different MS_SYNC was added to <mman.h>.
Before Linux 2.4 an attempt to execute a set-user-ID or set-group-ID program on a filesystem mounted with MS_NOSUID would fail with EPERM. Since Linux 2.4 the set-user-ID and set-group-ID bits are just silently ignored in this case.
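See also
move_pages() function
move_pages - move a set of pages of a process to a different NUMA node
Synopsis
#include <syscall.h>
Description
move_pages moves a set of pages in the address space of an executing process to a different NUMA node. The function can also be used to determine the nodes to which pages are currently mapped.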
pid is the process whose pages will be moved. The value 0 specifies the current process.
The argument nr_pages specifies the number of pages which would require moving.
addresses is an array of addresses of pages which would require moving
nodes is an array of numbers of nodes to move the corresponding pages to. If set to NULL, status is filled with current NUMA node IDs, but no migrations occur.
The flags argument describes the type of pages which will be moved:
Tag | Description |
MPOL_MF_MOVE | Syscall will move only pages which are mapped only by the process pid. |
MPOL_MF_MOVE_ALL | Syscall will move pages which are mapped by multiple processes too (this mode needs to have sufficient permissions). |
The status field is only valid if move_pages finished successfully. This field contains the status of the specified pages. If the nodes argument is NULL or the migration succeeded, it is set to the node ID. Otherwise it contains a negative number, one of the following error codes:
-EFAULT the specified address does not point to a valid mapping
-ENOENT the page does not exist
-EPERM the page can’t be moved (it is mlocked)
-EACCES the page is shared by multiple processes and the flag MPOL_MF_MOVE_ALL was not set
-EBUSY the page could not be moved - it is busy now
-EFAULT the page address is not valid
-ENOMEM insufficient memory
-EIO the page can’t be written
-EINVAL the page can’t be moved because the file system does not implement the necessary interface
Return value
If nodes is not NULL, move_pages returns the number of valid migration requests which could not currently be performed. Otherwise it returns 0.
An error indication is returned on error.
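The query mode described above (nodes set to NULL, so status receives the current node IDs and nothing is migrated) could be exercised roughly as follows. This is a sketch under several assumptions: the call is made through syscall(2) with SYS_move_pages from <sys/syscall.h>, the argument order follows the description (pid, nr_pages, addresses, nodes, status, flags), flags is passed as 0 because no pages are moved, and query_numa_node() is a hypothetical helper name.

```c
#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical helper: ask which NUMA node holds the page containing addr.
   On success returns 0 and *status holds the node ID, or a negative
   per-page error code such as -ENOENT. Returns -1 if the system call
   itself failed (errno set), for example on kernels without NUMA support. */
int query_numa_node(void *addr, int *status)
{
    void *pages[1] = { addr };
    long rc = syscall(SYS_move_pages,
                      0,            /* pid 0: the current process */
                      1UL,          /* nr_pages */
                      pages,        /* addresses */
                      NULL,         /* nodes == NULL: query only */
                      status,
                      0);           /* flags: assumed unused for a query */

    return rc == -1 ? -1 : 0;
}
```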
Errors
EACCES one of the nodes specified by the nodes argument is not allowed for the specified process.
EINVAL the pages to be moved belong to a kernel thread, or the flags argument is invalid.
ENODEV one of the nodes specified by the nodes argument is not available.
ENOENT there is no page which would be moved.
EPERM the flag is set to MPOL_MF_MOVE_ALL or pid doesn’t specify the current process, and the process has insufficient privileges.
ENOMEM insufficient memory
E2BIG the number of pages to move is too big
ESRCH the process pid can’t be found
Availability
This syscall is implemented only on the i386 and IA-64 architectures since kernel 2.6.
mprotect() function
Synopsis
#include <sys/mman.h>
int mprotect(const void *addr, size_t len, int prot);
Description
The function mprotect() specifies the desired protection for the memory page(s) containing part or all of the interval [addr,addr+len-1]. If an access is disallowed by the protection given it, the program receives a SIGSEGV.
prot is a bitwise-or of the following values:
Tag | Description |
PROT_NONE | The memory cannot be accessed at all. |
PROT_READ | The memory can be read. |
PROT_WRITE | The memory can be written to. |
PROT_EXEC | The memory can contain executing code. |
The new protection replaces any existing protection. For example, if the memory had previously been marked PROT_READ, and mprotect() is then called with prot set to PROT_WRITE, it will no longer be readable.
Return value
On success, mprotect() returns zero. On error, -1 is returned, and errno is set appropriately.
Errors
Tag | Description |
EACCES | The memory cannot be given the specified access. This can happen, for example, if you mmap(2) a file to which you have read-only access, then ask mprotect() to mark it PROT_WRITE. |
EFAULT | The memory cannot be accessed. |
EINVAL | addr is not a valid pointer, or not a multiple of PAGESIZE. |
ENOMEM | Internal kernel structures could not be allocated. Or: addresses in the range [addr, addr+len] are invalid for the address space of the process, or specify one or more pages that are not mapped. |
Example:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <errno.h>
#include <sys/mman.h>
#include <limits.h>    /* for PAGESIZE */
#ifndef PAGESIZE
#define PAGESIZE 4096
#endif
int
main(void)
{
    char *p;
    char c;
    /* Allocate a buffer; it will have the default
       protection of PROT_READ|PROT_WRITE. */
    p = malloc(1024+PAGESIZE-1);
    if (!p) {
        perror("Couldn't malloc(1024)");
        exit(errno);
    }
    /* Align to a multiple of PAGESIZE, assumed to be a power of two.
       uintptr_t is used because casting a pointer to int can truncate it. */
    p = (char *)(((uintptr_t) p + PAGESIZE-1) & ~((uintptr_t) PAGESIZE-1));
    c = p[666];         /* Read; ok */
    p[666] = 42;        /* Write; ok */
    /* Mark the buffer read-only. */
    if (mprotect(p, 1024, PROT_READ)) {
        perror("Couldn't mprotect");
        exit(errno);
    }
    c = p[666];         /* Read; ok */
    p[666] = 42;        /* Write; program dies on SIGSEGV */
    exit(0);
}
Conforming to
SVr4, POSIX.1-2001. POSIX says that mprotect() can be used only on regions of memory obtained from mmap(2).
Notes
On Linux it is always legal to call mprotect() on any address in a process’ address space (except for the kernel vsyscall area). In particular it can be used to change existing code mappings to be writable.
Whether PROT_EXEC has any effect different from PROT_READ is architecture and kernel version dependent.
See also
mpx() function
afs_syscall, break, fattach, fdetach, ftime, getmsg, getpmsg, gtty, isastream, lock, mpx, multiplexer, prof, profil, putmsg, putpmsg, security, stty, ulimit, vserver - unimplemented system calls
Synopsis
Unimplemented system calls.
Description
These system calls are not implemented in the Linux 2.4 kernel.
Return value
These system calls always return -1 and set errno to ENOSYS.
Notes
Note that ftime(3), profil(3) and ulimit(3) are implemented as library functions.
Some system calls, like alloc_hugepages(2), free_hugepages(2), ioperm(2), iopl(2), and vm86(2) only exist on certain architectures.
Some system calls, like ipc(2), create_module(2), init_module(2), and delete_module(2) only exist when the Linux kernel was built with support for them.
See also
mq_getsetattr() function
Synopsis
#include <sys/types.h>
#include <mqueue.h>
mqd_t mq_getsetattr(mqd_t mqdes, struct mq_attr *newattr, struct mq_attr *oldattr);
Description
Do not use this system call.
This is the low-level system call used to implement mq_getattr(3) and mq_setattr(3). For an explanation of how this system call operates, see the description of mq_setattr(3).
Conforming to
This interface is non-standard; avoid its use.
Notes
Glibc does not provide a wrapper for this system call; call it using syscall(2). (Actually, never call it unless you are writing a libc!)
See also
mremap() function
Synopsis
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/mman.h>
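void *mremap(void *old_address, size_t old_size, size_t new_size, int flags);
Description
mremap() expands (or shrinks) an existing memory mapping, potentially moving it at the same time (controlled by the flags argument and the available virtual address space).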
old_address is the old address of the virtual memory block that you want to expand (or shrink). Note that old_address has to be page aligned. old_size is the old size of the virtual memory block. new_size is the requested size of the virtual memory block after the resize.
In Linux the memory is divided into pages. A user process has (one or) several linear virtual memory segments. Each virtual memory segment has one or more mappings to real memory pages (in the page table). Each virtual memory segment has its own protection (access rights), which may cause a segmentation violation if the memory is accessed incorrectly (e.g., writing to a read-only segment). Accessing virtual memory outside of the segments will also cause a segmentation violation.
mremap() uses the Linux page table scheme. mremap() changes the mapping between virtual addresses and memory pages. This can be used to implement a very efficient realloc().
The flags bit-mask argument may be 0, or include the following flags:
Tag | Description |
MREMAP_MAYMOVE | By default, if there is not sufficient space to expand a mapping at its current location, then mremap() fails. If this flag is specified, then the kernel is permitted to relocate the mapping to a new virtual address, if necessary. If the mapping is relocated, then absolute pointers into the old mapping location become invalid (offsets relative to the starting address of the mapping should be employed). |
MREMAP_FIXED (since Linux 2.3.31) | This flag serves a similar purpose to the MAP_FIXED flag of mmap(2). If this flag is specified, then mremap() accepts a fifth argument, void *new_address, which specifies a page-aligned address to which the mapping must be moved. Any previous mapping at the address range specified by new_address and new_size is unmapped. If MREMAP_FIXED is specified, then MREMAP_MAYMOVE must also be specified. |
If the memory segment specified by old_address and old_size is locked (using mlock() or similar), then this lock is maintained when the segment is resized and/or relocated. As a consequence, the amount of memory locked by the process may change.
Return value
On success mremap() returns a pointer to the new virtual memory area. On error, the value MAP_FAILED (that is, (void *) -1) is returned, and errno is set appropriately.
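As an illustration of MREMAP_MAYMOVE and the MAP_FAILED return convention, the sketch below grows an anonymous mapping and checks that its contents survive. The helper name grow_mapping and the test string are assumptions made for this example, not part of the interface.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: grow an anonymous mapping from one page to
   `pages` pages and verify that the old contents are preserved.
   Returns 0 on success, -1 on failure. */
int grow_mapping(size_t pages)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0 || pages < 1)
        return -1;

    size_t old_size = (size_t)ps;
    size_t new_size = pages * old_size;

    char *old = mmap(NULL, old_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (old == MAP_FAILED)
        return -1;
    memcpy(old, "hello", 6);

    /* With MREMAP_MAYMOVE the kernel may relocate the mapping, so only
       the returned pointer may be used after a successful call. */
    char *moved = mremap(old, old_size, new_size, MREMAP_MAYMOVE);
    if (moved == MAP_FAILED) {
        munmap(old, old_size);    /* the old mapping is still intact */
        return -1;
    }

    int ok = (strcmp(moved, "hello") == 0);
    moved[new_size - 1] = 1;      /* the added pages are writable too */
    munmap(moved, new_size);
    return ok ? 0 : -1;
}
```

Note that the old pointer is never dereferenced after the call, in line with the warning about absolute pointers becoming invalid.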
Errors
Tag | Description |
EAGAIN | The caller tried to expand a memory segment that is locked, but this was not possible without exceeding the RLIMIT_MEMLOCK resource limit. |
EFAULT | "Segmentation fault." Some address in the range old_address to old_address+old_size is an invalid virtual memory address for this process. You can also get EFAULT even if there exist mappings that cover the whole address space requested, but those mappings are of different types. |
EINVAL | An invalid argument was given. Possible causes are: old_address was not page aligned; a value other than MREMAP_MAYMOVE or MREMAP_FIXED was specified in flags; new_size was zero; new_size or new_address was invalid; or the new address range specified by new_address and new_size overlapped the old address range specified by old_address and old_size; or MREMAP_FIXED was specified without also specifying MREMAP_MAYMOVE. |
ENOMEM | The memory area cannot be expanded at the current virtual address, and the MREMAP_MAYMOVE flag is not set in flags. Or, there is not enough (virtual) memory available. |
Notes
Prior to version 2.4, glibc did not expose the definition of MREMAP_FIXED, and the prototype for mremap() did not allow for the new_address argument.
Conforming to
This call is Linux-specific, and should not be used in programs intended to be portable. 4.2BSD had a (never actually implemented) mremap(2) call with completely different semantics.
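See also
Your favorite operating systems textbook for more information on paged memory. (Modern Operating Systems by Andrew S. Tanenbaum, Inside Linux by Randolf Bentson, The Design of the UNIX Operating System by Maurice J. Bach.)
msgctl() function
Synopsis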
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
int msgctl(int msqid, int cmd, struct msqid_ds *buf);
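Description
msgctl() performs the control operation specified by cmd on the message queue with identifier msqid.
The msqid_ds data structure is defined in <sys/msg.h> as follows:
struct msqid_ds {
struct ipc_perm msg_perm; /* Ownership and permissions */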
time_t msg_stime; /* Time of last msgsnd() */
time_t msg_rtime; /* Time of last msgrcv() */
time_t msg_ctime; /* Time of last change */
unsigned long __msg_cbytes; /* Current number of bytes in
queue (non-standard) */
msgqnum_t msg_qnum; /* Current number of messages
in queue */
msglen_t msg_qbytes; /* Maximum number of bytes
allowed in queue */
pid_t msg_lspid; /* PID of last msgsnd() */
pid_t msg_lrpid; /* PID of last msgrcv() */
};
The ipc_perm structure is defined in <sys/ipc.h> as follows (the uid, gid, and mode fields are settable using IPC_SET):
struct ipc_perm {
key_t key; /* Key supplied to msgget() */
uid_t uid; /* Effective UID of owner */
gid_t gid; /* Effective GID of owner */
uid_t cuid; /* Effective UID of creator */
gid_t cgid; /* Effective GID of creator */
unsigned short mode; /* Permissions */
unsigned short seq; /* Sequence number */
};
Valid values for cmd are:
Tag | Description |
IPC_STAT | Copy information from the kernel data structure associated with msqid into the msqid_ds structure pointed to by buf. The caller must have read permission on the message queue. |
IPC_SET | Write the values of some members of the msqid_ds structure pointed to by buf to the kernel data structure associated with this message queue, updating also its msg_ctime member. The following members of the structure are updated: msg_qbytes, msg_perm.uid, msg_perm.gid, and (the least significant 9 bits of) msg_perm.mode. The effective UID of the calling process must match the owner (msg_perm.uid) or creator (msg_perm.cuid) of the message queue, or the caller must be privileged. Appropriate privilege (Linux: the CAP_SYS_RESOURCE capability) is required to raise the msg_qbytes value beyond the system parameter MSGMNB. |
IPC_RMID | Immediately remove the message queue, awakening all waiting reader and writer processes (with an error return and errno set to EIDRM). The calling process must have appropriate privileges or its effective user ID must be either that of the creator or owner of the message queue. |
IPC_INFO (Linux specific) | Returns information about system-wide message queue limits and parameters in the structure pointed to by buf. This structure is of type msginfo (thus, a cast is required), defined in <sys/msg.h> if the _GNU_SOURCE feature test macro is defined. The msgmni, msgmax, and msgmnb settings can be changed via /proc files of the same name; see proc(5) for details. |
MSG_INFO (Linux specific) | Returns a msginfo structure containing the same information as for IPC_INFO, except that the following fields are returned with information about system resources consumed by message queues: the msgpool field returns the number of message queues that currently exist on the system; the msgmap field returns the total number of messages in all queues on the system; and the msgtql field returns the total number of bytes in all messages in all queues on the system. |
MSG_STAT (Linux specific) | Returns a msqid_ds structure as for IPC_STAT. However, the msqid argument is not a queue identifier, but instead an index into the kernel’s internal array that maintains information about all message queues on the system. |
Return value
On success, IPC_STAT, IPC_SET, and IPC_RMID return 0. A successful IPC_INFO or MSG_INFO operation returns the index of the highest used entry in the kernel’s internal array recording information about all message queues. (This information can be used with repeated MSG_STAT operations to obtain information about all queues on the system.) A successful MSG_STAT operation returns the identifier of the queue whose index was given in msqid.
On error, -1 is returned with errno indicating the error.
Errors
On failure, errno is set to one of the following:
Tag | Description |
EACCES | The argument cmd is equal to IPC_STAT or MSG_STAT, but the calling process does not have read permission on the message queue msqid, and does not have the CAP_IPC_OWNER capability. |
EFAULT | The argument cmd has the value IPC_SET or IPC_STAT, but the address pointed to by buf isn’t accessible. |
EIDRM | The message queue was removed. |
EINVAL | Invalid value for cmd or msqid. Or: for a MSG_STAT operation, the index value specified in msqid referred to an array slot that is currently unused. |
EPERM | The argument cmd has the value IPC_SET or IPC_RMID, but the effective user ID of the calling process is not the creator (as found in msg_perm.cuid) or the owner (as found in msg_perm.uid) of the message queue, and the process is not privileged (Linux: it does not have the CAP_SYS_ADMIN capability). |
Notes
The IPC_INFO, MSG_STAT and MSG_INFO operations are used by the ipcs(8) program to provide information on allocated resources. In the future these may be modified or moved to a /proc file system interface.
Various fields in the struct msqid_ds were shorts under Linux 2.2 and have become longs under Linux 2.4. To take advantage of this, a recompilation under glibc-2.1.91 or later should suffice. (The kernel distinguishes old and new calls by an IPC_64 flag in cmd.)
Conforming to
SVr4, POSIX.1-2001.
See also
msgget() function
Synopsis
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
int msgget(key_t key, int msgflg);
Description
The msgget() system call returns the message queue identifier associated with the value of the key argument. A new message queue is created either if key has the value IPC_PRIVATE, or if key isn’t IPC_PRIVATE, no message queue with the given key exists, and IPC_CREAT is specified in msgflg.
If msgflg specifies both IPC_CREAT and IPC_EXCL and a message queue already exists for key, then msgget() fails with errno set to EEXIST. (This is analogous to the effect of the combination O_CREAT | O_EXCL for open(2).)
Upon creation, the least significant bits of the argument msgflg define the permissions of the message queue. These permission bits have the same format and semantics as the permissions specified for the mode argument of open(2). (The execute permissions are not used.)
If a new message queue is created, then its associated data structure msqid_ds (see msgctl(2)) is initialised as follows:
msg_perm.cuid and msg_perm.uid are set to the effective user ID of the calling process.
msg_perm.cgid and msg_perm.gid are set to the effective group ID of the calling process.
The least significant 9 bits of msg_perm.mode are set to the least significant 9 bits of msgflg.
msg_qnum, msg_lspid, msg_lrpid, msg_stime and msg_rtime are set to 0.
msg_ctime is set to the current time.
msg_qbytes is set to the system limit MSGMNB.
If the message queue already exists, the permissions are verified, and a check is made to see whether it is marked for destruction.
Return value
If successful, the return value will be the message queue identifier (a nonnegative integer), otherwise -1 with errno indicating the error.
Errors
On failure, errno is set to one of the following values:
Tag | Description |
EACCES | A message queue exists for key, but the calling process does not have permission to access the queue, and does not have the CAP_IPC_OWNER capability. |
EEXIST | A message queue exists for key and msgflg specified both IPC_CREAT and IPC_EXCL. |
ENOENT | No message queue exists for key and msgflg did not specify IPC_CREAT. |
ENOMEM | A message queue has to be created but the system does not have enough memory for the new data structure. |
ENOSPC | A message queue has to be created but the system limit for the maximum number of message queues (MSGMNI) would be exceeded. |
Notes
IPC_PRIVATE isn’t a flag field but a key_t type. If this special value is used for key, the system call ignores everything but the least significant 9 bits of msgflg and creates a new message queue (on success).
The following is a system limit on message queue resources affecting a msgget() call:
Tag | Description |
MSGMNI | System wide maximum number of message queues: policy dependent (on Linux, this limit can be read and modified via /proc/sys/kernel/msgmni). |
BUGS
The name choice IPC_PRIVATE was perhaps unfortunate, IPC_NEW would more clearly show its function.
Conforming to
SVr4, POSIX.1-2001.
Linux notes
Until version 2.3.20 Linux would return EIDRM for a msgget() on a message queue scheduled for deletion.
See also
msgop() function
Synopsis
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
int msgsnd(int msqid, const void *msgp, size_t msgsz, int msgflg);
ssize_t msgrcv(int msqid, void *msgp, size_t msgsz, long msgtyp, int msgflg);
Description
The msgsnd() and msgrcv() system calls are used, respectively, to send messages to, and receive messages from, a message queue. The calling process must have write permission on the message queue in order to send a message, and read permission to receive a message.
The msgp argument is a pointer to caller-defined structure of the following general form:
struct msgbuf {
long mtype; /* message type, must be > 0 */
char mtext[1]; /* message data */
};
The mtext field is an array (or other structure) whose size is specified by msgsz, a non-negative integer value. Messages of zero length (i.e., no mtext field) are permitted. The mtype field must have a strictly positive integer value. This value can be used by the receiving process for message selection (see the description of msgrcv() below).
The msgsnd() system call appends a copy of the message pointed to by msgp to the message queue whose identifier is specified by msqid.
If sufficient space is available in the queue, msgsnd() succeeds immediately. (The queue capacity is defined by the msg_qbytes field in the associated data structure for the message queue. During queue creation this field is initialised to MSGMNB bytes, but this limit can be modified using msgctl().) If insufficient space is available in the queue, then the default behaviour of msgsnd() is to block until space becomes available. If IPC_NOWAIT is specified in msgflg, then the call instead fails with the error EAGAIN.
A blocked msgsnd() call may also fail if the queue is removed (in which case the system call fails with errno set to EIDRM), or a signal is caught (in which case the system call fails with errno set to EINTR). (msgsnd() and msgrcv() are never automatically restarted after being interrupted by a signal handler, regardless of the setting of the SA_RESTART flag when establishing a signal handler.)
Upon successful completion the message queue data structure is updated as follows:
msg_lspid is set to the process ID of the calling process.
msg_qnum is incremented by 1.
msg_stime is set to the current time.
The system call msgrcv() removes a message from the queue specified by msqid and places it in the buffer pointed to by msgp.
The argument msgsz specifies the maximum size in bytes for the member mtext of the structure pointed to by the msgp argument. If the message text has length greater than msgsz, then the behaviour depends on whether MSG_NOERROR is specified in msgflg. If MSG_NOERROR is specified, then the message text will be truncated (and the truncated part will be lost); if MSG_NOERROR is not specified, then the message isn’t removed from the queue and the system call fails returning -1 with errno set to E2BIG.
The argument msgtyp specifies the type of message requested as follows:
If msgtyp is 0, then the first message in the queue is read.
If msgtyp is greater than 0, then the first message in the queue of type msgtyp is read, unless MSG_EXCEPT was specified in msgflg, in which case the first message in the queue of type not equal to msgtyp will be read.
If msgtyp is less than 0, then the first message in the queue with the lowest type less than or equal to the absolute value of msgtyp will be read.
The msgflg argument is a bit mask constructed by ORing together zero or more of the following flags:
Tag | Description |
IPC_NOWAIT | Return immediately if no message of the requested type is in the queue. The system call fails with errno set to ENOMSG. |
MSG_EXCEPT | Used with msgtyp greater than 0 to read the first message in the queue with message type that differs from msgtyp. |
MSG_NOERROR | Truncate the message text if it is longer than msgsz bytes. |
If no message of the requested type is available and IPC_NOWAIT isn’t specified in msgflg, the calling process is blocked until one of the following conditions occurs:
A message of the desired type is placed in the queue.
The message queue is removed from the system. In this case the system call fails with errno set to EIDRM.
The calling process catches a signal. In this case the system call fails with errno set to EINTR.
Upon successful completion the message queue data structure is updated as follows:
msg_lrpid is set to the process ID of the calling process.
msg_qnum is decremented by 1.
msg_rtime is set to the current time.
RETURN VALUE
On failure both functions return -1 with errno indicating the error, otherwise msgsnd() returns 0 and msgrcv() returns the number of bytes actually copied into the mtext array.
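The msgtyp selection rules and the return values can be seen in a small round trip. The sketch below is illustrative only: struct demo_msg, the helper send_two_receive and the message texts are assumptions chosen for the example, and IPC_NOWAIT is used throughout so that nothing blocks.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Caller-defined message layout: a long type followed by the data. */
struct demo_msg {
    long mtype;
    char mtext[16];
};

/* Hypothetical helper: queue "two" (type 2) and then "one" (type 1) on
   a fresh private queue, receive one message selected by msgtyp, and
   copy its text into out. Returns the byte count from msgrcv(), or -1. */
ssize_t send_two_receive(long msgtyp, char *out, size_t outsz)
{
    assert(out != NULL);

    int id = msgget(IPC_PRIVATE, 0600);
    if (id == -1)
        return -1;

    struct demo_msg a = { 2, "two" }, b = { 1, "one" }, r;
    ssize_t n = -1;

    /* msgsz counts only the mtext bytes, not the mtype field. */
    if (msgsnd(id, &a, strlen(a.mtext) + 1, IPC_NOWAIT) == 0 &&
        msgsnd(id, &b, strlen(b.mtext) + 1, IPC_NOWAIT) == 0) {
        n = msgrcv(id, &r, sizeof r.mtext, msgtyp, IPC_NOWAIT);
        if (n >= 0 && (size_t)n <= outsz)
            memcpy(out, r.mtext, (size_t)n);
        else
            n = -1;
    }

    msgctl(id, IPC_RMID, NULL);
    return n;
}
```

With msgtyp 0 the helper yields "two" (the first message queued); with msgtyp 1 or -2 it yields "one"; with msgtyp 3 no message matches and msgrcv() fails with ENOMSG.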
ERRORS
When msgsnd() fails, errno will be set to one among the following values:
Tag | Description |
EACCES | The calling process does not have write permission on the message queue, and does not have the CAP_IPC_OWNER capability. |
EAGAIN | The message can’t be sent due to the msg_qbytes limit for the queue and IPC_NOWAIT was specified in msgflg. |
EFAULT | The address pointed to by msgp isn’t accessible. |
EIDRM | The message queue was removed. |
EINTR | Sleeping on a full message queue condition, the process caught a signal. |
EINVAL | Invalid msqid value, or non-positive mtype value, or invalid msgsz value (less than 0 or greater than the system value MSGMAX). |
ENOMEM | The system does not have enough memory to make a copy of the message pointed to by msgp. |
When msgrcv() fails, errno will be set to one among the following values:
Tag | Description |
E2BIG | The message text length is greater than msgsz and MSG_NOERROR isn’t specified in msgflg. |
EACCES | The calling process does not have read permission on the message queue, and does not have the CAP_IPC_OWNER capability. |
EAGAIN | No message was available in the queue and IPC_NOWAIT was specified in msgflg. |
EFAULT | The address pointed to by msgp isn’t accessible. |
EIDRM | While the process was sleeping to receive a message, the message queue was removed. |
EINTR | While the process was sleeping to receive a message, the process caught a signal. |
EINVAL | msqid was invalid, or msgsz was less than 0. |
ENOMSG | IPC_NOWAIT was specified in msgflg and no message of the requested type existed on the message queue. |
CONFORMING TO
SVr4, POSIX.1-2001.
NOTES
The msgp argument is declared as struct msgbuf * with libc4, libc5, glibc 2.0, glibc 2.1. It is declared as void * with glibc 2.2 and later, as required by SUSv2 and SUSv3.
The following limits on message queue resources affect the msgsnd() call:
Tag | Description |
MSGMAX | Maximum size for a message text: 8192 bytes (on Linux, this limit can be read and modified via /proc/sys/kernel/msgmax). |
MSGMNB | Default maximum size in bytes of a message queue: 16384 bytes (on Linux, this limit can be read and modified via /proc/sys/kernel/msgmnb). The superuser can increase the size of a message queue beyond MSGMNB by a msgctl() system call. |
The implementation has no intrinsic limits for the system wide maximum number of message headers (MSGTQL) and for the system wide maximum size in bytes of the message pool (MSGPOOL).
SEE ALSO
msgsnd() function
msgsnd() is documented together with msgrcv() in the msgop() entry above.
msync() function
Synopsis
#include <sys/mman.h>
int msync(void *start, size_t length, int flags);
Description
msync() flushes changes made to the in-core copy of a file that was mapped into memory using mmap(2) back to disk. Without use of this call there is no guarantee that changes are written back before munmap(2) is called. To be more precise, the part of the file that corresponds to the memory area starting at start and having length length is updated. The flags argument may have the bits MS_ASYNC, MS_SYNC and MS_INVALIDATE set, but not both MS_ASYNC and MS_SYNC. MS_ASYNC specifies that an update be scheduled, but the call returns immediately. MS_SYNC asks for an update and waits for it to complete. MS_INVALIDATE asks to invalidate other mappings of the same file (so that they can be updated with the fresh values just written).
Return value
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
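A typical call sequence is sketched below: data is written through a MAP_SHARED file mapping, flushed with MS_SYNC, and then read back through the file descriptor. The helper name msync_roundtrip and the temporary-file template are assumptions made for this example.

```c
#define _DEFAULT_SOURCE   /* for mkstemp() and pread() under strict modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `text` through a shared file mapping,
   flush it with msync(MS_SYNC), and read it back with pread().
   Returns 0 if the file holds the text, -1 on any failure. */
int msync_roundtrip(const char *text)
{
    assert(text != NULL);

    char path[] = "/tmp/msync-demo-XXXXXX";   /* illustrative template */
    char back[64];
    size_t len = strlen(text);
    int rc = -1;

    if (len == 0 || len > sizeof back)
        return -1;

    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                 /* the file lives until fd is closed */

    if (ftruncate(fd, (off_t)len) == 0) {
        char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            memcpy(map, text, len);
            /* start is page aligned because it comes from mmap(). */
            if (msync(map, len, MS_SYNC) == 0 &&
                pread(fd, back, len, 0) == (ssize_t)len &&
                memcmp(back, text, len) == 0)
                rc = 0;
            munmap(map, len);
        }
    }

    close(fd);
    return rc;
}
```

The mapping must be MAP_SHARED for the stores to reach the file at all; changes made through a MAP_PRIVATE mapping are never written back.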
Errors
Tag | Description |
EINVAL | start is not a multiple of PAGESIZE; or any bit other than MS_ASYNC | MS_INVALIDATE | MS_SYNC is set in flags; or both MS_SYNC and MS_ASYNC are set in flags. |
ENOMEM | The indicated memory (or part of it) was not mapped. |
Availability
On POSIX systems on which msync() is available, both _POSIX_MAPPED_FILES and _POSIX_SYNCHRONIZED_IO are defined in <unistd.h> to a value greater than 0. (See also sysconf(3).)
Conforming to
POSIX.1-2001.
This call was introduced in Linux 1.3.21, and then used EFAULT instead of ENOMEM. In Linux 2.4.19 this was changed to the POSIX value ENOMEM.
See also
B.O. Gallmeister, POSIX.4, O’Reilly, pp. 128-129 and 389-391.
multiplexer() function
afs_syscall, break, fattach, fdetach, ftime, getmsg, getpmsg, gtty, isastream, lock, mpx, multiplexer, prof, profil, putmsg, putpmsg, security, stty, ulimit, vserver - unimplemented system calls
Synopsis
Unimplemented system calls.
Description
These system calls are not implemented in the Linux 2.4 kernel.
Return value
These system calls always return -1 and set errno to ENOSYS.
Notes
Note that ftime(3), profil(3) and ulimit(3) are implemented as library functions.
Some system calls, like alloc_hugepages(2), free_hugepages(2), ioperm(2), iopl(2), and vm86(2) only exist on certain architectures.
Some system calls, like ipc(2), create_module(2), init_module(2), and delete_module(2) only exist when the Linux kernel was built with support for them.
See also
munlockall() function
mlock, munlock, mlockall, munlockall - lock and unlock memory
Synopsis
#include <sys/mman.h>
int mlock(const void *addr, size_t len);
int munlock(const void *addr, size_t len);
int mlockall(int flags);
int munlockall(void);
Description
mlock() and mlockall() respectively lock part or all of the calling process’s virtual address space into RAM, preventing that memory from being paged to the swap area. munlock() and munlockall() perform the converse operation, respectively unlocking part or all of the calling process’s virtual address space, so that pages in the specified virtual address range may once more be swapped out if required by the kernel memory manager. Memory locking and unlocking are performed in units of whole pages.
mlock() and munlock()
mlock() locks pages in the address range starting at addr and continuing for len bytes. All pages that contain a part of the specified address range are guaranteed to be resident in RAM when the call returns successfully; the pages are guaranteed to stay in RAM until later unlocked.
munlock() unlocks pages in the address range starting at addr and continuing for len bytes. After this call, all pages that contain a part of the specified memory range can be moved to external swap space again by the kernel.
mlockall() and munlockall()
mlockall() locks all pages mapped into the address space of the calling process. This includes the pages of the code, data and stack segment, as well as shared libraries, user space kernel data, shared memory, and memory-mapped files. All mapped pages are guaranteed to be resident in RAM when the call returns successfully; the pages are guaranteed to stay in RAM until later unlocked.
The flags argument is constructed as the bitwise OR of one or more of the following constants:
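Tag | Description |
MCL_CURRENT | Lock all pages that are currently mapped into the address space of the process. |
MCL_FUTURE | Lock all pages that will become mapped into the address space of the process in the future. These could be, for instance, new pages required by a growing heap and stack as well as new memory-mapped files or shared memory regions. |
If MCL_FUTURE has been specified, then a later system call (e.g., mmap(2), sbrk(2), malloc(3)), may fail if it would cause the number of locked bytes to exceed the permitted maximum (see below). In the same circumstances, stack growth may likewise fail: the kernel will deny stack expansion and deliver a SIGSEGV signal to the process.
munlockall() unlocks all pages mapped into the address space of the calling process.
Notes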
Memory locking has two main applications: real-time algorithms and high-security data processing. Real-time applications require deterministic timing, and, like scheduling, paging is one major cause of unexpected program execution delays. Real-time applications will usually also switch to a real-time scheduler with sched_setscheduler(2). Cryptographic security software often handles critical bytes like passwords or secret keys as data structures. As a result of paging, these secrets could be transferred onto a persistent swap store medium, where they might be accessible to the enemy long after the security software has erased the secrets in RAM and terminated. (But be aware that the suspend mode on laptops and some desktop computers will save a copy of the system’s RAM to disk, regardless of memory locks.)
Real-time processes that are using mlockall() to prevent delays on page faults should reserve enough locked stack pages before entering the time-critical section, so that no page fault can be caused by function calls. This can be achieved by calling a function that allocates a sufficiently large automatic variable (an array) and writes to the memory occupied by this array in order to touch these stack pages. This way, enough pages will be mapped for the stack and can be locked into RAM. The dummy writes ensure that not even copy-on-write page faults can occur in the critical section.
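The stack-reservation technique described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names prefault_stack() and enter_realtime_section() and the 64 KiB STACK_RESERVE figure are assumptions made for the example, and a real application would size the reserve from its own worst-case call depth.

```c
#include <stddef.h>
#include <unistd.h>
#include <sys/mman.h>

#define STACK_RESERVE (64 * 1024)   /* assumed worst-case stack use, in bytes */

/* Touch STACK_RESERVE bytes of stack, one write per page, so that the pages
 * are mapped before the time-critical section starts.
 * Returns the number of bytes reserved. */
size_t prefault_stack(void)
{
    volatile unsigned char dummy[STACK_RESERVE];   /* volatile keeps the writes */
    long page = sysconf(_SC_PAGESIZE);
    size_t i;

    if (page <= 0)
        page = 4096;                               /* fallback guess */

    for (i = 0; i < sizeof dummy; i += (size_t)page)
        dummy[i] = 0;                              /* dummy write touches the page */
    dummy[sizeof dummy - 1] = 0;

    return sizeof dummy;
}

/* Lock current and future pages, then prefault the stack.
 * Returns 0 on success, -1 with errno set by mlockall() on failure. */
int enter_realtime_section(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
        return -1;
    prefault_stack();
    return 0;
}
```

Calling mlockall() first means the pages touched by prefault_stack() are locked as soon as they are mapped, because MCL_FUTURE is already in effect.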
Memory locks are not inherited by a child created via fork(2) and are automatically removed (unlocked) during an execve(2) or when the process terminates.
The memory lock on an address range is automatically removed if the address range is unmapped via munmap(2).
Memory locks do not stack, i.e., pages which have been locked several times by calls to mlock() or mlockall() will be unlocked by a single call to munlock() for the corresponding range or by munlockall(). Pages which are mapped to several locations or by several processes stay locked into RAM as long as they are locked at least at one location or by at least one process.
LINUX NOTES
Under Linux, mlock() and munlock() automatically round addr down to the nearest page boundary. However, POSIX.1-2001 allows an implementation to require that addr is page aligned, so portable applications should ensure this.
Limits and permissions
In Linux 2.6.8 and earlier, a process must be privileged (CAP_IPC_LOCK) in order to lock memory and the RLIMIT_MEMLOCK soft resource limit defines a limit on how much memory the process may lock.
Since Linux 2.6.9, no limits are placed on the amount of memory that a privileged process can lock and the RLIMIT_MEMLOCK soft resource limit instead defines a limit on how much memory an unprivileged process may lock.
RETURN VALUE
On success these system calls return 0. On error, -1 is returned, errno is set appropriately, and no changes are made to any locks in the address space of the process.
ERRORS
Tag | Description |
ENOMEM | (Linux 2.6.9 and later) the caller had a non-zero RLIMIT_MEMLOCK soft resource limit, but tried to lock more memory than the limit permitted. This limit is not enforced if the process is privileged (CAP_IPC_LOCK). |
ENOMEM | (Linux 2.4 and earlier) the calling process tried to lock more than half of RAM. |
EPERM | (Linux 2.6.9 and later) the caller was not privileged (CAP_IPC_LOCK) and its RLIMIT_MEMLOCK soft resource limit was 0. |
EPERM | (Linux 2.6.8 and earlier) The calling process has insufficient privilege to call munlockall(). Under Linux the CAP_IPC_LOCK capability is required. |
For mlock() and munlock(): | |
EINVAL | len was negative. |
EINVAL | (Not on Linux) addr was not a multiple of the page size. |
ENOMEM | Some of the specified address range does not correspond to mapped pages in the address space of the process. |
For mlockall(): | |
EINVAL | Unknown flags were specified. |
For munlockall(): | |
EPERM | (Linux 2.6.8 and earlier) The caller was not privileged (CAP_IPC_LOCK). |
BUGS
In the 2.4 series Linux kernels up to and including 2.4.17, a bug caused the mlockall() MCL_FUTURE flag to be inherited across a fork(2). This was rectified in kernel 2.4.18.
Since kernel 2.6.9, if a privileged process calls mlockall(MCL_FUTURE) and later drops privileges (loses the CAP_IPC_LOCK capability by, for example, setting its effective UID to a non-zero value), then subsequent memory allocations (e.g., mmap(2), brk(2)) will fail if the RLIMIT_MEMLOCK resource limit is encountered.
AVAILABILITY
On POSIX systems on which mlock() and munlock() are available, _POSIX_MEMLOCK_RANGE is defined in <unistd.h> and the number of bytes in a page can be determined from the constant PAGESIZE (if defined) in <limits.h> or by calling sysconf(_SC_PAGESIZE).
On POSIX systems on which mlockall() and munlockall() are available, _POSIX_MEMLOCK is defined in <unistd.h> to a value greater than 0. (See also sysconf(3).)
CONFORMING TO
POSIX.1-2001, SVr4
SEE ALSO
munlock() function
mlock, munlock, mlockall, munlockall - lock and unlock memory
SYNOPSIS
#include <sys/mman.h>
int mlock(const void *addr, size_t len);
int munlock(const void *addr, size_t len);
int mlockall(int flags);
int munlockall(void);
DESCRIPTION
mlock() and mlockall() respectively lock part or all of the calling process’s virtual address space into RAM, preventing that memory from being paged to the swap area. munlock() and munlockall() perform the converse operation, respectively unlocking part or all of the calling process’s virtual address space, so that pages in the specified virtual address range may once more be swapped out if required by the kernel memory manager. Memory locking and unlocking are performed in units of whole pages.
mlock() and munlock()
mlock() locks pages in the address range starting at addr and continuing for len bytes. All pages that contain a part of the specified address range are guaranteed to be resident in RAM when the call returns successfully; the pages are guaranteed to stay in RAM until later unlocked.
munlock() unlocks pages in the address range starting at addr and continuing for len bytes. After this call, all pages that contain a part of the specified memory range can be moved to external swap space again by the kernel.
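As an illustration of the mlock()/munlock() pairing, the sketch below keeps a short secret in a single locked page, wipes it, and then unlocks the page. The function name use_locked_secret() is hypothetical, and the one-page buffer size is an assumption chosen to stay within a typical unprivileged RLIMIT_MEMLOCK; the call can still fail with ENOMEM or EPERM on systems with a stricter limit.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/* Copy `secret` into a page-aligned, locked buffer, then wipe and unlock it.
 * Returns 0 on success, -1 on failure with errno set. */
int use_locked_secret(const char *secret)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len = strlen(secret) + 1;
    void *buf = NULL;
    volatile unsigned char *p;
    size_t i;
    int err;

    if (page <= 0 || len > (size_t)page) {
        errno = EINVAL;
        return -1;
    }

    /* A page-aligned address keeps the call portable to systems that
     * require addr to be a multiple of the page size. */
    err = posix_memalign(&buf, (size_t)page, (size_t)page);
    if (err != 0) {
        errno = err;
        return -1;
    }

    if (mlock(buf, (size_t)page) == -1) {
        err = errno;
        free(buf);
        errno = err;
        return -1;
    }

    memcpy(buf, secret, len);
    /* ... work with the secret here ... */

    /* Wipe through a volatile pointer so the stores are not optimized away. */
    p = buf;
    for (i = 0; i < (size_t)page; i++)
        p[i] = 0;

    err = munlock(buf, (size_t)page) == -1 ? errno : 0;
    free(buf);
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```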
mlockall() and munlockall()
mlockall() locks all pages mapped into the address space of the calling process. This includes the pages of the code, data and stack segment, as well as shared libraries, user space kernel data, shared memory, and memory-mapped files. All mapped pages are guaranteed to be resident in RAM when the call returns successfully; the pages are guaranteed to stay in RAM until later unlocked.
The flags argument is constructed as the bitwise OR of one or more of the following constants:
Tag | Description |
MCL_CURRENT | Lock all pages which are currently mapped into the address space of the process. |
MCL_FUTURE | Lock all pages which will become mapped into the address space of the process in the future. These could be for instance new pages required by a growing heap and stack as well as new memory mapped files or shared memory regions. |
If MCL_FUTURE has been specified, then a later system call (e.g., mmap(2), sbrk(2), malloc(3)) may fail if it would cause the number of locked bytes to exceed the permitted maximum (see below). In the same circumstances, stack growth may likewise fail: the kernel will deny stack expansion and deliver a SIGSEGV signal to the process.
munlockall() unlocks all pages mapped into the address space of the calling process.
NOTES
Memory locking has two main applications: real-time algorithms and high-security data processing. Real-time applications require deterministic timing, and, like scheduling, paging is one major cause of unexpected program execution delays. Real-time applications will usually also switch to a real-time scheduler with sched_setscheduler(2). Cryptographic security software often handles critical bytes like passwords or secret keys as data structures. As a result of paging, these secrets could be transferred onto a persistent swap store medium, where they might be accessible to the enemy long after the security software has erased the secrets in RAM and terminated. (But be aware that the suspend mode on laptops and some desktop computers will save a copy of the system’s RAM to disk, regardless of memory locks.)
Real-time processes that are using mlockall() to prevent delays on page faults should reserve enough locked stack pages before entering the time-critical section, so that no page fault can be caused by function calls. This can be achieved by calling a function that allocates a sufficiently large automatic variable (an array) and writes to the memory occupied by this array in order to touch these stack pages. This way, enough pages will be mapped for the stack and can be locked into RAM. The dummy writes ensure that not even copy-on-write page faults can occur in the critical section.
Memory locks are not inherited by a child created via fork(2) and are automatically removed (unlocked) during an execve(2) or when the process terminates.
The memory lock on an address range is automatically removed if the address range is unmapped via munmap(2).
Memory locks do not stack, i.e., pages which have been locked several times by calls to mlock() or mlockall() will be unlocked by a single call to munlock() for the corresponding range or by munlockall(). Pages which are mapped to several locations or by several processes stay locked into RAM as long as they are locked at least at one location or by at least one process.
LINUX NOTES
Under Linux, mlock() and munlock() automatically round addr down to the nearest page boundary. However, POSIX.1-2001 allows an implementation to require that addr is page aligned, so portable applications should ensure this.
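A portable caller can do the rounding itself before calling mlock() or munlock(). The helper below is a sketch of that arithmetic; page_align_range() is a hypothetical name, and it assumes the page size is a power of two (as returned by sysconf(_SC_PAGESIZE) on common systems).

```c
#include <stddef.h>
#include <stdint.h>

/* Expand the byte range [addr, addr + len) outward to page boundaries.
 * page_size must be a power of two. On return, *start is addr rounded down
 * and *aligned_len covers every page that contains part of the range. */
void page_align_range(uintptr_t addr, size_t len, size_t page_size,
                      uintptr_t *start, size_t *aligned_len)
{
    uintptr_t mask = (uintptr_t)page_size - 1;
    uintptr_t end = (addr + len + mask) & ~mask;   /* round the end up */

    *start = addr & ~mask;                         /* round the start down */
    *aligned_len = (size_t)(end - *start);
}
```

The resulting start and length can then be passed to mlock() in place of the original values.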
Limits and permissions
In Linux 2.6.8 and earlier, a process must be privileged (CAP_IPC_LOCK) in order to lock memory and the RLIMIT_MEMLOCK soft resource limit defines a limit on how much memory the process may lock.
Since Linux 2.6.9, no limits are placed on the amount of memory that a privileged process can lock and the RLIMIT_MEMLOCK soft resource limit instead defines a limit on how much memory an unprivileged process may lock.
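To see which limit applies to the current process, the RLIMIT_MEMLOCK values can be read with getrlimit(2). The following is a small sketch; print_memlock_limit() is a name invented for this example.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Print the soft and hard RLIMIT_MEMLOCK values of the calling process.
 * Returns 0 on success, -1 if getrlimit() fails. */
int print_memlock_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) == -1) {
        perror("getrlimit");
        return -1;
    }

    if (rl.rlim_cur == RLIM_INFINITY)
        printf("soft limit: unlimited\n");
    else
        printf("soft limit: %llu bytes\n", (unsigned long long)rl.rlim_cur);

    if (rl.rlim_max == RLIM_INFINITY)
        printf("hard limit: unlimited\n");
    else
        printf("hard limit: %llu bytes\n", (unsigned long long)rl.rlim_max);

    return 0;
}
```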
RETURN VALUE
On success these system calls return 0. On error, -1 is returned, errno is set appropriately, and no changes are made to any locks in the address space of the process.
ERRORS
Tag | Description |
ENOMEM | (Linux 2.6.9 and later) the caller had a non-zero RLIMIT_MEMLOCK soft resource limit, but tried to lock more memory than the limit permitted. This limit is not enforced if the process is privileged (CAP_IPC_LOCK). |
ENOMEM | (Linux 2.4 and earlier) the calling process tried to lock more than half of RAM. |
EPERM | (Linux 2.6.9 and later) the caller was not privileged (CAP_IPC_LOCK) and its RLIMIT_MEMLOCK soft resource limit was 0. |
EPERM | (Linux 2.6.8 and earlier) The calling process has insufficient privilege to call munlockall(). Under Linux the CAP_IPC_LOCK capability is required. |
For mlock() and munlock(): | |
EINVAL | len was negative. |
EINVAL | (Not on Linux) addr was not a multiple of the page size. |
ENOMEM | Some of the specified address range does not correspond to mapped pages in the address space of the process. |
For mlockall(): | |
EINVAL | Unknown flags were specified. |
For munlockall(): | |
EPERM | (Linux 2.6.8 and earlier) The caller was not privileged (CAP_IPC_LOCK). |
BUGS
In the 2.4 series Linux kernels up to and including 2.4.17, a bug caused the mlockall() MCL_FUTURE flag to be inherited across a fork(2). This was rectified in kernel 2.4.18.
Since kernel 2.6.9, if a privileged process calls mlockall(MCL_FUTURE) and later drops privileges (loses the CAP_IPC_LOCK capability by, for example, setting its effective UID to a non-zero value), then subsequent memory allocations (e.g., mmap(2), brk(2)) will fail if the RLIMIT_MEMLOCK resource limit is encountered.
AVAILABILITY
On POSIX systems on which mlock() and munlock() are available, _POSIX_MEMLOCK_RANGE is defined in <unistd.h> and the number of bytes in a page can be determined from the constant PAGESIZE (if defined) in <limits.h> or by calling sysconf(_SC_PAGESIZE).
On POSIX systems on which mlockall() and munlockall() are available, _POSIX_MEMLOCK is defined in <unistd.h> to a value greater than 0. (See also sysconf(3).)
CONFORMING TO
POSIX.1-2001, SVr4
SEE ALSO
munmap() function
mmap, munmap - map or unmap files or devices into memory
SYNOPSIS
#include <sys/mman.h>
void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);
int munmap(void *start, size_t length);
DESCRIPTION
The mmap() function asks to map length bytes starting at offset offset from the file (or other object) specified by the file descriptor fd into memory, preferably at address start. This latter address is a hint only, and is usually specified as 0. The actual place where the object is mapped is returned by mmap().
The prot argument describes the desired memory protection (and must not conflict with the open mode of the file). It is either PROT_NONE or is the bitwise OR of one or more of the other PROT_* flags.
Tag | Description |
PROT_EXEC | Pages may be executed. |
PROT_READ | Pages may be read. |
PROT_WRITE | Pages may be written. |
PROT_NONE | Pages may not be accessed. |
The flags parameter specifies the type of the mapped object, mapping options and whether modifications made to the mapped copy of the page are private to the process or are to be shared with other references. It has the following bits:
Tag | Description |
MAP_FIXED | Do not select a different address than the one specified. If the memory region specified by start and length overlaps pages of any existing mapping(s), then the overlapped part of the existing mapping(s) will be discarded. If the specified address cannot be used, mmap() will fail. If MAP_FIXED is specified, start must be a multiple of the page size. Use of this option is discouraged. |
MAP_SHARED | Share this mapping with all other processes that map this object. Storing to the region is equivalent to writing to the file. The file may not actually be updated until msync(2) or munmap(2) is called. |
MAP_PRIVATE | Create a private copy-on-write mapping. Stores to the region do not affect the original file. It is unspecified whether changes made to the file after the mmap() call are visible in the mapped region. |
You must specify exactly one of MAP_SHARED and MAP_PRIVATE.
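The difference between the two sharing modes can be observed directly. The sketch below maps a scratch file with MAP_PRIVATE, stores to the mapping, and then re-reads the file to confirm it is unchanged. The function name and the /tmp scratch path are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/* Map a scratch file privately, modify the mapping, and check the file.
 * Returns 1 if the file is unchanged, 0 if it changed, -1 on error. */
int private_mapping_leaves_file_intact(void)
{
    char path[] = "/tmp/mmap_demo_XXXXXX";   /* hypothetical scratch file */
    const char original[] = "hello";
    char readback[sizeof original];
    char *map;
    int result = -1;
    int fd = mkstemp(path);

    if (fd == -1)
        return -1;
    unlink(path);                            /* file lives until fd is closed */

    if (write(fd, original, sizeof original) != (ssize_t)sizeof original)
        goto out;

    map = mmap(NULL, sizeof original, PROT_READ | PROT_WRITE,
               MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    map[0] = 'J';                            /* store goes to a private copy */

    if (pread(fd, readback, sizeof readback, 0) == (ssize_t)sizeof readback)
        result = memcmp(readback, original, sizeof original) == 0;

    munmap(map, sizeof original);
out:
    close(fd);
    return result;
}
```

With MAP_SHARED in place of MAP_PRIVATE, the same store would be equivalent to writing to the file, as described in the table above.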
The above three flags are described in POSIX.1-2001. Linux also knows about the following non-standard flags:
Tag | Description |
MAP_DENYWRITE | This flag is ignored. (Long ago, it signalled that attempts to write to the underlying file should fail with ETXTBSY. But this was a source of denial-of-service attacks.) |
MAP_EXECUTABLE | This flag is ignored. |
MAP_NORESERVE | Do not reserve swap space for this mapping. When swap space is reserved, one has the guarantee that it is possible to modify the mapping. When swap space is not reserved one might get SIGSEGV upon a write if no physical memory is available. See also the discussion of the file /proc/sys/vm/overcommit_memory in proc(5). In kernels before 2.6, this flag only had effect for private writable mappings. |
MAP_LOCKED (since Linux 2.5.37) | Lock the pages of the mapped region into memory in the manner of mlock(). This flag is ignored in older kernels. |
MAP_GROWSDOWN | Used for stacks. Indicates to the kernel VM system that the mapping should extend downwards in memory. |
MAP_ANONYMOUS | The mapping is not backed by any file; the fd and offset arguments are ignored. The use of this flag in conjunction with MAP_SHARED is only supported on Linux since kernel 2.4. |
MAP_ANON | Alias for MAP_ANONYMOUS. Deprecated. |
MAP_FILE | Compatibility flag. Ignored. |
MAP_32BIT | Put the mapping into the first 2GB of the process address space. Ignored when MAP_FIXED is set. This flag is currently only supported on x86-64 for 64-bit programs. |
MAP_POPULATE (since Linux 2.5.46) | Populate (prefault) page tables for a file mapping, by performing read-ahead on the file. Later accesses to the mapping will not be blocked by page faults. |
MAP_NONBLOCK (since Linux 2.5.46) | Only meaningful in conjunction with MAP_POPULATE. Don’t perform read-ahead: only create page table entries for pages that are already present in RAM. |
Some systems document the additional flags MAP_AUTOGROW, MAP_AUTORESRV, MAP_COPY, and MAP_LOCAL.
fd should be a valid file descriptor, unless MAP_ANONYMOUS is set. If MAP_ANONYMOUS is set, then fd is ignored on Linux. However, some implementations require fd to be -1 if MAP_ANONYMOUS (or MAP_ANON) is specified, and portable applications should ensure this.
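A portable anonymous mapping therefore passes -1 for fd and 0 for offset. The following sketch wraps that call; alloc_anonymous() is a hypothetical helper name, and the _DEFAULT_SOURCE definition is an assumption about glibc, where it is needed to expose MAP_ANONYMOUS under strict compiler modes.

```c
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of private memory that is not backed by any file.
 * Returns a pointer to the mapping, or NULL on failure (errno is set). */
void *alloc_anonymous(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS,
                   -1,    /* fd: -1 for portability, ignored on Linux */
                   0);    /* offset: ignored for anonymous mappings */

    return p == MAP_FAILED ? NULL : p;
}
```

The region is released with munmap(p, len) when it is no longer needed.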
offset should be a multiple of the page size as returned by getpagesize(2).
Memory mapped by mmap() is preserved across fork(2), with the same attributes.
A file is mapped in multiples of the page size. For a file that is not a multiple of the page size, the remaining memory is zeroed when mapped, and writes to that region are not written out to the file. The effect of changing the size of the underlying file of a mapping on the pages that correspond to added or removed regions of the file is unspecified.
The munmap() system call deletes the mappings for the specified address range, and causes further references to addresses within the range to generate invalid memory references. The region is also automatically unmapped when the process is terminated. On the other hand, closing the file descriptor does not unmap the region.
The address start must be a multiple of the page size. All pages containing a part of the indicated range are unmapped, and subsequent references to these pages will generate SIGSEGV. It is not an error if the indicated range does not contain any mapped pages.
For file-backed mappings, the st_atime field for the mapped file may be updated at any time between the mmap() and the corresponding unmapping; the first reference to a mapped page will update the field if it has not been already.
The st_ctime and st_mtime fields for a file mapped with PROT_WRITE and MAP_SHARED will be updated after a write to the mapped region, and before a subsequent msync() with the MS_SYNC or MS_ASYNC flag, if one occurs.
RETURN VALUE
On success, mmap() returns a pointer to the mapped area. On error, the value MAP_FAILED (that is, (void *) -1) is returned, and errno is set appropriately. On success, munmap() returns 0, on failure -1, and errno is set (probably to EINVAL).
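Because the failure value is MAP_FAILED rather than NULL, the result must be compared against MAP_FAILED. The sketch below provokes a failure deliberately by requesting a file mapping with an invalid descriptor; the function name is invented for the example, and the EBADF outcome is what Linux reports for this case.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Request a file mapping with an invalid descriptor.
 * Returns the errno value reported by mmap(), or 0 if the call
 * unexpectedly succeeded. */
int errno_for_bad_fd_mapping(void)
{
    void *p = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, -1, 0);

    if (p == MAP_FAILED)        /* compare with MAP_FAILED, not NULL */
        return errno;

    munmap(p, 4096);
    return 0;
}
```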
NOTES
It is architecture dependent whether PROT_READ includes PROT_EXEC or not. Portable programs should always set PROT_EXEC if they intend to execute code in the new mapping.
ERRORS
Tag | Description |
EACCES | A file descriptor refers to a non-regular file. Or MAP_PRIVATE was requested, but fd is not open for reading. Or MAP_SHARED was requested and PROT_WRITE is set, but fd is not open in read/write (O_RDWR) mode. Or PROT_WRITE is set, but the file is append-only. |
EAGAIN | The file has been locked, or too much memory has been locked (see setrlimit(2)). |
EBADF | fd is not a valid file descriptor (and MAP_ANONYMOUS was not set). |
EINVAL | We don’t like start or length or offset. (E.g., they are too large, or not aligned on a page boundary.) |
ENFILE | The system limit on the total number of open files has been reached. |
ENODEV | The underlying filesystem of the specified file does not support memory mapping. |
ENOMEM | No memory is available, or the process’s maximum number of mappings would have been exceeded. |
EPERM | The prot argument asks for PROT_EXEC but the mapped area belongs to a file on a filesystem that was mounted no-exec. |
ETXTBSY | MAP_DENYWRITE was set but the object specified by fd is open for writing. |
Use of a mapped region can result in these signals: | |
SIGSEGV | Attempted write into a region mapped as read-only. |
SIGBUS | Attempted access to a portion of the buffer that does not correspond to the file (for example, beyond the end of the file, including the case where another process has truncated the file). |
AVAILABILITY
On POSIX systems on which mmap(), msync() and munmap() are available, _POSIX_MAPPED_FILES is defined in <unistd.h> to a value greater than 0. (See also sysconf(3).)
CONFORMING TO
SVr4, 4.4BSD, POSIX.1-2001.
BUGS
On Linux there are no guarantees like those suggested above under MAP_NORESERVE. By default, any process can be killed at any moment when the system runs out of memory.
In kernels before 2.6.7, the MAP_POPULATE flag only has effect if prot is specified as PROT_NONE.
SEE ALSO
B.O. Gallmeister, POSIX.4, O’Reilly, pp. 128-129 and 389-391.
nanosleep() function
SYNOPSIS
#define _POSIX_C_SOURCE 199309
#include <time.h>
int nanosleep(const struct timespec *req, struct timespec *rem);
DESCRIPTION
nanosleep() delays the execution of the program for at least the time specified in *req. The function can return earlier if a signal has been delivered to the process. In this case, it returns -1, sets errno to EINTR, and writes the remaining time into the structure pointed to by rem unless rem is NULL. The value of *rem can then be used to call nanosleep() again and complete the specified pause.
The structure timespec is used to specify intervals of time with nanosecond precision. It is specified in <time.h> and has the form
struct timespec { |
The value of the nanoseconds field must be in the range 0 to 999999999.
Compared to sleep(3) and usleep(3), nanosleep() has the advantage of not affecting any signals, it is standardized by POSIX, it provides higher timing resolution, and it makes it easier to resume a sleep that has been interrupted by a signal.
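Resuming an interrupted sleep amounts to calling nanosleep() again with the remaining time. The loop below is a minimal sketch of that pattern; sleep_fully() is a hypothetical helper name, and the feature-test macro value is simply one that makes nanosleep() visible on current systems.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Sleep for the whole interval, resuming after signal interruptions.
 * Returns 0 once the interval has elapsed, or -1 on an error other
 * than EINTR (errno is left as set by nanosleep()). */
int sleep_fully(time_t sec, long nsec)
{
    struct timespec req = { .tv_sec = sec, .tv_nsec = nsec };
    struct timespec rem;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;      /* continue with the time that was left */
    }
    return 0;
}
```

For example, sleep_fully(0, 10000000L) pauses for about 10 ms, while a tv_nsec value outside 0 to 999999999 makes the call fail with EINVAL.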
RETURN VALUE
On successfully sleeping for the requested interval, nanosleep() returns 0. If the call is interrupted by a signal handler or encounters an error, then it returns -1, with errno set to indicate the error.
ERRORS
Tag | Description |
EFAULT | Problem with copying information from user space. |
EINTR | The pause has been interrupted by a non-blocked signal that was delivered to the process. The remaining sleep time has been written into *rem so that the process can easily call nanosleep() again and continue with the pause. |
EINVAL | The value in the tv_nsec field was not in the range 0 to 999999999 or tv_sec was negative. |
BUGS
The current implementation of nanosleep() is based on the normal kernel timer mechanism, which has a resolution of 1/HZ s (see time(7)). Therefore, nanosleep() always pauses for at least the specified time; however, it can take up to 10 ms longer than specified until the process becomes runnable again. For the same reason, the value returned in case of a delivered signal in *rem is usually rounded to the next larger multiple of 1/HZ s.
Old behaviour
In order to support applications requiring much more precise pauses (e.g., in order to control some time-critical hardware), nanosleep() would handle pauses of up to 2 ms by busy waiting with microsecond precision when called from a process scheduled under a real-time policy like SCHED_FIFO or SCHED_RR. This special extension was removed in kernel 2.5.39, hence is still present in current 2.4 kernels, but not in 2.6 kernels.
In Linux 2.4, if nanosleep() is stopped by a signal (e.g., SIGTSTP), then the call fails with the error EINTR after the process is resumed by a SIGCONT signal. If the system call is subsequently restarted, then the time that the process spent in the stopped state is not counted against the sleep interval.
CONFORMING TO
POSIX.1-2001.
SEE ALSO
_newselect() function
select, pselect, FD_CLR, FD_ISSET, FD_SET, FD_ZERO - synchronous I/O multiplexing
SYNOPSIS
/* According to POSIX.1-2001 */
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
void FD_CLR(int fd, fd_set *set);
int FD_ISSET(int fd, fd_set *set);
void FD_SET(int fd, fd_set *set);
void FD_ZERO(fd_set *set);
#define _XOPEN_SOURCE 600
#include <sys/select.h>
int pselect(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, const struct timespec *timeout, const sigset_t *sigmask);
DESCRIPTION
select() and pselect() allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without blocking.
The operation of select() and pselect() is identical, with three differences:
S.N. | Description |
(i) | select() uses a timeout that is a struct timeval (with seconds and microseconds), while pselect() uses a struct timespec (with seconds and nanoseconds). |
(ii) | select() may update the timeout argument to indicate how much time was left. pselect() does not change this argument. |
(iii) | select() has no sigmask argument, and behaves as pselect() called with NULL sigmask. |
Three independent sets of file descriptors are watched. Those listed in readfds will be watched to see if characters become available for reading (more precisely, to see if a read will not block; in particular, a file descriptor is also ready on end-of-file), those in writefds will be watched to see if a write will not block, and those in exceptfds will be watched for exceptions. On exit, the sets are modified in place to indicate which file descriptors actually changed status. Each of the three file descriptor sets may be specified as NULL if no file descriptors are to be watched for the corresponding class of events.
Four macros are provided to manipulate the sets.
- FD_ZERO() clears a set.
- FD_SET() and FD_CLR() respectively add and remove a given file descriptor from a set.
- FD_ISSET() tests to see if a file descriptor is part of the set; this is useful after select() returns.
nfds is the highest-numbered file descriptor in any of the three sets, plus 1.
timeout is an upper bound on the amount of time elapsed before select() returns. It may be zero, causing select() to return immediately. (This is useful for polling.) If timeout is NULL (no timeout), select() can block indefinitely.
sigmask is a pointer to a signal mask (see sigprocmask(2)); if it is not NULL, then pselect() first replaces the current signal mask by the one pointed to by sigmask, then does the ‘select’ function, and then restores the original signal mask.
Other than the difference in the precision of the timeout argument, the following pselect() call:
ready = pselect(nfds, &readfds, &writefds, &exceptfds, |
is equivalent to atomically executing the following calls:
sigset_t origmask; |
The reason that pselect() is needed is that if one wants to wait for either a signal or for a file descriptor to become ready, then an atomic test is needed to prevent race conditions. (Suppose the signal handler sets a global flag and returns. Then a test of this global flag followed by a call of select() could hang indefinitely if the signal arrived just after the test but just before the call. By contrast, pselect() allows one to first block signals, handle the signals that have come in, then call pselect() with the desired sigmask, avoiding the race.)
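The block-test-wait sequence described above might look like the following sketch. The choice of SIGUSR1, the got_signal flag, and the function name wait_for_input_or_sigusr1() are all assumptions made for illustration; a real program would normally install its handler once at start-up rather than on every call.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>

static volatile sig_atomic_t got_signal;    /* set by the handler */

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Wait until fd is readable or SIGUSR1 arrives.
 * Returns 1 if fd is readable, 0 if the signal arrived first, -1 on error. */
int wait_for_input_or_sigusr1(int fd)
{
    struct sigaction sa;
    sigset_t block, orig, waitmask;
    fd_set readfds;
    int ready;

    sa.sa_handler = on_signal;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    /* Block SIGUSR1 so it cannot arrive between the flag test and the wait. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &orig) == -1)
        return -1;

    waitmask = orig;
    sigdelset(&waitmask, SIGUSR1);           /* mask used only while waiting */

    for (;;) {
        if (got_signal) {                    /* handle signals already seen */
            ready = 0;
            break;
        }
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        /* pselect() installs waitmask and waits as one atomic step. */
        ready = pselect(fd + 1, &readfds, NULL, NULL, NULL, &waitmask);
        if (ready > 0) {
            ready = 1;
            break;
        }
        if (ready == -1 && errno != EINTR)
            break;
    }

    sigprocmask(SIG_SETMASK, &orig, NULL);   /* restore the caller's mask */
    return ready;
}
```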
The timeout
The time structures involved are defined in <sys/time.h> and look like
struct timeval { |
and
struct timespec { |
(However, see below on the POSIX.1-2001 versions.) Some code calls select() with all three sets empty, nfds zero, and a non-NULL timeout as a fairly portable way to sleep with subsecond precision.
On Linux, select() modifies timeout to reflect the amount of time not slept; most other implementations do not do this. (POSIX.1-2001 permits either behaviour.) This causes problems both when Linux code which reads timeout is ported to other operating systems, and when code is ported to Linux that reuses a struct timeval for multiple select()s in a loop without reinitializing it. Consider timeout to be undefined after select() returns.
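One way to stay clear of both portability problems is to rebuild the descriptor set and the timeout immediately before every call. The helper below is a sketch of that habit; wait_readable() is a hypothetical name, and fd is assumed to be smaller than FD_SETSIZE.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Wait up to timeout_ms milliseconds for fd to become readable.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, long timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int ready;

    /* Initialize the set and the timeout on every call: select() modifies
     * the set in place and may also modify the timeout. */
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready == -1)
        return -1;
    return (ready > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

Calling this helper in a loop is safe because nothing from a previous select() call is reused.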
RETURN VALUE
On success, select() and pselect() return the number of file descriptors contained in the three returned descriptor sets (that is, the total number of bits that are set in readfds, writefds, exceptfds) which may be zero if the timeout expires before anything interesting happens. On error, -1 is returned, and errno is set appropriately; the sets and timeout become undefined, so do not rely on their contents after an error.
ERRORS
Tag | Description |
EBADF | An invalid file descriptor was given in one of the sets. (Perhaps a file descriptor that was already closed, or one on which an error has occurred.) |
EINTR | A signal was caught. |
EINVAL | nfds is negative or the value contained within timeout is invalid. |
ENOMEM | Unable to allocate memory for internal tables. |
EXAMPLE
#include <stdio.h> |
CONFORMING TO
select() conforms to POSIX.1-2001 and 4.4BSD
pselect() is defined in POSIX.1g, and in POSIX.1-2001.
nfsservctl() function
nfsservctl - syscall interface to kernel NFS daemon
SYNOPSIS
#include <linux/nfsd/syscall.h>
nfsservctl(int cmd, struct nfsctl_arg *argp, union nfsctl_res *resp);
DESCRIPTION
struct nfsctl_arg {
    int ca_version; /* safeguard */
    union {
        struct nfsctl_svc u_svc;
        struct nfsctl_client u_client;
        struct nfsctl_export u_export;
        struct nfsctl_uidmap u_umap;
        struct nfsctl_fhparm u_getfh;
        unsigned int u_debug;
    } u;
};
union nfsctl_res {
    struct knfs_fh cr_getfh;
    unsigned int cr_debug;
};
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
CONFORMING TO
This call is Linux-specific.
nice() function
SYNOPSIS
#include <unistd.h>
int nice(int inc);
DESCRIPTION
nice() adds inc to the nice value for the calling process. (A higher nice value means a low priority.) Only the superuser may specify a negative increment, or priority increase. The range for nice values is described in getpriority(2).
RETURN VALUE
On success, the new nice value is returned (but see NOTES below). On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
EPERM | The calling process attempted to increase its priority by supplying a negative inc but has insufficient privileges. Under Linux the CAP_SYS_NICE capability is required. (But see the discussion of the RLIMIT_NICE resource limit in setrlimit(2).) |
CONFORMING TO
SVr4, 4.3BSD, POSIX.1-2001. However, the Linux and (g)libc (earlier than glibc 2.2.4) return value is nonstandard, see below. SVr4 documents an additional EINVAL error code.
NOTES
SUSv2 and POSIX.1-2001 specify that nice() should return the new nice value. However, the Linux syscall and the nice() library function provided in older versions of (g)libc (earlier than glibc 2.2.4) return 0 on success. The new nice value can be found using getpriority(2).
Since glibc 2.2.4, nice() is implemented as a library function that calls getpriority(2) to obtain the new nice value to be returned to the caller. With this implementation, a successful call can legitimately return -1. To reliably detect an error, set errno to 0 before the call, and check its value when nice() returns -1.
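The errno-based check can be wrapped in a small helper, sketched below. The name nice_checked() and its out-parameter are inventions for this example; only the errno handling follows the text above.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <unistd.h>

/* Add inc to the nice value of the calling process.
 * On success, stores the new nice value in *new_value and returns 0.
 * On failure, returns -1 with errno set by nice(). */
int nice_checked(int inc, int *new_value)
{
    int value;

    errno = 0;                  /* clear first: -1 can be a valid result */
    value = nice(inc);
    if (value == -1 && errno != 0)
        return -1;

    *new_value = value;
    return 0;
}
```

Passing an increment of 0 leaves the priority unchanged and simply reports the current nice value.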
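SEE ALSO
obsolete() function
oldfstat, oldlstat, oldstat, oldolduname, olduname - obsolete system calls
SYNOPSIS
Obsolete system calls.
DESCRIPTION
The Linux 2.0 kernel implements these calls to support old executables. These calls return structures which have grown since their first implementation, but old executables must continue to receive old smaller structures.
Current executables should be linked with current libraries and never use these calls.
CONFORMING TO
These calls are unique to Linux and should not be used at all in new programs.
SEE ALSO
oldfstat() function
oldfstat, oldlstat, oldstat, oldolduname, olduname - obsolete system calls
SYNOPSIS
Obsolete system calls.
DESCRIPTION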
The Linux 2.0 kernel implements these calls to support old executables. These calls return structures which have grown since their first implementation, but old executables must continue to receive old smaller structures.
Current executables should be linked with current libraries and never use these calls.
CONFORMING TO
These calls are unique to Linux and should not be used at all in new programs.
SEE ALSO
oldlstat() function
oldfstat, oldlstat, oldstat, oldolduname, olduname - obsolete system calls
SYNOPSIS
Obsolete system calls.
DESCRIPTION
The Linux 2.0 kernel implements these calls to support old executables. These calls return structures which have grown since their first implementation, but old executables must continue to receive old smaller structures.
Current executables should be linked with current libraries and never use these calls.
CONFORMING TO
These calls are unique to Linux and should not be used at all in new programs.
SEE ALSO
oldolduname() function
oldfstat, oldlstat, oldstat, oldolduname, olduname - obsolete system calls
SYNOPSIS
Obsolete system calls.
DESCRIPTION
The Linux 2.0 kernel implements these calls to support old executables. These calls return structures which have grown since their first implementation, but old executables must continue to receive old smaller structures.
Current executables should be linked with current libraries and never use these calls.
CONFORMING TO
These calls are unique to Linux and should not be used at all in new programs.
SEE ALSO
oldstat() function
oldfstat, oldlstat, oldstat, oldolduname, olduname - obsolete system calls
SYNOPSIS
Obsolete system calls.
DESCRIPTION
The Linux 2.0 kernel implements these calls to support old executables. These calls return structures which have grown since their first implementation, but old executables must continue to receive old smaller structures.
Current executables should be linked with current libraries and never use these calls.
CONFORMING TO
These calls are unique to Linux and should not be used at all in new programs.
SEE ALSO
olduname() function
oldfstat, oldlstat, oldstat, oldolduname, olduname - obsolete system calls
SYNOPSIS
Obsolete system calls.
DESCRIPTION
The Linux 2.0 kernel implements these calls to support old executables. These calls return structures which have grown since their first implementation, but old executables must continue to receive old smaller structures.
Current executables should be linked with current libraries and never use these calls.
CONFORMING TO
These calls are unique to Linux and should not be used at all in new programs.
SEE ALSO
openat() function
SYNOPSIS
#include <fcntl.h>
int openat(int dirfd, const char *pathname, int flags);
int openat(int dirfd, const char *pathname, int flags, mode_t mode);
The openat() system call operates in exactly the same way as open(2), except for the differences described in this manual page.
DESCRIPTION
If the pathname given in pathname is relative, then it is interpreted relative to the directory referred to by the file descriptor dirfd (rather than relative to the current working directory of the calling process, as is done by open(2) for a relative pathname).
If the pathname given in pathname is relative and dirfd is the special value AT_FDCWD, then pathname is interpreted relative to the current working directory of the calling process (like open(2)).
If the pathname given in pathname is absolute, then dirfd is ignored.
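The relative case described above can be illustrated with a minimal sketch. The helper name create_in_dir() and the chosen flags and mode are assumptions made for this illustration; they are not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not a library function): create or truncate `name`
 * inside the directory `dirpath`. Returns a write-only descriptor, or -1
 * with errno set. */
int create_in_dir(const char *dirpath, const char *name)
{
    assert(dirpath != NULL && name != NULL);

    /* Open the directory once; dirfd keeps referring to it even if a
     * component of dirpath is renamed afterwards. */
    int dirfd = open(dirpath, O_RDONLY | O_DIRECTORY);
    if (dirfd == -1)
        return -1;

    /* A relative `name` is resolved against dirfd rather than the current
     * working directory; an absolute `name` would make dirfd irrelevant. */
    int fd = openat(dirfd, name, O_WRONLY | O_CREAT | O_TRUNC,
                    S_IRUSR | S_IWUSR);

    int saved_errno = errno;   /* close() must not clobber openat()'s errno */
    close(dirfd);
    errno = saved_errno;
    return fd;
}
```

Passing AT_FDCWD in place of dirfd would make the openat() call resolve a relative name against the current working directory, like open(2).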
RETURN VALUE
On success, openat() returns a new file descriptor. On error, -1 is returned and errno is set to indicate the error.
ERRORS
The same errors that occur for open(2) can also occur for openat(). The following additional errors can occur for openat():
Tag | Description |
EBADF | dirfd is not a valid file descriptor. |
ENOTDIR | pathname is a relative path and dirfd is a file descriptor referring to a file other than a directory. |
NOTES
openat() and other similar system calls suffixed "at" are supported for two reasons.
First, openat() allows an application to avoid race conditions that could occur when using open(2) to open files in directories other than the current working directory. These race conditions result from the fact that some component of the directory prefix given to open() could be changed in parallel with the call to open(). Such races can be avoided by opening a file descriptor for the target directory, and then specifying that file descriptor as the dirfd argument of openat().
Second, openat() allows the implementation of a per-thread "current working directory", via file descriptor(s) maintained by the application. (This functionality can also be obtained by tricks based on the use of /proc/self/fd/dirfd, but less efficiently.)
CONFORMING TO
This system call is non-standard but is proposed for inclusion in a future revision of POSIX.1. A similar system call exists on Solaris.
VERSIONS
openat() was added to Linux in kernel 2.6.16.
SEE ALSO
- faccessat (2)
- fchmodat (2)
- fchownat (2)
- fstatat (2)
- futimesat (2)
- linkat (2)
- mkdirat (2)
- mknodat (2)
- open (2)
- path_resolution (2)
- readlinkat (2)
- renameat (2)
- symlinkat (2)
- unlinkat (2)
open() function
SYNOPSIS
#include <sys/types.h>
DESCRIPTION
Given a pathname for a file, open() returns a file descriptor, a small, non-negative integer for use in subsequent system calls (read(2), write(2), lseek(2), fcntl(2), etc.). The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process.
The new file descriptor is set to remain open across an execve(2) (i.e., the FD_CLOEXEC file descriptor flag described in fcntl(2) is initially disabled). The file offset is set to the beginning of the file (see lseek(2)).
A call to open() creates a new open file description, an entry in the system-wide table of open files. This entry records the file offset and the file status flags (modifiable via the fcntl() F_SETFL operation). A file descriptor is a reference to one of these entries; this reference is unaffected if pathname is subsequently removed or modified to refer to a different file. The new open file description is initially not shared with any other process, but sharing may arise via fork(2).
The parameter flags must include one of the following access modes: O_RDONLY, O_WRONLY, or O_RDWR. These request opening the file read-only, write-only, or read/write, respectively.
In addition, zero or more file creation flags and file status flags can be bitwise-or’d in flags. The file creation flags are O_CREAT, O_EXCL, O_NOCTTY, and O_TRUNC. The file status flags are all of the remaining flags listed below. The distinction between these two groups of flags is that the file status flags can be retrieved and (in some cases) modified using fcntl(2). The full list of file creation flags and file status flags is as follows:
Tag | Description |
O_APPEND | The file is opened in append mode. Before each write(), the file offset is positioned at the end of the file, as if with lseek(). O_APPEND may lead to corrupted files on NFS file systems if more than one process appends data to a file at once. This is because NFS does not support appending to a file, so the client kernel has to simulate it, which can’t be done without a race condition. |
O_ASYNC | Enable signal-driven I/O: generate a signal (SIGIO by default, but this can be changed via fcntl(2)) when input or output becomes possible on this file descriptor. This feature is only available for terminals, pseudo-terminals, sockets, and (since Linux 2.6) pipes and FIFOs. See fcntl(2) for further details. |
O_CREAT | If the file does not exist it will be created. The owner (user ID) of the file is set to the effective user ID of the process. The group ownership (group ID) is set either to the effective group ID of the process or to the group ID of the parent directory (depending on filesystem type and mount options, and the mode of the parent directory; see, e.g., the mount options bsdgroups and sysvgroups of the ext2 filesystem, as described in mount(8)). |
O_DIRECT | Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user space buffers. The I/O is synchronous, i.e., at the completion of a read(2) or write(2), data is guaranteed to have been transferred. Under Linux 2.4, transfer sizes and the alignment of user buffer and file offset must all be multiples of the logical block size of the file system. Under Linux 2.6, alignment must fit the block size of the device. A semantically similar (but deprecated) interface for block devices is described in raw(8). |
O_DIRECTORY | If pathname is not a directory, cause the open to fail. This flag is Linux-specific, and was added in kernel version 2.1.126, to avoid denial-of-service problems if opendir(3) is called on a FIFO or tape device, but should not be used outside of the implementation of opendir. |
O_EXCL | When used with O_CREAT, if the file already exists it is an error and the open() will fail. In this context, a symbolic link exists, regardless of where it points to. O_EXCL is broken on NFS file systems; programs which rely on it for performing locking tasks will contain a race condition. The solution for performing atomic file locking using a lockfile is to create a unique file on the same file system (e.g., incorporating hostname and pid), then use link(2) to make a link to the lockfile. If link() returns 0, the lock is successful. Otherwise, use stat(2) on the unique file to check if its link count has increased to 2, in which case the lock is also successful. |
O_LARGEFILE | (LFS) Allow files whose sizes cannot be represented in an off_t (but can be represented in an off64_t) to be opened. |
O_NOATIME | (Since Linux 2.6.8) Do not update the file last access time (st_atime in the inode) when the file is read(2). This flag is intended for use by indexing or backup programs, where its use can significantly reduce the amount of disk activity. This flag may not be effective on all filesystems. One example is NFS, where the server maintains the access time. |
O_NOCTTY | If pathname refers to a terminal device (see tty(4)), it will not become the process’s controlling terminal even if the process does not have one. |
O_NOFOLLOW | If pathname is a symbolic link, then the open fails. This is a FreeBSD extension, which was added to Linux in version 2.1.126. Symbolic links in earlier components of the pathname will still be followed. |
O_NONBLOCK or O_NDELAY | When possible, the file is opened in non-blocking mode. Neither the open() nor any subsequent operations on the file descriptor which is returned will cause the calling process to wait. For the handling of FIFOs (named pipes), see also fifo(7). For a discussion of the effect of O_NONBLOCK in conjunction with mandatory file locks and with file leases, see fcntl(2). |
O_SYNC | The file is opened for synchronous I/O. Any write()s on the resulting file descriptor will block the calling process until the data has been physically written to the underlying hardware. But see RESTRICTIONS below. |
O_TRUNC | If the file already exists and is a regular file and the open mode allows writing (i.e., is O_RDWR or O_WRONLY) it will be truncated to length 0. If the file is a FIFO or terminal device file, the O_TRUNC flag is ignored. Otherwise the effect of O_TRUNC is unspecified. |
Some of these optional flags can be altered using fcntl() after the file has been opened.
The argument mode specifies the permissions to use in case a new file is created. It is modified by the process’s umask in the usual way: the permissions of the created file are (mode & ~umask). Note that this mode only applies to future accesses of the newly created file; the open() call that creates a read-only file may well return a read/write file descriptor.
The following symbolic constants are provided for mode:
Constant | Value | Meaning |
S_IRWXU | 00700 | user (file owner) has read, write and execute permission |
S_IRUSR | 00400 | user has read permission |
S_IWUSR | 00200 | user has write permission |
S_IXUSR | 00100 | user has execute permission |
S_IRWXG | 00070 | group has read, write and execute permission |
S_IRGRP | 00040 | group has read permission |
S_IWGRP | 00020 | group has write permission |
S_IXGRP | 00010 | group has execute permission |
S_IRWXO | 00007 | others have read, write and execute permission |
S_IROTH | 00004 | others have read permission |
S_IWOTH | 00002 | others have write permission |
S_IXOTH | 00001 | others have execute permission |
mode must be specified when O_CREAT is in the flags, and is ignored otherwise.
creat() is equivalent to open() with flags equal to O_CREAT|O_WRONLY|O_TRUNC.
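The interaction between mode and the umask can be checked with a small sketch. The helper created_permissions() is hypothetical, and the sketch assumes the parent directory carries no default ACL (which would replace the umask).

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` (failing if it already exists) with the
 * requested mode and return the permission bits the new file actually
 * received, or -1 on error. */
int created_permissions(const char *path, mode_t requested)
{
    struct stat st;

    assert(path != NULL);

    /* mode is only consulted because O_CREAT is present in flags. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, requested);
    if (fd == -1)
        return -1;

    int rc = fstat(fd, &st);
    close(fd);
    if (rc == -1)
        return -1;

    /* Keep only the nine rwx bits, i.e. (requested & ~umask). */
    return (int)(st.st_mode & 0777);
}
```

With a umask of 022, a request for 0666 should yield 0644; a second call on the same path fails because of O_EXCL.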
RETURN VALUE
open() and creat() return the new file descriptor, or -1 if an error occurred (in which case, errno is set appropriately).
NOTES
Note that open() can open device special files, but creat() cannot create them; use mknod(2) instead.
On NFS file systems with UID mapping enabled, open() may return a file descriptor but e.g. read(2) requests are denied with EACCES. This is because the client performs open() by checking the permissions, but UID mapping is performed by the server upon read and write requests.
If the file is newly created, its st_atime, st_ctime, st_mtime fields (respectively, time of last access, time of last status change, and time of last modification; see stat(2)) are set to the current time, and so are the st_ctime and st_mtime fields of the parent directory. Otherwise, if the file is modified because of the O_TRUNC flag, its st_ctime and st_mtime fields are set to the current time.
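The lockfile procedure described under O_EXCL (unique file, link(2), then a link-count check with stat(2)) might be sketched as follows. The helper name acquire_lockfile(), the unique-name pattern, and the buffer sizes are assumptions made for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: try to take the lock `lockpath`.
 * Returns 0 if the lock was acquired, -1 otherwise.
 * The holder releases the lock with unlink(lockpath). */
int acquire_lockfile(const char *lockpath)
{
    char host[64] = "unknown";   /* zero-filled tail keeps it terminated */
    char unique[4096];
    struct stat st;
    int locked = -1;

    assert(lockpath != NULL);
    gethostname(host, sizeof host - 1);

    /* Unique name on the same file system: lockpath.hostname.pid */
    int n = snprintf(unique, sizeof unique, "%s.%s.%ld",
                     lockpath, host, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof unique)
        return -1;

    int fd = open(unique, O_WRONLY | O_CREAT, 0644);
    if (fd == -1)
        return -1;
    close(fd);

    if (link(unique, lockpath) == 0)
        locked = 0;                       /* link made: lock held */
    else if (stat(unique, &st) == 0 && st.st_nlink == 2)
        locked = 0;                       /* link made despite the error */

    unlink(unique);   /* lockpath, if created, keeps the inode alive */
    return locked;
}
```

A second attempt while the lock is held should fail, because link(2) refuses to replace an existing lockpath and the link count of the unique file stays at 1.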
ERRORS
Tag | Description |
EACCES | The requested access to the file is not allowed, or search permission is denied for one of the directories in the path prefix of pathname, or the file did not exist yet and write access to the parent directory is not allowed. (See also path_resolution(2).) |
EEXIST | pathname already exists and O_CREAT and O_EXCL were used. |
EFAULT | pathname points outside your accessible address space. |
EISDIR | pathname refers to a directory and the access requested involved writing (that is, O_WRONLY or O_RDWR is set). |
ELOOP | Too many symbolic links were encountered in resolving pathname, or O_NOFOLLOW was specified but pathname was a symbolic link. |
EMFILE | The process already has the maximum number of files open. |
ENAMETOOLONG | pathname was too long. |
ENFILE | The system limit on the total number of open files has been reached. |
ENODEV | pathname refers to a device special file and no corresponding device exists. (This is a Linux kernel bug; in this situation ENXIO must be returned.) |
ENOENT | O_CREAT is not set and the named file does not exist. Or, a directory component in pathname does not exist or is a dangling symbolic link. |
ENOMEM | Insufficient kernel memory was available. |
ENOSPC | pathname was to be created but the device containing pathname has no room for the new file. |
ENOTDIR | A component used as a directory in pathname is not, in fact, a directory, or O_DIRECTORY was specified and pathname was not a directory. |
ENXIO | O_NONBLOCK | O_WRONLY is set, the named file is a FIFO and no process has the file open for reading. Or, the file is a device special file and no corresponding device exists. |
EOVERFLOW | pathname refers to a regular file, too large to be opened; see O_LARGEFILE above. |
EPERM | The O_NOATIME flag was specified, but the effective user ID of the caller did not match the owner of the file and the caller was not privileged (CAP_FOWNER). |
EROFS | pathname refers to a file on a read-only filesystem and write access was requested. |
ETXTBSY | pathname refers to an executable image which is currently being executed and write access was requested. |
EWOULDBLOCK | The O_NONBLOCK flag was specified, and an incompatible lease was held on the file (see fcntl(2)). |
NOTES
Under Linux, the O_NONBLOCK flag indicates that one wants to open but does not necessarily have the intention to read or write. This is typically used to open devices in order to get a file descriptor for use with ioctl(2).
CONFORMING TO
SVr4, 4.3BSD, POSIX.1-2001. The O_NOATIME, O_NOFOLLOW, and O_DIRECTORY flags are Linux-specific. One may have to define the _GNU_SOURCE macro to get their definitions.
The (undefined) effect of O_RDONLY | O_TRUNC varies among implementations. On many systems the file is actually truncated.
The O_DIRECT flag was introduced in SGI IRIX, where it has alignment restrictions similar to those of Linux 2.4. IRIX has also a fcntl(2) call to query appropriate alignments, and sizes. FreeBSD 4.x introduced a flag of same name, but without alignment restrictions. Support was added under Linux in kernel version 2.4.10. Older Linux kernels simply ignore this flag. One may have to define the _GNU_SOURCE macro to get its definition.
BUGS
"The thing that has always disturbed me about O_DIRECT is that the whole interface is just stupid, and was probably designed by a deranged monkey on some serious mind-controlling substances." — Linus
Currently, it is not possible to enable signal-driven I/O by specifying O_ASYNC when calling open(); use fcntl(2) to enable this flag.
RESTRICTIONS
There are many infelicities in the protocol underlying NFS, affecting amongst others O_SYNC and O_NDELAY.
POSIX provides for three different variants of synchronised I/O, corresponding to the flags O_SYNC, O_DSYNC and O_RSYNC. Currently (2.1.130) these are all synonymous under Linux.
SEE ALSO
- close (2)
- dup (2)
- fcntl (2)
- link (2)
- lseek (2)
- mknod (2)
- mount (2)
- mmap (2)
- openat (2)
- path_resolution (2)
- read (2)
- socket (2)
- stat (2)
- umask (2)
- unlink (2)
- write (2)
outb() function
outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O
DESCRIPTION
This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.
They are primarily designed for internal kernel use, but can be used from user space.
You compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.
CONFORMING TO
outb() and friends are hardware specific. The value argument is passed first and the port argument is passed second, which is the opposite order from most DOS implementations.
SEE ALSO
outb_p() function
outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O
DESCRIPTION
This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.
They are primarily designed for internal kernel use, but can be used from user space.
You compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.
CONFORMING TO
outb() and friends are hardware specific. The value argument is passed first and the port argument is passed second, which is the opposite order from most DOS implementations.
SEE ALSO
outsb() function
outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O
DESCRIPTION
This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.
They are primarily designed for internal kernel use, but can be used from user space.
You compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.
CONFORMING TO
outb() and friends are hardware specific. The value argument is passed first and the port argument is passed second, which is the opposite order from most DOS implementations.
SEE ALSO
outsl() function
outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O
DESCRIPTION
This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.
They are primarily designed for internal kernel use, but can be used from user space.
You compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.
CONFORMING TO
outb() and friends are hardware specific. The value argument is passed first and the port argument is passed second, which is the opposite order from most DOS implementations.
SEE ALSO
outsw() function
outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O
DESCRIPTION
This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.
They are primarily designed for internal kernel use, but can be used from user space.
You compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.
CONFORMING TO
outb() and friends are hardware specific. The value argument is passed first and the port argument is passed second, which is the opposite order from most DOS implementations.
SEE ALSO
outw() function
outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O
DESCRIPTION
This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.
They are primarily designed for internal kernel use, but can be used from user space.
You compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.
CONFORMING TO
outb() and friends are hardware specific. The value argument is passed first and the port argument is passed second, which is the opposite order from most DOS implementations.
SEE ALSO
outw_p() function
outb, outw, outl, outsb, outsw, outsl, inb, inw, inl, insb, insw, insl, outb_p, outw_p, outl_p, inb_p, inw_p, inl_p - port I/O
DESCRIPTION
This family of functions is used to do low level port input and output. The out* functions do port output, the in* functions do port input; the b-suffix functions are byte-width and the w-suffix functions word-width; the _p-suffix functions pause until the I/O completes.
They are primarily designed for internal kernel use, but can be used from user space.
You compile with -O or -O2 or similar. The functions are defined as inline macros, and will not be substituted in without optimization enabled, causing unresolved references at link time.
You use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the user space application to access the I/O ports in question. Failure to do this will cause the application to receive a segmentation fault.
CONFORMING TO
outb() and friends are hardware specific. The value argument is passed first and the port argument is passed second, which is the opposite order from most DOS implementations.
SEE ALSO
path_resolution() function
Unix/Linux path resolution - find the file referred to by a filename
DESCRIPTION
Some Unix/Linux system calls take one or more filenames as parameters. A filename (or pathname) is resolved as follows.
Step 1: Start of the resolution process
If the pathname starts with the ’/’ character, the starting lookup directory is the root directory of the current process. (A process inherits its root directory from its parent. Usually this will be the root directory of the file hierarchy. A process may get a different root directory by use of the chroot(2) system call. A process may get an entirely private namespace in case it, or one of its ancestors, was started by an invocation of the clone(2) system call that had the CLONE_NEWNS flag set.) This handles the ’/’ part of the pathname.
If the pathname does not start with the ’/’ character, the starting lookup directory of the resolution process is the current working directory of the process. (This is also inherited from the parent. It can be changed by use of the chdir(2) system call.)
Pathnames starting with a ’/’ character are called absolute pathnames. Pathnames not starting with a ’/’ are called relative pathnames.
Step 2: Walk along the path
Set the current lookup directory to the starting lookup directory. Now, for each non-final component of the pathname, where a component is a substring delimited by ’/’ characters, this component is looked up in the current lookup directory.
If the process does not have search permission on the current lookup directory, an EACCES error is returned ("Permission denied").
If the component is not found, an ENOENT error is returned ("No such file or directory").
If the component is found, but is neither a directory nor a symbolic link, an ENOTDIR error is returned ("Not a directory").
If the component is found and is a directory, we set the current lookup directory to that directory, and go to the next component.
If the component is found and is a symbolic link (symlink), we first resolve this symbolic link (with the current lookup directory as starting lookup directory). Upon error, that error is returned. If the result is not a directory, an ENOTDIR error is returned. If the resolution of the symlink is successful and returns a directory, we set the current lookup directory to that directory, and go to the next component. Note that the resolution process here involves recursion. In order to protect the kernel against stack overflow, and also to protect against denial of service, there are limits on the maximum recursion depth, and on the maximum number of symlinks followed. An ELOOP error is returned when the maximum is exceeded ("Too many levels of symbolic links").
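The error cases of Step 2 can be observed from user space with a small sketch. The helper resolve_errno() is hypothetical; it simply reports the errno that open(2) produces for a given pathname.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to open `path` read-only and return the errno
 * produced while resolving it (0 means the open succeeded). */
int resolve_errno(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return errno;   /* e.g. EACCES, ENOENT, ENOTDIR or ELOOP */

    close(fd);
    return 0;
}
```

For example, a missing intermediate component should give ENOENT, a regular file used as an intermediate component should give ENOTDIR, and a symbolic link that points to itself should give ELOOP.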
Step 3: Find the final entry
The lookup of the final component of the pathname goes just like that of all other components, as described in the previous step, with two differences: (i) the final component need not be a directory (at least as far as the path resolution process is concerned — it may have to be a directory, or a non-directory, because of the requirements of the specific system call), and (ii) it is not necessarily an error if the component is not found — maybe we are just creating it. The details on the treatment of the final entry are described in the manual pages of the specific system calls.
. and ..
By convention, every directory has the entries "." and "..", which refer to the directory itself and to its parent directory, respectively.
The path resolution process will assume that these entries have their conventional meanings, regardless of whether they are actually present in the physical filesystem.
One cannot walk down past the root: "/.." is the same as "/".
Mount points
After a "mount dev path" command, the pathname "path" refers to the root of the filesystem hierarchy on the device "dev", and no longer to whatever it referred to earlier.
One can walk out of a mounted filesystem: "path/.." refers to the parent directory of "path", outside of the filesystem hierarchy on "dev".
Trailing slashes
If a pathname ends in a ’/’, that forces resolution of the preceding component as in Step 2: it has to exist and resolve to a directory. Otherwise a trailing ’/’ is ignored. (Or, equivalently, a pathname with a trailing ’/’ is equivalent to the pathname obtained by appending ’.’ to it.)
Final symlink
If the last component of a pathname is a symbolic link, then it depends on the system call whether the file referred to will be the symbolic link or the result of path resolution on its contents. For example, the system call lstat(2) will operate on the symlink, while stat(2) operates on the file pointed to by the symlink.
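The difference between lstat(2) and stat(2) on a final symbolic link can be checked with a short sketch. The helper final_symlink_demo() is hypothetical, and it assumes that target is an absolute pathname of an existing file or directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create the symlink `linkpath` -> `target`, then
 * compare what lstat(2) and stat(2) report for it. Returns 1 if lstat saw
 * the link itself and stat saw the target, 0 if not, -1 on error.
 * The symlink is removed again before returning. */
int final_symlink_demo(const char *target, const char *linkpath)
{
    struct stat via_lstat, via_stat, direct;
    int ok = -1;

    assert(target != NULL && linkpath != NULL);
    if (symlink(target, linkpath) == -1)
        return -1;

    if (lstat(linkpath, &via_lstat) == 0 &&   /* does not follow the link */
        stat(linkpath, &via_stat) == 0 &&     /* follows the link */
        stat(target, &direct) == 0) {
        ok = S_ISLNK(via_lstat.st_mode) &&
             via_stat.st_dev == direct.st_dev &&
             via_stat.st_ino == direct.st_ino;
    }

    unlink(linkpath);
    return ok;
}
```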
Length limit
There is a maximum length for pathnames. If the pathname (or some intermediate pathname obtained while resolving symbolic links) is too long, an ENAMETOOLONG error is returned ("File name too long").
Empty pathname
In the original Unix, the empty pathname referred to the current directory. Nowadays POSIX decrees that an empty pathname must not be resolved successfully. Linux returns ENOENT in this case.
Permissions
The permission bits of a file consist of three groups of three bits, cf. chmod(1) and stat(2). The first group of three is used when the effective user ID of the current process equals the owner ID of the file. The second group of three is used when the group ID of the file either equals the effective group ID of the current process, or is one of the supplementary group IDs of the current process (as set by setgroups(2)). When neither holds, the third group is used.
Of the three bits used, the first bit determines read permission, the second write permission, and the last execute permission in case of ordinary files, or search permission in case of directories.
Linux uses the fsuid instead of the effective user ID in permission checks. Ordinarily the fsuid will equal the effective user ID, but the fsuid can be changed by the system call setfsuid(2).
(Here "fsuid" stands for something like "file system user ID". The concept was required for the implementation of a user space NFS server at a time when processes could send a signal to a process with the same effective user ID. It is obsolete now. Nobody should use setfsuid(2).)
Similarly, Linux uses the fsgid ("file system group ID") instead of the effective group ID. See setfsgid(2).
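The selection of one group of three bits, as described above, can be written out as a pure function. This is only a sketch of the rule in this section: the name mode_allows() and its parameters are invented for illustration, and capabilities are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical sketch: decide whether the permission bits in `mode` grant
 * `want` (a mask of 4 = read, 2 = write, 1 = execute/search) to a process
 * with the given fsuid, fsgid and supplementary groups.
 * Returns 1 if granted, 0 if denied. Capabilities are not modelled. */
int mode_allows(mode_t mode, uid_t file_uid, gid_t file_gid,
                uid_t fsuid, gid_t fsgid,
                const gid_t *groups, size_t ngroups, unsigned want)
{
    unsigned bits;

    assert((want & ~7u) == 0);

    if (fsuid == file_uid) {
        bits = (mode >> 6) & 7;            /* first group: owner */
    } else {
        int in_group = (fsgid == file_gid);
        for (size_t i = 0; i < ngroups && !in_group; i++)
            in_group = (groups[i] == file_gid);
        /* second group if the process is in the file's group, else third */
        bits = in_group ? (mode >> 3) & 7 : mode & 7;
    }
    return (bits & want) == want;
}
```

Note that exactly one group is consulted: with mode 0077 the owner is denied even though group and others would be allowed.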
Bypassing permission checks: superuser and capabilities
On a traditional Unix system, the superuser (root, user ID 0) is all-powerful, and bypasses all permissions restrictions when accessing files.
On Linux, superuser privileges are divided into capabilities (see capabilities(7)). Two capabilities are relevant for file permissions checks: CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. (A process has these capabilities if its fsuid is 0.)
The CAP_DAC_OVERRIDE capability overrides all permission checking, but only grants execute permission when at least one of the file’s three execute permission bits is set.
The CAP_DAC_READ_SEARCH capability grants read and search permission on directories, and read permission on ordinary files.
SEE ALSO
pause() function
SYNOPSIS
#include <unistd.h>
int pause(void);
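DESCRIPTION
The pause() library function causes the calling process (or thread) to sleep until a signal is delivered that either terminates it or causes it to invoke a signal-catching function.
RETURN VALUE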
The pause() function only returns when a signal was caught and the signal-catching function returned. In this case pause() returns -1, and errno is set to EINTR.
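A minimal sketch of this behaviour, assuming SIGALRM is used as the wake-up signal; the helper wait_for_alarm() and the handler on_alarm() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t alarm_seen = 0;

/* Signal-catching function: pause() returns only after this has returned. */
static void on_alarm(int signo)
{
    (void)signo;
    alarm_seen = 1;
}

/* Hypothetical helper: install a SIGALRM handler, arm an alarm and sleep in
 * pause(). Returns the errno left by pause() (EINTR is expected), or -1 if
 * the handler could not be installed. */
int wait_for_alarm(unsigned seconds)
{
    struct sigaction sa;

    assert(seconds > 0);   /* alarm(0) would cancel the alarm instead */

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    alarm_seen = 0;
    alarm(seconds);

    int rc = pause();      /* sleeps until a signal has been handled */
    return (rc == -1) ? errno : 0;
}
```

The sketch relies on the alarm firing only after pause() has been entered; code that cannot tolerate that window would typically block the signal and use sigsuspend(2) instead.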
ERRORS
Tag | Description |
EINTR | A signal was caught and the signal-catching function returned. |
CONFORMING TO
SVr4, 4.3BSD, POSIX.1-2001.
SEE ALSO
perfmonctl() function
SYNOPSIS
#include <syscall.h>
DESCRIPTION
The perfmonctl() system call provides an interface to the PMU (performance monitoring unit). The PMU consists of PMD (performance monitoring data) registers and PMC (performance monitoring control) registers, in which the hardware statistics are gathered.
perfmonctl() applies the function cmd to the input arguments arg. The number of arguments is given by the input variable narg. fd specifies the perfmon context to operate on.
The implemented cmd commands are:
Tag | Description |
PFM_CREATE_CONTEXT | Set up a context. The fd parameter is ignored. A new context is created as specified in ctxt and its file descriptor is returned in ctxt->ctx_fd. The file descriptor, apart from passing it to perfmonctl, can be used to read event notifications (type pfm_msg_t) using the read(2) system call. Both select(2) and poll(2) can be used to wait for event notifications. The context can be destroyed using the close(2) system call. |
PFM_WRITE_PMCS | Set PMC registers. |
PFM_WRITE_PMDS | Set PMD registers. |
PFM_READ_PMDS | Read PMD registers. |
PFM_START | Start monitoring. |
PFM_STOP | Stop monitoring. |
PFM_LOAD_CONTEXT | Attach the context to a thread. |
PFM_UNLOAD_CONTEXT | Detach the context from a thread. |
PFM_RESTART | Restart monitoring after receiving an overflow notification. |
PFM_CREATE_EVTSETS | Create or modify event sets. |
PFM_DELETE_EVTSETS | Delete event sets. |
PFM_GETINFO_EVTSETS | Get information about event sets. |
RETURN VALUE
perfmonctl() returns zero when the operation is successful. On error, -1 is returned and an error code is set in errno.
AVAILABILITY
This syscall is implemented only on the IA-64 architecture since kernel 2.6.
SEE ALSO
personality() function
SYNOPSIS
#include <sys/personality.h>
int personality(unsigned long persona);
Description
Linux supports different execution domains, or personalities, for each process. Among other things, execution domains tell Linux how to map signal numbers into signal actions. The execution domain system allows Linux to provide limited support for binaries compiled under other Unix-like operating systems.
This function returns the current personality when persona equals 0xffffffff. Otherwise, it makes the execution domain referenced by persona the new execution domain of the current process.
Return Value
On success, the previous persona is returned. On error, -1 is returned, and errno is set appropriately.
Errors
Tag | Description |
EINVAL | The kernel was unable to change the personality. |
Conforming To
personality() is Linux-specific and should not be used in programs intended to be portable.
pipe() function
Synopsis
#include <unistd.h>
int pipe(int filedes[2]);
Description
pipe() creates a pair of file descriptors, pointing to a pipe inode, and places them in the array pointed to by filedes. filedes[0] is for reading, filedes[1] is for writing.
Return Value
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
Errors
Tag | Description |
EFAULT | filedes is not valid. |
EMFILE | Too many file descriptors are in use by the process. |
ENFILE | The system limit on the total number of open files has been reached. |
Conforming To
POSIX.1-2001.
Example
The following program creates a pipe, and then fork(2)s to create a child process. After the fork(2), each process closes the descriptors that it doesn't need for the pipe (see pipe(7)). The parent then writes the string contained in the program's command-line argument to the pipe, and the child reads this string a byte at a time from the pipe and echoes it on standard output.
#include <sys/wait.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
int
main(int argc, char *argv[])
{
    int pfd[2];
    pid_t cpid;
    char buf;
    assert(argc == 2);
    if (pipe(pfd) == -1) { perror("pipe"); exit(EXIT_FAILURE); }
    cpid = fork();
    if (cpid == -1) { perror("fork"); exit(EXIT_FAILURE); }
    if (cpid == 0) {            /* Child reads from pipe */
        close(pfd[1]);          /* Close unused write end */
        while (read(pfd[0], &buf, 1) > 0)
            write(STDOUT_FILENO, &buf, 1);
        write(STDOUT_FILENO, "\n", 1);
        close(pfd[0]);
        _exit(EXIT_SUCCESS);
    } else {                    /* Parent writes argv[1] to pipe */
        close(pfd[0]);          /* Close unused read end */
        write(pfd[1], argv[1], strlen(argv[1]));
        close(pfd[1]);          /* Reader will see EOF */
        wait(NULL);             /* Wait for child */
        exit(EXIT_SUCCESS);
    }
}
pivot_root() function
Synopsis
int pivot_root(const char *new_root, const char *put_old);
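Description
pivot_root() moves the root file system of the current process to the directory put_old and makes new_root the new root file system of the current process.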
The typical use of pivot_root() is during system startup, when the system mounts a temporary root file system (e.g. an initrd), then mounts the real root file system, and eventually turns the latter into the current root of all relevant processes or threads.
pivot_root() may or may not change the current root and the current working directory (cwd) of any processes or threads which use the old root directory. The caller of pivot_root() must ensure that processes with root or cwd at the old root operate correctly in either case. An easy way to ensure this is to change their root and cwd to new_root before invoking pivot_root().
The paragraph above is intentionally vague because the implementation of pivot_root() may change in the future. At the time of writing, pivot_root() changes root and cwd of each process or thread to new_root if they point to the old root directory. This is necessary in order to prevent kernel threads from keeping the old root directory busy with their root and cwd, even if they never access the file system in any way. In the future, there may be a mechanism for kernel threads to explicitly relinquish any access to the file system, such that this fairly intrusive mechanism can be removed from pivot_root().
Note that this also applies to the current process: pivot_root() may or may not affect its cwd. It is therefore recommended to call chdir("/") immediately after pivot_root().
The following restrictions apply to new_root and put_old:
Tag | Description |
- | They must be directories. |
- | new_root and put_old must not be on the same file system as the current root. |
- | put_old must be underneath new_root, i.e. adding a non-zero number of /.. to the string pointed to by put_old must yield the same directory as new_root. |
- | No other file system may be mounted on put_old. |
See also pivot_root(8) for additional usage examples.
If the current root is not a mount point (e.g. after chroot(2) or pivot_root(), see also below), not the old root directory, but the mount point of that file system is mounted on put_old.
new_root does not have to be a mount point. In this case, /proc/mounts will show the mount point of the file system containing new_root as root (/).
Return Value
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
Errors
pivot_root() may return (in errno) any of the errors returned by stat(2). Additionally, it may return:
Tag | Description |
EBUSY | new_root or put_old is on the current root file system, or a file system is already mounted on put_old. |
EINVAL | put_old is not underneath new_root. |
ENOTDIR | new_root or put_old is not a directory. |
EPERM | The current process does not have the CAP_SYS_ADMIN capability. |
Versions
pivot_root() was introduced in Linux 2.3.41.
Conforming To
pivot_root() is Linux-specific and hence is not portable.
BUGS
pivot_root() should not have to change the root and cwd of all other processes in the system.
Some of the more obscure uses of pivot_root() may quickly lead to insanity.
Notes
Glibc does not provide a wrapper for this system call; call it using syscall(2).
poll() function
Synopsis
#include <poll.h>
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
#define _GNU_SOURCE
#include <poll.h>
int ppoll(struct pollfd *fds, nfds_t nfds, const struct timespec *timeout, const sigset_t *sigmask);
Description
poll() performs a similar task to select(2): it waits for one of a set of file descriptors to become ready to perform I/O.
The set of file descriptors to be monitored is specified in the fds argument, which is an array of nfds structures of the following form:
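struct pollfd {
    int fd; /* file descriptor */
    short events; /* requested events */
    short revents; /* returned events */
};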
The field fd contains a file descriptor for an open file.
The field events is an input parameter, a bitmask specifying the events the application is interested in.
The field revents is an output parameter, filled by the kernel with the events that actually occurred. The bits returned in revents can include any of those specified in events, or one of the values POLLERR, POLLHUP, or POLLNVAL. (These three bits are meaningless in the events field, and will be set in the revents field whenever the corresponding condition is true.)
If none of the events requested (and no error) has occurred for any of the file descriptors, then poll() blocks until one of the events occurs.
The timeout argument specifies an upper limit on the time for which poll() will block, in milliseconds. Specifying a negative value in timeout means an infinite timeout.
The bits that may be set/returned in events and revents are defined in <poll.h>:
Tag | Description |
POLLIN | There is data to read. |
POLLPRI | There is urgent data to read (e.g., out-of-band data on TCP socket; pseudo-terminal master in packet mode has seen state change in slave). |
POLLOUT | Writing now will not block. |
POLLRDHUP (since Linux 2.6.17) | Stream socket peer closed connection, or shut down writing half of connection. The _GNU_SOURCE feature test macro must be defined in order to obtain this definition. |
POLLERR | Error condition (output only). |
POLLHUP | Hang up (output only). |
POLLNVAL | Invalid request: fd not open (output only). |
When compiling with _XOPEN_SOURCE defined, one also has the following, which convey no further information beyond the bits listed above:
Tag | Description |
POLLRDNORM | Equivalent to POLLIN. |
POLLRDBAND | Priority band data can be read (generally unused on Linux). |
POLLWRNORM | Equivalent to POLLOUT. |
POLLWRBAND | Priority data may be written. |
Linux also knows about, but does not use, POLLMSG.
ppoll()
The relationship between poll() and ppoll() is analogous to the relationship between select() and pselect(): like pselect(), ppoll() allows an application to safely wait until either a file descriptor becomes ready or until a signal is caught.
Other than the difference in the timeout argument, the following ppoll() call:
ready = ppoll(&fds, nfds, timeout, &sigmask);
is equivalent to atomically executing the following calls:
sigset_t origmask;
sigprocmask(SIG_SETMASK, &sigmask, &origmask);
ready = poll(&fds, nfds, timeout);
sigprocmask(SIG_SETMASK, &origmask, NULL);
See the description of pselect(2) for an explanation of why ppoll() is necessary.
The timeout argument specifies an upper limit on the amount of time that ppoll() will block. This argument is a pointer to a structure of the following form:
struct timespec {
    long tv_sec; /* seconds */
    long tv_nsec; /* nanoseconds */
};
If timeout is specified as NULL, then ppoll() can block indefinitely.
Return Value
On success, a positive number is returned; this is the number of structures which have non-zero revents fields (in other words, those descriptors with events or errors reported). A value of 0 indicates that the call timed out and no file descriptors were ready. On error, -1 is returned, and errno is set appropriately.
Errors
Tag | Description |
EBADF | An invalid file descriptor was given in one of the sets. |
EFAULT | The array given as argument was not contained in the calling program’s address space. |
EINTR | A signal occurred before any requested event. |
EINVAL | The nfds value exceeds the RLIMIT_NOFILE value. |
ENOMEM | There was no space to allocate file descriptor tables. |
Linux Notes
The Linux ppoll() system call modifies its timeout argument. However, the glibc wrapper function hides this behaviour by using a local variable for the timeout argument that is passed to the system call. Thus, the glibc ppoll() function does not modify its timeout argument.
BUGS
See the discussion of spurious readiness notifications under the BUGS section of select(2).
Conforming To
poll() conforms to POSIX.1-2001. ppoll() is Linux-specific.
Versions
The poll() system call was introduced in Linux 2.1.23. The poll() library call was introduced in libc 5.4.28 (and provides emulation using select() if your kernel does not have a poll() system call).
The ppoll() system call was added to Linux in kernel 2.6.16. The ppoll() library call was added in glibc 2.4.
Notes
Some implementations define the non-standard constant INFTIM with the value -1 for use as a timeout. This constant is not provided in glibc.
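posix_fadvise() function
posix_fadvise - predeclare an access pattern for file data
Synopsis
#define _XOPEN_SOURCE 600
Description
Programs can use posix_fadvise() to announce an intention to access file data in a specific pattern in the future, thus allowing the kernel to perform appropriate optimizations.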
The advice applies to a (not necessarily existent) region starting at offset and extending for len bytes (or until the end of the file if len is 0) within the file referred to by fd. The advice is not binding; it merely constitutes an expectation on behalf of the application.
Permissible values for advice include:
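Tag | Description |
POSIX_FADV_NORMAL | Indicates that the application has no advice to give about its access pattern for the specified data. If no advice is given for an open file, this is the default assumption. |
POSIX_FADV_SEQUENTIAL | The application expects to access the specified data sequentially (with lower offsets read before higher ones). |
POSIX_FADV_RANDOM | The specified data will be accessed in random order. |
POSIX_FADV_NOREUSE | The specified data will be accessed only once. |
POSIX_FADV_WILLNEED | The specified data will be accessed in the near future. |
POSIX_FADV_DONTNEED | The specified data will not be accessed in the near future. |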
Return Value
On success, zero is returned. On error, an error number is returned as the function result.
Errors
Tag | Description |
EBADF | The fd argument was not a valid file descriptor. |
EINVAL | An invalid value was specified for advice. |
ESPIPE | The specified file descriptor refers to a pipe or FIFO. (Linux actually returns EINVAL in this case.) |
Notes
posix_fadvise() appeared in kernel 2.5.60.
Under Linux, POSIX_FADV_NORMAL sets the readahead window to the default size for the backing device; POSIX_FADV_SEQUENTIAL doubles this size, and POSIX_FADV_RANDOM disables file readahead entirely. These changes affect the entire file, not just the specified region (but other open file handles to the same file are unaffected).
POSIX_FADV_WILLNEED and POSIX_FADV_NOREUSE both initiate a non-blocking read of the specified region into the page cache. The amount of data read may be decreased by the kernel depending on VM load. (A few megabytes will usually be fully satisfied, and more is rarely useful.)
POSIX_FADV_DONTNEED attempts to free cached pages associated with the specified region. This is useful, for example, while streaming large files. A program may periodically request the kernel to free cached data that has already been used, so that more useful cached pages are not discarded instead.
Pages that have not yet been written out will be unaffected, so if the application wishes to guarantee that pages will be released, it should call fsync() or fdatasync() first.
Conforming To
POSIX.1-2001. Note that the type of the len parameter was changed from size_t to off_t in POSIX.1-2003 TC5.
BUGS
In kernels before 2.6.6, if len was specified as 0, then this was interpreted literally as "zero bytes", rather than as meaning "all bytes through to the end of the file".
ppoll() function
Synopsis
#include <poll.h>
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
#define _GNU_SOURCE
#include <poll.h>
int ppoll(struct pollfd *fds, nfds_t nfds, const struct timespec *timeout, const sigset_t *sigmask);
Description
poll() performs a similar task to select(2): it waits for one of a set of file descriptors to become ready to perform I/O.
The set of file descriptors to be monitored is specified in the fds argument, which is an array of nfds structures of the following form:
struct pollfd {
int fd; /* file descriptor */
short events; /* requested events */
short revents; /* returned events */
};
The field fd contains a file descriptor for an open file.
The field events is an input parameter, a bitmask specifying the events the application is interested in.
The field revents is an output parameter, filled by the kernel with the events that actually occurred. The bits returned in revents can include any of those specified in events, or one of the values POLLERR, POLLHUP, or POLLNVAL. (These three bits are meaningless in the events field, and will be set in the revents field whenever the corresponding condition is true.)
If none of the events requested (and no error) has occurred for any of the file descriptors, then poll() blocks until one of the events occurs.
The timeout argument specifies an upper limit on the time for which poll() will block, in milliseconds. Specifying a negative value in timeout means an infinite timeout.
The bits that may be set/returned in events and revents are defined in <poll.h>:
Tag | Description |
POLLIN | There is data to read. |
POLLPRI | There is urgent data to read (e.g., out-of-band data on TCP socket; pseudo-terminal master in packet mode has seen state change in slave). |
POLLOUT | Writing now will not block. |
POLLRDHUP (since Linux 2.6.17) | Stream socket peer closed connection, or shut down writing half of connection. The _GNU_SOURCE feature test macro must be defined in order to obtain this definition. |
POLLERR | Error condition (output only). |
POLLHUP | Hang up (output only). |
POLLNVAL | Invalid request: fd not open (output only). |
When compiling with _XOPEN_SOURCE defined, one also has the following, which convey no further information beyond the bits listed above:
Tag | Description |
POLLRDNORM | Equivalent to POLLIN. |
POLLRDBAND | Priority band data can be read (generally unused on Linux). |
POLLWRNORM | Equivalent to POLLOUT. |
POLLWRBAND | Priority data may be written. |
Linux also knows about, but does not use, POLLMSG.
ppoll()
The relationship between poll() and ppoll() is analogous to the relationship between select() and pselect(): like pselect(), ppoll() allows an application to safely wait until either a file descriptor becomes ready or until a signal is caught.
Other than the difference in the timeout argument, the following ppoll() call:
ready = ppoll(&fds, nfds, timeout, &sigmask);
is equivalent to atomically executing the following calls:
sigset_t origmask;
sigprocmask(SIG_SETMASK, &sigmask, &origmask);
ready = poll(&fds, nfds, timeout);
sigprocmask(SIG_SETMASK, &origmask, NULL);
See the description of pselect(2) for an explanation of why ppoll() is necessary.
The timeout argument specifies an upper limit on the amount of time that ppoll() will block. This argument is a pointer to a structure of the following form:
struct timespec {
    long tv_sec; /* seconds */
    long tv_nsec; /* nanoseconds */
};
If timeout is specified as NULL, then ppoll() can block indefinitely.
Return Value
On success, a positive number is returned; this is the number of structures which have non-zero revents fields (in other words, those descriptors with events or errors reported). A value of 0 indicates that the call timed out and no file descriptors were ready. On error, -1 is returned, and errno is set appropriately.
Errors
Tag | Description |
EBADF | An invalid file descriptor was given in one of the sets. |
EFAULT | The array given as argument was not contained in the calling program’s address space. |
EINTR | A signal occurred before any requested event. |
EINVAL | The nfds value exceeds the RLIMIT_NOFILE value. |
ENOMEM | There was no space to allocate file descriptor tables. |
Linux Notes
The Linux ppoll() system call modifies its timeout argument. However, the glibc wrapper function hides this behaviour by using a local variable for the timeout argument that is passed to the system call. Thus, the glibc ppoll() function does not modify its timeout argument.
BUGS
See the discussion of spurious readiness notifications under the BUGS section of select(2).
Conforming To
poll() conforms to POSIX.1-2001. ppoll() is Linux-specific.
Versions
The poll() system call was introduced in Linux 2.1.23. The poll() library call was introduced in libc 5.4.28 (and provides emulation using select() if your kernel does not have a poll() system call).
The ppoll() system call was added to Linux in kernel 2.6.16. The ppoll() library call was added in glibc 2.4.
Notes
Some implementations define the non-standard constant INFTIM with the value -1 for use as a timeout. This constant is not provided in glibc.
prctl() function
Synopsis
#include <sys/prctl.h>
int prctl(int option, unsigned long arg2, unsigned long arg3 , unsigned long arg4, unsigned long arg5);
Description
prctl() is called with a first argument describing what to do (with values defined in <linux/prctl.h>), and further parameters with a significance depending on the first one. The first argument can be:
Tag | Description |
PR_SET_PDEATHSIG | (Since Linux 2.1.57) Set the parent process death signal of the current process to arg2 (either a signal value in the range 1..maxsig, or 0 to clear). This is the signal that the current process will get when its parent dies. This value is cleared upon a fork(). |
PR_GET_PDEATHSIG | (Since Linux 2.3.15) Read the current value of the parent process death signal into the (int *) arg2. |
PR_SET_DUMPABLE | (Since Linux 2.4) Set the state of the flag determining whether core dumps are produced for this process upon delivery of a signal whose default behaviour is to produce a core dump. (Normally this flag is set for a process by default, but it is cleared when a set-user-ID or set-group-ID program is executed and also by various system calls that manipulate process UIDs and GIDs.) In kernels up to and including 2.6.12, arg2 must be either 0 (process is not dumpable) or 1 (process is dumpable). Since kernel 2.6.13, the value 2 is also permitted; this causes any binary which normally would not be dumped to be dumped readable by root only. (See also the description of /proc/sys/fs/suid_dumpable in proc(5).) |
PR_GET_DUMPABLE | (Since Linux 2.4) Return (as the function result) the current state of the calling process's dumpable flag. |
PR_SET_KEEPCAPS | Set the state of the process's "keep capabilities" flag, which determines whether the process's effective and permitted capability sets are cleared when a change is made to the process's user IDs such that the process's real UID, effective UID, and saved set-user-ID all become non-zero when at least one of them previously had the value 0. (By default, these credential sets are cleared.) arg2 must be either 0 (capabilities are cleared) or 1 (capabilities are kept). |
PR_GET_KEEPCAPS | Return (as the function result) the current state of the calling process's "keep capabilities" flag. |
Return Value
PR_GET_DUMPABLE and PR_GET_KEEPCAPS return 0 or 1 on success. All other option values return 0 on success. On error, -1 is returned, and errno is set appropriately.
Errors
Tag | Description |
EINVAL | The value of option is not recognized, or it is PR_SET_PDEATHSIG and arg2 is not zero or a signal number. |
Conforming To
This call is Linux-specific. IRIX has a prctl() system call (also introduced in Linux 2.1.44 as irix_prctl on the MIPS architecture), with prototype
ptrdiff_t prctl(int option, int arg2, int arg3);
and options to get the maximum number of processes per user, get the maximum number of processors the calling process can use, find out whether a specified process is currently blocked, get or set the maximum stack size, etc.
Availability
The prctl() system call was introduced in Linux 2.1.57.
pread() function
pread, pwrite - read from or write to a file descriptor at a given offset
Synopsis
#define _XOPEN_SOURCE 500
Description
pread() reads up to count bytes from file descriptor fd at offset offset (from the start of the file) into the buffer starting at buf. The file offset is not changed.
pwrite() writes up to count bytes from the buffer starting at buf to the file descriptor fd at offset offset. The file offset is not changed.
The file referenced by fd must be capable of seeking.
Return Value
On success, the number of bytes read or written is returned (zero indicates that nothing was written, in the case of pwrite(), or end of file, in the case of pread()), or -1 on error, in which case errno is set to indicate the error.
Errors
pread() can fail and set errno to any error specified for read(2) or lseek(2). pwrite() can fail and set errno to any error specified for write(2) or lseek(2).
Conforming To
POSIX.1-2001.
HISTORY
The pread() and pwrite() system calls were added to Linux in version 2.1.60; the entries in the i386 system call table were added in 2.1.69. The libc support (including emulation on older kernels without the system calls) was added in glibc 2.1.
prof() function
afs_syscall, break, fattach, fdetach, ftime, getmsg, getpmsg, gtty, isastream, lock, mpx, multiplexer, prof, profil, putmsg, putpmsg, security, stty, ulimit, vserver - unimplemented system calls
Synopsis
Unimplemented system calls.
Description
These system calls are not implemented in the Linux 2.4 kernel.
Return Value
These system calls always return -1 and set errno to ENOSYS.
Notes
Note that ftime(3), profil(3) and ulimit(3) are implemented as library functions.
Some system calls, like alloc_hugepages(2), free_hugepages(2), ioperm(2), iopl(2), and vm86(2) only exist on certain architectures.
Some system calls, like ipc(2), create_module(2), init_module(2), and delete_module(2) only exist when the Linux kernel was built with support for them.
pselect() function
select, pselect, FD_CLR, FD_ISSET, FD_SET, FD_ZERO - synchronous I/O multiplexing
Synopsis
/* According to POSIX.1-2001 */
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
int select(int nfds, fd_set *readfds, fd_set *writefds,
fd_set *exceptfds, struct timeval *timeout);
void FD_CLR(int fd, fd_set *set);
int FD_ISSET(int fd, fd_set *set);
void FD_SET(int fd, fd_set *set);
void FD_ZERO(fd_set *set);
#define _XOPEN_SOURCE 600
#include <sys/select.h>
int pselect(int nfds, fd_set *readfds, fd_set *writefds,
fd_set *exceptfds, const struct timespec *timeout,
const sigset_t *sigmask);
Description
select() and pselect() allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without blocking.
The operation of select() and pselect() is identical, with three differences:
Tag | Description |
(i) | select() uses a timeout that is a struct timeval (with seconds and microseconds), while pselect() uses a struct timespec (with seconds and nanoseconds). |
(ii) | select() may update the timeout argument to indicate how much time was left. pselect() does not change this argument. |
(iii) | select() has no sigmask argument, and behaves as pselect() called with NULL sigmask. |
Three independent sets of file descriptors are watched. Those listed in readfds will be watched to see if characters become available for reading (more precisely, to see if a read will not block; in particular, a file descriptor is also ready on end-of-file), those in writefds will be watched to see if a write will not block, and those in exceptfds will be watched for exceptions. On exit, the sets are modified in place to indicate which file descriptors actually changed status. Each of the three file descriptor sets may be specified as NULL if no file descriptors are to be watched for the corresponding class of events.
Four macros are provided to manipulate the sets. FD_ZERO() clears a set. FD_SET() and FD_CLR() respectively add and remove a given file descriptor from a set. FD_ISSET() tests to see if a file descriptor is part of the set; this is useful after select() returns.
nfds is the highest-numbered file descriptor in any of the three sets, plus 1.
timeout is an upper bound on the amount of time elapsed before select() returns. It may be zero, causing select() to return immediately. (This is useful for polling.) If timeout is NULL (no timeout), select() can block indefinitely.
sigmask is a pointer to a signal mask (see sigprocmask(2)); if it is not NULL, then pselect() first replaces the current signal mask by the one pointed to by sigmask, then does the 'select' function, and then restores the original signal mask.
Other than the difference in the precision of the timeout argument, the following pselect() call:
ready = pselect(nfds, &readfds, &writefds, &exceptfds, timeout, &sigmask);
is equivalent to atomically executing the following calls:
sigset_t origmask;
sigprocmask(SIG_SETMASK, &sigmask, &origmask);
ready = select(nfds, &readfds, &writefds, &exceptfds, timeout);
sigprocmask(SIG_SETMASK, &origmask, NULL);
The reason that pselect() is needed is that if one wants to wait for either a signal or for a file descriptor to become ready, then an atomic test is needed to prevent race conditions. (Suppose the signal handler sets a global flag and returns. Then a test of this global flag followed by a call of select() could hang indefinitely if the signal arrived just after the test but just before the call. By contrast, pselect() allows one to first block signals, handle the signals that have come in, then call pselect() with the desired sigmask, avoiding the race.)
The timeout
The time structures involved are defined in <sys/time.h> and look like
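struct timeval {
    long tv_sec; /* seconds */
    long tv_usec; /* microseconds */
};
and
struct timespec {
    long tv_sec; /* seconds */
    long tv_nsec; /* nanoseconds */
};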
(However, see below on the POSIX.1-2001 versions.)
Some code calls select() with all three sets empty, nfds zero, and a non-NULL timeout as a fairly portable way to sleep with subsecond precision.
On Linux, select() modifies timeout to reflect the amount of time not slept; most other implementations do not do this. (POSIX.1-2001 permits either behaviour.) This causes problems both when Linux code which reads timeout is ported to other operating systems, and when code is ported to Linux that reuses a struct timeval for multiple select()s in a loop without reinitializing it. Consider timeout to be undefined after select() returns.
Return Value
On success, select() and pselect() return the number of file descriptors contained in the three returned descriptor sets (that is, the total number of bits that are set in readfds, writefds, exceptfds) which may be zero if the timeout expires before anything interesting happens. On error, -1 is returned, and errno is set appropriately; the sets and timeout become undefined, so do not rely on their contents after an error.
Errors
Tag | Description |
EBADF | An invalid file descriptor was given in one of the sets. (Perhaps a file descriptor that was already closed, or one on which an error has occurred.) |
EINTR | A signal was caught. |
EINVAL | nfds is negative or the value contained within timeout is invalid. |
ENOMEM | unable to allocate memory for internal tables. |
EXAMPLE
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
int
main(void)
{
    fd_set rfds;
    struct timeval tv;
    int retval;
    /* Watch stdin (fd 0) to see when it has input. */
    FD_ZERO(&rfds);
    FD_SET(0, &rfds);
    /* Wait up to five seconds. */
    tv.tv_sec = 5;
    tv.tv_usec = 0;
    retval = select(1, &rfds, NULL, NULL, &tv);
    /* Don't rely on the value of tv now! */
    if (retval == -1)
        perror("select()");
    else if (retval)
        printf("Data is available now.\n");
        /* FD_ISSET(0, &rfds) will be true. */
    else
        printf("No data within five seconds.\n");
    return 0;
}
Conforming To
select() conforms to POSIX.1-2001 and 4.4BSD (select() first appeared in 4.2BSD). Generally portable to/from non-BSD systems supporting clones of the BSD socket layer (including System V variants). However, note that the System V variant typically sets the timeout variable before exit, but the BSD variant does not.
pselect() is defined in POSIX.1g, and in POSIX.1-2001.
Notes
An fd_set is a fixed size buffer. Executing FD_CLR() or FD_SET() with a value of fd that is negative or is equal to or larger than FD_SETSIZE will result in undefined behavior. Moreover, POSIX requires fd to be a valid file descriptor.
Concerning the types involved, the classical situation is that the two fields of a timeval structure are longs (as shown above), and the structure is defined in <sys/time.h>. The POSIX.1-2001 situation is
struct timeval {
    time_t tv_sec; /* seconds */
    suseconds_t tv_usec; /* microseconds */
};
where the structure is defined in <sys/select.h> and the data types time_t and suseconds_t are defined in <sys/types.h>.
Concerning prototypes, the classical situation is that one should include <time.h> for select(). The POSIX.1-2001 situation is that one should include <sys/select.h> for select() and pselect(). Libc4 and libc5 do not have a <sys/select.h> header; under glibc 2.0 and later this header exists. Under glibc 2.0 it unconditionally gives the wrong prototype for pselect(), under glibc 2.1-2.2.1 it gives pselect() when _GNU_SOURCE is defined, under glibc 2.2.2-2.2.4 it gives it when _XOPEN_SOURCE is defined and has a value of 600 or larger. No doubt, since POSIX.1-2001, it should give the prototype by default.
Versions
pselect() was added to Linux in kernel 2.6.16. Prior to this, pselect() was emulated in glibc (but see BUGS).
Linux Notes
The Linux pselect() system call modifies its timeout argument. However, the glibc wrapper function hides this behaviour by using a local variable for the timeout argument that is passed to the system call. Thus, the glibc pselect() function does not modify its timeout argument; this is the behaviour required by POSIX.1-2001.
BUGS
Glibc 2.0 provided a version of pselect() that did not take a sigmask argument.
Since version 2.1, glibc has provided an emulation of pselect() that is implemented using sigprocmask(2) and select(). This implementation remains vulnerable to the very race condition that pselect() was designed to prevent. On systems that lack pselect(), reliable (and more portable) signal trapping can be achieved using the self-pipe trick (where a signal handler writes a byte to a pipe whose other end is monitored by select() in the main program).
Under Linux, select() may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks. This could for example happen when data has arrived but upon examination has wrong checksum and is discarded. There may be other circumstances in which a file descriptor is spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on sockets that should not block.
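See Also
select_tut(2).
For vaguely related stuff, see accept(2), connect(2), poll(2), read(2), recv(2), send(2), sigprocmask(2), write(2), epoll(7), feature_test_macros(7)
ptrace() function
Synopsis
#include <sys/ptrace.h>
long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
Description
The ptrace() system call provides a means by which a parent process may observe and control the execution of another process, and examine and change its core image and registers. It is primarily used to implement breakpoint debugging and system call tracing.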
The parent can initiate a trace by calling fork(2) and having the resulting child do a PTRACE_TRACEME, followed (typically) by an exec(3). Alternatively, the parent may commence trace of an existing process using PTRACE_ATTACH.
While being traced, the child will stop each time a signal is delivered, even if the signal is being ignored. (The exception is SIGKILL, which has its usual effect.) The parent will be notified at its next wait(2) and may inspect and modify the child process while it is stopped. The parent then causes the child to continue, optionally ignoring the delivered signal (or even delivering a different signal instead).
When the parent is finished tracing, it can terminate the child with PTRACE_KILL or cause it to continue executing in a normal, untraced mode via PTRACE_DETACH.
The value of request determines the action to be performed:
Tag | Description |
PTRACE_TRACEME | Indicates that this process is to be traced by its parent. Any signal (except SIGKILL) delivered to this process will cause it to stop and its parent to be notified via wait(). Also, all subsequent calls to exec() by this process will cause a SIGTRAP to be sent to it, giving the parent a chance to gain control before the new program begins execution. A process probably shouldn’t make this request if its parent isn’t expecting to trace it. (pid, addr, and data are ignored.) |
The above request is used only by the child process; the rest are used only by the parent. In the following requests, pid specifies the child process to be acted on. For requests other than PTRACE_KILL, the child process must be stopped.
PTRACE_PEEKTEXT, PTRACE_PEEKDATA | Reads a word at the location addr in the child’s memory, returning the word as the result of the ptrace() call. Linux does not have separate text and data address spaces, so the two requests are currently equivalent. (The argument data is ignored.) |
PTRACE_PEEKUSR | Reads a word at offset addr in the child’s USER area, which holds the registers and other information about the process (see <linux/user.h> and <sys/user.h>). The word is returned as the result of the ptrace() call. Typically the offset must be word-aligned, though this might vary by architecture. See NOTES. (data is ignored.) |
PTRACE_POKETEXT, PTRACE_POKEDATA | Copies the word data to location addr in the child’s memory. As above, the two requests are currently equivalent. |
PTRACE_POKEUSR | Copies the word data to offset addr in the child’s USER area. As above, the offset must typically be word-aligned. In order to maintain the integrity of the kernel, some modifications to the USER area are disallowed. |
PTRACE_GETREGS, PTRACE_GETFPREGS | Copies the child’s general purpose or floating-point registers, respectively, to location data in the parent. See <linux/user.h> for information on the format of this data. (addr is ignored.) |
PTRACE_GETSIGINFO (since Linux 2.3.99-pre6) | Retrieve information about the signal that caused the stop. Copies a siginfo_t structure (see sigaction(2)) from the child to location data in the parent. (addr is ignored.) |
PTRACE_SETREGS, PTRACE_SETFPREGS | Copies the child’s general purpose or floating-point registers, respectively, from location data in the parent. As for PTRACE_POKEUSR, some general purpose register modifications may be disallowed. (addr is ignored.) |
PTRACE_SETSIGINFO (since Linux 2.3.99-pre6) | Set signal information. Copies a siginfo_t structure from location data in the parent to the child. This will only affect signals that would normally be delivered to the child and were caught by the tracer. It may be difficult to tell these normal signals from synthetic signals generated by ptrace() itself. (addr is ignored.) |
PTRACE_SETOPTIONS (since Linux 2.4.6; see BUGS for caveats) | Sets ptrace options from data in the parent. (addr is ignored.) data is interpreted as a bitmask of options, which are specified by the following flags: |
PTRACE_GETEVENTMSG (since Linux 2.5.46) | Retrieve a message (as an unsigned long) about the ptrace event that just happened, placing it in the location data in the parent. For PTRACE_EVENT_EXIT this is the child’s exit status. For PTRACE_EVENT_FORK, PTRACE_EVENT_VFORK and PTRACE_EVENT_CLONE this is the PID of the new process. (addr is ignored.) |
PTRACE_CONT | Restarts the stopped child process. If data is non-zero and not SIGSTOP, it is interpreted as a signal to be delivered to the child; otherwise, no signal is delivered. Thus, for example, the parent can control whether a signal sent to the child is delivered or not. (addr is ignored.) |
PTRACE_SYSCALL, PTRACE_SINGLESTEP | Restarts the stopped child as for PTRACE_CONT, but arranges for the child to be stopped at the next entry to or exit from a system call, or after execution of a single instruction, respectively. (The child will also, as usual, be stopped upon receipt of a signal.) From the parent’s perspective, the child will appear to have been stopped by receipt of a SIGTRAP. So, for PTRACE_SYSCALL, for example, the idea is to inspect the arguments to the system call at the first stop, then do another PTRACE_SYSCALL and inspect the return value of the system call at the second stop. (addr is ignored.) |
PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP (since Linux 2.6.14) | For PTRACE_SYSEMU, continue and stop on entry to the next syscall, which will not be executed. For PTRACE_SYSEMU_SINGLESTEP, do the same but also singlestep if not a syscall. This call is used by programs like User Mode Linux that want to emulate all the child’s syscalls. (addr and data are ignored; not supported on all architectures.) |
PTRACE_KILL | Sends the child a SIGKILL to terminate it. (addr and data are ignored.) |
PTRACE_ATTACH | Attaches to the process specified in pid, making it a traced "child" of the current process; the behavior of the child is as if it had done a PTRACE_TRACEME. The current process actually becomes the parent of the child process for most purposes (e.g., it will receive notification of child events and appears in ps(1) output as the child’s parent), but a getppid(2) by the child will still return the PID of the original parent. The child is sent a SIGSTOP, but will not necessarily have stopped by the completion of this call; use wait() to wait for the child to stop. (addr and data are ignored.) |
PTRACE_DETACH | Restarts the stopped child as for PTRACE_CONT, but first detaches from the process, undoing the reparenting effect of PTRACE_ATTACH, and the effects of PTRACE_TRACEME. Although perhaps not intended, under Linux a traced child can be detached in this way regardless of which method was used to initiate tracing. (addr is ignored.) |
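The basic life cycle described by the requests above (PTRACE_TRACEME in the child, a stop reported through wait, then PTRACE_CONT from the parent) might look like the sketch below. The function name trace_until_exit() is invented for this example, and the code assumes an environment in which ptrace() is permitted at all.

```c
#include <assert.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that asks to be traced, stops itself, and later exits with
 * exit_code. The parent observes the stop, resumes the child without
 * delivering the signal, and returns the child's exit status (-1 on error).
 */
static int trace_until_exit(int exit_code)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        raise(SIGSTOP);                 /* stop so that the parent gains control */
        _exit(exit_code);
    }

    /* Parent: the child's stop is reported through waitpid(). */
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;
    assert(WSTOPSIG(status) == SIGSTOP);

    /* data == 0: restart the child and do not deliver the signal. */
    if (ptrace(PTRACE_CONT, pid, NULL, NULL) == -1) {
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A call such as trace_until_exit(42) returns 42 when tracing works; passing a signal number as the data argument of PTRACE_CONT instead of 0 would deliver that signal to the child.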
NOTES
Although arguments to ptrace() are interpreted according to the prototype given, GNU libc currently declares ptrace() as a variadic function with only the request argument fixed. This means that unneeded trailing arguments may be omitted, though doing so makes use of undocumented gcc(1) behavior.
init(8), the process with PID 1, may not be traced.
The layout of the contents of memory and the USER area are quite OS- and architecture-specific. The offset supplied and the data returned might not entirely match with the definition of struct user.
The size of a "word" is determined by the OS variant (e.g., for 32-bit Linux it’s 32 bits, etc.).
Tracing causes a few subtle differences in the semantics of traced processes. For example, if a process is attached to with PTRACE_ATTACH, its original parent can no longer receive notification via wait() when it stops, and there is no way for the new parent to effectively simulate this notification.
This page documents the way the ptrace() call works currently in Linux. Its behavior differs noticeably on other flavors of Unix. In any case, use of ptrace() is highly OS- and architecture-specific.
The SunOS man page describes ptrace() as "unique and arcane", which it is. The proc-based debugging interface present in Solaris 2 implements a superset of ptrace() functionality in a more powerful and uniform way.
RETURN VALUE
On success, PTRACE_PEEK* requests return the requested data, while other requests return zero. On error, all requests return -1, and errno is set appropriately. Since the value returned by a successful PTRACE_PEEK* request may be -1, the caller must check errno after such requests to determine whether or not an error occurred.
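The errno check for PTRACE_PEEK* requests can be sketched as follows. The helpers peek_word() and peek_demo() and the variable probe are hypothetical names for this example; the demo relies on the forked child having the same address-space layout as the parent, so the parent can use the address of its own copy of probe.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static long probe = -1;     /* a word whose legitimate contents are -1 */

/* Read one word from a stopped tracee; returns 0 on success, -1 on error. */
static int peek_word(pid_t pid, const void *addr, long *out)
{
    long word;

    assert(out != NULL);
    errno = 0;              /* -1 may be valid data, so clear errno first */
    word = ptrace(PTRACE_PEEKDATA, pid, (void *)addr, NULL);
    if (word == -1 && errno != 0)
        return -1;
    *out = word;
    return 0;
}

/* Fork a traced child, wait for it to stop, and peek at its copy of probe. */
static int peek_demo(long *out)
{
    int status, rc;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        raise(SIGSTOP);
        _exit(0);
    }
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;
    rc = peek_word(pid, &probe, out);
    kill(pid, SIGKILL);     /* the demo is over: terminate and reap the child */
    waitpid(pid, &status, 0);
    return rc;
}
```

Here the peeked word really is -1, so only the errno test distinguishes the successful read from a failure.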
BUGS
On hosts with 2.6 kernel headers, PTRACE_SETOPTIONS is declared with a different value than the one for 2.4. This leads to applications compiled with such headers failing when run on 2.4 kernels. This can be worked around by redefining PTRACE_SETOPTIONS to PTRACE_OLDSETOPTIONS, if that is defined.
ERRORS
Tag | Description |
EBUSY | (i386 only) There was an error with allocating or freeing a debug register. |
EFAULT | There was an attempt to read from or write to an invalid area in the parent’s or child’s memory, probably because the area wasn’t mapped or accessible. Unfortunately, under Linux, different variations of this fault will return EIO or EFAULT more or less arbitrarily. |
EINVAL | An attempt was made to set an invalid option. |
EIO | request is invalid, or an attempt was made to read from or write to an invalid area in the parent’s or child’s memory, or there was a word-alignment violation, or an invalid signal was specified during a restart request. |
EPERM | The specified process cannot be traced. This could be because the parent has insufficient privileges (the required capability is CAP_SYS_PTRACE); non-root processes cannot trace processes that they cannot send signals to or those running set-user-ID/set-group-ID programs, for obvious reasons. Alternatively, the process may already be being traced, or be init (PID 1). |
ESRCH | The specified process does not exist, or is not currently being traced by the caller, or is not stopped (for requests that require that). |
CONFORMING TO
SVr4, 4.3BSD
SEE ALSO
putmsg() function
afs_syscall, break, fattach, fdetach, ftime, getmsg, getpmsg, gtty, isastream, lock, mpx, multiplexer, prof, profil, putmsg, putpmsg, security, stty, ulimit, vserver - unimplemented system calls
SYNOPSIS
Unimplemented system calls.
DESCRIPTION
These system calls are not implemented in the Linux 2.4 kernel.
RETURN VALUE
These system calls always return -1 and set errno to ENOSYS.
NOTES
Note that ftime(3), profil(3) and ulimit(3) are implemented as library functions.
Some system calls, like alloc_hugepages(2), free_hugepages(2), ioperm(2), iopl(2), and vm86(2) only exist on certain architectures.
Some system calls, like ipc(2), create_module(2), init_module(2), and delete_module(2) only exist when the Linux kernel was built with support for them.
SEE ALSO
putpmsg() function
afs_syscall, break, fattach, fdetach, ftime, getmsg, getpmsg, gtty, isastream, lock, mpx, multiplexer, prof, profil, putmsg, putpmsg, security, stty, ulimit, vserver - unimplemented system calls
SYNOPSIS
Unimplemented system calls.
DESCRIPTION
These system calls are not implemented in the Linux 2.4 kernel.
RETURN VALUE
These system calls always return -1 and set errno to ENOSYS.
NOTES
Note that ftime(3), profil(3) and ulimit(3) are implemented as library functions.
Some system calls, like alloc_hugepages(2), free_hugepages(2), ioperm(2), iopl(2), and vm86(2) only exist on certain architectures.
Some system calls, like ipc(2), create_module(2), init_module(2), and delete_module(2) only exist when the Linux kernel was built with support for them.
SEE ALSO
pwrite() function
pread, pwrite - read from or write to a file descriptor at a given offset
SYNOPSIS
#define _XOPEN_SOURCE 500
#include <unistd.h>
ssize_t pread(int fd, void *buf, size_t count, off_t offset);
ssize_t pwrite(int fd, const void *buf, size_t count, off_t offset);
DESCRIPTION
pread() reads up to count bytes from file descriptor fd at offset offset (from the start of the file) into the buffer starting at buf. The file offset is not changed.
pwrite() writes up to count bytes from the buffer starting at buf to the file descriptor fd at offset offset. The file offset is not changed.
The file referenced by fd must be capable of seeking.
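A small round trip may make the "file offset is not changed" property concrete. In this sketch the function name positional_roundtrip() and the scratch-file path are assumptions made for the example.

```c
#define _XOPEN_SOURCE 500
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write msg at byte offset off of a scratch file, read it back with pread(),
 * and report the file offset afterwards. out must have room for the message
 * plus a terminating null byte. Returns 0 on success, -1 on error.
 */
static int positional_roundtrip(const char *msg, off_t off, char *out,
                                off_t *pos_after)
{
    char path[] = "/tmp/pread-demo-XXXXXX";    /* hypothetical scratch file */
    size_t len = strlen(msg);
    int rc = -1;
    int fd;

    assert(out != NULL && pos_after != NULL);
    fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                               /* file disappears on close() */

    if (pwrite(fd, msg, len, off) == (ssize_t)len &&
        pread(fd, out, len, off) == (ssize_t)len) {
        out[len] = '\0';
        *pos_after = lseek(fd, 0, SEEK_CUR);    /* still 0: neither call moved it */
        rc = 0;
    }
    close(fd);
    return rc;
}
```

Because the offset is passed explicitly with each call, several threads can share one descriptor without racing on lseek(2).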
RETURN VALUE
On success, the number of bytes read or written is returned (zero indicates that nothing was written, in the case of pwrite(), or end of file, in the case of pread()), or -1 on error, in which case errno is set to indicate the error.
ERRORS
pread() can fail and set errno to any error specified for read(2) or lseek(2). pwrite() can fail and set errno to any error specified for write(2) or lseek(2).
CONFORMING TO
POSIX.1-2001.
HISTORY
The pread() and pwrite() system calls were added to Linux in version 2.1.60; the entries in the i386 system call table were added in 2.1.69. The libc support (including emulation on older kernels without the system calls) was added in glibc 2.1.
SEE ALSO
query_module() function
SYNOPSIS
#include <linux/module.h>
int query_module(const char *name, int which, void *buf, size_t bufsize, size_t *ret);
DESCRIPTION
query_module() requests information from the kernel about loadable modules. The returned information is placed in the buffer pointed to by buf. The caller must specify the size of buf in bufsize. The precise nature and format of the returned information depend on the operation specified by which. Some operations require name to identify a currently loaded module, some allow name to be NULL, indicating the kernel proper.
The following values can be specified for which:
Tag | Description |
0 | Always returns success. Used to probe for availability of the system call. |
QM_MODULES | Returns the names of all loaded modules. The returned buffer consists of a sequence of null-terminated strings; ret is set to the number of modules. |
QM_DEPS | Returns the names of all modules used by the indicated module. The returned buffer consists of a sequence of null-terminated strings; ret is set to the number of modules. |
QM_REFS | Returns the names of all modules using the indicated module. This is the inverse of QM_DEPS. The returned buffer consists of a sequence of null-terminated strings; ret is set to the number of modules. |
QM_SYMBOLS | Returns the symbols and values exported by the kernel or the indicated module. The returned buffer is an array of structures of the following form
followed by null-terminated strings. The value of name is the character offset of the string relative to the start of buf; ret is set to the number of symbols. |
QM_INFO | Returns miscellaneous information about the indicated module. The output buffer format is:
where address is the kernel address at which the module resides, size is the size of the module in bytes, and flags is a mask of MOD_RUNNING, MOD_AUTOCLEAN, etc. that indicates the current status of the module (see the kernel source file include/linux/module.h). ret is set to the size of the module_info structure. |
RETURN VALUE
On success, zero is returned. On error, -1 is returned and errno is set appropriately.
ERRORS
Tag | Description |
EFAULT | At least one of name, buf, or ret was outside the program’s accessible address space. |
EINVAL | Invalid which; or name is NULL (indicating "the kernel"), but this is not permitted with the specified value of which. |
ENOENT | No module by that name exists. |
ENOSPC | The buffer size provided was too small. ret is set to the minimum size needed. |
CONFORMING TO
query_module() is Linux specific.
NOTES
This system call is only present on Linux up until kernel 2.4; it was removed in Linux 2.6. Some of the information that was available via query_module() can be obtained from /proc/modules, /proc/kallsyms, and /sys/modules.
SEE ALSO
quotactl() function
SYNOPSIS
#include <sys/quota.h>
long quotactl(int cmd, char *special, qid_t id, caddr_t addr);
DESCRIPTION
The quotactl() call manipulates disk quotas. cmd indicates a command to be applied to UID id or GID id. To set the type of quota use the QCMD(cmd, type) macro. special is a pointer to a null-terminated string containing the path name of the block special device for the filesystem being manipulated. addr is the address of an optional, command specific, data structure which is copied in or out of the system. The interpretation of addr is given with each command below.
Tag | Description |
Q_QUOTAON | Turn on quotas for a filesystem. id is the identification number of the quota format to be used. Format numbers are defined in the header file of the appropriate format. Currently there are two supported quota formats whose numbers are defined by the constants QFMT_VFS_OLD (original quota format) and QFMT_VFS_V0 (new VFS v0 quota format). addr points to the path name of the file containing the quotas for the filesystem. The quota file must exist; it is normally created with the quotacheck(8) program. This call is restricted to the super-user. |
Q_QUOTAOFF | Turn off quotas for a filesystem. addr and id are ignored. This call is restricted to the super-user. |
Q_GETQUOTA | Get disk quota limits and current usage for user or group id. addr is a pointer to an if_dqblk structure (defined in <sys/quota.h>). The field dqb_valid defines the entries in the structure which are set correctly. On a Q_GETQUOTA call all entries are valid. Only the super-user may get the quotas of a user other than himself. |
Q_SETQUOTA | Set current quota information for user or group id. addr is a pointer to an if_dqblk structure (defined in <sys/quota.h>). The field dqb_valid defines which entries in the quota structure are valid and should be set. The constants for the dqb_valid field are defined in the <sys/quota.h> header file. This call obsoletes the calls Q_SETQLIM and Q_SETUSE in the previous quota interfaces. This call is restricted to the super-user. |
Q_GETINFO | Get information (like grace times) about the quota file. addr should be a pointer to an if_dqinfo structure (defined in <sys/quota.h>). The dqi_valid field in the structure defines the entries in it which are valid. On a Q_GETINFO call all entries are valid. Parameter id is ignored. |
Q_SETINFO | Set information about the quota file. addr should be a pointer to an if_dqinfo structure (defined in <sys/quota.h>). The field dqi_valid defines which entries in the quota info structure are valid and should be set. The constants for the dqi_valid field are defined in the <sys/quota.h> header file. This call obsoletes the calls Q_SETGRACE and Q_SETFLAGS in the previous quota interfaces. Parameter id is ignored. This operation is restricted to the super-user. |
Q_GETFMT | Get the quota format used on the specified filesystem. addr should be a pointer to a memory area (4 bytes) where the format number will be stored. |
Q_SYNC | Update the on-disk copy of quota usages for a filesystem. If special is null then all filesystems with active quotas are sync’ed. addr and id are ignored. |
Q_GETSTATS | Get statistics and other generic information about the quota subsystem. addr should be a pointer to a dqstats structure (defined in <sys/quota.h>) in which data should be stored. special and id are ignored. |
For XFS filesystems making use of the XFS Quota Manager (XQM), the above commands are bypassed and the following commands are used:
Q_XQUOTAON | Turn on quotas for an XFS filesystem. XFS provides the ability to turn on/off quota limit enforcement with quota accounting. Therefore, XFS expects addr to be a pointer to an unsigned int that contains either the flags XFS_QUOTA_UDQ_ACCT and/or XFS_QUOTA_UDQ_ENFD (for user quota), or XFS_QUOTA_GDQ_ACCT and/or XFS_QUOTA_GDQ_ENFD (for group quota), as defined in <xfs/xqm.h>. This call is restricted to the superuser. |
Q_XQUOTAOFF | Turn off quotas for an XFS filesystem. As in Q_QUOTAON, XFS filesystems expect a pointer to an unsigned int that specifies whether quota accounting and/or limit enforcement need to be turned off. This call is restricted to the superuser. |
Q_XGETQUOTA | Get disk quota limits and current usage for user id. addr is a pointer to a fs_disk_quota structure (defined in <xfs/xqm.h>). Only the superuser may get the quotas of a user other than himself. |
Q_XSETQLIM | Set disk quota limits for user id. addr is a pointer to a fs_disk_quota structure (defined in <xfs/xqm.h>). This call is restricted to the superuser. |
Q_XGETQSTAT | Returns a fs_quota_stat structure containing XFS filesystem specific quota information. This is useful in finding out how much space is spent to store quota information, and also to get the quota on/off status of a given local XFS filesystem. |
Q_XQUOTARM | Free the disk space taken by disk quotas. Quotas must have already been turned off. |
There is no command equivalent to Q_SYNC for XFS since sync(1) writes quota information to disk (in addition to the other filesystem metadata it writes out).
RETURN VALUES
quotactl() returns:
Tag | Description |
0 | on success. |
-1 | on failure and sets errno to indicate the error. |
ERRORS
Tag | Description |
EFAULT | addr or special is invalid. |
ENOSYS | The kernel has not been compiled with the QUOTA option. |
EINVAL | cmd or type is invalid. |
ENOENT | The file specified by special or addr does not exist. |
ENOTBLK | special is not a block device. |
EPERM | The call is privileged and the caller was not the super-user. |
ESRCH | No disk quota is found for the indicated user. Or, quotas have not been turned on for this filesystem. |
If cmd is Q_QUOTAON, quotactl() may set errno to:
EACCES | The quota file pointed to by addr exists but is not a regular file. Or, the quota file pointed to by addr exists but is not on the filesystem pointed to by special. |
EINVAL | The quota file is corrupted. |
ESRCH | Specified quota format was not found. |
EBUSY | Q_QUOTAON attempted while another Q_QUOTAON has already taken place. |
SEE ALSO
readahead() function
SYNOPSIS
#define _GNU_SOURCE
#include <fcntl.h>
ssize_t readahead(int fd, off64_t offset, size_t count);
DESCRIPTION
readahead() populates the page cache with data from a file so that subsequent reads from that file will not block on disk I/O. The fd argument is a file descriptor identifying the file which is to be read. The offset argument specifies the starting point from which data is to be read and count specifies the number of bytes to be read. I/O is performed in whole pages, so that offset is effectively rounded down to a page boundary and bytes are read up to the next page boundary greater than or equal to (offset+count). readahead() does not read beyond the end of the file. readahead() blocks until the specified data has been read. The current file offset of the open file referred to by fd is left unchanged.
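The following sketch shows a readahead() call leaving the file offset untouched. It assumes the glibc declaration, which takes offset by value as an off64_t and requires _GNU_SOURCE; the function name readahead_offset_demo() and the scratch-file path are invented for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Create a small scratch file, seek to start_pos, ask the kernel to pull the
 * file into the page cache, and return the file offset afterwards
 * (-1 on error). The offset should come back unchanged.
 */
static off_t readahead_offset_demo(off_t start_pos)
{
    char path[] = "/tmp/readahead-demo-XXXXXX";    /* hypothetical scratch file */
    static const char data[] = "0123456789";
    const size_t len = sizeof data - 1;
    off_t pos = -1;
    int fd;

    assert(start_pos >= 0);
    fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);

    if (write(fd, data, len) == (ssize_t)len &&
        lseek(fd, start_pos, SEEK_SET) == start_pos) {
        /* Purely advisory here: the result is ignored, since some
           filesystems may reject the request with EINVAL. */
        (void)readahead(fd, 0, len);
        pos = lseek(fd, 0, SEEK_CUR);
    }
    close(fd);
    return pos;
}
```

In a real program the call would typically be issued some time before the data is needed, so that the later read(2) finds the pages already cached.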
RETURN VALUE
On success, readahead() returns 0; on failure, -1 is returned, with errno set to indicate the cause of the error.
ERRORS
Tag | Description |
EBADF | fd is not a valid file descriptor or is not open for reading. |
EINVAL | fd does not refer to a file type to which readahead() can be applied. |
CONFORMING TO
The readahead() system call is Linux specific, and its use should be avoided in portable applications.
NOTES
The readahead() system call appeared in Linux 2.4.13.
SEE ALSO
readdir() function
SYNOPSIS
#include <linux/types.h>
int readdir(unsigned int fd, struct dirent *dirp, unsigned int count);
DESCRIPTION
This is not the function you are interested in. Look at readdir(3) for the POSIX conforming C library interface. This page documents the bare kernel system call interface, which can change, and which is superseded by getdents(2).
readdir() reads one dirent structure from the directory pointed at by fd into the memory area pointed to by dirp. The parameter count is ignored; at most one dirent structure is read.
The dirent structure is declared as follows:
struct dirent |
d_ino is an inode number. d_off is the distance from the start of the directory to this dirent. d_reclen is the size of d_name, not counting the null terminator. d_name is a null-terminated filename.
RETURN VALUE
On success, 1 is returned. On end of directory, 0 is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
EBADF | Invalid file descriptor fd. |
EFAULT | Argument points outside the calling process’s address space. |
EINVAL | Result buffer is too small. |
ENOENT | No such directory. |
ENOTDIR | File descriptor does not refer to a directory. |
CONFORMING TO
This system call is Linux specific.
NOTES
Glibc does not provide a wrapper for this system call; call it using syscall(2).
SEE ALSO
read() function
SYNOPSIS
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
DESCRIPTION
read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf.
If count is zero, read() returns zero and has no other results. If count is greater than SSIZE_MAX, the result is unspecified.
RETURN VALUE
On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal. On error, -1 is returned, and errno is set appropriately. In this case it is left unspecified whether the file position (if any) changes.
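Because a short read is not an error, callers that need a fixed number of bytes usually loop. The helper below is one possible sketch for a blocking descriptor; the name read_full() is made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Keep calling read() until count bytes have arrived, end of file is
 * reached, or an error occurs. Returns the number of bytes read (less than
 * count only at end of file), or -1 with errno set.
 */
static ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;

    assert(buf != NULL || count == 0);
    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);

        if (n == 0)
            break;                      /* end of file */
        if (n == -1) {
            if (errno == EINTR)
                continue;               /* interrupted before any data: retry */
            return -1;
        }
        done += (size_t)n;              /* short read: loop for the remainder */
    }
    return (ssize_t)done;
}
```

On a descriptor opened with O_NONBLOCK this helper would return -1 with EAGAIN as soon as no data is available, so it is only a fit for blocking I/O.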
ERRORS
Tag | Description |
EAGAIN | Non-blocking I/O has been selected using O_NONBLOCK and no data was immediately available for reading. |
EBADF | fd is not a valid file descriptor or is not open for reading. |
EFAULT | buf is outside your accessible address space. |
EINTR | The call was interrupted by a signal before any data was read. |
EINVAL | fd is attached to an object which is unsuitable for reading; or the file was opened with the O_DIRECT flag, and either the address specified in buf, the value specified in count, or the current file offset is not suitably aligned. |
EIO | I/O error. This will happen for example when the process is in a background process group, tries to read from its controlling tty, and either it is ignoring or blocking SIGTTIN or its process group is orphaned. It may also occur when there is a low-level I/O error while reading from a disk or tape. |
EISDIR | fd refers to a directory. |
Other errors may occur, depending on the object connected to fd. POSIX allows a read() that is interrupted after reading some data to return -1 (with errno set to EINTR) or to return the number of bytes already read.
CONFORMING TO
SVr4, 4.3BSD, POSIX.1-2001.
RESTRICTIONS
On NFS file systems, reading small amounts of data will only update the time stamp the first time, subsequent calls may not do so. This is caused by client side attribute caching, because most if not all NFS clients leave st_atime (last file access time) updates to the server and client side reads satisfied from the client’s cache will not cause st_atime updates on the server as there are no server side reads. UNIX semantics can be obtained by disabling client side attribute caching, but in most situations this will substantially increase server load and decrease performance.
Many filesystems and disks were considered to be fast enough that the implementation of O_NONBLOCK was deemed unnecessary. So, O_NONBLOCK may not be available on files and/or disks.
SEE ALSO
- close (2)
- fcntl (2)
- ioctl (2)
- lseek (2)
- open (2)
- pread (2)
- readdir (2)
- readlink (2)
- readv (2)
- select (2)
- write (2)
readlinkat() function
readlinkat - read the value of a symbolic link relative to a directory file descriptor
SYNOPSIS
#include <unistd.h>
ssize_t readlinkat(int dirfd, const char *path, char *buf, size_t bufsiz);
DESCRIPTION
The readlinkat() system call operates in exactly the same way as readlink(2), except for the differences described in this manual page.
If the pathname given in path is relative, then it is interpreted relative to the directory referred to by the file descriptor dirfd (rather than relative to the current working directory of the calling process, as is done by readlink(2) for a relative pathname).
If the pathname given in path is relative and dirfd is the special value AT_FDCWD, then path is interpreted relative to the current working directory of the calling process (like readlink(2)).
If the pathname given in path is absolute, then dirfd is ignored.
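The dirfd-relative lookup can be sketched as below. The function name read_link_in_dir(), the scratch directory, and the link name "link" are all assumptions for this example; the code also assumes that readlinkat(), like readlink(2), returns the number of bytes placed in the buffer and does not null-terminate it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>      /* AT_* constants, O_DIRECTORY */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a symbolic link named "link" inside a fresh scratch directory and
 * read it back relative to a descriptor for that directory. Returns the
 * length of the link target, or -1 on error; out is null-terminated.
 */
static ssize_t read_link_in_dir(const char *target, char *out, size_t outsz)
{
    char dir[] = "/tmp/readlinkat-demo-XXXXXX";    /* hypothetical location */
    ssize_t n = -1;
    int dirfd;

    assert(out != NULL && outsz > 0);
    if (mkdtemp(dir) == NULL)
        return -1;
    dirfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dirfd != -1) {
        if (symlinkat(target, dirfd, "link") == 0) {
            /* "link" is relative, so it is resolved against dirfd. */
            n = readlinkat(dirfd, "link", out, outsz - 1);
            if (n >= 0)
                out[n] = '\0';
            unlinkat(dirfd, "link", 0);
        }
        close(dirfd);
    }
    rmdir(dir);
    return n;
}
```

Passing AT_FDCWD instead of dirfd would resolve the same relative name against the current working directory, as readlink(2) does.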
RETURN VALUE
On success, readlinkat() returns the number of bytes placed in buf, as readlink(2) does. On error, -1 is returned and errno is set to indicate the error.
ERRORS
The same errors that occur for readlink(2) can also occur for readlinkat(). The following additional errors can occur for readlinkat():
Tag | Description |
EBADF | dirfd is not a valid file descriptor. |
ENOTDIR | path is a relative path and dirfd is a file descriptor referring to a file other than a directory. |
NOTES
See openat(2) for an explanation of the need for readlinkat().
CONFORMING TO
This system call is non-standard but is proposed for inclusion in a future revision of POSIX.1.
VERSIONS
readlinkat() was added to Linux in kernel 2.6.16.
SEE ALSO
readlink() function
SYNOPSIS
#include <unistd.h>
ssize_t readlink(const char *path, char *buf, size_t bufsiz);
DESCRIPTION
readlink() places the contents of the symbolic link path in the buffer buf, which has size bufsiz. readlink() does not append a null byte to buf. It will truncate the contents (to a length of bufsiz characters), in case the buffer is too small to hold all of the contents.
RETURN VALUE
The call returns the count of characters placed in the buffer if it succeeds, or a -1 if an error occurs, placing the error code in errno.
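Since readlink() neither null-terminates nor reports truncation directly, callers commonly retry with a larger buffer whenever the result fills the buffer completely. A possible sketch follows; read_link_alloc() is a hypothetical helper name and the starting size of 64 bytes is arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Return the target of the symbolic link path as a freshly allocated,
 * null-terminated string (the caller frees it), or NULL on error.
 */
static char *read_link_alloc(const char *path)
{
    size_t size = 64;

    assert(path != NULL);
    for (;;) {
        char *buf = malloc(size);
        ssize_t n;

        if (buf == NULL)
            return NULL;
        n = readlink(path, buf, size);
        if (n == -1) {
            free(buf);
            return NULL;
        }
        if ((size_t)n < size) {
            buf[n] = '\0';              /* readlink() did not terminate it */
            return buf;
        }
        /* The buffer was filled completely, so the contents may have been
           truncated: double the size and try again. */
        free(buf);
        size *= 2;
    }
}
```

A return value equal to bufsiz is treated as "possibly truncated" here because the call gives no other indication that part of the target was cut off.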
ERRORS
Tag | Description |
EACCES | Search permission is denied for a component of the path prefix. (See also path_resolution(2).) |
EFAULT | buf extends outside the process’s allocated address space. |
EINVAL | bufsiz is not positive. |
EINVAL | The named file is not a symbolic link. |
EIO | An I/O error occurred while reading from the file system. |
ELOOP | Too many symbolic links were encountered in translating the pathname. |
ENAMETOOLONG | A pathname, or a component of a pathname, was too long. |
ENOENT | The named file does not exist. |
ENOMEM | Insufficient kernel memory was available. |
ENOTDIR | A component of the path prefix is not a directory. |
CONFORMING TO
4.4BSD (the readlink() function call appeared in 4.2BSD), POSIX.1-2001.
HISTORY
In versions of glibc up to and including glibc 2.4, the return type of readlink() was declared as int. Nowadays, the return type is declared as ssize_t, as (newly) required in POSIX.1-2001.
SEE ALSO
readv() function
readv, writev - read or write data into multiple buffers
SYNOPSIS
#include <sys/uio.h>
ssize_t readv(int fd, const struct iovec *vector, int count);
ssize_t writev(int fd, const struct iovec *vector, int count);
DESCRIPTION
The readv() function reads count blocks from the file associated with the file descriptor fd into the multiple buffers described by vector.
The writev() function writes at most count blocks described by vector to the file associated with the file descriptor fd.
The pointer vector points to a struct iovec defined in <sys/uio.h> as :
struct iovec { |
Buffers are processed in the order specified. The readv() function works just like read(2) except that multiple buffers are filled.
The writev() function works just like write(2) except that multiple buffers are written out.
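As a rough illustration of gather output and scatter input, the sketch below sends a header and a body through a pipe with one writev() call and splits them again with one readv() call. The function name scatter_gather_demo() and the buffer contents are invented for the example; the iov_base and iov_len fields are the standard members of struct iovec.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Gather a 4-byte header and a 6-byte body into one writev() on a pipe,
 * then scatter the bytes back into two separate buffers with readv().
 * Returns the number of bytes read back, or -1 on error.
 */
static ssize_t scatter_gather_demo(char hdr_out[4], char body_out[6])
{
    char hdr[4] = { 'H', 'D', 'R', ':' };
    char body[6] = { 'h', 'e', 'l', 'l', 'o', '\n' };
    struct iovec out[2] = {
        { .iov_base = hdr,  .iov_len = sizeof hdr },
        { .iov_base = body, .iov_len = sizeof body },
    };
    struct iovec in[2] = {
        { .iov_base = hdr_out,  .iov_len = 4 },
        { .iov_base = body_out, .iov_len = 6 },
    };
    int p[2];
    ssize_t n;

    assert(hdr_out != NULL && body_out != NULL);
    if (pipe(p) == -1)
        return -1;
    n = writev(p[1], out, 2);           /* buffers are written in array order */
    if (n == 10)
        n = readv(p[0], in, 2);         /* the first buffer is filled first */
    else
        n = -1;
    close(p[0]);
    close(p[1]);
    return n;
}
```

The point of the vectored calls is that the header and body never have to be copied into one contiguous buffer before the transfer.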
RETURN VALUE
On success, the readv() function returns the number of bytes read; the writev() function returns the number of bytes written. On error, -1 is returned, and errno is set appropriately.
ERRORS
The errors are as given for read(2) and write(2). Additionally the following error is defined:
Tag | Description |
EINVAL | The sum of the iov_len values overflows an ssize_t value. Or, the vector count count is less than zero or greater than the permitted maximum. |
CONFORMING TO
4.4BSD (the readv() and writev() functions first appeared in 4.2BSD), POSIX.1-2001. Linux libc5 used size_t as the type of the count parameter, and int as return type for these functions.
LINUX NOTES
POSIX.1-2001 allows an implementation to place a limit on the number of items that can be passed in vector. An implementation can advertise its limit by defining IOV_MAX in <limits.h> or at run time via the return value from sysconf(_SC_IOV_MAX). On Linux, the limit advertised by these mechanisms is 1024, which is the true kernel limit.
However, the glibc wrapper functions do some extra work if they detect that the underlying kernel system call failed because this limit was exceeded. In the case of readv() the wrapper function allocates a temporary buffer large enough for all of the items specified by vector, passes that buffer in a call to read(), copies data from the buffer to the locations specified by the iov_base fields of the elements of vector, and then frees the buffer.
The wrapper function for writev() performs the analogous task using a temporary buffer and a call to write().
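The two ways of discovering the limit mentioned above can be combined. The sketch below prefers the run-time value and falls back to the compile-time constant; the function name iov_limit is an assumption, and the final fallback of 16 is the POSIX minimum (_XOPEN_IOV_MAX), not a value taken from this page.

```c
#include <limits.h>
#include <unistd.h>

/* Hypothetical helper: returns the largest vector count that may be passed
 * to a single readv()/writev() call on this system. */
long iov_limit(void)
{
    long n = sysconf(_SC_IOV_MAX);   /* run-time limit, -1 if indeterminate */
    if (n > 0)
        return n;
#ifdef IOV_MAX
    return IOV_MAX;                  /* compile-time limit from <limits.h> */
#else
    return 16;                       /* _XOPEN_IOV_MAX, the POSIX minimum */
#endif
}
```

A caller with more buffers than iov_limit() can split the work into several calls instead of relying on the glibc fallback.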
BUGS
It is not advisable to mix calls to functions like readv() or writev(), which operate on file descriptors, with the functions from the stdio library; the results will be undefined and probably not what you want.
SEE ALSO
reboot() function
reboot - reboot or enable/disable Ctrl-Alt-Del
SYNOPSIS
For libc4 and libc5 the library call and the system call are identical, and since kernel version 2.1.30 there are symbolic names LINUX_REBOOT_* for the constants and a fourth argument to the call:
#include <unistd.h>
#include <linux/reboot.h>
int reboot(int magic, int magic2, int flag, void *arg);
Under glibc some of the constants involved have gotten symbolic names RB_*, and the library call is a 1-argument wrapper around the 3-argument system call:
#include <unistd.h>
#include <sys/reboot.h>
int reboot(int flag);
DESCRIPTION
The reboot() call reboots the system, or enables/disables the reboot keystroke (abbreviated CAD, since the default is Ctrl-Alt-Delete; it can be changed using loadkeys(1)).
This system call will fail (with EINVAL) unless magic equals LINUX_REBOOT_MAGIC1 (that is, 0xfee1dead) and magic2 equals LINUX_REBOOT_MAGIC2 (that is, 672274793). However, since 2.1.17 also LINUX_REBOOT_MAGIC2A (that is, 85072278) and since 2.1.97 also LINUX_REBOOT_MAGIC2B (that is, 369367448) and since 2.5.71 also LINUX_REBOOT_MAGIC2C (that is, 537993216) are permitted as value for magic2. (The hexadecimal values of these constants are meaningful.) The flag argument can have the following values:
Tag | Description |
LINUX_REBOOT_CMD_RESTART | (RB_AUTOBOOT, 0x1234567). The message ‘Restarting system.’ is printed, and a default restart is performed immediately. If not preceded by a sync(2), data will be lost. |
LINUX_REBOOT_CMD_HALT | (RB_HALT_SYSTEM, 0xcdef0123; since 1.1.76). The message ‘System halted.’ is printed, and the system is halted. Control is given to the ROM monitor, if there is one. If not preceded by a sync(2), data will be lost. |
LINUX_REBOOT_CMD_POWER_OFF | (0x4321fedc; since 2.1.30). The message ‘Power down.’ is printed, the system is stopped, and all power is removed from the system, if possible. If not preceded by a sync(2), data will be lost. |
LINUX_REBOOT_CMD_RESTART2 | (0xa1b2c3d4; since 2.1.30). The message ‘Restarting system with command '%s'’ is printed, and a restart (using the command string given in arg) is performed immediately. If not preceded by a sync(2), data will be lost. |
LINUX_REBOOT_CMD_CAD_ON | (RB_ENABLE_CAD, 0x89abcdef). CAD is enabled. This means that the CAD keystroke will immediately cause the action associated with LINUX_REBOOT_CMD_RESTART. |
LINUX_REBOOT_CMD_CAD_OFF | (RB_DISABLE_CAD, 0). CAD is disabled. This means that the CAD keystroke will cause a SIGINT signal to be sent to init (process 1), whereupon this process may decide upon a proper action (maybe: kill all processes, sync, reboot). |
Only the superuser may use this function.
The precise effect of the above actions depends on the architecture. For the i386 architecture, the additional argument does not do anything at present (2.1.122), but the type of reboot can be determined by kernel command line arguments (‘reboot=...’) to be either warm or cold, and either hard or through the BIOS.
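The magic-number rule in the description can be written down as a small predicate. This is only a sketch of the documented check, not kernel code: the macro names and the function reboot_magic_ok are assumptions, and the numeric values are copied from the text above (<linux/reboot.h> provides the same numbers under the LINUX_REBOOT_MAGIC* names). Printed in hexadecimal, the four magic2 values read 0x28121969, 0x05121996, 0x16041998 and 0x20112000.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the description; the names are local to this sketch. */
#define MAGIC1   0xfee1deadu
#define MAGIC2   672274793u   /* 0x28121969 */
#define MAGIC2A  85072278u    /* 0x05121996, accepted since 2.1.17 */
#define MAGIC2B  369367448u   /* 0x16041998, accepted since 2.1.97 */
#define MAGIC2C  537993216u   /* 0x20112000, accepted since 2.5.71 */

/* Hypothetical helper: true if the pair passes the documented magic-number
 * test, false if reboot() would fail with EINVAL because of it. */
bool reboot_magic_ok(uint32_t magic, uint32_t magic2)
{
    if (magic != MAGIC1)
        return false;
    return magic2 == MAGIC2 || magic2 == MAGIC2A ||
           magic2 == MAGIC2B || magic2 == MAGIC2C;
}
```

With glibc the one-argument wrapper supplies these values itself, so application code normally only calls sync() and then reboot() with one of the RB_* constants.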
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
EFAULT | Problem with getting userspace data under LINUX_REBOOT_CMD_RESTART2. |
EINVAL | Bad magic numbers or flag. |
EPERM | The calling process has insufficient privilege to call reboot(); the CAP_SYS_BOOT capability is required. |
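CONFORMING TO
reboot() is Linux-specific, and should not be used in programs intended to be portable.
SEE ALSO
recvfrom() function
recv, recvfrom, recvmsg - receive a message from a socket
SYNOPSIS
#include <sys/types.h>
#include <sys/socket.h>
ssize_t recv(int s, void *buf, size_t len, int flags);
ssize_t recvfrom(int s, void *buf, size_t len, int flags, struct sockaddr *from, socklen_t *fromlen);
ssize_t recvmsg(int s, struct msghdr *msg, int flags);
DESCRIPTION
The recvfrom() and recvmsg() calls are used to receive messages from a socket, and may be used to receive data on a socket whether or not it is connection-oriented.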
If from is not NULL, and the underlying protocol provides the source address, this source address is filled in. The argument fromlen is a value-result parameter, initialized to the size of the buffer associated with from, and modified on return to indicate the actual size of the address stored there.
The recv() call is normally used only on a connected socket (see connect(2)) and is identical to recvfrom() with a NULL from parameter.
All three routines return the length of the message on successful completion. If a message is too long to fit in the supplied buffer, excess bytes may be discarded depending on the type of socket the message is received from.
If no messages are available at the socket, the receive calls wait for a message to arrive, unless the socket is nonblocking (see fcntl(2)), in which case the value -1 is returned and the external variable errno set to EAGAIN. The receive calls normally return any data available, up to the requested amount, rather than waiting for receipt of the full amount requested.
The select(2) or poll(2) call may be used to determine when more data arrives.
The flags argument to a recv() call is formed by OR’ing one or more of the following values:
MSG_DONTWAIT
Enables non-blocking operation; if the operation would block, EAGAIN is returned (this can also be enabled using the O_NONBLOCK flag with the F_SETFL fcntl(2)).
MSG_ERRQUEUE
This flag specifies that queued errors should be received from the socket error queue. The error is passed in an ancillary message with a type dependent on the protocol (for IPv4 IP_RECVERR). The user should supply a buffer of sufficient size. See cmsg(3) and ip(7) for more information. The payload of the original packet that caused the error is passed as normal data via msg_iov. The original destination address of the datagram that caused the error is supplied via msg_name.
For local errors, no address is passed (this can be checked with the cmsg_len member of the cmsghdr). For error receives, MSG_ERRQUEUE is set in the msghdr. After an error has been passed, the pending socket error is regenerated based on the next queued error and will be passed on the next socket operation. The error is supplied in a sock_extended_err structure:
struct sock_extended_err {
    uint32_t ee_errno;    /* error number */
    uint8_t  ee_origin;   /* where the error originated */
    uint8_t  ee_type;     /* type */
    uint8_t  ee_code;     /* code */
    uint8_t  ee_pad;
    uint32_t ee_info;     /* additional information */
    uint32_t ee_data;     /* other data */
    /* More data may follow */
};
ee_errno contains the errno number of the queued error. ee_origin is the origin code of where the error originated. The other fields are protocol specific. The macro SO_EE_OFFENDER returns a pointer to the address of the network object where the error originated from, given a pointer to the ancillary message. If this address is not known, the sa_family member of the sockaddr contains AF_UNSPEC and the other fields of the sockaddr are undefined. The payload of the packet that caused the error is passed as normal data.
MSG_OOB
This flag requests receipt of out-of-band data that would not be received in the normal data stream. Some protocols place expedited data at the head of the normal data queue, and thus this flag cannot be used with such protocols.
MSG_PEEK
This flag causes the receive operation to return data from the beginning of the receive queue without removing that data from the queue. Thus, a subsequent receive call will return the same data.
MSG_TRUNC
Return the real length of the packet, even when it was longer than the passed buffer. Only valid for packet sockets.
MSG_WAITALL
This flag requests that the operation block until the full request is satisfied. However, the call may still return less data than requested if a signal is caught, an error or disconnect occurs, or the next data to be received is of a different type than that returned.
The recvmsg() call uses a msghdr structure to minimize the number of directly supplied parameters. This structure has the following form, as defined in <sys/socket.h>:
struct msghdr {
    void         *msg_name;       /* optional address */
    socklen_t     msg_namelen;    /* size of address */
    struct iovec *msg_iov;        /* scatter/gather array */
    size_t        msg_iovlen;     /* number of elements in msg_iov */
    void         *msg_control;    /* ancillary data, see below */
    socklen_t     msg_controllen; /* ancillary data buffer length */
    int           msg_flags;      /* flags on received message */
};
Here msg_name and msg_namelen specify the source address if the socket is unconnected; msg_name may be given as a null pointer if no names are desired or required. The fields msg_iov and msg_iovlen describe scatter-gather locations, as discussed in readv(2). The field msg_control, which has length msg_controllen, points to a buffer for other protocol control related messages or miscellaneous ancillary data. When recvmsg() is called, msg_controllen should contain the length of the available buffer in msg_control; upon return from a successful call it will contain the length of the control message sequence.
The messages are of the form:
struct cmsghdr {
    socklen_t cmsg_len;    /* data byte count, including header */
    int       cmsg_level;  /* originating protocol */
    int       cmsg_type;   /* protocol-specific type */
    /* followed by unsigned char cmsg_data[]; */
};
Ancillary data should only be accessed by the macros defined in cmsg(3).
As an example, Linux uses this auxiliary data mechanism to pass extended errors, IP options or file descriptors over Unix sockets.
The msg_flags field in the msghdr is set on return of recvmsg(). It can contain several flags:
MSG_EOR
indicates end-of-record; the data returned completed a record (generally used with sockets of type SOCK_SEQPACKET).
MSG_TRUNC
indicates that the trailing portion of a datagram was discarded because the datagram was larger than the buffer supplied.
MSG_CTRUNC
indicates that some control data were discarded due to lack of space in the buffer for ancillary data.
MSG_OOB
is returned to indicate that expedited or out-of-band data were received.
MSG_ERRQUEUE
indicates that no data was received but an extended error from the socket error queue.
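To make the msghdr fields concrete, the sketch below receives one datagram into two buffers through msg_iov and then inspects msg_flags for MSG_TRUNC. The helper name recv_split and its parameters are assumptions for illustration; no source address or ancillary data is requested, so msg_name and msg_control are left as null pointers.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: receives one message into buffers a and b (in that
 * order) and reports through *truncated whether part of a datagram was
 * discarded. Returns the number of bytes received, or -1 on error. */
ssize_t recv_split(int s, void *a, size_t alen, void *b, size_t blen,
                   int *truncated)
{
    struct iovec iov[2] = {
        { .iov_base = a, .iov_len = alen },
        { .iov_base = b, .iov_len = blen },
    };
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);   /* msg_name and msg_control stay NULL */
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;

    ssize_t n = recvmsg(s, &msg, 0);
    if (n != -1 && truncated != NULL)
        *truncated = (msg.msg_flags & MSG_TRUNC) != 0;   /* set on return */
    return n;
}
```

On a datagram socket, an 8-byte message received into a 4-byte and a 2-byte buffer yields 6 bytes with the truncation indicator set, while a message that fits leaves it clear.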
RETURN VALUE
These calls return the number of bytes received, or -1 if an error occurred. The return value will be 0 when the peer has performed an orderly shutdown.
ERRORS
These are some standard errors generated by the socket layer. Additional errors may be generated and returned from the underlying protocol modules; see their manual pages.
Tag | Description |
EAGAIN | The socket is marked non-blocking and the receive operation would block, or a receive timeout had been set and the timeout expired before data was received. |
EBADF | The argument s is an invalid descriptor. |
ECONNREFUSED | A remote host refused to allow the network connection (typically because it is not running the requested service). |
EFAULT | The receive buffer pointer(s) point outside the process’s address space. |
EINTR | The receive was interrupted by delivery of a signal before any data were available. |
EINVAL | Invalid argument passed. |
ENOMEM | Could not allocate memory for recvmsg(). |
ENOTCONN | The socket is associated with a connection-oriented protocol and has not been connected (see connect(2) and accept(2)). |
ENOTSOCK | The argument s does not refer to a socket. |
CONFORMING TO
4.4BSD (these function calls first appeared in 4.2BSD), POSIX.1-2001.
POSIX.1-2001 only describes the MSG_OOB, MSG_PEEK, and MSG_WAITALL flags.
NOTES
The prototypes given above follow glibc2. The Single Unix Specification agrees, except that it has return values of type ‘ssize_t’ (while 4.x BSD and libc4 and libc5 all have ‘int’). The flags argument is ‘int’ in 4.x BSD, but ‘unsigned int’ in libc4 and libc5. The len argument is ‘int’ in 4.x BSD, but ‘size_t’ in libc4 and libc5. The fromlen argument is ‘int *’ in 4.x BSD, libc4 and libc5. The present ‘socklen_t *’ was invented by POSIX. See also accept(2).
According to POSIX.1-2001, the msg_controllen field of the msghdr structure should be typed as socklen_t, but glibc currently (2.4) types it as size_t.
SEE ALSO
remap_file_pages() function
remap_file_pages - create a non-linear file mapping
SYNOPSIS
#include <sys/mman.h>
int remap_file_pages(void *start, size_t size, int prot, ssize_t pgoff, int flags);
DESCRIPTION
The remap_file_pages() system call is used to create a non-linear mapping, that is, a mapping in which the pages of the file are mapped into a non-sequential order in memory. The advantage of using remap_file_pages() over using repeated calls to mmap(2) is that the former approach does not require the kernel to create additional VMA (Virtual Memory Area) data structures.
To create a non-linear mapping we perform the following steps:
1. Use mmap() to create a mapping (which is initially linear). This mapping must be created with the MAP_SHARED flag.
2. Use one or more calls to remap_file_pages() to rearrange the correspondence between the pages of the mapping and the pages of the file. It is possible to map the same page of a file into multiple locations within the mapped region.
The pgoff and size arguments specify the region of the file that is to be relocated within the mapping: pgoff is a file offset in units of the system page size; size is the length of the region in bytes.
The start argument serves two purposes. First, it identifies the mapping whose pages we want to rearrange. Thus, start must be an address that falls within a region previously mapped by a call to mmap(). Second, start specifies the address at which the file pages identified by pgoff and size will be placed.
The values specified in start and size should be multiples of the system page size. If they are not, then the kernel rounds both values down to the nearest multiple of the page size.
The prot argument must be specified as 0.
The flags argument has the same meaning as for mmap(), but all flags other than MAP_NONBLOCK are ignored.
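The two steps and the argument rules above can be sketched as follows. The demo function remap_swap_demo and its return convention are assumptions for illustration, as is the use of a temporary file; the _GNU_SOURCE definition is also an assumption about how glibc exposes the prototype. The sketch treats a failing remap_file_pages() call as "not available" rather than as a fatal error, since support may vary between systems.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* assumption: needed for the glibc prototype */
#endif
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: builds a two-page temporary file (page 0 filled with
 * 'A', page 1 with 'B'), maps it, then swaps the two pages in the mapping.
 * Returns 1 if the swap is visible, 0 if remap_file_pages() failed or had
 * no visible effect, -1 if the setup itself failed. */
int remap_swap_demo(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();
    if (f == NULL || psz <= 0)
        return -1;

    for (int page = 0; page < 2; page++)
        for (long i = 0; i < psz; i++)
            if (fputc('A' + page, f) == EOF) { fclose(f); return -1; }
    if (fflush(f) != 0) { fclose(f); return -1; }

    /* Step 1: an ordinary linear mapping, created with MAP_SHARED. */
    char *m = mmap(NULL, 2 * (size_t)psz, PROT_READ, MAP_SHARED, fileno(f), 0);
    if (m == MAP_FAILED) { fclose(f); return -1; }

    /* Step 2: prot must be 0; pgoff counts pages, size counts bytes. */
    int result = 0;
    if (remap_file_pages(m, (size_t)psz, 0, 1, 0) == 0 &&
        remap_file_pages(m + psz, (size_t)psz, 0, 0, 0) == 0)
        result = (m[0] == 'B' && m[psz] == 'A') ? 1 : 0;

    munmap(m, 2 * (size_t)psz);
    fclose(f);
    return result;
}
```

After both calls succeed, the first page of the mapping shows file page 1 and the second shows file page 0, without a second mmap() call.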
RETURN VALUE
On success, remap_file_pages() returns 0. On error, -1 is returned, and errno is set appropriately.
NOTES
The remap_file_pages() system call appeared in Linux 2.5.46.
ERRORS
Tag | Description |
EINVAL | start does not refer to a valid mapping created with the MAP_SHARED flag. |
EINVAL | start, size, prot, or pgoff is invalid. |
CONFORMING TO
The remap_file_pages() system call is Linux specific.
SEE ALSO
renameat() function
SYNOPSIS
#include <stdio.h>
int renameat(int olddirfd, const char *oldpath, int newdirfd, const char *newpath);
DESCRIPTION
The renameat() system call operates in exactly the same way as rename(2), except for the differences described in this manual page.
If the pathname given in oldpath is relative, then it is interpreted relative to the directory referred to by the file descriptor olddirfd (rather than relative to the current working directory of the calling process, as is done by rename(2) for a relative pathname).
If the pathname given in oldpath is relative and olddirfd is the special value AT_FDCWD, then oldpath is interpreted relative to the current working directory of the calling process (like rename(2)).
If the pathname given in oldpath is absolute, then olddirfd is ignored.
The interpretation of newpath is as for oldpath, except that a relative pathname is interpreted relative to the directory referred to by the file descriptor newdirfd.
RETURN VALUE
On success, renameat() returns 0. On error, -1 is returned and errno is set to indicate the error.
ERRORS
The same errors that occur for rename(2) can also occur for renameat(). The following additional errors can occur for renameat():
Tag | Description |
EBADF | olddirfd or newdirfd is not a valid file descriptor. |
ENOTDIR | oldpath is a relative path and olddirfd is a file descriptor referring to a file other than a directory; or similar for newpath and newdirfd. |
注意
See openat(2) for an explanation of the need for renameat().
CONFORMING TO
This system call is nonstandard, but is proposed for inclusion in a future revision of POSIX.1.
VERSIONS
renameat() was added to Linux in kernel 2.6.16.
rename() Function
SYNOPSIS
#include <stdio.h>
int rename(const char *oldpath, const char *newpath);
DESCRIPTION
rename() renames a file, moving it between directories if required.
Any other hard links to the file (as created using link(2)) are unaffected.
If newpath already exists, it will be atomically replaced (subject to a few conditions; see ERRORS below), so that there is no point at which another process attempting to access newpath will find it missing.
If newpath exists but the operation fails for some reason, rename() guarantees to leave an instance of newpath in place.
However, when overwriting, there will probably be a window in which both oldpath and newpath refer to the file being renamed.
If oldpath refers to a symbolic link, the link is renamed; if newpath refers to a symbolic link, the link will be overwritten.
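A common use of the atomic-replacement guarantee is to write new contents to a temporary file and then rename it over the target, so that readers see either the old file or the new one. The sketch below assumes a hypothetical helper replace_file() and a temporary name formed by appending ".tmp" to the target path; it does not address durability (no fsync() is performed).

```c
#include <assert.h>
#include <stdio.h>   /* fopen(), fputs(), fclose(), rename(), remove() */

/*
 * Hypothetical helper: replace the contents of path with text by writing
 * a temporary file next to it and renaming that file over path.
 * Returns 0 on success, -1 on failure.
 */
int replace_file(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);

    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;                       /* path too long for the buffer */

    FILE *f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    if (fputs(text, f) == EOF) {
        fclose(f);
        remove(tmp);
        return -1;
    }
    if (fclose(f) == EOF) {
        remove(tmp);
        return -1;
    }

    /* Both names are in the same directory, so EXDEV cannot occur here. */
    if (rename(tmp, path) == -1) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

Keeping the temporary file in the same directory as the target matters: rename() fails with EXDEV when the two paths are on different mounted filesystems.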
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
EACCES | Write permission is denied for the directory containing oldpath or newpath, or search permission is denied for one of the directories in the path prefix of oldpath or newpath, or oldpath is a directory and does not allow write permission (needed to update the .. entry). (See also path_resolution(2).) |
EBUSY | The rename fails because oldpath or newpath is a directory that is in use by some process (perhaps as current working directory, or as root directory, or because it was open for reading) or is in use by the system (for example as a mount point), while the system considers this an error. (Note that there is no requirement to return EBUSY in such cases, since there is nothing wrong with doing the rename anyway, but it is allowed to return EBUSY if the system cannot otherwise handle such situations.) |
EFAULT | oldpath or newpath points outside your accessible address space. |
EINVAL | The new pathname contained a path prefix of the old, or, more generally, an attempt was made to make a directory a subdirectory of itself. |
EISDIR | newpath is an existing directory, but oldpath is not a directory. |
ELOOP | Too many symbolic links were encountered in resolving oldpath or newpath. |
EMLINK | oldpath already has the maximum number of links to it, or it was a directory and the directory containing newpath has the maximum number of links. |
ENAMETOOLONG | oldpath or newpath was too long. |
ENOENT | A directory component in oldpath or newpath does not exist or is a dangling symbolic link. |
ENOMEM | Insufficient kernel memory was available. |
ENOSPC | The device containing the file has no room for the new directory entry. |
ENOTDIR | A component used as a directory in oldpath or newpath is not, in fact, a directory. Or, oldpath is a directory, and newpath exists but is not a directory. |
ENOTEMPTY or EEXIST | newpath is a non-empty directory, i.e., contains entries other than "." and "..". |
EPERM or EACCES | The directory containing oldpath has the sticky bit (S_ISVTX) set and the process’s effective user ID is neither the user ID of the file to be deleted nor that of the directory containing it, and the process is not privileged (Linux: does not have the CAP_FOWNER capability); or newpath is an existing file and the directory containing it has the sticky bit set and the process’s effective user ID is neither the user ID of the file to be replaced nor that of the directory containing it, and the process is not privileged (Linux: does not have the CAP_FOWNER capability); or the filesystem containing pathname does not support renaming of the type requested. |
EROFS | The file is on a read-only filesystem. |
EXDEV | oldpath and newpath are not on the same mounted filesystem. (Linux permits a filesystem to be mounted at multiple points, but rename(2) does not work across different mount points, even if the same filesystem is mounted on both.) |
CONFORMING TO
4.3BSD, C89, POSIX.1-2001.
BUGS
On NFS filesystems, you can not assume that if the operation failed the file was not renamed. If the server does the rename operation and then crashes, the retransmitted RPC which will be processed when the server is up again causes a failure. The application is expected to deal with this. See link(2) for a similar problem.
request_key() Function
request_key - request a key from the kernel's key management facility
SYNOPSIS
#include <keyutils.h>
key_serial_t request_key(const char *type, const char *description, const char *callout_info, key_serial_t keyring);
DESCRIPTION
request_key() asks the kernel to find a key of the given type that matches the specified description and, if successful, to attach it to the nominated keyring and to return its serial number.
request_key() first recursively searches all the keyrings attached to the calling process in the order thread-specific keyring, process-specific keyring and then session keyring for a matching key.
If request_key() is called from a program invoked by request_key() on behalf of some other process to generate a key, then the keyrings of that other process will be searched next, using that other process’s UID, GID, groups and security context to control access.
The keys in each keyring searched are checked for a match before any child keyrings are recursed into. Only keys that are searchable for the caller may be found, and only searchable keyrings may be searched.
If the key is not found then, if callout_info is set, this function will attempt to look further afield. In such a case, the callout_info is passed to a userspace service such as /sbin/request-key to generate the key.
If that is unsuccessful also, then an error will be returned, and a temporary negative key will be installed in the nominated keyring. This will expire after a few seconds, but will cause subsequent calls to request_key() to fail until it does.
The keyring serial number may be that of a valid keyring to which the caller has write permission, or it may be a special keyring ID:
Tag | Description |
KEY_SPEC_THREAD_KEYRING | This specifies the caller’s thread-specific keyring. |
KEY_SPEC_PROCESS_KEYRING | This specifies the caller’s process-specific keyring. |
KEY_SPEC_SESSION_KEYRING | This specifies the caller’s session-specific keyring. |
KEY_SPEC_USER_KEYRING | This specifies the caller’s UID-specific keyring. |
KEY_SPEC_USER_SESSION_KEYRING | This specifies the caller’s UID-session keyring. |
If a key is created, no matter whether it’s a valid key or a negative key, it will displace any other key of the same type and description from the destination keyring.
RETURN VALUE
On success request_key() returns the serial number of the key it found. On error, the value -1 will be returned and errno will have been set to an appropriate error.
ERRORS
Tag | Description |
ENOKEY | No matching key was found. |
EKEYEXPIRED | An expired key was found, but no replacement could be obtained. |
EKEYREVOKED | A revoked key was found, but no replacement could be obtained. |
EKEYREJECTED | The attempt to generate a new key was rejected. |
ENOMEM | Insufficient memory to create a key. |
EINTR | The request was interrupted by a signal. |
EDQUOT | The key quota for this user would be exceeded by creating this key or linking it to the keyring. |
EACCES | The keyring wasn’t available for modification by the user. |
LINKING
Although this is a Linux system call, it is not present in libc; it is provided by libkeyutils instead. When linking, -lkeyutils should be specified to the linker.
rmdir() Function
SYNOPSIS
#include <unistd.h>
int rmdir(const char *pathname);
DESCRIPTION
rmdir() deletes a directory, which must be empty.
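The "must be empty" requirement can be illustrated with a short sketch. The helper names remove_if_empty() and rmdir_demo() are hypothetical, and the three-way return convention is only one way a caller might distinguish a non-empty directory from other failures.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>      /* snprintf() */
#include <sys/stat.h>   /* mkdir() */
#include <unistd.h>     /* rmdir() */

/*
 * Hypothetical helper: try to remove the directory at path.
 * Returns 0 if it was removed, 1 if it was left in place because it is
 * not empty, and -1 on any other error (errno is set by rmdir()).
 */
int remove_if_empty(const char *path)
{
    assert(path != NULL);

    if (rmdir(path) == 0)
        return 0;
    if (errno == ENOTEMPTY || errno == EEXIST)
        return 1;
    return -1;
}

/*
 * Hypothetical demonstration: create parent and parent/child, then show
 * that parent can only be removed after child is gone.
 * Returns 0 if every step behaved as expected, -1 otherwise.
 */
int rmdir_demo(const char *parent)
{
    char child[4096];
    int n = snprintf(child, sizeof child, "%s/child", parent);
    if (n < 0 || (size_t)n >= sizeof child)
        return -1;

    if (mkdir(parent, 0700) == -1 || mkdir(child, 0700) == -1)
        return -1;
    if (remove_if_empty(parent) != 1)   /* still contains child */
        return -1;
    if (remove_if_empty(child) != 0)    /* empty, so it is removed */
        return -1;
    if (remove_if_empty(parent) != 0)   /* now empty as well */
        return -1;
    return 0;
}
```

Both ENOTEMPTY and EEXIST are checked because systems differ in which of the two they report for a non-empty directory.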
RETURN VALUE
On success, zero is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
EACCES | Write access to the directory containing pathname was not allowed, or one of the directories in the path prefix of pathname did not allow search permission. (See also path_resolution(2).) |
EBUSY | pathname is currently in use by the system or some process that prevents its removal. On Linux this means pathname is currently used as a mount point or is the root directory of the calling process. |
EFAULT | pathname points outside your accessible address space. |
EINVAL | pathname has . as last component. |
ELOOP | Too many symbolic links were encountered in resolving pathname. |
ENAMETOOLONG | pathname was too long. |
ENOENT | A directory component in pathname does not exist or is a dangling symbolic link. |
ENOMEM | Insufficient kernel memory was available. |
ENOTDIR | pathname, or a component used as a directory in pathname, is not, in fact, a directory. |
ENOTEMPTY | pathname contains entries other than . and .. ; or, pathname has .. as its final component. |
EPERM | The directory containing pathname has the sticky bit (S_ISVTX) set and the process’s effective user ID is neither the user ID of the file to be deleted nor that of the directory containing it, and the process is not privileged (Linux: does not have the CAP_FOWNER capability). |
EPERM | The filesystem containing pathname does not support the removal of directories. |
EROFS | pathname refers to a file on a read-only filesystem. |
CONFORMING TO
SVr4, 4.3BSD, POSIX.1-2001.
BUGS
Infelicities in the protocol underlying NFS can cause the unexpected disappearance of directories that are still in use.
sbrk() Function
SYNOPSIS
#include <unistd.h>
int brk(void *end_data_segment);
void *sbrk(intptr_t increment);
DESCRIPTION
brk() sets the end of the data segment to the value specified by end_data_segment, provided that the value is reasonable, the system has enough memory, and the process does not exceed its maximum data size (see setrlimit(2)).
sbrk() increments the program’s data space by increment bytes. sbrk() isn’t a system call; it is just a C library wrapper. Calling sbrk() with an increment of 0 can be used to find the current location of the program break.
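The use of sbrk(0) to locate the program break, and of a positive increment to move it, can be sketched as below. grow_break() is a hypothetical helper; the sketch assumes a single-threaded program in which nothing else (for example malloc()) moves the break between the calls, and it assumes glibc, where _DEFAULT_SOURCE exposes the sbrk() declaration.

```c
#define _DEFAULT_SOURCE 1   /* exposes sbrk() in glibc's <unistd.h> */
#include <assert.h>
#include <stdint.h>   /* intptr_t */
#include <unistd.h>   /* sbrk() */

/*
 * Hypothetical helper: grow the data segment by nbytes and report how
 * far the program break actually moved. Returns -1 if sbrk() fails.
 */
long grow_break(intptr_t nbytes)
{
    assert(nbytes >= 0);

    void *before = sbrk(0);            /* increment 0: query only */
    if (before == (void *)-1)
        return -1;

    if (sbrk(nbytes) == (void *)-1)    /* on success: start of the new area */
        return -1;

    void *after = sbrk(0);
    if (after == (void *)-1)
        return -1;

    return (long)((char *)after - (char *)before);
}
```

Note that sbrk() signals failure by returning (void *) -1 rather than a NULL pointer, which is why the comparisons above are written that way. Some C libraries refuse non-zero increments altogether, so callers should be prepared for -1.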
RETURN VALUE
On success, brk() returns zero, and sbrk() returns a pointer to the start of the new area. On error, brk() returns -1 and sbrk() returns (void *) -1, and errno is set to ENOMEM.
CONFORMING TO
4.3BSD; SUSv1, marked LEGACY in SUSv2, removed in POSIX.1-2001.
brk() and sbrk() are not defined in the C Standard and are deliberately excluded from the POSIX.1 standard (see paragraphs B.1.1.1.3 and B.8.3.3).
NOTES
Various systems use various types for the parameter of sbrk(). Common are int, ssize_t, ptrdiff_t, intptr_t.
sched_setaffinity() Function
sched_setaffinity, sched_getaffinity, CPU_CLR, CPU_ISSET, CPU_SET, CPU_ZERO - set and get a process's CPU affinity mask
SYNOPSIS
int sched_setaffinity(pid_t pid, unsigned int cpusetsize, cpu_set_t *mask);
DESCRIPTION
A process’s CPU affinity mask determines the set of CPUs on which it is eligible to run. On a multiprocessor system, setting the CPU affinity mask can be used to obtain performance benefits. For example, by dedicating one CPU to a particular process (i.e., setting the affinity mask of that process to specify a single CPU, and setting the affinity mask of all other processes to exclude that CPU), it is possible to ensure maximum execution speed for that process. Restricting a process to run on a single CPU also prevents the performance cost caused by the cache invalidation that occurs when a process ceases to execute on one CPU and then recommences execution on a different CPU.
A CPU affinity mask is represented by the cpu_set_t structure, a "CPU set", pointed to by mask. Four macros are provided to manipulate CPU sets. CPU_ZERO() clears a set. CPU_SET() and CPU_CLR() respectively add and remove a given CPU from a set. CPU_ISSET() tests to see if a CPU is part of the set; this is useful after sched_getaffinity() returns. The first available CPU on the system corresponds to a cpu value of 0, the next CPU corresponds to a cpu value of 1, and so on. The constant CPU_SETSIZE (1024) specifies a value one greater than the maximum CPU number that can be stored in a CPU set.
sched_setaffinity() sets the CPU affinity mask of the process whose ID is pid to the value specified by mask. If pid is zero, then the calling process is used. The argument cpusetsize is the length (in bytes) of the data pointed to by mask. Normally this argument would be specified as sizeof(cpu_set_t).
If the process specified by pid is not currently running on one of the CPUs specified in mask, then that process is migrated to one of the CPUs specified in mask.
sched_getaffinity() writes the affinity mask of the process whose ID is pid into the cpu_set_t structure pointed to by mask. The cpusetsize argument specifies the size (in bytes) of mask. If pid is zero, then the mask of the calling process is returned.
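The macros and calls described above can be combined as in the following sketch. The helper names first_allowed_cpu() and pin_to_cpu() are hypothetical; the code assumes glibc, where _GNU_SOURCE must be defined before <sched.h> is included to obtain cpu_set_t and the CPU_* macros.

```c
#define _GNU_SOURCE   /* needed for cpu_set_t and the CPU_* macros in glibc */
#include <assert.h>
#include <sched.h>

/*
 * Hypothetical helper: lowest-numbered CPU the caller is currently
 * allowed to run on, or -1 on error.
 */
int first_allowed_cpu(void)
{
    cpu_set_t allowed;

    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(cpu_set_t), &allowed) == -1)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &allowed))
            return cpu;
    return -1;
}

/*
 * Hypothetical helper: restrict the caller to the single CPU given.
 * Returns 0 on success, -1 on error with errno set.
 */
int pin_to_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);

    cpu_set_t mask;
    CPU_ZERO(&mask);       /* start from an empty set */
    CPU_SET(cpu, &mask);   /* add exactly one CPU */
    return sched_setaffinity(0, sizeof(cpu_set_t), &mask);
}
```

Querying the current mask first, rather than assuming CPU 0 is usable, avoids an EINVAL failure on systems where the process is already confined to a subset of the CPUs.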
RETURN VALUE
On success, sched_setaffinity() and sched_getaffinity() return 0. On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
EFAULT | A supplied memory address was invalid. |
EINVAL | The affinity bitmask mask contains no processors that are physically on the system, or cpusetsize is smaller than the size of the affinity mask used by the kernel. |
EPERM | The calling process does not have appropriate privileges. The process calling sched_setaffinity() needs an effective user ID equal to the user ID or effective user ID of the process identified by pid, or it must possess the CAP_SYS_NICE capability. |
ESRCH | The process whose ID is pid could not be found. |
CONFORMING TO
These system calls are Linux-specific.
NOTES
The affinity mask is actually a per-thread attribute that can be adjusted independently for each of the threads in a thread group. The value returned from a call to gettid(2) can be passed in the argument pid.
A child created via fork(2) inherits its parent’s CPU affinity mask. The affinity mask is preserved across an execve(2).
This manual page describes the glibc interface for the CPU affinity calls. The actual system call interface is slightly different, with the mask being typed as unsigned long *, reflecting the fact that the underlying implementation of CPU sets is a simple bitmask. On success, the raw sched_getaffinity() system call returns the size (in bytes) of the cpumask_t data type that is used internally by the kernel to represent the CPU set bitmask.
HISTORY
The CPU affinity system calls were introduced in Linux kernel 2.5.8. The library interfaces were introduced in glibc 2.3. Initially, the glibc interfaces included a cpusetsize argument. In glibc 2.3.2, the cpusetsize argument was removed, but this argument was restored in glibc 2.3.4.
SEE ALSO
- clone (2)
- getpriority (2)
- gettid (2)
- nice (2)
- sched_get_priority_max (2)
- sched_get_priority_min (2)
- sched_getscheduler (2)
- sched_setscheduler (2)
- setpriority (2)
sched_setscheduler(2) has a description of the Linux scheduling scheme.
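sched_getparam() Function
sched_setparam, sched_getparam - set and get scheduling parameters
SYNOPSIS
#include <sched.h>
int sched_setparam(pid_t pid, const struct sched_param *param);
int sched_getparam(pid_t pid, struct sched_param *param);
struct sched_param {
    ...
    int sched_priority;
    ...
};
DESCRIPTION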
sched_setparam() sets the scheduling parameters associated with the scheduling policy for the process identified by pid. If pid is zero, then the parameters of the current process are set. The interpretation of the parameter param depends on the scheduling policy of the process identified by pid. See sched_setscheduler(2) for a description of the scheduling policies supported under Linux.
sched_getparam() retrieves the scheduling parameters for the process identified by pid. If pid is zero, then the parameters of the current process are retrieved.
sched_setparam() checks the validity of param for the scheduling policy of the process. The parameter param->sched_priority must lie within the range given by sched_get_priority_min(2) and sched_get_priority_max(2).
For a discussion of the privileges and resource limits related to scheduling priority and policy, see sched_setscheduler(2).
POSIX systems on which sched_setparam() and sched_getparam() are available define _POSIX_PRIORITY_SCHEDULING in <unistd.h>.
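Reading the scheduling parameters of a process can be sketched as follows. static_priority() is a hypothetical helper name; the sketch relies on static priorities being non-negative, as they are on Linux, so that -1 can serve as an error indicator.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <sys/types.h>   /* pid_t */

/*
 * Hypothetical helper: static priority of the process identified by pid
 * (0 means the calling process), or -1 on error with errno set.
 */
int static_priority(pid_t pid)
{
    assert(pid >= 0);

    struct sched_param param;
    if (sched_getparam(pid, &param) == -1)
        return -1;
    return param.sched_priority;
}
```

For a process running under SCHED_OTHER on Linux, static_priority(0) is expected to yield 0, since that policy only permits static priority 0.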
RETURN VALUE
On success, sched_setparam() and sched_getparam() return 0. On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
EINVAL | The parameter param does not make sense for the current scheduling policy. |
EPERM | The calling process does not have appropriate privileges (Linux: does not have the CAP_SYS_NICE capability). |
ESRCH | The process whose ID is pid could not be found. |
CONFORMING TO
POSIX.1-2001.
SEE ALSO
- getpriority (2)
- nice (2)
- sched_get_priority_max (2)
- sched_get_priority_min (2)
- sched_getaffinity (2)
- sched_getscheduler (2)
- sched_setaffinity (2)
- sched_setscheduler (2)
- setpriority (2)
Programming for the real world - POSIX.4 by Bill O. Gallmeister, O’Reilly & Associates, Inc., ISBN 1-56592-074-0
sched_get_priority_max() Function
sched_get_priority_max, sched_get_priority_min - get static priority range
SYNOPSIS
#include <sched.h>
int sched_get_priority_max(int policy);
int sched_get_priority_min(int policy);
DESCRIPTION
sched_get_priority_max() returns the maximum priority value that can be used with the scheduling algorithm identified by policy. sched_get_priority_min() returns the minimum priority value that can be used with the scheduling algorithm identified by policy. Supported policy values are SCHED_FIFO, SCHED_RR, SCHED_OTHER, and SCHED_BATCH. Further details about these policies can be found in sched_setscheduler(2).
Processes with numerically higher priority values are scheduled before processes with numerically lower priority values. Thus, the value returned by sched_get_priority_max() will be greater than the value returned by sched_get_priority_min().
Linux allows the static priority value range 1 to 99 for SCHED_FIFO and SCHED_RR and the priority 0 for SCHED_OTHER and SCHED_BATCH. Scheduling priority ranges for the various policies are not alterable.
The range of scheduling priorities may vary on other POSIX systems, thus it is a good idea for portable applications to use a virtual priority range and map it to the interval given by sched_get_priority_max() and sched_get_priority_min(). POSIX.1-2001 requires a spread of at least 32 between the maximum and the minimum values for SCHED_FIFO and SCHED_RR.
POSIX systems on which sched_get_priority_max() and sched_get_priority_min() are available define _POSIX_PRIORITY_SCHEDULING in <unistd.h>.
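The "virtual priority range" idea mentioned above can be sketched as a linear mapping. The helper name map_priority() and the choice of 0 to 100 as the virtual range are assumptions made for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>

/*
 * Hypothetical helper: map a virtual priority in the range 0..100 onto
 * the real priority interval of the given policy on this system.
 * Returns the mapped priority, or -1 if the policy is not recognized.
 */
int map_priority(int policy, int virtual_prio)
{
    assert(virtual_prio >= 0 && virtual_prio <= 100);

    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);
    if (lo == -1 || hi == -1)
        return -1;

    /* Linear interpolation; 0 maps to lo and 100 maps to hi. */
    return lo + (hi - lo) * virtual_prio / 100;
}
```

With the Linux ranges quoted above, map_priority(SCHED_FIFO, 0) gives 1 and map_priority(SCHED_FIFO, 100) gives 99, while every virtual priority maps to 0 for SCHED_OTHER.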
RETURN VALUE
On success, sched_get_priority_max() and sched_get_priority_min() return the maximum/minimum priority value for the named scheduling policy. On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
EINVAL | The argument policy does not identify a defined scheduling policy. |
CONFORMING TO
POSIX.1-2001.
SEE ALSO
- sched_getaffinity (2)
- sched_getparam (2)
- sched_getscheduler (2)
- sched_setaffinity (2)
- sched_setparam (2)
- sched_setscheduler (2)
Programming for the real world - POSIX.4 by Bill O. Gallmeister, O’Reilly & Associates, Inc., ISBN 1-56592-074-0
sched_get_priority_min() Function
sched_get_priority_max, sched_get_priority_min - get static priority range
SYNOPSIS
#include <sched.h>
int sched_get_priority_max(int policy);
int sched_get_priority_min(int policy);
DESCRIPTION
sched_get_priority_max() returns the maximum priority value that can be used with the scheduling algorithm identified by policy. sched_get_priority_min() returns the minimum priority value that can be used with the scheduling algorithm identified by policy. Supported policy values are SCHED_FIFO, SCHED_RR, SCHED_OTHER, and SCHED_BATCH. Further details about these policies can be found in sched_setscheduler(2).
Processes with numerically higher priority values are scheduled before processes with numerically lower priority values. Thus, the value returned by sched_get_priority_max() will be greater than the value returned by sched_get_priority_min().
Linux allows the static priority value range 1 to 99 for SCHED_FIFO and SCHED_RR and the priority 0 for SCHED_OTHER and SCHED_BATCH. Scheduling priority ranges for the various policies are not alterable.
The range of scheduling priorities may vary on other POSIX systems, thus it is a good idea for portable applications to use a virtual priority range and map it to the interval given by sched_get_priority_max() and sched_get_priority_min(). POSIX.1-2001 requires a spread of at least 32 between the maximum and the minimum values for SCHED_FIFO and SCHED_RR.
POSIX systems on which sched_get_priority_max() and sched_get_priority_min() are available define _POSIX_PRIORITY_SCHEDULING in <unistd.h>.
RETURN VALUE
On success, sched_get_priority_max() and sched_get_priority_min() return the maximum/minimum priority value for the named scheduling policy. On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
EINVAL | The argument policy does not identify a defined scheduling policy. |
CONFORMING TO
POSIX.1-2001.
SEE ALSO
- sched_getaffinity (2)
- sched_getparam (2)
- sched_getscheduler (2)
- sched_setaffinity (2)
- sched_setparam (2)
- sched_setscheduler (2)
Programming for the real world - POSIX.4 by Bill O. Gallmeister, O’Reilly & Associates, Inc., ISBN 1-56592-074-0
sched_setscheduler() Function
sched_setscheduler, sched_getscheduler - set and get scheduling algorithm/parameters
SYNOPSIS
#include <sched.h>
int sched_setscheduler(pid_t pid, int policy, const struct sched_param *param);
int sched_getscheduler(pid_t pid);
struct sched_param {
    ...
    int sched_priority;
    ...
};
DESCRIPTION
sched_setscheduler() sets both the scheduling policy and the associated parameters for the process identified by pid. If pid equals zero, the scheduler of the calling process will be set. The interpretation of the parameter param depends on the selected policy. Currently, the following scheduling policies are supported under Linux: SCHED_FIFO, SCHED_RR, SCHED_OTHER, and SCHED_BATCH; their respective semantics are described below.
sched_getscheduler() queries the scheduling policy currently applied to the process identified by pid. If pid equals zero, the policy of the calling process will be retrieved.
Scheduling Policies
The scheduler is the kernel part that decides which runnable process will be executed by the CPU next. The Linux scheduler offers three different scheduling policies, one for normal processes and two for real-time applications. A static priority value sched_priority is assigned to each process and this value can be changed only via system calls. Conceptually, the scheduler maintains a list of runnable processes for each possible sched_priority value, and sched_priority can have a value in the range 0 to 99. In order to determine the process that runs next, the Linux scheduler looks for the non-empty list with the highest static priority and takes the process at the head of this list. The scheduling policy determines, for each process, where it will be inserted into the list of processes with equal static priority and how it will move inside this list.
SCHED_OTHER is the default universal time-sharing scheduler policy used by most processes. SCHED_BATCH is intended for "batch" style execution of processes. SCHED_FIFO and SCHED_RR are intended for special time-critical applications that need precise control over the way in which runnable processes are selected for execution.
Processes scheduled with SCHED_OTHER or SCHED_BATCH must be assigned the static priority 0. Processes scheduled under SCHED_FIFO or SCHED_RR can have a static priority in the range 1 to 99. The system calls sched_get_priority_min() and sched_get_priority_max() can be used to find out the valid priority range for a scheduling policy in a portable way on all POSIX.1-2001 conforming systems.
All scheduling is preemptive: If a process with a higher static priority gets ready to run, the current process will be preempted and returned into its wait list. The scheduling policy only determines the ordering within the list of runnable processes with equal static priority.
SCHED_FIFO: First In-First Out scheduling
SCHED_FIFO can only be used with static priorities higher than 0, which means that when a SCHED_FIFO process becomes runnable, it will always immediately preempt any currently running SCHED_OTHER or SCHED_BATCH process. SCHED_FIFO is a simple scheduling algorithm without time slicing. For processes scheduled under the SCHED_FIFO policy, the following rules are applied:
- A SCHED_FIFO process that has been preempted by another process of higher priority will stay at the head of the list for its priority and will resume execution as soon as all processes of higher priority are blocked again.
- When a SCHED_FIFO process becomes runnable, it will be inserted at the end of the list for its priority.
- A call to sched_setscheduler() or sched_setparam() will put the SCHED_FIFO (or SCHED_RR) process identified by pid at the start of the list if it was runnable. As a consequence, it may preempt the currently running process if it has the same priority. (POSIX.1-2001 specifies that the process should go to the end of the list.)
- A process calling sched_yield() will be put at the end of the list.
No other events will move a process scheduled under the SCHED_FIFO policy in the wait list of runnable processes with equal static priority.
A SCHED_FIFO process runs until either it is blocked by an I/O request, it is preempted by a higher priority process, or it calls sched_yield().
SCHED_RR: Round Robin scheduling
SCHED_RR is a simple enhancement of SCHED_FIFO. Everything described above for SCHED_FIFO also applies to SCHED_RR, except that each process is only allowed to run for a maximum time quantum. If a SCHED_RR process has been running for a time period equal to or longer than the time quantum, it will be put at the end of the list for its priority. A SCHED_RR process that has been preempted by a higher priority process and subsequently resumes execution as a running process will complete the unexpired portion of its round robin time quantum. The length of the time quantum can be retrieved using sched_rr_get_interval(2).
SCHED_OTHER: Default Linux time-sharing scheduling
SCHED_OTHER can only be used at static priority 0. SCHED_OTHER is the standard Linux time-sharing scheduler that is intended for all processes that do not require special static priority real-time mechanisms. The process to run is chosen from the static priority 0 list based on a dynamic priority that is determined only inside this list. The dynamic priority is based on the nice level (set by nice(2) or setpriority(2)) and increased for each time quantum the process is ready to run, but denied to run by the scheduler. This ensures fair progress among all SCHED_OTHER processes.
SCHED_BATCH: Scheduling batch processes
(Since Linux 2.6.16.) SCHED_BATCH can only be used at static priority 0. This policy is similar to SCHED_OTHER, except that this policy will cause the scheduler to always assume that the process is CPU-intensive. Consequently, the scheduler will apply a small scheduling penalty so that this process is mildly disfavoured in scheduling decisions. This policy is useful for workloads that are non-interactive, but do not want to lower their nice value, and for workloads that want a deterministic scheduling policy without interactivity causing extra preemptions (between the workload’s tasks).
Privileges and resource limits
In Linux kernels before 2.6.12, only privileged (CAP_SYS_NICE) processes can set a non-zero static priority. The only change that an unprivileged process can make is to set the SCHED_OTHER policy, and this can only be done if the effective user ID of the caller of sched_setscheduler() matches the real or effective user ID of the target process (i.e., the process specified by pid) whose policy is being changed.
Since Linux 2.6.12, the RLIMIT_RTPRIO resource limit defines a ceiling on an unprivileged process’s priority for the SCHED_RR and SCHED_FIFO policies. If an unprivileged process has a non-zero RLIMIT_RTPRIO soft limit, then it can change its scheduling policy and priority, subject to the restriction that the priority cannot be set to a value higher than the RLIMIT_RTPRIO soft limit. If the RLIMIT_RTPRIO soft limit is 0, then the only permitted change is to lower the priority. Subject to the same rules, another unprivileged process can also make these changes, as long as the effective user ID of the process making the change matches the real or effective user ID of the target process. See getrlimit(2) for further information on RLIMIT_RTPRIO. Privileged (CAP_SYS_NICE) processes ignore this limit; as with older kernels, they can make arbitrary changes to scheduling policy and priority.
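Because of these privilege rules, a program that asks for a real-time policy should be prepared for the request to be refused. The following sketch uses a hypothetical helper, set_policy(), that applies a policy and priority to the calling process and reports the errno value on failure; it is an illustration of the calls described on this page, not a recommended wrapper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper: apply policy and static priority to the calling
 * process. Returns 0 on success, otherwise the errno value reported by
 * sched_setscheduler() (for example EPERM or EINVAL).
 */
int set_policy(int policy, int priority)
{
    assert(priority >= 0);

    struct sched_param param = { .sched_priority = priority };
    if (sched_setscheduler(0, policy, &param) == -1)
        return errno;
    return 0;
}
```

A caller might try set_policy(SCHED_FIFO, sched_get_priority_min(SCHED_FIFO)) and continue under its current policy if the result is EPERM. Requesting SCHED_OTHER with a non-zero priority is expected to yield EINVAL on Linux, since that policy requires static priority 0.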
Response time
A blocked high priority process waiting for the I/O has a certain response time before it is scheduled again. The device driver writer can greatly reduce this response time by using a "slow interrupt" interrupt handler.
Miscellaneous
Child processes inherit the scheduling algorithm and parameters across a fork(). The scheduling algorithm and parameters are preserved across execve(2).
Memory locking is usually needed for real-time processes to avoid paging delays; this can be done with mlock() or mlockall().
As a non-blocking endless loop in a process scheduled under SCHED_FIFO or SCHED_RR will block all processes with lower priority forever, a software developer should always keep available on the console a shell scheduled under a higher static priority than the tested application. This will allow an emergency kill of tested real-time applications that do not block or terminate as expected.
POSIX systems on which sched_setscheduler() and sched_getscheduler() are available define _POSIX_PRIORITY_SCHEDULING in <unistd.h>.
RETURN VALUE
On success, sched_setscheduler() returns zero. On success, sched_getscheduler() returns the policy for the process (a non-negative integer). On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
EINVAL | The scheduling policy is not one of the recognized policies, or the parameter param does not make sense for the policy. |
EPERM | The calling process does not have appropriate privileges. |
ESRCH | The process whose ID is pid could not be found. |
CONFORMING TO
POSIX.1-2001. The SCHED_BATCH policy is Linux specific.
NOTES
Standard Linux is a general-purpose operating system and can handle background processes, interactive applications, and soft real-time applications (applications that need to usually meet timing deadlines). This man page is directed at these kinds of applications.
Standard Linux is not designed to support hard real-time applications, that is, applications in which deadlines (often much shorter than a second) must be guaranteed or the system will fail catastrophically. Like all general-purpose operating systems, Linux is designed to maximize average case performance instead of worst case performance. Linux’s worst case performance for interrupt handling is much poorer than its average case, its various kernel locks (such as for SMP) produce long maximum wait times, and many of its performance improvement techniques decrease average time by increasing worst-case time. For most situations, that’s what you want, but if you truly are developing a hard real-time application, consider using hard real-time extensions to Linux such as RTLinux (http://www.rtlinux.org) or RTAI (http://www.rtai.org) or use a different operating system designed specifically for hard real-time applications.
SEE ALSO
- getpriority (2)
- mlock (2)
- mlockall (2)
- munlock (2)
- munlockall (2)
- nice (2)
- sched_get_priority_max (2)
- sched_get_priority_min (2)
- sched_getaffinity (2)
- sched_getparam (2)
- sched_rr_get_interval (2)
- sched_setaffinity (2)
- sched_setparam (2)
- sched_yield (2)
- setpriority (2)
Programming for the real world - POSIX.4 by Bill O. Gallmeister, O’Reilly & Associates, Inc., ISBN 1-56592-074-0
sched_rr_get_interval() Function
sched_rr_get_interval - get the SCHED_RR interval for the named process
SYNOPSIS
#include <sched.h>
int sched_rr_get_interval(pid_t pid, struct timespec *tp);
struct timespec {
    time_t tv_sec;    /* seconds */
    long   tv_nsec;   /* nanoseconds */
};
DESCRIPTION
sched_rr_get_interval() writes into the timespec structure pointed to by tp the round robin time quantum for the process identified by pid. If pid is zero, the time quantum for the calling process is written into *tp. The identified process should be running under the SCHED_RR scheduling policy.
The round robin time quantum value is not alterable under Linux 1.3.81.
POSIX systems on which sched_rr_get_interval() is available define _POSIX_PRIORITY_SCHEDULING in <unistd.h>.
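Retrieving the quantum and converting it to a single number can be sketched as follows. The helper names timespec_to_ns() and rr_quantum_ns() are hypothetical, and the sketch makes no assumption about what value a given kernel reports.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <sys/types.h>   /* pid_t */
#include <time.h>        /* struct timespec */

/* Hypothetical helper: convert a timespec to nanoseconds. */
long long timespec_to_ns(const struct timespec *ts)
{
    assert(ts != NULL);
    return (long long)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Hypothetical helper: round robin time quantum of the process pid
 * (0 means the calling process) in nanoseconds, or -1 on error with
 * errno set by sched_rr_get_interval().
 */
long long rr_quantum_ns(pid_t pid)
{
    struct timespec quantum;

    if (sched_rr_get_interval(pid, &quantum) == -1)
        return -1;
    return timespec_to_ns(&quantum);
}
```

Callers should check for -1 before using the result, since the call can fail with the errors listed below.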
RETURN VALUE
On success, sched_rr_get_interval() returns 0. On error, -1 is returned, and errno is set appropriately.
ERRORS
Tag | Description |
EFAULT | Problem with copying information to userspace. |
EINVAL | Invalid pid. |
ENOSYS | The system call is not yet implemented. |
ESRCH | The process whose ID is pid could not be found. |
CONFORMING TO
POSIX.1-2001.
BUGS
As of Linux 1.3.81 sched_rr_get_interval() returns with error ENOSYS, because SCHED_RR has not yet been fully implemented and tested properly.
SEE ALSO
Programming for the real world - POSIX.4 by Bill O. Gallmeister, O’Reilly & Associates, Inc., ISBN 1-56592-074-0
sched_setparam() function
sched_setparam, sched_getparam - set and get scheduling parameters
SYNOPSIS
#include <sched.h>
int sched_setparam(pid_t pid, const struct sched_param *param);
int sched_getparam(pid_t pid, struct sched_param *param);
struct sched_param {
    ...
    int sched_priority;
    ...
};
DESCRIPTION
sched_setparam() sets the scheduling parameters associated with the scheduling policy for the process identified by pid. If pid is zero, then the parameters of the current process are set. The interpretation of the parameter param depends on the scheduling policy of the process identified by pid. See sched_setscheduler(2) for a description of the scheduling policies supported under Linux.
sched_getparam() retrieves the scheduling parameters for the process identified by pid. If pid is zero, then the parameters of the current process are retrieved.
sched_setparam() checks the validity of param for the scheduling policy of the process. The parameter param->sched_priority must lie within the range given by sched_get_priority_min(2) and sched_get_priority_max(2).
For a discussion of the privileges and resource limits related to scheduling priority and policy, see sched_setscheduler(2).
POSIX systems on which sched_setparam() and sched_getparam() are available define _POSIX_PRIORITY_SCHEDULING in <unistd.h>.
RETURN VALUE
On success, sched_setparam() and sched_getparam() return 0. On error, -1 is returned, and errno is set appropriately.
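The range rule for param->sched_priority can be checked from user space. The sketch below, under the assumption that the calling process may query its own scheduling state, reads the current policy and priority and compares them with the limits for that policy. The helper name priority_in_range() is hypothetical.

```c
#include <sched.h>

/* Hypothetical helper: returns 1 if the calling process's current
 * sched_priority lies within the range allowed for its scheduling
 * policy, 0 if it does not, and -1 if any of the calls fails. */
int priority_in_range(void)
{
    struct sched_param param;
    int policy, lo, hi;

    policy = sched_getscheduler(0);          /* 0 = calling process */
    if (policy == -1 || sched_getparam(0, &param) == -1)
        return -1;

    lo = sched_get_priority_min(policy);
    hi = sched_get_priority_max(policy);
    if (lo == -1 || hi == -1)
        return -1;

    return param.sched_priority >= lo && param.sched_priority <= hi;
}
```

The same comparison is a reasonable pre-check before calling sched_setparam() with a new priority, since an out-of-range value is rejected with EINVAL.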
ERRORS
Tag | Description |
EINVAL | The parameter param does not make sense for the current scheduling policy. |
EPERM | The calling process does not have appropriate privileges (Linux: does not have the CAP_SYS_NICE capability). |
ESRCH | The process whose ID is pid could not be found. |
CONFORMING TO
POSIX.1-2001.
SEE ALSO
- getpriority (2)
- nice (2)
- sched_get_priority_max (2)
- sched_get_priority_min (2)
- sched_getaffinity (2)
- sched_getscheduler (2)
- sched_setaffinity (2)
- sched_setscheduler (2)
- setpriority (2)
Programming for the real world - POSIX.4 by Bill O. Gallmeister, O’Reilly & Associates, Inc., ISBN 1-56592-074-0
sched_yield() function
SYNOPSIS
#include <sched.h>
int sched_yield(void);
DESCRIPTION
A process can relinquish the processor voluntarily without blocking by calling sched_yield(). The process will then be moved to the end of the queue for its static priority and a new process gets to run.
Note: If the current process is the only process in the highest priority list at that time, this process will continue to run after a call to sched_yield().
POSIX systems on which sched_yield() is available define _POSIX_PRIORITY_SCHEDULING in <unistd.h>.
RETURN VALUE
On success, sched_yield() returns 0. On error, -1 is returned, and errno is set appropriately.
CONFORMING TO
POSIX.1-2001.
SEE ALSO
Programming for the real world - POSIX.4 by Bill O. Gallmeister, O’Reilly & Associates, Inc., ISBN 1-56592-074-0
security() function
afs_syscall, break, fattach, fdetach, ftime, getmsg, getpmsg, gtty, isastream, lock, mpx, multiplexer, prof, profil, putmsg, putpmsg, security, stty, ulimit, vserver - unimplemented system calls
SYNOPSIS
Unimplemented system calls.
DESCRIPTION
These system calls are not implemented in the Linux 2.4 kernel.
RETURN VALUE
These system calls always return -1 and set errno to ENOSYS.
NOTES
Note that ftime(3), profil(3) and ulimit(3) are implemented as library functions.
Some system calls, like alloc_hugepages(2), free_hugepages(2), ioperm(2), iopl(2), and vm86(2) only exist on certain architectures.
Some system calls, like ipc(2), create_module(2), init_module(2), and delete_module(2) only exist when the Linux kernel was built with support for them.
SEE ALSO
select() function
select, pselect, FD_CLR, FD_ISSET, FD_SET, FD_ZERO - synchronous I/O multiplexing
SYNOPSIS
/* According to POSIX.1-2001 */
#include <sys/select.h>
/* According to earlier standards */
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
void FD_CLR(int fd, fd_set *set);
int FD_ISSET(int fd, fd_set *set);
void FD_SET(int fd, fd_set *set);
void FD_ZERO(fd_set *set);
#define _XOPEN_SOURCE 600
#include <sys/select.h>
int pselect(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, const struct timespec *timeout, const sigset_t *sigmask);
DESCRIPTION
select() and pselect() allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without blocking.
The operation of select() and pselect() is identical, with three differences:
Tag | Description |
(i) | select() uses a timeout that is a struct timeval (with seconds and microseconds), while pselect() uses a struct timespec (with seconds and nanoseconds). |
(ii) | select() may update the timeout argument to indicate how much time was left. pselect() does not change this argument. |
(iii) | select() has no sigmask argument, and behaves as pselect() called with NULL sigmask. |
Three independent sets of file descriptors are watched. Those listed in readfds will be watched to see if characters become available for reading (more precisely, to see if a read will not block; in particular, a file descriptor is also ready on end-of-file), those in writefds will be watched to see if a write will not block, and those in exceptfds will be watched for exceptions. On exit, the sets are modified in place to indicate which file descriptors actually changed status. Each of the three file descriptor sets may be specified as NULL if no file descriptors are to be watched for the corresponding class of events.
Four macros are provided to manipulate the sets. FD_ZERO() clears a set. FD_SET() and FD_CLR() respectively add and remove a given file descriptor from a set. FD_ISSET() tests to see if a file descriptor is part of the set; this is useful after select() returns.
nfds is the highest-numbered file descriptor in any of the three sets, plus 1.
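The macros and the nfds rule fit together as in the following sketch. The helper name build_fd_set() is hypothetical. It also rejects descriptors that do not fit into an fd_set, which can only hold values from 0 to FD_SETSIZE - 1.

```c
#include <sys/select.h>

/* Hypothetical helper: clears *set, adds each of the n descriptors in
 * fds[] to it, and returns the nfds value to pass to select()
 * (highest descriptor plus 1). Returns -1 if a descriptor is outside
 * the range that an fd_set can hold. */
int build_fd_set(const int *fds, int n, fd_set *set)
{
    int i, maxfd = -1;

    FD_ZERO(set);
    for (i = 0; i < n; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            return -1;
        FD_SET(fds[i], set);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    return maxfd + 1;
}
```

Because select() overwrites the sets on return, a helper like this would be called again before every select() call.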
timeout is an upper bound on the amount of time elapsed before select() returns. It may be zero, causing select() to return immediately. (This is useful for polling.) If timeout is NULL (no timeout), select() can block indefinitely.
sigmask is a pointer to a signal mask (see sigprocmask(2)); if it is not NULL, then pselect() first replaces the current signal mask by the one pointed to by sigmask, then does the "select" function, and then restores the original signal mask.
Other than the difference in the precision of the timeout argument, the following pselect() call:
ready = pselect(nfds, &readfds, &writefds, &exceptfds, timeout, &sigmask); |
is equivalent to atomically executing the following calls:
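sigset_t origmask;
sigprocmask(SIG_SETMASK, &sigmask, &origmask);
ready = select(nfds, &readfds, &writefds, &exceptfds, timeout);
sigprocmask(SIG_SETMASK, &origmask, NULL);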
The reason that pselect() is needed is that if one wants to wait for either a signal or for a file descriptor to become ready, then an atomic test is needed to prevent race conditions. (Suppose the signal handler sets a global flag and returns. Then a test of this global flag followed by a call of select() could hang indefinitely if the signal arrived just after the test but just before the call. By contrast, pselect() allows one to first block signals, handle the signals that have come in, then call pselect() with the desired sigmask, avoiding the race.)
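The timeout
The time structures involved are defined in <sys/time.h> and look like
struct timeval {
    long tv_sec;     /* seconds */
    long tv_usec;    /* microseconds */
};
and
struct timespec {
    long tv_sec;     /* seconds */
    long tv_nsec;    /* nanoseconds */
};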
(However, see below on the POSIX.1-2001 versions.)
Some code calls select() with all three sets empty, nfds zero, and a non-NULL timeout as a fairly portable way to sleep with subsecond precision.
On Linux, select() modifies timeout to reflect the amount of time not slept; most other implementations do not do this. (POSIX.1-2001 permits either behaviour.) This causes problems both when Linux code which reads timeout is ported to other operating systems, and when code is ported to Linux that reuses a struct timeval for multiple select()s in a loop without reinitializing it. Consider timeout to be undefined after select() returns.
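One way to follow this advice is to rebuild the struct timeval immediately before every call. The sketch below does so in a small wrapper. The name wait_readable() and the millisecond argument are assumptions made for illustration, and a retry after EINTR restarts the full timeout rather than the remaining time.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: waits until fd is readable or timeout_ms
 * milliseconds pass. Returns 1 if readable, 0 on timeout, -1 on error.
 * The timeval is rebuilt before every select() call because its
 * contents must be treated as undefined once select() returns. */
int wait_readable(int fd, long timeout_ms)
{
    for (;;) {
        fd_set rfds;
        struct timeval tv;
        int r;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        r = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (r == -1 && errno == EINTR)
            continue;       /* interrupted: retry with a fresh timeout */
        if (r == -1)
            return -1;
        return r > 0;
    }
}
```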
RETURN VALUE
On success, select() and pselect() return the number of file descriptors contained in the three returned descriptor sets (that is, the total number of bits that are set in readfds, writefds, exceptfds), which may be zero if the timeout expires before anything interesting happens. On error, -1 is returned, and errno is set appropriately; the sets and timeout become undefined, so do not rely on their contents after an error.
ERRORS
Tag | Description |
EBADF | An invalid file descriptor was given in one of the sets. (Perhaps a file descriptor that was already closed, or one on which an error has occurred.) |
EINTR | A signal was caught. |
EINVAL | nfds is negative or the value contained within timeout is invalid. |
ENOMEM | unable to allocate memory for internal tables. |
EXAMPLE
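#include <stdio.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
int main(void)
{
    fd_set rfds;
    struct timeval tv;
    int retval;
    /* Watch stdin (fd 0) to see when it has input. */
    FD_ZERO(&rfds);
    FD_SET(0, &rfds);
    /* Wait up to five seconds. */
    tv.tv_sec = 5;
    tv.tv_usec = 0;
    retval = select(1, &rfds, NULL, NULL, &tv);
    /* Don't rely on the value of tv now! */
    if (retval == -1)
        perror("select()");
    else if (retval)
        printf("Data is available now.\n");   /* FD_ISSET(0, &rfds) will be true. */
    else
        printf("No data within five seconds.\n");
    return 0;
}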
CONFORMING TO
select() conforms to POSIX.1-2001 and 4.4BSD (select() first appeared in 4.2BSD). Generally portable to/from non-BSD systems supporting clones of the BSD socket layer (including System V variants). However, note that the System V variant typically sets the timeout variable before exit, but the BSD variant does not.
pselect() is defined in POSIX.1g, and in POSIX.1-2001.
NOTES
An fd_set is a fixed size buffer. Executing FD_CLR() or FD_SET() with a value of fd that is negative or is equal to or larger than FD_SETSIZE will result in undefined behavior. Moreover, POSIX requires fd to be a valid file descriptor.
Concerning the types involved, the classical situation is that the two fields of a timeval structure are longs (as shown above), and the structure is defined in <sys/time.h>. The POSIX.1-2001 situation is
struct timeval {
    time_t      tv_sec;     /* seconds */
    suseconds_t tv_usec;    /* microseconds */
};
where the structure is defined in <sys/select.h> and the data types time_t and suseconds_t are defined in <sys/types.h>.
Concerning prototypes, the classical situation is that one should include <time.h> for select(). The POSIX.1-2001 situation is that one should include <sys/select.h> for select() and pselect(). Libc4 and libc5 do not have a <sys/select.h> header; under glibc 2.0 and later this header exists. Under glibc 2.0 it unconditionally gives the wrong prototype for pselect(), under glibc 2.1-2.2.1 it gives pselect() when _GNU_SOURCE is defined, under glibc 2.2.2-2.2.4 it gives it when _XOPEN_SOURCE is defined and has a value of 600 or larger. No doubt, since POSIX.1-2001, it should give the prototype by default.
VERSIONS
pselect() was added to Linux in kernel 2.6.16. Prior to this, pselect() was emulated in glibc (but see BUGS).
LINUX NOTES
The Linux pselect() system call modifies its timeout argument. However, the glibc wrapper function hides this behaviour by using a local variable for the timeout argument that is passed to the system call. Thus, the glibc pselect() function does not modify its timeout argument; this is the behaviour required by POSIX.1-2001.
BUGS
Glibc 2.0 provided a version of pselect() that did not take a sigmask argument.
Since version 2.1, glibc has provided an emulation of pselect() that is implemented using sigprocmask(2) and select(). This implementation remains vulnerable to the very race condition that pselect() was designed to prevent. On systems that lack pselect(), reliable (and more portable) signal trapping can be achieved using the self-pipe trick (where a signal handler writes a byte to a pipe whose other end is monitored by select() in the main program).
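A minimal sketch of the self-pipe trick follows. The names self_pipe, self_pipe_init() and self_pipe_wait() are hypothetical, and making both pipe ends non-blocking is a design choice of this sketch so that neither the handler nor the reader can stall.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

static int self_pipe[2] = { -1, -1 };

/* Signal handler: only writes one byte; write() is async-signal-safe. */
static void on_signal(int sig)
{
    int saved_errno = errno;
    unsigned char b = (unsigned char) sig;
    ssize_t n = write(self_pipe[1], &b, 1);   /* a full pipe drops the byte */

    (void) n;
    errno = saved_errno;
}

/* Creates the pipe (both ends non-blocking) and installs the handler. */
int self_pipe_init(int signo)
{
    struct sigaction sa;
    int i, flags;

    if (pipe(self_pipe) == -1)
        return -1;
    for (i = 0; i < 2; i++) {
        flags = fcntl(self_pipe[i], F_GETFL);
        if (flags == -1 ||
            fcntl(self_pipe[i], F_SETFL, flags | O_NONBLOCK) == -1)
            return -1;
    }
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Blocks in select() until a signal byte arrives; returns the signal
 * number read from the pipe, or -1 on error. */
int self_pipe_wait(void)
{
    for (;;) {
        fd_set rfds;
        unsigned char b;

        FD_ZERO(&rfds);
        FD_SET(self_pipe[0], &rfds);
        if (select(self_pipe[0] + 1, &rfds, NULL, NULL, NULL) == -1) {
            if (errno == EINTR)
                continue;   /* the handler ran; loop to read the byte */
            return -1;
        }
        if (read(self_pipe[0], &b, 1) == 1)
            return b;
    }
}
```

In a real program the read end of the pipe would simply be added to the same read set as the other descriptors being monitored.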
Under Linux, select() may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks. This could for example happen when data has arrived but upon examination has wrong checksum and is discarded. There may be other circumstances in which a file descriptor is spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on sockets that should not block.
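Setting O_NONBLOCK as suggested can be done with fcntl(2). The sketch below uses two hypothetical helpers, set_nonblocking() and read_nowait(). The return code -2 for "reported ready, but nothing to read" is an arbitrary convention of this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: adds O_NONBLOCK to the file status flags of fd.
 * Returns 0 on success, -1 on error (errno set by fcntl()). */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: reads without blocking. Returns the number of
 * bytes read, 0 on end of file, -1 on a real error, or -2 if no data
 * is actually available (EAGAIN/EWOULDBLOCK). */
ssize_t read_nowait(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);

    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```

A caller that gets -2 would go back to select() instead of treating the result as a failure.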
SEE ALSO
select_tut(2).
For vaguely related stuff, see accept(2), connect(2), poll(2), read(2), recv(2), send(2), sigprocmask(2), write(2), epoll(7), feature_test_macros(7).
select_tut() function
select, pselect, FD_CLR, FD_ISSET, FD_SET, FD_ZERO - synchronous I/O multiplexing
SYNOPSIS
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *utimeout);
int pselect(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, const struct timespec *ntimeout, sigset_t *sigmask);
FD_CLR(int fd, fd_set *set);
FD_ISSET(int fd, fd_set *set);
FD_SET(int fd, fd_set *set);
FD_ZERO(fd_set *set);
DESCRIPTION
select() (or pselect()) is the pivot function of most C programs that handle more than one simultaneous file descriptor (or socket handle) in an efficient manner. Its principal arguments are three arrays of file descriptors: readfds, writefds, and exceptfds. The way that select() is usually used is to block while waiting for a "change of status" on one or more of the file descriptors. A "change of status" is when more characters become available from the file descriptor, or when space becomes available within the kernel’s internal buffers for more to be written to the file descriptor, or when a file descriptor goes into error (in the case of a socket or pipe this is when the other end of the connection is closed).
In summary, select() just watches multiple file descriptors, and is the standard Unix call to do so.
The arrays of file descriptors are called file descriptor sets. Each set is declared as type fd_set, and its contents can be altered with the macros FD_CLR(), FD_ISSET(), FD_SET(), and FD_ZERO(). FD_ZERO() is usually the first function to be used on a newly declared set. Thereafter, the individual file descriptors that you are interested in can be added one by one with FD_SET(). select() modifies the contents of the sets according to the rules described below; after calling select() you can test if your file descriptor is still present in the set with the FD_ISSET() macro. FD_ISSET() returns non-zero if the descriptor is present and zero if it is not. FD_CLR() removes a file descriptor from the set.
ARGUMENTS
Tag | Description |
readfds | This set is watched to see if data is available for reading from any of its file descriptors. After select() has returned, readfds will be cleared of all file descriptors except for those that are immediately available for reading with a recv() (for sockets) or read() (for pipes, files, and sockets) call. |
writefds | This set is watched to see if there is space to write data to any of its file descriptors. After select() has returned, writefds will be cleared of all file descriptors except for those that are immediately available for writing with a send() (for sockets) or write() (for pipes, files, and sockets) call. |
exceptfds | This set is watched for exceptions or errors on any of the file descriptors. However, that is actually just a rumor. How you use exceptfds is to watch for out-of-band (OOB) data. OOB data is data sent on a socket using the MSG_OOB flag, and hence exceptfds only really applies to sockets. See recv(2) and send(2) about this. After select() has returned, exceptfds will be cleared of all file descriptors except for those descriptors that are available for reading OOB data. You can only ever read one byte of OOB data though (which is done with recv()), and writing OOB data (done with send()) can be done at any time and will not block. Hence there is no need for a fourth set to check if a socket is available for writing OOB data. |
nfds | This is an integer one more than the maximum of any file descriptor in any of the sets. In other words, while you are busy adding file descriptors to your sets, you must calculate the maximum integer value of all of them, then increment this value by one, and then pass this as nfds to select(). |
utimeout | This is the longest time select() may wait before returning, even if nothing interesting happened. If this value is passed as NULL, then select() blocks indefinitely waiting for an event. utimeout can be set to zero seconds, which causes select() to return immediately. The structure struct timeval is defined as: struct timeval { time_t tv_sec; /* seconds */ long tv_usec; /* microseconds */ }; |
ntimeout | This argument has the same meaning as utimeout, but struct timespec has nanosecond precision, as follows: struct timespec { long tv_sec; /* seconds */ long tv_nsec; /* nanoseconds */ }; |
sigmask | This argument holds a set of signals to allow while performing a pselect() call (see sigaddset(3) and sigprocmask(2)). It can be passed as NULL, in which case it does not modify the set of allowed signals on entry and exit to the function. It will then behave just like select(). |
COMBINING SIGNAL AND DATA EVENTS
pselect() must be used if you are waiting for a signal as well as data from a file descriptor. Programs that receive signals as events normally use the signal handler only to raise a global flag. The global flag will indicate that the event must be processed in the main loop of the program. A signal will cause the select() (or pselect()) call to return with errno set to EINTR. This behavior is essential so that signals can be processed in the main loop of the program, otherwise select() would block indefinitely. Now, somewhere in the main loop will be a conditional to check the global flag. So we must ask: what if a signal arrives after the conditional, but before the select() call? The answer is that select() would block indefinitely, even though an event is actually pending. This race condition is solved by the pselect() call. This call can be used to mask out signals that are not to be received except within the pselect() call. For instance, let us say that the event in question was the exit of a child process. Before the start of the main loop, we would block SIGCHLD using sigprocmask(). Our pselect() call would enable SIGCHLD by using the virgin signal mask. Our program would look like:
int child_events = 0; |
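The following is a minimal, self-contained sketch of the pattern just described, not the original listing. The function name wait_for_child_event() and the use of sigaction() instead of signal() are choices made for this illustration. SIGCHLD stays blocked except while pselect() is sleeping, so the flag test and the sleep cannot race.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t child_events = 0;

static void child_sig_handler(int sig)
{
    (void) sig;
    child_events++;
}

/* Forks a child that exits at once, then waits for the SIGCHLD event
 * using pselect(). Returns the number of events seen (>= 1), or -1 on
 * error. */
int wait_for_child_event(void)
{
    sigset_t block, orig, during;
    struct sigaction sa;
    pid_t pid;
    int events;

    /* Block SIGCHLD everywhere except inside pselect(). */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &orig) == -1)
        return -1;
    during = orig;
    sigdelset(&during, SIGCHLD);    /* make sure SIGCHLD is deliverable */

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = child_sig_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(0);

    /* The signal can only be delivered while pselect() has swapped in
     * the 'during' mask, never between the test and the call. */
    while (child_events == 0) {
        if (pselect(0, NULL, NULL, NULL, NULL, &during) == -1
            && errno != EINTR)
            return -1;
    }
    events = child_events;
    child_events = 0;

    waitpid(pid, NULL, 0);
    sigprocmask(SIG_SETMASK, &orig, NULL);
    return events;
}
```

A real main loop would pass its descriptor sets to pselect() in place of the NULL arguments and process child_events at the top of each iteration. The sketch leaves the SIGCHLD handler installed when it returns.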
PRACTICAL
So what is the point of select()? Can’t I just read and write to my descriptors whenever I want? The point of select() is that it watches multiple descriptors at the same time and properly puts the process to sleep if there is no activity. It does this while enabling you to handle multiple simultaneous pipes and sockets. Unix programmers often find themselves in a position where they have to handle I/O from more than one file descriptor where the data flow may be intermittent. If you were to merely create a sequence of read() and write() calls, you would find that one of your calls may block waiting for data from/to a file descriptor, while another file descriptor is unused though available for data. select() efficiently copes with this situation.
A simple example of the use of select() can be found in the select(2) manual page.
PORT FORWARDING EXAMPLE
Here is an example that better demonstrates the true utility of select(). The listing below is a TCP forwarding program that forwards from one TCP port to another.
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/types.h>
#include <string.h>
#include <signal.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
static int forward_port;
#undef max
#define max(x,y) ((x) > (y) ? (x) : (y))
static int listen_socket (int listen_port) {
struct sockaddr_in a;
int s;
int yes;
if ((s = socket (AF_INET, SOCK_STREAM, 0)) < 0) {
perror ("socket");
return -1;
}
yes = 1;
if (setsockopt
(s, SOL_SOCKET, SO_REUSEADDR,
(char *) &yes, sizeof (yes)) < 0) {
perror ("setsockopt");
close (s);
return -1;
}
memset (&a, 0, sizeof (a));
a.sin_port = htons (listen_port);
a.sin_family = AF_INET;
if (bind
(s, (struct sockaddr *) &a, sizeof (a)) < 0) {
perror ("bind");
close (s);
return -1;
}
printf ("accepting connections on port %d\n",
(int) listen_port);
listen (s, 10);
return s;
}
static int connect_socket (int connect_port,
char *address) {
struct sockaddr_in a;
int s;
if ((s = socket (AF_INET, SOCK_STREAM, 0)) < 0) {
perror ("socket");
return -1;
}
memset (&a, 0, sizeof (a));
a.sin_port = htons (connect_port);
a.sin_family = AF_INET;
if (!inet_aton
(address,
(struct in_addr *) &a.sin_addr.s_addr)) {
perror ("bad IP address format");
close (s);
return -1;
}
if (connect
(s, (struct sockaddr *) &a,
sizeof (a)) < 0) {
perror ("connect()");
shutdown (s, SHUT_RDWR);
close (s);
return -1;
}
return s;
}
#define SHUT_FD1 { \
if (fd1 >= 0) { \
shutdown (fd1, SHUT_RDWR); \
close (fd1); \
fd1 = -1; \
} \
}
#define SHUT_FD2 { \
if (fd2 >= 0) { \
shutdown (fd2, SHUT_RDWR); \
close (fd2); \
fd2 = -1; \
} \
}
#define BUF_SIZE 1024
int main (int argc, char **argv) {
int h;
int fd1 = -1, fd2 = -1;
char buf1[BUF_SIZE], buf2[BUF_SIZE];
int buf1_avail = 0, buf1_written = 0;
int buf2_avail = 0, buf2_written = 0;
if (argc != 4) {
fprintf (stderr,
"Usage\n\tfwd <listen-port> "
"<forward-to-port> <forward-to-ip-address>\n");
exit (1);
}
signal (SIGPIPE, SIG_IGN);
forward_port = atoi (argv[2]);
h = listen_socket (atoi (argv[1]));
if (h < 0)
exit (1);
for (;;) {
int r, nfds = 0;
fd_set rd, wr, er;
FD_ZERO (&rd);
FD_ZERO (&wr);
FD_ZERO (&er);
FD_SET (h, &rd);
nfds = max (nfds, h);
if (fd1 > 0 && buf1_avail < BUF_SIZE) {
FD_SET (fd1, &rd);
nfds = max (nfds, fd1);
}
if (fd2 > 0 && buf2_avail < BUF_SIZE) {
FD_SET (fd2, &rd);
nfds = max (nfds, fd2);
}
if (fd1 > 0
&& buf2_avail - buf2_written > 0) {
FD_SET (fd1, &wr);
nfds = max (nfds, fd1);
}
if (fd2 > 0
&& buf1_avail - buf1_written > 0) {
FD_SET (fd2, &wr);
nfds = max (nfds, fd2);
}
if (fd1 > 0) {
FD_SET (fd1, &er);
nfds = max (nfds, fd1);
}
if (fd2 > 0) {
FD_SET (fd2, &er);
nfds = max (nfds, fd2);
}
r = select (nfds + 1, &rd, &wr, &er, NULL);
if (r == -1 && errno == EINTR)
continue;
if (r < 0) {
perror ("select()");
exit (1);
}
if (FD_ISSET (h, &rd)) {
socklen_t l;
struct sockaddr_in client_address;
memset (&client_address, 0, l =
sizeof (client_address));
r = accept (h, (struct sockaddr *)
&client_address, &l);
if (r < 0) {
perror ("accept()");
} else {
SHUT_FD1;
SHUT_FD2;
buf1_avail = buf1_written = 0;
buf2_avail = buf2_written = 0;
fd1 = r;
fd2 =
connect_socket (forward_port,
argv[3]);
if (fd2 < 0) {
SHUT_FD1;
} else
printf ("connect from %s\n",
inet_ntoa
(client_address.sin_addr));
}
}
/* NB: read oob data before normal reads */
if (fd1 > 0)
if (FD_ISSET (fd1, &er)) {
char c;
errno = 0;
r = recv (fd1, &c, 1, MSG_OOB);
if (r < 1) {
SHUT_FD1;
} else
send (fd2, &c, 1, MSG_OOB);
}
if (fd2 > 0)
if (FD_ISSET (fd2, &er)) {
char c;
errno = 0;
r = recv (fd2, &c, 1, MSG_OOB);
if (r < 1) {
SHUT_FD2;
} else
send (fd1, &c, 1, MSG_OOB);
}
if (fd1 > 0)
if (FD_ISSET (fd1, &rd)) {
r =
read (fd1, buf1 + buf1_avail,
BUF_SIZE - buf1_avail);
if (r < 1) {
SHUT_FD1;
} else
buf1_avail += r;
}
if (fd2 > 0)
if (FD_ISSET (fd2, &rd)) {
r =
read (fd2, buf2 + buf2_avail,
BUF_SIZE - buf2_avail);
if (r < 1) {
SHUT_FD2;
} else
buf2_avail += r;
}
if (fd1 > 0)
if (FD_ISSET (fd1, &wr)) {
r =
write (fd1,
buf2 + buf2_written,
buf2_avail -
buf2_written);
if (r < 1) {
SHUT_FD1;
} else
buf2_written += r;
}
if (fd2 > 0)
if (FD_ISSET (fd2, &wr)) {
r =
write (fd2,
buf1 + buf1_written,
buf1_avail -
buf1_written);
if (r < 1) {
SHUT_FD2;
} else
buf1_written += r;
}
/* check if write data has caught read data */
if (buf1_written == buf1_avail)
buf1_written = buf1_avail = 0;
if (buf2_written == buf2_avail)
buf2_written = buf2_avail = 0;
/* one side has closed the connection, keep
writing to the other side until empty */
if (fd1 < 0
&& buf1_avail - buf1_written == 0) {
SHUT_FD2;
}
if (fd2 < 0
&& buf2_avail - buf2_written == 0) {
SHUT_FD1;
}
}
return 0;
}
The above program properly forwards most kinds of TCP connections, including OOB signal data transmitted by telnet servers. It handles the tricky problem of having data flow in both directions simultaneously. You might think it more efficient to use a fork() call and devote a thread to each stream. This becomes more tricky than you might suspect. Another idea is to set non-blocking I/O using an ioctl() call. This also has its problems because you end up having to have inefficient timeouts.
The program does not handle more than one simultaneous connection at a time, although it could easily be extended to do this with a linked list of buffers — one for each connection. At the moment, new connections cause the current connection to be dropped.
SELECT LAW
Many people who try to use select() come across behavior that is difficult to understand and produces non-portable or borderline results. For instance, the above program is carefully written not to block at any point, even though it does not set its file descriptors to non-blocking mode at all (see ioctl(2)). It is easy to introduce subtle errors that will remove the advantage of using select(), hence I will present a list of essentials to watch for when using the select() call.
Tag | Description |
1. | You should always try to use select() without a timeout. Your program should have nothing to do if there is no data available. Code that depends on timeouts is not usually portable and is difficult to debug. |
2. | The value nfds must be properly calculated for efficiency as explained above. |
3. | No file descriptor must be added to any set if you do not intend to check its result after the select() call, and respond appropriately. See next rule. |
4. | After select() returns, all file descriptors in all sets should be checked to see if they are ready. |
5. | The functions read(), recv(), write(), and send() do not necessarily read/write the full amount of data that you have requested. If they do read/write the full amount, it's because you have a low traffic load and a fast stream. This is not always going to be the case. You should cope with the case of your functions only managing to send or receive a single byte. |
6. | Never read/write only in single bytes at a time unless you are really sure that you have a small amount of data to process. It is extremely inefficient not to read/write as much data as you can buffer each time. The buffers in the example above are 1024 bytes although they could easily be made larger. |
7. | The functions read(), recv(), write(), and send() as well as the select() call can return -1 with errno set to EINTR, or with errno set to EAGAIN (EWOULDBLOCK). These results must be properly managed (not done properly above). If your program is not going to receive any signals then it is unlikely you will get EINTR. If your program does not set non-blocking I/O, you will not get EAGAIN. Nonetheless you should still cope with these errors for completeness. |
8. | Never call read(), recv(), write(), or send() with a buffer length of zero. |
9. | If the functions read(), recv(), write(), and send() fail with errors other than those listed in 7., or one of the input functions returns 0, indicating end of file, then you should not pass that descriptor to select() again. In the above example, I close the descriptor immediately, and then set it to -1 to prevent it being included in a set. |
10. | The timeout value must be initialized with each new call to select(), since some operating systems modify the structure. pselect() however does not modify its timeout structure. |
11. | I have heard that the Windows socket layer does not cope with OOB data properly. It also does not cope with select() calls when no file descriptors are set at all. Having no file descriptors set is a useful way to sleep the process with sub-second precision by using the timeout. (See further on.) |
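Rules 5, 7 and 8 can be sketched for the simple case of a blocking descriptor used outside the select() loop. The helper name write_all() is hypothetical. Inside a select() loop you would instead keep an offset, as the forwarding example does with buf1_written, and return to select() between writes.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes all len bytes of buf to fd, coping with
 * short writes and EINTR. Returns 0 when everything was written and
 * -1 on any other error. Intended for a blocking descriptor; with
 * O_NONBLOCK set, EAGAIN is reported to the caller as an error. */
int write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {            /* never issues a zero-length write */
        ssize_t n = write(fd, buf + done, len - done);

        if (n == -1) {
            if (errno == EINTR)
                continue;           /* interrupted before writing: retry */
            return -1;
        }
        done += (size_t) n;         /* may be fewer bytes than requested */
    }
    return 0;
}
```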
USLEEP EMULATION
On systems that do not have a usleep() function, you can call select() with a finite timeout and no file descriptors as follows:
struct timeval tv; |
This is only guaranteed to work on Unix systems, however.
RETURN VALUE
On success, select() returns the total number of file descriptors still present in the file descriptor sets.
If select() timed out, then the return value will be zero. The file descriptor sets should all be empty (but may not be on some systems).
A return value of -1 indicates an error, with errno being set appropriately. In the case of an error, the returned sets and the timeout struct contents are undefined and should not be used. pselect() however never modifies ntimeout.
NOTES
Generally speaking, all operating systems that support sockets also support select(). Many types of programs become extremely complicated without the use of select(). select() can be used to solve many problems in a portable and efficient way that naive programmers try to solve in a more complicated manner using threads, forking, IPCs, signals, memory sharing, and so on.
The poll(2) system call has the same functionality as select(), and is somewhat more efficient when monitoring sparse file descriptor sets. It is nowadays widely available, but historically was less portable than select().
The Linux-specific epoll(7) API provides an interface that is more efficient than select(2) and poll(2) when monitoring large numbers of file descriptors.
SEE ALSO
semctl() function
SYNOPSIS
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
int semctl(int semid, int semnum, int cmd, ...);
DESCRIPTION
semctl() performs the control operation specified by cmd on the semaphore set identified by semid, or on the semnum-th semaphore of that set. (The semaphores in a set are numbered starting at 0.)
This function has three or four arguments, depending on cmd. When there are four, the fourth has the type union semun. The calling program must define this union as follows:
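union semun {
    int              val;      /* Value for SETVAL */
    struct semid_ds *buf;      /* Buffer for IPC_STAT, IPC_SET */
    unsigned short  *array;    /* Array for GETALL, SETALL */
    struct seminfo  *__buf;    /* Buffer for IPC_INFO (Linux specific) */
};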
The semid_ds data structure is defined in <sys/sem.h> as follows:
struct semid_ds { |
The ipc_perm structure is defined in <sys/ipc.h> as follows (the uid, gid, and mode fields are settable using IPC_SET):
struct ipc_perm { |
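To make the calling convention concrete, here is a hedged sketch that defines a trimmed union semun (only the three members it uses) and exercises both the four-argument and the three-argument forms of semctl(). The function name semctl_demo() and the initial values are hypothetical. It assumes a Linux system where <sys/sem.h> leaves union semun for the caller to define and where System V semaphores are available.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* The caller has to define this union itself; this trimmed version
 * lists only the members used below. */
union semun {
    int              val;      /* value for SETVAL */
    struct semid_ds *buf;      /* buffer for IPC_STAT, IPC_SET */
    unsigned short  *array;    /* array for GETALL, SETALL */
};

/* Hypothetical demo: creates a private set of two semaphores, sets
 * both values with SETALL, checks the set size with IPC_STAT, reads
 * one value back with GETVAL and removes the set with IPC_RMID.
 * Returns the value read for semaphore 1, or -1 on error. */
int semctl_demo(void)
{
    unsigned short init[2] = { 3, 7 };
    struct semid_ds ds;
    union semun arg;
    int semid, val = -1;

    semid = semget(IPC_PRIVATE, 2, 0600);
    if (semid == -1)
        return -1;

    arg.array = init;
    if (semctl(semid, 0, SETALL, arg) != -1) {    /* semnum is ignored */
        arg.buf = &ds;
        if (semctl(semid, 0, IPC_STAT, arg) != -1 && ds.sem_nsems == 2)
            val = semctl(semid, 1, GETVAL);       /* three-argument form */
    }

    semctl(semid, 0, IPC_RMID);                   /* always clean up */
    return val;
}
```

Removing the set on every path matters because System V semaphore sets outlive the process that created them.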
Valid values for cmd are:
标签 | 描述 | |
IPC_STAT | Copy information from the kernel data structure associated withsemid into the semid_ds structure pointed to by arg.buf. The argument semnum is ignored. The calling process must have read permission on the semaphore set. | |
IPC_SET | Write the values of some members of the semid_ds structure pointed to by arg.buf to the kernel data structure associated with this semaphore set, updating also its sem_ctime member. The following members of the structure are updated:sem_perm.uid, sem_perm.gid, and (the least significant 9 bits of)sem_perm.mode. The effective UID of the calling process must match the owner (sem_perm.uid) or creator (sem_perm.cuid) of the semaphore set, or the caller must be privileged. The argument semnum is ignored. | |
IPC_RMID | Immediately remove the semaphore set, awakening all processes blocked in semop() calls on the set (with an error return and errno set to EIDRM). The effective user ID of the calling process must match the creator or owner of the semaphore set, or the caller must be privileged. The argumentsemnum is ignored. | |
IPC_INFO (Linux-specific) | Returns information about system-wide semaphore limits and parameters in the structure pointed to by arg.__buf. This structure is of type seminfo, defined in <sys/sem.h> if the _GNU_SOURCE feature test macro is defined. The semmsl, semmns, semopm, and semmni settings can be changed via /proc/sys/kernel/sem; see proc(5) for details. | |
SEM_INFO (Linux-specific) | Returns a seminfo structure containing the same information as for IPC_INFO, except that the following fields are returned with information about system resources consumed by semaphores: the semusz field returns the number of semaphore sets that currently exist on the system; and the semaem field returns the total number of semaphores in all semaphore sets on the system. | |
SEM_STAT (Linux-specific) | Returns a semid_ds structure as for IPC_STAT. However, the semid argument is not a semaphore identifier, but instead an index into the kernel's internal array that maintains information about all semaphore sets on the system. | |
GETALL | Return semval (i.e., the current value) for all semaphores of the set into arg.array. The argument semnum is ignored. The calling process must have read permission on the semaphore set. | |
GETNCNT | The system call returns the value of semncnt for the semnum-th semaphore of the set (i.e., the number of processes waiting for semval of that semaphore to increase). The calling process must have read permission on the semaphore set. | |
GETPID | The system call returns the value of sempid for the semnum-th semaphore of the set (i.e. the PID of the process that executed the last semop() call for the semnum-th semaphore of the set). The calling process must have read permission on the semaphore set. | |
GETVAL | The system call returns the value of semval for the semnum-th semaphore of the set. The calling process must have read permission on the semaphore set. | |
GETZCNT | The system call returns the value of semzcnt for the semnum-th semaphore of the set (i.e., the number of processes waiting for semval of that semaphore to become 0). The calling process must have read permission on the semaphore set. | |
SETALL | Set semval for all semaphores of the set using arg.array, updating also the sem_ctime member of the semid_ds structure associated with the set. Undo entries (see semop(2)) are cleared for altered semaphores in all processes. If the changes to semaphore values would permit blocked semop() calls in other processes to proceed, then those processes are woken up. The argument semnum is ignored. The calling process must have alter (write) permission on the semaphore set. | |
SETVAL | Set the value of semval to arg.val for the semnum-th semaphore of the set, updating also the sem_ctime member of the semid_ds structure associated with the set. Undo entries are cleared for altered semaphores in all processes. If the changes to semaphore values would permit blocked semop() calls in other processes to proceed, then those processes are woken up. The calling process must have alter permission on the semaphore set. |
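The most commonly used commands in the table above can be combined as in the following minimal sketch. It assumes a Linux system whose C library leaves union semun undefined, so the union is declared locally; the helper names create_pair, set_and_get, read_all, and destroy_set are hypothetical and exist only for illustration.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Assumption: the C library does not define union semun,
 * so the caller declares it. */
union semun {
    int              val;    /* value for SETVAL */
    struct semid_ds *buf;    /* buffer for IPC_STAT and IPC_SET */
    unsigned short  *array;  /* array for GETALL and SETALL */
};

/* Hypothetical helper: create a private set of two semaphores and
 * initialise both with SETALL. Returns the set identifier or -1. */
int create_pair(unsigned short first, unsigned short second)
{
    int semid = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;

    unsigned short values[2] = { first, second };
    union semun arg;
    arg.array = values;
    if (semctl(semid, 0, SETALL, arg) == -1) {  /* semnum is ignored */
        semctl(semid, 0, IPC_RMID);             /* do not leak the set */
        return -1;
    }
    return semid;
}

/* Hypothetical helper: overwrite one semaphore with SETVAL, then read
 * it back with GETVAL. Returns the new semval or -1. */
int set_and_get(int semid, int semnum, int value)
{
    union semun arg;
    arg.val = value;
    if (semctl(semid, semnum, SETVAL, arg) == -1)
        return -1;
    return semctl(semid, semnum, GETVAL);
}

/* Hypothetical helper: copy every semval of the set into out with
 * GETALL. The caller provides room for all semaphores in the set. */
int read_all(int semid, unsigned short *out)
{
    union semun arg;
    arg.array = out;
    return semctl(semid, 0, GETALL, arg);       /* semnum is ignored */
}

/* Hypothetical helper: remove the set with IPC_RMID. */
int destroy_set(int semid)
{
    return semctl(semid, 0, IPC_RMID);
}
```

A caller would typically create the set once, adjust individual semaphores as needed, and remove the set when it is no longer required, since System V semaphore sets persist after the creating process exits.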
Return Value
On failure, semctl() returns -1 with errno indicating the error.
Otherwise the system call returns a nonnegative value depending on cmd as follows:
Tag | Description |
GETNCNT | the value of semncnt. |
GETPID | the value of sempid. |
GETVAL | the value of semval. |
GETZCNT | the value of semzcnt. |
IPC_INFO | the index of the highest used entry in the kernel’s internal array recording information about all semaphore sets. (This information can be used with repeated SEM_STAT operations to obtain information about all semaphore sets on the system.) |
SEM_INFO | As for IPC_INFO. |
SEM_STAT | the identifier of the semaphore set whose index was given in semid. |
All other cmd values return 0 on success.
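The return values of SEM_INFO and SEM_STAT can be used together to walk all semaphore sets, much as the table above describes. The following is a minimal, Linux-specific sketch; the helper name list_sem_sets is hypothetical, and the code assumes that defining _GNU_SOURCE exposes struct seminfo, SEM_INFO, and SEM_STAT in <sys/sem.h>.

```c
#define _GNU_SOURCE          /* assumed to expose struct seminfo, SEM_INFO, SEM_STAT */
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Assumption: the C library does not define union semun. */
union semun {
    int              val;    /* value for SETVAL */
    struct semid_ds *buf;    /* buffer for IPC_STAT, IPC_SET, SEM_STAT */
    unsigned short  *array;  /* array for GETALL and SETALL */
    struct seminfo  *__buf;  /* buffer for IPC_INFO and SEM_INFO */
};

/* Hypothetical helper: store up to max_ids identifiers of semaphore
 * sets readable by the caller in ids. Returns the number stored,
 * or -1 if SEM_INFO fails. */
int list_sem_sets(int *ids, int max_ids)
{
    struct seminfo info;
    union semun arg;

    /* SEM_INFO returns the index of the highest used entry in the
     * kernel's internal array; semid and semnum are not used here. */
    arg.__buf = &info;
    int highest = semctl(0, 0, SEM_INFO, arg);
    if (highest == -1)
        return -1;

    int found = 0;
    for (int index = 0; index <= highest && found < max_ids; index++) {
        struct semid_ds ds;
        arg.buf = &ds;
        /* For SEM_STAT the first argument is an array index, and the
         * return value is the identifier of the set at that index. */
        int semid = semctl(index, 0, SEM_STAT, arg);
        if (semid == -1)
            continue;   /* for example, an unused slot or no read permission */
        ids[found++] = semid;
    }
    return found;
}
```

Entries that fail are skipped rather than treated as fatal, because unused slots in the array are expected during such a scan.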
Errors
On failure, errno will be set to one of the following:
Tag | Description |
EACCES | The argument cmd has one of the values GETALL, GETPID, GETVAL, GETNCNT, GETZCNT, IPC_STAT, SEM_STAT, SETALL, or SETVAL and the calling process does not have the required permissions on the semaphore set and does not have the CAP_IPC_OWNER capability. |
EFAULT | The address pointed to by arg.buf or arg.array isn’t accessible. |
EIDRM | The semaphore set was removed. |
EINVAL | Invalid value for cmd or semid. Or: for a SEM_STAT operation, the index value specified in semid referred to an array slot that is currently unused. |
EPERM | The argument cmd has the value IPC_SET or IPC_RMID but the effective user ID of the calling process is not the creator (as found in sem_perm.cuid) or the owner (as found in sem_perm.uid) of the semaphore set, and the process does not have the CAP_SYS_ADMIN capability. |
ERANGE | The argument cmd has the value SETALL or SETVAL and the value to which semval is to be set (for some semaphore of the set) is less than 0 or greater than the implementation limit SEMVMX. |
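A caller can distinguish these errors by inspecting errno after a failed call. The sketch below wraps SETVAL and reports some of the cases from the table; the helper name try_setval and the message texts are hypothetical, and union semun is declared locally on the assumption that the C library does not provide it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Assumption: the C library does not define union semun. */
union semun {
    int              val;    /* value for SETVAL */
    struct semid_ds *buf;    /* buffer for IPC_STAT and IPC_SET */
    unsigned short  *array;  /* array for GETALL and SETALL */
};

/* Hypothetical helper: set one semaphore with SETVAL.
 * Returns 0 on success, otherwise the errno value from semctl(). */
int try_setval(int semid, int semnum, int value)
{
    union semun arg;
    arg.val = value;
    if (semctl(semid, semnum, SETVAL, arg) == 0)
        return 0;

    int err = errno;    /* save errno before calling anything else */
    switch (err) {
    case ERANGE:
        fprintf(stderr, "SETVAL: %d is below 0 or above SEMVMX\n", value);
        break;
    case EACCES:
        fprintf(stderr, "SETVAL: no alter permission on set %d\n", semid);
        break;
    case EINVAL:
    case EIDRM:
        fprintf(stderr, "SETVAL: set %d is invalid or was removed\n", semid);
        break;
    default:
        fprintf(stderr, "SETVAL: %s\n", strerror(err));
        break;
    }
    return err;
}
```

Returning the saved errno value lets the caller decide whether to retry, recreate the set, or give up.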
Notes
The IPC_INFO, SEM_STAT and SEM_INFO operations are used by the ipcs(8) program to provide information on allocated resources. In the future these may be modified or moved to a /proc file system interface.
Various fields in a struct semid_ds were shorts under Linux 2.2 and have become longs under Linux 2.4. To take advantage of this, a recompilation under glibc-2.1.91 or later should suffice. (The kernel distinguishes old and new calls by an IPC_64 flag in cmd.)
In some earlier versions of glibc, the semun union was defined in <sys/sem.h>, but POSIX.1-2001 requires that the caller define this union. On versions of glibc where this union is not defined, the macro _SEM_SEMUN_UNDEFINED is defined in <sys/sem.h>.
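One way to cope with both situations is to declare the union only when the library says it is missing. The sketch below guards the declaration with _SEM_SEMUN_UNDEFINED and then uses the union for an IPC_STAT followed by IPC_SET to change the permission bits; the helper names get_mode and set_mode are hypothetical.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Declare the union only if <sys/sem.h> reports that it is undefined. */
#ifdef _SEM_SEMUN_UNDEFINED
union semun {
    int              val;    /* value for SETVAL */
    struct semid_ds *buf;    /* buffer for IPC_STAT and IPC_SET */
    unsigned short  *array;  /* array for GETALL and SETALL */
};
#endif

/* Hypothetical helper: return the 9 permission bits of the set,
 * or -1 if IPC_STAT fails. */
int get_mode(int semid)
{
    struct semid_ds ds;
    union semun arg;
    arg.buf = &ds;
    if (semctl(semid, 0, IPC_STAT, arg) == -1)
        return -1;
    return ds.sem_perm.mode & 0777;
}

/* Hypothetical helper: replace the 9 permission bits of the set.
 * The current state is read first so that sem_perm.uid and
 * sem_perm.gid, which IPC_SET also writes, keep their values. */
int set_mode(int semid, int mode)
{
    struct semid_ds ds;
    union semun arg;
    arg.buf = &ds;
    if (semctl(semid, 0, IPC_STAT, arg) == -1)
        return -1;
    ds.sem_perm.mode = (ds.sem_perm.mode & ~0777) | (mode & 0777);
    return semctl(semid, 0, IPC_SET, arg);
}
```

Reading with IPC_STAT before writing with IPC_SET avoids accidentally changing the owner or group when only the mode is meant to change.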
The following system limit on semaphore sets affects a semctl() call:
Tag | Description |
SEMVMX | Maximum value for semval: implementation dependent (32767). |
For greater portability it is best to always call semctl() with four arguments.
Under Linux, semctl() is not a system call, but is implemented via the system call ipc(2).
Conforming To
SVr4, POSIX.1-2001.
See Also




