diff -urN oldtree/CREDITS newtree/CREDITS
--- oldtree/CREDITS	2006-09-29 15:59:29.000000000 -0400
+++ newtree/CREDITS	2006-09-30 09:51:44.000000000 -0400
@@ -2228,6 +2228,12 @@
 S: Freiburg
 S: Germany
 
+N: Paul E. McKenney
+E: paulmck@us.ibm.com
+W: http://www.rdrop.com/users/paulmck/
+D: RCU and variants
+D: rcutorture module
+
 N: Mike McLagan
 E: mike.mclagan@linux.org
 W: http://www.invlogic.com/~mmclagan
@@ -2969,6 +2975,10 @@
 S: 75013 Paris
 S: France
 
+N: Dipankar Sarma
+E: dipankar@in.ibm.com
+D: RCU
+
 N: Hannu Savolainen
 E: hannu@opensound.com
 D: Maintainer of the sound drivers until 2.1.x days.
@@ -3281,6 +3291,12 @@
 S: MacGregor A.C.T 2615
 S: Australia
 
+N: Josh Triplett
+E: josh@freedesktop.org
+P: 1024D/D0FE7AFB B24A 65C9 1D71 2AC2 DE87 CA26 189B 9946 D0FE 7AFB
+D: rcutorture maintainer
+D: lock annotations, finding and fixing lock bugs
+
 N: Winfried Trümper
 E: winni@xpilot.org
 W: http://www.shop.de/~winni/
diff -urN oldtree/Documentation/RCU/checklist.txt newtree/Documentation/RCU/checklist.txt
--- oldtree/Documentation/RCU/checklist.txt	2006-09-29 13:50:42.000000000 -0400
+++ newtree/Documentation/RCU/checklist.txt	2006-09-30 09:45:06.000000000 -0400
@@ -221,3 +221,41 @@
 	disable irq on a given acquisition of that lock will result in
 	deadlock as soon as the RCU callback happens to interrupt that
 	acquisition's critical section.
+
+13.	SRCU (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu())
+	may only be invoked from process context. Unlike other forms of
+	RCU, it -is- permissible to block in an SRCU read-side critical
+	section (demarcated by srcu_read_lock() and srcu_read_unlock()),
+	hence the "SRCU": "sleepable RCU". Please note that if you
+	don't need to sleep in read-side critical sections, you should
+	be using RCU rather than SRCU, because RCU is almost always
+	faster and easier to use than is SRCU.
+
+	Also unlike other forms of RCU, explicit initialization
+	and cleanup are required via init_srcu_struct() and
+	cleanup_srcu_struct(). These are passed a "struct srcu_struct"
+	that defines the scope of a given SRCU domain. Once initialized,
+	the srcu_struct is passed to srcu_read_lock(), srcu_read_unlock()
+	and synchronize_srcu(). A given synchronize_srcu() waits only
+	for SRCU read-side critical sections governed by srcu_read_lock()
+	and srcu_read_unlock() calls that have been passed the same
+	srcu_struct. This property is what makes sleeping read-side
+	critical sections tolerable -- a given subsystem delays only
+	its own updates, not those of other subsystems using SRCU.
+	Therefore, SRCU is less likely to OOM the system than RCU would
+	be if RCU's read-side critical sections were permitted to
+	sleep.
+
+	The ability to sleep in read-side critical sections does not
+	come for free. First, corresponding srcu_read_lock() and
+	srcu_read_unlock() calls must be passed the same srcu_struct.
+	Second, grace-period-detection overhead is amortized only
+	over those updates sharing a given srcu_struct, rather than
+	being globally amortized as they are for other forms of RCU.
+	Therefore, SRCU should be used in preference to rw_semaphore
+	only in extremely read-intensive situations, or in situations
+	requiring SRCU's read-side deadlock immunity or low read-side
+	realtime latency.
+
+	Note that rcu_assign_pointer() and rcu_dereference() relate to
+	SRCU just as they do to other forms of RCU.
diff -urN oldtree/Documentation/RCU/rcu.txt newtree/Documentation/RCU/rcu.txt
--- oldtree/Documentation/RCU/rcu.txt	2006-09-29 13:50:42.000000000 -0400
+++ newtree/Documentation/RCU/rcu.txt	2006-09-30 09:45:06.000000000 -0400
@@ -45,7 +45,8 @@
 
 	Search for "rcu_read_lock", "rcu_read_unlock", "call_rcu",
 	"rcu_read_lock_bh", "rcu_read_unlock_bh", "call_rcu_bh",
-	"synchronize_rcu", and "synchronize_net".
+	"srcu_read_lock", "srcu_read_unlock", "synchronize_rcu",
+	"synchronize_net", and "synchronize_srcu".
 
 o	What guidelines should I follow when writing code that uses RCU?
 
diff -urN oldtree/Documentation/RCU/torture.txt newtree/Documentation/RCU/torture.txt
--- oldtree/Documentation/RCU/torture.txt	2006-09-29 13:50:42.000000000 -0400
+++ newtree/Documentation/RCU/torture.txt	2006-09-30 09:48:06.000000000 -0400
@@ -28,6 +28,15 @@
 		To properly exercise RCU implementations with preemptible
 		read-side critical sections.
 
+nfakewriters	This is the number of RCU fake writer threads to run. Fake
+		writer threads repeatedly use the synchronous "wait for
+		current readers" function of the interface selected by
+		torture_type, with a delay between calls to allow for various
+		different numbers of writers running in parallel.
+		nfakewriters defaults to 4, which provides enough parallelism
+		to trigger special cases caused by multiple writers, such as
+		the synchronize_srcu() early return optimization.
+
 stat_interval	The number of seconds between output of torture
 		statistics (via printk()). Regardless of the interval,
 		statistics are printed when the module is unloaded.
@@ -44,9 +53,12 @@
 		a kernel that disables the scheduling-clock interrupt to
 		idle CPUs. Boolean parameter, "1" to test, "0" otherwise.
 
-torture_type	The type of RCU to test: "rcu" for the rcu_read_lock()
-		API, "rcu_bh" for the rcu_read_lock_bh() API, and "srcu"
-		for the "srcu_read_lock()" API.
+torture_type	The type of RCU to test: "rcu" for the rcu_read_lock() API,
+		"rcu_sync" for rcu_read_lock() with synchronous reclamation,
+		"rcu_bh" for the rcu_read_lock_bh() API, "rcu_bh_sync" for
+		rcu_read_lock_bh() with synchronous reclamation, "srcu" for
+		the "srcu_read_lock()" API, and "sched" for the use of
+		preempt_disable() together with synchronize_sched().
 
 verbose		Enable debug printk()s. Default is disabled.
 
@@ -118,6 +130,21 @@
 		as it is only incremented if a torture structure's counter
 		somehow gets incremented farther than it should.
 
+Different implementations of RCU can provide implementation-specific
+additional information. For example, SRCU provides the following:
+
+	srcu-torture: rtc: f8cf46a8 ver: 355 tfle: 0 rta: 356 rtaf: 0 rtf: 346 rtmbe: 0
+	srcu-torture: Reader Pipe: 559738 939 0 0 0 0 0 0 0 0 0
+	srcu-torture: Reader Batch: 560434 243 0 0 0 0 0 0 0 0
+	srcu-torture: Free-Block Circulation: 355 354 353 352 351 350 349 348 347 346 0
+	srcu-torture: per-CPU(idx=1): 0(0,1) 1(0,1) 2(0,0) 3(0,1)
+
+The first four lines are similar to those for RCU. The last line shows
+the per-CPU counter state. The numbers in parentheses are the values
+of the "old" and "current" counters for the corresponding CPU. The
+"idx" value maps the "old" and "current" values to the underlying array,
+and is useful for debugging.
+
 
 USAGE
 
diff -urN oldtree/Documentation/RCU/whatisRCU.txt newtree/Documentation/RCU/whatisRCU.txt
--- oldtree/Documentation/RCU/whatisRCU.txt	2006-09-29 13:50:42.000000000 -0400
+++ newtree/Documentation/RCU/whatisRCU.txt	2006-09-30 09:45:06.000000000 -0400
@@ -778,6 +778,8 @@
 	rcu_read_unlock
 	rcu_read_lock_bh
 	rcu_read_unlock_bh
+	srcu_read_lock
+	srcu_read_unlock
 
 RCU pointer/list traversal:
 
@@ -804,6 +806,7 @@
 	synchronize_net
 	synchronize_sched
 	synchronize_rcu
+	synchronize_srcu
 	call_rcu
 	call_rcu_bh
 
diff -urN oldtree/Documentation/kernel-parameters.txt newtree/Documentation/kernel-parameters.txt
--- oldtree/Documentation/kernel-parameters.txt	2006-09-29 14:03:18.000000000 -0400
+++ newtree/Documentation/kernel-parameters.txt	2006-09-30 09:50:24.000000000 -0400
@@ -1357,9 +1357,9 @@
 	rcu.qlowmark=	[KNL,BOOT] Set threshold of queued
 			RCU callbacks below which batch limiting is re-enabled.
 
-	rcu.rsinterval=	[KNL,BOOT,SMP] Set the number of additional
-			RCU callbacks to queued before forcing reschedule
-			on all cpus.
+	rcu.rsinterval=	[KNL,BOOT,SMP] Set the number of additional
+			RCU callbacks to queue before forcing a reschedule
+			on all CPUs.
 
 	rdinit=		[KNL]
 			Format:
diff -urN oldtree/MAINTAINERS newtree/MAINTAINERS
--- oldtree/MAINTAINERS	2006-09-30 05:15:13.000000000 -0400
+++ newtree/MAINTAINERS	2006-09-30 09:51:44.000000000 -0400
@@ -2480,6 +2480,19 @@
 L:	linux-kernel@vger.kernel.org
 S:	Maintained
 
+READ-COPY UPDATE (RCU)
+P:	Dipankar Sarma
+M:	dipankar@in.ibm.com
+W:	http://www.rdrop.com/users/paulmck/rclock/
+L:	linux-kernel@vger.kernel.org
+S:	Supported
+
+RCUTORTURE MODULE
+P:	Josh Triplett
+M:	josh@freedesktop.org
+L:	linux-kernel@vger.kernel.org
+S:	Maintained
+
 REAL TIME CLOCK DRIVER
 P:	Paul Gortmaker
 M:	p_gortmaker@yahoo.com
diff -urN oldtree/drivers/cpufreq/cpufreq.c newtree/drivers/cpufreq/cpufreq.c
--- oldtree/drivers/cpufreq/cpufreq.c	2006-09-29 14:03:20.000000000 -0400
+++ newtree/drivers/cpufreq/cpufreq.c	2006-09-30 09:46:42.000000000 -0400
@@ -52,8 +52,14 @@
  * The mutex locks both lists.
  */
 static BLOCKING_NOTIFIER_HEAD(cpufreq_policy_notifier_list);
-static BLOCKING_NOTIFIER_HEAD(cpufreq_transition_notifier_list);
+static struct srcu_notifier_head cpufreq_transition_notifier_list;
 
+static int __init init_cpufreq_transition_notifier_list(void)
+{
+	srcu_init_notifier_head(&cpufreq_transition_notifier_list);
+	return 0;
+}
+core_initcall(init_cpufreq_transition_notifier_list);
 
 static LIST_HEAD(cpufreq_governor_list);
 static DEFINE_MUTEX (cpufreq_governor_mutex);
@@ -262,14 +268,14 @@
 				freqs->old = policy->cur;
 			}
 		}
-		blocking_notifier_call_chain(&cpufreq_transition_notifier_list,
+		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_PRECHANGE, freqs);
 		adjust_jiffies(CPUFREQ_PRECHANGE, freqs);
 		break;
 
 	case CPUFREQ_POSTCHANGE:
 		adjust_jiffies(CPUFREQ_POSTCHANGE, freqs);
-		blocking_notifier_call_chain(&cpufreq_transition_notifier_list,
+		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_POSTCHANGE, freqs);
 		if (likely(policy) && likely(policy->cpu == freqs->cpu))
 			policy->cur = freqs->new;
@@ -1049,7 +1055,7 @@
 		freqs.old = cpu_policy->cur;
 		freqs.new = cur_freq;
 
-		blocking_notifier_call_chain(&cpufreq_transition_notifier_list,
+		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				    CPUFREQ_SUSPENDCHANGE, &freqs);
 		adjust_jiffies(CPUFREQ_SUSPENDCHANGE, &freqs);
 
@@ -1130,7 +1136,7 @@
 			freqs.old = cpu_policy->cur;
 			freqs.new = cur_freq;
 
-			blocking_notifier_call_chain(
+			srcu_notifier_call_chain(
 					&cpufreq_transition_notifier_list,
 					CPUFREQ_RESUMECHANGE, &freqs);
 			adjust_jiffies(CPUFREQ_RESUMECHANGE, &freqs);
@@ -1176,7 +1182,7 @@
 
 	switch (list) {
 	case CPUFREQ_TRANSITION_NOTIFIER:
-		ret = blocking_notifier_chain_register(
+		ret = srcu_notifier_chain_register(
 				&cpufreq_transition_notifier_list, nb);
 		break;
 	case CPUFREQ_POLICY_NOTIFIER:
@@ -1208,7 +1214,7 @@
 
 	switch (list) {
 	case CPUFREQ_TRANSITION_NOTIFIER:
-		ret = blocking_notifier_chain_unregister(
+		ret = srcu_notifier_chain_unregister(
 				&cpufreq_transition_notifier_list, nb);
 		break;
 	case CPUFREQ_POLICY_NOTIFIER:
diff -urN oldtree/drivers/hwmon/hdaps.c newtree/drivers/hwmon/hdaps.c
--- oldtree/drivers/hwmon/hdaps.c	2006-09-30 06:00:09.000000000 -0400
+++ newtree/drivers/hwmon/hdaps.c	2006-09-30 08:37:37.000000000 -0400
@@ -772,8 +772,6 @@
 out_driver:
 	platform_driver_unregister(&hdaps_driver);
 	hdaps_device_shutdown();
-out_region:
-	release_region(HDAPS_LOW_PORT, HDAPS_NR_PORTS);
 out:
 	printk(KERN_WARNING "hdaps: driver init failed (ret=%d)!\n", ret);
 	return ret;
diff -urN oldtree/include/linux/init_task.h newtree/include/linux/init_task.h
--- oldtree/include/linux/init_task.h	2006-09-29 14:47:25.000000000 -0400
+++ newtree/include/linux/init_task.h	2006-09-30 09:01:24.000000000 -0400
@@ -120,7 +120,6 @@
 	.files		= &init_files,					\
 	.signal		= &init_signals,				\
 	.sighand	= &init_sighand,				\
-	.nsproxy	= &init_nsproxy,				\
 	.pending	= {						\
 		.list = LIST_HEAD_INIT(tsk.pending.list),		\
 		.signal = {{0}}},					\
diff -urN oldtree/include/linux/notifier.h newtree/include/linux/notifier.h
--- oldtree/include/linux/notifier.h	2006-09-29 13:50:42.000000000 -0400
+++ newtree/include/linux/notifier.h	2006-09-30 09:46:17.000000000 -0400
@@ -12,9 +12,10 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
- * Notifier chains are of three types:
+ * Notifier chains are of four types:
  *
  * Atomic notifier chains: Chain callbacks run in interrupt/atomic
  *	context. Callouts are not allowed to block.
@@ -23,13 +24,27 @@
  * Raw notifier chains: There are no restrictions on callbacks,
  *	registration, or unregistration. All locking and protection
  *	must be provided by the caller.
+ * SRCU notifier chains: A variant of blocking notifier chains, with
+ *	the same restrictions.
  *
  * atomic_notifier_chain_register() may be called from an atomic context,
- * but blocking_notifier_chain_register() must be called from a process
- * context. Ditto for the corresponding _unregister() routines.
+ * but blocking_notifier_chain_register() and srcu_notifier_chain_register()
+ * must be called from a process context. Ditto for the corresponding
+ * _unregister() routines.
  *
- * atomic_notifier_chain_unregister() and blocking_notifier_chain_unregister()
- * _must not_ be called from within the call chain.
+ * atomic_notifier_chain_unregister(), blocking_notifier_chain_unregister(),
+ * and srcu_notifier_chain_unregister() _must not_ be called from within
+ * the call chain.
+ *
+ * SRCU notifier chains are an alternative form of blocking notifier chains.
+ * They use SRCU (Sleepable Read-Copy Update) instead of rw-semaphores for
+ * protection of the chain links. This means there is _very_ low overhead
+ * in srcu_notifier_call_chain(): no cache bounces and no memory barriers.
+ * As compensation, srcu_notifier_chain_unregister() is rather expensive.
+ * SRCU notifier chains should be used when the chain will be called very
+ * often but notifier_blocks will seldom be removed. Also, SRCU notifier
+ * chains are slightly more difficult to use because they require special
+ * runtime initialization.
  */
 
 struct notifier_block {
@@ -52,6 +67,12 @@
 	struct notifier_block *head;
 };
 
+struct srcu_notifier_head {
+	struct mutex mutex;
+	struct srcu_struct srcu;
+	struct notifier_block *head;
+};
+
 #define ATOMIC_INIT_NOTIFIER_HEAD(name) do {	\
 		spin_lock_init(&(name)->lock);	\
 		(name)->head = NULL;		\
@@ -64,6 +85,11 @@
 		(name)->head = NULL;		\
 	} while (0)
 
+/* srcu_notifier_heads must be initialized and cleaned up dynamically */
+extern void srcu_init_notifier_head(struct srcu_notifier_head *nh);
+#define srcu_cleanup_notifier_head(name)	\
+		cleanup_srcu_struct(&(name)->srcu);
+
 #define ATOMIC_NOTIFIER_INIT(name) {				\
 		.lock = __SPIN_LOCK_UNLOCKED(name.lock),	\
 		.head = NULL }
@@ -72,6 +98,7 @@
 		.head = NULL }
 #define RAW_NOTIFIER_INIT(name)	{				\
 		.head = NULL }
+/* srcu_notifier_heads cannot be initialized statically */
 
 #define ATOMIC_NOTIFIER_HEAD(name)				\
 	struct atomic_notifier_head name =			\
@@ -91,6 +118,8 @@
 		struct notifier_block *);
 extern int raw_notifier_chain_register(struct raw_notifier_head *,
 		struct notifier_block *);
+extern int srcu_notifier_chain_register(struct srcu_notifier_head *,
+		struct notifier_block *);
 
 extern int atomic_notifier_chain_unregister(struct atomic_notifier_head *,
 		struct notifier_block *);
@@ -98,6 +127,8 @@
 		struct notifier_block *);
 extern int raw_notifier_chain_unregister(struct raw_notifier_head *,
 		struct notifier_block *);
+extern int srcu_notifier_chain_unregister(struct srcu_notifier_head *,
+		struct notifier_block *);
 
 extern int atomic_notifier_call_chain(struct atomic_notifier_head *,
 		unsigned long val, void *v);
@@ -105,6 +136,8 @@
 		unsigned long val, void *v);
 extern int raw_notifier_call_chain(struct raw_notifier_head *,
 		unsigned long val, void *v);
+extern int srcu_notifier_call_chain(struct srcu_notifier_head *,
+		unsigned long val, void *v);
 
 #define NOTIFY_DONE		0x0000		/* Don't care */
 #define NOTIFY_OK		0x0001		/* Suits me */
diff -urN oldtree/include/linux/rcuclassic.h newtree/include/linux/rcuclassic.h
--- oldtree/include/linux/rcuclassic.h	2006-09-30 05:17:30.000000000 -0400
+++ newtree/include/linux/rcuclassic.h	2006-09-30 10:07:58.000000000 -0400
@@ -18,8 +18,8 @@
  * Copyright (C) IBM Corporation, 2001
  *
  * Author: Dipankar Sarma
- * 
- * Based on the original work by Paul McKenney 
+ *
+ * Based on the original work by Paul McKenney
  * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
  * Papers:
  * http://www.rdrop.com/users/paulmck/paper/rclockpdcsproof.pdf
@@ -49,6 +49,8 @@
 	long	completed;	/* Number of the last completed batch */
 	int	next_pending;	/* Is the next batch already waiting? */
 
+	int	signaled;
+
 	spinlock_t	lock	____cacheline_internodealigned_in_smp;
 	cpumask_t	cpumask; /* CPUs that need to switch in order */
 				 /* for current batch to proceed. */
@@ -88,9 +90,6 @@
 	struct rcu_head **donetail;
 	long		blimit;	 /* Upper limit on a processed batch */
 	int cpu;
-#ifdef CONFIG_SMP
-	long		last_rs_qlen;	 /* qlen during the last resched */
-#endif
 };
 
 DECLARE_PER_CPU(struct rcu_data, rcu_data);
@@ -116,25 +115,37 @@
 extern int rcu_pending(int cpu);
 extern int rcu_needs_cpu(int cpu);
 
+#ifdef CONFIG_DEBUG_SPINLOCK_SLEEP
+extern void rcu_add_read_count(void);
+extern void rcu_sub_read_count(void);
+#else
+static inline void rcu_add_read_count(void) {}
+static inline void rcu_sub_read_count(void) {}
+#endif
+
 #define __rcu_read_lock() \
 	do { \
 		preempt_disable(); \
+		rcu_add_read_count(); \
 		__acquire(RCU); \
 	} while(0)
 #define __rcu_read_unlock() \
 	do { \
 		__release(RCU); \
+		rcu_sub_read_count(); \
 		preempt_enable(); \
 	} while(0)
 #define __rcu_read_lock_bh() \
 	do { \
 		local_bh_disable(); \
+		rcu_add_read_count(); \
 		__acquire(RCU_BH); \
 	} while(0)
 #define __rcu_read_unlock_bh() \
 	do { \
 		__release(RCU_BH); \
+		rcu_sub_read_count(); \
 		local_bh_enable(); \
 	} while(0)
 
@@ -143,7 +154,6 @@
 extern void __rcu_init(void);
 extern void rcu_check_callbacks(int cpu, int user);
 extern void rcu_restart_cpu(int cpu);
-extern long rcu_batches_completed(void);
 
 #endif /* __KERNEL__ */
 #endif /* __LINUX_RCUCLASSIC_H */
diff -urN oldtree/include/linux/rcupdate.h newtree/include/linux/rcupdate.h
--- oldtree/include/linux/rcupdate.h	2006-09-30 05:17:30.000000000 -0400
+++ newtree/include/linux/rcupdate.h	2006-09-30 10:16:26.000000000 -0400
@@ -19,7 +19,7 @@
  *
  * Author: Dipankar Sarma
  *
- * Based on the original work by Paul McKenney 
+ * Based on the original work by Paul McKenney
  * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
  * Papers:
  * http://www.rdrop.com/users/paulmck/paper/rclockpdcsproof.pdf
@@ -222,6 +222,8 @@
 /* Exported common interfaces */
 extern void synchronize_rcu(void);
 extern void rcu_barrier(void);
+extern long rcu_batches_completed(void);
+extern long rcu_batches_completed_bh(void);
 
 /* Internal to kernel */
 extern void rcu_init(void);
diff -urN oldtree/include/linux/rcupreempt.h newtree/include/linux/rcupreempt.h
--- oldtree/include/linux/rcupreempt.h	2006-09-30 05:17:30.000000000 -0400
+++ newtree/include/linux/rcupreempt.h	2006-09-30 10:07:54.000000000 -0400
@@ -18,7 +18,7 @@
  * Copyright (C) IBM Corporation, 2006
  *
  * Author: Paul McKenney
- * 
+ *
  * Based on the original work by Paul McKenney
  * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
  * Papers:
@@ -60,7 +60,6 @@
 extern void __rcu_init(void);
 extern void rcu_check_callbacks(int cpu, int user);
 extern void rcu_restart_cpu(int cpu);
-extern long rcu_batches_completed(void);
 
 #endif /* __KERNEL__ */
 #endif /* __LINUX_RCUPREEMPT_H */
diff -urN oldtree/include/linux/rcupreempt_trace.h newtree/include/linux/rcupreempt_trace.h
--- oldtree/include/linux/rcupreempt_trace.h	2006-09-30 05:17:30.000000000 -0400
+++ newtree/include/linux/rcupreempt_trace.h	2006-09-30 10:07:54.000000000 -0400
@@ -18,8 +18,8 @@
  * Copyright (C) IBM Corporation, 2006
  *
  * Author: Paul McKenney
- * 
- * Based on the original work by Paul McKenney 
+ *
+ * Based on the original work by Paul McKenney
  * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
  * Papers:
  * http://www.rdrop.com/users/paulmck/paper/rclockpdcsproof.pdf
diff -urN oldtree/include/linux/srcu.h newtree/include/linux/srcu.h
--- oldtree/include/linux/srcu.h	1969-12-31 19:00:00.000000000 -0500
+++ newtree/include/linux/srcu.h	2006-09-30 09:46:33.000000000 -0400
@@ -0,0 +1,53 @@
+/*
+ * Sleepable Read-Copy Update mechanism for mutual exclusion
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2006
+ *
+ * Author: Paul McKenney
+ *
+ * For detailed explanation of Read-Copy Update mechanism see -
+ * 		Documentation/RCU/ *.txt
+ *
+ */
+
+#ifndef _LINUX_SRCU_H
+#define _LINUX_SRCU_H
+
+struct srcu_struct_array {
+	int c[2];
+};
+
+struct srcu_struct {
+	int completed;
+	struct srcu_struct_array *per_cpu_ref;
+	struct mutex mutex;
+};
+
+#ifndef CONFIG_PREEMPT
+#define srcu_barrier() barrier()
+#else /* #ifndef CONFIG_PREEMPT */
+#define srcu_barrier()
+#endif /* #else #ifndef CONFIG_PREEMPT */
+
+int init_srcu_struct(struct srcu_struct *sp);
+void cleanup_srcu_struct(struct srcu_struct *sp);
+int srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
+void srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
+void synchronize_srcu(struct srcu_struct *sp);
+long srcu_batches_completed(struct srcu_struct *sp);
+
+#endif
diff -urN oldtree/init/Kconfig.staircase newtree/init/Kconfig.staircase
--- oldtree/init/Kconfig.staircase	2006-09-29 15:02:32.000000000 -0400
+++ newtree/init/Kconfig.staircase	2006-09-30 08:58:20.000000000 -0400
@@ -1,4 +1,5 @@
 menu "Staircase Tunable Options"
+depends on STAIRCASE
 choice
 	prompt "Staircase Kernel Tunable Preset: "
 	default STAIRCASE_DESKTOP
diff -urN oldtree/kernel/Makefile newtree/kernel/Makefile
--- oldtree/kernel/Makefile	2006-09-30 05:17:30.000000000 -0400
+++ newtree/kernel/Makefile	2006-09-30 09:45:22.000000000 -0400
@@ -11,7 +11,7 @@
 	    signal.o sys.o kmod.o workqueue.o pid.o \
 	    extable.o params.o posix-timers.o \
 	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
-	    hrtimer.o rwsem.o
+	    hrtimer.o rwsem.o srcu.o
 
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
 obj-y += time/
diff -urN oldtree/kernel/rcuclassic.c newtree/kernel/rcuclassic.c
--- oldtree/kernel/rcuclassic.c	2006-09-30 05:17:30.000000000 -0400
+++ newtree/kernel/rcuclassic.c	2006-09-30 10:06:14.000000000 -0400
@@ -19,7 +19,7 @@
  *
  * Authors: Dipankar Sarma
  *	    Manfred Spraul
- * 
+ *
  * Based on the original work by Paul McKenney
  * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
  *
@@ -55,13 +55,13 @@
 static struct rcu_ctrlblk rcu_ctrlblk = {
 	.cur = -300,
 	.completed = -300,
-	.lock = SPIN_LOCK_UNLOCKED,
+	.lock = __SPIN_LOCK_UNLOCKED(&rcu_ctrlblk.lock),
 	.cpumask = CPU_MASK_NONE,
 };
 static struct rcu_ctrlblk rcu_bh_ctrlblk = {
 	.cur = -300,
 	.completed = -300,
-	.lock = SPIN_LOCK_UNLOCKED,
+	.lock = __SPIN_LOCK_UNLOCKED(&rcu_bh_ctrlblk.lock),
 	.cpumask = CPU_MASK_NONE,
 };
 
@@ -72,9 +72,6 @@
 static int blimit = 10;
 static int qhimark = 10000;
 static int qlowmark = 100;
-#ifdef CONFIG_SMP
-static int rsinterval = 1000;
-#endif
 
 #ifdef CONFIG_SMP
 static void force_quiescent_state(struct rcu_data *rdp,
@@ -83,8 +80,8 @@
 	int cpu;
 	cpumask_t cpumask;
 	set_need_resched();
-	if (unlikely(rdp->qlen - rdp->last_rs_qlen > rsinterval)) {
-		rdp->last_rs_qlen = rdp->qlen;
+	if (unlikely(!rcp->signaled)) {
+		rcp->signaled = 1;
 		/*
 		 * Don't send IPI to itself. With irqs disabled,
 		 * rdp->cpu is the current cpu.
@@ -103,7 +100,7 @@
 }
 #endif
 
-/*
+/**
  * call_rcu - Queue an RCU callback for invocation after a grace period.
  * @head: structure to be used for queueing the RCU updates.
  * @func: actual update function to be invoked after the grace period
@@ -133,7 +130,7 @@
 	local_irq_restore(flags);
 }
 
-/*
+/**
  * call_rcu_bh - Queue an RCU for invocation after a quicker grace period.
  * @head: structure to be used for queueing the RCU updates.
  * @func: actual update function to be invoked after the grace period
@@ -202,12 +199,16 @@
 		next = rdp->donelist = list->next;
 		list->func(list);
 		list = next;
-		rdp->qlen--;
 		if (++count >= rdp->blimit)
 			break;
 	}
+
+	local_irq_disable();
+	rdp->qlen -= count;
+	local_irq_enable();
 	if (rdp->blimit == INT_MAX && rdp->qlen <= qlowmark)
 		rdp->blimit = blimit;
+
 	if (!rdp->donelist)
 		rdp->donetail = &rdp->donelist;
 	else
@@ -258,6 +259,7 @@
 		smp_mb();
 		cpus_andnot(rcp->cpumask, cpu_online_map, nohz_cpu_mask);
 
+		rcp->signaled = 0;
 	}
 }
 
@@ -299,7 +301,7 @@
 	if (!rdp->qs_pending)
 		return;
 
-	/* 
+	/*
 	 * Was there a quiescent state since the beginning of the grace
 	 * period? If no, then exit and wait for the next call.
 	 */
@@ -373,7 +375,7 @@
 #endif
 
 /*
- * This does the RCU processing work from softirq context. 
+ * This does the RCU processing work from softirq context.
  */
 static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp,
 					struct rcu_data *rdp)
@@ -475,8 +477,8 @@
 
 void rcu_check_callbacks(int cpu, int user)
 {
-	if (user || 
-	    (idle_cpu(cpu) && !in_softirq() && 
+	if (user ||
+	    (idle_cpu(cpu) && !in_softirq() &&
 				hardirq_count() <= (1 << HARDIRQ_SHIFT))) {
 		rcu_qsctr_inc(cpu);
 		rcu_bh_qsctr_inc(cpu);
@@ -543,13 +545,44 @@
 	register_cpu_notifier(&rcu_nb);
 }
 
+#ifdef CONFIG_DEBUG_SPINLOCK_SLEEP
+static DEFINE_PER_CPU(int, rcu_read_count);
+int rcu_read_in_atomic(void)
+{
+	int val;
+	int cpu = get_cpu();
+	val = per_cpu(rcu_read_count, cpu);
+	put_cpu();
+	return val;
+}
+
+void rcu_add_read_count(void)
+{
+	unsigned long flags;
+	int cpu;
+	local_irq_save(flags);
+	cpu = smp_processor_id();
+	per_cpu(rcu_read_count, cpu)++;
+	local_irq_restore(flags);
+}
+
+void rcu_sub_read_count(void)
+{
+	unsigned long flags;
+	int cpu;
+	local_irq_save(flags);
+	cpu = smp_processor_id();
+	per_cpu(rcu_read_count, cpu)--;
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(rcu_read_in_atomic);
+EXPORT_SYMBOL_GPL(rcu_add_read_count);
+EXPORT_SYMBOL_GPL(rcu_sub_read_count);
+#endif
+
 module_param(blimit, int, 0);
 module_param(qhimark, int, 0);
 module_param(qlowmark, int, 0);
-#ifdef CONFIG_SMP
-module_param(rsinterval, int, 0);
-#endif
-
 EXPORT_SYMBOL_GPL(rcu_batches_completed);
 EXPORT_SYMBOL_GPL(rcu_batches_completed_bh);
 EXPORT_SYMBOL_GPL(call_rcu);
diff -urN oldtree/kernel/rcupreempt.c newtree/kernel/rcupreempt.c
--- oldtree/kernel/rcupreempt.c	2006-09-30 05:17:30.000000000 -0400
+++ newtree/kernel/rcupreempt.c	2006-09-30 10:01:35.000000000 -0400
@@ -89,6 +89,15 @@
 	return rcu_ctrlblk.completed;
 }
 
+/*
+ * Return the number of RCU batches processed thus far. Useful
+ * for debug and statistics. (This is fake for preempt RCU).
+ */
+long rcu_batches_completed_bh(void)
+{
+	return rcu_ctrlblk.completed;
+}
+
 void __rcu_read_lock(void)
 {
 	int flipctr;
@@ -222,7 +231,7 @@
 	RCU_TRACE(rcupreempt_trace_try_flip2, &rcu_data.trace);
 	for_each_possible_cpu(cpu) {
 		if (atomic_read(&per_cpu(rcu_flipctr, cpu)[!flipctr]) != 0) {
-			RCU_TRACE(rcupreempt_trace_try_flip_e3, 
+			RCU_TRACE(rcupreempt_trace_try_flip_e3,
 							&rcu_data.trace);
 			spin_unlock_irqrestore(&rcu_ctrlblk.fliplock, oldirq);
 			return;
@@ -282,7 +291,7 @@
 	}
 }
 
-void fastcall call_rcu(struct rcu_head *head, 
+void fastcall call_rcu(struct rcu_head *head,
 				void (*func)(struct rcu_head *rcu))
 {
 	unsigned long flags;
@@ -415,8 +424,16 @@
 
 #endif /* #ifdef CONFIG_RCU_TRACE */
 
+#ifdef CONFIG_DEBUG_SPINLOCK_SLEEP
+int rcu_read_in_atomic(void)
+{
+	return current->rcu_read_lock_nesting;
+}
+#endif
+
 EXPORT_SYMBOL_GPL(call_rcu);
 EXPORT_SYMBOL_GPL(rcu_batches_completed);
+EXPORT_SYMBOL_GPL(rcu_batches_completed_bh);
 EXPORT_SYMBOL_GPL(__synchronize_sched);
 EXPORT_SYMBOL_GPL(__rcu_read_lock);
 EXPORT_SYMBOL_GPL(__rcu_read_unlock);
diff -urN oldtree/kernel/rcutorture.c newtree/kernel/rcutorture.c
--- oldtree/kernel/rcutorture.c	2006-09-29 14:03:22.000000000 -0400
+++ newtree/kernel/rcutorture.c	2006-09-30 09:48:06.000000000 -0400
@@ -15,9 +15,10 @@
  * along with this program; if not, write to the Free Software
  * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
  *
- * Copyright (C) IBM Corporation, 2005
+ * Copyright (C) IBM Corporation, 2005, 2006
  *
  * Authors: Paul E. McKenney
+ *          Josh Triplett
  *
  * See also: Documentation/RCU/torture.txt
  */
@@ -44,19 +45,25 @@
 #include 
 #include 
 #include 
+#include 
 
 MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Paul E. McKenney and "
+              "Josh Triplett ");
 
-static int nreaders = -1;	/* # reader threads, defaults to 4*ncpus */
+static int nreaders = -1;	/* # reader threads, defaults to 2*ncpus */
+static int nfakewriters = 4;	/* # fake writer threads */
 static int stat_interval;	/* Interval between stats, in seconds. */
 				/* Defaults to "only at end of test". */
 static int verbose;		/* Print more debug info. */
 static int test_no_idle_hz;	/* Test RCU's support for tickless idle CPUs. */
 static int shuffle_interval = 5; /* Interval between shuffles (in sec)*/
-static char *torture_type = "rcu"; /* What to torture. */
+static char *torture_type = "rcu"; /* What RCU implementation to torture. */
 
 module_param(nreaders, int, 0);
 MODULE_PARM_DESC(nreaders, "Number of RCU reader threads");
+module_param(nfakewriters, int, 0);
+MODULE_PARM_DESC(nfakewriters, "Number of RCU fake writer threads");
 module_param(stat_interval, int, 0);
 MODULE_PARM_DESC(stat_interval, "Number of seconds between stats printk()s");
 module_param(verbose, bool, 0);
@@ -66,7 +73,7 @@
 module_param(shuffle_interval, int, 0);
 MODULE_PARM_DESC(shuffle_interval, "Number of seconds between shuffles");
 module_param(torture_type, charp, 0);
-MODULE_PARM_DESC(torture_type, "Type of RCU to torture (rcu, rcu_bh)");
+MODULE_PARM_DESC(torture_type, "Type of RCU to torture (rcu, rcu_sync, rcu_bh, rcu_bh_sync, srcu, sched)");
 
 #define TORTURE_FLAG "-torture:"
 #define PRINTK_STRING(s) \
@@ -80,6 +87,7 @@
 
 static int nrealreaders;
 static struct task_struct *writer_task;
+static struct task_struct **fakewriter_tasks;
 static struct task_struct **reader_tasks;
 static struct task_struct *stats_task;
 static struct task_struct *shuffler_task;
@@ -104,11 +112,12 @@
 static DEFINE_PER_CPU(long [RCU_TORTURE_PIPE_LEN + 1], rcu_torture_batch) =
 	{ 0 };
 static atomic_t rcu_torture_wcount[RCU_TORTURE_PIPE_LEN + 1];
-atomic_t n_rcu_torture_alloc;
-atomic_t n_rcu_torture_alloc_fail;
-atomic_t n_rcu_torture_free;
-atomic_t n_rcu_torture_mberror;
-atomic_t n_rcu_torture_error;
+static atomic_t n_rcu_torture_alloc;
+static atomic_t n_rcu_torture_alloc_fail;
+static atomic_t n_rcu_torture_free;
+static atomic_t n_rcu_torture_mberror;
+static atomic_t n_rcu_torture_error;
+static struct list_head rcu_torture_removed;
 
 /*
  * Allocate an element from the rcu_tortures pool.
@@ -145,7 +154,7 @@
 
 struct rcu_random_state {
 	unsigned long rrs_state;
-	unsigned long rrs_count;
+	long rrs_count;
 };
 
 #define RCU_RANDOM_MULT 39916801  /* prime */
@@ -158,7 +167,7 @@
  * Crude but fast random-number generator. Uses a linear congruential
  * generator, with occasional help from get_random_bytes().
  */
-static long
+static unsigned long
 rcu_random(struct rcu_random_state *rrsp)
 {
 	long refresh;
@@ -180,9 +189,11 @@
 	void (*init)(void);
 	void (*cleanup)(void);
 	int (*readlock)(void);
+	void (*readdelay)(struct rcu_random_state *rrsp);
 	void (*readunlock)(int idx);
 	int (*completed)(void);
 	void (*deferredfree)(struct rcu_torture *p);
+	void (*sync)(void);
 	int (*stats)(char *page);
 	char *name;
 };
@@ -198,6 +209,18 @@
 	return 0;
 }
 
+static void rcu_read_delay(struct rcu_random_state *rrsp)
+{
+	long delay;
+	const long longdelay = 200;
+
+	/* We want there to be long-running readers, but not all the time. */
+
+	delay = rcu_random(rrsp) % (nrealreaders * 2 * longdelay);
+	if (!delay)
+		udelay(longdelay);
+}
+
 static void rcu_torture_read_unlock(int idx) __releases(RCU)
 {
 	rcu_read_unlock();
@@ -239,13 +262,54 @@
 	.init = NULL,
 	.cleanup = NULL,
 	.readlock = rcu_torture_read_lock,
+	.readdelay = rcu_read_delay,
 	.readunlock = rcu_torture_read_unlock,
 	.completed = rcu_torture_completed,
 	.deferredfree = rcu_torture_deferred_free,
+	.sync = synchronize_rcu,
 	.stats = NULL,
 	.name = "rcu"
 };
 
+static void rcu_sync_torture_deferred_free(struct rcu_torture *p)
+{
+	int i;
+	struct rcu_torture *rp;
+	struct rcu_torture *rp1;
+
+	cur_ops->sync();
+	list_add(&p->rtort_free, &rcu_torture_removed);
+	list_for_each_entry_safe(rp, rp1, &rcu_torture_removed, rtort_free) {
+		i = rp->rtort_pipe_count;
+		if (i > RCU_TORTURE_PIPE_LEN)
+			i = RCU_TORTURE_PIPE_LEN;
+		atomic_inc(&rcu_torture_wcount[i]);
+		if (++rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
+			rp->rtort_mbtest = 0;
+			list_del(&rp->rtort_free);
+			rcu_torture_free(rp);
+		}
+	}
+}
+
+static void rcu_sync_torture_init(void)
+{
+	INIT_LIST_HEAD(&rcu_torture_removed);
+}
+
+static struct rcu_torture_ops rcu_sync_ops = {
+	.init = rcu_sync_torture_init,
+	.cleanup = NULL,
+	.readlock = rcu_torture_read_lock,
+	.readdelay = rcu_read_delay,
+	.readunlock = rcu_torture_read_unlock,
+	.completed = rcu_torture_completed,
+	.deferredfree = rcu_sync_torture_deferred_free,
+	.sync = synchronize_rcu,
+	.stats = NULL,
+	.name = "rcu_sync"
+};
+
 /*
  * Definitions for rcu_bh torture testing.
*/ @@ -271,19 +335,176 @@ call_rcu_bh(&p->rtort_rcu, rcu_torture_cb); } +struct rcu_bh_torture_synchronize { + struct rcu_head head; + struct completion completion; +}; + +static void rcu_bh_torture_wakeme_after_cb(struct rcu_head *head) +{ + struct rcu_bh_torture_synchronize *rcu; + + rcu = container_of(head, struct rcu_bh_torture_synchronize, head); + complete(&rcu->completion); +} + +static void rcu_bh_torture_synchronize(void) +{ + struct rcu_bh_torture_synchronize rcu; + + init_completion(&rcu.completion); + call_rcu_bh(&rcu.head, rcu_bh_torture_wakeme_after_cb); + wait_for_completion(&rcu.completion); +} + static struct rcu_torture_ops rcu_bh_ops = { .init = NULL, .cleanup = NULL, .readlock = rcu_bh_torture_read_lock, + .readdelay = rcu_read_delay, /* just reuse rcu's version. */ .readunlock = rcu_bh_torture_read_unlock, .completed = rcu_bh_torture_completed, .deferredfree = rcu_bh_torture_deferred_free, + .sync = rcu_bh_torture_synchronize, .stats = NULL, .name = "rcu_bh" }; +static struct rcu_torture_ops rcu_bh_sync_ops = { + .init = rcu_sync_torture_init, + .cleanup = NULL, + .readlock = rcu_bh_torture_read_lock, + .readdelay = rcu_read_delay, /* just reuse rcu's version. */ + .readunlock = rcu_bh_torture_read_unlock, + .completed = rcu_bh_torture_completed, + .deferredfree = rcu_sync_torture_deferred_free, + .sync = rcu_bh_torture_synchronize, + .stats = NULL, + .name = "rcu_bh_sync" +}; + +/* + * Definitions for srcu torture testing. 
+ */ + +static struct srcu_struct srcu_ctl; + +static void srcu_torture_init(void) +{ + init_srcu_struct(&srcu_ctl); + rcu_sync_torture_init(); +} + +static void srcu_torture_cleanup(void) +{ + synchronize_srcu(&srcu_ctl); + cleanup_srcu_struct(&srcu_ctl); +} + +static int srcu_torture_read_lock(void) +{ + return srcu_read_lock(&srcu_ctl); +} + +static void srcu_read_delay(struct rcu_random_state *rrsp) +{ + long delay; + const long uspertick = 1000000 / HZ; + const long longdelay = 10; + + /* We want there to be long-running readers, but not all the time. */ + + delay = rcu_random(rrsp) % (nrealreaders * 2 * longdelay * uspertick); + if (!delay) + schedule_timeout_interruptible(longdelay); +} + +static void srcu_torture_read_unlock(int idx) +{ + srcu_read_unlock(&srcu_ctl, idx); +} + +static int srcu_torture_completed(void) +{ + return srcu_batches_completed(&srcu_ctl); +} + +static void srcu_torture_synchronize(void) +{ + synchronize_srcu(&srcu_ctl); +} + +static int srcu_torture_stats(char *page) +{ + int cnt = 0; + int cpu; + int idx = srcu_ctl.completed & 0x1; + + cnt += sprintf(&page[cnt], "%s%s per-CPU(idx=%d):", + torture_type, TORTURE_FLAG, idx); + for_each_possible_cpu(cpu) { + cnt += sprintf(&page[cnt], " %d(%d,%d)", cpu, + per_cpu_ptr(srcu_ctl.per_cpu_ref, cpu)->c[!idx], + per_cpu_ptr(srcu_ctl.per_cpu_ref, cpu)->c[idx]); + } + cnt += sprintf(&page[cnt], "\n"); + return cnt; +} + +static struct rcu_torture_ops srcu_ops = { + .init = srcu_torture_init, + .cleanup = srcu_torture_cleanup, + .readlock = srcu_torture_read_lock, + .readdelay = srcu_read_delay, + .readunlock = srcu_torture_read_unlock, + .completed = srcu_torture_completed, + .deferredfree = rcu_sync_torture_deferred_free, + .sync = srcu_torture_synchronize, + .stats = srcu_torture_stats, + .name = "srcu" +}; + +/* + * Definitions for sched torture testing. 
+ */ + +static int sched_torture_read_lock(void) +{ + preempt_disable(); + return 0; +} + +static void sched_torture_read_unlock(int idx) +{ + preempt_enable(); +} + +static int sched_torture_completed(void) +{ + return 0; +} + +static void sched_torture_synchronize(void) +{ + synchronize_sched(); +} + +static struct rcu_torture_ops sched_ops = { + .init = rcu_sync_torture_init, + .cleanup = NULL, + .readlock = sched_torture_read_lock, + .readdelay = rcu_read_delay, /* just reuse rcu's version. */ + .readunlock = sched_torture_read_unlock, + .completed = sched_torture_completed, + .deferredfree = rcu_sync_torture_deferred_free, + .sync = sched_torture_synchronize, + .stats = NULL, + .name = "sched" +}; + static struct rcu_torture_ops *torture_ops[] = - { &rcu_ops, &rcu_bh_ops, NULL }; + { &rcu_ops, &rcu_sync_ops, &rcu_bh_ops, &rcu_bh_sync_ops, &srcu_ops, + &sched_ops, NULL }; /* * RCU torture writer kthread. Repeatedly substitutes a new structure @@ -330,6 +551,30 @@ } /* + * RCU torture fake writer kthread. Repeatedly calls sync, with a random + * delay between calls. + */ +static int +rcu_torture_fakewriter(void *arg) +{ + DEFINE_RCU_RANDOM(rand); + + VERBOSE_PRINTK_STRING("rcu_torture_fakewriter task started"); + set_user_nice(current, 19); + + do { + schedule_timeout_uninterruptible(1 + rcu_random(&rand)%10); + udelay(rcu_random(&rand) & 0x3ff); + cur_ops->sync(); + } while (!kthread_should_stop() && !fullstop); + + VERBOSE_PRINTK_STRING("rcu_torture_fakewriter task stopping"); + while (!kthread_should_stop()) + schedule_timeout_uninterruptible(1); + return 0; +} + +/* * RCU torture reader kthread. Repeatedly dereferences rcu_torture_current, * incrementing the corresponding element of the pipeline array. 
The * counter in the element should never be greater than 1, otherwise, the @@ -359,7 +604,7 @@ } if (p->rtort_mbtest == 0) atomic_inc(&n_rcu_torture_mberror); - udelay(rcu_random(&rand) & 0x7f); + cur_ops->readdelay(&rand); preempt_disable(); pipe_count = p->rtort_pipe_count; if (pipe_count > RCU_TORTURE_PIPE_LEN) { @@ -483,7 +728,7 @@ /* Shuffle tasks such that we allow @rcu_idle_cpu to become idle. A special case * is when @rcu_idle_cpu = -1, when we allow the tasks to run on all CPUs. */ -void rcu_torture_shuffle_tasks(void) +static void rcu_torture_shuffle_tasks(void) { cpumask_t tmp_mask = CPU_MASK_ALL; int i; @@ -507,6 +752,12 @@ set_cpus_allowed(reader_tasks[i], tmp_mask); } + if (fakewriter_tasks != NULL) { + for (i = 0; i < nfakewriters; i++) + if (fakewriter_tasks[i]) + set_cpus_allowed(fakewriter_tasks[i], tmp_mask); + } + if (writer_task) set_cpus_allowed(writer_task, tmp_mask); @@ -540,11 +791,12 @@ static inline void rcu_torture_print_module_parms(char *tag) { - printk(KERN_ALERT "%s" TORTURE_FLAG "--- %s: nreaders=%d " + printk(KERN_ALERT "%s" TORTURE_FLAG + "--- %s: nreaders=%d nfakewriters=%d " "stat_interval=%d verbose=%d test_no_idle_hz=%d " "shuffle_interval = %d\n", - torture_type, tag, nrealreaders, stat_interval, verbose, - test_no_idle_hz, shuffle_interval); + torture_type, tag, nrealreaders, nfakewriters, + stat_interval, verbose, test_no_idle_hz, shuffle_interval); } static void @@ -579,6 +831,19 @@ } rcu_torture_current = NULL; + if (fakewriter_tasks != NULL) { + for (i = 0; i < nfakewriters; i++) { + if (fakewriter_tasks[i] != NULL) { + VERBOSE_PRINTK_STRING( + "Stopping rcu_torture_fakewriter task"); + kthread_stop(fakewriter_tasks[i]); + } + fakewriter_tasks[i] = NULL; + } + kfree(fakewriter_tasks); + fakewriter_tasks = NULL; + } + if (stats_task != NULL) { VERBOSE_PRINTK_STRING("Stopping rcu_torture_stats task"); kthread_stop(stats_task); @@ -666,7 +931,25 @@ writer_task = NULL; goto unwind; } - reader_tasks = kmalloc(nrealreaders * 
sizeof(reader_tasks[0]), + fakewriter_tasks = kzalloc(nfakewriters * sizeof(fakewriter_tasks[0]), + GFP_KERNEL); + if (fakewriter_tasks == NULL) { + VERBOSE_PRINTK_ERRSTRING("out of memory"); + firsterr = -ENOMEM; + goto unwind; + } + for (i = 0; i < nfakewriters; i++) { + VERBOSE_PRINTK_STRING("Creating rcu_torture_fakewriter task"); + fakewriter_tasks[i] = kthread_run(rcu_torture_fakewriter, NULL, + "rcu_torture_fakewriter"); + if (IS_ERR(fakewriter_tasks[i])) { + firsterr = PTR_ERR(fakewriter_tasks[i]); + VERBOSE_PRINTK_ERRSTRING("Failed to create fakewriter"); + fakewriter_tasks[i] = NULL; + goto unwind; + } + } + reader_tasks = kzalloc(nrealreaders * sizeof(reader_tasks[0]), GFP_KERNEL); if (reader_tasks == NULL) { VERBOSE_PRINTK_ERRSTRING("out of memory"); diff -urN oldtree/kernel/sched_ingosched.c newtree/kernel/sched_ingosched.c --- oldtree/kernel/sched_ingosched.c 2006-09-30 05:15:13.000000000 -0400 +++ newtree/kernel/sched_ingosched.c 2006-09-30 09:01:52.000000000 -0400 @@ -435,7 +435,7 @@ /* runqueue-specific stats */ seq_printf(seq, - "cpu%d %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu", + "cpu%d %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu", cpu, rq->yld_both_empty, rq->yld_act_empty, rq->yld_exp_empty, rq->yld_cnt, rq->sched_switch, rq->sched_cnt, rq->sched_goidle, diff -urN oldtree/kernel/srcu.c newtree/kernel/srcu.c --- oldtree/kernel/srcu.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/srcu.c 2006-09-30 09:46:33.000000000 -0400 @@ -0,0 +1,258 @@ +/* + * Sleepable Read-Copy Update mechanism for mutual exclusion. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright (C) IBM Corporation, 2006 + * + * Author: Paul McKenney + * + * For detailed explanation of Read-Copy Update mechanism see - + * Documentation/RCU/ *.txt + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/** + * init_srcu_struct - initialize a sleep-RCU structure + * @sp: structure to initialize. + * + * Must invoke this on a given srcu_struct before passing that srcu_struct + * to any other function. Each srcu_struct represents a separate domain + * of SRCU protection. + */ +int init_srcu_struct(struct srcu_struct *sp) +{ + sp->completed = 0; + mutex_init(&sp->mutex); + sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array); + return (sp->per_cpu_ref ? 0 : -ENOMEM); +} + +/* + * srcu_readers_active_idx -- returns approximate number of readers + * active on the specified rank of per-CPU counters. + */ + +static int srcu_readers_active_idx(struct srcu_struct *sp, int idx) +{ + int cpu; + int sum; + + sum = 0; + for_each_possible_cpu(cpu) + sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx]; + return sum; +} + +/** + * srcu_readers_active - returns approximate number of readers. + * @sp: which srcu_struct to count active readers (holding srcu_read_lock). + * + * Note that this is not an atomic primitive, and can therefore suffer + * severe errors when invoked on an active srcu_struct. That said, it + * can be useful as an error check at cleanup time. 
+ */ +int srcu_readers_active(struct srcu_struct *sp) +{ + return srcu_readers_active_idx(sp, 0) + srcu_readers_active_idx(sp, 1); +} + +/** + * cleanup_srcu_struct - deconstruct a sleep-RCU structure + * @sp: structure to clean up. + * + * Must invoke this after you are finished using a given srcu_struct that + * was initialized via init_srcu_struct(), else you leak memory. + */ +void cleanup_srcu_struct(struct srcu_struct *sp) +{ + int sum; + + sum = srcu_readers_active(sp); + WARN_ON(sum); /* Leakage unless caller handles error. */ + if (sum != 0) + return; + free_percpu(sp->per_cpu_ref); + sp->per_cpu_ref = NULL; +} + +/** + * srcu_read_lock - register a new reader for an SRCU-protected structure. + * @sp: srcu_struct in which to register the new reader. + * + * Counts the new reader in the appropriate per-CPU element of the + * srcu_struct. Must be called from process context. + * Returns an index that must be passed to the matching srcu_read_unlock(). + */ +int srcu_read_lock(struct srcu_struct *sp) +{ + int idx; + + preempt_disable(); + idx = sp->completed & 0x1; + barrier(); /* ensure compiler looks -once- at sp->completed. */ + per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++; + srcu_barrier(); /* ensure compiler won't misorder critical section. */ + preempt_enable(); + return idx; +} + +/** + * srcu_read_unlock - unregister an old reader from an SRCU-protected structure. + * @sp: srcu_struct in which to unregister the old reader. + * @idx: return value from corresponding srcu_read_lock(). + * + * Removes the count for the old reader from the appropriate per-CPU + * element of the srcu_struct. Note that this may well be a different + * CPU than that which was incremented by the corresponding srcu_read_lock(). + * Must be called from process context. 
+ */ +void srcu_read_unlock(struct srcu_struct *sp, int idx) +{ + preempt_disable(); + srcu_barrier(); /* ensure compiler won't misorder critical section.
*/ + per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--; + preempt_enable(); +} + +/** + * synchronize_srcu - wait for prior SRCU read-side critical-section completion + * @sp: srcu_struct with which to synchronize. + * + * Flip the completed counter, and wait for the old count to drain to zero. + * As with classic RCU, the updater must use some separate means of + * synchronizing concurrent updates. Can block; must be called from + * process context. + * + * Note that it is illegal to call synchronize_srcu() from the corresponding + * SRCU read-side critical section; doing so will result in deadlock. + * However, it is perfectly legal to call synchronize_srcu() on one + * srcu_struct from some other srcu_struct's read-side critical section. + */ +void synchronize_srcu(struct srcu_struct *sp) +{ + int idx; + + idx = sp->completed; + mutex_lock(&sp->mutex); + + /* + * Check to see if someone else did the work for us while we were + * waiting to acquire the lock. We need -two- advances of + * the counter, not just one. If there was but one, we might have + * shown up -after- our helper's first synchronize_sched(), thus + * having failed to prevent CPU-reordering races with concurrent + * srcu_read_unlock()s on other CPUs (see comment below). So we + * either (1) wait for two or (2) supply the second ourselves. + */ + + if ((sp->completed - idx) >= 2) { + mutex_unlock(&sp->mutex); + return; + } + + synchronize_sched(); /* Force memory barrier on all CPUs. */ + + /* + * The preceding synchronize_sched() ensures that any CPU that + * sees the new value of sp->completed will also see any preceding + * changes to data structures made by this CPU. This prevents + * some other CPU from reordering the accesses in its SRCU + * read-side critical section to precede the corresponding + * srcu_read_lock() -- ensuring that such references will in + * fact be protected. + * + * So it is now safe to do the flip.
+ */ + + idx = sp->completed & 0x1; + sp->completed++; + + synchronize_sched(); /* Force memory barrier on all CPUs. */ + + /* + * At this point, because of the preceding synchronize_sched(), + * all srcu_read_lock() calls using the old counters have completed. + * Their corresponding critical sections might well be still + * executing, but the srcu_read_lock() primitives themselves + * will have finished executing. + */ + + while (srcu_readers_active_idx(sp, idx)) + schedule_timeout_interruptible(1); + + synchronize_sched(); /* Force memory barrier on all CPUs. */ + + /* + * The preceding synchronize_sched() forces all srcu_read_unlock() + * primitives that were executing concurrently with the preceding + * for_each_possible_cpu() loop to have completed by this point. + * More importantly, it also forces the corresponding SRCU read-side + * critical sections to have also completed, and the corresponding + * references to SRCU-protected data items to be dropped. + * + * Note: + * + * Despite what you might think at first glance, the + * preceding synchronize_sched() -must- be within the + * critical section ended by the following mutex_unlock(). + * Otherwise, a task taking the early exit can race + * with a srcu_read_unlock(), which might have executed + * just before the preceding srcu_readers_active() check, + * and whose CPU might have reordered the srcu_read_unlock() + * with the preceding critical section. In this case, there + * is nothing preventing the synchronize_sched() task that is + * taking the early exit from freeing a data structure that + * is still being referenced (out of order) by the task + * doing the srcu_read_unlock(). + * + * Alternatively, the comparison with "2" on the early exit + * could be changed to "3", but this increases synchronize_srcu() + * latency for bulk loads. So the current code is preferred. + */ + + mutex_unlock(&sp->mutex); +} + +/** + * srcu_batches_completed - return batches completed. 
+ * @sp: srcu_struct on which to report batch completion. + * + * Report the number of batches, correlated with, but not necessarily + * precisely the same as, the number of grace periods that have elapsed. + */ + +long srcu_batches_completed(struct srcu_struct *sp) +{ + return sp->completed; +} + +EXPORT_SYMBOL_GPL(init_srcu_struct); +EXPORT_SYMBOL_GPL(cleanup_srcu_struct); +EXPORT_SYMBOL_GPL(srcu_read_lock); +EXPORT_SYMBOL_GPL(srcu_read_unlock); +EXPORT_SYMBOL_GPL(synchronize_srcu); +EXPORT_SYMBOL_GPL(srcu_batches_completed); +EXPORT_SYMBOL_GPL(srcu_readers_active); diff -urN oldtree/kernel/sys.c newtree/kernel/sys.c --- oldtree/kernel/sys.c 2006-09-29 14:03:22.000000000 -0400 +++ newtree/kernel/sys.c 2006-09-30 09:46:35.000000000 -0400 @@ -152,7 +152,7 @@ /* * Atomic notifier chain routines. Registration and unregistration - * use a mutex, and call_chain is synchronized by RCU (no locks). + * use a spinlock, and call_chain is synchronized by RCU (no locks). */ /** @@ -400,6 +400,129 @@ EXPORT_SYMBOL_GPL(raw_notifier_call_chain); +/* + * SRCU notifier chain routines. Registration and unregistration + * use a mutex, and call_chain is synchronized by SRCU (no locks). + */ + +/** + * srcu_notifier_chain_register - Add notifier to an SRCU notifier chain + * @nh: Pointer to head of the SRCU notifier chain + * @n: New entry in notifier chain + * + * Adds a notifier to an SRCU notifier chain. + * Must be called in process context. + * + * Currently always returns zero. + */ + +int srcu_notifier_chain_register(struct srcu_notifier_head *nh, + struct notifier_block *n) +{ + int ret; + + /* + * This code gets used during boot-up, when task switching is + * not yet working and interrupts must remain disabled. At + * such times we must not call mutex_lock(). 
+ */ + if (unlikely(system_state == SYSTEM_BOOTING)) + return notifier_chain_register(&nh->head, n); + + mutex_lock(&nh->mutex); + ret = notifier_chain_register(&nh->head, n); + mutex_unlock(&nh->mutex); + return ret; +} + +EXPORT_SYMBOL_GPL(srcu_notifier_chain_register); + +/** + * srcu_notifier_chain_unregister - Remove notifier from an SRCU notifier chain + * @nh: Pointer to head of the SRCU notifier chain + * @n: Entry to remove from notifier chain + * + * Removes a notifier from an SRCU notifier chain. + * Must be called from process context. + * + * Returns zero on success or %-ENOENT on failure. + */ +int srcu_notifier_chain_unregister(struct srcu_notifier_head *nh, + struct notifier_block *n) +{ + int ret; + + /* + * This code gets used during boot-up, when task switching is + * not yet working and interrupts must remain disabled. At + * such times we must not call mutex_lock(). + */ + if (unlikely(system_state == SYSTEM_BOOTING)) + return notifier_chain_unregister(&nh->head, n); + + mutex_lock(&nh->mutex); + ret = notifier_chain_unregister(&nh->head, n); + mutex_unlock(&nh->mutex); + synchronize_srcu(&nh->srcu); + return ret; +} + +EXPORT_SYMBOL_GPL(srcu_notifier_chain_unregister); + +/** + * srcu_notifier_call_chain - Call functions in an SRCU notifier chain + * @nh: Pointer to head of the SRCU notifier chain + * @val: Value passed unmodified to notifier function + * @v: Pointer passed unmodified to notifier function + * + * Calls each function in a notifier chain in turn. The functions + * run in a process context, so they are allowed to block. + * + * If the return value of the notifier can be and'ed + * with %NOTIFY_STOP_MASK then srcu_notifier_call_chain + * will return immediately, with the return value of + * the notifier function which halted execution. + * Otherwise the return value is the return value + * of the last notifier function called. 
+ */ + +int srcu_notifier_call_chain(struct srcu_notifier_head *nh, + unsigned long val, void *v) +{ + int ret; + int idx; + + idx = srcu_read_lock(&nh->srcu); + ret = notifier_call_chain(&nh->head, val, v); + srcu_read_unlock(&nh->srcu, idx); + return ret; +} + +EXPORT_SYMBOL_GPL(srcu_notifier_call_chain); + +/** + * srcu_init_notifier_head - Initialize an SRCU notifier head + * @nh: Pointer to head of the srcu notifier chain + * + * Unlike other sorts of notifier heads, SRCU notifier heads require + * dynamic initialization. Be sure to call this routine before + * calling any of the other SRCU notifier routines for this head. + * + * If an SRCU notifier head is deallocated, it must first be cleaned + * up by calling srcu_cleanup_notifier_head(). Otherwise the head's + * per-cpu data (used by the SRCU mechanism) will leak. + */ + +void srcu_init_notifier_head(struct srcu_notifier_head *nh) +{ + mutex_init(&nh->mutex); + if (init_srcu_struct(&nh->srcu) < 0) + BUG(); + nh->head = NULL; +} + +EXPORT_SYMBOL_GPL(srcu_init_notifier_head); + /** * register_reboot_notifier - Register function to be called at reboot time * @nb: Info about notifier function to be called diff -urN oldtree/mm/filemap.c newtree/mm/filemap.c --- oldtree/mm/filemap.c 2006-09-30 05:15:13.000000000 -0400 +++ newtree/mm/filemap.c 2006-09-30 09:02:58.000000000 -0400 @@ -910,6 +910,9 @@ #ifdef CONFIG_STAIRCASE_CUSTOM int vm_tail_largefiles __read_mostly = CONFIG_VM_TAIL_LARGEFILES_SETTING; #endif +#ifdef CONFIG_INGOSCHED +int vm_tail_largefiles __read_mostly = 1; +#endif static inline int nr_mapped(void) { diff -urN oldtree/mm/page-writeback.c newtree/mm/page-writeback.c --- oldtree/mm/page-writeback.c 2006-09-29 15:02:32.000000000 -0400 +++ newtree/mm/page-writeback.c 2006-09-30 09:41:15.000000000 -0400 @@ -92,6 +92,9 @@ #ifdef CONFIG_STAIRCASE_CUSTOM int vm_dirty_ratio __read_mostly = CONFIG_VM_DIRTY_RATIO_SETTING; #endif +#ifdef CONFIG_INGOSCHED +int vm_dirty_ratio __read_mostly = 0; +#endif /* * 
The interval between `kupdate'-style writebacks, in jiffies diff -urN oldtree/mm/vmscan.c newtree/mm/vmscan.c --- oldtree/mm/vmscan.c 2006-09-29 15:29:54.000000000 -0400 +++ newtree/mm/vmscan.c 2006-09-30 09:38:43.000000000 -0400 @@ -143,6 +143,10 @@ int vm_mapped __read_mostly = CONFIG_VM_MAPPED_SETTING; int vm_hardmaplimit __read_mostly = CONFIG_VM_HARDMAPLIMIT_SETTING; #endif +#ifdef CONFIG_INGOSCHED +int vm_mapped __read_mostly = 66; +int vm_hardmaplimit __read_mostly = 1; +#endif long vm_total_pages __read_mostly; /* The total number of pages which the VM controls */ Files oldtree/scripts/kconfig/mconf and newtree/scripts/kconfig/mconf differ
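The counter bookkeeping behind srcu_read_lock(), srcu_read_unlock(), srcu_readers_active_idx() and the flip in synchronize_srcu() can be illustrated with a small single-threaded userspace model. This is only a sketch under stated assumptions: every name in it (struct model_srcu, MODEL_NR_CPUS, model_read_lock(), model_flip(), model_can_exit_early(), and so on) is hypothetical, the CPU count is fixed, and it deliberately omits preemption, compiler and memory barriers, the mutex, and the synchronize_sched() calls that the real code depends on for correctness. It shows which counters each operation touches, not why the algorithm is safe.

```c
#include <string.h>

/* Hypothetical CPU count for this single-threaded model. */
#define MODEL_NR_CPUS 4

/*
 * Model of the srcu_struct state in kernel/srcu.c: a flip counter
 * plus two ranks of per-CPU reader counts.  No concurrency here.
 */
struct model_srcu {
	long completed;
	int c[MODEL_NR_CPUS][2];
};

static void model_init(struct model_srcu *sp)
{
	memset(sp, 0, sizeof(*sp));
}

/* Like srcu_read_lock(): count the reader in the current rank. */
static int model_read_lock(struct model_srcu *sp, int cpu)
{
	int idx = (int)(sp->completed & 0x1);

	sp->c[cpu][idx]++;
	return idx;
}

/* Like srcu_read_unlock(): may run on a different CPU than the lock. */
static void model_read_unlock(struct model_srcu *sp, int cpu, int idx)
{
	sp->c[cpu][idx]--;
}

/* Like srcu_readers_active_idx(): only the sum over CPUs is meaningful. */
static int model_readers_active_idx(const struct model_srcu *sp, int idx)
{
	int sum = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += sp->c[cpu][idx];
	return sum;
}

/* The flip step of synchronize_srcu(): returns the rank that must drain. */
static int model_flip(struct model_srcu *sp)
{
	int idx = (int)(sp->completed & 0x1);

	sp->completed++;
	return idx;
}

/* The early-exit test: a caller's snapshot is covered only after two flips. */
static int model_can_exit_early(const struct model_srcu *sp, long snap)
{
	return (sp->completed - snap) >= 2;
}
```

In this model, a reader that locks on CPU 0 before a flip and unlocks on CPU 2 after it leaves CPU 2's rank-0 count at -1 and CPU 0's at +1, which is why only the cross-CPU sum is checked while draining. Readers arriving after the flip land in the other rank and so do not hold up the drain. model_can_exit_early() mirrors the `(sp->completed - idx) >= 2` check: one flip by another updater is not enough for an early exit.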
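The rcu_read_delay() and srcu_read_delay() routines in the rcutorture changes above aim for long-running readers "but not all the time." Assuming rcu_random() is close to uniform modulo the divisor (modulo bias ignored), and writing N for nrealreaders (the symbols N, L_r, L_s and U are introduced here for illustration only), the per-read probability of a long delay and the expected extra delay per read work out as follows:

```latex
% rcu_read_delay(): longdelay L_r = 200 microseconds, spent in udelay()
p_{\mathrm{rcu}} = \frac{1}{2 N L_r} = \frac{1}{400N},
\qquad
\mathbb{E}[\text{delay}] = p_{\mathrm{rcu}} \cdot L_r = \frac{1}{2N}\ \mu\mathrm{s}

% srcu_read_delay(): longdelay L_s = 10 jiffies, U = 10^6 / HZ microseconds per jiffy
p_{\mathrm{srcu}} = \frac{1}{2 N L_s U} = \frac{1}{20 N U},
\qquad
\mathbb{E}[\text{sleep}] \approx p_{\mathrm{srcu}} \cdot L_s U = \frac{1}{2N}\ \mu\mathrm{s}
```

So the SRCU variant makes much longer, sleeping delays proportionally rarer, keeping roughly the same average added delay per read. For example, with HZ=1000 (U = 1000) and N = 4, an SRCU reader sleeps for about 10 ms with probability 1/80000 per pass, while an RCU reader spins for 200 µs with probability 1/1600. The SRCU figure is approximate because schedule_timeout_interruptible() sleeps for roughly, not exactly, the requested number of jiffies.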