diff -urN oldtree/Documentation/kernel-parameters.txt newtree/Documentation/kernel-parameters.txt --- oldtree/Documentation/kernel-parameters.txt 2006-01-02 22:21:10.000000000 -0500 +++ newtree/Documentation/kernel-parameters.txt 2006-02-13 14:51:53.783988848 -0500 @@ -71,6 +71,7 @@ SERIAL Serial support is enabled. SMP The kernel is an SMP kernel. SPARC Sparc architecture is enabled. + SUSPEND2 Suspend2 is enabled. SWSUSP Software suspend is enabled. TS Appropriate touchscreen support is enabled. USB USB support is enabled. @@ -966,6 +967,8 @@ noresume [SWSUSP] Disables resume and restores original swap space. + noresume2 [SUSPEND2] Disables resuming and restores original swap signature. + no-scroll [VGA] Disables scrollback. This is required for the Braillex ib80-piezo Braille reader made by F.H. Papenmeier (Germany). @@ -1215,6 +1218,11 @@ resume= [SWSUSP] Specify the partition device for software suspend + resume2= [SUSPEND2] Specify the storage device for Suspend2. + Format: :. + See Documentation/power/suspend2.txt for details of the + formats for available image writers. + rhash_entries= [KNL,NET] Set number of hash buckets for route cache diff -urN oldtree/Documentation/power/internals.txt newtree/Documentation/power/internals.txt --- oldtree/Documentation/power/internals.txt 1969-12-31 19:00:00.000000000 -0500 +++ newtree/Documentation/power/internals.txt 2006-02-13 14:51:53.797986720 -0500 @@ -0,0 +1,360 @@ + Software Suspend 2.2 Internal Documentation. + Version 1 + +1. Introduction. + + Software Suspend 2.2 is an addition to the Linux Kernel, designed to + allow the user to quickly shutdown and quickly boot a computer, without + needing to close documents or programs. It is equivalent to the + hibernate facility in some laptops. This implementation, however, + requires no special BIOS or hardware support. 
+ + The code in these files is based upon the original implementation + prepared by Gabor Kuti and additional work by Pavel Machek and a + host of others. This code has been substantially reworked by Nigel + Cunningham, again with the help and testing of many others, not the + least of whom is Michael Frank. At its heart, however, the operation is + essentially the same as Gabor's version. + +2. Overview of operation. + + The basic sequence of operations is as follows: + + a. Quiesce all other activity. + b. Ensure enough memory and storage space are available, and attempt + to free memory/storage if necessary. + c. Allocate the required memory and storage space. + d. Write the image. + e. Power down. + + There are a number of complicating factors which mean that things are + not as simple as the above would imply, however... + + o The activity of each process must be stopped at a point where it will + not be holding locks necessary for saving the image, or unexpectedly + restart operations due to something like a timeout and thereby make + our image inconsistent. + + o It is desirable that we sync outstanding I/O to disk before calculating + image statistics. This reduces corruption if one should suspend but + then not resume, and also makes later parts of the operation safer (see + below). + + o We need to get as close as we can to an atomic copy of the data. + Inconsistencies in the image will result in inconsistent memory contents at + resume time, and thus in instability of the system and/or file system + corruption. This would appear to imply a maximum image size of one half of + the amount of RAM, but we have a solution... (again, below). + + o In 2.6, we must play nicely with the other suspend-to-disk + implementations. + +3. Detailed description of internals. + + a. Quiescing activity. + + Safely quiescing the system is achieved using two methods. + + First, we note that the vast majority of processes don't need to run during + suspend. They can be 'frozen'.
We therefore implement a refrigerator + routine, which processes enter and in which they remain until the cycle is + complete. In the vanilla kernel, processes enter the refrigerator via + try_to_freeze() invocations at appropriate places. A process cannot be + frozen in any old place. It must not be holding locks that will be needed + for writing the image or freezing other processes. For this reason, + userspace processes generally enter the refrigerator via the signal handling + code, and kernel threads at the place in their event loops where they drop + locks and yield to other processes or sleep. + + In this revision of Suspend2, Christoph Lameter's todo list concept is + utilised to do the freezing. This means that we replace direct invocation of + the refrigerator function with a notifier list implementation, allowing + other applications of the hooks. + + The second part of our method for quiescing the system involves freezing + the filesystems. We use the standard freeze_bdev and thaw_bdev functions to + ensure that all of the user's data is synced to disk before we begin to + write the image. + + Quiescing the system works most quickly and reliably when we add one more + element to the algorithm: separating the freezing of userspace processes + from the freezing of kernel space processes, and doing the filesystem freeze + in between. The filesystem freeze needs to be done while kernel threads such + as kjournald can still run. At the same time, though, everything will be less + racy and run more quickly if we stop userspace submitting more I/O work + while we're trying to quiesce. + + Quiescing the system is therefore done in three steps: + - Freeze userspace + - Freeze filesystems + - Freeze kernel threads + + If we need to free memory, we thaw kernel threads and filesystems, but not + userspace. We can then free caches without worrying about deadlocks due to + swap files being on frozen filesystems or such like. + + b.
Ensure enough memory & storage are available. + + We have a number of constraints to meet to be able to successfully suspend + and resume. + + First, the image will be written in two parts, described below. One of these + parts needs to have an atomic copy made, which of course implies a maximum + size of one half of the amount of system memory. The other part ('pageset') + is not atomically copied, and can therefore be as large or small as desired. + + Second, we have constraints on the amount of storage available. In these + calculations, we may also consider any compression that will be done. The + cryptoapi plugin allows the user to configure an expected compression ratio. + + Third, the user can specify an arbitrary limit on the image size, in + megabytes. This limit is treated as a soft limit, so that we don't fail the + attempt to suspend if we cannot meet this constraint. + + c. Allocate the required memory and storage space. + + Having done the initial freeze, we determine whether the above constraints + are met, and seek to allocate the metadata for the image. If the constraints + are not met, or we fail to allocate the required space for the metadata, we + seek to free the amount of memory that we calculate is needed and try again. + We allow up to four iterations of this loop before aborting the cycle. If we + do fail, it should only be because of a bug in Suspend's calculations. + + These steps are merged together in the prepare_image function, found in + prepare_image.c. The functions are merged because of the cyclical nature + of the problem of calculating how much memory and storage is needed. Since + the data structures containing the information about the image must + themselves take memory and use storage, the amount of memory and storage + required changes as we prepare the image. Since the changes are not large, + only one or two iterations will be required to achieve a solution. + + d. Write the image. 
+ + We previously mentioned the need to create an atomic copy of the data, and + the half-of-memory limitation that is implied in this. This limitation is + circumvented by dividing the memory to be saved into two parts, called + pagesets. + + Pageset2 contains the page cache - the pages on the active and inactive + lists. These pages are saved first and reloaded last. While saving these + pages, the swapwriter plugin carefully ensures that the work of writing + the pages doesn't make the image inconsistent. Pages added to the LRU + lists are immediately shot down, and careful accounting for available + memory aids debugging. No atomic copy of these pages needs to be made. + + Writing the image requires memory, of course, and at this point we have + also not yet suspended the drivers. To avoid the possibility of remaining + activity corrupting the image, we allocate a special memory pool. Calls + to __alloc_pages and __free_pages_ok are then diverted to use our memory + pool. Pages in the memory pool are saved as part of pageset1 regardless of + whether or not they are used. + + Once pageset2 has been saved, we suspend the drivers and save the CPU + context before making an atomic copy of pageset1, resuming the drivers + and saving the atomic copy. After saving the two pagesets, we just need to + save our metadata before powering down. + + Having saved pageset2 pages, we can safely overwrite their contents with + the atomic copy of pageset1. This is how we manage to overcome the half of + memory limitation. Pageset2 is normally far larger than pageset1, and + pageset1 is normally much smaller than half of the memory, with the result + that pageset2 pages can be safely overwritten with the atomic copy of + pageset1. This is where we need to be careful about syncing, however. + Pageset2 will probably contain filesystem meta data. 
If this is overwritten + with pageset1 and then a sync occurs, the filesystem will be corrupted - + at least until resume time and another sync of the restored data. Since + there is a possibility that the user might not resume or (may it never be!) + that suspend might oops, we do our utmost to avoid syncing filesystems after + copying pageset1. + + e. Power down. + + Powering down uses standard kernel routines. Prior to this, however, we + suspend drivers again, ensuring that write caches are flushed. + +4. The method of writing the image. + + Suspend2 contains an internal API which is designed to simplify the + implementation of new methods of transforming the image to be written and + writing the image itself. In early versions of Suspend2, compression support + was inlined in the image writing code, and the data structures and code for + managing swap were intertwined with the rest of the code. A number of people + had expressed interest in implementing image encryption, and alternative + methods of storing the image. This internal API makes that possible by + implementing 'plugins'. + + A plugin is a single file which encapsulates the functionality needed + to transform a pageset of data (encryption or compression, for example), + or to write the pageset to a device. The former type of plugin is called + a 'page-transformer', the latter a 'writer'. + + Plugins are linked together in pipeline fashion. There may be zero or more + page transformers in a pipeline, and there is always exactly one writer.
+ The pipeline follows this pattern: + + --------------------------------- + | Suspend2 Core | + --------------------------------- + | + | + --------------------------------- + | Page transformer 1 | + --------------------------------- + | + | + --------------------------------- + | Page transformer 2 | + --------------------------------- + | + | + --------------------------------- + | Writer | + --------------------------------- + + During the writing of an image, the core code feeds pages one at a time + to the first plugin. This plugin performs whatever transformations it + implements on the incoming data, completely consuming the incoming data and + feeding output in a similar manner to the next plugin. A plugin may buffer + its output. + + During reading, the pipeline works in the reverse direction. The core code + calls the first plugin with the address of a buffer which should be filled. + (Note that the buffer size is always PAGE_SIZE at this time). This plugin + will in turn request data from the next plugin and so on down until the + writer is made to read from the stored image. + + Part of the definition of the structure of a plugin thus looks like this: + + /* Writing the image proper */ + int (*write_init) (int stream_number); + int (*write_chunk) (char *buffer_start); + int (*write_cleanup) (void); + + /* Reading the image proper */ + int (*read_init) (int stream_number); + int (*read_chunk) (char *buffer_start, int sync); + int (*read_cleanup) (void); + + It should be noted that the _cleanup routines may be called before the + full stream of data has been read or written. While writing the image, + the user may (depending upon settings) choose to abort suspending, and + if we are in the midst of writing the last portion of the image, a portion + of the second pageset may be reread.
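The pipeline behaviour described above can be illustrated with a small user-space sketch. It is not the Suspend2 code: struct sketch_plugin, its explicit "next" pointer, the XOR "transformer" and the in-memory "writer" are all hypothetical stand-ins, and only the chunk hooks are modelled (no _init or _cleanup, no sync argument). It shows one page travelling core -> transformer -> writer on write, and the same chain being asked for data in the opposite direction on read.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096   /* stand-in for the kernel's PAGE_SIZE */
#define SKETCH_MAX_PAGES 8      /* capacity of the toy in-memory "device" */

/* Hypothetical, simplified plugin. The explicit next pointer is an
 * assumption of this sketch, not part of struct suspend_plugin_ops. */
struct sketch_plugin {
	const char *name;
	struct sketch_plugin *next;
	int (*write_chunk)(struct sketch_plugin *self, char *buffer_start);
	int (*read_chunk)(struct sketch_plugin *self, char *buffer_start);
};

/* Toy writer: keeps pages in an array instead of writing to storage. */
static char sketch_store[SKETCH_MAX_PAGES][SKETCH_PAGE_SIZE];
static size_t sketch_pages_written, sketch_pages_read;

static int writer_write_chunk(struct sketch_plugin *self, char *buf)
{
	(void)self;
	if (sketch_pages_written >= SKETCH_MAX_PAGES)
		return -1;	/* out of space with data still to write */
	memcpy(sketch_store[sketch_pages_written++], buf, SKETCH_PAGE_SIZE);
	return 0;
}

static int writer_read_chunk(struct sketch_plugin *self, char *buf)
{
	(void)self;
	if (sketch_pages_read >= sketch_pages_written)
		return -1;	/* nothing left in the stored image */
	memcpy(buf, sketch_store[sketch_pages_read++], SKETCH_PAGE_SIZE);
	return 0;
}

/* Toy page transformer: XOR each byte (stands in for real encryption). */
static void xor_page(char *buf)
{
	size_t i;

	for (i = 0; i < SKETCH_PAGE_SIZE; i++)
		buf[i] ^= 0x5a;
}

static int xor_write_chunk(struct sketch_plugin *self, char *buf)
{
	char out[SKETCH_PAGE_SIZE];

	/* Consume the incoming page, then feed the result downstream. */
	memcpy(out, buf, SKETCH_PAGE_SIZE);
	xor_page(out);
	return self->next->write_chunk(self->next, out);
}

static int xor_read_chunk(struct sketch_plugin *self, char *buf)
{
	/* Ask the next plugin to fill the buffer, then undo the transform. */
	int ret = self->next->read_chunk(self->next, buf);

	if (ret == 0)
		xor_page(buf);
	return ret;
}

static struct sketch_plugin sketch_writer = {
	"toy-writer", NULL, writer_write_chunk, writer_read_chunk
};
static struct sketch_plugin sketch_xor = {
	"toy-xor", &sketch_writer, xor_write_chunk, xor_read_chunk
};

/* Plays the role of the core: push one page down, then pull it back. */
static int sketch_round_trip(const char *in, char *out)
{
	char page[SKETCH_PAGE_SIZE];

	sketch_pages_written = sketch_pages_read = 0;
	memcpy(page, in, SKETCH_PAGE_SIZE);
	if (sketch_xor.write_chunk(&sketch_xor, page))
		return -1;
	return sketch_xor.read_chunk(&sketch_xor, out);
}
```

The stored page differs from the input (it has been transformed), while the page handed back to the caller matches it, which is the property the real transformers and writer have to preserve between them.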
+ + In addition to the above routines for writing the data, all plugins have a + number of other routines: + + TYPE indicates whether the plugin is a page transformer or a writer. + #define TRANSFORMER_PLUGIN 1 + #define WRITER_PLUGIN 2 + + NAME is the name of the plugin, used in generic messages. + + PLUGIN_LIST is used to link the plugin into the list of all plugins. + + MEMORY_NEEDED returns the number of pages of memory required by the plugin + to do its work. + + STORAGE_NEEDED returns the number of pages in the suspend header required + to store the plugin's configuration data. + + PRINT_DEBUG_INFO fills a buffer with information to be displayed about the + operation or settings of the plugin. + + SAVE_CONFIG_INFO returns a buffer of PAGE_SIZE or smaller (the size is the + return code), containing the plugin's configuration info. This information + will be written in the image header and restored at resume time. Since this + buffer is allocated after the atomic copy of the kernel is made, you don't + need to worry about the buffer being freed. + + LOAD_CONFIG_INFO gives the plugin a pointer to the configuration info + which was saved during suspending. Once again, the plugin doesn't need to + worry about freeing the buffer. The kernel will be overwritten with the + original kernel, so no memory leak will occur. + + OPS contains the operations specific to transformers and writers. These are + described below.
+ + The complete definition of struct suspend_plugin_ops is: + + struct suspend_plugin_ops { + /* Functions common to transformers and writers */ + int type; + char *name; + struct list_head plugin_list; + unsigned long (*memory_needed) (void); + unsigned long (*storage_needed) (void); + int (*print_debug_info) (char *buffer, int size); + int (*save_config_info) (char *buffer); + void (*load_config_info) (char *buffer, int len); + + /* Writing the image proper */ + int (*write_init) (int stream_number); + int (*write_chunk) (char *buffer_start); + int (*write_cleanup) (void); + + /* Reading the image proper */ + int (*read_init) (int stream_number); + int (*read_chunk) (char *buffer_start, int sync); + int (*read_cleanup) (void); + + union { + struct suspend_transformer_ops transformer; + struct suspend_writer_ops writer; + } ops; + }; + + + The operations specific to transformers are few in number: + + struct suspend_transformer_ops { + int (*expected_compression) (void); + struct list_head transformer_list; + }; + + Expected compression returns the expected ratio between the amount of + data sent to this plugin and the amount of data it passes to the next + plugin. The value is used by the core code to calculate the amount of + space required to write the image. If the ratio is not achieved, the + writer will complain when it runs out of space with data still to + write, and the core code will abort the suspend. + + transformer_list links together page transformers, in the order in + which they register, which is in turn determined by order in the + Makefile. 
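To make the role of expected_compression concrete, here is a minimal sketch of the kind of arithmetic the core could perform with it. The text above does not state the units of the ratio, so this sketch assumes a percentage (output size as a percentage of input size); the function names are hypothetical and are not taken from the Suspend2 source.

```c
#include <assert.h>

/* Assumption: expected_pct is "output pages per 100 input pages".
 * Returns the number of storage pages to reserve, rounded up so that
 * a partly filled page still gets a whole page of storage. */
static unsigned long sketch_storage_needed(unsigned long image_pages,
					   int expected_pct)
{
	assert(expected_pct > 0 && expected_pct <= 100);
	return (image_pages * (unsigned long)expected_pct + 99) / 100;
}

/* Returns 1 if the ratio actually achieved needs more pages than were
 * reserved, i.e. the case where the writer runs out of space with data
 * still to write and the cycle has to be aborted. */
static int sketch_writer_overflows(unsigned long image_pages,
				   int expected_pct, int achieved_pct)
{
	unsigned long reserved = sketch_storage_needed(image_pages, expected_pct);
	unsigned long actual = sketch_storage_needed(image_pages, achieved_pct);

	return actual > reserved;
}
```

Under these assumptions, a 1000 page image with an expected ratio of 50 reserves 500 pages. If the data only compresses to 60 percent, 600 pages are needed and the writer runs out, which is why it is wise to set the expected ratio conservatively.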
+ + There are many more operations specific to a writer: + + struct suspend_writer_ops { + + long (*storage_available) (void); + + unsigned long (*storage_allocated) (void); + + int (*release_storage) (void); + + long (*allocate_header_space) (unsigned long space_requested); + int (*allocate_storage) (unsigned long space_requested); + + int (*write_header_init) (void); + int (*write_header_chunk) (char *buffer_start, int buffer_size); + int (*write_header_cleanup) (void); + + int (*read_header_init) (void); + int (*read_header_chunk) (char *buffer_start, int buffer_size); + int (*read_header_cleanup) (void); + + int (*prepare_save) (void); + int (*post_load) (void); + + int (*parse_image_location) (char *buffer); + + int (*image_exists) (void); + + int (*invalidate_image) (void); + + int (*wait_on_io) (int flush_all); + + struct list_head writer_list; + }; + diff -urN oldtree/Documentation/power/kernel_threads.txt newtree/Documentation/power/kernel_threads.txt --- oldtree/Documentation/power/kernel_threads.txt 2006-01-02 22:21:10.000000000 -0500 +++ newtree/Documentation/power/kernel_threads.txt 2006-02-13 14:51:53.818983528 -0500 @@ -4,15 +4,15 @@ Freezer Upon entering a suspended state the system will freeze all -tasks. This is done by delivering pseudosignals. This affects -kernel threads, too. To successfully freeze a kernel thread -the thread has to check for the pseudosignal and enter the -refrigerator. Code to do this looks like this: +tasks. This is done by making all processes execute a notifier. +This affects kernel threads, too. To successfully freeze a kernel thread +the thread has to check for the notifications and call the notifier +chain for the process. 
Code to do this looks like this: do { hub_events(); wait_event_interruptible(khubd_wait, !list_empty(&hub_event_list)); - try_to_freeze(); + try_todo_list(); } while (!signal_pending(current)); from drivers/usb/core/hub.c::hub_thread() diff -urN oldtree/Documentation/power/suspend2.txt newtree/Documentation/power/suspend2.txt --- oldtree/Documentation/power/suspend2.txt 1969-12-31 19:00:00.000000000 -0500 +++ newtree/Documentation/power/suspend2.txt 2006-02-13 14:51:53.819983376 -0500 @@ -0,0 +1,631 @@ + --- Suspend2, version 2.1.9 --- + +1. What is it? +2. Why would you want it? +3. What do you need to use it? +4. How do you use it? +5. What do all those entries in /proc/suspend2 do? +6. How do you get support? +7. I think I've found a bug. What should I do? +8. When will XXX be supported? +9. How does it work? +10. Who wrote Suspend2? + +1. What is it? + + Imagine you're sitting at your computer, working away. For some reason, you + need to turn off your computer for a while - perhaps it's time to go home + for the day. When you come back to your computer next, you're going to want + to carry on where you left off. Now imagine that you could push a button and + have your computer store the contents of its memory to disk and power down. + Then, when you next start up your computer, it loads that image back into + memory and you can carry on from where you were, just as if you'd never + turned the computer off. Far less time to start up, no reopening + applications and finding what directory you put that file in yesterday. + That's what Suspend2 does. + +2. Why would you want it? + + Why wouldn't you want it? + + Being able to save the state of your system and quickly restore it improves + your productivity - you get a useful system in far less time than through + the normal boot process. + +3. What do you need to use it? + + a. Kernel Support. + + i) The Suspend2 patch. + + Suspend2 is part of the Linux Kernel. 
This version is not part of Linus's + 2.6 tree at the moment, so you will need to download the kernel source and + apply the latest patch. Having done that, enable the appropriate options in + make [menu|x]config (under General Setup), compile and install your kernel. + Suspend2 works with SMP, Highmem, preemption, x86-32, PPC and mac. + x86-64 support is coming. + + Suspend2 patches are available from http://suspend2.net. + + ii) Compression and encryption support. + + As of 2.1.9.2, compression and encryption support are implemented via the + cryptoapi. You will therefore want to select any Cryptoapi transforms that + you want to use on your image from the Cryptoapi menu while configuring + your kernel. + + You can also tell Suspend to write its image to an encrypted and/or + compressed filesystem/swap partition. In that case, you don't need to do + anything special for Suspend2 when it comes to kernel configuration. + + iii) Configuring other options. + + While you're configuring your kernel, try to configure as much as possible + to build as modules. We recommend this because there are a number of drivers + that are still in the process of implementing proper power management + support. In those cases, the best way to work around their current lack is + to build them as modules and remove the modules while suspending. You might + also bug the driver authors to get their support up to speed, or even help! + + b. Storage. + + i) Swap. + + Suspend2 can store the suspend image in your swap partition, a swap file or + a combination thereof. Whichever combination you choose, you will probably + want to create enough swap space to store the largest image you could have, + plus the space you'd normally use for swap. A good rule of thumb would be + to calculate the amount of swap you'd want without using Suspend2, and then + add the amount of memory you have. This swapspace can be arranged in any way + you'd like.
It can be in one partition or file, or spread over a number. The + only requirement is that they be active when you start a suspend cycle. + + There is one exception to this requirement. Suspend2 has the ability to turn + on one swap file or partition at the start of suspending and turn it back off + at the end. If you want to ensure you have enough memory to store an image + when your memory is fully used, you might want to make one swap partition or + file for 'normal' use, and another for Suspend2 to activate & deactivate + automatically. (Further details below). + + ii) Normal files. + + As of 2.1.8.5, Suspend2 includes a 'filewriter'. The filewriter can store + your image in a simple file. Since Linux has the idea of everything being + a file, this is more powerful than it initially sounds. If, for example, + you were to set up a network block device file, you could suspend to a + network server. This has been tested and works to a point, but nbd itself + isn't stateless enough for our purposes. + + Take extra care when setting up the filewriter. If you just type commands + without thinking and then try to suspend, you could cause irreversible + corruption on your filesystems! Make sure you have backups. Also, because + the filewriter is comparatively new, it's not as well tested as the + swapwriter. Be aware that there may be bugs that could cause damage to your + data even if you are careful! You have been warned! + + Most people will only want to suspend to a local file. To achieve that, do + something along the lines of: + + echo Suspend2 > /suspend-file + dd if=/dev/zero bs=1M count=512 >> /suspend-file + + This will create a 512MB file called /suspend-file.
To get Suspend2 to use + it: + + echo /suspend-file > /proc/suspend2/filewriter_target + + Then + + cat /proc/suspend2/resume2 + + Put the results of this into your bootloader's configuration (see also step + c, below): + + ---EXAMPLE-ONLY-DON'T-COPY-AND-PASTE--- + # cat /proc/suspend2/resume2 + file:/dev/hda2:0x1e001 + + In this example, we would edit the append= line of our lilo.conf|menu.lst + so that it included: + + resume2=file:/dev/hda2:0x1e001 + ---EXAMPLE-ONLY-DON'T-COPY-AND-PASTE--- + + For those who are thinking 'Could I make the file sparse?', the answer is + 'No!'. At the moment, there is no way for Suspend2 to fill in the holes in + a sparse file while suspending. In the longer term (post merge!), I'd like + to change things so that the file could be dynamically resized as needed. + Right now, however, that's not possible. + + c. Bootloader configuration. + + Using Suspend2 also requires that you add an extra parameter to + your lilo.conf or equivalent. Here's an example for a swap partition: + + append="resume2=swap:/dev/hda1" + + This would tell Suspend2 that /dev/hda1 is a swap partition you + have. Suspend2 will use the swap signature of this partition as a + pointer to your data when you suspend. This means that (in this example) + /dev/hda1 doesn't need to be _the_ swap partition where all of your data + is actually stored. It just needs to be a swap partition that has a + valid signature. + + You don't need to have a swap partition for this purpose. Suspend2 + can also use a swap file, but usage is a little more complex. Having made + your swap file, turn it on and do + + cat /proc/suspend2/headerlocations + + (this assumes you've already compiled your kernel with Suspend2 + support and booted it). The results of the cat command will tell you + what you need to put in lilo.conf: + + For swap partitions like /dev/hda1, simply use resume2=/dev/hda1. + For swapfile `swapfile`, use resume2=swap:/dev/hda2:0x242d@4096.
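For scripts or tools that need to pick apart the resume2= strings shown above, the following user-space sketch splits a value into its visible pieces. It is an illustration only and not the kernel's parser. The struct and function names are hypothetical, and reading the number after '@' as a block size is an assumption, since the text does not define it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the parts of a resume2= value. */
struct sketch_resume2 {
	char writer[16];	/* "swap", "file", or "" when omitted */
	char device[64];	/* e.g. "/dev/hda2" */
	unsigned long block;	/* hex number after the device, 0 if absent */
	unsigned long blocksize; /* number after '@' (assumed block size) */
};

/* Accepts the forms seen in this document:
 *   /dev/hda1
 *   swap:/dev/hda1
 *   file:/dev/hda2:0x1e001
 *   swap:/dev/hda2:0x242d@4096
 * Returns 0 on success, -1 on malformed input. */
static int sketch_parse_resume2(const char *arg, struct sketch_resume2 *out)
{
	const char *p = arg, *colon;
	size_t len;
	char *end;

	memset(out, 0, sizeof(*out));

	/* Optional writer prefix: present when the value does not start
	 * with a path. */
	if (*p != '/') {
		colon = strchr(p, ':');
		if (!colon)
			return -1;
		len = (size_t)(colon - p);
		if (len == 0 || len >= sizeof(out->writer))
			return -1;
		memcpy(out->writer, p, len);
		p = colon + 1;
	}

	/* The device path runs up to the next ':' or the end of the string. */
	colon = strchr(p, ':');
	len = colon ? (size_t)(colon - p) : strlen(p);
	if (len == 0 || len >= sizeof(out->device))
		return -1;
	memcpy(out->device, p, len);
	if (!colon)
		return 0;

	/* Hex block number, optionally followed by "@<decimal>". */
	p = colon + 1;
	out->block = strtoul(p, &end, 16);
	if (end == p)
		return -1;
	if (*end == '@') {
		p = end + 1;
		out->blocksize = strtoul(p, &end, 10);
		if (end == p)
			return -1;
	}
	return *end == '\0' ? 0 : -1;
}
```

Treating a missing writer prefix as valid mirrors the note that the "swap:" part can be dropped when only one writer is compiled in.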
+ + If the swapfile changes for any reason (it is moved to a different + location, it is deleted and recreated, or the filesystem is + defragmented) then you will have to check + /proc/suspend2/headerlocations for a new resume_block value. + + Once you've compiled and installed the kernel, adjusted your lilo.conf + and rerun lilo, you should only need to reboot for the most basic part + of Suspend2 to be ready. + + If you only compile in the swapwriter, or only compile in the filewriter, + you don't need to add the "swap:" part of the resume2= parameters above. + resume2=/dev/hda2:0x242d@4096 will work just as well. + + d. The hibernate script. + + Since the driver model in 2.6 kernels is still being developed, you may need + to do more, however. Users of Suspend2 usually start the process via a script + which prepares for the suspend, tells the kernel to do its stuff and then + restores things afterwards. This script might involve: + + - Switching to a text console and back if X doesn't like the video card + status on resume. + - Un/reloading PCMCIA support since it doesn't play well with suspend. + + Note that you might not be able to unload some drivers if there are + processes using them. You might have to kill off processes that hold + devices open. Hint: if your X server accesses a USB mouse, doing a + 'chvt' to a text console releases the device and you can unload the + module. + + Check out the latest script (available on suspend2.net). + +4. How do you use it? + + Once your script is properly set up, you should just be able to start it + and everything should go like clockwork. Of course things aren't always + that easy out of the box. + + Check out (in the kernel source tree) include/linux/suspend2.h for + settings you can use to get detailed information about what suspend is doing.
+ The kernel parameters suspend_act, suspend_dbg and suspend_lvl allow you to + set the action and debugging parameters prior to starting a suspend and/or + at the lilo prompt before resuming. There is also a nice little program that + should be available from suspend2.net which makes it easier to turn these + debugging settings on and off. Note that to get any debugging output, you + need to enable CONFIG_PM_DEBUG when compiling the kernel. + + A neat feature of Suspend2 is that you can press Escape at any time + during suspending, and the process will be aborted. + + Due to the way suspend works, this means you'll have your system back and + perfectly usable almost instantly. The only exception is when it's at + the very end of writing the image. Then it will need to reload a small + (usually 4-50MBs, depending upon the image characteristics) portion first. + + If you run into problems with resuming, adding the "noresume2" option to + the kernel command line will let you skip the resume step and recover your + system. + +5. What do all those entries in /proc/suspend2 do? + + /proc/suspend2 is the directory which contains files you can use to + tune and configure Suspend2 to your liking. The exact contents of + the directory will depend upon the version of Suspend2 you're + running and the options you selected at compile time. In the following + descriptions, names in brackets refer to compile time options. + (Note that they're all dependent upon you having selected CONFIG_SUSPEND2 + in the first place!) + + Since the values of these settings can open potential security risks, they + are usually accessible only to the root user. You can, however, enable a + compile time option which makes all of these files world-accessible. This + should only be done if you trust everyone with shell access to this + computer! + + - all_settings: + + This file provides a convenient way to save and restore all of the other + settings in one hit.
The contents include binary data, so you'll want to + redirect the output to a file: + + cat /proc/suspend2/all_settings > /etc/hibernate/all_settings.conf + + cat /etc/hibernate/all_settings.conf > /proc/suspend2/all_settings + + - debug_info: + + This file returns information about your configuration that may be helpful + in diagnosing problems with suspending. + + - debug_sections (CONFIG_PM_DEBUG): + + This value, together with the console log level, controls what debugging + information is displayed. The console log level determines the level of + detail, and this value determines what detail is displayed. This value is + a bit vector, and the meaning of the bits can be found in the kernel tree + in include/linux/suspend2.h. It can be overridden using the kernel's + command line option suspend_dbg. + + - default_console_level (CONFIG_PM_DEBUG): + + This determines the value of the console log level at the start of a + suspend cycle. If debugging is compiled in, the console log level can be + changed during a cycle by pressing the digit keys. Meanings are: + + 0: Nice display. + 1: Nice display plus numerical progress. + 2: Errors only. + 3: Low level debugging info. + 4: Medium level debugging info. + 5: High level debugging info. + 6: Verbose debugging info. + + This value can be overridden using the kernel command line option + suspend_lvl. + + - disable_* + + This option can be used to temporarily disable various parts of suspend. + Note that these flags can be set by restoring all_settings: If the saved + settings don't include any information about how a part of suspend should + be configured, that section will be disabled. + + - do_resume: + + When anything is written to this file suspend will attempt to read and + restore an image. If there is no image, it will return almost immediately. + If an image exists, the echo > will never return. Instead, the original + kernel context will be restored and the original echo > do_suspend will + return. 
+ + - do_suspend: + + When anything is written to this file, the kernel side of Suspend2 will + begin to attempt to write an image to disk and power down. You'll normally + want to run the hibernate script instead, to get modules unloaded first. + + - enable_escape: + + Setting this to "1" will enable you to abort a suspend by + pressing escape, "0" (default) disables this feature. Note that enabling + this option means that you cannot initiate a suspend and then walk away + from your computer, expecting it to be secure. With this feature disabled, + you can validly have this expectation once Suspend begins to write the + image to disk. (Prior to this point, it is possible that Suspend might + abort because of failure to freeze all processes or because constraints + on its ability to save the image are not met). + + - expected_compression: + + These values allow you to set an expected compression ratio, which Software + Suspend will use in calculating whether it meets constraints on the image + size. If this expected compression ratio is not attained, the suspend will + abort, so it is wise to allow some spare. You can see what compression + ratio is achieved in the logs after suspending. + + - filewriter_target: + + Read this value to get the current setting. Write to it to point Suspend + at a new storage location for the filewriter. See above for details of how + to set up the filewriter. + + - headerlocations: + + This option tells you the resume2= options to use for swap devices you + currently have activated. It is particularly useful when you only want to + use a swap file to store your image. See above for further details. + + - image_exists: + + Can be used in a script to determine whether a valid image exists at the + location currently pointed to by resume2=. Echoing anything to this entry + removes any current image. + + - image_size_limit: + + The maximum size of suspend image written to disk, measured in megabytes + (1024*1024).
+ + - interface_version: + + The value returned by this file can be used by scripts and configuration + tools to determine what entries should be looked for. The value is + incremented whenever an entry in /proc/suspend2 is obsoleted or + added. + + - last_result: + + The result of the last suspend, as defined in + include/linux/suspend-debug.h with the values SUSPEND_ABORTED to + SUSPEND_KEPT_IMAGE. This is a bitmask. + + - log_everything (CONFIG_PM_DEBUG): + + Setting this option results in all messages printed being logged. Normally, + only a subset are logged, so as to not slow the process and not clutter the + logs. Useful for debugging. It can be toggled during a cycle by pressing + 'L'. + + - pause_between_steps (CONFIG_PM_DEBUG): + + This option is used during debugging, to make Suspend2 pause between + each step of the process. It is ignored when the nice display is on. + + - powerdown_method: + + Used to select a method by which Suspend2 should power down after writing the + image. Currently: + + 3: Attempt to enter Suspend-to-ram. + 4: Attempt to enter ACPI S4 mode. + 5: Normal power down. + + Note that these options are highly dependent upon your hardware and software. + + - progressbar_granularity_limit: + + This option can be used to limit the granularity of the progress bar + displayed with a bootsplash screen. The value is the maximum number of + steps. That is, 10 will make the progress bar jump in 10% increments. + + - reboot: + + This option causes Suspend2 to reboot rather than powering down + at the end of saving an image. It can be toggled during a cycle by pressing + 'R'. + + - resume_commandline: + + This entry can be read after resuming to see the commandline that was used + when resuming began. You might use this to set up two bootloader entries + that are the same apart from the fact that one includes an extra append= + argument "at_work=1".
+ You could then grep resume_commandline in your + post-resume scripts and configure networking (for example) differently + depending upon whether you're at home or work. resume_commandline can be + set to arbitrary text if you wish to remove sensitive contents. + + - swapfile: + + This entry is used to specify the swapfile or partition that + Suspend2 will attempt to swapon/swapoff automatically. Thus, if + I normally use /dev/hda1 for swap, and want to use /dev/hda2 specifically + for my suspend image, I would + + echo /dev/hda2 > /proc/suspend2/swapfile + + /dev/hda2 would then be automatically swapon'd and swapoff'd. Note that the + swapon and swapoff occur while other processes are frozen (including kswapd) + so this swap file will not be used up when attempting to free memory. The + partition/file is also given the highest priority, so other swapfiles/partitions + will only be used to save the image when this one is filled. + + The value of this file is used by headerlocations along with any currently + activated swapfiles/partitions. + + - toggle_process_nofreeze + + This entry can be used to toggle the NOFREEZE flag on a process, to allow it + to run during suspending. It should be used with extreme caution. There are + strict limitations on what a process running during suspend can do. This is + really only intended for use by Suspend's helpers (userui in particular). + + - userui_program + + This entry is used to tell Suspend what userspace program to use for + providing a user interface while suspending. The program uses a netlink + socket to pass messages back and forth to the kernel, allowing it to provide all of the + functions formerly implemented in the kernel user interface components. + + - version: + + The version of suspend you have compiled into the currently running kernel. + +6. How do you get support? + + Glad you asked.
+ Suspend2 is being actively maintained and supported + by Nigel (the guy doing most of the kernel coding at the moment), Bernard + (who maintains the hibernate script and userspace user interface components) + and its users. + + Resources available include HowTos, FAQs and a Wiki, all available via + suspend2.net. You can find the mailing lists there. + +7. I think I've found a bug. What should I do? + + By far and away, the most common problems people have with suspend2 + are related to drivers not having adequate power management support. In this + case, it is not a bug with suspend2, but we can still help you. As we + mentioned above, such issues can usually be worked around by building the + functionality as modules and unloading them while suspending. Please visit + the Wiki for up-to-date lists of known issues and workarounds. + + If this information doesn't help, try running: + + hibernate --bug-report + + ...and sending the output to the users mailing list. + + Good information on how to provide us with useful information from an + oops is found in the file REPORTING-BUGS, in the top level directory + of the kernel tree. If you get an oops, please especially note the + information about running what is printed on the screen through ksymoops. + The raw information is useless. + +8. When will XXX be supported? + + Suspend2 currently lacks support for x86-64. It is a work in progress, but + hasn't been made a great priority because debugging is difficult (Nigel + doesn't have access to the hardware). 64GB Highmem and discontig-mem are + also not supported at the moment. + + Patches for the other items (and anything that's been missed) are welcome. + Please send them to the list. + +9. How does it work? + + Suspend2 does its work in a number of steps. + + a. Freezing system activity. + + The first main stage in suspending is to stop all other activity. This is + achieved in stages.
Processes are considered in four groups, which we will + describe in reverse order for clarity's sake: Threads with the PF_NOFREEZE + flag, kernel threads without this flag, userspace processes with the + PF_SYNCTHREAD flag and all other processes. The first set (PF_NOFREEZE) are + untouched by the refrigerator code. They are allowed to run during suspending + and resuming, and are used to support user interaction, storage access or the + like. Other kernel threads (those unneeded while suspending) are frozen last. + This leaves us with userspace processes that need to be frozen. When a + process enters one of the *_sync system calls, we set a PF_SYNCTHREAD flag on + that process for the duration of that call. Processes that have this flag are + frozen after processes without it, so that we can seek to ensure that dirty + data is synced to disk as quickly as possible in a situation where other + processes may be submitting writes at the same time. Freezing the processes + that are submitting data stops new I/O from being submitted. Syncthreads can + then cleanly finish their work. So the order is: + + - Userspace processes without PF_SYNCTHREAD or PF_NOFREEZE; + - Userspace processes with PF_SYNCTHREAD (they won't have NOFREEZE); + - Kernel processes without PF_NOFREEZE. + + b. Eating memory. + + For a successful suspend, you need to have enough disk space to store the + image and enough memory for the various limitations of Suspend2's + algorithm. You can also specify a maximum image size. In order to meet + those constraints, Suspend2 may 'eat' memory. If, after freezing + processes, the constraints aren't met, Suspend2 will thaw all the + other processes and begin to eat memory until its calculations indicate + the constraints are met. It will then freeze processes again and recheck + its calculations. + + c. Allocation of storage. + + Next, Suspend2 allocates the storage that will be used to save + the image.
+ + The core of Suspend2 knows nothing about how or where pages are stored. We + therefore request the active writer (remember you might have compiled in + more than one!) to allocate enough storage for our expected image size. If + this request cannot be fulfilled, we eat more memory and try again. If it + is fulfilled, we seek to allocate additional storage, just in case our + expected compression ratio (if any) isn't achieved. This time, however, we + just continue if we can't allocate enough storage. + + If these calls to our writer change the characteristics of the image such + that we haven't allocated enough memory, we also loop. (The writer may well + need to allocate space for its storage information). + + d. Write the first part of the image. + + Suspend2 stores the image in two sets of pages called 'pagesets'. + Pageset 2 contains pages on the active and inactive lists; essentially + the page cache. Pageset 1 contains all other pages, including the kernel. + We use two pagesets for one important reason: We need to make an atomic copy + of the kernel to ensure consistency of the image. Without a second pageset, + that would limit us to an image that was at most half the amount of memory + available. Using two pagesets allows us to store a full image. Since pageset + 2 pages won't be needed in saving pageset 1, we first save pageset 2 pages. + We can then make our atomic copy of the remaining pages using both pageset 2 + pages and any other pages that are free. While saving both pagesets, we are + careful not to corrupt the image. Among other things, we use lowlevel block + I/O routines that don't change the pagecache contents. + + The next step, then, is writing pageset 2. + + e. Suspending drivers and storing processor context. + + Having written pageset 2, Suspend2 calls the power management functions to + notify drivers of the suspend, and saves the processor state in preparation + for the atomic copy of memory we are about to make. + + f. Atomic copy.
+ + At this stage, everything but the Suspend2 code is halted. Processes + are frozen or idling, drivers are quiesced and have stored (ideally and where + necessary) their configuration in memory we are about to atomically copy. + In our lowlevel architecture specific code, we have saved the CPU state. + We can therefore now do our atomic copy before resuming drivers etc. + + g. Save the atomic copy (pageset 1). + + Suspend can then write the atomic copy of the remaining pages. Since we + have copied the pages into other locations, we can continue to use the + normal block I/O routines without fear of corrupting our image. + + h. Save the suspend header. + + Nearly there! We save our settings and other parameters needed for + reloading pageset 1 in a 'suspend header'. We also tell our writer to + serialise its data at this stage, so that it can reread the image at resume + time. Note that the writer can write this data in any format - in the case + of the swapwriter, for example, it splits header pages in 4092 byte blocks, + using the last four bytes to link pages of data together. This is completely + transparent to the core. + + i. Set the image header. + + Finally, we edit the header at our resume2= location. The signature is + changed by the writer to reflect the fact that an image exists, and to point + to the start of that data if necessary (swapwriter). + + j. Power down. + + Or reboot if we're debugging and the appropriate option is selected. + + Whew! + + Reloading the image. + -------------------- + + Reloading the image is essentially the reverse of all the above. We load + our copy of pageset 1, being careful to choose locations that aren't going + to be overwritten as we copy it back (We start very early in the boot + process, so there are no other processes to quiesce here). We then copy + pageset 1 back to its original location in memory and restore the process + context. We are now running with the original kernel.
Next, we reload the + pageset 2 pages, free the memory and swap used by Suspend2, restore + the pageset header and restart processes. Sounds easy in comparison to + suspending, doesn't it! + + There is of course more to Suspend2 than this, but this explanation + should be a good start. If there's interest, I'll write further + documentation on range pages and the low level I/O. + +10. Who wrote Suspend2? + + (Answer based on the writings of Florent Chabaud, credits in files and + Nigel's limited knowledge; apologies to anyone missed out!) + + The main developers of Suspend2 have been... + + Gabor Kuti + Pavel Machek + Florent Chabaud + Bernard Blackham + Nigel Cunningham + + They have been aided in their efforts by a host of hundreds, if not thousands, + of testers and people who have submitted bug fixes & suggestions. Of special + note are the efforts of Michael Frank, who had his computers repetitively + suspend and resume for literally tens of thousands of cycles and developed + scripts to stress the system and test Suspend2 far beyond the point + most of us (Nigel included!) would consider testing. His efforts have + contributed as much to Suspend2 as any of the names above. diff -urN oldtree/Documentation/power/swsusp.txt newtree/Documentation/power/swsusp.txt --- oldtree/Documentation/power/swsusp.txt 2006-01-02 22:21:10.000000000 -0500 +++ newtree/Documentation/power/swsusp.txt 2006-02-13 14:51:53.846979272 -0500 @@ -130,7 +130,8 @@ website, and not to the Linux Kernel Mailing List. We are working toward merging suspend2 into the mainline kernel. -Q: A kernel thread must voluntarily freeze itself (call 'refrigerator'). +Q: A kernel thread must work on the todo list (call 'run_todo_list') +to enter the refrigerator. I found some kernel threads that don't do it, and they don't freeze so the system can't sleep. Is this a known behavior?
@@ -139,7 +140,7 @@ should be held at that point and it must be safe to sleep there), and add: - try_to_freeze(); + try_todo_list(); If the thread is needed for writing the image to storage, you should instead set the PF_NOFREEZE process flag when creating the thread (and diff -urN oldtree/arch/arm/kernel/signal.c newtree/arch/arm/kernel/signal.c --- oldtree/arch/arm/kernel/signal.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/arm/kernel/signal.c 2006-02-13 14:51:53.852978360 -0500 @@ -637,7 +637,7 @@ if (!user_mode(regs)) return 0; - if (try_to_freeze()) + if (try_todo_list()) goto no_signal; if (current->ptrace & PT_SINGLESTEP) diff -urN oldtree/arch/arm/mm/init.c newtree/arch/arm/mm/init.c --- oldtree/arch/arm/mm/init.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/arm/mm/init.c 2006-02-13 14:51:53.862976840 -0500 @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -86,6 +87,11 @@ printk("%d pages swap cached\n", cached); } +int page_is_ram(int pfn) +{ + return pfn_valid(pfn); +} + static inline pmd_t *pmd_off(pgd_t *pgd, unsigned long virt) { return pmd_offset(pgd, virt); @@ -660,6 +666,15 @@ */ sysctl_overcommit_memory = OVERCOMMIT_ALWAYS; } +#ifdef CONFIG_SUSPEND2 + { + unsigned long addr; + for (addr = &__nosave_begin; addr < &__nosave_end; + addr += PAGE_SIZE) { + SetPageNosave(virt_to_page(addr)); + } + } +#endif } void free_initmem(void) diff -urN oldtree/arch/frv/kernel/signal.c newtree/arch/frv/kernel/signal.c --- oldtree/arch/frv/kernel/signal.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/frv/kernel/signal.c 2006-02-13 14:51:53.877974560 -0500 @@ -535,7 +535,7 @@ if (!user_mode(regs)) return 1; - if (try_to_freeze()) + if (try_todo_list()) goto no_signal; if (!oldset) diff -urN oldtree/arch/h8300/kernel/signal.c newtree/arch/h8300/kernel/signal.c --- oldtree/arch/h8300/kernel/signal.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/h8300/kernel/signal.c 2006-02-13 14:51:53.881973952 -0500 @@ -516,7 
+516,7 @@ if ((regs->ccr & 0x10)) return 1; - if (try_to_freeze()) + if (try_todo_list()) goto no_signal; current->thread.esp0 = (unsigned long) regs; diff -urN oldtree/arch/i386/kernel/io_apic.c newtree/arch/i386/kernel/io_apic.c --- oldtree/arch/i386/kernel/io_apic.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/i386/kernel/io_apic.c 2006-02-13 14:51:53.896971672 -0500 @@ -578,7 +578,7 @@ for ( ; ; ) { time_remaining = schedule_timeout_interruptible(time_remaining); - try_to_freeze(); + try_todo_list(); if (time_after(jiffies, prev_balance_time+balanced_irq_interval)) { preempt_disable(); diff -urN oldtree/arch/i386/kernel/signal.c newtree/arch/i386/kernel/signal.c --- oldtree/arch/i386/kernel/signal.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/i386/kernel/signal.c 2006-02-13 14:51:53.900971064 -0500 @@ -615,7 +615,7 @@ if (!user_mode(regs)) return 1; - if (try_to_freeze()) + if (try_todo_list()) goto no_signal; if (!oldset) diff -urN oldtree/arch/i386/kernel/smp.c newtree/arch/i386/kernel/smp.c --- oldtree/arch/i386/kernel/smp.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/i386/kernel/smp.c 2006-02-13 14:51:53.902970760 -0500 @@ -455,7 +455,7 @@ } EXPORT_SYMBOL(flush_tlb_page); -static void do_flush_tlb_all(void* info) +void do_flush_tlb_all(void* info) { unsigned long cpu = smp_processor_id(); diff -urN oldtree/arch/i386/kernel/time.c newtree/arch/i386/kernel/time.c --- oldtree/arch/i386/kernel/time.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/i386/kernel/time.c 2006-02-13 14:51:53.905970304 -0500 @@ -372,7 +372,8 @@ mod_timer(&sync_cmos_timer, jiffies + 1); } -static long clock_cmos_diff, sleep_start; +static long clock_cmos_diff; +static unsigned long sleep_start; static struct timer_opts *last_timer; static int timer_suspend(struct sys_device *dev, pm_message_t state) @@ -380,9 +381,11 @@ /* * Estimate time zone so that set_time can update the clock */ - clock_cmos_diff = -get_cmos_time(); + long cmos_time = 
__get_cmos_time(); + + clock_cmos_diff = -cmos_time; clock_cmos_diff += get_seconds(); - sleep_start = get_cmos_time(); + sleep_start = cmos_time; last_timer = cur_timer; cur_timer = &timer_none; if (last_timer->suspend) @@ -395,14 +398,16 @@ unsigned long flags; unsigned long sec; unsigned long sleep_length; + unsigned long cmos_time; #ifdef CONFIG_HPET_TIMER if (is_hpet_enabled()) hpet_reenable(); #endif + cmos_time = get_cmos_time(); + sec = cmos_time + clock_cmos_diff; + sleep_length = (cmos_time - sleep_start) * HZ; setup_pit_timer(); - sec = get_cmos_time() + clock_cmos_diff; - sleep_length = (get_cmos_time() - sleep_start) * HZ; write_seqlock_irqsave(&xtime_lock, flags); xtime.tv_sec = sec; xtime.tv_nsec = 0; diff -urN oldtree/arch/i386/mm/init.c newtree/arch/i386/mm/init.c --- oldtree/arch/i386/mm/init.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/i386/mm/init.c 2006-02-13 14:51:53.913969088 -0500 @@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -48,6 +49,7 @@ unsigned long highstart_pfn, highend_pfn; static int noinline do_test_wp_bit(void); +int bad_ppro; /* * Creates a middle page table and puts a pointer to it in the @@ -279,9 +281,12 @@ { if (page_is_ram(pfn) && !(bad_ppro && page_kills_ppro(pfn))) { ClearPageReserved(page); + ClearPageNosave(page); free_new_highpage(page); - } else + } else { SetPageReserved(page); + SetPageNosave(page); + } } static int add_one_highpage_hotplug(struct page *page, unsigned long pfn) @@ -384,7 +389,7 @@ #endif } -#ifdef CONFIG_SOFTWARE_SUSPEND +#ifdef CONFIG_PM /* * Swap suspend & friends need this for resume because things like the intel-agp * driver might have split up a kernel 4MB mapping. 
@@ -570,7 +575,7 @@ extern int ppro_with_ram_bug(void); int codesize, reservedpages, datasize, initsize; int tmp; - int bad_ppro; + struct page *tmp_page; #ifdef CONFIG_FLATMEM if (!mem_map) @@ -601,12 +606,23 @@ totalram_pages += free_all_bootmem(); reservedpages = 0; - for (tmp = 0; tmp < max_low_pfn; tmp++) - /* - * Only count reserved RAM pages - */ - if (page_is_ram(tmp) && PageReserved(pfn_to_page(tmp))) - reservedpages++; + for (tmp = 0; tmp < max_low_pfn; tmp++) { + if (page_is_ram(tmp)) { + /* + * Only count reserved RAM pages + */ + if (PageReserved(pfn_to_page(tmp))) + reservedpages++; + } else + /* + * Non-RAM pages are always nosave + */ + SetPageNosave(pfn_to_page(tmp)); + } + + for (tmp_page = virt_to_page(&__nosave_begin); + tmp_page < virt_to_page(&__nosave_end); tmp_page++) + SetPageNosave(tmp_page); set_highmem_pages_init(bad_ppro); @@ -727,6 +743,7 @@ addr = (unsigned long)(&__init_begin); for (; addr < (unsigned long)(&__init_end); addr += PAGE_SIZE) { ClearPageReserved(virt_to_page(addr)); + ClearPageNosave(virt_to_page(addr)); set_page_count(virt_to_page(addr), 1); memset((void *)addr, 0xcc, PAGE_SIZE); free_page(addr); @@ -742,6 +759,7 @@ printk (KERN_INFO "Freeing initrd memory: %ldk freed\n", (end - start) >> 10); for (; start < end; start += PAGE_SIZE) { ClearPageReserved(virt_to_page(start)); + ClearPageNosave(virt_to_page(start)); set_page_count(virt_to_page(start), 1); free_page(start); totalram_pages++; diff -urN oldtree/arch/m32r/kernel/signal.c newtree/arch/m32r/kernel/signal.c --- oldtree/arch/m32r/kernel/signal.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/m32r/kernel/signal.c 2006-02-13 14:51:53.924967416 -0500 @@ -370,7 +370,7 @@ if (!user_mode(regs)) return 1; - if (try_to_freeze()) + if (try_todo_list()) goto no_signal; if (!oldset) diff -urN oldtree/arch/mips/kernel/irixsig.c newtree/arch/mips/kernel/irixsig.c --- oldtree/arch/mips/kernel/irixsig.c 2006-01-02 22:21:10.000000000 -0500 +++ 
newtree/arch/mips/kernel/irixsig.c 2006-02-13 14:51:53.934965896 -0500 @@ -185,7 +185,7 @@ if (!user_mode(regs)) return 1; - if (try_to_freeze()) + if (try_todo_list()) goto no_signal; if (!oldset) diff -urN oldtree/arch/mips/kernel/signal32.c newtree/arch/mips/kernel/signal32.c --- oldtree/arch/mips/kernel/signal32.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/mips/kernel/signal32.c 2006-02-13 14:51:53.938965288 -0500 @@ -822,7 +822,7 @@ if (!user_mode(regs)) return 1; - if (try_to_freeze()) + if (try_todo_list()) goto no_signal; if (!oldset) diff -urN oldtree/arch/powerpc/kernel/signal_32.c newtree/arch/powerpc/kernel/signal_32.c --- oldtree/arch/powerpc/kernel/signal_32.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/powerpc/kernel/signal_32.c 2006-02-13 14:51:53.946964072 -0500 @@ -1185,7 +1185,7 @@ int signr, ret; #ifdef CONFIG_PPC32 - if (try_to_freeze()) { + if (try_todo_list()) { signr = 0; if (!signal_pending(current)) goto no_signal; diff -urN oldtree/arch/ppc/mm/init.c newtree/arch/ppc/mm/init.c --- oldtree/arch/ppc/mm/init.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/ppc/mm/init.c 2006-02-13 14:51:53.965961184 -0500 @@ -32,6 +32,7 @@ #include #include #include +#include #include #include @@ -144,6 +145,7 @@ while (start < end) { ClearPageReserved(virt_to_page(start)); + ClearPageNosave(virt_to_page(start)); set_page_count(virt_to_page(start), 1); free_page(start); cnt++; @@ -176,6 +178,7 @@ for (; start < end; start += PAGE_SIZE) { ClearPageReserved(virt_to_page(start)); + ClearPageNosave(virt_to_page(start)); set_page_count(virt_to_page(start), 1); free_page(start); totalram_pages++; @@ -411,8 +414,10 @@ /* if we are booted from BootX with an initial ramdisk, make sure the ramdisk pages aren't reserved. 
*/ if (initrd_start) { - for (addr = initrd_start; addr < initrd_end; addr += PAGE_SIZE) + for (addr = initrd_start; addr < initrd_end; addr += PAGE_SIZE) { ClearPageReserved(virt_to_page(addr)); + ClearPageNosave(virt_to_page(addr)); + } } #endif /* CONFIG_BLK_DEV_INITRD */ @@ -421,17 +426,27 @@ if ( rtas_data ) for (addr = (ulong)__va(rtas_data); addr < PAGE_ALIGN((ulong)__va(rtas_data)+rtas_size) ; - addr += PAGE_SIZE) + addr += PAGE_SIZE) { SetPageReserved(virt_to_page(addr)); + SetPageNosave(virt_to_page(addr)); + } #endif #ifdef CONFIG_PPC_PMAC - if (agp_special_page) + if (agp_special_page) { SetPageReserved(virt_to_page(agp_special_page)); + SetPageNosave(virt_to_page(agp_special_page)); + } #endif for (addr = PAGE_OFFSET; addr < (unsigned long)high_memory; addr += PAGE_SIZE) { if (!PageReserved(virt_to_page(addr))) continue; + /* + * Mark nosave pages + */ + if (addr >= (void *)&__nosave_begin && addr < (void *)&__nosave_end) + SetPageNosave(virt_to_page(addr)); + if (addr < (ulong) etext) codepages++; else if (addr >= (unsigned long)&__init_begin @@ -449,6 +464,7 @@ struct page *page = mem_map + pfn; ClearPageReserved(page); + ClearPageNosave(page); set_page_count(page, 1); __free_page(page); totalhigh_pages++; diff -urN oldtree/arch/ppc/platforms/pmac_feature.c newtree/arch/ppc/platforms/pmac_feature.c --- oldtree/arch/ppc/platforms/pmac_feature.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/ppc/platforms/pmac_feature.c 2006-02-13 14:51:53.975959664 -0500 @@ -2301,7 +2301,10 @@ }, { "PowerBook5,1", "PowerBook G4 17\"", PMAC_TYPE_UNKNOWN_INTREPID, intrepid_features, - PMAC_MB_HAS_FW_POWER | PMAC_MB_MOBILE, + PMAC_MB_HAS_FW_POWER | PMAC_MB_MOBILE +#ifdef CONFIG_SOFTWARE_REPLACE_SLEEP + | PMAC_MB_CAN_SLEEP, +#endif }, { "PowerBook5,2", "PowerBook G4 15\"", PMAC_TYPE_UNKNOWN_INTREPID, intrepid_features, diff -urN oldtree/arch/sh/kernel/signal.c newtree/arch/sh/kernel/signal.c --- oldtree/arch/sh/kernel/signal.c 2006-01-02 22:21:10.000000000 -0500 +++ 
newtree/arch/sh/kernel/signal.c 2006-02-13 14:51:54.000955864 -0500 @@ -578,7 +578,7 @@ if (!user_mode(regs)) return 1; - if (try_to_freeze()) + if (try_todo_list()) goto no_signal; if (!oldset) diff -urN oldtree/arch/sh64/kernel/signal.c newtree/arch/sh64/kernel/signal.c --- oldtree/arch/sh64/kernel/signal.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/sh64/kernel/signal.c 2006-02-13 14:51:54.021952672 -0500 @@ -696,7 +696,7 @@ if (!user_mode(regs)) return 1; - if (try_to_freeze()) + if (try_todo_list()) goto no_signal; if (!oldset) diff -urN oldtree/arch/x86_64/kernel/asm-offsets.c newtree/arch/x86_64/kernel/asm-offsets.c --- oldtree/arch/x86_64/kernel/asm-offsets.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/x86_64/kernel/asm-offsets.c 2006-02-13 14:51:54.031951152 -0500 @@ -61,8 +61,10 @@ offsetof (struct rt_sigframe32, uc.uc_mcontext)); BLANK(); #endif +#ifdef CONFIG_PM DEFINE(pbe_address, offsetof(struct pbe, address)); DEFINE(pbe_orig_address, offsetof(struct pbe, orig_address)); DEFINE(pbe_next, offsetof(struct pbe, next)); +#endif return 0; } diff -urN oldtree/arch/x86_64/kernel/e820.c newtree/arch/x86_64/kernel/e820.c --- oldtree/arch/x86_64/kernel/e820.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/x86_64/kernel/e820.c 2006-02-13 14:51:54.032951000 -0500 @@ -186,6 +186,23 @@ return end_pfn; } +int page_is_ram(unsigned long pagenr) +{ + unsigned long start = pagenr << PAGE_SHIFT; + int i; + for (i = 0; i < e820.nr_map; i++) { + struct e820entry *ei = &e820.map[i]; + + if (ei->addr+ei->size <= start || + ei->addr >= (start + PAGE_SIZE)) + continue; + + return (ei->type != E820_RAM); + } + + return 0; +} + /* * Compute how much memory is missing in a range. * Unlike the other functions in this file the arguments are in page numbers. 
diff -urN oldtree/arch/x86_64/kernel/signal.c newtree/arch/x86_64/kernel/signal.c --- oldtree/arch/x86_64/kernel/signal.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/x86_64/kernel/signal.c 2006-02-13 14:51:54.032951000 -0500 @@ -443,7 +443,7 @@ if (!user_mode(regs)) return 1; - if (try_to_freeze()) + if (try_todo_list()) goto no_signal; if (!oldset) diff -urN oldtree/arch/x86_64/kernel/suspend.c newtree/arch/x86_64/kernel/suspend.c --- oldtree/arch/x86_64/kernel/suspend.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/x86_64/kernel/suspend.c 2006-02-13 14:51:54.032951000 -0500 @@ -13,6 +13,7 @@ #include #include #include +#include struct saved_context saved_context; @@ -22,6 +23,8 @@ unsigned long saved_context_r12, saved_context_r13, saved_context_r14, saved_context_r15; unsigned long saved_context_eflags; +void fix_processor_context(void); + void __save_processor_state(struct saved_context *ctxt) { kernel_fpu_begin(); @@ -141,7 +144,7 @@ } -#ifdef CONFIG_SOFTWARE_SUSPEND +#if defined(CONFIG_SOFTWARE_SUSPEND) || defined(CONFIG_SUSPEND2) /* Defined in arch/x86_64/kernel/suspend_asm.S */ extern int restore_image(void); @@ -220,4 +223,9 @@ restore_image(); return 0; } + +int suspend2_mapping_prepare(void) +{ + return set_up_temporary_mappings(); +} #endif /* CONFIG_SOFTWARE_SUSPEND */ diff -urN oldtree/arch/x86_64/kernel/time.c newtree/arch/x86_64/kernel/time.c --- oldtree/arch/x86_64/kernel/time.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/x86_64/kernel/time.c 2006-02-13 14:51:54.033950848 -0500 @@ -509,11 +509,56 @@ return cycles_2_ns(a); } +unsigned long __get_cmos_time(void) +{ + unsigned int year, mon, day, hour, min, sec; + + /* + * Do we need the spinlock in here too? + * + * If we're called directly (not via get_cmos_time), + * we're in the middle of a sysdev suspend/resume + * and interrupts are disabled, so this + * should be safe without any locking. 
+ * -- NC + */ + + do { + sec = CMOS_READ(RTC_SECONDS); + min = CMOS_READ(RTC_MINUTES); + hour = CMOS_READ(RTC_HOURS); + day = CMOS_READ(RTC_DAY_OF_MONTH); + mon = CMOS_READ(RTC_MONTH); + year = CMOS_READ(RTC_YEAR); + } while (sec != CMOS_READ(RTC_SECONDS)); + + /* + * We know that x86-64 always uses BCD format, no need to check the config + * register. + */ + + BCD_TO_BIN(sec); + BCD_TO_BIN(min); + BCD_TO_BIN(hour); + BCD_TO_BIN(day); + BCD_TO_BIN(mon); + BCD_TO_BIN(year); + + /* + * This will work up to Dec 31, 2069. + */ + + if ((year += 1900) < 1970) + year += 100; + + return mktime(year, mon, day, hour, min, sec); +} + unsigned long get_cmos_time(void) { - unsigned int timeout, year, mon, day, hour, min, sec; + unsigned int timeout; unsigned char last, this; - unsigned long flags; + unsigned long flags, result; /* * The Linux interpretation of the CMOS clock register contents: When the @@ -534,39 +579,10 @@ timeout--; } -/* - * Here we are safe to assume the registers won't change for a whole second, so - * we just go ahead and read them. - */ - - sec = CMOS_READ(RTC_SECONDS); - min = CMOS_READ(RTC_MINUTES); - hour = CMOS_READ(RTC_HOURS); - day = CMOS_READ(RTC_DAY_OF_MONTH); - mon = CMOS_READ(RTC_MONTH); - year = CMOS_READ(RTC_YEAR); - + result = __get_cmos_time(); spin_unlock_irqrestore(&rtc_lock, flags); -/* - * We know that x86-64 always uses BCD format, no need to check the config - * register. - */ - - BCD_TO_BIN(sec); - BCD_TO_BIN(min); - BCD_TO_BIN(hour); - BCD_TO_BIN(day); - BCD_TO_BIN(mon); - BCD_TO_BIN(year); - -/* - * x86-64 systems only exists since 2002. 
- * This will work up to Dec 31, 2100 - */ - year += 2000; - - return mktime(year, mon, day, hour, min, sec); + return result; } #ifdef CONFIG_CPU_FREQ @@ -1004,7 +1020,7 @@ /* * Estimate time zone so that set_time can update the clock */ - long cmos_time = get_cmos_time(); + long cmos_time = __get_cmos_time(); clock_cmos_diff = -cmos_time; clock_cmos_diff += get_seconds(); diff -urN oldtree/arch/x86_64/mm/init.c newtree/arch/x86_64/mm/init.c --- oldtree/arch/x86_64/mm/init.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/arch/x86_64/mm/init.c 2006-02-13 14:51:54.034950696 -0500 @@ -489,6 +489,7 @@ addr = (unsigned long)(&__init_begin); for (; addr < (unsigned long)(&__init_end); addr += PAGE_SIZE) { ClearPageReserved(virt_to_page(addr)); + ClearPageNosave(virt_to_page(addr)); set_page_count(virt_to_page(addr), 1); memset((void *)(addr & ~(PAGE_SIZE-1)), 0xcc, PAGE_SIZE); free_page(addr); @@ -506,6 +507,7 @@ printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10); for (; start < end; start += PAGE_SIZE) { ClearPageReserved(virt_to_page(start)); + ClearPageNosave(virt_to_page(start)); set_page_count(virt_to_page(start), 1); free_page(start); totalram_pages++; @@ -617,3 +619,22 @@ { return (addr >= VSYSCALL_START) && (addr < VSYSCALL_END); } + +#if defined(CONFIG_SOFTWARE_SUSPEND) || defined(CONFIG_SUSPEND2) +/* + * Software suspend & friends need this for resume because things like the intel-agp + * driver might have split up a kernel 4MB mapping. 
+ */ +char __nosavedata swsusp_pg_dir[PAGE_SIZE] + __attribute__ ((aligned (PAGE_SIZE))); + +static inline void save_pg_dir(void) +{ + memcpy(swsusp_pg_dir, swapper_pg_dir, PAGE_SIZE); +} +#else +static inline void save_pg_dir(void) +{ +} +#endif + diff -urN oldtree/block/ll_rw_blk.c newtree/block/ll_rw_blk.c --- oldtree/block/ll_rw_blk.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/block/ll_rw_blk.c 2006-02-13 14:51:54.035950544 -0500 @@ -27,6 +27,9 @@ #include #include #include +#include +#include +#include /* * for max sense size @@ -2920,12 +2923,26 @@ else mod_page_state(pgpgin, count); + if (unlikely(( bio->bi_flags & (1 << BIO_SUSPEND2)) && + test_action_state(SUSPEND_TEST_BIO) && + (rw & WRITE))) { + char b[BDEVNAME_SIZE]; + printk("FAKEDWRITE: %s(%d): %s block %Lu on %s\n", + current->comm, current->pid, + (rw & WRITE) ? "WRITE" : "READ", + (unsigned long long)bio->bi_sector, + bdevname(bio->bi_bdev,b)); + bio_endio(bio, PAGE_SIZE, 0); + return; + } + if (unlikely(block_dump)) { char b[BDEVNAME_SIZE]; - printk(KERN_DEBUG "%s(%d): %s block %Lu on %s\n", + printk(KERN_DEBUG "%s(%d): %s block %Lu size %d on %s\n", current->comm, current->pid, (rw & WRITE) ? "WRITE" : "READ", (unsigned long long)bio->bi_sector, + bio->bi_size, bdevname(bio->bi_bdev,b)); } @@ -3224,7 +3241,7 @@ int __init blk_dev_init(void) { - kblockd_workqueue = create_workqueue("kblockd"); + kblockd_workqueue = create_nofreeze_workqueue("kblockd"); if (!kblockd_workqueue) panic("Failed to create kblockd\n"); diff -urN oldtree/crypto/Kconfig newtree/crypto/Kconfig --- oldtree/crypto/Kconfig 2006-01-02 22:21:10.000000000 -0500 +++ newtree/crypto/Kconfig 2006-02-13 14:51:54.036950392 -0500 @@ -285,6 +285,13 @@ You will most probably want this if using IPSec. +config CRYPTO_LZF + tristate "LZF compression algorithm" + depends on CRYPTO + help + This is the LZF algorithm. It is especially useful for Suspend2, + because it achieves good compression quickly. 
+ config CRYPTO_MICHAEL_MIC tristate "Michael MIC keyed digest algorithm" depends on CRYPTO diff -urN oldtree/crypto/Makefile newtree/crypto/Makefile --- oldtree/crypto/Makefile 2006-01-02 22:21:10.000000000 -0500 +++ newtree/crypto/Makefile 2006-02-13 14:51:54.036950392 -0500 @@ -30,5 +30,6 @@ obj-$(CONFIG_CRYPTO_DEFLATE) += deflate.o obj-$(CONFIG_CRYPTO_MICHAEL_MIC) += michael_mic.o obj-$(CONFIG_CRYPTO_CRC32C) += crc32c.o +obj-$(CONFIG_CRYPTO_LZF) += lzf.o obj-$(CONFIG_CRYPTO_TEST) += tcrypt.o diff -urN oldtree/crypto/deflate.c newtree/crypto/deflate.c --- oldtree/crypto/deflate.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/crypto/deflate.c 2006-02-13 14:51:54.036950392 -0500 @@ -143,8 +143,15 @@ ret = zlib_deflate(stream, Z_FINISH); if (ret != Z_STREAM_END) { - ret = -EINVAL; - goto out; + if (!(ret == Z_OK && !stream->avail_in && !stream->avail_out)) { + ret = -EINVAL; + goto out; + } else { + u8 zerostuff = 0; + stream->next_out = &zerostuff; + stream->avail_out = 1; + ret = zlib_deflate(stream, Z_FINISH); + } } ret = 0; *dlen = stream->total_out; diff -urN oldtree/crypto/lzf.c newtree/crypto/lzf.c --- oldtree/crypto/lzf.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/crypto/lzf.c 2006-02-13 14:51:54.037950240 -0500 @@ -0,0 +1,335 @@ +/* + * Cryptoapi LZF compression module. + * + * Copyright (c) 2004-2005 Nigel Cunningham + * + * based on the deflate.c file: + * + * Copyright (c) 2003 James Morris + * + * and upon the LZF compression module donated to the Suspend2 project with + * the following copyright: + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; either version 2 of the License, or (at your option) + * any later version. 
+ * Copyright (c) 2000-2003 Marc Alexander Lehmann + * + * Redistribution and use in source and binary forms, with or without modifica- + * tion, are permitted provided that the following conditions are met: + * + * 1. Redistributions of source code must retain the above copyright notice, + * this list of conditions and the following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * 3. The name of the author may not be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED + * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MER- + * CHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO + * EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPE- + * CIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, + * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; + * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, + * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTH- + * ERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED + * OF THE POSSIBILITY OF SUCH DAMAGE. + * + * Alternatively, the contents of this file may be used under the terms of + * the GNU General Public License version 2 (the "GPL"), in which case the + * provisions of the GPL are applicable instead of the above. 
If you wish to + * allow the use of your version of this file only under the terms of the + * GPL and not to allow others to use your version of this file under the + * BSD license, indicate your decision by deleting the provisions above and + * replace them with the notice and other provisions required by the GPL. If + * you do not delete the provisions above, a recipient may use your version + * of this file under either the BSD or the GPL. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +struct lzf_ctx { + void *hbuf; + unsigned int bufofs; +}; + +/* + * size of hashtable is (1 << hlog) * sizeof (char *) + * decompression is independent of the hash table size + * the difference between 15 and 14 is very small + * for small blocks (and 14 is also faster). + * For a low-memory configuration, use hlog == 13; + * For best compression, use 15 or 16. + */ +static const int hlog = 14; + +/* + * don't play with this unless you benchmark! + * decompression is not dependent on the hash function + * the hashing function might seem strange, just believe me + * it works ;) + */ +static inline u16 first(const u8 *p) +{ + return ((p[0]) << 8) + p[1]; +} + +static inline u16 next(u8 v, const u8 *p) +{ + return ((v) << 8) + p[2]; +} + +static inline u32 idx(unsigned int h) +{ + return (((h ^ (h << 5)) >> (3*8 - hlog)) + h*3) & ((1 << hlog) - 1); +} + +/* + * IDX works because it is very similar to a multiplicative hash, e.g. 
+ * (h * 57321 >> (3*8 - hlog)) + * the next one is also quite good, albeit slow ;) + * (int)(cos(h & 0xffffff) * 1e6) + */ + +static const int max_lit = (1 << 5); +static const int max_off = (1 << 13); +static const int max_ref = ((1 << 8) + (1 << 3)); + +/* + * compressed format + * + * 000LLLLL ; literal + * LLLOOOOO oooooooo ; backref L + * 111OOOOO LLLLLLLL oooooooo ; backref L+7 + * + */ + +static void lzf_compress_exit(void *context) +{ + struct lzf_ctx *ctx = (struct lzf_ctx *)context; + + if (ctx->hbuf) { + vfree(ctx->hbuf); + ctx->hbuf = NULL; + } +} + +static int lzf_compress_init(void *context) +{ + struct lzf_ctx *ctx = (struct lzf_ctx *)context; + + /* Get LZF ready to go */ + ctx->hbuf = vmalloc_32((1 << hlog) * sizeof(char *)); + if (!ctx->hbuf) { + printk(KERN_WARNING + "Failed to allocate %ld bytes for lzf workspace\n", + (1 << hlog) * sizeof(char *)); + return -ENOMEM; + } + return 0; +} + +static int lzf_compress(void *context, const u8 *in_data, unsigned int in_len, + u8 *out_data, unsigned int *out_len) +{ + struct lzf_ctx *ctx = (struct lzf_ctx *)context; + const u8 **htab = ctx->hbuf; + const u8 **hslot; + const u8 *ip = in_data; + u8 *op = out_data; + const u8 *in_end = ip + in_len; + u8 *out_end = op + *out_len - 3; + const u8 *ref; + + unsigned int hval = first(ip); + unsigned long off; + int lit = 0; + + memset(htab, 0, (1 << hlog) * sizeof(char *)); + + for (;;) { + if (ip < in_end - 2) { + hval = next(hval, ip); + hslot = htab + idx(hval); + ref = *hslot; + *hslot = ip; + + if ((off = ip - ref - 1) < max_off + && ip + 4 < in_end && ref > in_data + && *(u16 *) ref == *(u16 *) ip && ref[2] == ip[2] + ) { + /* match found at *ref++ */ + unsigned int len = 2; + unsigned int maxlen = in_end - ip - len; + maxlen = maxlen > max_ref ?
max_ref : maxlen; + + do + len++; + while (len < maxlen && ref[len] == ip[len]); + + if (op + lit + 1 + 3 >= out_end) { + *out_len = PAGE_SIZE; + return 0; + } + + if (lit) { + *op++ = lit - 1; + lit = -lit; + do + *op++ = ip[lit]; + while (++lit); + } + + len -= 2; + ip++; + + if (len < 7) { + *op++ = (off >> 8) + (len << 5); + } else { + *op++ = (off >> 8) + (7 << 5); + *op++ = len - 7; + } + + *op++ = off; + + ip += len; + hval = first(ip); + hval = next(hval, ip); + htab[idx(hval)] = ip; + ip++; + continue; + } + } else if (ip == in_end) + break; + + /* one more literal byte we must copy */ + lit++; + ip++; + + if (lit == max_lit) { + if (op + 1 + max_lit >= out_end) { + *out_len = PAGE_SIZE; + return 0; + } + + *op++ = max_lit - 1; + memcpy(op, ip - max_lit, max_lit); + op += max_lit; + lit = 0; + } + } + + if (lit) { + if (op + lit + 1 >= out_end) { + *out_len = PAGE_SIZE; + return 0; + } + + *op++ = lit - 1; + lit = -lit; + do + *op++ = ip[lit]; + while (++lit); + } + + *out_len = op - out_data; + return 0; +} + +static int lzf_decompress(void *context, const u8 *src, unsigned int slen, + u8 *dst, unsigned int *dlen) +{ + u8 const *ip = src; + u8 *op = dst; + u8 const *const in_end = ip + slen; + u8 *const out_end = op + *dlen; + + do { + unsigned int ctrl = *ip++; + + if (ctrl < (1 << 5)) { /* literal run */ + ctrl++; + + if (op + ctrl > out_end) { + *dlen = PAGE_SIZE; + return 0; + } + memcpy(op, ip, ctrl); + op += ctrl; + ip += ctrl; + } else { /* back reference */ + + unsigned int len = ctrl >> 5; + + u8 *ref = op - ((ctrl & 0x1f) << 8) - 1; + + if (len == 7) + len += *ip++; + + ref -= *ip++; + + if (op + len + 2 > out_end) { + *dlen = PAGE_SIZE; + return 0; + } + + if (ref < (u8 *) dst) { + *dlen = PAGE_SIZE; + return 0; + } + + *op++ = *ref++; + *op++ = *ref++; + + do + *op++ = *ref++; + while (--len); + } + } + while (op < out_end && ip < in_end); + + *dlen = op - (u8 *) dst; + return 0; +} + +static struct crypto_alg alg = { + .cra_name = "lzf", + 
.cra_flags = CRYPTO_ALG_TYPE_COMPRESS, + .cra_ctxsize = 0, + .cra_module = THIS_MODULE, + .cra_list = LIST_HEAD_INIT(alg.cra_list), + .cra_u = {.compress = { + .coa_init = lzf_compress_init, + .coa_exit = lzf_compress_exit, + .coa_compress = lzf_compress, + .coa_decompress = lzf_decompress}} +}; + +static int __init init(void) +{ + return crypto_register_alg(&alg); +} + +static void __exit fini(void) +{ + crypto_unregister_alg(&alg); +} + +module_init(init); +module_exit(fini); + +MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("LZF Compression Algorithm"); +MODULE_AUTHOR("Marc Alexander Lehmann & Nigel Cunningham"); diff -urN oldtree/drivers/acpi/osl.c newtree/drivers/acpi/osl.c --- oldtree/drivers/acpi/osl.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/acpi/osl.c 2006-02-13 14:51:54.037950240 -0500 @@ -91,7 +91,7 @@ "Access to PCI configuration space unavailable\n"); return AE_NULL_ENTRY; } - kacpid_wq = create_singlethread_workqueue("kacpid"); + kacpid_wq = create_nofreeze_singlethread_workqueue("kacpid"); BUG_ON(!kacpid_wq); return AE_OK; diff -urN oldtree/drivers/acpi/sleep/proc.c newtree/drivers/acpi/sleep/proc.c --- oldtree/drivers/acpi/sleep/proc.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/acpi/sleep/proc.c 2006-02-13 14:51:54.038950088 -0500 @@ -58,6 +58,15 @@ goto Done; } state = simple_strtoul(str, NULL, 0); + + /* + * I used to put this after the CONFIG_SOFTWARE_SUSPEND + * test, but people who compile in suspend2 usually want + * to use it instead of swsusp. 
--NC + */ + if (may_try_suspend2(state)) + goto Done; + #ifdef CONFIG_SOFTWARE_SUSPEND if (state == 4) { error = software_suspend(); diff -urN oldtree/drivers/base/power/resume.c newtree/drivers/base/power/resume.c --- oldtree/drivers/base/power/resume.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/base/power/resume.c 2006-02-13 14:51:54.038950088 -0500 @@ -101,6 +101,11 @@ list_del_init(entry); list_add_tail(entry, &dpm_active); resume_device(dev); + if (!irqs_disabled()) { + printk("WARNING: Interrupts reenabled while resuming sysdev driver %s.\n", + kobject_name(&dev->kobj)); + local_irq_disable(); + } put_device(dev); } } diff -urN oldtree/drivers/base/power/suspend.c newtree/drivers/base/power/suspend.c --- oldtree/drivers/base/power/suspend.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/base/power/suspend.c 2006-02-13 14:51:54.038950088 -0500 @@ -94,6 +94,12 @@ error = suspend_device(dev, state); + if (irqs_disabled()) { + printk("WARNING: Interrupts disabled while suspending %s.\n", + dev->driver ? dev->driver->name : dev->kobj.name); + local_irq_enable(); + } + down(&dpm_list_sem); /* Check if the device got removed */ diff -urN oldtree/drivers/base/sys.c newtree/drivers/base/sys.c --- oldtree/drivers/base/sys.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/base/sys.c 2006-02-13 14:51:54.039949936 -0500 @@ -298,16 +298,34 @@ if (cls->resume) cls->resume(dev); + if (!irqs_disabled()) { + printk("WARNING: Interrupts reenabled while resuming sysdev class specific driver %s.\n", + kobject_name(&dev->kobj)); + local_irq_disable(); + } + /* Call auxillary drivers next. */ list_for_each_entry(drv, &cls->drivers, entry) { - if (drv->resume) + if (drv->resume) { drv->resume(dev); + if (!irqs_disabled()) { + printk("WARNING: Interrupts reenabled while resuming sysdev class driver %s.\n", + kobject_name(&dev->kobj)); + local_irq_disable(); + } + } } /* Call global drivers. 
*/ list_for_each_entry(drv, &sysdev_drivers, entry) { - if (drv->resume) + if (drv->resume) { drv->resume(dev); + if (!irqs_disabled()) { + printk("WARNING: Interrupts reenabled while resuming sysdev driver %s.\n", + kobject_name(&dev->kobj)); + local_irq_disable(); + } + } } } diff -urN oldtree/drivers/block/pktcdvd.c newtree/drivers/block/pktcdvd.c --- oldtree/drivers/block/pktcdvd.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/block/pktcdvd.c 2006-02-13 14:51:54.040949784 -0500 @@ -1255,8 +1255,7 @@ residue = schedule_timeout(min_sleep_time); VPRINTK("kcdrwd: wake up\n"); - /* make swsusp happy with our thread */ - try_to_freeze(); + try_todo_list(); list_for_each_entry(pkt, &pd->cdrw.pkt_active_list, list) { if (!pkt->sleep_time) diff -urN oldtree/drivers/char/agp/agp_suspend.h newtree/drivers/char/agp/agp_suspend.h --- oldtree/drivers/char/agp/agp_suspend.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/drivers/char/agp/agp_suspend.h 2006-02-13 14:51:54.040949784 -0500 @@ -0,0 +1,43 @@ +/* + * Generic routines for suspending and resuming an agp bridge. + * + * Include "agp.h" first. 
+ */ + +#ifdef CONFIG_PM +static int agp_common_suspend(struct pci_dev *pdev, pm_message_t state) +{ + pci_save_state(pdev); + pci_set_power_state(pdev, 3); + + return 0; +} + +static int agp_common_resume(struct pci_dev *pdev, + struct agp_bridge_driver * generic_bridge, + int reconfigure(void)) +{ + struct agp_bridge_data *bridge = pci_get_drvdata(pdev); + + /* set power state 0 and restore PCI space */ + pci_set_power_state(pdev, 0); + pci_restore_state(pdev); + + /* reconfigure AGP hardware again */ + if (bridge->driver == generic_bridge) + return reconfigure(); + + return 0; +} +#else +static int agp_common_suspend(struct pci_dev *pdev, pm_message_t state) +{ + return 0; +} + +static int agp_common_resume(struct pci_dev *pdev, + struct agp_bridge_driver *generic_bridge, int reconfigure(void)) +{ + return 0; +} +#endif diff -urN oldtree/drivers/char/agp/ati-agp.c newtree/drivers/char/agp/ati-agp.c --- oldtree/drivers/char/agp/ati-agp.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/char/agp/ati-agp.c 2006-02-13 14:51:54.041949632 -0500 @@ -11,6 +11,7 @@ #include #include #include "agp.h" +#include "agp_suspend.h" #define ATI_GART_MMBASE_ADDR 0x14 #define ATI_RS100_APSIZE 0xac @@ -506,6 +507,17 @@ agp_put_bridge(bridge); } +static int agp_ati_suspend(struct pci_dev *pdev, pm_message_t state) +{ + return (agp_common_suspend(pdev, state)); +} + +static int agp_ati_resume(struct pci_dev *pdev) +{ + return agp_common_resume(pdev, &ati_generic_bridge, + ati_configure); +} + static struct pci_device_id agp_ati_pci_table[] = { { .class = (PCI_CLASS_BRIDGE_HOST << 8), @@ -525,6 +537,8 @@ .id_table = agp_ati_pci_table, .probe = agp_ati_probe, .remove = agp_ati_remove, + .suspend = agp_ati_suspend, + .resume = agp_ati_resume, }; static int __init agp_ati_init(void) diff -urN oldtree/drivers/char/agp/nvidia-agp.c newtree/drivers/char/agp/nvidia-agp.c --- oldtree/drivers/char/agp/nvidia-agp.c 2006-01-02 22:21:10.000000000 -0500 +++ 
newtree/drivers/char/agp/nvidia-agp.c 2006-02-13
14:51:54.042949480 -0500 @@ -12,6 +12,7 @@ #include #include #include "agp.h" +#include "agp_suspend.h" /* NVIDIA registers */ #define NVIDIA_0_APSIZE 0x80 @@ -397,11 +398,24 @@ MODULE_DEVICE_TABLE(pci, agp_nvidia_pci_table); +static int agp_nvidia_suspend(struct pci_dev *pdev, pm_message_t state) +{ + return (agp_common_suspend(pdev, state)); +} + +static int agp_nvidia_resume(struct pci_dev *pdev) +{ + return agp_common_resume(pdev, &agp_bridge, + nvidia_configure); +} + static struct pci_driver agp_nvidia_pci_driver = { .name = "agpgart-nvidia", .id_table = agp_nvidia_pci_table, .probe = agp_nvidia_probe, .remove = agp_nvidia_remove, + .suspend = agp_nvidia_suspend, + .resume = agp_nvidia_resume, }; static int __init agp_nvidia_init(void) diff -urN oldtree/drivers/char/hvc_console.c newtree/drivers/char/hvc_console.c --- oldtree/drivers/char/hvc_console.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/char/hvc_console.c 2006-02-13 14:51:54.050948264 -0500 @@ -841,7 +841,7 @@ /* Always start the kthread because there can be hotplug vty adapters * added later. */ - hvc_task = kthread_run(khvcd, NULL, "khvcd"); + hvc_task = kthread_nofreeze_run(khvcd, NULL, "khvcd"); if (IS_ERR(hvc_task)) { panic("Couldn't create kthread for console.\n"); put_tty_driver(hvc_driver); diff -urN oldtree/drivers/char/hvcs.c newtree/drivers/char/hvcs.c --- oldtree/drivers/char/hvcs.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/char/hvcs.c 2006-02-13 14:51:54.052947960 -0500 @@ -1406,7 +1406,7 @@ return -ENOMEM; } - hvcs_task = kthread_run(khvcsd, NULL, "khvcsd"); + hvcs_task = kthread_nofreeze_run(khvcsd, NULL, "khvcsd"); if (IS_ERR(hvcs_task)) { printk(KERN_ERR "HVCS: khvcsd creation failed. 
Driver not loaded.\n"); kfree(hvcs_pi_buff); diff -urN oldtree/drivers/ieee1394/ieee1394_core.c newtree/drivers/ieee1394/ieee1394_core.c --- oldtree/drivers/ieee1394/ieee1394_core.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/ieee1394/ieee1394_core.c 2006-02-13 14:51:54.052947960 -0500 @@ -1032,7 +1032,7 @@ while (1) { if (down_interruptible(&khpsbpkt_sig)) { - if (try_to_freeze()) + if (try_todo_list()) continue; printk("khpsbpkt: received unexpected signal?!\n" ); break; diff -urN oldtree/drivers/ieee1394/nodemgr.c newtree/drivers/ieee1394/nodemgr.c --- oldtree/drivers/ieee1394/nodemgr.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/ieee1394/nodemgr.c 2006-02-13 14:51:54.053947808 -0500 @@ -1569,7 +1569,7 @@ if (down_interruptible(&hi->reset_sem) || down_interruptible(&nodemgr_serialize)) { - if (try_to_freeze()) + if (try_todo_list()) continue; printk("NodeMgr: received unexpected signal?!\n" ); break; diff -urN oldtree/drivers/input/gameport/gameport.c newtree/drivers/input/gameport/gameport.c --- oldtree/drivers/input/gameport/gameport.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/input/gameport/gameport.c 2006-02-13 14:51:54.054947656 -0500 @@ -442,7 +442,7 @@ gameport_handle_event(); wait_event_interruptible(gameport_wait, kthread_should_stop() || !list_empty(&gameport_event_list)); - try_to_freeze(); + try_todo_list(); } while (!kthread_should_stop()); printk(KERN_DEBUG "gameport: kgameportd exiting\n"); diff -urN oldtree/drivers/input/serio/serio.c newtree/drivers/input/serio/serio.c --- oldtree/drivers/input/serio/serio.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/input/serio/serio.c 2006-02-13 14:51:54.054947656 -0500 @@ -314,6 +314,12 @@ serio_remove_duplicate_events(event); serio_free_event(event); + + if (unlikely(todo_list_active())) { + up(&serio_sem); + try_todo_list(); + down(&serio_sem); + } } up(&serio_sem); @@ -377,7 +383,7 @@ serio_handle_event(); wait_event_interruptible(serio_wait, 
kthread_should_stop() || !list_empty(&serio_event_list)); - try_to_freeze(); + try_todo_list(); } while (!kthread_should_stop()); printk(KERN_DEBUG "serio: kseriod exiting\n"); @@ -899,7 +905,7 @@ static int __init serio_init(void) { - serio_task = kthread_run(serio_thread, NULL, "kseriod"); + serio_task = kthread_nofreeze_run(serio_thread, NULL, "kseriod"); if (IS_ERR(serio_task)) { printk(KERN_ERR "serio: Failed to start kseriod\n"); return PTR_ERR(serio_task); diff -urN oldtree/drivers/macintosh/Kconfig newtree/drivers/macintosh/Kconfig --- oldtree/drivers/macintosh/Kconfig 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/macintosh/Kconfig 2006-02-13 14:51:54.054947656 -0500 @@ -192,4 +192,8 @@ tristate "Support for ANS LCD display" depends on ADB_CUDA && PPC_PMAC +config SOFTWARE_REPLACE_SLEEP + bool "Using Software suspend replace broken sleep function" + depends on SUSPEND2 + endmenu diff -urN oldtree/drivers/macintosh/therm_adt746x.c newtree/drivers/macintosh/therm_adt746x.c --- oldtree/drivers/macintosh/therm_adt746x.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/macintosh/therm_adt746x.c 2006-02-13 14:51:54.055947504 -0500 @@ -328,7 +328,7 @@ struct thermostat* th = arg; while(!kthread_should_stop()) { - try_to_freeze(); + try_todo_list(); msleep_interruptible(2000); #ifndef DEBUG diff -urN oldtree/drivers/macintosh/via-pmu.c newtree/drivers/macintosh/via-pmu.c --- oldtree/drivers/macintosh/via-pmu.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/macintosh/via-pmu.c 2006-02-13 14:51:54.056947352 -0500 @@ -2882,6 +2882,13 @@ return -EACCES; if (sleep_in_progress) return -EBUSY; +#ifdef CONFIG_SOFTWARE_REPLACE_SLEEP + { + extern void software_suspend_pending(void); + software_suspend_pending(); + return (0); + } +#endif sleep_in_progress = 1; switch (pmu_kind) { case PMU_OHARE_BASED: diff -urN oldtree/drivers/md/dm-crypt.c newtree/drivers/md/dm-crypt.c --- oldtree/drivers/md/dm-crypt.c 2006-01-02 22:21:10.000000000 -0500 +++ 
newtree/drivers/md/dm-crypt.c 2006-02-13 14:51:54.056947352 -0500 @@ -923,7 +923,7 @@ if (!_crypt_io_pool) return -ENOMEM; - _kcryptd_workqueue = create_workqueue("kcryptd"); + _kcryptd_workqueue = create_nofreeze_workqueue("kcryptd"); if (!_kcryptd_workqueue) { r = -ENOMEM; DMERR(PFX "couldn't create kcryptd"); diff -urN oldtree/drivers/md/md.c newtree/drivers/md/md.c --- oldtree/drivers/md/md.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/md/md.c 2006-02-13 14:51:54.058947048 -0500 @@ -41,7 +41,6 @@ #include #include #include /* for invalidate_bdev */ -#include #include @@ -3486,7 +3485,8 @@ thread->run = run; thread->mddev = mddev; thread->timeout = MAX_SCHEDULE_TIMEOUT; - thread->tsk = kthread_run(md_thread, thread, name, mdname(thread->mddev)); + thread->tsk = kthread_nofreeze_run(md_thread, thread, + name, mdname(thread->mddev)); if (IS_ERR(thread->tsk)) { kfree(thread); return NULL; diff -urN oldtree/drivers/media/dvb/dvb-core/dvb_frontend.c newtree/drivers/media/dvb/dvb-core/dvb_frontend.c --- oldtree/drivers/media/dvb/dvb-core/dvb_frontend.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/media/dvb/dvb-core/dvb_frontend.c 2006-02-13 14:51:54.059946896 -0500 @@ -392,7 +392,7 @@ break; } - try_to_freeze(); + try_todo_list(); if (down_interruptible(&fepriv->sem)) break; diff -urN oldtree/drivers/media/video/msp3400.c newtree/drivers/media/video/msp3400.c --- oldtree/drivers/media/video/msp3400.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/media/video/msp3400.c 2006-02-13 14:51:54.060946744 -0500 @@ -860,7 +860,7 @@ } remove_wait_queue(&msp->wq, &wait); - try_to_freeze(); + try_todo_list(); return msp->restart; } diff -urN oldtree/drivers/media/video/tvaudio.c newtree/drivers/media/video/tvaudio.c --- oldtree/drivers/media/video/tvaudio.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/media/video/tvaudio.c 2006-02-13 14:51:54.061946592 -0500 @@ -295,7 +295,7 @@ schedule(); } remove_wait_queue(&chip->wq, &wait); - 
try_to_freeze(); + try_todo_list(); if (chip->done || signal_pending(current)) break; tvaudio_dbg("%s: thread wakeup\n", chip->c.name); diff -urN oldtree/drivers/media/video/video-buf-dvb.c newtree/drivers/media/video/video-buf-dvb.c --- oldtree/drivers/media/video/video-buf-dvb.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/media/video/video-buf-dvb.c 2006-02-13 14:51:54.061946592 -0500 @@ -62,7 +62,7 @@ break; if (kthread_should_stop()) break; - try_to_freeze(); + try_todo_list(); /* feed buffer data to demux */ if (buf->state == STATE_DONE) diff -urN oldtree/drivers/net/8139too.c newtree/drivers/net/8139too.c --- oldtree/drivers/net/8139too.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/net/8139too.c 2006-02-13 14:51:54.062946440 -0500 @@ -1607,8 +1607,7 @@ timeout = next_tick; do { timeout = interruptible_sleep_on_timeout (&tp->thr_wait, timeout); - /* make swsusp happy with our thread */ - try_to_freeze(); + try_todo_list(); } while (!signal_pending (current) && (timeout > 0)); if (signal_pending (current)) { diff -urN oldtree/drivers/net/irda/sir_kthread.c newtree/drivers/net/irda/sir_kthread.c --- oldtree/drivers/net/irda/sir_kthread.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/net/irda/sir_kthread.c 2006-02-13 14:51:54.063946288 -0500 @@ -112,6 +112,7 @@ DECLARE_WAITQUEUE(wait, current); daemonize("kIrDAd"); + current->flags |= PF_NOFREEZE; irda_rq_queue.thread = current; @@ -134,9 +135,6 @@ __set_task_state(current, TASK_RUNNING); remove_wait_queue(&irda_rq_queue.kick, &wait); - /* make swsusp happy with our thread */ - try_to_freeze(); - run_irda_queue(); } diff -urN oldtree/drivers/net/irda/stir4200.c newtree/drivers/net/irda/stir4200.c --- oldtree/drivers/net/irda/stir4200.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/net/irda/stir4200.c 2006-02-13 14:51:54.064946136 -0500 @@ -762,7 +762,7 @@ { #ifdef CONFIG_PM /* if suspending, then power off and wait */ - if (unlikely(freezing(current))) { + if 
(unlikely(todo_list_active())) { if (stir->receiving) receive_stop(stir); else @@ -770,7 +770,7 @@ write_reg(stir, REG_CTRL1, CTRL1_TXPWD|CTRL1_RXPWD); - refrigerator(); + run_todo_list(); if (change_speed(stir, stir->speed)) break; diff -urN oldtree/drivers/net/wireless/airo.c newtree/drivers/net/wireless/airo.c --- oldtree/drivers/net/wireless/airo.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/net/wireless/airo.c 2006-02-13 14:51:54.066945832 -0500 @@ -2910,7 +2910,7 @@ flush_signals(current); /* make swsusp happy with our thread */ - try_to_freeze(); + try_todo_list(); if (test_bit(JOB_DIE, &ai->flags)) break; diff -urN oldtree/drivers/pcmcia/cs.c newtree/drivers/pcmcia/cs.c --- oldtree/drivers/pcmcia/cs.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/pcmcia/cs.c 2006-02-13 14:51:54.067945680 -0500 @@ -691,7 +691,7 @@ break; schedule(); - try_to_freeze(); + try_todo_list(); } /* make sure we are running before we exit */ set_current_state(TASK_RUNNING); diff -urN oldtree/drivers/pnp/pnpbios/core.c newtree/drivers/pnp/pnpbios/core.c --- oldtree/drivers/pnp/pnpbios/core.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/pnp/pnpbios/core.c 2006-02-13 14:51:54.067945680 -0500 @@ -172,7 +172,7 @@ msleep_interruptible(2000); if(signal_pending(current)) { - if (try_to_freeze()) + if (try_todo_list()) continue; break; } diff -urN oldtree/drivers/scsi/hosts.c newtree/drivers/scsi/hosts.c --- oldtree/drivers/scsi/hosts.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/scsi/hosts.c 2006-02-13 14:51:54.068945528 -0500 @@ -227,7 +227,7 @@ if (shost->transportt->create_work_queue) { snprintf(shost->work_q_name, KOBJ_NAME_LEN, "scsi_wq_%d", shost->host_no); - shost->work_q = create_singlethread_workqueue( + shost->work_q = create_nofreeze_singlethread_workqueue( shost->work_q_name); if (!shost->work_q) goto out_free_shost_data; diff -urN oldtree/drivers/scsi/lpfc/lpfc_init.c newtree/drivers/scsi/lpfc/lpfc_init.c --- 
oldtree/drivers/scsi/lpfc/lpfc_init.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/scsi/lpfc/lpfc_init.c 2006-02-13 14:51:54.069945376 -0500 @@ -1475,7 +1475,7 @@ phba->work_ha_mask |= (HA_RXMASK << (LPFC_ELS_RING * 4)); /* Startup the kernel thread for this host adapter. */ - phba->worker_thread = kthread_run(lpfc_do_work, phba, + phba->worker_thread = kthread_nofreeze_run(lpfc_do_work, phba, "lpfc_worker_%d", phba->brd_no); if (IS_ERR(phba->worker_thread)) { error = PTR_ERR(phba->worker_thread); diff -urN oldtree/drivers/usb/core/hub.c newtree/drivers/usb/core/hub.c --- oldtree/drivers/usb/core/hub.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/usb/core/hub.c 2006-02-13 14:51:54.070945224 -0500 @@ -2814,7 +2814,7 @@ wait_event_interruptible(khubd_wait, !list_empty(&hub_event_list) || kthread_should_stop()); - try_to_freeze(); + try_todo_list(); } while (!kthread_should_stop() || !list_empty(&hub_event_list)); pr_debug("%s: khubd exiting\n", usbcore_name); diff -urN oldtree/drivers/usb/gadget/file_storage.c newtree/drivers/usb/gadget/file_storage.c --- oldtree/drivers/usb/gadget/file_storage.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/usb/gadget/file_storage.c 2006-02-13 14:51:54.072944920 -0500 @@ -1550,7 +1550,7 @@ rc = wait_event_interruptible(fsg->thread_wqh, fsg->thread_wakeup_needed); fsg->thread_wakeup_needed = 0; - try_to_freeze(); + try_todo_list(); return (rc ? 
-EINTR : 0); } diff -urN oldtree/drivers/usb/net/pegasus.c newtree/drivers/usb/net/pegasus.c --- oldtree/drivers/usb/net/pegasus.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/usb/net/pegasus.c 2006-02-13 14:51:54.073944768 -0500 @@ -1417,7 +1417,7 @@ static int __init pegasus_init(void) { pr_info("%s: %s, " DRIVER_DESC "\n", driver_name, DRIVER_VERSION); - pegasus_workqueue = create_singlethread_workqueue("pegasus"); + pegasus_workqueue = create_nofreeze_singlethread_workqueue("pegasus"); if (!pegasus_workqueue) return -ENOMEM; return usb_register(&pegasus_driver); diff -urN oldtree/drivers/usb/storage/usb.c newtree/drivers/usb/storage/usb.c --- oldtree/drivers/usb/storage/usb.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/usb/storage/usb.c 2006-02-13 14:51:54.073944768 -0500 @@ -890,7 +890,7 @@ wait_event_interruptible_timeout(us->delay_wait, test_bit(US_FLIDX_DISCONNECTING, &us->flags), delay_use * HZ); - if (try_to_freeze()) + if (try_todo_list()) goto retry; } diff -urN oldtree/drivers/w1/w1.c newtree/drivers/w1/w1.c --- oldtree/drivers/w1/w1.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/drivers/w1/w1.c 2006-02-13 14:51:54.074944616 -0500 @@ -720,7 +720,7 @@ while (!control_needs_exit || have_to_wait) { have_to_wait = 0; - try_to_freeze(); + try_todo_list(); msleep_interruptible(w1_control_timeout * 1000); if (signal_pending(current)) @@ -796,7 +796,7 @@ allow_signal(SIGTERM); while (!test_bit(W1_MASTER_NEED_EXIT, &dev->flags)) { - try_to_freeze(); + try_todo_list(); msleep_interruptible(w1_timeout * 1000); if (signal_pending(current)) diff -urN oldtree/fs/afs/kafsasyncd.c newtree/fs/afs/kafsasyncd.c --- oldtree/fs/afs/kafsasyncd.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/fs/afs/kafsasyncd.c 2006-02-13 14:51:54.074944616 -0500 @@ -116,7 +116,7 @@ remove_wait_queue(&kafsasyncd_sleepq, &myself); set_current_state(TASK_RUNNING); - try_to_freeze(); + try_todo_list(); /* discard pending signals */ afs_discard_my_signals(); diff -urN 
oldtree/fs/afs/kafstimod.c newtree/fs/afs/kafstimod.c --- oldtree/fs/afs/kafstimod.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/fs/afs/kafstimod.c 2006-02-13 14:51:54.075944464 -0500 @@ -91,7 +91,7 @@ complete_and_exit(&kafstimod_dead, 0); } - try_to_freeze(); + try_todo_list(); /* discard pending signals */ afs_discard_my_signals(); diff -urN oldtree/fs/jbd/journal.c newtree/fs/jbd/journal.c --- oldtree/fs/jbd/journal.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/fs/jbd/journal.c 2006-02-13 14:51:54.075944464 -0500 @@ -153,7 +153,7 @@ } wake_up(&journal->j_wait_done_commit); - if (freezing(current)) { + if (todo_list_active()) { /* * The simpler the better. Flushing journal isn't a * good idea, because that depends on threads that may @@ -161,7 +161,7 @@ */ jbd_debug(1, "Now suspending kjournald\n"); spin_unlock(&journal->j_state_lock); - refrigerator(); + run_todo_list(); spin_lock(&journal->j_state_lock); } else { /* diff -urN oldtree/fs/jffs/intrep.c newtree/fs/jffs/intrep.c --- oldtree/fs/jffs/intrep.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/fs/jffs/intrep.c 2006-02-13 14:51:54.077944160 -0500 @@ -3391,7 +3391,7 @@ siginfo_t info; unsigned long signr = 0; - if (try_to_freeze()) + if (try_todo_list()) continue; spin_lock_irq(&current->sighand->siglock); diff -urN oldtree/fs/jffs2/background.c newtree/fs/jffs2/background.c --- oldtree/fs/jffs2/background.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/fs/jffs2/background.c 2006-02-13 14:51:54.077944160 -0500 @@ -96,7 +96,7 @@ schedule(); } - if (try_to_freeze()) + if (try_todo_list()) continue; cond_resched(); diff -urN oldtree/fs/jfs/jfs_logmgr.c newtree/fs/jfs/jfs_logmgr.c --- oldtree/fs/jfs/jfs_logmgr.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/fs/jfs/jfs_logmgr.c 2006-02-13 14:51:54.078944008 -0500 @@ -2362,9 +2362,9 @@ lbmStartIO(bp); spin_lock_irq(&log_redrive_lock); } - if (freezing(current)) { + if (todo_list_active()) { spin_unlock_irq(&log_redrive_lock); - refrigerator(); +
run_todo_list(); } else { add_wait_queue(&jfs_IO_thread_wait, &wq); set_current_state(TASK_INTERRUPTIBLE); diff -urN oldtree/fs/jfs/jfs_txnmgr.c newtree/fs/jfs/jfs_txnmgr.c --- oldtree/fs/jfs/jfs_txnmgr.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/fs/jfs/jfs_txnmgr.c 2006-02-13 14:51:54.079943856 -0500 @@ -2795,9 +2795,9 @@ /* In case a wakeup came while all threads were active */ jfs_commit_thread_waking = 0; - if (freezing(current)) { + if (todo_list_active()) { LAZY_UNLOCK(flags); - refrigerator(); + run_todo_list(); } else { DECLARE_WAITQUEUE(wq, current); @@ -2994,9 +2994,9 @@ /* Add anon_list2 back to anon_list */ list_splice_init(&TxAnchor.anon_list2, &TxAnchor.anon_list); - if (freezing(current)) { + if (todo_list_active()) { TXN_UNLOCK(); - refrigerator(); + run_todo_list(); } else { DECLARE_WAITQUEUE(wq, current); diff -urN oldtree/fs/lockd/clntlock.c newtree/fs/lockd/clntlock.c --- oldtree/fs/lockd/clntlock.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/fs/lockd/clntlock.c 2006-02-13 14:51:54.080943704 -0500 @@ -237,6 +237,7 @@ fl->fl_u.nfs_fl.flags &= ~NFS_LCK_RECLAIM; nlmclnt_reclaim(host, fl); + try_todo_list(); if (signalled()) break; goto restart; diff -urN oldtree/fs/lockd/clntproc.c newtree/fs/lockd/clntproc.c --- oldtree/fs/lockd/clntproc.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/fs/lockd/clntproc.c 2006-02-13 14:51:54.080943704 -0500 @@ -311,7 +311,7 @@ prepare_to_wait(queue, &wait, TASK_INTERRUPTIBLE); if (!signalled ()) { schedule_timeout(NLMCLNT_GRACE_WAIT); - try_to_freeze(); + try_todo_list(); if (!signalled ()) status = 0; } diff -urN oldtree/fs/lockd/svc.c newtree/fs/lockd/svc.c --- oldtree/fs/lockd/svc.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/fs/lockd/svc.c 2006-02-13 14:51:54.081943552 -0500 @@ -138,6 +138,8 @@ while ((nlmsvc_users || !signalled()) && nlmsvc_pid == current->pid) { long timeout = MAX_SCHEDULE_TIMEOUT; + try_todo_list(); + if (signalled()) { flush_signals(current); if (nlmsvc_ops) { diff -urN 
oldtree/fs/xfs/linux-2.6/xfs_buf.c newtree/fs/xfs/linux-2.6/xfs_buf.c --- oldtree/fs/xfs/linux-2.6/xfs_buf.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/fs/xfs/linux-2.6/xfs_buf.c 2006-02-13 14:51:54.082943400 -0500 @@ -1709,9 +1709,9 @@ INIT_LIST_HEAD(&tmp); do { - if (unlikely(freezing(current))) { + if (unlikely(todo_list_active())) { xfsbufd_force_sleep = 1; - refrigerator(); + run_todo_list(); } else { xfsbufd_force_sleep = 0; } diff -urN oldtree/fs/xfs/linux-2.6/xfs_super.c newtree/fs/xfs/linux-2.6/xfs_super.c --- oldtree/fs/xfs/linux-2.6/xfs_super.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/fs/xfs/linux-2.6/xfs_super.c 2006-02-13 14:51:54.082943400 -0500 @@ -575,7 +575,7 @@ for (;;) { timeleft = schedule_timeout_interruptible(timeleft); /* swsusp */ - try_to_freeze(); + try_todo_list(); if (kthread_should_stop()) break; diff -urN oldtree/include/asm-arm/hw_irq.h newtree/include/asm-arm/hw_irq.h --- oldtree/include/asm-arm/hw_irq.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/include/asm-arm/hw_irq.h 2006-02-13 14:51:54.082943400 -0500 @@ -0,0 +1,4 @@ +#ifndef __ASM_HARDIRQ_H +#define __ASM_HARDIRQ_H +#include +#endif diff -urN oldtree/include/asm-arm/suspend2.h newtree/include/asm-arm/suspend2.h --- oldtree/include/asm-arm/suspend2.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/include/asm-arm/suspend2.h 2006-02-13 14:51:54.083943248 -0500 @@ -0,0 +1,136 @@ +#ifndef _ASMARM_SUSPEND_H +#define _ASMARM_SUSPEND_H +/* + * Based on code + * Copyright 2005 Sony Corporation + * Copyright 2003-2004 Nigel Cunningham + * Copyright 2001-2002 Pavel Machek + * Copyright 2001 Patrick Mochel + */ + +/* image of the saved processor state */ +struct suspend2_saved_context { + /* general registers */ + __u32 r[15]; + + /* coprocessor 15 registers */ +/* __u32 ID_code; read only reg */ +/* __u32 cache_type; read only reg */ +/* __u32 TCM_stat; read only reg */ + __u32 CR; + __u32 TTBR; + __u32 DACR; + __u32 D_FSR; + __u32 I_FSR; + __u32 FAR; +/* __u32 COR; 
write only reg */ +/* __u32 TLBOR; write only reg */ + __u32 D_CLR; + __u32 I_CLR; + __u32 D_TCMRR; + __u32 I_TCMRR; + __u32 TLBLR; + __u32 FCSE; + __u32 CID; +} __attribute__((packed)); +typedef struct suspend2_saved_context suspend2_saved_context_t; + +/* temporary storage */ +extern struct suspend2_saved_context suspend2_saved_context; + +static inline void suspend2_arch_save_processor_context(void) +{ + /* save general registers */ + asm volatile ("stmia %0, {r4-r14}" + :: "r" (suspend2_saved_context.r)); + /* save coprocessor 15 registers */ + asm volatile ("mrc p15, 0, %0, c1, c0, 0" + : "=r" (suspend2_saved_context.CR)); + asm volatile ("mrc p15, 0, %0, c3, c0, 0" + : "=r" (suspend2_saved_context.DACR)); + asm volatile ("mrc p15, 0, %0, c5, c0, 0" + : "=r" (suspend2_saved_context.D_FSR)); + asm volatile ("mrc p15, 0, %0, c5, c0, 1" + : "=r" (suspend2_saved_context.I_FSR)); + asm volatile ("mrc p15, 0, %0, c6, c0, 0" + : "=r" (suspend2_saved_context.FAR)); + asm volatile ("mrc p15, 0, %0, c9, c0, 0" + : "=r" (suspend2_saved_context.D_CLR)); + asm volatile ("mrc p15, 0, %0, c9, c0, 1" + : "=r" (suspend2_saved_context.I_CLR)); + asm volatile ("mrc p15, 0, %0, c9, c1, 0" + : "=r" (suspend2_saved_context.D_TCMRR)); + asm volatile ("mrc p15, 0, %0, c9, c1, 1" + : "=r" (suspend2_saved_context.I_TCMRR)); + asm volatile ("mrc p15, 0, %0, c10, c0, 0" + : "=r" (suspend2_saved_context.TLBLR)); + asm volatile ("mrc p15, 0, %0, c13, c0, 0" + : "=r" (suspend2_saved_context.FCSE)); + asm volatile ("mrc p15, 0, %0, c13, c0, 1" + : "=r" (suspend2_saved_context.CID)); + asm volatile ("mrc p15, 0, %0, c2, c0, 0" + : "=r" (suspend2_saved_context.TTBR)); +} + +static inline void suspend2_arch_restore_processor_context(void) +{ + /* restore coprocessor 15 registers */ + asm volatile ("mcr p15, 0, %0, c2, c0, 0" + :: "r" (suspend2_saved_context.TTBR)); + asm volatile ("mcr p15, 0, %0, c13, c0, 1" + :: "r" (suspend2_saved_context.CID)); + asm volatile ("mcr p15, 0, %0, c13, c0, 0" + 
:: "r" (suspend2_saved_context.FCSE)); + asm volatile ("mcr p15, 0, %0, c10, c0, 0" + :: "r" (suspend2_saved_context.TLBLR)); + asm volatile ("mcr p15, 0, %0, c9, c1, 1" + :: "r" (suspend2_saved_context.I_TCMRR)); + asm volatile ("mcr p15, 0, %0, c9, c1, 0" + :: "r" (suspend2_saved_context.D_TCMRR)); + asm volatile ("mcr p15, 0, %0, c9, c0, 1" + :: "r" (suspend2_saved_context.I_CLR)); + asm volatile ("mcr p15, 0, %0, c9, c0, 0" + :: "r" (suspend2_saved_context.D_CLR)); + asm volatile ("mcr p15, 0, %0, c6, c0, 0" + :: "r" (suspend2_saved_context.FAR)); + asm volatile ("mcr p15, 0, %0, c5, c0, 1" + :: "r" (suspend2_saved_context.I_FSR)); + asm volatile ("mcr p15, 0, %0, c5, c0, 0" + :: "r" (suspend2_saved_context.D_FSR)); + asm volatile ("mcr p15, 0, %0, c3, c0, 0" + :: "r" (suspend2_saved_context.DACR)); + asm volatile ("mcr p15, 0, %0, c1, c0, 0" + :: "r" (suspend2_saved_context.CR)); + + /* restore general registers */ + asm volatile ("ldmia r3, {r4-r14}" : "=m" (suspend2_saved_context.r)); +} + +static inline void save_context(void) +{ +} + +static inline void restore_context(void) +{ +} + +static inline void suspend2_arch_pre_copy(void) +{ +} + +static inline void suspend2_arch_post_copy(void) +{ +} + +static inline void suspend2_arch_pre_copyback(void) +{ +} + +static inline void suspend2_arch_post_copyback(void) +{ +} + +static inline void suspend2_arch_flush_caches(void) +{ +} +#endif diff -urN oldtree/include/asm-i386/mach-default/mach_time.h newtree/include/asm-i386/mach-default/mach_time.h --- oldtree/include/asm-i386/mach-default/mach_time.h 2006-01-02 22:21:10.000000000 -0500 +++ newtree/include/asm-i386/mach-default/mach_time.h 2006-02-13 14:51:54.083943248 -0500 @@ -79,24 +79,19 @@ return retval; } -static inline unsigned long mach_get_cmos_time(void) +/* __get_cmos_time + * + * Separated out from mach_get_cmos_time so that we can + * quickly get the cmos time when we don't care about + * whether the second has just started. 
+ * + * Used from suspend and resume sysdev calls. + */ +static inline unsigned long __get_cmos_time(void) { unsigned int year, mon, day, hour, min, sec; - int i; - /* The Linux interpretation of the CMOS clock register contents: - * When the Update-In-Progress (UIP) flag goes from 1 to 0, the - * RTC registers show the second which has precisely just started. - * Let's hope other operating systems interpret the RTC the same way. - */ - /* read RTC exactly on falling edge of update flag */ - for (i = 0 ; i < 1000000 ; i++) /* may take up to 1 second... */ - if (CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP) - break; - for (i = 0 ; i < 1000000 ; i++) /* must try at least 2.228 ms */ - if (!(CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP)) - break; - do { /* Isn't this overkill ? UIP above should guarantee consistency */ + do { sec = CMOS_READ(RTC_SECONDS); min = CMOS_READ(RTC_MINUTES); hour = CMOS_READ(RTC_HOURS); @@ -104,6 +99,7 @@ mon = CMOS_READ(RTC_MONTH); year = CMOS_READ(RTC_YEAR); } while (sec != CMOS_READ(RTC_SECONDS)); + if (!(CMOS_READ(RTC_CONTROL) & RTC_DM_BINARY) || RTC_ALWAYS_BCD) { BCD_TO_BIN(sec); @@ -119,4 +115,24 @@ return mktime(year, mon, day, hour, min, sec); } +static inline unsigned long mach_get_cmos_time(void) +{ + int i; + + /* The Linux interpretation of the CMOS clock register contents: + * When the Update-In-Progress (UIP) flag goes from 1 to 0, the + * RTC registers show the second which has precisely just started. + * Let's hope other operating systems interpret the RTC the same way. + */ + /* read RTC exactly on falling edge of update flag */ + for (i = 0 ; i < 1000000 ; i++) /* may take up to 1 second... 
*/ + if (CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP) + break; + for (i = 0 ; i < 1000000 ; i++) /* must try at least 2.228 ms */ + if (!(CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP)) + break; + + return __get_cmos_time(); +} + #endif /* !_MACH_TIME_H */ diff -urN oldtree/include/asm-i386/suspend.h newtree/include/asm-i386/suspend.h --- oldtree/include/asm-i386/suspend.h 2006-01-02 22:21:10.000000000 -0500 +++ newtree/include/asm-i386/suspend.h 2006-02-13 14:51:54.083943248 -0500 @@ -3,6 +3,7 @@ * Based on code * Copyright 2001 Patrick Mochel */ +#include #include #include diff -urN oldtree/include/asm-i386/suspend2.h newtree/include/asm-i386/suspend2.h --- oldtree/include/asm-i386/suspend2.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/include/asm-i386/suspend2.h 2006-02-13 14:51:54.084943096 -0500 @@ -0,0 +1,288 @@ + /* + * Copyright 2003-2005 Nigel Cunningham + * Based on code + * Copyright 2001-2002 Pavel Machek + * Based on code + * Copyright 2001 Patrick Mochel + */ +#include +#include +#include +#include +#include +#include + +/* image of the saved processor states */ +struct suspend2_saved_context { + u32 eax, ebx, ecx, edx; + u32 esp, ebp, esi, edi; + u16 es, fs, gs, ss; + u32 cr0, cr2, cr3, cr4; + u16 gdt_pad; + u16 gdt_limit; + u32 gdt_base; + u16 idt_pad; + u16 idt_limit; + u32 idt_base; + u16 ldt; + u16 tss; + u32 tr; + u32 safety; + u32 return_address; + u32 eflags; +} __attribute__((packed)); +typedef struct suspend2_saved_context suspend2_saved_context_t; + +/* temporary storage */ +extern struct suspend2_saved_context suspend2_saved_context; + +/* + * save_processor_context + * + * Save the state of the processor before we go to sleep. + * + * return_stack is the value of the stack pointer (%esp) as the caller sees it. + * A good way could not be found to obtain it from here (don't want to make + * _too_ many assumptions about the layout of the stack this far down.) 
Also, + * the handy little __builtin_frame_pointer(level) where level > 0, is blatantly + * buggy - it returns the value of the stack at the proper location, not the + * location, like it should (as of gcc 2.91.66) + * + * Note that the context and timing of this function is pretty critical. + * With a minimal amount of things going on in the caller and in here, gcc + * does a good job of being just a dumb compiler. Watch the assembly output + * if anything changes, though, and make sure everything is going in the right + * place. + */ +static inline void suspend2_arch_save_processor_context(void) +{ + kernel_fpu_begin(); + + /* + * descriptor tables + */ + asm volatile ("sgdt (%0)" : "=m" (suspend2_saved_context.gdt_limit)); + asm volatile ("sidt (%0)" : "=m" (suspend2_saved_context.idt_limit)); + asm volatile ("sldt (%0)" : "=m" (suspend2_saved_context.ldt)); + asm volatile ("str (%0)" : "=m" (suspend2_saved_context.tr)); + + /* + * save the general registers. + * note that gcc has constructs to specify output of certain registers, + * but they're not used here, because it assumes that you want to modify + * those registers, so it tries to be smart and save them beforehand. + * It's really not necessary, and kinda fishy (check the assembly output), + * so it's avoided. 
+ */ + asm volatile ("movl %%esp, (%0)" : "=m" (suspend2_saved_context.esp)); + asm volatile ("movl %%eax, (%0)" : "=m" (suspend2_saved_context.eax)); + asm volatile ("movl %%ebx, (%0)" : "=m" (suspend2_saved_context.ebx)); + asm volatile ("movl %%ecx, (%0)" : "=m" (suspend2_saved_context.ecx)); + asm volatile ("movl %%edx, (%0)" : "=m" (suspend2_saved_context.edx)); + asm volatile ("movl %%ebp, (%0)" : "=m" (suspend2_saved_context.ebp)); + asm volatile ("movl %%esi, (%0)" : "=m" (suspend2_saved_context.esi)); + asm volatile ("movl %%edi, (%0)" : "=m" (suspend2_saved_context.edi)); + + /* + * segment registers + */ + asm volatile ("movw %%es, %0" : "=r" (suspend2_saved_context.es)); + asm volatile ("movw %%fs, %0" : "=r" (suspend2_saved_context.fs)); + asm volatile ("movw %%gs, %0" : "=r" (suspend2_saved_context.gs)); + asm volatile ("movw %%ss, %0" : "=r" (suspend2_saved_context.ss)); + + /* + * control registers + */ + asm volatile ("movl %%cr0, %0" : "=r" (suspend2_saved_context.cr0)); + asm volatile ("movl %%cr2, %0" : "=r" (suspend2_saved_context.cr2)); + asm volatile ("movl %%cr3, %0" : "=r" (suspend2_saved_context.cr3)); + asm volatile ("movl %%cr4, %0" : "=r" (suspend2_saved_context.cr4)); + + /* + * eflags + */ + asm volatile ("pushfl ; popl (%0)" : "=m" (suspend2_saved_context.eflags)); +} + +static void fix_processor_context(void) +{ + struct tss_struct *t = &per_cpu(init_tss,0); + + /* This just modifies memory; should not be necessary. But... This is + * necessary, because 386 hardware has concept of busy TSS or some + * similar stupidity. 
*/ + set_tss_desc(0,t); + per_cpu(cpu_gdt_table,0)[GDT_ENTRY_TSS].b &= 0xfffffdff; + + load_TR_desc(); + + load_LDT(&current->active_mm->context); /* This does lldt */ + + /* + * Now maybe reload the debug registers + */ + if (current->thread.debugreg[7]){ + set_debugreg(&current->thread.debugreg[0], 0); + set_debugreg(&current->thread.debugreg[1], 1); + set_debugreg(&current->thread.debugreg[2], 2); + set_debugreg(&current->thread.debugreg[3], 3); + /* no 4 and 5 */ + set_debugreg(&current->thread.debugreg[6], 6); + set_debugreg(&current->thread.debugreg[7], 7); + } + +} + +static void do_fpu_end(void) +{ + /* restore FPU regs if necessary */ + /* Do it out of line so that gcc does not move cr0 load to some stupid + * place */ + kernel_fpu_end(); +} + +#if defined(CONFIG_SUSPEND2) || defined(CONFIG_SMP) +static unsigned long c_loops_per_jiffy_ref __nosavedata; +#endif + +#ifdef CONFIG_SUSPEND2 +#ifndef CONFIG_SMP +extern unsigned long loops_per_jiffy; +volatile static unsigned long cpu_khz_ref __nosavedata = 0; +#endif + +static inline void suspend2_arch_pre_copy(void) { } +static inline void suspend2_arch_post_copy(void) { } + +static inline void suspend2_arch_pre_copyback(void) +{ + /* We want to run from swsusp_pg_dir, since swsusp_pg_dir is stored in + * constant place in memory. + */ + + __asm__( "movl %%ecx,%%cr3\n" ::"c"(__pa(swsusp_pg_dir))); + + c_loops_per_jiffy_ref = + current_cpu_data.loops_per_jiffy; +#ifndef CONFIG_SMP + cpu_khz_ref = cpu_khz; + c_loops_per_jiffy_ref = loops_per_jiffy; +#endif + +} + +/* + * restore_processor_context + * + * Restore the processor context as it was before we went to sleep + * - descriptor tables + * - control registers + * - segment registers + * - flags + * + * Note that it is critical that this function is declared inline. + * It was separated out from restore_state to make that function + * a little clearer, but it needs to be inlined because we won't have a + * stack when we get here (so we can't push a return address). 
+ */ +static inline void suspend2_arch_restore_processor_context(void) +{ + /* + * first restore %ds, so we can access our data properly + */ + asm volatile (".align 4"); + asm volatile ("movw %0, %%ds" :: "r" ((u16)__KERNEL_DS)); + + + /* + * control registers + */ + asm volatile ("movl %0, %%cr4" :: "r" (suspend2_saved_context.cr4)); + asm volatile ("movl %0, %%cr3" :: "r" (suspend2_saved_context.cr3)); + asm volatile ("movl %0, %%cr2" :: "r" (suspend2_saved_context.cr2)); + asm volatile ("movl %0, %%cr0" :: "r" (suspend2_saved_context.cr0)); + + /* + * segment registers + */ + asm volatile ("movw %0, %%es" :: "r" (suspend2_saved_context.es)); + asm volatile ("movw %0, %%fs" :: "r" (suspend2_saved_context.fs)); + asm volatile ("movw %0, %%gs" :: "r" (suspend2_saved_context.gs)); + asm volatile ("movw %0, %%ss" :: "r" (suspend2_saved_context.ss)); + + /* + * the other general registers + * + * note that even though gcc has constructs to specify memory + * input into certain registers, it will try to be too smart + * and save them at the beginning of the function. This is esp. + * bad since we don't have a stack set up when we enter, and we + * want to preserve the values on exit. So, we set them manually. + */ + asm volatile ("movl %0, %%esp" :: "m" (suspend2_saved_context.esp)); + asm volatile ("movl %0, %%ebp" :: "m" (suspend2_saved_context.ebp)); + asm volatile ("movl %0, %%eax" :: "m" (suspend2_saved_context.eax)); + asm volatile ("movl %0, %%ebx" :: "m" (suspend2_saved_context.ebx)); + asm volatile ("movl %0, %%ecx" :: "m" (suspend2_saved_context.ecx)); + asm volatile ("movl %0, %%edx" :: "m" (suspend2_saved_context.edx)); + asm volatile ("movl %0, %%esi" :: "m" (suspend2_saved_context.esi)); + asm volatile ("movl %0, %%edi" :: "m" (suspend2_saved_context.edi)); + + /* + * now restore the descriptor tables to their proper values + * ltr is done in fix_processor_context(). 
+ */ + + asm volatile ("lgdt (%0)" :: "m" (suspend2_saved_context.gdt_limit)); + asm volatile ("lidt (%0)" :: "m" (suspend2_saved_context.idt_limit)); + asm volatile ("lldt (%0)" :: "m" (suspend2_saved_context.ldt)); + + /* tell gcc that we clobbered all the registers... + * otherwise it might keep some addresses there. + * Unfortunately gcc 4 thinks it's smart and will + * error out if we tell it we're clobbering ebp as + * well. So we have to lie. + */ + asm volatile ("" : : : "esp", "eax", "ebx", "ecx", "edx", "esi", "edi"); + + if (boot_cpu_has(X86_FEATURE_SEP)) + enable_sep_cpu(); + + fix_processor_context(); + + /* + * the flags + */ + asm volatile ("pushl %0 ; popfl" :: "m" (suspend2_saved_context.eflags)); + + do_fpu_end(); + + mtrr_ap_init(); + mcheck_init(&boot_cpu_data); +} + +static inline void suspend2_arch_flush_caches(void) +{ +#ifdef CONFIG_SMP + cpu_clear(0, per_cpu(cpu_tlbstate, + 0).active_mm->cpu_vm_mask); +#endif + wbinvd(); + __flush_tlb_all(); + +} + +static inline void suspend2_arch_post_copyback(void) +{ + BUG_ON(!irqs_disabled()); + + current_cpu_data.loops_per_jiffy = + c_loops_per_jiffy_ref; +#ifndef CONFIG_SMP + loops_per_jiffy = c_loops_per_jiffy_ref; + cpu_khz = cpu_khz_ref; +#endif +} + +#endif diff -urN oldtree/include/asm-i386/tlbflush.h newtree/include/asm-i386/tlbflush.h --- oldtree/include/asm-i386/tlbflush.h 2006-01-02 22:21:10.000000000 -0500 +++ newtree/include/asm-i386/tlbflush.h 2006-02-13 14:51:54.084943096 -0500 @@ -84,6 +84,7 @@ #define flush_tlb() __flush_tlb() #define flush_tlb_all() __flush_tlb_all() #define local_flush_tlb() __flush_tlb() +#define local_flush_tlb_all() __flush_tlb_all() static inline void flush_tlb_mm(struct mm_struct *mm) { @@ -116,6 +117,10 @@ extern void flush_tlb_current_task(void); extern void flush_tlb_mm(struct mm_struct *); extern void flush_tlb_page(struct vm_area_struct *, unsigned long); +extern void do_flush_tlb_all(void *info); + +#define local_flush_tlb_all() \ + do_flush_tlb_all(NULL) 
#define flush_tlb() flush_tlb_current_task() diff -urN oldtree/include/asm-ppc/cpu_context.h newtree/include/asm-ppc/cpu_context.h --- oldtree/include/asm-ppc/cpu_context.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/include/asm-ppc/cpu_context.h 2006-02-13 14:51:54.084943096 -0500 @@ -0,0 +1,110 @@ +/* + * Written by Hu Gang (hugang@soulinfo.com) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include +#include +#include + +/* image of the saved processor states */ +struct saved_context { + u32 lr, cr, sp, r2; + u32 r[20]; /* r12 - r31 */ + u32 sprg[4]; + u32 msr, sdr1, tb1, tb2; +} __attribute__((packed)); + +inline static void __save_processor_state(struct saved_context *s) +{ + /*asm volatile ("mflr 0; stw 0,%0" : "=m" (s->lr));*/ + asm volatile ("mfcr 0; stw 0,%0" : "=m" (s->cr)); + asm volatile ("stw 1,%0" : "=m" (s->sp)); + asm volatile ("stw 2,%0" : "=m" (s->r2)); + asm volatile ("stmw 12,%0" : "=m" (s->r)); + + /* Save MSR & SDR1 */ + asm volatile ("mfmsr 4; stw 4,%0" : "=m" (s->msr)); + asm volatile ("mfsdr1 4; stw 4,%0": "=m" (s->sdr1)); + + /* Get a stable timebase and save it */ + asm volatile ("1:\n" + "mftbu 4;stw 4,%0\n" + "mftb 5;stw 5,%1\n" + "mftbu 3\n" + "cmpw 3,4;\n" + "bne 1b" : + "=m" (s->tb1), + "=m" (s->tb2)); + + /* Save SPRGs */ + asm volatile ("mfsprg 4,0; stw 4,%0 " : "=m" (s->sprg[0])); + asm volatile ("mfsprg 4,1; stw 4,%0 " : "=m" (s->sprg[1])); + asm volatile ("mfsprg 4,2; stw 4,%0 " : "=m" (s->sprg[2])); + asm volatile ("mfsprg 4,3; stw 4,%0 " : "=m" (s->sprg[3])); +} + +inline static void __restore_processor_state(struct saved_context *s) +{ + /* Restore the BATs, and SDR1 */ + asm volatile ("lwz 4,%0; mtsdr1 4" : "=m" (s->sdr1)); + /* asm volatile ("lwz 3,%0" : "=m" (saved_context.msr)); */ + + asm volatile 
("lwz 4,%0; mtsprg 0,4": "=m" (s->sprg[0])); + asm volatile ("lwz 4,%0; mtsprg 1,4": "=m" (s->sprg[1])); + asm volatile ("lwz 4,%0; mtsprg 2,4": "=m" (s->sprg[2])); + asm volatile ("lwz 4,%0; mtsprg 3,4": "=m" (s->sprg[3])); + + /* Restore TB */ + asm volatile ("li 3,0; mttbl 3; \n" + "lwz 3,%0\n; lwz 4,%1\n" + "mttbu 3; mttbl 4" : + "=m" (s->tb1), + "=m" (s->tb2)); + + /* Restore the callee-saved registers and return */ + asm volatile ("lmw 12,%0" : "=m" (s->r)); + asm volatile ("lwz 2,%0" : "=m" (s->r2)); + asm volatile ("lwz 1,%0" : "=m" (s->sp)); + asm volatile ("lwz 0,%0; mtcr 0" : "=m" (s->cr)); + + /* tell gcc that we clobbered all the registers... + * otherwise it might keep some addresses there. */ + asm volatile ("" : : : "r13", "r14", "r15", "r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23", "r24", "r25", "r26", "r27", "r28", "r29", "r30", "r31"); + /*asm volatile ("lwz 0,%0; mtlr 0" : "=m" (s->lr));*/ +} + +static inline void save_context(void) +{ +#ifdef CONFIG_ADB_PMU + printk("pmu suspend\n"); + pmu_suspend(); +#endif +} + +extern void enable_kernel_altivec(void); + +static inline void restore_context(void) +{ + printk("set context: <%p>\n", current); + set_context(current->active_mm->context, + current->active_mm->pgd); + +#ifdef CONFIG_ADB_PMU + printk("pmu_resume\n"); + pmu_resume(); +#endif + +#ifdef CONFIG_ALTIVEC + if (cur_cpu_spec->cpu_features & CPU_FTR_ALTIVEC) { + printk("enable altivec\n"); + enable_kernel_altivec(); + } +#endif + printk("enable fp\n"); + enable_kernel_fp(); +} diff -urN oldtree/include/asm-ppc/suspend2.h newtree/include/asm-ppc/suspend2.h --- oldtree/include/asm-ppc/suspend2.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/include/asm-ppc/suspend2.h 2006-02-13 14:51:54.085942944 -0500 @@ -0,0 +1,47 @@ +/* + * Written by Hu Gang (hugang@soulinfo.com) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free 
Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include "asm/cpu_context.h" + +typedef struct saved_context suspend2_saved_context_t; + +extern struct saved_context suspend2_saved_context; + +static inline void suspend2_arch_save_processor_context(void) +{ + __save_processor_state(&suspend2_saved_context); +} + +static inline void suspend2_arch_restore_processor_context(void) +{ + __restore_processor_state(&suspend2_saved_context); + + restore_context(); +} + +static inline void suspend2_arch_pre_copy(void) +{ +} + +static inline void suspend2_arch_post_copy(void) +{ +} + +static inline void suspend2_arch_pre_copyback(void) +{ + save_context(); +} + +static inline void suspend2_arch_post_copyback(void) +{ +} + +static inline void suspend2_arch_flush_caches(void) +{ +} diff -urN oldtree/include/asm-x86_64/suspend.h newtree/include/asm-x86_64/suspend.h --- oldtree/include/asm-x86_64/suspend.h 2006-01-02 22:21:10.000000000 -0500 +++ newtree/include/asm-x86_64/suspend.h 2006-02-13 14:51:54.085942944 -0500 @@ -43,8 +43,6 @@ : /* no output */ \ :"r" ((thread)->debugreg##register)) -extern void fix_processor_context(void); - #ifdef CONFIG_ACPI_SLEEP extern unsigned long saved_eip; extern unsigned long saved_esp; diff -urN oldtree/include/asm-x86_64/suspend2.h newtree/include/asm-x86_64/suspend2.h --- oldtree/include/asm-x86_64/suspend2.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/include/asm-x86_64/suspend2.h 2006-02-13 14:51:54.085942944 -0500 @@ -0,0 +1,361 @@ + /* + * Copyright 2005 Nigel Cunningham + * Based on code + * Copyright 2001-2002 Pavel Machek + * Based on code + * Copyright 2001 Patrick Mochel + */ + +#include +#include +#include +#include +#include +#include +#include + +extern pgd_t *temp_level4_pgt; +extern int suspend2_mapping_prepare(void); + +/* image of the saved processor states */ +struct suspend2_saved_context { + unsigned long eax, ebx, ecx, edx; + unsigned long esp, ebp, esi, edi; + 
unsigned long r8, r9, r10, r11; + unsigned long r12, r13, r14, r15; + + u16 ds, es, fs, gs, ss; + unsigned long gs_base, gs_kernel_base, fs_base; + unsigned long cr0, cr2, cr3, cr4, cr8; + u16 gdt_pad; + u16 gdt_limit; + unsigned long gdt_base; + u16 idt_pad; + u16 idt_limit; + unsigned long idt_base; + u16 ldt; + u16 tss; + unsigned long tr; + unsigned long safety; + unsigned long return_address; + unsigned long eflags; +} __attribute__((packed)); + +typedef struct suspend2_saved_context suspend2_saved_context_t; + +/* temporary storage */ +extern struct suspend2_saved_context suspend2_saved_context; + +/* + * save_processor_context + * + * Save the state of the processor before we go to sleep. + * + * return_stack is the value of the stack pointer (%esp) as the caller sees it. + * A good way could not be found to obtain it from here (don't want to make _too_ + * many assumptions about the layout of the stack this far down.) Also, the + * handy little __builtin_frame_pointer(level) where level > 0, is blatantly + * buggy - it returns the value of the stack at the proper location, not the + * location, like it should (as of gcc 2.91.66) + * + * Note that the context and timing of this function is pretty critical. + * With a minimal amount of things going on in the caller and in here, gcc + * does a good job of being just a dumb compiler. Watch the assembly output + * if anything changes, though, and make sure everything is going in the right + * place. 
+ */ +static inline void suspend2_arch_save_processor_context(void) +{ + kernel_fpu_begin(); + + /* + * descriptor tables + */ + asm volatile ("sgdt %0" : "=m" (suspend2_saved_context.gdt_limit)); + asm volatile ("sidt %0" : "=m" (suspend2_saved_context.idt_limit)); + asm volatile ("str %0" : "=m" (suspend2_saved_context.tr)); + + /* + * segment registers + */ + asm volatile ("movw %%ds, %0" : "=r" (suspend2_saved_context.ds)); + asm volatile ("movw %%es, %0" : "=r" (suspend2_saved_context.es)); + asm volatile ("movw %%fs, %0" : "=r" (suspend2_saved_context.fs)); + asm volatile ("movw %%gs, %0" : "=r" (suspend2_saved_context.gs)); + asm volatile ("movw %%ss, %0" : "=r" (suspend2_saved_context.ss)); + + rdmsrl(MSR_FS_BASE, suspend2_saved_context.fs_base); + rdmsrl(MSR_GS_BASE, suspend2_saved_context.gs_base); + rdmsrl(MSR_KERNEL_GS_BASE, suspend2_saved_context.gs_kernel_base); + + /* + * control registers + */ + asm volatile ("movq %%cr0, %0" : "=r" (suspend2_saved_context.cr0)); + asm volatile ("movq %%cr2, %0" : "=r" (suspend2_saved_context.cr2)); + asm volatile ("movq %%cr3, %0" : "=r" (suspend2_saved_context.cr3)); + asm volatile ("movq %%cr4, %0" : "=r" (suspend2_saved_context.cr4)); + asm volatile ("movq %%cr8, %0" : "=r" (suspend2_saved_context.cr8)); + + /* + * save the general registers. + * note that gcc has constructs to specify output of certain registers, + * but they're not used here, because it assumes that you want to modify + * those registers, so it tries to be smart and save them beforehand. + * It's really not necessary, and kinda fishy (check the assembly output), + * so it's avoided. 
+ */ + + asm volatile ("movq %%rsp, %0" : "=m" (suspend2_saved_context.esp)); + + asm volatile ("movq %%rax, %0" : "=m" (suspend2_saved_context.eax)); + asm volatile ("movq %%rbx, %0" : "=m" (suspend2_saved_context.ebx)); + asm volatile ("movq %%rcx, %0" : "=m" (suspend2_saved_context.ecx)); + asm volatile ("movq %%rdx, %0" : "=m" (suspend2_saved_context.edx)); + asm volatile ("movq %%rbp, %0" : "=m" (suspend2_saved_context.ebp)); + asm volatile ("movq %%rsi, %0" : "=m" (suspend2_saved_context.esi)); + asm volatile ("movq %%rdi, %0" : "=m" (suspend2_saved_context.edi)); + asm volatile ("movq %%r8, %0" : "=m" (suspend2_saved_context.r8)); + asm volatile ("movq %%r9, %0" : "=m" (suspend2_saved_context.r9)); + asm volatile ("movq %%r10, %0" : "=m" (suspend2_saved_context.r10)); + asm volatile ("movq %%r11, %0" : "=m" (suspend2_saved_context.r11)); + asm volatile ("movq %%r12, %0" : "=m" (suspend2_saved_context.r12)); + asm volatile ("movq %%r13, %0" : "=m" (suspend2_saved_context.r13)); + asm volatile ("movq %%r14, %0" : "=m" (suspend2_saved_context.r14)); + asm volatile ("movq %%r15, %0" : "=m" (suspend2_saved_context.r15)); + + /* + * eflags + */ + asm volatile ("pushfq ; popq %0" : "=m" (suspend2_saved_context.eflags)); + +} + +static void fix_processor_context(void) +{ + struct tss_struct * t = &per_cpu(init_tss,0); + + set_tss_desc(0,t); /* This just modifies memory; should not be necessary. But... This is necessary, because 386 hardware has concept of busy TSS or some similar stupidity. 
*/ + cpu_gdt_table[0][GDT_ENTRY_TSS].type = 9; + + syscall_init(); /* This sets MSR_*STAR and related */ + load_TR_desc(); + load_LDT(&current->active_mm->context); /* This does lldt */ + + /* + * Now maybe reload the debug registers + */ + if (current->thread.debugreg7){ + loaddebug(&current->thread, 0); + loaddebug(&current->thread, 1); + loaddebug(&current->thread, 2); + loaddebug(&current->thread, 3); + /* no 4 and 5 */ + loaddebug(&current->thread, 6); + loaddebug(&current->thread, 7); + } +} + +static void do_fpu_end(void) +{ + /* restore FPU regs if necessary */ + /* Do it out of line so that gcc does not move cr0 load to some stupid place */ + kernel_fpu_end(); + mxcsr_feature_mask_init(); +} + +/* + * restore_processor_context + * + * Restore the processor context as it was before we went to sleep + * - descriptor tables + * - control registers + * - segment registers + * - flags + * + * Note that it is critical that this function is declared inline. + * It was separated out from restore_state to make that function + * a little clearer, but it needs to be inlined because we won't have a + * stack when we get here (so we can't push a return address). + */ +static inline void restore_processor_context(void) +{ + /* + * Credit for this goes to the swsusp code. Restoring the + * CPU context is the one thing we still do in the same + * way, and swsusp did it right first. + * + * 0xffffffff80000000UL is __START_KERNEL_map. 
+ */ + + __asm__ __volatile__( + "leaq init_level4_pgt(%rip), %rax; \n" + "subq $0xffffffff80000000, %rax; \n" + "movq %rax, %cr3; \n" + "movq mmu_cr4_features(%rip), %rax; \n" + "movq %rax, %rdx; \n" + "andq $~(1<<7), %rdx; # PGE \n" + "movq %rdx, %cr4; # turn off PGE \n" + "movq %cr3, %rcx; # flush TLB \n" + "movq %rcx, %cr3; \n" + "movq %rax, %cr4; # turn PGE back on; \n" + + "movl $24, %eax; \n" + "movl %eax, %ds \n"); + /* + * the other general registers + * + * note that even though gcc has constructs to specify memory + * input into certain registers, it will try to be too smart + * and save them at the beginning of the function. This is esp. + * bad since we don't have a stack set up when we enter, and we + * want to preserve the values on exit. So, we set them manually. + */ + asm volatile ("movq %0, %%rsp" :: "m" (suspend2_saved_context.esp)); + asm volatile ("movq %0, %%rbp" :: "m" (suspend2_saved_context.ebp)); + asm volatile ("movq %0, %%rbx" :: "m" (suspend2_saved_context.ebx)); + asm volatile ("movq %0, %%rcx" :: "m" (suspend2_saved_context.ecx)); + asm volatile ("movq %0, %%rdx" :: "m" (suspend2_saved_context.edx)); + asm volatile ("movq %0, %%rsi" :: "m" (suspend2_saved_context.esi)); + asm volatile ("movq %0, %%rdi" :: "m" (suspend2_saved_context.edi)); + asm volatile ("movq %0, %%r8" :: "m" (suspend2_saved_context.r8)); + asm volatile ("movq %0, %%r9" :: "m" (suspend2_saved_context.r9)); + asm volatile ("movq %0, %%r10" :: "m" (suspend2_saved_context.r10)); + asm volatile ("movq %0, %%r11" :: "m" (suspend2_saved_context.r11)); + asm volatile ("movq %0, %%r12" :: "m" (suspend2_saved_context.r12)); + asm volatile ("movq %0, %%r13" :: "m" (suspend2_saved_context.r13)); + asm volatile ("movq %0, %%r14" :: "m" (suspend2_saved_context.r14)); + asm volatile ("movq %0, %%r15" :: "m" (suspend2_saved_context.r15)); + + /* + * the flags + */ + asm volatile ("pushq %0 ; popfq" :: "m" (suspend2_saved_context.eflags)); + + asm volatile ("xorq %rax, %rax"); + + 
/* + * control registers + */ + asm volatile ("movq %0, %%cr8" :: "r" (suspend2_saved_context.cr8)); + asm volatile ("movq %0, %%cr4" :: "r" (suspend2_saved_context.cr4)); + asm volatile ("movq %0, %%cr3" :: "r" (suspend2_saved_context.cr3)); + asm volatile ("movq %0, %%cr2" :: "r" (suspend2_saved_context.cr2)); + asm volatile ("movq %0, %%cr0" :: "r" (suspend2_saved_context.cr0)); + + /* + * now restore the descriptor tables to their proper values + * ltr is done in fix_processor_context(). + */ + + asm volatile ("lgdt %0" :: "m" (suspend2_saved_context.gdt_limit)); + asm volatile ("lidt %0" :: "m" (suspend2_saved_context.idt_limit)); + + /* + * segment registers + */ + asm volatile ("movw %0, %%ds" :: "r" (suspend2_saved_context.ds)); + asm volatile ("movw %0, %%es" :: "r" (suspend2_saved_context.es)); + asm volatile ("movw %0, %%fs" :: "r" (suspend2_saved_context.fs)); + load_gs_index(suspend2_saved_context.gs); + asm volatile ("movw %0, %%ss" :: "r" (suspend2_saved_context.ss)); + + wrmsrl(MSR_FS_BASE, suspend2_saved_context.fs_base); + wrmsrl(MSR_GS_BASE, suspend2_saved_context.gs_base); + wrmsrl(MSR_KERNEL_GS_BASE, suspend2_saved_context.gs_kernel_base); + + /* tell gcc that we clobbered all the registers... + * otherwise it might keep some addresses there. 
*/ + asm volatile ("" : : : "rsp", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"); + + fix_processor_context(); + + do_fpu_end(); + + mtrr_ap_init(); + mcheck_init(&boot_cpu_data); +} + +#if defined(CONFIG_SUSPEND2) || defined(CONFIG_SMP) +extern unsigned char * my_saved_context __nosavedata; +static unsigned long c_loops_per_jiffy_ref[NR_CPUS] __nosavedata; +#endif + +#ifdef CONFIG_SUSPEND2 +#ifndef CONFIG_SMP +extern unsigned long loops_per_jiffy; +volatile static unsigned long cpu_khz_ref __nosavedata = 0; +#endif + +/* + * APIC support: These routines save the APIC + * configuration for the CPU on which they are + * being executed + */ +extern void suspend_apic_save_state(void); +extern void suspend_apic_reload_state(void); + +static inline void suspend2_arch_pre_copy(void) +{ +} + +static inline void suspend2_arch_post_copy(void) +{ +} + +static inline void suspend2_arch_pre_copyback(void) +{ + /* We want to run from swsusp_pg_dir, since swsusp_pg_dir is stored in + * constant place in memory. + */ + + suspend2_mapping_prepare(); + + asm volatile ("movq $0xffff810000000000, %rdx"); + asm volatile ("movq temp_level4_pgt(%rip), %rax"); + asm volatile ("subq %rdx, %rax"); + asm volatile ("movq %rax, %cr3"); + + wbinvd(); + __flush_tlb_all(); + + c_loops_per_jiffy_ref[0] = + current_cpu_data.loops_per_jiffy; +#ifndef CONFIG_SMP + cpu_khz_ref = cpu_khz; + c_loops_per_jiffy_ref[0] = loops_per_jiffy; +#endif + +} + +static inline void suspend2_arch_restore_processor_context(void) +{ + restore_processor_context(); +} + +static inline void suspend2_arch_flush_caches(void) +{ +#ifdef CONFIG_SMP + clear_bit(0, &read_pda(active_mm)->cpu_vm_mask); +#endif + wbinvd(); + __flush_tlb_all(); + +} + +static inline void suspend2_arch_post_copyback(void) +{ + /* Get other CPUs to restore their contexts and flush their tlbs. 
*/ + clear_suspend_state(SUSPEND_FREEZE_SMP); + + BUG_ON(!irqs_disabled()); + + current_cpu_data.loops_per_jiffy = + c_loops_per_jiffy_ref[0]; +#ifndef CONFIG_SMP + loops_per_jiffy = c_loops_per_jiffy_ref[0]; + cpu_khz = cpu_khz_ref; +#endif +} + +#endif diff -urN oldtree/include/linux/bio.h newtree/include/linux/bio.h --- oldtree/include/linux/bio.h 2006-01-02 22:21:10.000000000 -0500 +++ newtree/include/linux/bio.h 2006-02-13 14:51:54.086942792 -0500 @@ -124,6 +124,7 @@ #define BIO_BOUNCED 5 /* bio is a bounce bio */ #define BIO_USER_MAPPED 6 /* contains user pages */ #define BIO_EOPNOTSUPP 7 /* not supported */ +#define BIO_SUSPEND2 8 /* Suspend2 bio - for corruption checking */ #define bio_flagged(bio, flag) ((bio)->bi_flags & (1 << (flag))) /* diff -urN oldtree/include/linux/dyn_pageflags.h newtree/include/linux/dyn_pageflags.h --- oldtree/include/linux/dyn_pageflags.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/include/linux/dyn_pageflags.h 2006-02-13 14:51:54.086942792 -0500 @@ -0,0 +1,66 @@ +/* + * include/linux/dyn_pageflags.h + * + * Copyright (C) 2004-2005 Nigel Cunningham + * + * This file is released under the GPLv2. + * + * It implements support for dynamically allocated bitmaps that are + * used for temporary or infrequently used pageflags, in lieu of + * bits in the struct page flags entry. + */ + +#ifndef DYN_PAGEFLAGS_H +#define DYN_PAGEFLAGS_H + +#include + +typedef unsigned long *** dyn_pageflags_t; + +#define BITNUMBER(page) (page_to_pfn(page)) + +#if BITS_PER_LONG == 32 +#define UL_SHIFT 5 +#else +#if BITS_PER_LONG == 64 +#define UL_SHIFT 6 +#else +#error Bits per long not 32 or 64? 
+#endif +#endif + +#define BIT_NUM_MASK (sizeof(unsigned long) * 8 - 1) +#define PAGE_NUM_MASK (~((1 << (PAGE_SHIFT + 3)) - 1)) +#define UL_NUM_MASK (~(BIT_NUM_MASK | PAGE_NUM_MASK)) + +#define BITS_PER_PAGE (PAGE_SIZE << 3) +#define PAGENUMBER(zone_offset) (zone_offset >> (PAGE_SHIFT + 3)) +#define PAGEINDEX(zone_offset) ((zone_offset & UL_NUM_MASK) >> UL_SHIFT) +#define PAGEBIT(zone_offset) (zone_offset & BIT_NUM_MASK) + +#define PAGE_UL_PTR(bitmap, zone_num, zone_pfn) \ + ((bitmap[zone_num][PAGENUMBER(zone_pfn)])+PAGEINDEX(zone_pfn)) + +/* With the above macros defined, you can do... + +#define PagePageset1(page) (test_dynpageflag(&pageset1_map, page)) +#define SetPagePageset1(page) (set_dynpageflag(&pageset1_map, page)) +#define ClearPagePageset1(page) (clear_dynpageflag(&pageset1_map, page)) +*/ + +#define BITMAP_FOR_EACH_SET(bitmap, counter) \ + for (counter = get_next_bit_on(bitmap, -1); counter < max_pfn; \ + counter = get_next_bit_on(bitmap, counter)) + +extern void clear_dyn_pageflags(dyn_pageflags_t pagemap); +extern int allocate_dyn_pageflags(dyn_pageflags_t *pagemap); +extern int free_dyn_pageflags(dyn_pageflags_t *pagemap); +extern int dyn_pageflags_pages_per_bitmap(void); +extern int get_next_bit_on(dyn_pageflags_t bitmap, int counter); +extern unsigned long *dyn_pageflags_ul_ptr(dyn_pageflags_t *bitmap, + struct page *pg); + +extern int test_dynpageflag(dyn_pageflags_t *bitmap, struct page *page); +extern void set_dynpageflag(dyn_pageflags_t *bitmap, struct page *page); +extern void clear_dynpageflag(dyn_pageflags_t *bitmap, struct page *page); +#endif diff -urN oldtree/include/linux/freezer.h newtree/include/linux/freezer.h --- oldtree/include/linux/freezer.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/include/linux/freezer.h 2006-02-13 14:51:54.086942792 -0500 @@ -0,0 +1,28 @@ +/* Freezer declarations */ + +#define FREEZER_ON 0 +#define ABORT_FREEZING 1 + +#define FREEZER_KERNEL_THREADS 0 +#define FREEZER_ALL_THREADS 1 + +#ifdef CONFIG_PM 
+extern unsigned long freezer_state; + +#define test_freezer_state(bit) test_bit(bit, &freezer_state) +#define set_freezer_state(bit) set_bit(bit, &freezer_state) +#define clear_freezer_state(bit) clear_bit(bit, &freezer_state) + +#define freezer_is_on() (test_freezer_state(FREEZER_ON)) + +extern void do_freeze_process(struct notifier_block *nl); + +#else + +#define test_freezer_state(bit) (0) +#define set_freezer_state(bit) do { } while(0) +#define clear_freezer_state(bit) do { } while(0) + +#define freezer_is_on() (0) + +#endif diff -urN oldtree/include/linux/kernel.h newtree/include/linux/kernel.h --- oldtree/include/linux/kernel.h 2006-01-02 22:21:10.000000000 -0500 +++ newtree/include/linux/kernel.h 2006-02-13 14:51:54.087942640 -0500 @@ -103,6 +103,8 @@ __attribute__ ((format (printf, 2, 0))); extern int snprintf(char * buf, size_t size, const char * fmt, ...) __attribute__ ((format (printf, 3, 4))); +extern int snprintf_used(char *buffer, int buffer_size, + const char *fmt, ...); extern int vsnprintf(char *buf, size_t size, const char *fmt, va_list args) __attribute__ ((format (printf, 3, 0))); extern int scnprintf(char * buf, size_t size, const char * fmt, ...) diff -urN oldtree/include/linux/kthread.h newtree/include/linux/kthread.h --- oldtree/include/linux/kthread.h 2006-01-02 22:21:10.000000000 -0500 +++ newtree/include/linux/kthread.h 2006-02-13 14:51:54.087942640 -0500 @@ -23,10 +23,20 @@ * * Returns a task_struct or ERR_PTR(-ENOMEM). */ +struct task_struct *__kthread_create(int (*threadfn)(void *data), + void *data, + unsigned long freezer_flags, + const char namefmt[], + va_list * args); + struct task_struct *kthread_create(int (*threadfn)(void *data), void *data, const char namefmt[], ...); +struct task_struct *kthread_nofreeze_create(int (*threadfn)(void *data), + void *data, + const char namefmt[], ...); + /** * kthread_run: create and wake a thread. * @threadfn: the function to run until signal_pending(current). 
@@ -35,14 +45,15 @@ * * Description: Convenient wrapper for kthread_create() followed by * wake_up_process(). Returns the kthread, or ERR_PTR(-ENOMEM). */ -#define kthread_run(threadfn, data, namefmt, ...) \ -({ \ - struct task_struct *__k \ - = kthread_create(threadfn, data, namefmt, ## __VA_ARGS__); \ - if (!IS_ERR(__k)) \ - wake_up_process(__k); \ - __k; \ -}) + +extern struct task_struct * kthread_run(int (*threadfn)(void *data), + void *data, + const char namefmt[], ...); + +extern struct task_struct * kthread_nofreeze_run(int (*threadfn)(void *data), + void *data, + const char namefmt[], ...); + /** * kthread_bind: bind a just-created kthread to a cpu. diff -urN oldtree/include/linux/netlink.h newtree/include/linux/netlink.h --- oldtree/include/linux/netlink.h 2006-01-02 22:21:10.000000000 -0500 +++ newtree/include/linux/netlink.h 2006-02-13 14:51:54.087942640 -0500 @@ -21,6 +21,8 @@ #define NETLINK_DNRTMSG 14 /* DECnet routing messages */ #define NETLINK_KOBJECT_UEVENT 15 /* Kernel messages to userspace */ #define NETLINK_GENERIC 16 +#define NETLINK_SUSPEND2_USERUI 17 /* For suspend2's userui */ +#define NETLINK_SUSPEND2_USM 18 /* For suspend2's userui */ #define MAX_LINKS 32 diff -urN oldtree/include/linux/sched.h newtree/include/linux/sched.h --- oldtree/include/linux/sched.h 2006-01-02 22:21:10.000000000 -0500 +++ newtree/include/linux/sched.h 2006-02-13 14:51:54.088942488 -0500 @@ -34,6 +34,7 @@ #include #include #include +#include #include /* For AT_VECTOR_SIZE */ @@ -807,7 +808,10 @@ int (*notifier)(void *priv); void *notifier_data; sigset_t *notifier_mask; - + + /* todo list to be executed in the context of this thread */ + struct notifier_block *todo; + void *security; struct audit_context *audit_context; seccomp_t seccomp; @@ -898,7 +902,6 @@ #define PF_MEMALLOC 0x00000800 /* Allocating memory */ #define PF_FLUSHER 0x00001000 /* responsible for disk writeback */ #define PF_USED_MATH 0x00002000 /* if unset the fpu must be initialized before use */ 
-#define PF_FREEZE 0x00004000 /* this task is being frozen for suspend now */ #define PF_NOFREEZE 0x00008000 /* this thread should not be frozen */ #define PF_FROZEN 0x00010000 /* frozen for system suspend */ #define PF_FSTRANS 0x00020000 /* inside a filesystem transaction */ @@ -1385,79 +1388,37 @@ #endif -#ifdef CONFIG_PM /* - * Check if a process has been frozen + * Check if there is a todo list request */ -static inline int frozen(struct task_struct *p) +static inline int todo_list_active(void) { - return p->flags & PF_FROZEN; + return current->todo != NULL; } -/* - * Check if there is a request to freeze a process - */ -static inline int freezing(struct task_struct *p) +static inline void run_todo_list(void) { - return p->flags & PF_FREEZE; + notifier_call_chain(&current->todo, 0, current); } -/* - * Request that a process be frozen - * FIXME: SMP problem. We may not modify other process' flags! - */ -static inline void freeze(struct task_struct *p) +static inline int try_todo_list(void) { - p->flags |= PF_FREEZE; -} - -/* - * Wake up a frozen process - */ -static inline int thaw_process(struct task_struct *p) -{ - if (frozen(p)) { - p->flags &= ~PF_FROZEN; - wake_up_process(p); - return 1; - } - return 0; -} - -/* - * freezing is complete, mark process as frozen - */ -static inline void frozen_process(struct task_struct *p) -{ - p->flags = (p->flags & ~PF_FREEZE) | PF_FROZEN; -} - -extern void refrigerator(void); -extern int freeze_processes(void); -extern void thaw_processes(void); - -static inline int try_to_freeze(void) -{ - if (freezing(current)) { - refrigerator(); + if (todo_list_active()) { + run_todo_list(); return 1; } else return 0; } -#else -static inline int frozen(struct task_struct *p) { return 0; } -static inline int freezing(struct task_struct *p) { return 0; } -static inline void freeze(struct task_struct *p) { BUG(); } -static inline int thaw_process(struct task_struct *p) { return 1; } -static inline void frozen_process(struct task_struct *p) {
BUG(); } - -static inline void refrigerator(void) {} -static inline int freeze_processes(void) { BUG(); return 0; } -static inline void thaw_processes(void) {} -static inline int try_to_freeze(void) { return 0; } +/* + * Compatibility definitions to use the suspend checkpoints for the task todo + * list. These may be removed once all uses of try_to_free, refrigerator and + * freezing have been removed. + */ +#define try_to_freeze try_todo_list +#define refrigerator run_todo_list +#define freezing(p) todo_list_active() -#endif /* CONFIG_PM */ #endif /* __KERNEL__ */ #endif diff -urN oldtree/include/linux/suspend.h newtree/include/linux/suspend.h --- oldtree/include/linux/suspend.h 2006-01-02 22:21:10.000000000 -0500 +++ newtree/include/linux/suspend.h 2006-02-13 14:51:54.089942336 -0500 @@ -9,6 +9,7 @@ #include #include #include +#include /* page backup entry */ typedef struct pbe { @@ -50,14 +51,20 @@ extern int pm_prepare_console(void); extern void pm_restore_console(void); +extern int freeze_processes(void); +extern void thaw_processes(int which_threads); #else static inline int software_suspend(void) { printk("Warning: fake suspend called\n"); return -EPERM; } +static inline int freeze_processes(void) { return 0; } +static inline void thaw_processes(int which_threads) { } #endif +extern char resume2_file[256]; + #ifdef CONFIG_SUSPEND_SMP extern void disable_nonboot_cpus(void); extern void enable_nonboot_cpus(void); @@ -69,8 +76,6 @@ void save_processor_state(void); void restore_processor_state(void); struct saved_context; -void __save_processor_state(struct saved_context *ctxt); -void __restore_processor_state(struct saved_context *ctxt); unsigned long get_safe_page(gfp_t gfp_mask); /* diff -urN oldtree/include/linux/suspend2.h newtree/include/linux/suspend2.h --- oldtree/include/linux/suspend2.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/include/linux/suspend2.h 2006-02-13 14:51:54.089942336 -0500 @@ -0,0 +1,231 @@ +#ifndef _LINUX_SUSPEND2_H +#define 
_LINUX_SUSPEND2_H + +#include +#include +#ifdef CONFIG_ACPI +#include +#include +#endif + +/* arch/i386/mm/init.c */ +extern char __nosave_begin, __nosave_end; + +extern char __nosavedata swsusp_pg_dir[PAGE_SIZE] + __attribute__ ((aligned (PAGE_SIZE))); + +#define SECTOR_SIZE 512 + +/* kernel/power/process.c */ + +/* kernel/power/main.c */ +extern unsigned long suspend_result; + +/* kernel/power/process.c */ +extern unsigned long suspend_debug_state; + +/* arch/i386/power/suspend2.c */ +extern unsigned long suspend_action; +extern int suspend_io_time[2][2]; + +extern dyn_pageflags_t pageset1_map; +extern dyn_pageflags_t pageset1_copy_map; + +#ifdef CONFIG_PM_DEBUG +#define test_debug_state(bit) (test_bit(bit, &suspend_debug_state)) +#else +#define test_debug_state(bit) (0) +#endif + +#define test_result_state(bit) (test_bit(bit, &suspend_result)) + +/* + * First status register - this is suspend's return code. + * + * All the rest are in kernel/power/suspend2_common.h + */ +#define SUSPEND_ABORTED 0 + +/* Second status register - ditto */ +#define SUSPEND_RETRY_RESUME 0 + +/* Debug sections - if debugging compiled in */ +enum { + SUSPEND_ANY_SECTION, + SUSPEND_FREEZER, + SUSPEND_EAT_MEMORY, + SUSPEND_PAGESETS, + SUSPEND_IO, + SUSPEND_BMAP, + SUSPEND_HEADER, + SUSPEND_WRITER, + SUSPEND_MEMORY, + SUSPEND_EXTENTS, + SUSPEND_SPINLOCKS, + SUSPEND_MEM_POOL, + SUSPEND_RANGE_PARANOIA, + SUSPEND_NOSAVE, + SUSPEND_INTEGRITY +}; + +/* debugging levels. 
*/ +#define SUSPEND_STATUS 0 +#define SUSPEND_ERROR 2 +#define SUSPEND_LOW 3 +#define SUSPEND_MEDIUM 4 +#define SUSPEND_HIGH 5 +#define SUSPEND_VERBOSE 6 + +/* second status register */ +enum { + SUSPEND_REBOOT, + SUSPEND_PAUSE, + SUSPEND_SLOW, + SUSPEND_NOPAGESET2, + SUSPEND_LOGALL, + SUSPEND_CAN_CANCEL, + SUSPEND_KEEP_IMAGE, + SUSPEND_FREEZER_TEST, + SUSPEND_FREEZER_TEST_SHOWALL, + SUSPEND_SINGLESTEP, + SUSPEND_PAUSE_NEAR_PAGESET_END, + SUSPEND_USE_ACPI_S4, + SUSPEND_TEST_FILTER_SPEED, + SUSPEND_FREEZE_TIMERS, + SUSPEND_DISABLE_SYSDEV_SUPPORT, + SUSPEND_VGA_POST, + SUSPEND_TEST_BIO, + SUSPEND_NO_PAGESET2, +}; + +#ifdef CONFIG_SUSPEND2 +#define test_action_state(bit) (test_bit(bit, &suspend_action)) +#define set_action_state(bit) (test_and_set_bit(bit, &suspend_action)) +#define clear_action_state(bit) (test_and_clear_bit(bit, &suspend_action)) +#else +#define test_action_state(bit) (0) +#endif + +extern void __suspend_message(unsigned long section, unsigned long level, int log_normally, + const char *fmt, ...); + +#ifdef CONFIG_PM_DEBUG +#define suspend_message(sn, lev, log, fmt, a...) \ +do { \ + if (test_debug_state(sn)) \ + __suspend_message(sn, lev, log, fmt, ##a); \ +} while(0) +#else /* CONFIG_PM_DEBUG */ +#define suspend_message(sn, lev, log, fmt, a...) 
\ +do { \ + if (lev == 0) \ + __suspend_message(sn, lev, log, fmt, ##a); \ +} while(0) +#endif /* CONFIG_PM_DEBUG */ + +/* Suspend 2 */ + +enum { + SUSPEND_DISABLED, + SUSPEND_RUNNING, + SUSPEND_RESUME_DEVICE_OK, + SUSPEND_NORESUME_SPECIFIED, + SUSPEND_COMMANDLINE_ERROR, + SUSPEND_IGNORE_IMAGE, + SUSPEND_SANITY_CHECK_PROMPT, + SUSPEND_FREEZER_ON, + SUSPEND_BLOCK_PAGE_ALLOCATIONS, + SUSPEND_USE_MEMORY_POOL, + SUSPEND_STAGE2_CONTINUE, + SUSPEND_FREEZE_SMP, + SUSPEND_PAGESET2_NOT_LOADED, + SUSPEND_CONTINUE_REQ, + SUSPEND_RESUMED_BEFORE, + SUSPEND_RUNNING_INITRD, + SUSPEND_RESUME_NOT_DONE, + SUSPEND_BOOT_TIME, + SUSPEND_NOW_RESUMING, + SUSPEND_SLAB_ALLOC_FALLBACK, + SUSPEND_IGNORE_LOGLEVEL, + SUSPEND_TIMER_FREEZER_ON, + SUSPEND_ACT_USED, + SUSPEND_DBG_USED, + SUSPEND_LVL_USED, + SUSPEND_TRYING_TO_RESUME, + SUSPEND_FORK_COPYBACK_THREAD, + SUSPEND_TRY_RESUME_RD, + SUSPEND_IGNORE_ROOTFS, +}; + +#define test_and_set_suspend_state(bit) \ + (test_and_set_bit(bit, &software_suspend_state)) + +#define get_suspend_state() (software_suspend_state) +#define restore_suspend_state(saved_state) \ + do { software_suspend_state = saved_state; } while(0) + +/* --------------------------------------------------------------------- */ +#ifdef CONFIG_SUSPEND2 + +/* Used in init dir files */ +extern unsigned long software_suspend_state; + +extern void suspend2_try_resume(void); +extern int suspend_early_boot_message + (int can_erase_image, int default_answer, char *warning_reason, ...); +extern void suspend_handle_keypress(unsigned int keycode, int source); +extern unsigned long suspend2_update_status (unsigned long value, unsigned long maximum, + const char *fmt, ...); +extern void suspend2_prepare_status (int clearbar, const char *fmt, ...); + +#define test_suspend_state(bit) \ + (test_bit(bit, &software_suspend_state)) + +#define clear_suspend_state(bit) \ + (clear_bit(bit, &software_suspend_state)) + +#define set_suspend_state(bit) \ + (set_bit(bit, &software_suspend_state)) + +extern 
inline void suspend2_copyback_low(void); +extern inline void suspend2_copyback_high(void); + +extern void suspend2_try_suspend(void); + +/* --------------------------------------------------------------------- */ +#else +/* --------------------------------------------------------------------- */ + +#define software_suspend_state (0) +#define clear_suspend_state(bit) do { } while (0) +#define test_suspend_state(bit) (0) +#define set_suspend_state(bit) do { } while(0) + +#define suspend2_try_resume() do { } while(0) +static inline int suspend_early_boot_message(int a, int b, char *c, ...) { return 0; } +#define suspend_handle_keypress(a, b) do { } while(0) +static inline unsigned long suspend2_update_status(unsigned long value, unsigned long maximum, + const char *fmt, ...) +{ + return maximum; +} +#define suspend2_prepare_status(a, ...) do { } while(0) + +#endif /* CONFIG_SUSPEND2 */ + +#if defined(CONFIG_SUSPEND2) && defined(CONFIG_ACPI) +static inline int may_try_suspend2(u32 state) +{ + if (state == ACPI_STATE_S4) { + suspend2_try_suspend(); + return 1; + } + return 0; +} +#else +static inline int may_try_suspend2(u32 state) +{ + return 0; +} +#endif +#endif /* _LINUX_SUSPEND2_H */ diff -urN oldtree/include/linux/workqueue.h newtree/include/linux/workqueue.h --- oldtree/include/linux/workqueue.h 2006-01-02 22:21:10.000000000 -0500 +++ newtree/include/linux/workqueue.h 2006-02-13 14:51:54.089942336 -0500 @@ -51,9 +51,12 @@ } while (0) extern struct workqueue_struct *__create_workqueue(const char *name, - int singlethread); -#define create_workqueue(name) __create_workqueue((name), 0) -#define create_singlethread_workqueue(name) __create_workqueue((name), 1) + int singlethread, + unsigned long freezer_flag); +#define create_workqueue(name) __create_workqueue((name), 0, 0) +#define create_nofreeze_workqueue(name) __create_workqueue((name), 0, PF_NOFREEZE) +#define create_singlethread_workqueue(name) __create_workqueue((name), 1, 0) +#define 
create_nofreeze_singlethread_workqueue(name) __create_workqueue((name), 1, PF_NOFREEZE) extern void destroy_workqueue(struct workqueue_struct *wq); diff -urN oldtree/init/do_mounts.c newtree/init/do_mounts.c --- oldtree/init/do_mounts.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/init/do_mounts.c 2006-02-13 14:51:54.090942184 -0500 @@ -139,11 +139,16 @@ char s[32]; char *p; dev_t res = 0; - int part; + int part, mount_result; #ifdef CONFIG_SYSFS int mkdir_err = sys_mkdir("/sys", 0700); - if (sys_mount("sysfs", "/sys", "sysfs", 0, NULL) < 0) + /* + * When changing resume2 parameter for Software Suspend, sysfs may + * already be mounted. + */ + mount_result = sys_mount("sysfs", "/sys", "sysfs", 0, NULL); + if (mount_result < 0 && mount_result != -EBUSY) goto out; #endif @@ -195,7 +200,8 @@ res = try_name(s, part); done: #ifdef CONFIG_SYSFS - sys_umount("/sys", 0); + if (mount_result >= 0) + sys_umount("/sys", 0); out: if (!mkdir_err) sys_rmdir("/sys"); @@ -412,9 +418,25 @@ is_floppy = MAJOR(ROOT_DEV) == FLOPPY_MAJOR; + /* Suspend2: + * By this point, suspend_early_init has been called to initialise our + * proc interface. If modules are built in, they have registered (all + * of the above via late_initcalls). + * + * We have not yet looked to see if an image exists, however. If we + * have an initrd, it is expected that the user will have set it up + * to echo > /proc/suspend2/do_resume and thus initiate any + * resume. If they don't do that, we do it immediately after the initrd + * is finished (major issues if they mount filesystems rw from the + * initrd! - they are warned). If there's no usable initrd, we do our + * check next.
+ */ if (initrd_load()) goto out; + if (test_suspend_state(SUSPEND_RESUME_NOT_DONE)) + suspend2_try_resume(); + if (is_floppy && rd_doload && rd_load_disk(0)) ROOT_DEV = Root_RAM0; diff -urN oldtree/init/do_mounts_initrd.c newtree/init/do_mounts_initrd.c --- oldtree/init/do_mounts_initrd.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/init/do_mounts_initrd.c 2006-02-13 14:51:54.090942184 -0500 @@ -7,6 +7,7 @@ #include #include #include +#include #include "do_mounts.h" @@ -58,10 +59,16 @@ pid = kernel_thread(do_linuxrc, "/linuxrc", SIGCHLD); if (pid > 0) { - while (pid != sys_wait4(-1, NULL, 0, NULL)) + while (pid != sys_wait4(-1, NULL, 0, NULL)) { yield(); + try_to_freeze(); + } } + if (test_suspend_state(SUSPEND_RESUME_NOT_DONE)) + printk(KERN_ERR "Suspend2: Initrd lacks echo > /proc/suspend2/do_resume.\n"); + clear_suspend_state(SUSPEND_BOOT_TIME); + /* move initrd to rootfs' /old */ sys_fchdir(old_fd); sys_mount("/", ".", NULL, MS_MOVE, NULL); diff -urN oldtree/init/main.c newtree/init/main.c --- oldtree/init/main.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/init/main.c 2006-02-13 14:51:54.090942184 -0500 @@ -694,7 +694,9 @@ /* * check if there is an early userspace init. If yes, let it do all - * the work + * the work. For suspend2, we assume that it will do the right thing + * with regard to trying to resume at the right place. When that + * happens, the BOOT_TIME flag will be cleared. 
*/ if (!ramdisk_execute_command) diff -urN oldtree/kernel/audit.c newtree/kernel/audit.c --- oldtree/kernel/audit.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/kernel/audit.c 2006-02-13 14:51:54.091942032 -0500 @@ -288,6 +288,9 @@ } } else { DECLARE_WAITQUEUE(wait, current); + + try_todo_list(); + set_current_state(TASK_INTERRUPTIBLE); add_wait_queue(&kauditd_wait, &wait); diff -urN oldtree/kernel/fork.c newtree/kernel/fork.c --- oldtree/kernel/fork.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/kernel/fork.c 2006-02-13 14:51:54.092941880 -0500 @@ -34,6 +34,7 @@ #include #include #include +#include #include #include #include @@ -165,7 +166,13 @@ if (!tsk) return NULL; - ti = alloc_thread_info(tsk); + if (test_suspend_state(SUSPEND_FORK_COPYBACK_THREAD)) { + extern void * suspend2_get_nonconflicting_pages(int); + ti = suspend2_get_nonconflicting_pages(get_order(THREAD_SIZE)); + printk("Starting a copyback thread %p\n", ti); + } else + ti = alloc_thread_info(tsk); + if (!ti) { free_task_struct(tsk); return NULL; @@ -942,6 +949,8 @@ clear_tsk_thread_flag(p, TIF_SIGPENDING); init_sigpending(&p->pending); + p->todo = NULL; + p->utime = cputime_zero; p->stime = cputime_zero; p->sched_time = 0; diff -urN oldtree/kernel/kmod.c newtree/kernel/kmod.c --- oldtree/kernel/kmod.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/kernel/kmod.c 2006-02-13 14:51:54.092941880 -0500 @@ -36,6 +36,7 @@ #include #include #include +#include #include extern int max_threads; @@ -249,6 +250,9 @@ if (!khelper_wq) return -EBUSY; + if (freezer_is_on()) + return 0; + if (path[0] == '\0') return 0; diff -urN oldtree/kernel/kthread.c newtree/kernel/kthread.c --- oldtree/kernel/kthread.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/kernel/kthread.c 2006-02-13 14:51:54.092941880 -0500 @@ -25,6 +25,7 @@ /* Information passed to kthread() from keventd. 
*/ int (*threadfn)(void *data); void *data; + unsigned long freezer_flags; struct completion started; /* Result passed back to kthread_create() from keventd. */ @@ -86,6 +87,10 @@ /* By default we can run anywhere, unlike keventd. */ set_cpus_allowed(current, CPU_MASK_ALL); + /* Set our freezer flags */ + current->flags &= ~PF_NOFREEZE; + current->flags |= (create->freezer_flags & PF_NOFREEZE); + /* OK, tell user we're spawned, wait for stop or wakeup */ __set_current_state(TASK_INTERRUPTIBLE); complete(&create->started); @@ -119,16 +124,18 @@ complete(&create->done); } -struct task_struct *kthread_create(int (*threadfn)(void *data), +struct task_struct *__kthread_create(int (*threadfn)(void *data), void *data, + unsigned long freezer_flags, const char namefmt[], - ...) + va_list * args) { struct kthread_create_info create; DECLARE_WORK(work, keventd_create_kthread, &create); create.threadfn = threadfn; create.data = data; + create.freezer_flags = freezer_flags; init_completion(&create.started); init_completion(&create.done); @@ -141,18 +148,89 @@ queue_work(helper_wq, &work); wait_for_completion(&create.done); } - if (!IS_ERR(create.result)) { - va_list args; - va_start(args, namefmt); + if (!IS_ERR(create.result)) vsnprintf(create.result->comm, sizeof(create.result->comm), - namefmt, args); - va_end(args); - } + namefmt, *args); return create.result; } + +struct task_struct *kthread_create(int (*threadfn)(void *data), + void *data, + const char namefmt[], ...) +{ + struct task_struct * result; + + va_list args; + va_start(args, namefmt); + result = __kthread_create(threadfn, data, 0, namefmt, &args); + va_end(args); + return result; +} + EXPORT_SYMBOL(kthread_create); +struct task_struct *kthread_nofreeze_create(int (*threadfn)(void *data), + void *data, + const char namefmt[], ...) 
+{ + struct task_struct * result; + + va_list args; + va_start(args, namefmt); + result = __kthread_create(threadfn, data, PF_NOFREEZE, namefmt, &args); + va_end(args); + return result; +} + +EXPORT_SYMBOL(kthread_nofreeze_create); + +/** + * kthread_run: create and wake a thread. + * @threadfn: the function to run until signal_pending(current). + * @data: data ptr for @threadfn. + * @namefmt: printf-style name for the thread. + * + * Description: Convenient wrapper for kthread_create() followed by + * wake_up_process(). Returns the kthread, or ERR_PTR(-ENOMEM). + **/ +struct task_struct * kthread_run(int (*threadfn)(void *data), + void *data, + const char namefmt[], ...) +{ + struct task_struct *__k; + va_list args; + + va_start(args, namefmt); + __k = __kthread_create(threadfn, data, 0, namefmt, &args); + va_end(args); + + if(!IS_ERR(__k)) + wake_up_process(__k); + + return __k; +} + +EXPORT_SYMBOL(kthread_run); + +struct task_struct * kthread_nofreeze_run(int (*threadfn)(void *data), + void *data, + const char namefmt[], ...) 
+{ + struct task_struct *__k; + va_list args; + + va_start(args, namefmt); + __k = __kthread_create(threadfn, data, PF_NOFREEZE, namefmt, &args); + va_end(args); + + if(!IS_ERR(__k)) + wake_up_process(__k); + + return __k; +} +EXPORT_SYMBOL(kthread_nofreeze_run); + void kthread_bind(struct task_struct *k, unsigned int cpu) { BUG_ON(k->state != TASK_INTERRUPTIBLE); diff -urN oldtree/kernel/power/Kconfig newtree/kernel/power/Kconfig --- oldtree/kernel/power/Kconfig 2006-01-02 22:21:10.000000000 -0500 +++ newtree/kernel/power/Kconfig 2006-02-13 14:51:54.093941728 -0500 @@ -98,3 +98,75 @@ bool depends on HOTPLUG_CPU && X86 && PM default y + +config SUSPEND_DEBUG_PAGEALLOC + bool + depends on DEBUG_PAGEALLOC && (SOFTWARE_SUSPEND || SUSPEND2) + default y + +config SUSPEND2_CRYPTO + bool + depends on SUSPEND2 && CRYPTO + default y + +menuconfig SUSPEND2 + bool "Suspend2" + select DYN_PAGEFLAGS + depends on PM + select HOTPLUG_CPU if SMP + ---help--- + Suspend2 is the 'new and improved' suspend support. + + See the Suspend2 home page (suspend2.net) + for FAQs, HOWTOs and other documentation. + + comment 'Image Storage (you need at least one writer)' + depends on SUSPEND2 + + config SUSPEND2_FILEWRITER + bool ' File Writer' + depends on SUSPEND2 + ---help--- + This option enables support for storing an image in a + simple file. This should be possible, but we're still + testing it. + + config SUSPEND2_SWAPWRITER + bool ' Swap Writer' + depends on SWAP && SUSPEND2 + ---help--- + This option enables support for storing an image in your + swap space. + + comment 'General Options' + depends on SUSPEND2 + + config SUSPEND2_DEFAULT_RESUME2 + string ' Default resume device name' + depends on SUSPEND2 + ---help--- + You normally need to add a resume2= parameter to your lilo.conf or + equivalent. With this option properly set, the kernel has a value + to default. No damage will be done if the value is invalid. 
+ + config SUSPEND2_CHECKSUMMING + bool ' Checksum images - developer option (SLOW!)' + depends on PM_DEBUG && SUSPEND2 + ---help--- + This option implements checksumming of images. It is not designed + for everyone to use, but as a development tool. + + config SUSPEND2_KEEP_IMAGE + bool ' Allow Keep Image Mode' + depends on SUSPEND2 + ---help--- + This option allows you to keep an image and reuse it. It is intended + __ONLY__ for use with systems where all filesystems are mounted read- + only (kiosks, for example). To use it, compile this option in and boot + normally. Set the KEEP_IMAGE flag in /proc/suspend2 and suspend. + When you resume, the image will not be removed. You will be unable to turn + off swap partitions (assuming you are using the swap writer), but future + suspends simply do a power-down. The image can be updated using the + kernel command line parameter suspend_act= to turn off the keep image + bit. Keep image mode is a little less user friendly on purpose - it + should not be used without thought! diff -urN oldtree/kernel/power/Makefile newtree/kernel/power/Makefile --- oldtree/kernel/power/Makefile 2006-01-02 22:21:10.000000000 -0500 +++ newtree/kernel/power/Makefile 2006-02-13 14:51:54.093941728 -0500 @@ -3,10 +3,33 @@ EXTRA_CFLAGS += -DDEBUG endif +CFLAGS_atomic_copy.o := -O0 + obj-y := main.o process.o console.o obj-$(CONFIG_PM_LEGACY) += pm.o -obj-$(CONFIG_SOFTWARE_SUSPEND) += swsusp.o disk.o snapshot.o obj-$(CONFIG_SUSPEND_SMP) += smp.o +# Order is important for compression and encryption - we +# compress before encrypting. 
+ +suspend_core-objs := io.o pagedir.o prepare_image.o \ + extent.o suspend.o plugins.o \ + pageflags.o ui.o proc.o \ + power_off.o atomic_copy.o debug_pagealloc.o \ + netlink.o + +#ifdef CONFIG_NET +suspend_core-objs += storage.o +#endif +obj-$(CONFIG_SUSPEND2) += suspend_core.o +obj-$(CONFIG_SUSPEND2_CRYPTO) += compression.o encryption.o + +obj-$(CONFIG_SUSPEND2_SWAPWRITER) += suspend_block_io.o suspend_swap.o +obj-$(CONFIG_SUSPEND2_FILEWRITER) += suspend_block_io.o suspend_file.o + +obj-$(CONFIG_SUSPEND2_CHECKSUMMING) += suspend_checksums.o + +obj-$(CONFIG_SOFTWARE_SUSPEND) += swsusp.o disk.o snapshot.o + obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o diff -urN oldtree/kernel/power/atomic_copy.c newtree/kernel/power/atomic_copy.c --- oldtree/kernel/power/atomic_copy.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/atomic_copy.c 2006-02-13 14:51:54.094941576 -0500 @@ -0,0 +1,473 @@ +/* + */ + +#include +#include +#include +#include +#include +#include +#include +#include "suspend2_common.h" +#include "io.h" +#include "power_off.h" +#include "version.h" +#include "ui.h" +#include "plugins.h" +#include "atomic_copy.h" +#include "suspend2.h" +#include "checksum.h" +#include "pageflags.h" +#include "debug_pagealloc.h" +#include "storage.h" + +volatile static int state1 __nosavedata = 0; +volatile static int state2 __nosavedata = 0; +volatile static int state3 __nosavedata = 0; +volatile static int io_speed_save[2][2] __nosavedata; + +static dyn_pageflags_t __nosavedata origmap; +static dyn_pageflags_t __nosavedata copymap; +static unsigned long __nosavedata origoffset; +static unsigned long __nosavedata copyoffset; +static int __nosavedata loop; +static __nosavedata int o_zone_num, c_zone_num; +static __nosavedata int is_resuming; + +__nosavedata char resume_commandline[COMMAND_LINE_SIZE]; + +static atomic_t atomic_copy_hold; +static atomic_t restore_thread_ready; + +suspend2_saved_context_t suspend2_saved_context; /* temporary storage */ +cpumask_t 
saved_affinity[NR_IRQS]; + +struct zone_data { + unsigned long start_pfn; + unsigned long end_pfn; + int is_highmem; +}; + +static __nosavedata struct zone_data *zone_nosave; +static __nosavedata int num_zones; + +/* + * Zone information might be overwritten during the copy back, + * so we copy the fields we need to a non-conflicting page and + * use it. + */ +static void init_nosave_zone_table(void) +{ + struct zone *zone; + + zone_nosave = (struct zone_data *) suspend2_get_nonconflicting_pages(0); + + BUG_ON(!zone_nosave); + + for_each_zone(zone) { + if (zone->spanned_pages) { + zone_nosave[num_zones].start_pfn = zone->zone_start_pfn; + zone_nosave[num_zones].end_pfn = zone->zone_start_pfn + + zone->spanned_pages - 1; + zone_nosave[num_zones].is_highmem = is_highmem(zone); + } + num_zones++; + } +} + +/* For Suspend2, where this all has to be inlined */ +static unsigned long __get_next_bit_on(dyn_pageflags_t bitmap, int *zone_num, long counter) +{ + unsigned long *ul_ptr = NULL; + int reset_ul_ptr = 1; + BUG_ON(counter == max_pfn); + + if (counter == -1) { + *zone_num = 0; + + /* + * Test the end because the start can validly + * be zero. 
+ */ + while (!zone_nosave[*zone_num].end_pfn) + (*zone_num)++; + counter = zone_nosave[*zone_num].start_pfn - 1; + } + + do { + counter++; + if (counter > zone_nosave[*zone_num].end_pfn) { + (*zone_num)++; + while (*zone_num < num_zones && !zone_nosave[*zone_num].end_pfn) + (*zone_num)++; + + if (*zone_num == num_zones) + return max_pfn; + counter = zone_nosave[*zone_num].start_pfn; + reset_ul_ptr = 1; + } else + if (!(counter & BIT_NUM_MASK)) + reset_ul_ptr = 1; + if (reset_ul_ptr) { + reset_ul_ptr = 0; + ul_ptr = PAGE_UL_PTR(bitmap, *zone_num, + (counter - zone_nosave[*zone_num].start_pfn)); + if (!*ul_ptr) { + counter += BIT_NUM_MASK - 1; + continue; + } + } + } while((counter < max_pfn) && !test_bit(PAGEBIT(counter), ul_ptr)); + return counter; +} + +/** + * copyback_prepare + * Functionality : Preparatory steps for copying the original kernel back. + * Called From : do_suspend2_lowlevel + **/ + +static void copyback_prepare(void) +{ + int loop; + + state1 = suspend_action; + state2 = suspend_debug_state; + state3 = console_loglevel; + for (loop = 0; loop < 4; loop++) + io_speed_save[loop/2][loop%2] = + suspend_io_time[loop/2][loop%2]; + + init_nosave_zone_table(); + + memcpy(resume_commandline, saved_command_line, COMMAND_LINE_SIZE); + + suspend2_map_atomic_copy_pages(); + + suspend2_deactivate_storage(1); + + /* Arch specific preparation */ + suspend2_arch_pre_copyback(); + + device_suspend(PMSG_FREEZE); + local_irq_disable(); /* irqs might have been re-enabled on us by buggy drivers */ + + device_power_down(PMSG_FREEZE); + + barrier(); + mb(); +} + +/* + * copyback_post + * Functionality : Steps taken after copying back the original kernel at + * resume. + * Key Assumptions : Will be able to read back secondary pagedir (if + * applicable). 
+ * Called From : do_suspend2_lowlevel + */ + +static void copyback_post(void) +{ + int loop; + + /* Arch specific code */ + suspend2_arch_post_copyback(); + + suspend_action = state1; + suspend_debug_state = state2; + console_loglevel = state3; + + for (loop = 0; loop < 4; loop++) + suspend_io_time[loop/2][loop%2] = + io_speed_save[loop/2][loop%2]; + + set_suspend_state(SUSPEND_NOW_RESUMING); + set_suspend_state(SUSPEND_PAGESET2_NOT_LOADED); + + suspend2_unmap_atomic_copy_pages(); + + local_irq_disable(); + device_power_up(); + local_irq_enable(); + + device_resume(); + + if (pm_ops && pm_ops->finish && suspend2_powerdown_method > 3) + pm_ops->finish(suspend2_powerdown_method); + + if (suspend2_activate_storage(1)) + panic("Failed to reactivate our storage."); + + userui_redraw(); + + check_shift_keys(1, "About to reload secondary pagedir."); + + read_pageset2(0); + clear_suspend_state(SUSPEND_PAGESET2_NOT_LOADED); + + suspend2_prepare_status(DONT_CLEAR_BAR, "Cleaning up..."); +} + + +/* + * suspend2_pre_copy + * Functionality : Steps taken prior to saving CPU state and the image + * itself. + * Called From : do_suspend2_lowlevel + */ + +static void suspend2_pre_copy(void) +{ + suspend2_arch_pre_copy(); + + device_suspend(PMSG_FREEZE); + + mb(); + barrier(); + + local_irq_disable(); + + device_power_down(PMSG_FREEZE); +} + +/* + * suspend2_post_copy + * Functionality : Steps taken after saving CPU state to save the + * image and powerdown/reboot or recover on failure. + * Key Assumptions : save_image returns zero on success; otherwise we need to + * clean up and exit. The state on exiting this routine + * should be essentially the same as if we have suspended, + * resumed and reached the end of copyback_post. 
+ * Called From : do_suspend2_lowlevel + */ +extern void suspend_power_down(void); + +static void suspend2_post_copy(void) +{ + suspend2_arch_post_copy(); + + if (!save_image_part1()) { + int temp_result; + + suspend_power_down(); + + temp_result = read_pageset2(1); + + /* If that failed, we're sunk. Panic! */ + if (temp_result) + panic("Attempt to reload pagedir 2 failed. Try rebooting."); + } + + if (!test_result_state(SUSPEND_ABORT_REQUESTED) && + !test_action_state(SUSPEND_TEST_FILTER_SPEED) && + !test_action_state(SUSPEND_TEST_BIO) && + suspend2_powerdown_method != PM_SUSPEND_MEM) + printk(KERN_EMERG name_suspend + "Suspend failed, trying to recover...\n"); + barrier(); + mb(); +} + +/* + * copyback_low + */ + +static void copyback_low(void) +{ + unsigned long *origpage; + unsigned long *copypage; + + o_zone_num = 0; + c_zone_num = 0; + + origmap = pageset1_map; + copymap = pageset1_copy_map; + + origoffset = __get_next_bit_on(origmap, &o_zone_num, -1); + copyoffset = __get_next_bit_on(copymap, &c_zone_num, -1); + + while (origoffset < max_pfn) { + if (!zone_nosave[o_zone_num].is_highmem) { + origpage = (unsigned long *) __va(origoffset << PAGE_SHIFT); + copypage = (unsigned long *) __va(copyoffset << PAGE_SHIFT); + + loop = (PAGE_SIZE / sizeof(unsigned long)) - 1; + + while (loop >= 0) { + *(origpage + loop) = *(copypage + loop); + loop--; + } + } + + origoffset = __get_next_bit_on(origmap, &o_zone_num, origoffset); + copyoffset = __get_next_bit_on(copymap, &c_zone_num, copyoffset); + } +} + +/* + * copyback_high + */ +static void copyback_high(void) +{ + unsigned long *origpage; + unsigned long *copypage; + + origoffset = get_next_bit_on(origmap, -1); + copyoffset = get_next_bit_on(copymap, -1); + + while (origoffset < max_pfn) { + if (PageHighMem(pfn_to_page(origoffset))) { + origpage = (unsigned long *) kmap_atomic(pfn_to_page(origoffset), KM_USER1); + copypage = (unsigned long *) __va(copyoffset << PAGE_SHIFT); + + memcpy(origpage, copypage, PAGE_SIZE); + 
+ + kunmap_atomic(origpage, KM_USER1); + } + + origoffset = get_next_bit_on(origmap, origoffset); + copyoffset = get_next_bit_on(copymap, copyoffset); + } +} + +void do_suspend2_lowlevel(int resume) +{ + is_resuming = resume; + + if (resume) { + suspend2_arch_save_processor_context(); /* We need to capture registers and memory at "same time" */ + + copyback_prepare(); + + copyback_low(); /* 0 = use logical addresses */ + + suspend2_arch_restore_processor_context(); + } else { + suspend2_pre_copy(); + + suspend2_arch_save_processor_context(); + } + + if (is_resuming) { + suspend2_arch_flush_caches(); + + /* Now we are running with our old stack, and with registers copied + * from suspend time. Let's copy back those remaining highmem pages. */ + copyback_high(); + suspend2_arch_flush_caches(); + + touch_softlockup_watchdog(); + + suspend2_checksum_print_differences(); + + copyback_post(); + + } else { + suspend2_post_copy(); /* If everything goes okay, this function does not return */ + } +} + +/* suspend2_copy_pageset1 + * + * Description: Make the atomic copy of pageset1. We can't use copy_page (as we + * once did) because we can't be sure what side effects it has. On + * my old Duron, with 3DNOW, kernel_fpu_begin increments preempt + * count, making our preempt count at resume time 4 instead of 3. + * + * We don't want to call kmap_atomic unconditionally because it has + * the side effect of incrementing the preempt count, which will + * leave it one too high post resume (the page containing the + * preempt count will be copied after it's incremented). This is + * essentially the same problem. 
+ */ + +void suspend2_copy_pageset1(void) +{ + unsigned long i, source_index, dest_index; + + source_index = get_next_bit_on(pageset1_map, -1); + dest_index = get_next_bit_on(pageset1_copy_map, -1); + + for (i = 0; i < pagedir1.pageset_size; i++) { + unsigned long *origvirt, *copyvirt; + struct page *origpage; + int loop = (PAGE_SIZE / sizeof(unsigned long)) - 1; + + origpage = pfn_to_page(source_index); + + copyvirt = (unsigned long *) page_address(pfn_to_page(dest_index)); + + if (PageHighMem(origpage)) + origvirt = kmap_atomic(origpage, KM_USER1); + else + origvirt = page_address(origpage); + + while (loop >= 0) { + *(copyvirt + loop) = *(origvirt + loop); + loop--; + } + + if (PageHighMem(origpage)) + kunmap_atomic(origvirt, KM_USER1); + + source_index = get_next_bit_on(pageset1_map, source_index); + dest_index = get_next_bit_on(pageset1_copy_map, dest_index); + } +} + +int __suspend_atomic_restore(void *data) +{ + struct page *my_thread_info = virt_to_page(current->thread_info); + + BUG_ON(PagePageset1(my_thread_info)); + BUG_ON(THREAD_SIZE > PAGE_SIZE && PagePageset1(++my_thread_info)); + + atomic_set(&restore_thread_ready, 1); + + while (atomic_read(&atomic_copy_hold)) + yield(); + + suspend2_prepare_status(DONT_CLEAR_BAR, "Copying original kernel back"); + + /* + * If you're hitting this BUG_ON, you have a process that's + * not freezing which is started prior to this. 
+ */ + BUG_ON(freeze_processes()); + + do_suspend2_lowlevel(1); + + printk("Returned from do_suspend2_lowlevel when resuming?!"); + BUG(); + + return 0; +} + +void suspend_atomic_restore(void) +{ + struct task_struct *work_thread; + + disable_nonboot_cpus(); + + yield(); + + set_suspend_state(SUSPEND_FORK_COPYBACK_THREAD); + BUG_ON(atomic_read(&restore_thread_ready)); + + atomic_set(&atomic_copy_hold, 1); + + /* Now start the new thread */ + work_thread = kthread_run(__suspend_atomic_restore, 0, "kcopyback"); + BUG_ON(IS_ERR(work_thread)); + + while (!atomic_read(&restore_thread_ready)) + yield(); + + atomic_set(&atomic_copy_hold, 0); + + while(1) { + try_to_freeze(); + yield(); + } +} diff -urN oldtree/kernel/power/atomic_copy.h newtree/kernel/power/atomic_copy.h --- oldtree/kernel/power/atomic_copy.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/atomic_copy.h 2006-02-13 14:51:54.094941576 -0500 @@ -0,0 +1,4 @@ +extern inline void move_stack_to_nonconflicing_area(void); +extern int save_image_part1(void); +extern void suspend_atomic_restore(void); + diff -urN oldtree/kernel/power/block_io.h newtree/kernel/power/block_io.h --- oldtree/kernel/power/block_io.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/block_io.h 2006-02-13 14:51:54.094941576 -0500 @@ -0,0 +1,75 @@ +/* + * block_io.h + * + * Copyright 2004-2005 Nigel Cunningham + * + * Distributed under GPLv2. + * + * This file contains declarations for functions exported from + * block_io.c, which contains low level io functions. + */ + +#include +#include "extent.h" + +/* + * submit_params + * + * The structure we use for tracking submitted I/O. 
+ */ +struct submit_params { + swp_entry_t swap_address; + struct page *page; + struct block_device *dev; + sector_t block[MAX_BUF_PER_PAGE]; + int readahead_index; + struct submit_params *next; + int printme; +}; + +struct suspend2_bdev_info { + struct block_device *bdev; + dev_t dev_t; + int bmap_shift; + int blocks_per_page; +}; + +/* + * Our exported interface so the swapwriter and filewriter don't + * need these functions duplicated. + */ +struct suspend_bio_ops { + int (*submit_io) (int rw, + struct submit_params *submit_info, int syncio); + int (*bdev_page_io) (int rw, struct block_device *bdev, long pos, + struct page *page); + int (*rw_page) (int rw, struct page *page, int readahead_index, + int sync); + void (*wait_on_readahead) (int readahead_index); + void (*check_io_stats) (void); + void (*reset_io_stats) (void); + void (*finish_all_io) (void); + int (*prepare_readahead) (int index); + void (*cleanup_readahead) (int index); + struct page ** readahead_pages; + int (*readahead_ready) (int readahead_index); + int *need_extra_next; + int (*forward_one_page) (void); + void (*set_devinfo) (struct suspend2_bdev_info *info); + int (*read_init) (int stream_number); + int (*read_chunk) (struct page *buffer_page, int sync); + int (*read_cleanup) (void); + int (*write_init) (int stream_number); + int (*write_chunk) (struct page *buffer_page); + int (*write_cleanup) (void); + int (*read_header_chunk) (char *buffer, int buffer_size); + int (*write_header_chunk) (char *buffer, int buffer_size); +}; + +extern struct suspend_bio_ops suspend_bio_ops; + +extern char *suspend_writer_buffer; +extern int suspend_writer_buffer_posn; +extern int suspend_read_fd; +extern struct extent_iterate_saved_state suspend_writer_posn_save[3]; +extern struct extent_iterate_state suspend_writer_posn; diff -urN oldtree/kernel/power/checksum.h newtree/kernel/power/checksum.h --- oldtree/kernel/power/checksum.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/checksum.h 
2006-02-13 14:51:54.094941576 -0500 @@ -0,0 +1,11 @@ +#ifdef CONFIG_SUSPEND2_CHECKSUMMING +extern void suspend2_verify_checksums(void); +extern void suspend2_checksum_calculate_checksums(void); +extern void suspend2_checksum_print_differences(void); +extern int suspend2_allocate_checksum_pages(void); +#else +static inline void suspend2_verify_checksums(void) { }; +static inline void suspend2_checksum_calculate_checksums(void) { }; +static inline void suspend2_checksum_print_differences(void) { }; +static inline int suspend2_allocate_checksum_pages(void) { return 0; }; +#endif diff -urN oldtree/kernel/power/compression.c newtree/kernel/power/compression.c --- oldtree/kernel/power/compression.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/compression.c 2006-02-13 14:51:54.129936256 -0500 @@ -0,0 +1,638 @@ +/* + * kernel/power/suspend2_core/compression.c + * + * Copyright (C) 2003-2005 Nigel Cunningham + * + * This file is released under the GPLv2. + * + * This file contains data compression routines for suspend, + * using LZH compression. + * + */ + +#include +#include +#include +#include +#include + +#include "suspend2.h" +#include "plugins.h" +#include "proc.h" +#include "suspend2_common.h" +#include "io.h" + +#define S2C_WRITE 0 +#define S2C_READ 1 + +static int s2_expected_compression = 0; + +static struct suspend_plugin_ops s2_compression_ops; +static struct suspend_plugin_ops *next_driver; + +static char s2_compressor_name[32]; +static struct crypto_tfm *s2_compressor_transform; + +static u8 *local_buffer = NULL; +static u8 *page_buffer = NULL; +static unsigned int bufofs; + +static int position = 0; + +/* ---- Local buffer management ---- */ + +/* allocate_local_buffer + * + * Description: Allocates a page of memory for buffering output. + * Returns: Int: Zero if successful, -ENOMEM otherwise. 
+ */ + +static int allocate_local_buffer(void) +{ + if (!local_buffer) { + local_buffer = (u8 *) get_zeroed_page(GFP_ATOMIC); + + if (!local_buffer) { + printk(KERN_ERR + "Failed to allocate the local buffer for " + "suspend2 compression driver.\n"); + return -ENOMEM; + } + } + + if (!page_buffer) { + page_buffer = (u8 *) get_zeroed_page(GFP_ATOMIC); + + if (!page_buffer) { + printk(KERN_ERR + "Failed to allocate the page buffer for " + "suspend2 compression driver.\n"); + return -ENOMEM; + } + } + + return 0; +} + +/* free_local_buffer + * + * Description: Frees memory allocated for buffering output. + */ + +static inline void free_local_buffer(void) +{ + if (local_buffer) + free_page((unsigned long) local_buffer); + + local_buffer = NULL; + + if (page_buffer) + free_page((unsigned long) page_buffer); + + page_buffer = NULL; +} + +/* suspend2_crypto_cleanup + * + * Description: Frees memory allocated for our labours. + */ + +static void suspend2_crypto_cleanup(void) +{ + if (s2_compressor_transform) { + crypto_free_tfm(s2_compressor_transform); + s2_compressor_transform = NULL; + } +} + +/* s2_compress_crypto_prepare + * + * Description: Prepare to do some work by allocating the compression transform. + * Returns: Int: Zero if successful, 1 otherwise. + */ + +static int s2_compress_crypto_prepare(int mode) +{ + if (!*s2_compressor_name) { + printk("Suspend2: Compression enabled but no compressor name set.\n"); + return 1; + } + + if (!(s2_compressor_transform = crypto_alloc_tfm(s2_compressor_name, 0))) { + printk("Suspend2: Failed to initialise the %s compression transform.\n", + s2_compressor_name); + return 1; + } + + return 0; +} + +/* ---- Exported functions ---- */ + +/* write_init() + * + * Description: Allocate buffers and prepare to compress data. + * Arguments: Stream_number: Ignored. + * Returns: Zero on success, -ENOMEM if unable to vmalloc. 
+ */ + +static int s2_compress_write_init(int stream_number) +{ + int result; + + next_driver = get_next_filter(&s2_compression_ops); + + if (!next_driver) { + printk("Compression Driver: Argh! No one wants my output!"); + return -ECHILD; + } + + if ((result = s2_compress_crypto_prepare(S2C_WRITE))) { + return result; + } + + if ((result = allocate_local_buffer())) + return result; + + /* Only reset the stats if starting to write an image */ + if (stream_number == 2) + bytes_in = bytes_out = 0; + + bufofs = 0; + + position = 0; + + return 0; +} + +/* s2_compress_write() + * + * Description: Helper function for write_chunk. Write the compressed data. + * Arguments: u8*: Output buffer to be written. + * unsigned int: Length of buffer. + * Return: int: Result to be passed back to caller. + */ + +static int s2_compress_write (u8 *buffer, unsigned int len) +{ + int ret; + + bytes_out += len; + + while (len + bufofs > PAGE_SIZE) { + unsigned int chunk = PAGE_SIZE - bufofs; + memcpy (local_buffer + bufofs, buffer, chunk); + buffer += chunk; + len -= chunk; + bufofs = 0; + if ((ret = next_driver->ops.filter.write_chunk(virt_to_page(local_buffer))) < 0) + return ret; + } + memcpy (local_buffer + bufofs, buffer, len); + bufofs += len; + return 0; +} + +/* s2_compress_write_chunk() + * + * Description: Compress a page of data, buffering output and passing on + * filled pages to the next plugin in the pipeline. + * Arguments: Buffer_page: Pointer to a buffer of size PAGE_SIZE, + * containing data to be compressed. + * Returns: 0 on success. Otherwise the error is that returned by later + * plugins, -ECHILD if we have a broken pipeline or -EIO if + * zlib errs. 
+ */ + +static int s2_compress_write_chunk(struct page *buffer_page) +{ + int ret; + unsigned int len; + u16 len_written; + char *buffer_start; + + if (!s2_compressor_transform) + return next_driver->ops.filter.write_chunk(buffer_page); + + buffer_start = kmap(buffer_page); + + bytes_in += PAGE_SIZE; + + len = PAGE_SIZE; + + ret = crypto_comp_compress(s2_compressor_transform, + buffer_start, PAGE_SIZE, + page_buffer, &len); + + if (ret) { + printk("Compression failed.\n"); + goto failure; + } + + len_written = (u16) len; + + if ((ret = s2_compress_write((u8 *)&len_written, 2)) >= 0) { + if ((ret = s2_compress_write((u8 *) &position, sizeof(position)))) + return -EIO; + if (len < PAGE_SIZE) { // some compression + position += len; + ret = s2_compress_write(page_buffer, len); + } else { + ret = s2_compress_write(buffer_start, PAGE_SIZE); + position += PAGE_SIZE; + } + } + position += 2 + sizeof(int); + + +failure: + kunmap(buffer_page); + return ret; +} + +/* write_cleanup() + * + * Description: Write unflushed data and free workspace. + * Returns: Result of writing last page. + */ + +static int s2_compress_write_cleanup(void) +{ + int ret = 0; + + if (s2_compressor_transform) + ret = next_driver->ops.filter.write_chunk(virt_to_page(local_buffer)); + + suspend2_crypto_cleanup(); + free_local_buffer(); + + return ret; +} + +/* read_init() + * + * Description: Prepare to read a new stream of data. + * Arguments: int: Section of image about to be read. + * Returns: int: Zero on success, error number otherwise. + */ + +static int s2_compress_read_init(int stream_number) +{ + int result; + + next_driver = get_next_filter(&s2_compression_ops); + + if (!next_driver) { + printk("Compression Driver: Argh! 
No one wants " + "to feed me data!"); + return -ECHILD; + } + + if ((result = s2_compress_crypto_prepare(S2C_READ))) + return result; + + if ((result = allocate_local_buffer())) + return result; + + bufofs = PAGE_SIZE; + + position = 0; + + return 0; +} + +/* s2_compress_read() + * + * Description: Read data into compression buffer. + * Arguments: u8 *: Address of the buffer. + * unsigned int: Length + * Returns: int: Result of reading the image chunk. + */ + +static int s2_compress_read (u8 *buffer, unsigned int len) +{ + int ret; + + while (len + bufofs > PAGE_SIZE) { + unsigned int chunk = PAGE_SIZE - bufofs; + memcpy(buffer, local_buffer + bufofs, chunk); + buffer += chunk; + len -= chunk; + bufofs = 0; + if ((ret = next_driver->ops.filter.read_chunk( + virt_to_page(local_buffer), SUSPEND_SYNC)) < 0) { + return ret; + } + } + memcpy (buffer, local_buffer + bufofs, len); + bufofs += len; + return 0; +} + +/* s2_compress_read_chunk() + * + * Description: Retrieve data from later plugins and decompress it until the + * input buffer is filled. + * Arguments: Buffer_start: Pointer to a buffer of size PAGE_SIZE. + * Sync: Whether the previous plugin (or core) wants its + * data synchronously. + * Returns: Zero if successful. Error condition from me or from downstream + * on failure. + */ + +static int s2_compress_read_chunk(struct page *buffer_page, int sync) +{ + int ret, position_saved; + unsigned int len; + u16 len_written; + char *buffer_start; + + if (!s2_compressor_transform) + return next_driver->ops.filter.read_chunk(buffer_page, SUSPEND_ASYNC); + + /* + * All our reads must be synchronous - we can't decompress + * data that hasn't been read yet. 
+ */ + + buffer_start = kmap(buffer_page); + + if ((ret = s2_compress_read ((u8 *)&len_written, 2)) >= 0) { + len = (unsigned int) len_written; + ret = s2_compress_read((u8 *) &position_saved, sizeof(position_saved)); + if (ret) + return ret; + + if (position != position_saved) { + printk("Position saved (%d) != position I'm at now (%d).\n", + position_saved, position); + BUG_ON(1); + } + if (len >= PAGE_SIZE) { // uncompressed + ret = s2_compress_read(buffer_start, PAGE_SIZE); + if (ret) + return ret; + + position += PAGE_SIZE; + } else { // compressed + if ((ret = s2_compress_read(page_buffer, len)) >= 0) { + int outlen = PAGE_SIZE; + /* Important note. + * + * For Deflate, decompression return values may represent + * errors. Deflate complains when everything is alright, so + * we ignore the errors unless the number of output bytes is + * not PAGE_SIZE. + */ + crypto_comp_decompress(s2_compressor_transform, + page_buffer, len, + buffer_start, &outlen); + if (outlen != PAGE_SIZE) { + printk("Decompression yielded %d bytes instead of %ld.\n", outlen, PAGE_SIZE); + ret = -EIO; + } else + ret = 0; + } + position += len; + } + position += 2 + sizeof(int); + } else + printk("Compress_read returned %d.", ret); + kunmap(buffer_page); + return ret; +} + +/* read_cleanup() + * + * Description: Clean up after reading part or all of a stream of data. + * Returns: int: Always zero. Never fails. + */ + +static int s2_compress_read_cleanup(void) +{ + suspend2_crypto_cleanup(); + free_local_buffer(); + return 0; +} + +/* s2_compress_print_debug_stats + * + * Description: Print information to be recorded for debugging purposes into a + * buffer. + * Arguments: buffer: Pointer to a buffer into which the debug info will be + * printed. + * size: Size of the buffer. + * Returns: Number of characters written to the buffer. 
+ */ + +static int s2_compress_print_debug_stats(char *buffer, int size) +{ + int pages_in = bytes_in >> PAGE_SHIFT, + pages_out = bytes_out >> PAGE_SHIFT; + int len; + + /* Output the compression ratio achieved. */ + len = snprintf_used(buffer, size, "- Compressor %s enabled.\n", + s2_compressor_name); + if (pages_in) + len+= snprintf_used(buffer+len, size - len, + " Compressed %ld bytes into %ld (%d percent compression).\n", + bytes_in, bytes_out, (pages_in - pages_out) * 100 / pages_in); + return len; +} + +/* s2_compress_memory_needed + * + * Description: Tell the caller how much memory we need to operate during + * suspend/resume. + * Returns: Unsigned long. Maximum number of bytes of memory required for + * operation. + */ + +static unsigned long s2_compress_memory_needed(void) +{ + return PAGE_SIZE; +} + +static unsigned long s2_compress_storage_needed(void) +{ + return 4 * sizeof(unsigned long) + strlen(s2_compressor_name) + 1; +} + +/* s2_compress_save_config_info + * + * Description: Save information needed when reloading the image at resume time. + * Arguments: Buffer: Pointer to a buffer of size PAGE_SIZE. + * Returns: Number of bytes used for saving our data. + */ + +static int s2_compress_save_config_info(char *buffer) +{ + int namelen = strlen(s2_compressor_name) + 1; + int total_len; + + *((unsigned long *) buffer) = bytes_in; + *((unsigned long *) (buffer + 1 * sizeof(unsigned long))) = bytes_out; + *((unsigned long *) (buffer + 2 * sizeof(unsigned long))) = s2_expected_compression; + *((unsigned long *) (buffer + 3 * sizeof(unsigned long))) = namelen; + strncpy(buffer + 4 * sizeof(unsigned long), s2_compressor_name, namelen); + total_len = 4 * sizeof(unsigned long) + namelen; + return total_len; +} + +/* s2_compress_load_config_info + * + * Description: Reload information needed for decompressing the image at + * resume time. + * Arguments: Buffer: Pointer to the start of the data. + * Size: Number of bytes that were saved. 
+ */ + +static void s2_compress_load_config_info(char *buffer, int size) +{ + int namelen; + + bytes_in = *((unsigned long *) buffer); + bytes_out = *((unsigned long *) (buffer + 1 * sizeof(unsigned long))); + s2_expected_compression = *((unsigned long *) (buffer + 2 * sizeof(unsigned long))); + namelen = *((unsigned long *) (buffer + 3 * sizeof(unsigned long))); + strncpy(s2_compressor_name, buffer + 4 * sizeof(unsigned long), namelen); + return; +} + +/* suspend2_expected_compression_ratio + * + * Description: Returns the expected ratio between data passed into this plugin + * and the amount of data output when writing. + * Returns: 100 if the plugin is disabled. Otherwise the value set by the + * user via our proc entry. + */ + +int suspend2_expected_compression_ratio(void) +{ + if (s2_compression_ops.disabled) + return 100; + else + return 100 - s2_expected_compression; +} + +static void s2_compressor_disable_if_empty(void) +{ + s2_compression_ops.disabled = !(*s2_compressor_name); +} + +static int s2_compress_initialise(int starting_cycle) +{ + if (starting_cycle) + s2_compressor_disable_if_empty(); + + return 0; +} +/* + * data for our proc entries. + */ + +static struct suspend_proc_data proc_params[] = { + { + .filename = "expected_compression", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_INTEGER, + .data = { + .integer = { + .variable = &s2_expected_compression, + .minimum = 0, + .maximum = 99, + } + } + }, + + { + .filename = "disable_compression", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_INTEGER, + .data = { + .integer = { + .variable = &s2_compression_ops.disabled, + .minimum = 0, + .maximum = 1, + } + } + }, + + { + .filename = "compressor", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_STRING, + .data = { + .string = { + .variable = s2_compressor_name, + .max_length = 31, + } + }, + .write_proc = &s2_compressor_disable_if_empty, + } +}; + +/* + * Ops structure. 
+ */ + +static struct suspend_plugin_ops s2_compression_ops = { + .type = FILTER_PLUGIN, + .name = "Suspend2 Compressor", + .module = THIS_MODULE, + .memory_needed = s2_compress_memory_needed, + .print_debug_info = s2_compress_print_debug_stats, + .save_config_info = s2_compress_save_config_info, + .load_config_info = s2_compress_load_config_info, + .storage_needed = s2_compress_storage_needed, + + .initialise = s2_compress_initialise, + + .write_init = s2_compress_write_init, + .write_cleanup = s2_compress_write_cleanup, + .read_init = s2_compress_read_init, + .read_cleanup = s2_compress_read_cleanup, + + .ops = { + .filter = { + .write_chunk = s2_compress_write_chunk, + .read_chunk = s2_compress_read_chunk, + } + } +}; + +/* ---- Registration ---- */ + +static __init int s2_compress_load(void) +{ + int result; + int i, numfiles = sizeof(proc_params) / sizeof(struct suspend_proc_data); + + printk("Suspend2 Compression Driver loading.\n"); + if (!(result = suspend_register_plugin(&s2_compression_ops))) { + for (i=0; i< numfiles; i++) + suspend_register_procfile(&proc_params[i]); + } else + printk("Suspend2 Compression Driver unable to register!\n"); + return result; +} + +#ifdef MODULE +static __exit void s2_compress_unload(void) +{ + int i, numfiles = sizeof(proc_params) / sizeof(struct suspend_proc_data); + printk("Suspend2 Compression Driver unloading.\n"); + for (i=0; i< numfiles; i++) + suspend_unregister_procfile(&proc_params[i]); + suspend_unregister_plugin(&s2_compression_ops); +} + +module_init(s2_compress_load); +module_exit(s2_compress_unload); +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Nigel Cunningham"); +MODULE_DESCRIPTION("Compression Support for Suspend2"); +#else +late_initcall(s2_compress_load); +#endif diff -urN oldtree/kernel/power/debug_pagealloc.c newtree/kernel/power/debug_pagealloc.c --- oldtree/kernel/power/debug_pagealloc.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/debug_pagealloc.c 2006-02-13 14:51:54.132935800 -0500 @@ -0,0 +1,111 @@ 
+#include +#ifdef CONFIG_DEBUG_PAGEALLOC +#include +#include 
+ +#include "pageflags.h" +#include "suspend2.h" +#include "pagedir.h" + + +extern pte_t *lookup_address(unsigned long address); + +/* Returns whether it was already in the requested state */ +extern void kernel_map_pages(struct page *page, int numpages, int enable); + +static int page_is_kernel_mapped(struct page *page) +{ + pte_t *kpte; + unsigned long address; + + if (PageHighMem(page)) + return 0; + + address = (unsigned long)page_address(page); + + kpte = lookup_address(address); + if (!kpte) + return 0; + + if (pte_same(*kpte, mk_pte(page, PAGE_KERNEL))) + return 1; + + return 0; +} + +int suspend_map_kernel_page(struct page *page, int enable) +{ + int is_already_mapped = page_is_kernel_mapped(page); + + if (enable == is_already_mapped) + return 1; + + kernel_map_pages(page, 1, enable); + + return 0; +} + +/* + * suspend2_map_atomic_copy_pages + * + * When DEBUG_PAGEALLOC is enabled, we need to map the pages before + * an atomic copy. + */ +void suspend2_map_atomic_copy_pages(void) +{ + int i = 0, source_index = -1, dest_index = -1; + + for (i = 0; i < pagedir1.pageset_size; i++) { + int orig_was_mapped = 1, copy_was_mapped = 1; + struct page *origpage, *copypage; + + source_index = get_next_bit_on(pageset1_map, source_index); + dest_index = get_next_bit_on(pageset1_copy_map, dest_index); + + origpage = pfn_to_page(source_index); + copypage = pfn_to_page(dest_index); + + if (!PageHighMem(origpage)) { + orig_was_mapped = suspend_map_kernel_page(origpage, 1); + if ((!orig_was_mapped) && + (!test_suspend_state(SUSPEND_NOW_RESUMING))) + SetPageUnmap(origpage); + } + + copy_was_mapped = suspend_map_kernel_page(copypage, 1); + if ((!copy_was_mapped) && + (!test_suspend_state(SUSPEND_NOW_RESUMING))) + SetPageUnmap(copypage); + } +} + +/* + * suspend2_unmap_atomic_copy_pages + * + * We also need to unmap pages when DEBUG_PAGEALLOC is enabled. 
+ */ +void suspend2_unmap_atomic_copy_pages(void) +{ + int i; + struct zone *zone; + + for_each_zone(zone) { + if (!zone->present_pages) + continue; + for (i = 0; i < zone->spanned_pages; i++) { + struct page *page = pfn_to_page(zone->zone_start_pfn + i); + if (PageUnmap(page)) + suspend_map_kernel_page(page, 0); + } + } +} +#else +void suspend2_map_atomic_copy_pages(void) { }; + +void suspend2_unmap_atomic_copy_pages(void) { }; + +int suspend_map_kernel_page(struct page *page, int enable) +{ + return 1; +} +#endif diff -urN oldtree/kernel/power/debug_pagealloc.h newtree/kernel/power/debug_pagealloc.h --- oldtree/kernel/power/debug_pagealloc.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/debug_pagealloc.h 2006-02-13 14:51:54.133935648 -0500 @@ -0,0 +1,3 @@ +extern void suspend2_map_atomic_copy_pages(void); +extern void suspend2_unmap_atomic_copy_pages(void); +extern int suspend_map_kernel_page(struct page *page, int enable); diff -urN oldtree/kernel/power/disk.c newtree/kernel/power/disk.c --- oldtree/kernel/power/disk.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/kernel/power/disk.c 2006-02-13 14:51:54.133935648 -0500 @@ -10,6 +10,7 @@ */ #include +#include #include #include #include @@ -89,12 +90,16 @@ unsigned long pages = 0; char *p = "-\\|/"; + thaw_processes(FREEZER_KERNEL_THREADS); + printk("Freeing memory... 
"); while ((tmp = shrink_all_memory(10000))) { pages += tmp; printk("\b%c", p[i++ % 4]); } printk("\bdone (%li pages freed)\n", pages); + + freeze_processes(); } @@ -130,7 +135,7 @@ free_some_memory(); return 0; thaw: - thaw_processes(); + thaw_processes(FREEZER_ALL_THREADS); enable_nonboot_cpus(); pm_restore_console(); return error; @@ -139,7 +144,7 @@ static void unprepare_processes(void) { platform_finish(); - thaw_processes(); + thaw_processes(FREEZER_ALL_THREADS); enable_nonboot_cpus(); pm_restore_console(); } diff -urN oldtree/kernel/power/encryption.c newtree/kernel/power/encryption.c --- oldtree/kernel/power/encryption.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/encryption.c 2006-02-13 14:51:54.134935496 -0500 @@ -0,0 +1,597 @@ +/* + * kernel/power/suspend2_core/encryption.c + * + * Copyright (C) 2003-2005 Nigel Cunningham + * + * This file is released under the GPLv2. + * + * This file contains data encryption routines for suspend, + * using cryptoapi transforms. + * + * ToDo: + * - Apply min/max_keysize the cipher changes. + * - Test. + */ + +#include +#include +#include +#include +#include +#include + +#include "suspend2.h" +#include "plugins.h" +#include "proc.h" +#include "suspend2_common.h" +#include "io.h" + +#define S2C_WRITE 0 +#define S2C_READ 1 + +static struct suspend_plugin_ops s2_encryption_ops; +static struct suspend_plugin_ops *next_driver; + +static char s2_encryptor_name[32]; +static struct crypto_tfm *s2_encryptor_transform; +static char s2_encryptor_key[256]; +static int s2_key_len; +static char s2_encryptor_iv[256]; +static int s2_encryptor_mode; +static int s2_encryptor_save_key_and_iv; + +static u8 *page_buffer = NULL; +static unsigned int bufofs; + +static struct scatterlist s2_crypt_sg[PAGE_SIZE/8]; + +/* ---- Local buffer management ---- */ + +/* allocate_local_buffer + * + * Description: Allocates a page of memory for buffering output. + * Returns: Int: Zero if successful, -ENONEM otherwise. 
+ */ + +static int allocate_local_buffer(void) +{ + if (!page_buffer) { + int i; + + page_buffer = (u8 *) get_zeroed_page(GFP_ATOMIC); + + if (!page_buffer) { + printk(KERN_ERR + "Failed to allocate the page buffer for " + "suspend2 encryption driver.\n"); + return -ENOMEM; + } + + for (i=0; i < (PAGE_SIZE / s2_key_len); i++) { + s2_crypt_sg[i].page = virt_to_page(page_buffer); + s2_crypt_sg[i].offset = s2_key_len * i; + s2_crypt_sg[i].length = s2_key_len; + } + } + + return 0; +} + +/* free_local_buffer + * + * Description: Frees memory allocated for buffering output. + */ + +static void free_local_buffer(void) +{ + if (page_buffer) + free_page((unsigned long) page_buffer); + + page_buffer = NULL; +} + +/* suspend2_crypto_cleanup + * + * Description: Frees memory allocated for our labours. + */ + +static void suspend2_crypto_cleanup(void) +{ + if (s2_encryptor_transform) { + crypto_free_tfm(s2_encryptor_transform); + s2_encryptor_transform = NULL; + } +} + +/* s2_encrypt_crypto_prepare + * + * Description: Prepare to do some work by allocating buffers and transforms. + * Returns: Int: Zero if successful, 1 otherwise.
+ */ + +static int s2_encrypt_crypto_prepare(int mode) +{ + if (!*s2_encryptor_name) { + printk("Suspend2: Encryptor enabled but no name set.\n"); + return 1; + } + + if (!(s2_encryptor_transform = crypto_alloc_tfm(s2_encryptor_name, + 1 << s2_encryptor_mode))) { + printk("Suspend2: Failed to initialise the encryption transform (%s, mode %d).\n", + s2_encryptor_name, s2_encryptor_mode); + return 1; + } + + if (mode) + bufofs = PAGE_SIZE; + else + bufofs = 0; + + s2_key_len = strlen(s2_encryptor_key); + + if (crypto_cipher_setkey(s2_encryptor_transform, s2_encryptor_key, + s2_key_len)) { + printk("%d is an invalid key length for cipher %s.\n", + s2_key_len, + s2_encryptor_name); + return 1; + } + + if (!mode) { + crypto_cipher_set_iv(s2_encryptor_transform, + s2_encryptor_iv, + crypto_tfm_alg_ivsize(s2_encryptor_transform)); + } + + return 0; +} + +/* ---- Exported functions ---- */ + +/* write_init() + * + * Description: Allocate buffers and prepare to encrypt data. + * Arguments: Stream_number: Image section; stats are reset when it is 2. + * Returns: Zero on success, error number otherwise. + */ + +static int s2_encrypt_write_init(int stream_number) +{ + int result; + + next_driver = get_next_filter(&s2_encryption_ops); + + if (!next_driver) { + printk("Encryption Driver: Argh! No one wants my output!\n"); + return -ECHILD; + } + + if ((result = s2_encrypt_crypto_prepare(S2C_WRITE))) { + set_result_state(SUSPEND_ENCRYPTION_SETUP_FAILED); + suspend2_crypto_cleanup(); + return result; + } + + if ((result = allocate_local_buffer())) + return result; + + /* Only reset the stats if starting to write an image */ + if (stream_number == 2) + bytes_in = bytes_out = 0; + + bufofs = 0; + + return 0; +} + +/* s2_encrypt_write_chunk() + * + * Description: Encrypt a page of data, buffering output and passing on + * filled pages to the next plugin in the pipeline. + * Arguments: Buffer_page: Pointer to a buffer of size PAGE_SIZE, + * containing data to be encrypted. + * Returns: 0 on success.
Otherwise the error is that returned by later + * plugins, -ECHILD if we have a broken pipeline or -EIO if + * encryption fails. + */ + +static int s2_encrypt_write_chunk(struct page *buffer_page) +{ + int ret; + unsigned int len; + u16 len_written; + char *buffer_start; + + if (!s2_encryptor_transform) + return next_driver->ops.filter.write_chunk(buffer_page); + + buffer_start = kmap(buffer_page); + memcpy(page_buffer, buffer_start, PAGE_SIZE); + kunmap(buffer_page); + + bytes_in += PAGE_SIZE; + + len = PAGE_SIZE; + + ret = crypto_cipher_encrypt(s2_encryptor_transform, + s2_crypt_sg, s2_crypt_sg, PAGE_SIZE); + + if (ret) { + printk("Encryption failed.\n"); + return -EIO; + } + + len_written = (u16) len; + + ret = next_driver->ops.filter.write_chunk(virt_to_page(page_buffer)); + + return ret; +} + +/* write_cleanup() + * + * Description: Free the encryption transform and the local buffer. + * Returns: Always zero. + */ + +static int s2_encrypt_write_cleanup(void) +{ + suspend2_crypto_cleanup(); + free_local_buffer(); + + return 0; +} + +/* read_init() + * + * Description: Prepare to read a new stream of data. + * Arguments: int: Section of image about to be read. + * Returns: int: Zero on success, error number otherwise. + */ + +static int s2_encrypt_read_init(int stream_number) +{ + int result; + + next_driver = get_next_filter(&s2_encryption_ops); + + if (!next_driver) { + printk("Encryption Driver: Argh! No one wants " + "to feed me data!\n"); + return -ECHILD; + } + + if ((result = s2_encrypt_crypto_prepare(S2C_READ))) { + set_result_state(SUSPEND_ENCRYPTION_SETUP_FAILED); + suspend2_crypto_cleanup(); + return result; + } + + if ((result = allocate_local_buffer())) + return result; + + bufofs = PAGE_SIZE; + + return 0; +} + +/* s2_encrypt_read_chunk() + * + * Description: Retrieve data from later plugins and decrypt it until the + * input buffer is filled. + * Arguments: Buffer_page: Pointer to a page of size PAGE_SIZE.
+ * Sync: Whether the previous plugin (or core) wants its + * data synchronously. + * Returns: Zero if successful. Error condition from me or from downstream + * on failure. + */ + +static int s2_encrypt_read_chunk(struct page *buffer_page, int sync) +{ + int ret; + char *buffer_start; + + if (!s2_encryptor_transform) + return next_driver->ops.filter.read_chunk(buffer_page, sync); + + /* + * All our reads must be synchronous - we can't decrypt + * data that hasn't been read yet. + */ + + if ((ret = next_driver->ops.filter.read_chunk( + virt_to_page(page_buffer), SUSPEND_SYNC)) < 0) { + printk("Failed to read an encrypted block.\n"); + return ret; + } + + ret = crypto_cipher_decrypt(s2_encryptor_transform, + s2_crypt_sg, s2_crypt_sg, PAGE_SIZE); + + if (ret) + printk("Decrypt function returned %d.\n", ret); + + buffer_start = kmap(buffer_page); + memcpy(buffer_start, page_buffer, PAGE_SIZE); + kunmap(buffer_page); + return ret; +} + +/* read_cleanup() + * + * Description: Clean up after reading part or all of a stream of data. + * Returns: int: Always zero. Never fails. + */ + +static int s2_encrypt_read_cleanup(void) +{ + suspend2_crypto_cleanup(); + free_local_buffer(); + return 0; +} + +/* s2_encrypt_print_debug_stats + * + * Description: Print information to be recorded for debugging purposes into a + * buffer. + * Arguments: buffer: Pointer to a buffer into which the debug info will be + * printed. + * size: Size of the buffer. + * Returns: Number of characters written to the buffer. + */ + +static int s2_encrypt_print_debug_stats(char *buffer, int size) +{ + int len; + + len = snprintf_used(buffer, size, "- Encryptor %s enabled.\n", + s2_encryptor_name); + return len; +} + +/* s2_encrypt_memory_needed + * + * Description: Tell the caller how much memory we need to operate during + * suspend/resume. + * Returns: Unsigned long. Maximum number of bytes of memory required for + * operation.
+ */ + +static unsigned long s2_encrypt_memory_needed(void) +{ + return PAGE_SIZE; +} + +static unsigned long s2_encrypt_storage_needed(void) +{ + return 2 * sizeof(unsigned long) + sizeof(int); +} + +/* s2_encrypt_save_config_info + * + * Description: Save information needed when reloading the image at resume time. + * Arguments: Buffer: Pointer to a buffer of size PAGE_SIZE. + * Returns: Number of bytes used for saving our data. + */ + +static int s2_encrypt_save_config_info(char *buffer) +{ + int buf_offset, str_size; + + str_size = strlen(s2_encryptor_name); + *buffer = (char) str_size; + strncpy(buffer + 1, s2_encryptor_name, str_size + 1); + buf_offset = str_size + 2; + + *(buffer + buf_offset) = (char) s2_encryptor_mode; + buf_offset++; + + *(buffer + buf_offset) = (char) s2_encryptor_save_key_and_iv; + buf_offset++; + + if (s2_encryptor_save_key_and_iv) { + + str_size = strlen(s2_encryptor_key); + *(buffer + buf_offset) = (char) str_size; + strncpy(buffer + buf_offset + 1, s2_encryptor_key, str_size + 1); + + buf_offset+= str_size + 2; + + str_size = strlen(s2_encryptor_iv); + *(buffer + buf_offset) = (char) str_size; + strncpy(buffer + buf_offset + 1, s2_encryptor_iv, str_size + 1); + + buf_offset += str_size + 2; + } + + return buf_offset; +} + +/* s2_encrypt_load_config_info + * + * Description: Reload information needed for decrypting the image at + * resume time. + * Arguments: Buffer: Pointer to the start of the data. + * Size: Number of bytes that were saved.
+ */ + +static void s2_encrypt_load_config_info(char *buffer, int size) +{ + int buf_offset, str_size; + + str_size = (int) *buffer; + strncpy(s2_encryptor_name, buffer + 1, str_size + 1); + buf_offset = str_size + 2; + + s2_encryptor_mode = (int) *(buffer + buf_offset); + buf_offset++; + + s2_encryptor_save_key_and_iv = (int) *(buffer + buf_offset); + buf_offset++; + + if (s2_encryptor_save_key_and_iv) { + str_size = (int) *(buffer + buf_offset); + strncpy(s2_encryptor_key, buffer + buf_offset + 1, str_size + 1); + + buf_offset+= str_size + 2; + + str_size = (int) *(buffer + buf_offset); + strncpy(s2_encryptor_iv, buffer + buf_offset + 1, str_size + 1); + + buf_offset += str_size + 2; + } else { + *s2_encryptor_key = 0; + *s2_encryptor_iv = 0; + } + + if (buf_offset != size) { + printk("Suspend Encryptor config info size mismatch (%d != %d): settings ignored.\n", + buf_offset, size); + *s2_encryptor_key = 0; + *s2_encryptor_iv = 0; + } + return; +} + +static void s2_encryptor_disable_if_empty(void) +{ + s2_encryption_ops.disabled = !(*s2_encryptor_name); +} + +static int s2_encrypt_initialise(int starting_cycle) +{ + if (starting_cycle) + s2_encryptor_disable_if_empty(); + + return 0; +} +/* + * data for our proc entries. 
+ */ + +static struct suspend_proc_data proc_params[] = { + { + .filename = "encryptor", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_STRING, + .data = { + .string = { + .variable = s2_encryptor_name, + .max_length = 31, + } + }, + .write_proc = s2_encryptor_disable_if_empty, + }, + + { + .filename = "encryption_mode", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_INTEGER, + .data = { + .integer = { + .variable = &s2_encryptor_mode, + .minimum = 0, + .maximum = 3, + } + } + }, + + { + .filename = "encryption_save_key_and_iv", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_INTEGER, + .data = { + .integer = { + .variable = &s2_encryptor_save_key_and_iv, + .minimum = 0, + .maximum = 1, + } + } + }, + + { + .filename = "encryption_key", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_STRING, + .data = { + .string = { + .variable = s2_encryptor_key, + .max_length = 255, + } + } + }, + + { + .filename = "encryption_iv", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_STRING, + .data = { + .string = { + .variable = s2_encryptor_iv, + .max_length = 255, + } + } + }, + + { + .filename = "disable_encryption", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_INTEGER, + .data = { + .integer = { + .variable = &s2_encryption_ops.disabled, + .minimum = 0, + .maximum = 1, + } + } + }, + +}; + +/* + * Ops structure. 
+ */ + +static struct suspend_plugin_ops s2_encryption_ops = { + .type = FILTER_PLUGIN, + .name = "Encryptor", + .module = THIS_MODULE, + .memory_needed = s2_encrypt_memory_needed, + .print_debug_info = s2_encrypt_print_debug_stats, + .save_config_info = s2_encrypt_save_config_info, + .load_config_info = s2_encrypt_load_config_info, + .storage_needed = s2_encrypt_storage_needed, + + .initialise = s2_encrypt_initialise, + + .write_init = s2_encrypt_write_init, + .write_cleanup = s2_encrypt_write_cleanup, + .read_init = s2_encrypt_read_init, + .read_cleanup = s2_encrypt_read_cleanup, + + .ops = { + .filter = { + .write_chunk = s2_encrypt_write_chunk, + .read_chunk = s2_encrypt_read_chunk, + } + } +}; + +/* ---- Registration ---- */ + +static __init int s2_encrypt_load(void) +{ + int result; + int i, numfiles = sizeof(proc_params) / sizeof(struct suspend_proc_data); + + printk("Suspend2 Encryption Driver loading.\n"); + if (!(result = suspend_register_plugin(&s2_encryption_ops))) { + for (i=0; i< numfiles; i++) + suspend_register_procfile(&proc_params[i]); + } else + printk("Suspend2 Encryption Driver unable to register!\n"); + return result; +} + +late_initcall(s2_encrypt_load); diff -urN oldtree/kernel/power/extent.c newtree/kernel/power/extent.c --- oldtree/kernel/power/extent.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/extent.c 2006-02-13 14:51:54.134935496 -0500 @@ -0,0 +1,247 @@ +/* kernel/power/suspend2_core/extent.c + * + * (C) 2003-2005 Nigel Cunningham + * + * Distributed under GPLv2. + * + * These functions encapsulate the manipulation of storage metadata. For + * pageflags, we use dynamically allocated bitmaps. + */ + +#include +#include +#include "plugins.h" +#include "extent.h" +#include "ui.h" + +int extents_allocated = 0; + +/* get_extent + * + * Returns a free extent. May fail, returning NULL instead. 
+ */ + +static struct extent *get_extent(void) +{ + struct extent *result; + + if (!(result = kmalloc(sizeof(struct extent), GFP_ATOMIC))) + return NULL; + + extents_allocated++; + result->minimum = result->maximum = 0; + result->next = NULL; + return result; +} + +/* put_extent. + * + * Frees an extent. Assumes unlinking is done by the caller. + */ +void put_extent(struct extent *extent) +{ + BUG_ON(!extent); + + kfree(extent); + extents_allocated--; +} + +/* put_extent_chain. + * + * Frees a whole chain of extents. + */ +void put_extent_chain(struct extent_chain *chain) +{ + struct extent *this; + + this = chain->first; + + while(this) { + struct extent *next = this->next; + kfree(this); + chain->frees++; + extents_allocated --; + this = next; + } + + BUG_ON(chain->frees != chain->allocs); + chain->first = chain->last = NULL; + chain->size = chain->allocs = chain->frees = 0; +} + +/* append_extent_to_extent_chain + * + * Used where we know an extent is to be added to the end of the list + * and does not need merging with the current last extent. + */ + +int append_extent_to_extent_chain(struct extent_chain *chain, + unsigned long minimum, unsigned long maximum) +{ + struct extent *newextent = NULL; + + newextent = get_extent(); + if (!newextent) { + printk("Error unable to append a new extent to the chain.\n"); + return 2; + } + + chain->allocs++; + chain->size+= (maximum - minimum + 1); + newextent->minimum = minimum; + newextent->maximum = maximum; + newextent->next = NULL; + + if (chain->last) { + chain->last->next = newextent; + chain->last = newextent; + } else + chain->last = chain->first = newextent; + + return 0; +} + +/* serialise_extent_chain + * + * Write a chain in the image.
+ */ +int serialise_extent_chain(struct extent_chain *chain) +{ + struct extent *this; + int ret, i = 1; + + if ((ret = active_writer->ops.writer.write_header_chunk((char *) chain, + sizeof(struct extent_chain) - 2 * sizeof(struct extent *)))) + return ret; + + this = chain->first; + while (this) { + if ((ret = active_writer->ops.writer.write_header_chunk((char *) this, + 2 * sizeof(unsigned long)))) + return ret; + this = this->next; + i++; + } + return ret; +} + +/* load_extent_chain + * + * Read back a chain saved in the image. + */ +int load_extent_chain(struct extent_chain *chain) +{ + struct extent *this, *last = NULL; + int i, ret; + + if (!(ret = active_writer->ops.writer.read_header_chunk((char *) chain, + sizeof(struct extent_chain) - 2 * sizeof(struct extent *)))) + return ret; + + for (i = 0; i < (chain->allocs - chain->frees); i++) { + this = kmalloc(sizeof(struct extent), GFP_ATOMIC); + BUG_ON(!this); /* Shouldn't run out of memory trying this! */ + this->next = NULL; + if (!(ret = active_writer->ops.writer.read_header_chunk((char *) this, + 2 * sizeof(unsigned long)))) + return ret; + if (last) + last->next = this; + else + chain->first = this; + last = this; + } + chain->last = last; + return ret; +} + +/* extent_state_next + * + * Given a state, progress to the next valid entry. We may begin in an + * invalid state, as we do when invoked from extent_state_goto_start below. 
+ */ +unsigned long extent_state_next(struct extent_iterate_state *state) +{ + if (state->current_chain > state->num_chains) + return 0; + + if (state->current_extent) + GET_EXTENT_NEXT(state->current_extent, state->current_offset); + + while(!state->current_extent) { + int chain_num = ++(state->current_chain); + + if (chain_num > state->num_chains) + return 0; + + state->current_extent = (state->chains + chain_num)->first; + + if (!state->current_extent) + continue; + + state->current_offset = state->current_extent->minimum; + } + + return state->current_offset; +} + +/* extent_state_goto_start + * + * Find the first valid value in a group of chains. + */ +void extent_state_goto_start(struct extent_iterate_state *state) +{ + state->current_chain = -1; + state->current_extent = NULL; + state->current_offset = 0; +} + +/* extent_state_save + * + * Given a state and a struct extent_iterate_saved_state, save the current + * position in a format that can be used with relocated chains (at + * resume time). + */ + +void extent_state_save(struct extent_iterate_state *state, + struct extent_iterate_saved_state *saved_state) +{ + struct extent *extent; + + saved_state->chain_num = state->current_chain; + saved_state->extent_num = 0; + saved_state->offset = state->current_offset; + + if (saved_state->chain_num == -1) + return; + + extent = (state->chains + state->current_chain)->first; + + while (extent != state->current_extent) { + saved_state->extent_num++; + extent = extent->next; + } +} + +/* extent_state_restore + * + * Restore the position saved by extent_state_save.
+ */ + +void extent_state_restore(struct extent_iterate_state *state, + struct extent_iterate_saved_state *saved_state) +{ + int posn = saved_state->extent_num; + + if (saved_state->chain_num == -1) { + extent_state_goto_start(state); + return; + } + + state->current_chain = saved_state->chain_num; + state->current_extent = (state->chains + state->current_chain)->first; + state->current_offset = saved_state->offset; + + while (posn--) + state->current_extent = state->current_extent->next; +} diff -urN oldtree/kernel/power/extent.h newtree/kernel/power/extent.h --- oldtree/kernel/power/extent.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/extent.h 2006-02-13 14:51:54.134935496 -0500 @@ -0,0 +1,105 @@ +/* + * kernel/power/extent.h + * + * Copyright (C) 2004-2005 Nigel Cunningham + * + * This file is released under the GPLv2. + * + * It contains declarations related to extents. Extents are + * suspend's method of storing some of the metadata for the image. + * See extent.c for more info. + * + */ + +#ifndef EXTENT_H +#define EXTENT_H +struct extent_chain { + int size; /* size of the extent ie sum (max-min+1) */ + int allocs; + int frees; + int debug; + char *name; + struct extent *first; + struct extent *last; +}; + +/* + * We rely on extents not fitting evenly into a page. + * The last four bytes are used to store the number + * of the page, to make saving & reloading pages simpler. 
+ */ +struct extent { + unsigned long minimum; + unsigned long maximum; + struct extent *next; +}; + +struct extent_iterate_state { + struct extent_chain *chains; + int num_chains; + int current_chain; + struct extent *current_extent; + unsigned long current_offset; +}; + +struct extent_iterate_saved_state { + int chain_num; + int extent_num; + unsigned long offset; +}; + +#define extent_state_eof(state) ((state)->num_chains < (state)->current_chain) + +#define extent_for_each(extent_chain, extentpointer, value) \ +if ((extent_chain)->first) \ + for ((extentpointer) = (extent_chain)->first, (value) = \ + (extentpointer)->minimum; \ + ((extentpointer) && ((extentpointer)->next || (value) <= \ + (extentpointer)->maximum)); \ + (((value) == (extentpointer)->maximum) ? \ + ((extentpointer) = (extentpointer)->next, (value) = \ + ((extentpointer) ? (extentpointer)->minimum : 0)) : \ + (value)++)) + +/* + * When using compression and expected_compression > 0, + * we allocate fewer swap entries, so GET_EXTENT_NEXT can + * validly run out of data to return. 
+ */ +#define GET_EXTENT_NEXT(currentextent, currentval) \ +{ \ + if (currentextent) { \ + if ((currentval) == (currentextent)->maximum) { \ + if ((currentextent)->next) { \ + (currentextent) = (currentextent)->next; \ + (currentval) = (currentextent)->minimum; \ + } else { \ + (currentextent) = NULL; \ + (currentval) = 0; \ + } \ + } else \ + currentval++; \ + } \ +} + +extern int extents_allocated; +void put_extent(struct extent *extent); +void put_extent_chain(struct extent_chain *chain); +int append_extent_to_extent_chain(struct extent_chain *chain, + unsigned long minimum, unsigned long maximum); +int serialise_extent_chain(struct extent_chain *chain); +int load_extent_chain(struct extent_chain *chain); + +/* swap_entry_to_extent_val & extent_val_to_swap_entry: + * We are putting offset in the low bits so consecutive swap entries + * make consecutive extent values */ +#define swap_entry_to_extent_val(swp_entry) (swp_entry.val) +#define extent_val_to_swap_entry(val) (swp_entry_t) { (val) } + +void extent_state_save(struct extent_iterate_state *state, + struct extent_iterate_saved_state *saved_state); +void extent_state_restore(struct extent_iterate_state *state, + struct extent_iterate_saved_state *saved_state); +void extent_state_goto_start(struct extent_iterate_state *state); +unsigned long extent_state_next(struct extent_iterate_state *state); +#endif diff -urN oldtree/kernel/power/io.c newtree/kernel/power/io.c --- oldtree/kernel/power/io.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/io.c 2006-02-13 14:51:54.136935192 -0500 @@ -0,0 +1,991 @@ +/* + * kernel/power/io.c + * + * Copyright (C) 1998-2001 Gabor Kuti + * Copyright (C) 1998,2001,2002 Pavel Machek + * Copyright (C) 2002-2003 Florent Chabaud + * Copyright (C) 2002-2005 Nigel Cunningham + * + * This file is released under the GPLv2. + * + * It contains high level IO routines for suspending. 
+ * + */ + +#include +#include +#include +#include +#include + +#include "version.h" +#include "plugins.h" +#include "pageflags.h" +#include "io.h" +#include "ui.h" +#include "suspend2_common.h" +#include "suspend2.h" +#include "debug_pagealloc.h" +#include "storage.h" + +/* attempt_to_parse_resume_device + * + * Can we suspend, using the current resume2= parameter? + */ +int attempt_to_parse_resume_device(void) +{ + struct list_head *writer; + struct suspend_plugin_ops *this_writer; + int result, returning = 0; + + if (suspend2_activate_storage(0)) + return 0; + + active_writer = NULL; + clear_suspend_state(SUSPEND_RESUME_DEVICE_OK); + set_suspend_state(SUSPEND_DISABLED); + clear_result_state(SUSPEND_ABORTED); + + if (!num_writers) { + printk(name_suspend "No writers have been registered. Suspending will be disabled.\n"); + goto cleanup; + } + + if (!resume2_file[0]) { + printk(name_suspend "Resume2 parameter is empty. Suspending will be disabled.\n"); + goto cleanup; + } + + list_for_each(writer, &suspend_writers) { + this_writer = list_entry(writer, struct suspend_plugin_ops, + ops.writer.writer_list); + + /* + * Not sure why you'd want to disable a writer, but + * we should honour the flag if we're providing it + */ + if (this_writer->disabled) { + printk(name_suspend + "Writer '%s' is disabled. Ignoring it.\n", + this_writer->name); + continue; + } + + result = this_writer->ops.writer.parse_sig_location( + resume2_file, (num_writers == 1)); + + switch (result) { + case -EINVAL: + /* + * For this writer, but not a valid + * configuration. Error already printed. + */ + + goto cleanup; + + case 0: + /* + * For this writer and valid. + */ + + active_writer = this_writer; + + set_suspend_state(SUSPEND_RESUME_DEVICE_OK); + clear_suspend_state(SUSPEND_DISABLED); + printk(name_suspend "Suspending enabled.\n"); + + returning = 1; + goto cleanup; + } + } + printk(name_suspend "No matching enabled writer found. 
Suspending disabled.\n"); +cleanup: + suspend2_deactivate_storage(0); + return returning; +} + +void attempt_to_parse_resume_device2(void) +{ + suspend2_prepare_usm(); + attempt_to_parse_resume_device(); + suspend2_cleanup_usm(); +} + +/* noresume_reset_plugins + * + * Description: When we read the start of an image, plugins (and especially the + * active writer) might need to reset data structures if we decide + * to invalidate the image rather than resuming from it. + */ + +static void noresume_reset_plugins(void) +{ + struct suspend_plugin_ops *this_filter; + + list_for_each_entry(this_filter, &suspend_filters, ops.filter.filter_list) { + if (this_filter->ops.filter.noresume_reset) + this_filter->ops.filter.noresume_reset(); + } + + if (active_writer && active_writer->ops.writer.noresume_reset) + active_writer->ops.writer.noresume_reset(); +} + +/* fill_suspend_header() + * + * Description: Fill the suspend header structure. + * Arguments: struct suspend_header: Header data structure to be filled. + */ + +static void fill_suspend_header(struct suspend_header *sh) +{ + int i; + + memset((char *)sh, 0, sizeof(*sh)); + + sh->version_code = LINUX_VERSION_CODE; + sh->num_physpages = num_physpages; + sh->orig_mem_free = suspend2_orig_mem_free; + strncpy(sh->machine, system_utsname.machine, 65); + strncpy(sh->version, system_utsname.version, 65); + sh->page_size = PAGE_SIZE; + sh->pagedir = pagedir1; + sh->pageset_2_size = pagedir2.pageset_size; + sh->param0 = suspend_result; + sh->param1 = suspend_action; + sh->param2 = suspend_debug_state; + sh->param3 = console_loglevel; + sh->root_fs = current->fs->rootmnt->mnt_sb->s_dev; + for (i = 0; i < 4; i++) + sh->io_time[i/2][i%2] = + suspend_io_time[i/2][i%2]; +} + +/* + * rw_init_plugins + * + * Iterate over plugins, preparing the ones that will be used to read or write + * data. 
+ */ +static int rw_init_plugins(int write, int which) +{ + struct suspend_plugin_ops *this_plugin; + /* Initialise page transformers */ + list_for_each_entry(this_plugin, &suspend_filters, + ops.filter.filter_list) { + if (this_plugin->disabled) + continue; + if ((write && this_plugin->write_init && + this_plugin->write_init(which)) || + (!write && this_plugin->read_init && + this_plugin->read_init(which))) { + abort_suspend("Failed to initialise the %s filter.", + this_plugin->name); + return 1; + } + } + + /* Initialise writer */ + if ((write && active_writer->write_init(which)) || + (!write && active_writer->read_init(which))) { + abort_suspend("Failed to initialise the writer."); + if (!write) + active_writer->ops.writer.invalidate_image(); + return 1; + } + + /* Initialise other plugins */ + list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) { + if (this_plugin->disabled) + continue; + if ((this_plugin->type == FILTER_PLUGIN) || + (this_plugin->type == WRITER_PLUGIN)) + continue; + if ((write && this_plugin->write_init && + this_plugin->write_init(which)) || + (!write && this_plugin->read_init && + this_plugin->read_init(which))) { + set_result_state(SUSPEND_ABORTED); + return 1; + } + } + + return 0; +} + +/* + * rw_cleanup_plugins + * + * Cleanup components after reading or writing a set of pages. + * Only the writer may fail. 
+ */ +static int rw_cleanup_plugins(int write) +{ + struct suspend_plugin_ops *this_plugin; + int result = 0; + + /* Cleanup other plugins */ + list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) { + if (this_plugin->disabled) + continue; + if ((this_plugin->type == FILTER_PLUGIN) || + (this_plugin->type == WRITER_PLUGIN)) + continue; + if (write) { + if (this_plugin->write_cleanup) + result |= this_plugin->write_cleanup(); + } else + if (this_plugin->read_cleanup) + result |= this_plugin->read_cleanup(); + } + + /* Flush data and cleanup */ + list_for_each_entry(this_plugin, &suspend_filters, + ops.filter.filter_list) { + if (this_plugin->disabled) + continue; + if (write) { + if (this_plugin->write_cleanup) + result |= this_plugin->write_cleanup(); + } else + if (this_plugin->read_cleanup) + result |= this_plugin->read_cleanup(); + } + + if (write) + result |= active_writer->write_cleanup(); + else + result |= active_writer->read_cleanup(); + + return result; +} + +/* + * do_rw_loop + * + * The main I/O loop for reading or writing pages. 
+ */ +static int do_rw_loop(int write, int finish_at, dyn_pageflags_t *pageflags, + int base, int barmax) +{ + int current_page_index = -1, pc, step = 1, nextupdate = 0, i; + int result; + struct suspend_plugin_ops *first_filter = get_next_filter(NULL); + + current_page_index = get_next_bit_on(*pageflags, -1); + + pc = finish_at / 5; + + /* Read the pages */ + for (i=0; i< finish_at; i++) { + int was_mapped = 0; + struct page *page = pfn_to_page(current_page_index); + + /* Status */ + if ((i+base) >= nextupdate) + nextupdate = suspend2_update_status(i+base, barmax, + " %d/%d MB ", MB(base+i+1), MB(barmax)); + + if ((i + 1) == pc) { + printk("%d%%...", 20 * step); + step++; + pc = finish_at * step / 5; + } + + was_mapped = suspend_map_kernel_page(page, 1); + if (write) + result = first_filter->ops.filter.write_chunk(page); + else + result = first_filter->ops.filter.read_chunk(page, + SUSPEND_ASYNC); + if (!was_mapped) + suspend_map_kernel_page(page, 0); + + if (result) { + if (write) { + printk("Write chunk returned %d.\n", result); + abort_suspend("Failed to write a chunk of the " + "image."); + return result; + } else + panic("Failed to read chunk %d/%d of the image. (%d)", + i, finish_at, result); + } + + /* Interactivity*/ + check_shift_keys(0, NULL); + + if (test_result_state(SUSPEND_ABORTED) && write) + return 1; + + /* Prepare next */ + current_page_index = get_next_bit_on(*pageflags, + current_page_index); + } + + printk("done.\n"); + + suspend2_update_status(base + finish_at, barmax, " %d/%d MB ", + MB(base + finish_at), MB(barmax)); + return 0; +} + +/* write_pageset() + * + * Description: Write a pageset to disk. + * Arguments: pagedir: Pointer to the pagedir to be saved. + * whichtowrite: Controls what debugging output is printed. + * Returns: Zero on success or -1 on failure. 
+ */ + +int write_pageset(struct pagedir *pagedir, int whichtowrite) +{ + int finish_at, base = 0, start_time, end_time; + int barmax = pagedir1.pageset_size + pagedir2.pageset_size; + long error = 0; + dyn_pageflags_t *pageflags; + + /* + * Even if there is nothing to read or write, the writer + * may need the init/cleanup for its housekeeping (eg: + * Pageset1 may start where pageset2 ends when writing). + */ + finish_at = pagedir->pageset_size; + + if (whichtowrite == 1) { + suspend2_prepare_status(DONT_CLEAR_BAR, + "Writing kernel & process data..."); + base = pagedir2.pageset_size; + if (test_action_state(SUSPEND_TEST_FILTER_SPEED) || + test_action_state(SUSPEND_TEST_BIO)) + pageflags = &pageset1_map; + else + pageflags = &pageset1_copy_map; + } else { + suspend2_prepare_status(CLEAR_BAR, "Writing caches..."); + pageflags = &pageset2_map; + bytes_in = bytes_out = 0; + } + + start_time = jiffies; + + if (!rw_init_plugins(1, whichtowrite)) + error = do_rw_loop(1, finish_at, pageflags, base, barmax); + + if (rw_cleanup_plugins(1)) { + abort_suspend("Failed to cleanup after writing."); + error = 1; + } + + /* Statistics */ + end_time = jiffies; + + if ((end_time - start_time) && (!test_result_state(SUSPEND_ABORTED))) { + suspend_io_time[0][0] += finish_at, + suspend_io_time[0][1] += (end_time - start_time); + } + + return error; +} + +/* read_pageset() + * + * Description: Read a pageset from disk. + * Arguments: pagedir: Pointer to the pagedir to be read. + * whichtoread: Controls what debugging output is printed. + * overwrittenpagesonly: Whether to read the whole pageset or + * only part. + * Returns: Zero on success or -1 on failure.
+ */ + +static int read_pageset(struct pagedir *pagedir, int whichtoread, + int overwrittenpagesonly) +{ + int result = 0, base = 0, start_time, end_time; + int finish_at = pagedir->pageset_size; + int barmax = pagedir1.pageset_size + pagedir2.pageset_size; + dyn_pageflags_t *pageflags; + + if (whichtoread == 1) { + suspend2_prepare_status(CLEAR_BAR, + "Reading kernel & process data..."); + pageflags = &pageset1_copy_map; + } else { + suspend2_prepare_status(DONT_CLEAR_BAR, "Reading caches..."); + if (overwrittenpagesonly) + barmax = finish_at = min(pagedir1.pageset_size, + pagedir2.pageset_size); + else { + base = pagedir1.pageset_size; + } + pageflags = &pageset2_map; + } + + start_time = jiffies; + + if (rw_init_plugins(0, whichtoread)) { + active_writer->ops.writer.invalidate_image(); + result = 1; + } else + result = do_rw_loop(0, finish_at, pageflags, base, barmax); + + if (rw_cleanup_plugins(0)) { + abort_suspend("Failed to cleanup after reading."); + result = 1; + } + + /* Statistics */ + end_time=jiffies; + + if ((end_time - start_time) && (!test_result_state(SUSPEND_ABORTED))) { + suspend_io_time[1][0] += finish_at, + suspend_io_time[1][1] += (end_time - start_time); + } + + return result; +} + +/* write_plugin_configs() + * + * Description: Store the configuration for each plugin in the image header. + * Returns: Int: Zero on success, Error value otherwise. + */ +static int write_plugin_configs(void) +{ + struct suspend_plugin_ops *this_plugin; + char *buffer = (char *) get_zeroed_page(GFP_ATOMIC); + int len, index = 1; + struct plugin_header plugin_header; + + if (!buffer) { + printk("Failed to allocate a buffer for saving " + "plugin configuration info.\n"); + return -ENOMEM; + } + + /* + * We have to know which data goes with which plugin, so we at + * least write a length of zero for a plugin. Note that we are + * also assuming every plugin's config data takes <= PAGE_SIZE. 
+ */ + + /* For each plugin (in registration order) */ + list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) { + + /* Get the data from the plugin */ + len = 0; + if (this_plugin->save_config_info) + len = this_plugin->save_config_info(buffer); + + /* Save the details of the plugin */ + plugin_header.disabled = this_plugin->disabled; + plugin_header.type = this_plugin->type; + plugin_header.index = index++; + strncpy(plugin_header.name, this_plugin->name, + sizeof(plugin_header.name)); + active_writer->ops.writer.write_header_chunk( + (char *) &plugin_header, + sizeof(plugin_header)); + + /* Save the size of the data and any data returned */ + active_writer->ops.writer.write_header_chunk((char *) &len, + sizeof(int)); + if (len) + active_writer->ops.writer.write_header_chunk( + buffer, len); + } + + /* Write a blank header to terminate the list */ + plugin_header.name[0] = '\0'; + active_writer->ops.writer.write_header_chunk( + (char *) &plugin_header, + sizeof(plugin_header)); + + free_page((unsigned long) buffer); + return 0; +} + +/* read_plugin_configs() + * + * Description: Reload plugin configurations from the image header. + * Returns: Int. Zero on success, error value otherwise. + */ + +static int read_plugin_configs(void) +{ + struct suspend_plugin_ops *this_plugin; + char *buffer = (char *) get_zeroed_page(GFP_ATOMIC); + int len, result = 0; + struct plugin_header plugin_header; + + if (!buffer) { + printk("Failed to allocate a buffer for reloading plugin " + "configuration info.\n"); + return -ENOMEM; + } + + /* All plugins are initially disabled. That way, if we have a plugin + * loaded now that wasn't loaded when we suspended, it won't be used + * in trying to read the data. 
+ */ + list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) + this_plugin->disabled = 1; + + /* Get the first plugin header */ + result = active_writer->ops.writer.read_header_chunk( + (char *) &plugin_header, sizeof(plugin_header)); + if (!result) { + printk("Failed to read the next plugin header.\n"); + free_page((unsigned long) buffer); + return -EINVAL; + } + + /* For each plugin (in registration order) */ + while (plugin_header.name[0]) { + + /* Find the plugin */ + this_plugin = find_plugin_given_name(plugin_header.name); + + if (!this_plugin) { + /* + * Is it used? Only need to worry about filters. The active + * writer must be loaded! + */ + if ((!plugin_header.disabled) && + (plugin_header.type == FILTER_PLUGIN)) { + suspend_early_boot_message(1, SUSPEND_CONTINUE_REQ, + "It looks like we need plugin %s for " + "reading the image but it hasn't been " + "registered.\n", + plugin_header.name); + if (!(test_suspend_state(SUSPEND_CONTINUE_REQ))) { + active_writer->ops.writer.invalidate_image(); + result = -EINVAL; + noresume_reset_plugins(); + free_page((unsigned long) buffer); + return -EINVAL; + } + } else + printk("Plugin %s configuration data found, but the plugin " + "hasn't registered. Looks like it was disabled, so " + "we're ignoring its data.\n", + plugin_header.name); + } + + /* Get the length of the data (if any) */ + result = active_writer->ops.writer.read_header_chunk( + (char *) &len, sizeof(int)); + if (!result) { + printk("Failed to read the length of the plugin %s's" + " configuration data.\n", + plugin_header.name); + free_page((unsigned long) buffer); + return -EINVAL; + } + + /* Read any data and pass to the plugin (if we found one) */ + if (len) { + active_writer->ops.writer.read_header_chunk(buffer, len); + if (this_plugin) { + if (!this_plugin->load_config_info) { + printk("Huh?
Plugin %s appears to have a " + "save_config_info, but not a " + "load_config_info function!\n", + this_plugin->name); + } else + this_plugin->load_config_info(buffer, len); + } + } + + if (this_plugin) { + /* Now move this plugin to the tail of its lists. This will put it + * in order. Any new plugins will end up at the top of the lists. + * They should have been set to disabled when loaded (people will + * normally not edit an initrd to load a new module and then + * suspend without using it!). + */ + + suspend_move_plugin_tail(this_plugin); + + /* + * We apply the disabled state; plugins don't need to save whether they + * were disabled and if they do, we override them anyway. + */ + this_plugin->disabled = plugin_header.disabled; + } + + /* Get the next plugin header */ + result = active_writer->ops.writer.read_header_chunk( + (char *) &plugin_header, sizeof(plugin_header)); + + if (!result) { + printk("Failed to read the next plugin header.\n"); + free_page((unsigned long) buffer); + return -EINVAL; + } + + } + + free_page((unsigned long) buffer); + return 0; +} + +/* write_image_header() + * + * Description: Write the image header after writing the image proper. + * Returns: Int. Zero on success or -1 on failure.
+ */ + +int write_image_header(void) +{ + int ret; + int total = pagedir1.pageset_size + pagedir2.pageset_size+2; + char *header_buffer = NULL; + + /* Now prepare to write the header */ + if ((ret = active_writer->ops.writer.write_header_init())) { + abort_suspend("Active writer's write_header_init" + " function failed."); + goto write_image_header_abort; + } + + /* Get a buffer */ + header_buffer = (char *) get_zeroed_page(GFP_ATOMIC); + if (!header_buffer) { + abort_suspend("Out of memory when trying to get page " + "for header!"); + goto write_image_header_abort; + } + + /* Write suspend header */ + fill_suspend_header((struct suspend_header *) header_buffer); + active_writer->ops.writer.write_header_chunk(header_buffer, + sizeof(struct suspend_header)); + + free_page((unsigned long) header_buffer); + + /* Write plugin configurations */ + if ((ret = write_plugin_configs())) { + abort_suspend("Failed to write plugin configs."); + goto write_image_header_abort; + } + + save_dyn_pageflags(pageset1_map); + + if (active_writer->ops.writer.serialise_extents && + (ret = active_writer->ops.writer.serialise_extents())) { + abort_suspend("Active writer's prepare_save_extents " + "function failed."); + goto write_image_header_abort; + } + + /* Flush data and let writer cleanup */ + if (active_writer->ops.writer.write_header_cleanup()) { + abort_suspend("Failed to cleanup writing header."); + goto write_image_header_abort_no_cleanup; + } + + if (test_result_state(SUSPEND_ABORTED)) + goto write_image_header_abort_no_cleanup; + + suspend_message(SUSPEND_IO, SUSPEND_VERBOSE, 1, "|\n"); + suspend2_update_status(total, total, NULL); + + return 0; + +write_image_header_abort: + active_writer->ops.writer.write_header_cleanup(); +write_image_header_abort_no_cleanup: + return -1; +} + +/* sanity_check() + * + * Description: Perform a few checks, seeking to ensure that the kernel being + * booted matches the one suspended. They need to match so we can + * be _sure_ things will work. 
It is not absolutely impossible for + * resuming from a different kernel to work, just not assured. + * Arguments: Struct suspend_header. The header which was saved at suspend + * time. + */ +static char *sanity_check(struct suspend_header *sh) +{ + if (sh->version_code != LINUX_VERSION_CODE) + return "Incorrect kernel version."; + + if (sh->num_physpages != num_physpages) + return "Incorrect memory size."; + + if (strncmp(sh->machine, system_utsname.machine, 65)) + return "Incorrect machine type."; + + if (strncmp(sh->version, system_utsname.version, 65)) + return "Right kernel version but wrong build number."; + + if (sh->page_size != PAGE_SIZE) + return "Incorrect PAGE_SIZE."; + + if ((sh->root_fs == current->fs->rootmnt->mnt_sb->s_dev) && + (!test_suspend_state(SUSPEND_IGNORE_ROOTFS))) + return "Root filesystem has been mounted prior to trying to resume."; + + return 0; +} + +/* __read_pageset1 + * + * Description: Test for the existence of an image and attempt to load it. + * Returns: Int. Zero if image found and pageset1 successfully loaded. + * Error if no image found or loaded. 
+ */ +static int __read_pageset1(void) +{ + int i, result = 0; + char *header_buffer = (char *) get_zeroed_page(GFP_ATOMIC), *sanity_error = NULL; + struct suspend_header *suspend_header; + + if (!header_buffer) + return -ENOMEM; + + /* Check for an image */ + if (!(result = active_writer->ops.writer.image_exists())) { + result = -ENODATA; + noresume_reset_plugins(); + goto out; + } + + /* Check for noresume command line option */ + if (test_suspend_state(SUSPEND_NORESUME_SPECIFIED)) { + active_writer->ops.writer.invalidate_image(); + result = -EINVAL; + noresume_reset_plugins(); + goto out; + } + + /* Check whether we've resumed before */ + if (test_suspend_state(SUSPEND_RESUMED_BEFORE)) { + int resumed_before_default = 0; + if (test_suspend_state(SUSPEND_RETRY_RESUME)) + resumed_before_default = SUSPEND_CONTINUE_REQ; + suspend_early_boot_message(1, resumed_before_default, NULL); + clear_suspend_state(SUSPEND_RETRY_RESUME); + if (!(test_suspend_state(SUSPEND_CONTINUE_REQ))) { + active_writer->ops.writer.invalidate_image(); + result = -EINVAL; + noresume_reset_plugins(); + goto out; + } + } + + clear_suspend_state(SUSPEND_CONTINUE_REQ); + + /* + * Prepare the active writer for reading the image header. The + * active writer might read its own configuration. + * + * NB: This call may never return because there might be a signature + * for a different image such that we warn the user and they choose + * to reboot (for example, if the device ids look erroneous (2.4 vs + * 2.6), or if the location of the image is unavailable because it + * was stored on a network connection).
+ */ + + if ((result = active_writer->ops.writer.read_header_init())) { + noresume_reset_plugins(); + goto out; + } + + /* Read suspend header */ + if ((result = active_writer->ops.writer.read_header_chunk( + header_buffer, sizeof(struct suspend_header))) < 0) { + noresume_reset_plugins(); + goto out; + } + + suspend_header = (struct suspend_header *) header_buffer; + + /* + * NB: This call may also result in a reboot rather than returning. + */ + + if ((sanity_error = sanity_check(suspend_header)) && + suspend_early_boot_message(1, SUSPEND_CONTINUE_REQ, sanity_error)) { + active_writer->ops.writer.invalidate_image(); + result = -EINVAL; + noresume_reset_plugins(); + goto out; + } + + /* + * We have an image and it looks like it will load okay. + */ + + /* Get metadata from header. Don't override commandline parameters. + * + * We don't need to save the image size limit because it's not used + * during resume and will be restored with the image anyway. + */ + + suspend2_orig_mem_free = suspend_header->orig_mem_free; + memcpy((char *) &pagedir1, + (char *) &suspend_header->pagedir, sizeof(pagedir1)); + suspend_result = suspend_header->param0; + if (!test_suspend_state(SUSPEND_ACT_USED)) + suspend_action = suspend_header->param1; + if (!test_suspend_state(SUSPEND_DBG_USED)) + suspend_debug_state = suspend_header->param2; + if (!test_suspend_state(SUSPEND_LVL_USED)) + suspend_default_console_level = suspend_header->param3; + clear_suspend_state(SUSPEND_IGNORE_LOGLEVEL); + pagedir2.pageset_size = suspend_header->pageset_2_size; + for (i = 0; i < 4; i++) + suspend_io_time[i/2][i%2] = + suspend_header->io_time[i/2][i%2]; + + /* Read plugin configurations */ + if ((result = read_plugin_configs())) { + noresume_reset_plugins(); + pagedir1.pageset_size = + pagedir2.pageset_size = 0; + goto out; + } + + suspend2_prepare_console(); + + check_shift_keys(1, "About to read original pageset1 locations."); + /* Read original pageset1 locations. 
These are the addresses we can't use for + * the data to be restored */ + allocate_dyn_pageflags(&pageset1_map); + load_dyn_pageflags(pageset1_map); + + allocate_dyn_pageflags(&conflicting_pages_map); + + set_suspend_state(SUSPEND_NOW_RESUMING); + + /* Relocate it so that it's not overwritten while we're using it to + * copy the original contents back */ + relocate_dyn_pageflags(&pageset1_map); + relocate_dyn_pageflags(&conflicting_pages_map); + + allocate_dyn_pageflags(&pageset1_copy_map); + relocate_dyn_pageflags(&pageset1_copy_map); + + /* Read extent pages */ + if (active_writer->ops.writer.load_extents && + (result = active_writer->ops.writer.load_extents())) { + noresume_reset_plugins(); + abort_suspend("Active writer's load_extents " + "function failed."); + goto out_reset_console; + } + + /* Clean up after reading the header */ + if ((result = active_writer->ops.writer.read_header_cleanup())) { + noresume_reset_plugins(); + goto out_reset_console; + } + + check_shift_keys(1, "About to read pagedir."); + + /* + * Get the addresses of pages into which we will load the kernel to + * be copied back + */ + if (suspend2_get_pageset1_load_addresses()) { + result = -ENOMEM; + noresume_reset_plugins(); + goto out_reset_console; + } + + /* Read the original kernel back */ + check_shift_keys(1, "About to read pageset 1."); + + if (read_pageset(&pagedir1, 1, 0)) { + suspend2_prepare_status(CLEAR_BAR, "Failed to read pageset 1."); + result = -EPERM; + noresume_reset_plugins(); + goto out_reset_console; + } + + check_shift_keys(1, "About to restore original kernel."); + result = 0; + + if (!test_action_state(SUSPEND_KEEP_IMAGE) && + active_writer->ops.writer.mark_resume_attempted) + active_writer->ops.writer.mark_resume_attempted(); + +out: + free_page((unsigned long) header_buffer); + return result; + +out_reset_console: + free_dyn_pageflags(&pageset1_map); + free_dyn_pageflags(&pageset1_copy_map); + free_dyn_pageflags(&conflicting_pages_map); + 
suspend2_cleanup_console(); + goto out; +} + +/* read_pageset1() + * + * Description: Attempt to read the header and pageset1 of a suspend image. + * Handle the outcome, complaining where appropriate. + */ +extern int block_dump; + +int read_pageset1(void) +{ + int error; + + block_dump = 1; + + error = __read_pageset1(); + + block_dump = 0; + + switch (error) { + case 0: + case -ENODATA: + case -EINVAL: /* non fatal error */ + return error; + case -EIO: + printk(KERN_CRIT name_suspend "I/O error\n"); + break; + case -ENOENT: + printk(KERN_CRIT name_suspend "No such file or directory\n"); + break; + case -EPERM: + printk(KERN_CRIT name_suspend "Sanity check error\n"); + break; + default: + printk(KERN_CRIT name_suspend "Error %d resuming\n", error); + break; + } + abort_suspend("Error %d in read_pageset1",error); + return error; +} + +/* read_pageset2() + * + * Description: Read in part or all of pageset2 of an image, depending upon + * whether we are suspending and have only overwritten a portion + * with pageset1 pages, or are resuming and need to read them + * all. + * Arguments: Int. Boolean. Read only pages which would have been + * overwritten by pageset1? + * Returns: Int. Zero if no error, otherwise the error value. 
+ */ +int read_pageset2(int overwrittenpagesonly) +{ + int result = 0; + + if (!pagedir2.pageset_size) + return 0; + + result = read_pageset(&pagedir2, 2, overwrittenpagesonly); + + suspend2_update_status(100, 100, NULL); + check_shift_keys(1, "Pagedir 2 read."); + + return result; +} diff -urN oldtree/kernel/power/io.h newtree/kernel/power/io.h --- oldtree/kernel/power/io.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/io.h 2006-02-13 14:51:54.136935192 -0500 @@ -0,0 +1,38 @@ +/* + * kernel/power/io.h + */ + +#include "pagedir.h" + +/* Non-plugin data saved in our image header */ +struct suspend_header { + u32 version_code; + unsigned long num_physpages; + unsigned long orig_mem_free; + char machine[65]; + char version[65]; + int num_cpus; + int page_size; + int pageset_2_size; + int param0; + int param1; + int param2; + int param3; + int progress0; + int progress1; + int progress2; + int progress3; + int io_time[2][2]; + struct pagedir pagedir; + dev_t root_fs; +}; + +extern int write_pageset(struct pagedir *pagedir, int whichtowrite); +extern int write_image_header(void); +extern int read_pageset1(void); +extern int read_pageset2(int overwrittenpagesonly); + +extern int attempt_to_parse_resume_device(void); +extern void attempt_to_parse_resume_device2(void); +extern dev_t name_to_dev_t(char *line); +extern __nosavedata unsigned long bytes_in, bytes_out; diff -urN oldtree/kernel/power/main.c newtree/kernel/power/main.c --- oldtree/kernel/power/main.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/kernel/power/main.c 2006-02-13 14:51:54.136935192 -0500 @@ -9,6 +9,7 @@ */ #include +#include #include #include #include @@ -95,7 +96,7 @@ if (pm_ops->finish) pm_ops->finish(state); Thaw: - thaw_processes(); + thaw_processes(FREEZER_ALL_THREADS); Enable_cpu: enable_nonboot_cpus(); pm_restore_console(); @@ -103,7 +104,7 @@ } -static int suspend_enter(suspend_state_t state) +int suspend_enter(suspend_state_t state) { int error = 0; unsigned long flags; @@ 
-135,7 +136,7 @@ device_resume(); if (pm_ops && pm_ops->finish) pm_ops->finish(state); - thaw_processes(); + thaw_processes(FREEZER_ALL_THREADS); enable_nonboot_cpus(); pm_restore_console(); } diff -urN oldtree/kernel/power/netlink.c newtree/kernel/power/netlink.c --- oldtree/kernel/power/netlink.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/netlink.c 2006-02-13 14:51:54.137935040 -0500 @@ -0,0 +1,365 @@ +/* + * netlink.c + * + * Functions for communicating with a userspace helper via netlink. + */ + + +#include +#include "netlink.h" + +#ifdef CONFIG_NET +struct user_helper_data *uhd_list = NULL; + +/* + * Refill our pool of SKBs for use in emergencies (eg, when eating memory and none + * can be allocated). + */ +static void suspend2_fill_skb_pool(struct user_helper_data *uhd) +{ + while (uhd->pool_level < uhd->pool_limit) { + struct sk_buff *new_skb = + alloc_skb(NLMSG_SPACE(uhd->skb_size), GFP_ATOMIC); + + if (!new_skb) + break; + + new_skb->next = uhd->emerg_skbs; + uhd->emerg_skbs = new_skb; + uhd->pool_level++; + } +} + +/* + * Try to allocate a single skb. If we can't get one, try to use one from + * our pool. 
+ */ +static struct sk_buff *suspend2_get_skb(struct user_helper_data *uhd) +{ + struct sk_buff *skb = + alloc_skb(NLMSG_SPACE(uhd->skb_size), GFP_ATOMIC); + + if (skb) + return skb; + + skb = uhd->emerg_skbs; + if (skb) { + uhd->pool_level--; + uhd->emerg_skbs = skb->next; + skb->next = NULL; + } + + return skb; +} + +static void put_skb(struct user_helper_data *uhd, struct sk_buff *skb) +{ + if (uhd->pool_level < uhd->pool_limit) { + skb->next = uhd->emerg_skbs; + uhd->emerg_skbs = skb; + } else + kfree_skb(skb); +} + + +static void suspend2_notify_userspace(void* data) +{ + struct task_struct *t; + struct user_helper_data *uhd = (struct user_helper_data *) data; + + BUG_ON(!uhd); + + read_lock(&tasklist_lock); + if ((t = find_task_by_pid(uhd->pid))) + wake_up_process(t); + read_unlock(&tasklist_lock); +} + +DECLARE_WORK(suspend2_notify_userspace_work, suspend2_notify_userspace, NULL); + +void suspend2_send_netlink_message(struct user_helper_data *uhd, + int type, void* params, size_t len) +{ + struct sk_buff *skb; + struct nlmsghdr *nlh; + void *dest; + + skb = suspend2_get_skb(uhd); + if (!skb) { + printk("suspend_netlink: Can't allocate skb!\n"); + return; + } + + /* NLMSG_PUT contains a hidden goto nlmsg_failure */ + nlh = NLMSG_PUT(skb, 0, uhd->sock_seq, type, len); + uhd->sock_seq++; + + dest = NLMSG_DATA(nlh); + if (params && len > 0) + memcpy(dest, params, len); + + netlink_unicast(uhd->nl, skb, uhd->pid, 0); + + /* We may be in an interrupt context so defer waking up userspace */ + suspend2_notify_userspace_work.data = uhd; + schedule_work(&suspend2_notify_userspace_work); + + return; + +nlmsg_failure: + if (skb) + put_skb(uhd, skb); +} + +#ifdef CONFIG_PM_DEBUG +static int is_debugging = 1; +#else +static int is_debugging = 0; +#endif + +static void send_whether_debugging(struct user_helper_data *uhd) +{ + suspend2_send_netlink_message(uhd, NETLINK_MSG_IS_DEBUGGING, + &is_debugging, sizeof(int)); +} + +/* + * Set the PF_NOFREEZE flag on the given 
process to ensure it can run whilst we + * are suspending. + */ +static int nl_set_nofreeze(struct user_helper_data *uhd, int pid) +{ + struct task_struct *t; + + read_lock(&tasklist_lock); + if ((t = find_task_by_pid(pid)) == NULL) { + read_unlock(&tasklist_lock); + printk("Strange. Can't find the userspace task %d.\n", pid); + return -EINVAL; + } + + t->flags |= PF_NOFREEZE; + + read_unlock(&tasklist_lock); + uhd->pid = pid; + + suspend2_send_netlink_message(uhd, NETLINK_MSG_NOFREEZE_ACK, NULL, 0); + + return 0; +} + +/* + * Called when the userspace process has informed us that it's ready to roll. + */ +static int nl_ready(struct user_helper_data *uhd, int version) +{ + if (version != uhd->interface_version) { + printk("%s userspace process using invalid interface version." + " Trying to continue without it.\n", + uhd->name); + if (uhd->not_ready) + uhd->not_ready(); + return 1; + } + + complete(&uhd->wait_for_process); + + return 0; +} + +static int suspend2_nl_gen_rcv_msg(struct user_helper_data *uhd, + struct sk_buff *skb, struct nlmsghdr *nlh) +{ + int type; + int *data; + int err; + + /* Let the more specific handler go first. It returns + * 1 for valid messages that it doesn't know. 
*/ + if ((err = uhd->rcv_msg(skb, nlh)) != 1) + return err; + + type = nlh->nlmsg_type; + + /* Only allow one task to receive NOFREEZE privileges */ + if (type == NETLINK_MSG_NOFREEZE_ME && uhd->pid != -1) { + printk("Received extra nofreeze me requests.\n"); + return -EBUSY; + } + + data = (int*)NLMSG_DATA(nlh); + + switch (type) { + case NETLINK_MSG_NOFREEZE_ME: + if ((err = nl_set_nofreeze(uhd, nlh->nlmsg_pid)) != 0) + return err; + break; + case NETLINK_MSG_GET_DEBUGGING: + send_whether_debugging(uhd); + break; + case NETLINK_MSG_READY: + if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(int))) { + printk("Invalid ready message.\n"); + return -EINVAL; + } + if ((err = nl_ready(uhd, *data)) != 0) + return err; + break; + } + + return 0; +} + +static void suspend2_user_rcv_skb(struct user_helper_data *uhd, + struct sk_buff *skb) +{ + int err; + struct nlmsghdr *nlh; + + while (skb->len >= NLMSG_SPACE(0)) { + u32 rlen; + + nlh = (struct nlmsghdr *) skb->data; + if (nlh->nlmsg_len < sizeof(*nlh) || skb->len < nlh->nlmsg_len) + return; + + rlen = NLMSG_ALIGN(nlh->nlmsg_len); + if (rlen > skb->len) + rlen = skb->len; + + if ((err = suspend2_nl_gen_rcv_msg(uhd, skb, nlh)) != 0) + netlink_ack(skb, nlh, err); + else if (nlh->nlmsg_flags & NLM_F_ACK) + netlink_ack(skb, nlh, 0); + skb_pull(skb, rlen); + } +} + +static void suspend2_netlink_input(struct sock *sk, int len) +{ + struct user_helper_data *uhd = uhd_list; + + while (uhd && uhd->netlink_id != sk->sk_protocol) + uhd= uhd->next; + + BUG_ON(!uhd); + + do { + struct sk_buff *skb; + while ((skb = skb_dequeue(&sk->sk_receive_queue)) != NULL) { + suspend2_user_rcv_skb(uhd, skb); + put_skb(uhd, skb); + } + } while (uhd->nl && uhd->nl->sk_receive_queue.qlen); +} + +static int netlink_prepare(struct user_helper_data *uhd) +{ + uhd->next = uhd_list; + uhd_list = uhd; + + uhd->sock_seq = 0x42c0ffee; + uhd->nl = netlink_kernel_create(uhd->netlink_id, 0, + suspend2_netlink_input, THIS_MODULE); + if (!uhd->nl) { + printk("Failed to
allocate netlink socket for %s.\n", + uhd->name); + return -ENOMEM; + } + + suspend2_fill_skb_pool(uhd); + + return 0; +} + +void suspend2_netlink_close(struct user_helper_data *uhd) +{ + if (uhd->nl) { + sock_release(uhd->nl->sk_socket); + uhd->nl = NULL; + } + + while (uhd->emerg_skbs) { + struct sk_buff *next = uhd->emerg_skbs->next; + kfree_skb(uhd->emerg_skbs); + uhd->emerg_skbs = next; + } +} + +static int launch_userpace_program(struct user_helper_data *uhd) +{ + int retval; + static char *envp[] = { + "HOME=/", + "TERM=linux", + "PATH=/sbin:/usr/sbin:/bin:/usr/bin", + NULL }; + static char *argv[] = { NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL }; + char *channel = kmalloc(6, GFP_KERNEL); + int arg = 0, size; + char test_read[255]; + char *orig_posn = uhd->program; + + if (!strlen(orig_posn)) + return 1; + + while (arg < 7) { + sscanf(orig_posn, "%s", test_read); + size = strlen(test_read); + if (!(size)) + break; + argv[arg] = kmalloc(size + 1, GFP_ATOMIC); + strcpy(argv[arg], test_read); + orig_posn += size + 1; + *test_read = 0; + arg++; + } + + sprintf(channel, "-c%d", uhd->netlink_id); + argv[arg] = channel; + + retval = call_usermodehelper(argv[0], argv, envp, 0); + + if (retval) + printk("suspend_netlink: Failed to launch userui program: Error %d\n", retval); + + { + int i; + for (i = 0; i < arg; i++) + if (argv[i] && argv[i] != channel) + kfree(argv[i]); + } + + kfree(channel); + + return retval; +} + +int suspend2_netlink_setup(struct user_helper_data *uhd) +{ + if (netlink_prepare(uhd) < 0) { + printk("Netlink prepare failed.\n"); + return 1; + } + + if (launch_userpace_program(uhd) < 0) { + printk("Launch userspace program failed.\n"); + suspend2_netlink_close(uhd); + return 1; + } + + /* Wait 2 seconds for the userspace process to make contact */ + wait_for_completion_timeout(&uhd->wait_for_process, 2*HZ); + + if (uhd->pid == -1) { + printk("%s: Failed to contact userspace process.\n", + uhd->name); + suspend2_netlink_close(uhd); + return 1; 
+ } + + return 0; +} + +#else +#endif diff -urN oldtree/kernel/power/netlink.h newtree/kernel/power/netlink.h --- oldtree/kernel/power/netlink.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/netlink.h 2006-02-13 14:51:54.137935040 -0500 @@ -0,0 +1,43 @@ +/* + * netlink.h + * + * Declarations for functions for communicating with a userspace helper + * via netlink. + */ + +#include +#include + +#define NETLINK_MSG_BASE 0x10 + +#define NETLINK_MSG_READY 0x10 +#define NETLINK_MSG_NOFREEZE_ME 0x16 +#define NETLINK_MSG_GET_DEBUGGING 0x19 +#define NETLINK_MSG_CLEANUP 0x24 +#define NETLINK_MSG_NOFREEZE_ACK 0x27 +#define NETLINK_MSG_IS_DEBUGGING 0x28 + +struct user_helper_data { + int (*rcv_msg) (struct sk_buff *skb, struct nlmsghdr *nlh); + void (* not_ready) (void); + struct sock *nl; + u32 sock_seq; + pid_t pid; + char *comm; + char program[256]; + int pool_level; + int pool_limit; + struct sk_buff *emerg_skbs; + int skb_size; + int netlink_id; + char *name; + struct user_helper_data *next; + struct completion wait_for_process; + int interface_version; + int must_init; +}; + +void suspend2_send_netlink_message(struct user_helper_data *uhd, + int type, void* params, size_t len); +int suspend2_netlink_setup(struct user_helper_data *uhd); +void suspend2_netlink_close(struct user_helper_data *uhd); diff -urN oldtree/kernel/power/pagedir.c newtree/kernel/power/pagedir.c --- oldtree/kernel/power/pagedir.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/pagedir.c 2006-02-13 14:51:54.137935040 -0500 @@ -0,0 +1,370 @@ +/* + * kernel/power/pagedir.c + * + * Copyright (C) 1998-2001 Gabor Kuti + * Copyright (C) 1998,2001,2002 Pavel Machek + * Copyright (C) 2002-2003 Florent Chabaud + * Copyright (C) 2002-2005 Nigel Cunningham + * + * This file is released under the GPLv2. + * + * Routines for handling pagesets. + * Note that pbes aren't actually stored as such. They're stored as + * bitmaps and extents. 
+ */ + +#include +#include +#include +#include + +#include "pageflags.h" +#include "ui.h" +#include "pagedir.h" + +int extra_pagedir_pages_allocated = 0; + +/* Not static so allocation routine can BUG if recursively called */ +dyn_pageflags_t conflicting_pages_map; + +#define PageConflicting(page) (test_dynpageflag(&conflicting_pages_map, page)) +#define SetPageConflicting(page) (set_dynpageflag(&conflicting_pages_map, page)) +#define ClearPageConflicting(page) (clear_dynpageflag(&conflicting_pages_map, page)) + +/* suspend2_free_extra_pagedir_memory + * + * Description: Free previously allocated pagedir metadata. + */ +void suspend2_free_extra_pagedir_memory(void) +{ + unsigned long pagenumber; + + free_dyn_pageflags(&pageset1_map); + free_dyn_pageflags(&pageset2_map); + free_dyn_pageflags(&pageset1_copy_map); + + /* Free allocated pages */ + if (allocd_pages_map) { + BITMAP_FOR_EACH_SET(allocd_pages_map, pagenumber) { + struct page *page = pfn_to_page(pagenumber); + ClearPageNosave(page); + __free_page(page); + extra_pagedir_pages_allocated--; + } + free_dyn_pageflags(&allocd_pages_map); + } +} + +/* suspend2_allocate_extra_pagedir_memory + * + * Description: Allocate memory for making the atomic copy of pagedir1 in the + * case where it is bigger than pagedir2. + * Arguments: struct pagedir *: The pagedir for which we should + * allocate memory. + * int: Size of pageset 1. + * int: Size of pageset 2. + * Result: int. Zero on success. One if unable to allocate enough memory. 
+ */ +int suspend2_allocate_extra_pagedir_memory(struct pagedir *p, int pageset_size, + int alloc_from) +{ + int num_to_alloc = pageset_size - alloc_from - extra_pagedir_pages_allocated; + int j, order; + + if (num_to_alloc < 1) + num_to_alloc = 0; + + if (num_to_alloc) { + int num_added = 0; + + order = generic_fls(num_to_alloc); + if (order >= MAX_ORDER) + order = MAX_ORDER - 1; + + while (num_added < num_to_alloc) { + struct page *newpage; + unsigned long virt; + + while ((1 << order) > (num_to_alloc - num_added)) + order--; + + virt = __get_free_pages(GFP_ATOMIC | __GFP_NOWARN, order); + while ((!virt) && (order > 0)) { + order--; + virt = __get_free_pages(GFP_ATOMIC | __GFP_NOWARN, order); + } + + if (!virt) { + p->pageset_size += num_added; + return 1; + } + + newpage = virt_to_page(virt); + for (j = 0; j < (1 << order); j++) { + SetPageNosave(newpage + j); + /* Pages will be freed one at a time. */ + set_page_count(newpage + j, 1); + SetPageAllocd(newpage + j); + extra_pagedir_pages_allocated++; + } + num_added+= (1 << order); + } + } + + return 0; +} + +/* + * suspend2_mark_task_as_pageset1 + * Functionality : Marks all the pages belonging to a given process as + * pageset 1 pages. + * Called From : pagedir.c - mark_pages_for_pageset2 + * + */ +extern struct page *suspend2_follow_page(struct mm_struct *mm, unsigned long address); + +void suspend2_mark_task_as_pageset1(struct task_struct *t) +{ + struct vm_area_struct *vma; + struct mm_struct *mm; + + mm = t->active_mm; + + if (!mm || !mm->mmap) return; + + /* Don't try to take the sem when processes are frozen, + * drivers are suspended and irqs are disabled. We're + * not racing with anything anyway. 
 */ + BUG_ON(in_atomic() && !irqs_disabled()); + + if (!irqs_disabled()) + down_read(&mm->mmap_sem); + + for (vma = mm->mmap; vma; vma = vma->vm_next) { + if (vma->vm_flags & VM_PFNMAP) + continue; + if (vma->vm_start) { + unsigned long posn; + for (posn = vma->vm_start; posn < vma->vm_end; + posn += PAGE_SIZE) { + struct page *page = + suspend2_follow_page(mm, posn); + if (page) + ClearPagePageset2(page); + } + } + } + + BUG_ON(in_atomic() && !irqs_disabled()); + + if (!irqs_disabled()) + up_read(&mm->mmap_sem); +} + +/* mark_pages_for_pageset2 + * + * Description: Mark unshared pages in processes not needed for suspend as + * being able to be written out in a separate pagedir. + * HighMem pages are simply marked as pageset2. They won't be + * needed during suspend. + */ + +struct attention_list { + struct task_struct *task; + struct attention_list *next; +}; + +#define HALT_ON(condition) \ + do { if (unlikely(condition)) { \ + printk("Suspend2: Halting at line %d. Please report to nigel@suspend2.net.\n", __LINE__); \ + while(1) \ + cpu_relax(); \ + } } while(0) + +void suspend2_mark_pages_for_pageset2(void) +{ + struct zone *zone; + struct task_struct *p; + struct attention_list *attention_list = NULL, *last = NULL; + unsigned long flags, i; + + HALT_ON(in_atomic() && !irqs_disabled()); + + clear_dyn_pageflags(pageset2_map); + + if (test_action_state(SUSPEND_NO_PAGESET2)) + return; + + /* + * Note that we don't clear the map to begin with! + * This is because if we eat memory, we lose track + * of LRU pages that are still in use but taken off + * the LRU. If I can figure out how the VM keeps + * track of them, I might be able to tweak this a + * little further and decrease pageset one's size + * further. + * + * (Memory grabbing clears the pageset2 flag on + * pages that are really freed!). 
+ */ + + for_each_zone(zone) { + spin_lock_irqsave(&zone->lru_lock, flags); + if (zone->nr_inactive) { + struct page *page; + list_for_each_entry(page, &zone->inactive_list, lru) + SetPagePageset2(page); + } + if (zone->nr_active) { + struct page *page; + list_for_each_entry(page, &zone->active_list, lru) + SetPagePageset2(page); + } + spin_unlock_irqrestore(&zone->lru_lock, flags); + } + + HALT_ON(in_atomic() && !irqs_disabled()); + + /* Now we find all userspace processes (with task->mm) marked PF_NOFREEZE + * and move them into pageset1. + */ + read_lock(&tasklist_lock); + for_each_process(p) + if ((p->mm || p->active_mm) && (p->flags & PF_NOFREEZE)) { + struct attention_list *this = kmalloc(sizeof(struct attention_list), GFP_ATOMIC); + BUG_ON(!this); + this->task = p; + this->next = NULL; + if (attention_list) { + last->next = this; + last = this; + } else + attention_list = last = this; + } + read_unlock(&tasklist_lock); + + HALT_ON(in_atomic() && !irqs_disabled()); + + /* Because the tasks in attention_list are ones related to suspending, + * we know that they won't go away under us. + */ + + while (attention_list) { + suspend2_mark_task_as_pageset1(attention_list->task); + last = attention_list; + attention_list = attention_list->next; + kfree(last); + } + + HALT_ON(in_atomic() && !irqs_disabled()); + + for_each_zone(zone) { + if (!zone->present_pages) + continue; + for (i = 0; i < zone->spanned_pages; i++) { + struct page *page = pfn_to_page(zone->zone_start_pfn + i); + BUG_ON(PagePageset2(page) && PageSlab(page)); + } + } + + HALT_ON(in_atomic() && !irqs_disabled()); + +} + +/* suspend2_get_nonconflicting_pages + * + * Description: Gets higher-order pages that won't be overwritten + * while copying the original pages. + * + * Note that if even one of the allocated pages overlaps + * with a pageset 1 page, another set must be + * tried. Therefore, you shouldn't use this function + * much, and not with high orders. 
+ */ + +unsigned long suspend2_get_nonconflicting_pages(const int order) +{ + struct page *page; + unsigned long new_page, i; + int more = 0; + + do { + new_page = __get_free_pages(GFP_ATOMIC | __GFP_NOWARN, order); + if (!new_page) + return 0; + page = virt_to_page(new_page); + more = 0; + for (i = 0; i < (1UL << order); i++) { + if (PagePageset1(page + i)) { + more = 1; + break; + } + } + if (more) { + for (i = 0; i < (1UL << order); i++) + if (PagePageset1(page + i)) + SetPageConflicting(page + i); + else { + set_page_count(page + i, 1); + __free_pages(page + i, 0); + } + } + } + while (more); + + memset((void*)new_page, 0, PAGE_SIZE * (1< + * + * This file is released under the GPLv2. + * + * Declarations for routines for handling pagesets. + */ + +/* Pagedir + * + * Contains the metadata for a set of pages saved in the image. + */ + +struct pagedir { + int pageset_size; + int lastpageset_size; +}; + +extern struct pagedir pagedir1, pagedir2; + +extern void suspend2_copy_pageset1(void); + +extern void suspend2_free_extra_pagedir_memory(void); + +extern int suspend2_allocate_extra_pagedir_memory(struct pagedir *p, int pageset_size, int alloc_from); + +extern void suspend2_mark_task_as_pageset1 (struct task_struct *t); +extern void suspend2_mark_pages_for_pageset2(void); + +extern void suspend2_relocate_if_required(unsigned long *current_value, unsigned int size); +extern int suspend2_get_pageset1_load_addresses(void); + +extern int extra_pagedir_pages_allocated; + +extern unsigned long suspend2_get_nonconflicting_pages(int order); diff -urN oldtree/kernel/power/pageflags.c newtree/kernel/power/pageflags.c --- oldtree/kernel/power/pageflags.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/pageflags.c 2006-02-13 14:51:54.138934888 -0500 @@ -0,0 +1,139 @@ +/* + * kernel/power/suspend2_core/pageflags.c + * + * Copyright (C) 2004-2005 Nigel Cunningham + * + * This file is released under the GPLv2. 
+ * + * Routines for dynamically allocating and releasing bitmaps + * used as pseudo-pageflags. + * + * Arrays are not contiguous. The first sizeof(void *) bytes are + * the pointer to the next page in the bitmap. This allows us to + * 1) work under low memory conditions where order 0 might be all + * that's available + * 2) save the pages at suspend time, reload and relocate them as + * necessary at resume time without breaking anything (cf + * extent pages). + */ + +#include +#include +#include +#include +#include +#include +#include "pageflags.h" +#include "plugins.h" +#include "pagedir.h" + +/* Maps used in copying the image back are in builtin.c */ +dyn_pageflags_t pageset1_map; +dyn_pageflags_t pageset1_copy_map; +dyn_pageflags_t pageset2_map; +dyn_pageflags_t in_use_map; +dyn_pageflags_t allocd_pages_map; +#ifdef CONFIG_DEBUG_PAGEALLOC +dyn_pageflags_t unmap_map; +#endif +dyn_pageflags_t checksum_map; + +static int pages_for_zone(struct zone *zone) +{ + return (zone->spanned_pages + (PAGE_SIZE << 3) - 1) / + (PAGE_SIZE << 3); +} + +/* save_dyn_pageflags + * + * Description: Save a set of pageflags. + * Arguments: dyn_pageflags_t *: Pointer to the bitmap being saved. + */ + +void save_dyn_pageflags(dyn_pageflags_t pagemap) +{ + int i, zone_num = 0; + struct zone *zone; + + if (!*pagemap) + return; + + for_each_zone(zone) { + int size = pages_for_zone(zone); + active_writer->ops.writer.write_header_chunk((char *) &zone_num, sizeof(int)); + active_writer->ops.writer.write_header_chunk((char *) &size, sizeof(int)); + + for (i = 0; i < size; i++) + active_writer->ops.writer.write_header_chunk((char *) pagemap[zone_num][i], PAGE_SIZE); + zone_num++; + } + zone_num = -1; + active_writer->ops.writer.write_header_chunk((char *) &zone_num, sizeof(int)); +} + +/* load_dyn_pageflags + * + * Description: Load a set of pageflags. + * Arguments: dyn_pageflags_t *: Pointer to the bitmap being loaded. + * (It must be allocated before calling this routine). 
+ */ + +void load_dyn_pageflags(dyn_pageflags_t pagemap) +{ + int i, zone_num = 0, zone_check = 0; + struct zone *zone; + + if (!pagemap) + return; + + for_each_zone(zone) { + int size = 0; + active_writer->ops.writer.read_header_chunk((char *) &zone_check, sizeof(int)); + if (zone_check != zone_num) { + printk("Zone check (%d) != zone_num (%d).\n", zone_check, zone_num); + BUG(); + } + active_writer->ops.writer.read_header_chunk((char *) &size, sizeof(int)); + + for (i = 0; i < size; i++) + active_writer->ops.writer.read_header_chunk((char *) pagemap[zone_num][i], PAGE_SIZE); + zone_num++; + } + active_writer->ops.writer.read_header_chunk((char *) &zone_check, sizeof(int)); + if (zone_check != -1) { + printk("Didn't read end of dyn pageflag data marker.(%x)\n", zone_check); + BUG(); + } +} + +/* relocate_dyn_pageflags + * + * Description: Relocate a set of pageflags to ensure they don't collide with + * pageset 1 data which will get overwritten on copyback. + * Arguments: dyn_pageflags_t *: Pointer to the bitmap being relocated. 
+ */ + +extern int num_zones(void); + +void relocate_dyn_pageflags(dyn_pageflags_t *pagemap) +{ + int i, zone_num = 0; + struct zone *zone; + + if (!*pagemap) + return; + + suspend2_relocate_if_required((void *) pagemap, sizeof (void *) * num_zones()); + + for_each_zone(zone) { + int pages = (zone->spanned_pages + (PAGE_SIZE << 3) - 1) >> + (PAGE_SHIFT + 3); + + suspend2_relocate_if_required((void *) &((*pagemap)[zone_num]), sizeof(void *) * pages); + + for (i = 0; i < pages; i++) + suspend2_relocate_if_required((void *) &((*pagemap)[zone_num][i]), + PAGE_SIZE); + zone_num++; + } +} diff -urN oldtree/kernel/power/pageflags.h newtree/kernel/power/pageflags.h --- oldtree/kernel/power/pageflags.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/pageflags.h 2006-02-13 14:51:54.138934888 -0500 @@ -0,0 +1,86 @@ +/* + * kernel/power/pageflags.h + * + * Copyright (C) 2004-2005 Nigel Cunningham + * + * This file is released under the GPLv2. + * + * Suspend2 needs a few pageflags while working that aren't otherwise + * used. To save the struct page pageflags, we dynamically allocate + * a bitmap and use that. These are the only non order-0 allocations + * we do. + * + * NOTE!!! + * We assume that PAGE_SIZE - sizeof(void *) is a multiple of + * sizeof(unsigned long). Is this ever false? + */ + +#include +#include + +extern dyn_pageflags_t in_use_map; +extern dyn_pageflags_t allocd_pages_map; +#ifdef CONFIG_DEBUG_PAGEALLOC +extern dyn_pageflags_t unmap_map; +#endif +extern dyn_pageflags_t pageset2_map; +extern dyn_pageflags_t conflicting_pages_map; +extern dyn_pageflags_t checksum_map; + +/* + * inusemap is used in two ways: + * - During suspend, to tag pages which are not used (to speed up + * count_data_pages); + * - During resume, to tag pages which are in pagedir1. This does not tag + * pagedir2 pages, so !== first use. 
+ */ + +#define PageInUse(page) (test_dynpageflag(&in_use_map, page)) +#define SetPageInUse(page) (set_dynpageflag(&in_use_map, page)) +#define ClearPageInUse(page) (clear_dynpageflag(&in_use_map, page)) + +#define PagePageset1(page) (test_dynpageflag(&pageset1_map, page)) +#define SetPagePageset1(page) (set_dynpageflag(&pageset1_map, page)) +#define ClearPagePageset1(page) (clear_dynpageflag(&pageset1_map, page)) + +#define PagePageset1Copy(page) (test_dynpageflag(&pageset1_copy_map, page)) +#define SetPagePageset1Copy(page) (set_dynpageflag(&pageset1_copy_map, page)) +#define ClearPagePageset1Copy(page) (clear_dynpageflag(&pageset1_copy_map, page)) + +#define PagePageset2(page) (test_dynpageflag(&pageset2_map, page)) +#define SetPagePageset2(page) (set_dynpageflag(&pageset2_map, page)) +#define ClearPagePageset2(page) (clear_dynpageflag(&pageset2_map, page)) + +#define PageAllocd(page) (test_dynpageflag(&allocd_pages_map, page)) +#define SetPageAllocd(page) (set_dynpageflag(&allocd_pages_map, page)) +#define ClearPageAllocd(page) (clear_dynpageflag(&allocd_pages_map, page)) + +#ifdef CONFIG_DEBUG_PAGEALLOC +#define PagePageUnmap(page) (test_dynpageflag(&unmap_map, page)) +#define SetPagePageUnmap(page) (set_dynpageflag(&unmap_map, page)) +#define ClearPagePageUnmap(page) (clear_dynpageflag(&unmap_map, page)) +#endif + +static inline int PageChecksumIgnore(struct page *page) +{ + return checksum_map ? 
+ test_dynpageflag(&checksum_map, page) : + 0; +} + +static inline void SetPageChecksumIgnore(struct page *page) +{ + if (checksum_map) + set_dynpageflag(&checksum_map, page); +}; + +static inline void ClearPageChecksumIgnore(struct page *page) +{ + if (checksum_map) + clear_dynpageflag(&checksum_map, page); +}; + +extern void save_dyn_pageflags(dyn_pageflags_t pagemap); +extern void load_dyn_pageflags(dyn_pageflags_t pagemap); +void relocate_dyn_pageflags(dyn_pageflags_t *pagemap); + diff -urN oldtree/kernel/power/plugins.c newtree/kernel/power/plugins.c --- oldtree/kernel/power/plugins.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/plugins.c 2006-02-13 14:51:54.138934888 -0500 @@ -0,0 +1,312 @@ +/* + * kernel/power/plugins.c + * + * Copyright (C) 2004-2005 Nigel Cunningham + * + */ + +#include +#include +#include "suspend2.h" +#include "plugins.h" + +struct list_head suspend_filters, suspend_writers, suspend_plugins; +struct suspend_plugin_ops *active_writer = NULL; +static int num_filters = 0, num_ui = 0; +int num_writers = 0, num_plugins = 0; + +/* + * header_storage_for_plugins + * + * Returns the amount of space needed to store configuration + * data needed by the plugins prior to copying back the original + * kernel. We can exclude data for pageset2 because it will be + * available anyway once the kernel is copied back. + */ +unsigned long header_storage_for_plugins(void) +{ + struct suspend_plugin_ops *this_plugin; + unsigned long bytes = 0; + + list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) { + if (this_plugin->disabled) + continue; + if (this_plugin->storage_needed) + bytes += this_plugin->storage_needed(); + } + + return bytes; +} + +/* + * memory_for_plugins + * + * Returns the amount of memory requested by plugins for + * doing their work during the cycle. 
+ */ + +unsigned long memory_for_plugins(void) +{ + unsigned long bytes = 0; + struct suspend_plugin_ops *this_plugin; + + list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) { + if (this_plugin->disabled) + continue; + if (this_plugin->memory_needed) + bytes += this_plugin->memory_needed(); + } + + return ((bytes + PAGE_SIZE - 1) >> PAGE_SHIFT); +} + +/* find_plugin_given_name + * Functionality : Return a plugin (if found), given a pointer + * to its name + */ + +struct suspend_plugin_ops *find_plugin_given_name(char *name) +{ + struct suspend_plugin_ops *this_plugin, *found_plugin = NULL; + + list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) { + if (!strcmp(name, this_plugin->name)) { + found_plugin = this_plugin; + break; + } + } + + return found_plugin; +} + +/* + * print_plugin_debug_info + * Functionality : Get debugging info from plugins into a buffer. + */ +int print_plugin_debug_info(char *buffer, int buffer_size) +{ + struct suspend_plugin_ops *this_plugin; + int len = 0; + + list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) { + if (this_plugin->disabled) + continue; + if (this_plugin->print_debug_info) { + int result; + result = this_plugin->print_debug_info(buffer + len, + buffer_size - len); + len += result; + } + } + + return len; +} + +/* + * suspend_register_plugin + * + * Register a plugin. + */ +int suspend_register_plugin(struct suspend_plugin_ops *plugin) +{ + if (find_plugin_given_name(plugin->name)) + return -EBUSY; + + switch (plugin->type) { + case FILTER_PLUGIN: + list_add_tail(&plugin->ops.filter.filter_list, + &suspend_filters); + num_filters++; + break; + + case WRITER_PLUGIN: + list_add_tail(&plugin->ops.writer.writer_list, + &suspend_writers); + num_writers++; + break; + + case MISC_PLUGIN: + break; + + default: + printk("Hmmm. Plugin '%s' has an invalid type." 
+ " It has been ignored.\n", plugin->name); + return -EINVAL; + } + list_add_tail(&plugin->plugin_list, &suspend_plugins); + num_plugins++; + + return 0; +} + +/* + * suspend_unregister_plugin + * + * Remove a plugin. + */ +void suspend_unregister_plugin(struct suspend_plugin_ops *plugin) +{ + switch (plugin->type) { + case FILTER_PLUGIN: + list_del(&plugin->ops.filter.filter_list); + num_filters--; + break; + + case WRITER_PLUGIN: + list_del(&plugin->ops.writer.writer_list); + num_writers--; + if (active_writer == plugin) { + active_writer = NULL; + set_suspend_state(SUSPEND_DISABLED); + } + break; + + case MISC_PLUGIN: + break; + + default: + printk("Hmmm. Plugin '%s' has an invalid type." + " It has been ignored.\n", plugin->name); + return; + } + list_del(&plugin->plugin_list); + num_plugins--; +} + +/* + * suspend_move_plugin_tail + * + * Rearrange plugins when reloading the config. + */ +void suspend_move_plugin_tail(struct suspend_plugin_ops *plugin) +{ + switch (plugin->type) { + case FILTER_PLUGIN: + if (num_filters > 1) + list_move_tail(&plugin->ops.filter.filter_list, + &suspend_filters); + break; + + case WRITER_PLUGIN: + if (num_writers > 1) + list_move_tail(&plugin->ops.writer.writer_list, + &suspend_writers); + break; + + case MISC_PLUGIN: + break; + default: + printk("Hmmm. Plugin '%s' has an invalid type." + " It has been ignored.\n", plugin->name); + return; + } + if ((num_filters + num_writers + num_ui) > 1) + list_move_tail(&plugin->plugin_list, &suspend_plugins); +} + +/* + * suspend2_initialise_plugins + * + * Get ready to do some work! 
+ */ +int suspend2_initialise_plugins(int starting_cycle) +{ + struct suspend_plugin_ops *this_plugin; + int result; + + list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) { + if (this_plugin->disabled) + continue; + if (this_plugin->initialise) { + suspend_message(SUSPEND_MEMORY, SUSPEND_MEDIUM, 1, + "Initialising plugin %s.\n", + this_plugin->name); + if ((result = this_plugin->initialise(starting_cycle))) { + printk("%s didn't initialise okay.\n", + this_plugin->name); + return result; + } + } + } + + return 0; +} + +/* + * suspend2_cleanup_plugins + * + * Tell plugins the work is done. + */ +void suspend2_cleanup_plugins(int finishing_cycle) +{ + struct suspend_plugin_ops *this_plugin; + + list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) { + if (this_plugin->disabled) + continue; + if (this_plugin->cleanup) { + suspend_message(SUSPEND_MEMORY, SUSPEND_MEDIUM, 1, + "Cleaning up plugin %s.\n", + this_plugin->name); + this_plugin->cleanup(finishing_cycle); + } + } +} + +/* + * get_next_filter + * + * Get the next filter in the pipeline. + */ +struct suspend_plugin_ops *get_next_filter(struct suspend_plugin_ops *filter_sought) +{ + struct suspend_plugin_ops *last_filter = NULL, *this_filter = NULL; + + list_for_each_entry(this_filter, &suspend_filters, ops.filter.filter_list) { + if (this_filter->disabled) + continue; + if ((last_filter == filter_sought) || (!filter_sought)) + return this_filter; + last_filter = this_filter; + } + + return active_writer; +} + +/* suspend2_get_modules + * + * Take a reference to modules so they can't go away under us. + */ + +int suspend2_get_modules(void) +{ + struct suspend_plugin_ops *this_plugin; + + list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) { + if (!try_module_get(this_plugin->module)) { + /* Failed! 
Reverse gets and return error */ + struct suspend_plugin_ops *this_plugin2; + list_for_each_entry(this_plugin2, &suspend_plugins, plugin_list) { + if (this_plugin == this_plugin2) + return -EINVAL; + module_put(this_plugin2->module); + } + } + } + + return 0; +} + +/* suspend2_put_modules + * + * Release our references to modules we used. + */ + +void suspend2_put_modules(void) +{ + struct suspend_plugin_ops *this_plugin; + + list_for_each_entry(this_plugin, &suspend_plugins, plugin_list) { + module_put(this_plugin->module); + } +} diff -urN oldtree/kernel/power/plugins.h newtree/kernel/power/plugins.h --- oldtree/kernel/power/plugins.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/plugins.h 2006-02-13 14:51:54.139934736 -0500 @@ -0,0 +1,179 @@ +/* + * kernel/power/plugin.h + * + * Copyright (C) 2004-2005 Nigel Cunningham + * + * This file is released under the GPLv2. + * + * It contains declarations for plugins. Plugins are additions to + * suspend2 that provide facilities such as image compression or + * encryption, backends for storage of the image and user interfaces. + * + */ + +/* This is the maximum size we store in the image header for a plugin name */ +#define SUSPEND_MAX_PLUGIN_NAME_LENGTH 30 + +/* Per-plugin metadata */ +struct plugin_header { + char name[SUSPEND_MAX_PLUGIN_NAME_LENGTH]; + int disabled; + int type; + int index; + int data_length; + unsigned long signature; +}; + +extern int num_plugins, num_writers; + +enum { + FILTER_PLUGIN, + WRITER_PLUGIN, + MISC_PLUGIN, // Block writer, eg. 
+ CHECKSUM_PLUGIN +}; + +enum { + SUSPEND_ASYNC, + SUSPEND_SYNC +}; + +struct suspend_filter_ops { + /* Writing the image proper */ + int (*write_chunk) (struct page *buffer_page); + + /* Reading the image proper */ + int (*read_chunk) (struct page *buffer_page, int sync); + + /* Reset plugin if image exists but reading aborted */ + void (*noresume_reset) (void); + struct list_head filter_list; +}; + +struct suspend_writer_ops { + + /* Writing the image proper */ + int (*write_chunk) (struct page *buffer_page); + + /* Reading the image proper */ + int (*read_chunk) (struct page *buffer_page, int sync); + + /* Reset plugin if image exists but reading aborted */ + void (*noresume_reset) (void); + + /* Calls for allocating storage */ + + int (*storage_available) (void); // Maximum size of image we can save + // (incl. space already allocated). + + int (*storage_allocated) (void); + // Amount of storage already allocated + int (*release_storage) (void); + + /* + * Header space is allocated separately. Note that allocation + * of space for the header might result in allocated space + * being stolen from the main pool if there is no unallocated + * space. We have to be able to allocate enough space for + * the header. We can eat memory to ensure there is enough + * for the main pool. 
+ */ + int (*allocate_header_space) (int space_requested); + int (*allocate_storage) (int space_requested); + + /* Read and write the metadata */ + int (*write_header_init) (void); + int (*write_header_chunk) (char *buffer_start, int buffer_size); + int (*write_header_cleanup) (void); + + int (*read_header_init) (void); + int (*read_header_chunk) (char *buffer_start, int buffer_size); + int (*read_header_cleanup) (void); + + /* Prepare metadata to be saved (relativise/absolutise extents) */ + int (*serialise_extents) (void); + int (*load_extents) (void); + + /* Attempt to parse an image location */ + int (*parse_sig_location) (char *buffer, int only_writer); + + /* Determine whether image exists that we can restore */ + int (*image_exists) (void); + + /* Mark the image as having tried to resume */ + void (*mark_resume_attempted) (void); + + /* Destroy image if one exists */ + int (*invalidate_image) (void); + + /* Wait on I/O */ + int (*wait_on_io) (int flush_all); + + struct list_head writer_list; +}; + +struct suspend_plugin_ops { + /* Functions common to all plugins */ + int type; + char *name; + struct module *module; + int disabled; + struct list_head plugin_list; + + /* Bytes! */ + unsigned long (*memory_needed) (void); + unsigned long (*storage_needed) (void); + + int (*print_debug_info) (char *buffer, int size); + int (*save_config_info) (char *buffer); + void (*load_config_info) (char *buffer, int len); + + /* Initialise & cleanup - general routines called + * at the start and end of a cycle. 
*/ + int (*initialise) (int starting_cycle); + void (*cleanup) (int finishing_cycle); + + int (*write_init) (int stream_number); + int (*write_cleanup) (void); + + int (*read_init) (int stream_number); + int (*read_cleanup) (void); + + union { + struct suspend_filter_ops filter; + struct suspend_writer_ops writer; + } ops; +}; + +extern struct suspend_plugin_ops *active_writer; +extern struct list_head suspend_filters, suspend_writers, suspend_plugins; + +extern void prepare_console_plugins(void); +extern void cleanup_console_plugins(void); + +extern struct suspend_plugin_ops *find_plugin_given_name(char *name); +extern struct suspend_plugin_ops *get_next_filter(struct suspend_plugin_ops *); + +extern int suspend_register_plugin(struct suspend_plugin_ops *plugin); +extern void suspend_move_plugin_tail(struct suspend_plugin_ops *plugin); + +extern unsigned long header_storage_for_plugins(void); +extern unsigned long memory_for_plugins(void); + +extern int print_plugin_debug_info(char *buffer, int buffer_size); +extern int suspend_register_plugin(struct suspend_plugin_ops *plugin); +extern void suspend_unregister_plugin(struct suspend_plugin_ops *plugin); + +extern int suspend2_initialise_plugins(int starting_cycle); +extern void suspend2_cleanup_plugins(int finishing_cycle); + +int suspend2_get_modules(void); +void suspend2_put_modules(void); + +static inline void suspend_initialise_plugin_lists(void) { + INIT_LIST_HEAD(&suspend_filters); + INIT_LIST_HEAD(&suspend_writers); + INIT_LIST_HEAD(&suspend_plugins); +} + +extern int expected_compression_ratio(void); diff -urN oldtree/kernel/power/power.h newtree/kernel/power/power.h --- oldtree/kernel/power/power.h 2006-01-02 22:21:10.000000000 -0500 +++ newtree/kernel/power/power.h 2006-02-13 14:51:54.139934736 -0500 @@ -1,6 +1,8 @@ #include #include +#include "suspend.h" + /* With SUSPEND_CONSOLE defined suspend looks *really* cool, but we probably do not take enough locks for switching consoles, etc, so bad things might 
happen. @@ -48,15 +50,12 @@ extern struct subsystem power_subsys; -extern int freeze_processes(void); -extern void thaw_processes(void); - extern int pm_prepare_console(void); extern void pm_restore_console(void); /* References to section boundaries */ -extern const void __nosave_begin, __nosave_end; +//extern const void __nosave_begin, __nosave_end; extern unsigned int nr_copy_pages; extern suspend_pagedir_t *pagedir_nosave; diff -urN oldtree/kernel/power/power_off.c newtree/kernel/power/power_off.c --- oldtree/kernel/power/power_off.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/power_off.c 2006-02-13 14:51:54.139934736 -0500 @@ -0,0 +1,79 @@ +/* + * kernel/power/suspend2_core/power_off.c + * + * Copyright (C) 2005 Nigel Cunningham + * + * This file is released under the GPLv2. + * + * Support for powering down. + */ + +#include +#include +#include +#include +#include +#include "suspend2_common.h" +#include "suspend2.h" +#include "ui.h" + +unsigned long suspend2_powerdown_method = 0; /* 0 - Kernel power off */ + +extern struct pm_ops *pm_ops; + +/* Use suspend_enter from main.c */ +extern int suspend_enter(suspend_state_t state); + +int try_pm_state_powerdown(void) +{ + if (pm_ops && pm_ops->prepare && suspend2_powerdown_method && + pm_ops->prepare(suspend2_powerdown_method)) + return 0; + + if (suspend2_powerdown_method > 3) + kernel_power_off_prepare(); + else { + if (device_suspend(PMSG_SUSPEND)) { + printk(KERN_ERR "Some devices failed to suspend\n"); + return 0; + } + } + + if (suspend_enter(suspend2_powerdown_method)) + return 0; + + device_resume(); + + if (pm_ops && pm_ops->finish && suspend2_powerdown_method) + pm_ops->finish(suspend2_powerdown_method); + + return 1; +} + +/* + * suspend_power_down + * Functionality : Powers down or reboots the computer once the image + * has been written to disk. 
+ * Key Assumptions : Able to reboot/power down via code called or that + * the warning emitted if the calls fail will be visible + * to the user (i.e. printk resumes devices). + * Called From : do_suspend2_suspend_2 + */ + +void suspend_power_down(void) +{ + if (test_action_state(SUSPEND_REBOOT)) { + suspend2_prepare_status(DONT_CLEAR_BAR, "Ready to reboot."); + kernel_restart(NULL); + } + + if (pm_ops && pm_ops->enter && suspend2_powerdown_method && try_pm_state_powerdown()) + return; + + lock_kernel(); + kernel_power_off(); + suspend2_prepare_status(DONT_CLEAR_BAR, "Powerdown failed"); + while (1) + cpu_relax(); +} + diff -urN oldtree/kernel/power/power_off.h newtree/kernel/power/power_off.h --- oldtree/kernel/power/power_off.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/power_off.h 2006-02-13 14:51:54.139934736 -0500 @@ -0,0 +1,13 @@ +/* + * kernel/power/suspend2_core/power_off.h + * + * Copyright (C) 2005 Nigel Cunningham + * + * This file is released under the GPLv2. + * + * Support for powering down. + */ + +int suspend_pm_state_finish(void); +void suspend_power_down(void); +extern unsigned long suspend2_powerdown_method; diff -urN oldtree/kernel/power/prepare_image.c newtree/kernel/power/prepare_image.c --- oldtree/kernel/power/prepare_image.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/prepare_image.c 2006-02-13 14:51:54.203925008 -0500 @@ -0,0 +1,784 @@ +/* + * kernel/power/prepare_image.c + * + * Copyright (C) 2003-2005 Nigel Cunningham + * + * This file is released under the GPLv2. + * + * We need to eat memory until we can: + * 1. Perform the save without changing anything (RAM_NEEDED < max_pfn) + * 2. Fit it all in available space (active_writer->available_space() >= + * storage_needed()) + * 3. Reload the pagedir and pageset1 to places that don't collide with their + * final destinations, not knowing to what extent the resumed kernel will + * overlap with the one loaded at boot time. 
I think the resumed kernel + * should overlap completely, but I don't want to rely on this as it is + * an unproven assumption. We therefore assume there will be no overlap at + * all (worst case). + * 4. Meet the user's requested limit (if any) on the size of the image. + * The limit is in MB, so pages/256 (assuming 4K pages). + * + */ + +#include +#include + +#include "suspend2.h" +#include "pageflags.h" +#include "plugins.h" +#include "suspend2_common.h" +#include "io.h" +#include "ui.h" +#include "extent.h" +#include "prepare_image.h" +#include "checksum.h" + +static int are_frozen = 0, num_nosave = 0; +static int header_space_allocated = 0; +static int storage_allocated = 0; +static int storage_available = 0; +int extra_pd1_pages_allowance = 100; + +static int num_pcp_pages(void) +{ + struct zone *zone; + int result = 0, i = 0; + + /* PCP lists */ + for_each_zone(zone) { + struct per_cpu_pageset *pset; + int cpu; + + if (!zone->present_pages) + continue; + + for (cpu = 0; cpu < NR_CPUS; cpu++) { + if (!cpu_possible(cpu)) + continue; + + pset = zone_pcp(zone, cpu); + + for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) { + struct per_cpu_pages *pcp; + + pcp = &(pset->pcp[i]); + result += pcp->count; + } + } + } + return result; +} + +int real_nr_free_pages(void) +{ + return nr_free_pages() + num_pcp_pages(); +} + +static void get_extra_pd1_allowance(void) +{ + int orig_num_free = real_nr_free_pages(), final; + + suspend2_prepare_status(CLEAR_BAR, "Finding allowance for drivers."); + device_suspend(PMSG_FREEZE); + local_irq_disable(); /* irqs might have been re-enabled on us */ + device_power_down(PMSG_FREEZE); + + final = real_nr_free_pages(); + + device_power_up(); + local_irq_enable(); + + device_resume(); + + extra_pd1_pages_allowance = orig_num_free - final + 100; +} + +static int main_storage_needed(int use_ecr, + int ignore_extra_pd1_allow) +{ + return ((pagedir1.pageset_size + pagedir2.pageset_size + + (ignore_extra_pd1_allow ?
0 : extra_pd1_pages_allowance)) * + (use_ecr ? suspend2_expected_compression_ratio() : 100) / 100); +} + +static int header_storage_needed(void) +{ + unsigned long bytes = ((extents_allocated * 2 * sizeof(unsigned long)) + + sizeof(struct suspend_header) + + sizeof(struct plugin_header) + + (int) header_storage_for_plugins() + + (dyn_pageflags_pages_per_bitmap() << PAGE_SHIFT) + + num_plugins * + (sizeof(struct plugin_header) + sizeof(int))); + + return ((int) ((bytes + (int) PAGE_SIZE - 1) >> PAGE_SHIFT)); +} + +static void display_stats(int always, int sub_extra_pd1_allow) +{ + unsigned long storage_allocated = active_writer->ops.writer.storage_allocated(); + char buffer[255]; + snprintf(buffer, 254, + "Free:%d(%d). Sets:%d(%d),%d(%d). Header:%d. Nosave:%d-%d=%d. Storage:%lu/%u(%u). Needed:%d|%d|%d.\n", + + /* Free */ + nr_free_pages(), + nr_free_pages() - nr_free_highpages(), + + /* Sets */ + pagedir1.pageset_size, pageset1_sizelow, + pagedir2.pageset_size, pageset2_sizelow, + + /* Header */ + header_storage_needed(), + + /* Nosave */ + num_nosave, extra_pagedir_pages_allocated, + num_nosave - extra_pagedir_pages_allocated, + + /* Storage - converted to pages for comparison */ + storage_allocated, + storage_needed(1, sub_extra_pd1_allow), + storage_available, + + /* Needed */ + ram_to_suspend() - nr_free_pages() - nr_free_highpages(), + storage_needed(1, sub_extra_pd1_allow) - storage_available, + (image_size_limit > 0) ? (storage_needed(1, sub_extra_pd1_allow) - (image_size_limit << 8)) : 0); + if (always) + printk(buffer); + else + suspend_message(SUSPEND_EAT_MEMORY, SUSPEND_MEDIUM, 1, buffer); +} + +/* generate_free_page_map + * + * Description: This routine generates a bitmap of free pages from the + * lists used by the memory manager. We then use the bitmap + * to quickly calculate which pages to save and in which + * pagesets. 
+ */ +static void generate_free_page_map(void) +{ + int i, order, loop, cpu; + struct page *page; + unsigned long flags; + struct zone *zone; + struct per_cpu_pageset *pset; + + for_each_zone(zone) { + if (!zone->present_pages) + continue; + for(i=0; i < zone->spanned_pages; i++) + SetPageInUse(pfn_to_page(zone->zone_start_pfn + i)); + } + + for_each_zone(zone) { + if (!zone->present_pages) + continue; + spin_lock_irqsave(&zone->lock, flags); + for (order = MAX_ORDER - 1; order >= 0; --order) { + list_for_each_entry(page, &zone->free_area[order].free_list, lru) + for(loop=0; loop < (1 << order); loop++) { + ClearPageInUse(page+loop); + ClearPagePageset2(page+loop); + } + } + + + for (cpu = 0; cpu < NR_CPUS; cpu++) { + if (!cpu_possible(cpu)) + continue; + + pset = zone_pcp(zone, cpu); + + for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) { + struct per_cpu_pages *pcp; + struct page *page; + + pcp = &pset->pcp[i]; + list_for_each_entry(page, &pcp->list, lru) { + ClearPageInUse(page); + ClearPagePageset2(page); + } + } + } + + spin_unlock_irqrestore(&zone->lock, flags); + } +} + +/* size_of_free_region + * + * Description: Return the number of pages that are free, beginning with and + * including this one. 
+ */ +static int size_of_free_region(struct page *page) +{ + struct zone *zone = page_zone(page); + struct page *posn = page, *last_in_zone = + zone->zone_mem_map + zone->spanned_pages - 1; + + while (posn < last_in_zone && !PageInUse(posn)) + posn++; + return (posn - page); +} + +static struct page *rodata_start, *rodata_end; +static struct page *rotext_start, *rotext_end; +static struct page *nosave_start, *nosave_end; +static struct page *rtas_start, *rtas_end; +#ifdef CONFIG_PPC_RTAS +extern char __start_rodata, __end_rodata; +extern unsigned int rtas_data, rtas_size; +#endif +#ifdef CONFIG_PPC +extern char _etext[]; +#else +extern char _text[], _etext[]; +#endif + +#ifdef CONFIG_X86_32 /* 2.6.15 and later */ +extern int bad_ppro; + +/* + * Copied from arch/i386/mm/init.c. It should be moved to + * an include file after testing. + */ +static inline int page_kills_ppro(unsigned long pagenr) +{ + if (pagenr >= 0x70000 && pagenr <= 0x7003F) + return 1; + return 0; +} + +#else +#define bad_ppro (0) +#define page_kills_ppro(pfn) (0) +#endif + +static __init int page_nosave_init(void) +{ +#ifdef CONFIG_DEBUG_RODATA + rodata_start = virt_to_page(&__start_rodata); + rodata_end = virt_to_page(&__end_rodata); +#endif +#ifdef CONFIG_PPC + rotext_start = virt_to_page(PAGE_OFFSET); +#else + rotext_start = virt_to_page(&_text); +#endif + rotext_end = virt_to_page(&_etext); + + nosave_start = virt_to_page(&__nosave_begin); + nosave_end = virt_to_page(((char *) &__nosave_end) - 1); + +#ifdef CONFIG_PPC_RTAS + rtas_start = virt_to_page(__va(rtas_data)); + rtas_end = virt_to_page(__va(rtas_data) + rtas_size); +#endif + return 0; +} + +subsys_initcall(page_nosave_init); + +static int Suspend2PageNosave(int pfn) +{ + struct page *page = pfn_to_page(pfn); + + return ( +#ifdef CONFIG_DEBUG_RODATA + (page >= rodata_start && page <= rodata_end) || +#endif +#ifdef CONFIG_DEBUG_ROTEXT + (page >= rotext_start && page <= rotext_end) || +#endif + (page >= nosave_start && page <= 
nosave_end) || +#ifdef CONFIG_PPC_RTAS + (page >= rtas_start && page <= rtas_end) || +#endif + PageAllocd(page) || +#ifdef PPC_PMAC + (agp_special_page && page == virt_to_page(agp_special_page)) || +#endif + !pfn_valid(pfn) || + (bad_ppro && page_kills_ppro(pfn)) || + (PageReserved(page) && PageHighMem(page)) || + (checksum_map && PageChecksumIgnore(page)) || + !page_is_ram(pfn)); +} + +/* count_data_pages + * + * This routine generates our lists of pages to be stored in each + * pageset. Since we store the data using extents, and adding new + * extents might allocate a new extent page, this routine may well + * be called more than once. + */ +static struct pageset_sizes_result count_data_pages(void) +{ + int chunk_size, loop, num_free = 0; + int use_pagedir2; + struct pageset_sizes_result result; + struct zone *zone; + + result.size1 = 0; + result.size1low = 0; + result.size2 = 0; + result.size2low = 0; + + num_nosave = 0; + + clear_dyn_pageflags(pageset1_map); + clear_dyn_pageflags(pageset1_copy_map); + + generate_free_page_map(); + + if (test_result_state(SUSPEND_ABORTED)) + return result; + + /* + * Pages not to be saved are marked Nosave irrespective of being reserved + */ + for_each_zone(zone) { + for (loop = 0; loop < zone->spanned_pages; loop++) { + int pfn = zone->zone_start_pfn + loop; + struct page *page = pfn_to_page(pfn); + + int new_nosave = + Suspend2PageNosave(pfn); + +#if 0 + int old_nosave = (PageNosave(page) || + !pfn_valid(pfn) || + !page_is_ram(pfn) || + (page >= nosave_start && page <= nosave_end) || + (PageReserved(page) && PageHighMem(page))); + /* Complain loudly so that Nigel hears about it. */ + if (old_nosave != new_nosave) { + suspend2_prepare_status(0, + "Page %d oldnosave(%d) [%d|%d|%d|%d] != newnosave(%d) " + "[%d|%d|%d|%d|%d|%d|%d|%d|%d].
Page is ram=%d.\n", + pfn, + old_nosave, + PageNosave(page), + !pfn_valid(pfn), + (page >= nosave_start && page <= nosave_end), + (PageReserved(page) && PageHighMem(page)), + new_nosave, + !pfn_valid(pfn), + (bad_ppro && page_kills_ppro(pfn)), + (page >= rodata_start && page <= rodata_end), + (page >= rotext_start && page <= rotext_end), + (page >= nosave_start && page <= nosave_end), + (page >= rtas_start && page <= rtas_end), + PageAllocd(page), + (PageReserved(page) && PageHighMem(page)), + (checksum_map && PageChecksumIgnore(page)), + page_is_ram(pfn)); + } +#endif + + if (Suspend2PageNosave(pfn)) { + num_nosave++; + continue; + } + + + if ((chunk_size=size_of_free_region(page))!=0) { + num_free += chunk_size; + loop += chunk_size - 1; + continue; + } + + use_pagedir2 = PagePageset2(page); + + if (use_pagedir2) { + result.size2++; + if (!PageHighMem(page)) + result.size2low++; + SetPagePageset1Copy(page); + } else { + result.size1++; + SetPagePageset1(page); + if (!PageHighMem(page)) + result.size1low++; + } + } + } + + suspend_message(SUSPEND_EAT_MEMORY, SUSPEND_MEDIUM, 0, + "Count data pages: Set1 (%d) + Set2 (%d) + Nosave (%d) + NumFree (%d) = %d.\n", + result.size1, result.size2, num_nosave, num_free, + result.size1 + result.size2 + num_nosave + num_free); + BITMAP_FOR_EACH_SET(allocd_pages_map, loop) + SetPagePageset1Copy(pfn_to_page(loop)); + return result; +} + +/* amount_needed + * + * Calculates the amount by which the image size needs to be reduced to meet + * our constraints. + */ +static int amount_needed(int use_image_size_limit) +{ + + int max1 = max( (int) (ram_to_suspend() - real_nr_free_pages() - + nr_free_highpages()), + ((int) (storage_needed(1, 0) - + storage_available))); + if (use_image_size_limit) + return max( max1, + (image_size_limit > 0) ? + ((int) (storage_needed(1, 0) - (image_size_limit << 8))) : 0); + return max1; +} + +/* suspend2_recalculate_stats + * + * Eaten is the number of pages which have been eaten. 
+ * Pagedirincluded is the number of pages which have been allocated for the pagedir. + */ +struct pageset_sizes_result suspend2_recalculate_stats(int storage_unavailable) +{ + struct pageset_sizes_result result; + + suspend2_mark_pages_for_pageset2(); /* Need to call this before getting pageset1_size! */ + BUG_ON(in_atomic() && !irqs_disabled()); + result = count_data_pages(); + pageset1_sizelow = result.size1low; + pageset2_sizelow = result.size2low; + pagedir1.lastpageset_size = pagedir1.pageset_size = result.size1; + pagedir2.lastpageset_size = pagedir2.pageset_size = result.size2; + if (!storage_unavailable) { + storage_available = active_writer->ops.writer.storage_available(); + display_stats(0, 0); + } + BUG_ON(in_atomic() && !irqs_disabled()); + return result; +} + +/* update_image + * + * Allocate [more] memory and storage for the image. + */ +static int update_image(void) +{ + struct pageset_sizes_result result; + int result2, param_used; + + result = suspend2_recalculate_stats(0); + + if (suspend2_allocate_checksum_pages()) { + suspend_message(SUSPEND_ANY_SECTION, SUSPEND_LOW, 1, + "Still need to get more pages for checksum pages.\n"); + return 1; + } + + /* Include allowance for growth in pagedir1 while writing pagedir 2 */ + if (suspend2_allocate_extra_pagedir_memory(&pagedir1, + pagedir1.pageset_size + extra_pd1_pages_allowance, + pageset2_sizelow)) { + suspend_message(SUSPEND_EAT_MEMORY, SUSPEND_LOW, 1, + "Still need to get more pages for pagedir 1.\n"); + return 1; + } + + thaw_processes(FREEZER_KERNEL_THREADS); + + param_used = main_storage_needed(1, 0); + if ((result2 = active_writer->ops.writer.allocate_storage(param_used))) { + suspend_message(SUSPEND_EAT_MEMORY, SUSPEND_LOW, 1, + "Allocate storage returned %d. 
Still need to get more storage space for the image proper.\n", + result2); + storage_allocated = active_writer->ops.writer.storage_allocated(); + if (freeze_processes()) { + set_result_state(SUSPEND_FREEZING_FAILED); + set_result_state(SUSPEND_ABORTED); + } + return 1; + } + + param_used = header_storage_needed(); + if ((result2 = active_writer->ops.writer.allocate_header_space(param_used))) { + suspend_message(SUSPEND_EAT_MEMORY, SUSPEND_LOW, 1, + "Still need to get more storage space for header.\n"); + if (freeze_processes()) { + set_result_state(SUSPEND_FREEZING_FAILED); + set_result_state(SUSPEND_ABORTED); + } + storage_allocated = active_writer->ops.writer.storage_allocated(); + return 1; + } + + header_space_allocated = param_used; + + /* + * Allocate remaining storage space, if possible, up to the + * maximum we know we'll need. It's okay to allocate the + * maximum if the writer is the swapwriter, but + * we don't want to grab all available space on an NFS share. + * We therefore ignore the expected compression ratio here, + * thereby trying to allocate the maximum image size we could + * need (assuming compression doesn't expand the image), but + * don't complain if we can't get the full amount we're after. + */ + + active_writer->ops.writer.allocate_storage( + min(storage_available, + main_storage_needed(0, 1))); + + storage_allocated = active_writer->ops.writer.storage_allocated(); + + if (freeze_processes()) { + set_result_state(SUSPEND_FREEZING_FAILED); + set_result_state(SUSPEND_ABORTED); + } + + suspend2_recalculate_stats(0); + + suspend_message(SUSPEND_EAT_MEMORY, SUSPEND_LOW, 1, + "Amount still needed (%d) > 0:%d. 
Header: %d < %d: %d," + " Storage allocd: %d < %d + %d: %d.\n", + amount_needed(0), + (amount_needed(0) > 0), + header_space_allocated, header_storage_needed(), + header_space_allocated < header_storage_needed(), + storage_allocated, + header_storage_needed(), main_storage_needed(1, 1), + storage_allocated < + (header_storage_needed() + main_storage_needed(1, 1))); + + check_shift_keys(0, NULL); + + return ((amount_needed(0) > 0) || + header_space_allocated < header_storage_needed() || + storage_allocated < + (header_storage_needed() + main_storage_needed(1, 1))); +} + +/* attempt_to_freeze + * + * Try to freeze processes. + */ + +static int attempt_to_freeze(void) +{ + int result; + + /* Stop processes before checking again */ + thaw_processes(FREEZER_ALL_THREADS); + suspend2_prepare_status(CLEAR_BAR, "Freezing processes"); + result = freeze_processes(); + + if (result) { + set_result_state(SUSPEND_ABORTED); + set_result_state(SUSPEND_FREEZING_FAILED); + } else + are_frozen = 1; + + return result; +} + +int storage_needed(int use_ecr, int ignore_extra_pd1_allow) +{ + return (main_storage_needed(use_ecr, ignore_extra_pd1_allow) + + header_storage_needed()); +} + +int ram_to_suspend(void) +{ + return (1 + + max((pagedir1.pageset_size + extra_pd1_pages_allowance - + pageset2_sizelow), 0) + + MIN_FREE_RAM + memory_for_plugins()); +} + + +/* eat_memory + * + * Try to free some memory, either to meet hard or soft constraints on the image + * characteristics. + * + * Hard constraints: + * - Pageset1 must be < half of memory; + * - We must have enough memory free at resume time to have pageset1 + * be able to be loaded in pages that don't conflict with where it has to + * be restored. + * Soft constraints: + * - User specified image size limit.
+ */ +static int eat_memory(void) +{ + int orig_memory_still_to_eat, last_amount_needed = 0, times_criteria_met = 0; + int free_flags = 0, did_eat_memory = 0; + + /* + * Note that if we have enough storage space and enough free memory, we may + * exit without eating anything. We give up when the last 10 iterations ate + * no extra pages because we're not going to get much more anyway, but + * the few pages we get will take a lot of time. + * + * We freeze processes before beginning, and then unfreeze them if we + * need to eat memory until we think we have enough. If our attempts + * to freeze fail, we give up and abort. + */ + + /* -- Stage 1: Freeze Processes -- */ + + + suspend2_recalculate_stats(0); + + orig_memory_still_to_eat = amount_needed(1); + last_amount_needed = orig_memory_still_to_eat; + + switch (image_size_limit) { + case -1: /* Don't eat any memory */ + if (orig_memory_still_to_eat) { + set_result_state(SUSPEND_ABORTED); + set_result_state(SUSPEND_WOULD_EAT_MEMORY); + } + break; + case -2: /* Free caches only */ + free_flags = GFP_NOIO | __GFP_HIGHMEM; + break; + default: + free_flags = GFP_ATOMIC | __GFP_HIGHMEM; + } + + thaw_processes(FREEZER_KERNEL_THREADS); + + /* -- Stage 2: Eat memory -- */ + + while (((amount_needed(1) > 0) || (image_size_limit == -2)) && + (!test_result_state(SUSPEND_ABORTED)) && + (times_criteria_met < 10)) { + int amount_freed; + int amount_wanted = orig_memory_still_to_eat - amount_needed(1); + + suspend2_prepare_status(CLEAR_BAR, "Seeking to free %dMB of memory.", MB(amount_needed(1))); + + if (amount_wanted < 1) + amount_wanted = 1; /* image_size_limit == -2 */ + + if (orig_memory_still_to_eat) + suspend2_update_status(orig_memory_still_to_eat - amount_needed(1), + orig_memory_still_to_eat, + " Image size %d ", + MB(storage_needed(1, 0))); + else + suspend2_update_status(0, 1, "Image size %d ", + MB(storage_needed(1, 0))); + + if ((last_amount_needed - amount_needed(1)) < 10) + times_criteria_met++; + else + 
times_criteria_met = 0; + last_amount_needed = amount_needed(1); + amount_freed = shrink_all_memory(last_amount_needed); + suspend2_recalculate_stats(0); + + did_eat_memory = 1; + + check_shift_keys(0, NULL); + } + + if (freeze_processes()) { + set_result_state(SUSPEND_FREEZING_FAILED); + set_result_state(SUSPEND_ABORTED); + } + + if (did_eat_memory) { + unsigned long orig_state = get_suspend_state(); + /* Freeze_processes will call sys_sync too */ + restore_suspend_state(orig_state); + suspend2_recalculate_stats(0); + } + + /* Blank out image size display */ + suspend2_update_status(100, 100, NULL); + + if (!test_result_state(SUSPEND_ABORTED)) { + /* Include image size limit when checking what to report */ + if (amount_needed(1) - extra_pd1_pages_allowance > 0) + set_result_state(SUSPEND_UNABLE_TO_FREE_ENOUGH_MEMORY); + + /* But don't include it when deciding whether to abort (soft limit) */ + if ((amount_needed(0) - extra_pd1_pages_allowance > 0)) { + printk("Unable to free sufficient memory to suspend. Still need %d pages.\n", + amount_needed(1)); + display_stats(1, 1); + set_result_state(SUSPEND_ABORTED); + } + + check_shift_keys(1, "Memory eating completed."); + } + + return 0; +} + +/* prepare_image + * + * Entry point to the whole image preparation section. + * + * We do four things: + * - Freeze processes; + * - Ensure image size constraints are met; + * - Complete all the preparation for saving the image, + * including allocation of storage. The only memory + * that should be needed when we're finished is that + * for actually storing the image (and we know how + * much is needed for that because the plugins tell + * us). + * - Make sure that all dirty buffers are written out. 
+ */ + +#define MAX_TRIES 4 +int suspend2_prepare_image(void) +{ + int result = 1, tries = 0; + + are_frozen = 0; + + header_space_allocated = 0; + + if (attempt_to_freeze()) + return 0; + + if (!extra_pd1_pages_allowance) + get_extra_pd1_allowance(); + + storage_available = active_writer->ops.writer.storage_available(); + + if (!storage_available) { + printk(KERN_ERR "You need some storage available to be able to suspend.\n"); + set_result_state(SUSPEND_ABORTED); + set_result_state(SUSPEND_NOSTORAGE_AVAILABLE); + return 0; + } + + do { + suspend2_prepare_status(CLEAR_BAR, "Preparing Image."); + + if (eat_memory() || test_result_state(SUSPEND_ABORTED)) + break; + + result = update_image(); + + check_shift_keys(0, NULL); + + tries++; + + } while ((result) && (tries < MAX_TRIES) && (!test_result_state(SUSPEND_ABORTED)) && + (!test_result_state(SUSPEND_UNABLE_TO_FREE_ENOUGH_MEMORY))); + + if (tries == MAX_TRIES) { + abort_suspend("Unable to get sufficient storage for the image.\n"); + display_stats(1, 0); + } + + check_shift_keys(1, "Image preparation complete."); + + return !result; +} diff -urN oldtree/kernel/power/prepare_image.h newtree/kernel/power/prepare_image.h --- oldtree/kernel/power/prepare_image.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/prepare_image.h 2006-02-13 14:51:54.203925008 -0500 @@ -0,0 +1,31 @@ +/* + * kernel/power/prepare_image.h + */ + +extern int suspend2_prepare_image(void); +extern struct pageset_sizes_result suspend2_recalculate_stats(int storage_available); +extern int real_nr_free_pages(void); +extern int image_size_limit; +extern int pageset1_sizelow, pageset2_sizelow; + +struct pageset_sizes_result { + int size1; /* Can't be unsigned - breaks MAX function */ + int size1low; + int size2; + int size2low; +}; + +#ifdef CONFIG_CRYPTO +extern int suspend2_expected_compression_ratio(void); +#else +static inline int suspend2_expected_compression_ratio(void) +{ + return 0; +}; +#endif + +#define MIN_FREE_RAM (max_low_pfn >> 
7) + +extern int extra_pd1_pages_allowance; +extern int storage_needed(int use_ecr, int ignore_extra_p1_allowance); +extern int ram_to_suspend(void); diff -urN oldtree/kernel/power/proc.c newtree/kernel/power/proc.c --- oldtree/kernel/power/proc.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/proc.c 2006-02-13 14:51:54.204924856 -0500 @@ -0,0 +1,305 @@ +/* + * /kernel/power/proc.c + * + * Copyright (C) 2002-2005 Nigel Cunningham + * + * This file is released under the GPLv2. + * + * This file contains support for proc entries for tuning Suspend2. + * + * We have a generic handler that deals with the most common cases, and + * hooks for special handlers to use. + */ + +#include +#include +#include + +#include "proc.h" +#include "suspend2.h" +#include "storage.h" + +static int suspend_proc_initialised = 0; + +static struct list_head suspend_proc_entries; +static struct proc_dir_entry *suspend_dir; +static struct suspend_proc_data proc_params[]; + +extern void __suspend2_try_resume(void); +extern void suspend2_main(void); + +/* suspend2_read_proc + * + * Generic handling for reading the contents of bits, integers, + * unsigned longs and strings. 
+ */ +static int suspend2_read_proc(char *page, char **start, off_t off, int count, + int *eof, void *data) +{ + int len = 0; + struct suspend_proc_data *proc_data = (struct suspend_proc_data *) data; + + if (suspend_start_anything(0)) + return -EBUSY; + + if (proc_data->needs_storage_manager & 1) + suspend2_prepare_usm(); + + switch (proc_data->type) { + case SUSPEND_PROC_DATA_CUSTOM: + if (proc_data->data.special.read_proc) { + read_proc_t *read_proc = proc_data->data.special.read_proc; + len = read_proc(page, start, off, count, eof, data); + } else + len = 0; + break; + case SUSPEND_PROC_DATA_BIT: + len = sprintf(page, "%d\n", + -test_bit(proc_data->data.bit.bit, + proc_data->data.bit.bit_vector)); + break; + case SUSPEND_PROC_DATA_INTEGER: + { + int *variable = proc_data->data.integer.variable; + len = sprintf(page, "%d\n", *variable); + break; + } + case SUSPEND_PROC_DATA_UL: + { + unsigned long *variable = proc_data->data.ul.variable; + len = sprintf(page, "%lu\n", *variable); + break; + } + case SUSPEND_PROC_DATA_STRING: + { + char *variable = proc_data->data.string.variable; + len = sprintf(page, "%s\n", variable); + break; + } + } + /* Side effect routine? */ + if (proc_data->read_proc) + proc_data->read_proc(); + + if (len <= count) + *eof = 1; + + if (proc_data->needs_storage_manager & 1) + suspend2_cleanup_usm(); + + suspend_finish_anything(0); + + return len; +} +/* suspend2_write_proc + * + * Generic routine for handling writing to files representing + * bits, integers, unsigned longs and strings.
+ */ + +static int suspend2_write_proc(struct file *file, const char *buffer, + unsigned long count, void *data) +{ + struct suspend_proc_data *proc_data = (struct suspend_proc_data *) data; + char *my_buf = (char *) get_zeroed_page(GFP_ATOMIC); + int result = count, assigned_temp_buffer = 0; + + if (!my_buf) + return -ENOMEM; + + if (count > PAGE_SIZE - 1) + count = PAGE_SIZE - 1; /* leave room for the terminating NUL */ + + if (copy_from_user(my_buf, buffer, count)) + return -EFAULT; + + if (suspend_start_anything(proc_data == &proc_params[0])) + return -EBUSY; + + my_buf[count] = 0; + + if (proc_data->needs_storage_manager & 2) + suspend2_prepare_usm(); + + switch (proc_data->type) { + case SUSPEND_PROC_DATA_CUSTOM: + if (proc_data->data.special.write_proc) { + write_proc_t *write_proc = proc_data->data.special.write_proc; + result = write_proc(file, buffer, count, data); + } + break; + case SUSPEND_PROC_DATA_BIT: + { + int value = simple_strtoul(my_buf, NULL, 0); + if (value) + set_bit(proc_data->data.bit.bit, + (proc_data->data.bit.bit_vector)); + else + clear_bit(proc_data->data.bit.bit, + (proc_data->data.bit.bit_vector)); + } + break; + case SUSPEND_PROC_DATA_INTEGER: + { + int *variable = proc_data->data.integer.variable; + int minimum = proc_data->data.integer.minimum; + int maximum = proc_data->data.integer.maximum; + *variable = simple_strtol(my_buf, NULL, 0); + if (((*variable) < minimum)) + *variable = minimum; + + if (((*variable) > maximum)) + *variable = maximum; + break; + } + case SUSPEND_PROC_DATA_UL: + { + unsigned long *variable = proc_data->data.ul.variable; + unsigned long minimum = proc_data->data.ul.minimum; + unsigned long maximum = proc_data->data.ul.maximum; + *variable = simple_strtoul(my_buf, NULL, 0); + + if (minimum && ((*variable) < minimum)) + *variable = minimum; + + if (maximum && ((*variable) > maximum)) + *variable = maximum; + break; + } + break; + case SUSPEND_PROC_DATA_STRING: + { + int copy_len = count; + char *variable = + proc_data->data.string.variable; + + if
(proc_data->data.string.max_length && + (copy_len > proc_data->data.string.max_length)) + copy_len = proc_data->data.string.max_length; + + if (!variable) { + proc_data->data.string.variable = + variable = (char *) get_zeroed_page(GFP_ATOMIC); + assigned_temp_buffer = 1; + } + strncpy(variable, my_buf, copy_len); + if ((copy_len) && + (my_buf[copy_len - 1] == '\n')) + variable[count - 1] = 0; + variable[count] = 0; + } + break; + } + free_page((unsigned long) my_buf); + /* Side effect routine? */ + if (proc_data->write_proc) + proc_data->write_proc(); + + /* Free temporary buffers */ + if (assigned_temp_buffer) { + free_page((unsigned long) proc_data->data.string.variable); + proc_data->data.string.variable = NULL; + } + + if (proc_data->needs_storage_manager & 2) + suspend2_cleanup_usm(); + + suspend_finish_anything(proc_data == &proc_params[0]); + + return result; +} + +/* Non-plugin proc entries. + * + * This array contains entries that are automatically registered at + * boot. Plugins and the console code register their own entries separately. + * + * NB: If you move do_suspend, change suspend2_write_proc's test so that + * suspend_start_anything still gets a 1 when the user echos > do_suspend! + */ + +static struct suspend_proc_data proc_params[] = { + { .filename = "do_suspend", + .permissions = PROC_WRITEONLY, + .type = SUSPEND_PROC_DATA_CUSTOM, + .write_proc = suspend2_main, + .needs_storage_manager = 2, + }, + + { .filename = "do_resume", + .permissions = PROC_WRITEONLY, + .type = SUSPEND_PROC_DATA_CUSTOM, + .write_proc = __suspend2_try_resume, + .needs_storage_manager = 2, + }, +}; + +/* suspend_initialise_proc + * + * Initialise the /proc/suspend2 directory. 
+ */ + +static void suspend_initialise_proc(void) +{ + int i; + int numfiles = sizeof(proc_params) / sizeof(struct suspend_proc_data); + + if (suspend_proc_initialised) + return; + + suspend_dir = proc_mkdir("suspend2", NULL); + + BUG_ON(!suspend_dir); + + INIT_LIST_HEAD(&suspend_proc_entries); + + suspend_proc_initialised = 1; + + for (i=0; i< numfiles; i++) + suspend_register_procfile(&proc_params[i]); +} + +/* suspend_register_procfile + * + * Helper for registering a new /proc/suspend2 entry. + */ + +struct proc_dir_entry *suspend_register_procfile( + struct suspend_proc_data *suspend_proc_data) +{ + struct proc_dir_entry *new_entry; + + if (!suspend_proc_initialised) + suspend_initialise_proc(); + + new_entry = create_proc_entry( + suspend_proc_data->filename, + suspend_proc_data->permissions, + suspend_dir); + if (new_entry) { + list_add_tail(&suspend_proc_data->proc_data_list, &suspend_proc_entries); + new_entry->read_proc = suspend2_read_proc; + new_entry->write_proc = suspend2_write_proc; + new_entry->data = suspend_proc_data; + } else { + printk("Error! create_proc_entry returned NULL.\n"); + INIT_LIST_HEAD(&suspend_proc_data->proc_data_list); + } + return new_entry; +} + +/* suspend_unregister_procfile + * + * Helper for removing unwanted /proc/suspend2 entries. + * + */ +void suspend_unregister_procfile(struct suspend_proc_data *suspend_proc_data) +{ + if (list_empty(&suspend_proc_data->proc_data_list)) + return; + + remove_proc_entry( + suspend_proc_data->filename, + suspend_dir); + list_del(&suspend_proc_data->proc_data_list); +} diff -urN oldtree/kernel/power/proc.h newtree/kernel/power/proc.h --- oldtree/kernel/power/proc.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/proc.h 2006-02-13 14:51:54.204924856 -0500 @@ -0,0 +1,70 @@ +/* + * kernel/power/proc.h + * + * Copyright (C) 2004-2005 Nigel Cunningham + * + * This file is released under the GPLv2. + * + * It provides declarations for suspend to use in managing + * /proc/suspend2. 
When we switch to kobjects, + * this will become redundant. + * + */ + +#include + +struct suspend_proc_data { + char *filename; + int permissions; + int type; + int needs_storage_manager; + union { + struct { + unsigned long *bit_vector; + int bit; + } bit; + struct { + int *variable; + int minimum; + int maximum; + } integer; + struct { + unsigned long *variable; + unsigned long minimum; + unsigned long maximum; + } ul; + struct { + char *variable; + int max_length; + } string; + struct { + read_proc_t *read_proc; + write_proc_t *write_proc; + void *data; + } special; + } data; + + /* Side effects routines. Used, eg, for reparsing the + * resume2 entry when it changes */ + void (*read_proc) (void); + void (*write_proc) (void); + struct list_head proc_data_list; +}; + +enum { + SUSPEND_PROC_DATA_NONE, + SUSPEND_PROC_DATA_CUSTOM, + SUSPEND_PROC_DATA_BIT, + SUSPEND_PROC_DATA_INTEGER, + SUSPEND_PROC_DATA_UL, + SUSPEND_PROC_DATA_STRING +}; + +#define PROC_WRITEONLY 0200 +#define PROC_READONLY 0400 +#define PROC_RW 0600 + +struct proc_dir_entry *suspend_register_procfile( + struct suspend_proc_data *suspend_proc_data); +void suspend_unregister_procfile(struct suspend_proc_data *suspend_proc_data); + diff -urN oldtree/kernel/power/process.c newtree/kernel/power/process.c --- oldtree/kernel/power/process.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/kernel/power/process.c 2006-02-13 14:51:54.206924552 -0500 @@ -1,134 +1,432 @@ /* - * drivers/power/process.c - Functions for starting/stopping processes on - * suspend transitions. + * kernel/power/process.c * - * Originally from swsusp. + * Copyright (C) 1998-2001 Gabor Kuti + * Copyright (C) 1998,2001,2002 Pavel Machek + * Copyright (C) 2002-2003 Florent Chabaud + * Copyright (C) 2002-2004 Nigel Cunningham + * + * This file is released under the GPLv2. 
+ * + * Freeze_and_free contains the routines software suspend uses to freeze other + * processes during the suspend cycle and to (if necessary) free up memory in + * accordance with limitations on the image size. + * + * Ideally, the image saved to disk would be an atomic copy of the entire + * contents of all RAM and related hardware state. One of the first + * prerequisites for getting our approximation of this is stopping the activity + * of other processes. We can't stop all other processes, however, since some + * are needed in doing the I/O to save the image. Freeze_and_free.c contains + * the routines that control suspension and resuming of these processes. + * + * Under high I/O load, we need to be careful about the order in which we + * freeze processes. If we freeze processes in the wrong order, we could + * deadlock others. The freeze_order array specifies the order in which + * critical processes are frozen. All others are suspended after these have + * entered the refrigerator. + * + * Another complicating factor is that freeing memory requires the processes + * to not be frozen, but at the end of freeing memory, they need to be frozen + * so that we can be sure we actually have eaten enough memory. This is why + * freezing and freeing are in the one file. The freezer is not called from + * the main logic, but indirectly, via the code for eating memory. The eat + * memory logic is iterative, first freezing processes and checking the stats, + * then (if necessary) unfreezing them and eating more memory until it looks + * like the criteria are met (at which point processes are frozen & stats + * checked again). */ - -#undef DEBUG - -#include -#include #include +#include #include +#include +#include +#include +#include + +unsigned long freezer_state = 0; + +#if 0 +//#ifdef CONFIG_PM_DEBUG +#define freezer_message(msg, a...) do { printk(msg, ##a); } while(0) +#else +#define freezer_message(msg, a...)
do { } while(0) +#endif + +/* Timeouts when freezing */ +#define FREEZER_TOTAL_TIMEOUT (5 * HZ) +#define FREEZER_CHECK_TIMEOUT (HZ / 10) + +DECLARE_COMPLETION(kernelspace_thaw); +DECLARE_COMPLETION(userspace_thaw); +static atomic_t nr_userspace_frozen; +static atomic_t nr_kernelspace_frozen; + +struct frozen_fs +{ + struct list_head fsb_list; + struct super_block *sb; +}; + +LIST_HEAD(frozen_fs_list); + +void freezer_make_fses_rw(void) +{ + struct frozen_fs *fs, *next_fs; + + list_for_each_entry_safe(fs, next_fs, &frozen_fs_list, fsb_list) { + thaw_bdev(fs->sb->s_bdev, fs->sb); + + list_del(&fs->fsb_list); + kfree(fs); + } +} /* - * Timeout for stopping processes + * Done after userspace is frozen, so there should be no danger of + * fses being unmounted while we're in here. */ -#define TIMEOUT (6 * HZ) +int freezer_make_fses_ro(void) +{ + struct frozen_fs *fs; + struct super_block *sb; + + /* Generate the list */ + list_for_each_entry(sb, &super_blocks, s_list) { + if (!sb->s_root || !sb->s_bdev || + (sb->s_frozen == SB_FREEZE_TRANS) || + (sb->s_flags & MS_RDONLY)) + continue; + fs = kmalloc(sizeof(struct frozen_fs), GFP_ATOMIC); + fs->sb = sb; + list_add_tail(&fs->fsb_list, &frozen_fs_list); + }; + + /* Do the freezing in reverse order so filesystems dependent + * upon others are frozen in the right order. (Eg loopback + * on ext3). */ + list_for_each_entry_reverse(fs, &frozen_fs_list, fsb_list) + freeze_bdev(fs->sb->s_bdev); -static inline int freezeable(struct task_struct * p) + return 0; +} + +/* + * freezeable + * + * Description: Determine whether a process should be frozen yet. + * Parameters: struct task_struct * The process to consider. + * int Boolean - 0 = userspace else all. + * Returns: int 0 if don't freeze yet, otherwise do. 
+ */ +static int freezeable(struct task_struct * p, int all_freezable) { if ((p == current) || + (p->flags & PF_FROZEN) || (p->flags & PF_NOFREEZE) || (p->exit_state == EXIT_ZOMBIE) || (p->exit_state == EXIT_DEAD) || (p->state == TASK_STOPPED) || - (p->state == TASK_TRACED)) + (p->state == TASK_TRACED) || + (!p->mm && !all_freezable)) return 0; return 1; } -/* Refrigerator is place where frozen processes are stored :-). */ -void refrigerator(void) +static void __freeze_process(struct completion *completion_handler, + atomic_t *nr_frozen) { - /* Hmm, should we be allowed to suspend when there are realtime - processes around? */ long save; - save = current->state; - pr_debug("%s entered refrigerator\n", current->comm); - printk("="); - frozen_process(current); - spin_lock_irq(&current->sighand->siglock); - recalc_sigpending(); /* We sent fake signal, clean it up */ - spin_unlock_irq(&current->sighand->siglock); - - while (frozen(current)) { - current->state = TASK_UNINTERRUPTIBLE; - schedule(); - } - pr_debug("%s left refrigerator\n", current->comm); + freezer_message("%s (%d) frozen.\n", + current->comm, current->pid); + save = current->state; + + atomic_inc(nr_frozen); + wait_for_completion(completion_handler); + atomic_dec(nr_frozen); + current->state = save; + freezer_message("%s (%d) leaving freezer.\n", + current->comm, current->pid); } -/* 0 = success, else # of processes that we failed to stop */ -int freeze_processes(void) +/* + * Invoked by the task todo list notifier when the task to be + * frozen is running. 
+ */ +static int freeze_process(struct notifier_block *nl, unsigned long x, void *v) { - int todo; - unsigned long start_time; - struct task_struct *g, *p; unsigned long flags; - printk( "Stopping tasks: " ); - start_time = jiffies; - do { - todo = 0; - read_lock(&tasklist_lock); - do_each_thread(g, p) { - if (!freezeable(p)) - continue; - if (frozen(p)) - continue; + might_sleep(); - freeze(p); - spin_lock_irqsave(&p->sighand->siglock, flags); - signal_wake_up(p, 0); - spin_unlock_irqrestore(&p->sighand->siglock, flags); - todo++; - } while_each_thread(g, p); - read_unlock(&tasklist_lock); - yield(); /* Yield is okay here */ - if (todo && time_after(jiffies, start_time + TIMEOUT)) { - printk( "\n" ); - printk(KERN_ERR " stopping tasks failed (%d tasks remaining)\n", todo ); - break; - } - } while(todo); - - /* This does not unfreeze processes that are already frozen - * (we have slightly ugly calling convention in that respect, - * and caller must call thaw_processes() if something fails), - * but it cleans up leftover PF_FREEZE requests. - */ - if (todo) { - read_lock(&tasklist_lock); - do_each_thread(g, p) - if (freezing(p)) { - pr_debug(" clean up: %s\n", p->comm); - p->flags &= ~PF_FREEZE; - spin_lock_irqsave(&p->sighand->siglock, flags); - recalc_sigpending_tsk(p); - spin_unlock_irqrestore(&p->sighand->siglock, flags); - } - while_each_thread(g, p); - read_unlock(&tasklist_lock); - return todo; + /* Locking to handle race against waking the process in + * freeze threads. 
*/ + spin_lock_irqsave(&current->sighand->siglock, flags); + current->flags |= PF_FROZEN; + + if (nl) + notifier_chain_unregister(&current->todo, nl); + + recalc_sigpending(); + spin_unlock_irqrestore(&current->sighand->siglock, flags); + + if (nl) + kfree(nl); + + if (test_freezer_state(FREEZER_ON)) { + if (current->mm) + __freeze_process(&userspace_thaw, &nr_userspace_frozen); + else + __freeze_process(&kernelspace_thaw, + &nr_kernelspace_frozen); } - printk( "|\n" ); - BUG_ON(in_atomic()); + spin_lock_irqsave(&current->sighand->siglock, flags); + recalc_sigpending(); + spin_unlock_irqrestore(&current->sighand->siglock, flags); + + current->flags &= ~PF_FROZEN; + return 0; } -void thaw_processes(void) +void thaw_processes(int do_all_threads) { + if (do_all_threads) { + clear_freezer_state(FREEZER_ON); + clear_freezer_state(ABORT_FREEZING); + } + + complete_all(&kernelspace_thaw); + while (atomic_read(&nr_kernelspace_frozen) > 0) + yield(); + + init_completion(&kernelspace_thaw); + freezer_make_fses_rw(); + + if (do_all_threads) { + complete_all(&userspace_thaw); + while (atomic_read(&nr_userspace_frozen) > 0) + yield(); + init_completion(&userspace_thaw); + } +} + +/* + * num_freezeable + * + * Description: Determine how many processes of our type are still to be + * frozen. As a side effect, update the progress bar too. + * Parameters: int Which type we are trying to freeze. + * int Whether we are displaying our progress. + */ +static int num_freezeable(int do_all_threads) { + struct task_struct *g, *p; + int todo_this_type = 0; - printk( "Restarting tasks..." ); read_lock(&tasklist_lock); do_each_thread(g, p) { - if (!freezeable(p)) + if (freezeable(p, do_all_threads)) + todo_this_type++; + } while_each_thread(g, p); + read_unlock(&tasklist_lock); + + return todo_this_type; +} + +/* + * num_uninterruptible + * + * Description: Determine how many processes of our type are in state + * task uninterruptible. + * Parameters: int Which type we are trying to freeze. 
+ */ +static int num_uninterruptible(int do_all_threads) { + + struct task_struct *g, *p; + int count = 0; + + read_lock(&tasklist_lock); + do_each_thread(g, p) { + if (freezeable(p, do_all_threads) && + p->state == TASK_UNINTERRUPTIBLE) + count++; + } while_each_thread(g, p); + read_unlock(&tasklist_lock); + + return count; +} + +/* + * Tell threads of the type to enter the freezer. + */ +static void signal_threads(int do_all_threads) +{ + struct task_struct *g, *p; + struct notifier_block *n; + + read_lock(&tasklist_lock); + do_each_thread(g, p) { + if (!freezeable(p, do_all_threads)) continue; - if (!thaw_process(p)) - printk(KERN_INFO " Strange, %s not stopped\n", p->comm ); + + n = kmalloc(sizeof(struct notifier_block), + GFP_ATOMIC); + + if (n) { + n->notifier_call = freeze_process; + n->priority = 0; + notifier_chain_register(&p->todo, n); + } } while_each_thread(g, p); + read_unlock(&tasklist_lock); +} + +/* + * Prod processes that haven't entered the refrigerator yet. + */ +static void prod_processes(int do_all_threads) +{ + struct task_struct *g, *p; + unsigned long flags; + + read_lock(&tasklist_lock); + do_each_thread(g, p) { + if (!freezeable(p, do_all_threads)) + continue; + + spin_lock_irqsave(&p->sighand->siglock, flags); + if (!(p->flags & PF_FROZEN)) { + recalc_sigpending(); + signal_wake_up(p, 0); + } + spin_unlock_irqrestore(&p->sighand->siglock, flags); + } while_each_thread(g, p); + read_unlock(&tasklist_lock); +} + +/* + * Freezer failure. + * + * Check whether we failed to freeze all the processes that + * should be frozen. If we find a task that failed to freeze, + * we give useful information on what failed and how. 
+ */ +static int freezer_failure(int do_all_threads) +{ + int result = 0; + struct task_struct *g, *p; + + read_lock(&tasklist_lock); + do_each_thread(g, p) { + if (!freezeable(p, do_all_threads) || + p->state == TASK_UNINTERRUPTIBLE) + continue; + if (!result) { + printk(KERN_ERR "Stopping tasks failed.\n"); + printk(KERN_ERR "Tasks that refused to be " + "refrigerated and haven't since exited:\n"); + set_freezer_state(ABORT_FREEZING); + result = 1; + } + + if ((freezing(p))) { + printk(" - %s (#%d) signalled but " + "didn't enter refrigerator.\n", + p->comm, p->pid); + } else + printk(" - %s (#%d) signalled " + "and todo list empty.\n", + p->comm, p->pid); + } while_each_thread(g, p); read_unlock(&tasklist_lock); - schedule(); - printk( " done\n" ); + + return result; +} + +/* + * freeze_threads + * + * Freeze a set of threads having particular attributes. + * + * Types: + * 2: User threads. + * 3: Kernel threads. + */ +static int freeze_threads(int do_all_threads) +{ + int result = 0, still_to_do; + unsigned long start_time = jiffies; + + if (do_all_threads) + freezer_make_fses_ro(); + + signal_threads(do_all_threads); + + /* Watch them do it, wake them if they ignore us. */ + do { + prod_processes(do_all_threads); + + set_task_state(current, TASK_INTERRUPTIBLE); + schedule_timeout(FREEZER_CHECK_TIMEOUT); + + still_to_do = num_freezeable(do_all_threads) - + num_uninterruptible(do_all_threads); + + } while(still_to_do && (!test_freezer_state(ABORT_FREEZING)) && + !time_after(jiffies, start_time + FREEZER_TOTAL_TIMEOUT)); + + /* + * Did we time out? See if we failed to freeze processes as well. + * + */ + if ((time_after(jiffies, start_time + FREEZER_TOTAL_TIMEOUT)) + && (still_to_do)) + result = freezer_failure(do_all_threads); + + BUG_ON(in_atomic()); + + return result; +} + +/* + * freeze_processes - Freeze processes prior to saving an image of memory. + * + * Return value: 0 = success, 1 = failure. 
+ */ +int freeze_processes(void) +{ + enum system_states old_state = system_state; + int result = 0; + + if (!test_freezer_state(FREEZER_ON)) { + /* + * No race. While !FREEZER_ON, processes + * won't enter __freeze_process + */ + init_completion(&userspace_thaw); + init_completion(&kernelspace_thaw); + set_freezer_state(FREEZER_ON); + } + + /* Now freeze processes that were syncing and are still running */ + if (freeze_threads(0) || (test_freezer_state(ABORT_FREEZING))) { + result = 1; + goto out; + } + + /* Freeze kernel threads */ + if (freeze_threads(1) || (test_freezer_state(ABORT_FREEZING))) + result = 1; + +out: + system_state = old_state; + return result; } -EXPORT_SYMBOL(refrigerator); +EXPORT_SYMBOL(freezer_state); diff -urN oldtree/kernel/power/snapshot.c newtree/kernel/power/snapshot.c --- oldtree/kernel/power/snapshot.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/kernel/power/snapshot.c 2006-02-13 14:51:54.207924400 -0500 @@ -146,7 +146,7 @@ return 0; page = pfn_to_page(pfn); - BUG_ON(PageReserved(page) && PageNosave(page)); + //BUG_ON(PageReserved(page) && PageNosave(page)); if (PageNosave(page)) return 0; if (PageReserved(page) && pfn_is_nosave(pfn)) { diff -urN oldtree/kernel/power/storage.c newtree/kernel/power/storage.c --- oldtree/kernel/power/storage.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/storage.c 2006-02-13 14:51:54.207924400 -0500 @@ -0,0 +1,323 @@ +/* + * kernel/power/storage.c + * + * Copyright (C) 2005 Nigel Cunningham + * + * This file is released under the GPLv2. + * + * Routines for talking to a userspace program that manages storage. 
+ * + * The kernel side: + * - starts the userspace program; + * - sends messages telling it when to open and close the connection; + * - tells it when to quit; + * + * The user space side: + * - passes messages regarding status; + * + */ + +#include +#include + +#include "proc.h" +#include "plugins.h" +#include "netlink.h" +#include "storage.h" +#include "ui.h" + +static struct user_helper_data usm_helper_data; +static struct suspend_plugin_ops usm_ops; +static int message_received = 0; +static int activations = 0; +static int usm_prepare_count = 0; +static int storage_manager_last_action = 0; +static int storage_manager_action = 0; + +static int usm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) +{ + int type; + int *data; + + type = nlh->nlmsg_type; + + /* A control message: ignore them */ + if (type < NETLINK_MSG_BASE) + return 0; + + /* Unknown message: reply with EINVAL */ + if (type >= USM_MSG_MAX) + return -EINVAL; + + /* All operations require privileges, even GET */ + if (security_netlink_recv(skb)) + return -EPERM; + + /* Only allow one task to receive NOFREEZE privileges */ + if (type == NETLINK_MSG_NOFREEZE_ME && usm_helper_data.pid != -1) + return -EBUSY; + + data = (int*)NLMSG_DATA(nlh); + + switch (type) { + case USM_MSG_SUCCESS: + case USM_MSG_FAILED: + message_received = type; + complete(&usm_helper_data.wait_for_process); + break; + default: + printk("Storage manager doesn't recognise message %d.\n", type); + } + + return 1; +} + +int suspend2_activate_storage(int force) +{ + int tries = 1; + + if (usm_helper_data.pid == -1 || usm_ops.disabled) + return 0; + + message_received = 0; + activations++; + + if (activations > 1 && !force) + return 0; + + while ((!message_received || message_received == USM_MSG_FAILED) && tries < 2) { + suspend2_prepare_status(DONT_CLEAR_BAR, "Activate storage attempt %d.\n", tries); + + init_completion(&usm_helper_data.wait_for_process); + + suspend2_send_netlink_message(&usm_helper_data, + USM_MSG_CONNECT, + 
NULL, 0); + + /* Wait 2 seconds for the userspace process to make contact */ + wait_for_completion_timeout(&usm_helper_data.wait_for_process, 2*HZ); + + tries++; + } + + return 0; +} + +int suspend2_deactivate_storage(int force) +{ + if (usm_helper_data.pid == -1 || usm_ops.disabled) + return 0; + + message_received = 0; + activations--; + + if (activations && !force) + return 0; + + init_completion(&usm_helper_data.wait_for_process); + + suspend2_send_netlink_message(&usm_helper_data, + USM_MSG_DISCONNECT, + NULL, 0); + + wait_for_completion_timeout(&usm_helper_data.wait_for_process, 2*HZ); + + if (!message_received || message_received == USM_MSG_FAILED) { + printk("Returning failure disconnecting storage.\n"); + return 1; + } + + return 0; +} + +#ifdef CONFIG_PM_DEBUG +static void storage_manager_simulate(void) +{ + printk("--- Storage manager simulate ---\n"); + suspend2_prepare_usm(); + schedule(); + printk("--- Deactivate storage 1 ---\n"); + suspend2_deactivate_storage(1); + schedule(); + printk("--- Activate storage 1 ---\n"); + suspend2_activate_storage(1); + schedule(); + printk("--- Cleanup usm ---\n"); + suspend2_cleanup_usm(); + schedule(); + printk("--- Storage manager simulate ends ---\n"); +} +#endif + +static unsigned long usm_storage_needed(void) +{ + return strlen(usm_helper_data.program); +} + +static int usm_save_config_info(char *buf) +{ + int len = strlen(usm_helper_data.program); + memcpy(buf, usm_helper_data.program, len); + return len; +} + +static void usm_load_config_info(char *buf, int size) +{ + /* Don't load the saved path if one has already been set */ + if (usm_helper_data.program[0]) + return; + + memcpy(usm_helper_data.program, buf, size); +} + +static unsigned long usm_memory_needed(void) +{ + /* ball park figure of 32 pages */ + return (32 * PAGE_SIZE); +} + +/* suspend2_prepare_usm + */ +int suspend2_prepare_usm(void) +{ + usm_prepare_count++; + + if (usm_prepare_count > 1 || usm_ops.disabled) + return 0; + + usm_helper_data.pid 
= -1; + + if (!*usm_helper_data.program) + return 0; + + suspend2_netlink_setup(&usm_helper_data); + + if (usm_helper_data.pid == -1) + printk("Suspend2 Storage Manager wanted, but couldn't start it.\n"); + + suspend2_activate_storage(0); + + return (usm_helper_data.pid != -1); +} + +void suspend2_cleanup_usm(void) +{ + usm_prepare_count--; + + if (usm_helper_data.pid > -1 && !usm_prepare_count) { + struct task_struct *t; + + suspend2_deactivate_storage(0); + + suspend2_send_netlink_message(&usm_helper_data, + NETLINK_MSG_CLEANUP, NULL, 0); + + read_lock(&tasklist_lock); + if ((t = find_task_by_pid(usm_helper_data.pid))) + t->flags &= ~PF_NOFREEZE; + read_unlock(&tasklist_lock); + + suspend2_netlink_close(&usm_helper_data); + + usm_helper_data.pid = -1; + } +} + +static void storage_manager_activate(void) +{ + if (storage_manager_action == storage_manager_last_action) + return; + + if (storage_manager_action) + suspend2_prepare_usm(); + else + suspend2_cleanup_usm(); + + storage_manager_last_action = storage_manager_action; +} + +/* + * User interface specific /proc/suspend entries. 
+ */ + +static struct suspend_proc_data proc_params[] = { + { .filename = "disable_storage_manager", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_INTEGER, + .data = { + .integer = { + .variable = &usm_ops.disabled, + .minimum = 0, + .maximum = 1, + } + } + }, + { .filename = "storage_manager", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_STRING, + .data = { + .string = { + .variable = usm_helper_data.program, + .max_length = 254, + } + } + }, + { .filename = "activate_storage", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_INTEGER, + .data = { + .integer = { + .variable = &storage_manager_action, + .minimum = 0, + .maximum = 1, + } + }, + .write_proc = storage_manager_activate, + }, + +#ifdef CONFIG_PM_DEBUG + { .filename = "simulate_atomic_copy", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_NONE, + .write_proc = storage_manager_simulate, + } +#endif +}; + +static struct suspend_plugin_ops usm_ops = { + .type = MISC_PLUGIN, + .name = "Userspace Storage Manager", + .module = THIS_MODULE, + .storage_needed = usm_storage_needed, + .save_config_info = usm_save_config_info, + .load_config_info = usm_load_config_info, + .memory_needed = usm_memory_needed, +}; + +/* suspend_usm_proc_init + * Description: Boot time initialisation for user interface. 
+ */ +static __init int suspend_usm_proc_init(void) +{ + int result, i, numfiles = sizeof(proc_params) / sizeof(struct suspend_proc_data); + + if (!(result = suspend_register_plugin(&usm_ops))) + for (i=0; i< numfiles; i++) + suspend_register_procfile(&proc_params[i]); + + usm_helper_data.nl = NULL; + usm_helper_data.program[0] = '\0'; + usm_helper_data.pid = -1; + usm_helper_data.skb_size = 0; + usm_helper_data.pool_limit = 6; + usm_helper_data.netlink_id = NETLINK_SUSPEND2_USM; + usm_helper_data.name = "userspace storage manager"; + usm_helper_data.rcv_msg = usm_user_rcv_msg; + usm_helper_data.interface_version = 1; + usm_helper_data.must_init = 0; + init_completion(&usm_helper_data.wait_for_process); + + return result; +} + +late_initcall(suspend_usm_proc_init); diff -urN oldtree/kernel/power/storage.h newtree/kernel/power/storage.h --- oldtree/kernel/power/storage.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/storage.h 2006-02-13 14:51:54.207924400 -0500 @@ -0,0 +1,21 @@ +/* + * + */ + +int suspend2_prepare_usm(void); +void suspend2_cleanup_usm(void); + +int suspend2_activate_storage(int force); +int suspend2_deactivate_storage(int force); + +enum { + USM_MSG_BASE = 0x10, + + /* Kernel -> Userspace */ + USM_MSG_CONNECT = 0x30, + USM_MSG_DISCONNECT = 0x31, + USM_MSG_SUCCESS = 0x40, + USM_MSG_FAILED = 0x41, + + USM_MSG_MAX, +}; diff -urN oldtree/kernel/power/suspend.c newtree/kernel/power/suspend.c --- oldtree/kernel/power/suspend.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/suspend.c 2006-02-13 14:51:54.209924096 -0500 @@ -0,0 +1,1112 @@ +/* + * kernel/power/suspend2.c + */ +/** \mainpage Suspend2. + * + * Suspend2 provides support for saving and restoring an image of + * system memory to an arbitrary storage device, either on the local computer, + * or across some network. The support is entirely OS based, so Suspend2 + * works without requiring BIOS, APM or ACPI support. 
The vast majority of the + * code is also architecture independent, so it should be very easy to port + * the code to new architectures. Suspend includes support for SMP, 4G HighMem + * and preemption. Initramfses and initrds are also supported. + * + * Suspend2 uses a modular design, in which the method of storing the image is + * completely abstracted from the core code, as are transformations on the data + * such as compression and/or encryption (multiple 'plugins' can be used to + * provide arbitrary combinations of functionality). The user interface is also + * modular, so that arbitrarily simple or complex interfaces can be used to + * provide anything from debugging information through to eye candy. + * + * \section Copyright + * + * Suspend2 is released under the GPLv2. + * + * Copyright (C) 1998-2001 Gabor Kuti
+ * Copyright (C) 1998,2001,2002 Pavel Machek
+ * Copyright (C) 2002-2003 Florent Chabaud
+ * Copyright (C) 2002-2005 Nigel Cunningham
+ * + * \section Credits + * + * Nigel would like to thank the following people for their work: + * + * Pavel Machek
+ * Modifications, defectiveness pointing, being with Gabor at the very beginning, + * suspend to swap space, stop all tasks. Port to 2.4.18-ac and 2.5.17. + * + * Steve Doddi
+ * Support the possibility of hardware state restoring. + * + * Raph
+ * Support for preserving states of network devices and virtual console + * (including X and svgatextmode) + * + * Kurt Garloff
+ * Straightened the critical function in order to prevent compilers from + * playing tricks with local variables. + * + * Andreas Mohr + * + * Alex Badea
+ * Fixed runaway init + * + * Jeff Snyder
+ * ACPI patch + * + * Nathan Friess
+ * Some patches. + * + * Michael Frank
+ * Extensive testing and help with improving stability. Nigel was constantly + * amazed by the quality and quantity of Michael's help. + * + * Bernard Blackham
+ * Web page & Wiki administration, some coding. Another person without whom + * Suspend would not be where it is. + * + * ..and of course the myriads of Suspend2 users who have helped diagnose + * and fix bugs, made suggestions on how to improve the code, proofread + * documentation, and donated time and money. + * + * Thanks also to corporate sponsors: + * + * Cyclades.com. Nigel's employers from Dec 2004, who allow him to work on + * Suspend and PM related issues on company time. + * + * LinuxFund.org. Sponsored Nigel's work on Suspend for four months Oct 2003 + * to Jan 2004. + * + * LAC Linux. Donated P4 hardware that enabled development and ongoing + * maintenance of SMP and Highmem support. + * + * OSDL. Provided access to various hardware configurations, make occasional + * small donations to the project. + */ + +#define SUSPEND_MAIN_C + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "version.h" +#include "suspend2.h" +#include "plugins.h" +#include "proc.h" +#include "pageflags.h" +#include "prepare_image.h" +#include "io.h" +#include "ui.h" +#include "suspend2_common.h" +#include "extent.h" +#include "power_off.h" +#include "atomic_copy.h" +#include "debug_pagealloc.h" +#include "storage.h" + +#ifdef CONFIG_X86 +#include /* for kernel_fpu_end */ +#endif + +/* Variables to be preserved over suspend */ +int pageset1_sizelow = 0, pageset2_sizelow = 0, image_size_limit = 0; +unsigned long suspend2_orig_mem_free = 0; + +static dyn_pageflags_t pageset1_check_map; +static dyn_pageflags_t pageset2_check_map; +static char *debug_info_buffer; +static char suspend_core_version[] = SUSPEND_CORE_VERSION; + +extern void do_suspend2_lowlevel(int resume); +extern __nosavedata char resume_commandline[COMMAND_LINE_SIZE]; + +unsigned long suspend_action = 0; +unsigned long suspend_result = 0; +unsigned long suspend_debug_state = 0; + +/* + * --- Variables ----- + * + * The following are used by the arch 
specific low level routines + * and only needed if suspend2 is compiled in. Other variables, + * used by the freezer even if suspend2 is not compiled in, are + * found in process.c + */ + +/*! How long I/O took. */ +int suspend_io_time[2][2]; + +/* Compression ratio */ +__nosavedata unsigned long bytes_in = 0, bytes_out = 0; + +/*! Pageset metadata. */ +struct pagedir pagedir1 = { 0, 0}, pagedir2 = { 0, 0}; + +/* Suspend2 variables used by built-in routines. */ + +/*! The number of suspends we have started (some may have been cancelled) */ +unsigned int nr_suspends = 0; + +/*! The console log level we default to. */ +int suspend_default_console_level = 0; + +/* + * For resume2= kernel option. It's pointless to compile + * suspend2 without any writers, but compilation shouldn't + * fail if you do. + */ + +unsigned long software_suspend_state = ((1 << SUSPEND_DISABLED) | (1 << SUSPEND_BOOT_TIME) | + (1 << SUSPEND_RESUME_NOT_DONE) | (1 << SUSPEND_IGNORE_LOGLEVEL)); + +mm_segment_t oldfs; + +char resume2_file[256] = CONFIG_SUSPEND2_DEFAULT_RESUME2; + +static atomic_t actions_running; + +/* + * Basic clean-up routine. + */ +void suspend_finish_anything(int finishing_cycle) +{ + if (atomic_dec_and_test(&actions_running)) { + suspend2_cleanup_plugins(finishing_cycle); + suspend2_put_modules(); + clear_suspend_state(SUSPEND_RUNNING); + } + + set_fs(oldfs); +} + +/* + * Basic set-up routine. 
+ */ +int suspend_start_anything(int starting_cycle) +{ + oldfs = get_fs(); + + if (atomic_add_return(1, &actions_running) == 1) { + set_fs(KERNEL_DS); + + set_suspend_state(SUSPEND_RUNNING); + + if (suspend2_get_modules()) { + printk("Get modules failed!\n"); + clear_suspend_state(SUSPEND_RUNNING); + set_fs(oldfs); + return -EBUSY; + } + + if (suspend2_initialise_plugins(starting_cycle)) { + printk("Initialise plugins failed!\n"); + suspend_finish_anything(starting_cycle); + return -EBUSY; + } + } + + return 0; +} + +/* + * save_image + * Result code (int): Zero on success, non zero on failure. + * Functionality : High level routine which performs the steps necessary + * to prepare and save the image after preparatory steps + * have been taken. + * Key Assumptions : Processes frozen, sufficient memory available, drivers + * suspended. + * Called from : suspend2_suspend_2 + */ + +static int save_image(void) +{ + int temp_result; + + suspend_message(SUSPEND_ANY_SECTION, SUSPEND_LOW, 1, + " - Final values: %d and %d.\n", + pagedir1.pageset_size, + pagedir2.pageset_size); + + check_shift_keys(1, "About to write pagedir2."); + + temp_result = write_pageset(&pagedir2, 2); + + if (temp_result == -1 || test_result_state(SUSPEND_ABORTED)) + return -1; + + check_shift_keys(1, "About to copy pageset 1."); + + suspend2_deactivate_storage(1); + + suspend2_prepare_status(DONT_CLEAR_BAR, "Doing atomic copy."); + + do_suspend2_lowlevel(0); + + return 0; +} + +/* + * Save the second part of the image. + */ +int save_image_part1(void) +{ + int temp_result, old_ps1_size = pagedir1.pageset_size; + dyn_pageflags_t temp; + + /* Quick switch: We want to compare the old stats with the new ones. 
*/ + temp = pageset1_map; + pageset1_map = pageset1_check_map; + pageset1_check_map = temp; + + temp = pageset2_map; + pageset2_map = pageset2_check_map; + pageset2_check_map = temp; + + BUG_ON(!irqs_disabled()); + + suspend2_recalculate_stats(1); + + if ((pagedir1.pageset_size - old_ps1_size) > extra_pd1_pages_allowance) { + abort_suspend("Pageset1 has grown by %d pages." + " Only %d growth is allowed for!\n", + pagedir1.pageset_size - old_ps1_size, + extra_pd1_pages_allowance); + return -1; + } + + suspend2_map_atomic_copy_pages(); + + BUG_ON(!irqs_disabled()); + + if (!test_action_state(SUSPEND_TEST_FILTER_SPEED) && + !test_action_state(SUSPEND_TEST_BIO)) + suspend2_copy_pageset1(); + + /* + * ---- FROM HERE ON, NEED TO REREAD PAGESET2 IF ABORTING!!! ----- + * + */ + + suspend2_unmap_atomic_copy_pages(); + +#ifdef CONFIG_X86 + kernel_fpu_end(); +#endif + + device_power_up(); + + local_irq_enable(); + + device_resume(); + + if (suspend2_activate_storage(1)) + panic("Failed to reactivate our storage."); + + suspend2_update_status(pagedir2.pageset_size, + pagedir1.pageset_size + pagedir2.pageset_size, + NULL); + + if (test_result_state(SUSPEND_ABORTED)) + goto abort_reloading_pagedir_two; + + check_shift_keys(1, "About to write pageset1."); + + /* + * End of critical section. + */ + + suspend_message(SUSPEND_ANY_SECTION, SUSPEND_LOW, 1, + "-- Writing pageset1\n"); + + temp_result = write_pageset(&pagedir1, 1); + + /* We didn't overwrite any memory, so no reread needs to be done. 
*/ + if (test_action_state(SUSPEND_TEST_FILTER_SPEED)) + return -1; + + if (temp_result == -1 || test_result_state(SUSPEND_ABORTED)) + goto abort_reloading_pagedir_two; + + check_shift_keys(1, "About to write header."); + + if (test_result_state(SUSPEND_ABORTED)) + goto abort_reloading_pagedir_two; + + temp_result = write_image_header(); + + if (test_action_state(SUSPEND_TEST_BIO)) + return -1; + + if (temp_result || (test_result_state(SUSPEND_ABORTED))) + goto abort_reloading_pagedir_two; + + check_shift_keys(1, "About to power down or reboot."); + + return 0; + +abort_reloading_pagedir_two: + temp_result = read_pageset2(1); + + /* If that failed, we're sunk. Panic! */ + if (temp_result) + panic("Attempt to reload pagedir 2 while aborting " + "a suspend failed."); + + return -1; + +} + +#define SNPRINTF(a...) len += snprintf_used(debug_info_buffer + len, \ + PAGE_SIZE - len - 1, ## a) + +static int io_MB_per_second(int read_write) +{ + if (!suspend_io_time[read_write][1]) + return 0; + + return MB((unsigned long) suspend_io_time[read_write][0]) * HZ / + suspend_io_time[read_write][1]; +} + +/* get_suspend_debug_info + * Functionality: Store debug info in a buffer. + * Called from: suspend_try_suspend. + */ + + +static int get_suspend_debug_info(void) +{ + int len = 0; + if (!debug_info_buffer) { + debug_info_buffer = (char *) get_zeroed_page(GFP_ATOMIC); + if (!debug_info_buffer) { + printk("Error! Unable to allocate buffer for " + "software suspend debug info.\n"); + return 0; + } + } + + SNPRINTF("Suspend2 debugging info:\n"); + SNPRINTF("- SUSPEND core : %s\n", SUSPEND_CORE_VERSION); + SNPRINTF("- Kernel Version : %s\n", UTS_RELEASE); + SNPRINTF("- Compiler vers. 
: %d.%d\n", __GNUC__, __GNUC_MINOR__); + SNPRINTF("- Attempt number : %d\n", nr_suspends); + SNPRINTF("- Parameters : %ld %ld %ld %d %d %ld\n", + suspend_result, + suspend_action, + suspend_debug_state, + suspend_default_console_level, + image_size_limit, + suspend2_powerdown_method); + SNPRINTF("- Overall expected compression percentage: %d.\n", + 100 - suspend2_expected_compression_ratio()); + len+= print_plugin_debug_info(debug_info_buffer + len, + PAGE_SIZE - len - 1); + if (suspend_io_time[0][1]) { + if ((io_MB_per_second(0) < 5) || (io_MB_per_second(1) < 5)) { + SNPRINTF("- I/O speed: Write %d KB/s", + (KB((unsigned long) suspend_io_time[0][0]) * HZ / + suspend_io_time[0][1])); + if (suspend_io_time[1][1]) + SNPRINTF(", Read %d KB/s", + (KB((unsigned long) suspend_io_time[1][0]) * HZ / + suspend_io_time[1][1])); + } else { + SNPRINTF("- I/O speed: Write %d MB/s", + (MB((unsigned long) suspend_io_time[0][0]) * HZ / + suspend_io_time[0][1])); + if (suspend_io_time[1][1]) + SNPRINTF(", Read %d MB/s", + (MB((unsigned long) suspend_io_time[1][0]) * HZ / + suspend_io_time[1][1])); + } + SNPRINTF(".\n"); + } + else + SNPRINTF("- No I/O speed stats available.\n"); + + return len; +} + +/* + * debuginfo_read_proc + * Functionality : Displays information that may be helpful in debugging + * software suspend. 
+ */ +int debuginfo_read_proc(char *page, char **start, off_t off, int count, + int *eof, void *data) +{ + int info_len, copy_len; + + info_len = get_suspend_debug_info(); + + copy_len = min(info_len - (int) off, count); + if (copy_len < 0) + copy_len = 0; + + if (copy_len) { + memcpy(page, debug_info_buffer + off, copy_len); + *start = page; + } + + if (copy_len + off == info_len) + *eof = 1; + + free_page((unsigned long) debug_info_buffer); + debug_info_buffer = NULL; + return copy_len; +} + +static int allocate_bitmaps(void) +{ + suspend_message(SUSPEND_MEMORY, SUSPEND_VERBOSE, 1, + "Allocating in_use_map\n"); + if (allocate_dyn_pageflags(&in_use_map) || + allocate_dyn_pageflags(&pageset1_map) || + allocate_dyn_pageflags(&pageset1_copy_map) || + allocate_dyn_pageflags(&allocd_pages_map) || + allocate_dyn_pageflags(&pageset2_map) || +#ifdef CONFIG_DEBUG_PAGEALLOC + allocate_dyn_pageflags(&unmap_map) || +#endif + allocate_dyn_pageflags(&pageset1_check_map) || + allocate_dyn_pageflags(&pageset2_check_map)) + return 1; + + return 0; +} + +static void free_metadata(void) +{ + free_dyn_pageflags(&pageset1_map); + free_dyn_pageflags(&pageset1_copy_map); + free_dyn_pageflags(&allocd_pages_map); + free_dyn_pageflags(&pageset2_map); + free_dyn_pageflags(&in_use_map); + free_dyn_pageflags(&pageset1_check_map); + free_dyn_pageflags(&pageset2_check_map); +} + +static int check_still_keeping_image(void) +{ + if (test_action_state(SUSPEND_KEEP_IMAGE)) { + printk("Image already stored: powering down immediately."); + suspend_power_down(); + return 1; /* Just in case we're using S3 */ + } + + printk("Invalidating previous image.\n"); + active_writer->ops.writer.invalidate_image(); + + return 0; +} + +static int suspend2_init(void) +{ + suspend_result = 0; + + printk(name_suspend "Initiating a software suspend cycle.\n"); + + nr_suspends++; + clear_suspend_state(SUSPEND_NOW_RESUMING); + + suspend_io_time[0][0] = suspend_io_time[0][1] = + suspend_io_time[1][0] = + 
suspend_io_time[1][1] = 0; + + suspend2_prepare_console(); + + free_metadata(); /* We might have kept it */ + + //attempt_to_parse_resume_device(); + + if (test_suspend_state(SUSPEND_DISABLED)) + return 0; + + if (allocate_bitmaps()) + return 0; + + disable_nonboot_cpus(); + + return 1; +} + +void suspend2_cleanup(void) +{ + int i; + + i = get_suspend_debug_info(); + + suspend2_free_extra_pagedir_memory(); + + pagedir1.pageset_size = pagedir2.pageset_size = 0; + + thaw_processes(FREEZER_KERNEL_THREADS); + +#ifdef CONFIG_SUSPEND2_KEEP_IMAGE + if (test_action_state(SUSPEND_KEEP_IMAGE) && + !test_result_state(SUSPEND_ABORTED)) { + suspend_message(SUSPEND_ANY_SECTION, SUSPEND_LOW, 1, + name_suspend "Not invalidating the image due " + "to Keep Image being enabled.\n"); + set_result_state(SUSPEND_KEPT_IMAGE); + } else +#endif + if (active_writer) + active_writer->ops.writer.invalidate_image(); + + free_metadata(); + +#ifdef CONFIG_DEBUG_PAGEALLOC + free_dyn_pageflags(&unmap_map); +#endif + + if (debug_info_buffer) { + /* Printk can only handle 1023 bytes, including + * its level mangling. */ + for (i = 0; i < 3; i++) + printk("%s", debug_info_buffer + (1023 * i)); + free_page((unsigned long) debug_info_buffer); + debug_info_buffer = NULL; + } + + thaw_processes(FREEZER_ALL_THREADS); + + suspend2_cleanup_console(); + + enable_nonboot_cpus(); +} + +static int can_suspend(void) +{ + if (test_suspend_state(SUSPEND_DISABLED)) + attempt_to_parse_resume_device(); + + if (test_suspend_state(SUSPEND_DISABLED)) { + printk(name_suspend "Software suspend is disabled.\n" + "This may be because you haven't put something along the " + "lines of\n\nresume2=swap:/dev/hda1\n\n" + "in lilo.conf or equivalent. (Where /dev/hda1 is your " + "swap partition).\n"); + set_result_state(SUSPEND_ABORTED); + return 0; + } + + return 1; +} + +/* + * suspend2_main + * Functionality : First level of code for software suspend invocations.
+ * Stores and restores load averages (to avoid a spike), + * allocates bitmaps, freezes processes and eats memory + * as required before suspending drivers and invoking + * the 'low level' code to save the state to disk. + * By the time we return from do_suspend2_lowlevel, we + * have either failed to save the image or successfully + * suspended and reloaded the image. The difference can + * be discerned by checking SUSPEND_ABORTED. + * Called From : + */ + +void suspend2_main(void) +{ + if (suspend2_activate_storage(0)) + return; + + if (!can_suspend()) + goto cleanup; + + /* + * If kept image and still keeping image and suspending to RAM, we will + * return 1 after suspending and resuming (provided the power doesn't + * run out). + */ + if (test_result_state(SUSPEND_KEPT_IMAGE) && check_still_keeping_image()) + goto cleanup; + + if (suspend2_init() && suspend2_prepare_image() && !test_result_state(SUSPEND_ABORTED) && + !test_action_state(SUSPEND_FREEZER_TEST)) + { + if (!test_result_state(SUSPEND_ABORTED)) { + suspend2_prepare_status(DONT_CLEAR_BAR, "Starting to save the image.."); + save_image(); + } + } + + suspend2_cleanup(); +cleanup: + suspend2_deactivate_storage(0); +} + +/* image_exists_read + * + * Report 0 or 1, depending on whether an image is found (-1 if no writer). + */ +static int image_exists_read(char *page, char **start, off_t off, int count, + int *eof, void *data) +{ + int len = 0; + + if (suspend2_activate_storage(0)) + return count; + + //attempt_to_parse_resume_device(); + + if (!active_writer) + len = sprintf(page, "-1\n"); + else + len = sprintf(page, "%d\n", active_writer->ops.writer.image_exists()); + + *eof = 1; + + suspend2_deactivate_storage(0); + + return len; +} + +/* image_exists_write + * + * Invalidate the image if one is found; the data written is ignored.
+ */ +static int image_exists_write(struct file *file, const char *buffer, + unsigned long count, void *data) +{ + if (suspend2_activate_storage(0)) + return count; + + if (active_writer && active_writer->ops.writer.image_exists()) + active_writer->ops.writer.invalidate_image(); + + suspend2_deactivate_storage(0); + + return count; +} + +/* + * Core proc entries that aren't built in. + * + * This array contains entries that are automatically registered at + * boot. Plugins and the console code register their own entries separately. + */ +static struct suspend_proc_data proc_params[] = { + { .filename = "debug_info", + .permissions = PROC_READONLY, + .type = SUSPEND_PROC_DATA_CUSTOM, + .data = { + .special = { + .read_proc = debuginfo_read_proc, + } + } + }, + + { .filename = "extra_pages_allowance", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_INTEGER, + .data = { + .integer = { + .variable = &extra_pd1_pages_allowance, + .minimum = 0, + .maximum = 32767, + } + } + }, + + { .filename = "ignore_rootfs", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_BIT, + .data = { + .bit = { + .bit_vector = &suspend_action, + .bit = SUSPEND_IGNORE_ROOTFS, + } + } + }, + + { .filename = "image_exists", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_CUSTOM, + .needs_storage_manager = 3, + .data = { + .special = { + .read_proc = image_exists_read, + .write_proc = image_exists_write, + } + } + }, + + { .filename = "image_size_limit", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_INTEGER, + .data = { + .integer = { + .variable = &image_size_limit, + .minimum = -2, + .maximum = 32767, + } + } + }, + + { .filename = "last_result", + .permissions = PROC_READONLY, + .type = SUSPEND_PROC_DATA_UL, + .data = { + .ul = { + .variable = &suspend_result, + } + } + }, + + { .filename = "reboot", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_BIT, + .data = { + .bit = { + .bit_vector = &suspend_action, + .bit = SUSPEND_REBOOT, + } + } + }, + + { .filename = 
"resume2", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_STRING, + .needs_storage_manager = 2, + .data = { + .string = { + .variable = resume2_file, + .max_length = 255, + } + }, + .write_proc = attempt_to_parse_resume_device2, + }, + + { .filename = "resume_commandline", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_STRING, + .data = { + .string = { + .variable = resume_commandline, + .max_length = COMMAND_LINE_SIZE, + } + }, + }, + + + { .filename = "version", + .permissions = PROC_READONLY, + .type = SUSPEND_PROC_DATA_STRING, + .data = { + .string = { + .variable = suspend_core_version, + } + } + }, + +#ifdef CONFIG_PM_DEBUG + { .filename = "freezer_test", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_BIT, + .data = { + .bit = { + .bit_vector = &suspend_action, + .bit = SUSPEND_FREEZER_TEST, + } + } + }, + + { .filename = "test_bio", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_BIT, + .data = { + .bit = { + .bit_vector = &suspend_action, + .bit = SUSPEND_TEST_BIO, + } + } + }, + + { .filename = "test_filter_speed", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_BIT, + .data = { + .bit = { + .bit_vector = &suspend_action, + .bit = SUSPEND_TEST_FILTER_SPEED, + } + } + }, + + { .filename = "slow", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_BIT, + .data = { + .bit = { + .bit_vector = &suspend_action, + .bit = SUSPEND_SLOW, + } + } + }, + + { .filename = "no_pageset2", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_BIT, + .data = { + .bit = { + .bit_vector = &suspend_action, + .bit = SUSPEND_NO_PAGESET2, + } + } + }, + +#endif + +#if defined(CONFIG_ACPI) + { .filename = "powerdown_method", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_UL, + .data = { + .ul = { + .variable = &suspend2_powerdown_method, + .minimum = 0, + .maximum = 5, + } + } + }, +#endif + +#ifdef CONFIG_SUSPEND2_KEEP_IMAGE + { .filename = "keep_image", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_BIT, + .data = { + .bit = { + 
.bit_vector = &suspend_action, + .bit = SUSPEND_KEEP_IMAGE, + } + } + }, +#endif +}; + + +/* + * Called from init kernel_thread. + * We check if we have an image and if so we try to resume. + * We also start ksuspendd if configuration looks right. + */ + +int suspend2_resume(void) +{ + int read_image_result = 0; + + if (sizeof(swp_entry_t) != sizeof(long)) { + printk(KERN_WARNING name_suspend + "The size of swp_entry_t != size of long. " + "Please report this!\n"); + return 1; + } + + if (!resume2_file[0]) + printk(KERN_WARNING name_suspend + "You need to use a resume2= command line parameter to " + "tell Suspend2 where to look for an image.\n"); + + suspend2_activate_storage(0); + + if (!(test_suspend_state(SUSPEND_RESUME_DEVICE_OK)) && + !attempt_to_parse_resume_device()) { + /* + * Without a usable storage device we can do nothing - + * even if noresume is given + */ + + if (!num_writers) + printk(KERN_ALERT name_suspend + "No writers have been registered.\n"); + else + printk(KERN_ALERT name_suspend + "Missing or invalid storage location " + "(resume2= parameter). 
Please correct and " + "rerun lilo (or equivalent) before " + "suspending.\n"); + suspend2_deactivate_storage(0); + return 1; + } + + suspend2_orig_mem_free = real_nr_free_pages(); + + read_image_result = read_pageset1(); /* non fatal error ignored */ + + if (test_suspend_state(SUSPEND_NORESUME_SPECIFIED)) + printk(KERN_WARNING name_suspend "Resuming disabled as requested.\n"); + + suspend2_deactivate_storage(0); + + if (read_image_result) + return 1; + + suspend_atomic_restore(); + + BUG(); + + return 0; +} + +static __init int core_load(void) +{ + int i, numfiles = sizeof(proc_params) / sizeof(struct suspend_proc_data); + + printk("Suspend2 Core.\n"); + + suspend_initialise_plugin_lists(); + + for (i=0; i< numfiles; i++) + suspend_register_procfile(&proc_params[i]); + + return 0; +} + +/* -- Functions for kickstarting a suspend or resume --- */ + +/* + * Check if we have an image and if so try to resume. + */ + +void __suspend2_try_resume(void) +{ + set_suspend_state(SUSPEND_TRYING_TO_RESUME); + + clear_suspend_state(SUSPEND_RESUME_NOT_DONE); + + suspend2_resume(); + + clear_suspend_state(SUSPEND_IGNORE_LOGLEVEL); + clear_suspend_state(SUSPEND_TRYING_TO_RESUME); +} + +/* Wrapper for when called from init/do_mounts.c */ +void suspend2_try_resume(void) +{ + if (suspend_start_anything(0)) + return; + + __suspend2_try_resume(); + + /* + * For initramfs, we have to clear the boot time + * flag after trying to resume + */ + clear_suspend_state(SUSPEND_BOOT_TIME); + + suspend_finish_anything(0); +} + +/* + * suspend2_try_suspend + * Functionality : Wrapper around suspend2_main. + * Called From : drivers/acpi/sleep/main.c + * kernel/reboot.c + */ + +void suspend2_try_suspend(void) +{ + if (suspend_start_anything(0)) + return; + + suspend2_main(); + + suspend_finish_anything(0); +} + +/* -- Commandline Parameter Handling --- + * + * Resume setup: obtain the storage device. 
+ */ + +static int __init resume2_setup(char *str) +{ + if (!*str) + return 0; + + strncpy(resume2_file, str, 255); + return 0; +} + +/* + * Allow the user to set the action parameter from lilo, prior to resuming. + */ +static int __init suspend_act_setup(char *str) +{ + if(str) + suspend_action=simple_strtol(str,NULL,0); + set_suspend_state(SUSPEND_ACT_USED); + return 0; +} + +/* + * Allow the user to specify that we should ignore any image found and + * invalidate the image if necessary. This is equivalent to running + * the task queue and a sync and then turning off the power. The same + * precautions should be taken: fsck if you're not journalled. + */ +static int __init noresume2_setup(char *str) +{ + set_suspend_state(SUSPEND_NORESUME_SPECIFIED); + return 0; +} + +static int __init suspend2_retry_resume_setup(char *str) +{ + set_suspend_state(SUSPEND_RETRY_RESUME); + return 0; +} + +#ifdef CONFIG_PM_DEBUG + +/* + * Allow the user to set the debug parameter from lilo, prior to resuming. + */ +static int __init suspend_dbg_setup(char *str) +{ + if(str) + suspend_debug_state=simple_strtol(str,NULL,0); + set_suspend_state(SUSPEND_DBG_USED); + return 0; +} + +/* + * Allow the user to set the debug level parameter from lilo, prior to + * resuming.
+ */ +static int __init suspend_lvl_setup(char *str) +{ + if(str) + console_loglevel = + suspend_default_console_level = + simple_strtol(str,NULL,0); + set_suspend_state(SUSPEND_LVL_USED); + clear_suspend_state(SUSPEND_IGNORE_LOGLEVEL); + return 0; +} + +__setup("suspend_dbg=", suspend_dbg_setup); +__setup("suspend_lvl=", suspend_lvl_setup); +#endif + +__setup("noresume2", noresume2_setup); +__setup("resume2=", resume2_setup); +__setup("suspend_act=", suspend_act_setup); +__setup("suspend_retry_resume", suspend2_retry_resume_setup); + +late_initcall(core_load); +EXPORT_SYMBOL(software_suspend_state); diff -urN oldtree/kernel/power/suspend.h newtree/kernel/power/suspend.h --- oldtree/kernel/power/suspend.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/suspend.h 2006-02-13 14:51:54.209924096 -0500 @@ -0,0 +1,28 @@ +/* + * kernel/power/suspend.h + * + * Copyright (C) 2004-2005 Nigel Cunningham + * + * This file is released under the GPLv2. + * + * It contains declarations used throughout swsusp. + * + */ + +#ifndef KERNEL_POWER_SUSPEND_H +#define KERNEL_POWER_SUSPEND_H + +#define SUSPEND_PD_PAGES(x) (((x)*sizeof(struct pbe))/PAGE_SIZE+1) + +/* mm/page_alloc.c */ +extern void drain_local_pages(void); + +void save_processor_state(void); +void restore_processor_state(void); +struct saved_context; +void __save_processor_state(struct saved_context *ctxt); +void __restore_processor_state(struct saved_context *ctxt); + +extern suspend_pagedir_t *pagedir_nosave __nosavedata; + +#endif diff -urN oldtree/kernel/power/suspend2.h newtree/kernel/power/suspend2.h --- oldtree/kernel/power/suspend2.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/suspend2.h 2006-02-13 14:51:54.210923944 -0500 @@ -0,0 +1,31 @@ +/* + * kernel/power/suspend2.h + * + * Copyright (C) 2004-2005 Nigel Cunningham + * + * This file is released under the GPLv2. + * + * It contains declarations used throughout swsusp and suspend2. 
+ * + */ +#ifndef KERNEL_POWER_SUSPEND_CORE_H +#define KERNEL_POWER_SUSPEND_CORE_H + +#include +#include + +extern unsigned long suspend2_orig_mem_free; + +#define KB(x) ((x) << (PAGE_SHIFT - 10)) +#define MB(x) ((x) >> (20 - PAGE_SHIFT)) + +extern int suspend_start_anything(int starting_cycle); +extern void suspend_finish_anything(int finishing_cycle); + +#if 1 +#define PRINTK(a...) do { } while(0) +#else +#define PRINTK(fmt, arg...) printk(KERN_DEBUG fmt, ##arg) +#endif + +#endif diff -urN oldtree/kernel/power/suspend2_common.h newtree/kernel/power/suspend2_common.h --- oldtree/kernel/power/suspend2_common.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/suspend2_common.h 2006-02-13 14:51:54.210923944 -0500 @@ -0,0 +1,28 @@ +#ifdef CONFIG_PM_DEBUG +#define set_debug_state(bit) (test_and_set_bit(bit, &suspend_debug_state)) +#define clear_debug_state(bit) (test_and_clear_bit(bit, &suspend_debug_state)) +#else +#define set_debug_state(bit) (0) +#define clear_debug_state(bit) (0) +#endif + +#define set_result_state(bit) (test_and_set_bit(bit, &suspend_result)) +#define clear_result_state(bit) (test_and_clear_bit(bit, &suspend_result)) + +enum { + SUSPEND_ABORT_REQUESTED = 1, + SUSPEND_NOSTORAGE_AVAILABLE, + SUSPEND_INSUFFICIENT_STORAGE, + SUSPEND_FREEZING_FAILED, + SUSPEND_UNEXPECTED_ALLOC, + SUSPEND_KEPT_IMAGE, + SUSPEND_WOULD_EAT_MEMORY, + SUSPEND_UNABLE_TO_FREE_ENOUGH_MEMORY, + SUSPEND_ENCRYPTION_SETUP_FAILED +}; + +extern int suspend_act_used; +extern int suspend_lvl_used; +extern int suspend_dbg_used; +extern int suspend_default_console_level; +extern unsigned int nr_suspends; diff -urN oldtree/kernel/power/suspend_block_io.c newtree/kernel/power/suspend_block_io.c --- oldtree/kernel/power/suspend_block_io.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/suspend_block_io.c 2006-02-13 14:51:54.211923792 -0500 @@ -0,0 +1,1066 @@ +/* + * block_io.c + * + * Copyright 2004-2005 Nigel Cunningham + * + * Distributed under GPLv2. 
+ * + * This file contains block io functions for suspend2. These are + * used by the swapwriter and it is planned that they will also + * be used by the NFSwriter. + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "suspend2.h" +#include "proc.h" +#include "plugins.h" +#include "prepare_image.h" +#include "block_io.h" +#include "extent.h" +#include "suspend2_common.h" +#include "ui.h" + +/* Bits in struct io_info->flags */ +enum { + IO_WRITING, + IO_RESTORE_PAGE_PROT, + IO_AWAITING_READ, + IO_AWAITING_WRITE, + IO_AWAITING_SUBMIT, + IO_AWAITING_CLEANUP, + IO_HANDLE_PAGE_PROT +}; + +#define MAX_OUTSTANDING_IO 2048 + +/* + * + * IO in progress information storage and helpers + * + */ + +struct io_info { + struct bio *sys_struct; + sector_t block[MAX_BUF_PER_PAGE]; + struct page *buffer_page; + struct page *data_page; + unsigned long flags; + struct block_device *dev; + struct list_head list; + int readahead_index; + struct work_struct work; + int printme; +}; + +/* Locks separated to allow better SMP support. + * An io_struct moves through the lists as follows. 
+ * free -> submit_batch -> busy -> ready_for_cleanup -> free + */ +static LIST_HEAD(ioinfo_free); +static DEFINE_SPINLOCK(ioinfo_free_lock); + +static LIST_HEAD(ioinfo_ready_for_cleanup); +static DEFINE_SPINLOCK(ioinfo_ready_lock); + +static LIST_HEAD(ioinfo_submit_batch); +static DEFINE_SPINLOCK(ioinfo_submit_lock); + +static LIST_HEAD(ioinfo_busy); +static DEFINE_SPINLOCK(ioinfo_busy_lock); + +static atomic_t submit_batch; +static int submit_batch_size = 64; +static int submit_batched(void); + +struct task_struct *suspend_bio_task; + +/* [Max] number of I/O operations pending */ +static atomic_t outstanding_io; +static int max_outstanding_io = 0; +static atomic_t buffer_allocs, buffer_frees; + +/* [Max] number of pages used for above struct */ +static int infopages = 0; +static int maxinfopages = 0; + +static volatile unsigned long suspend_readahead_flags[(MAX_OUTSTANDING_IO + BITS_PER_LONG - 1) / BITS_PER_LONG]; +static spinlock_t suspend_readahead_flags_lock = SPIN_LOCK_UNLOCKED; +static struct page *suspend_readahead_pages[MAX_OUTSTANDING_IO]; +static int readahead_index, readahead_submit_index; + +static int current_stream; +struct extent_iterate_saved_state suspend_writer_posn_save[3]; + +/* Pointer to current entry being loaded/saved. */ +struct extent_iterate_state suspend_writer_posn; + +/* Not static, so that the allocators can setup and complete + * writing the header */ +char *suspend_writer_buffer; +int suspend_writer_buffer_posn; + +int suspend_read_fd; + +static unsigned long nr_schedule_calls[8]; + +static char *sch_caller[] = { + "get_io_info_struct #1 ", + "get_io_info_struct #2 ", + "get_io_info_struct #3 ", + "suspend_finish_all_io ", + "wait_on_one_page ", + "submit ", + "start_one ", + "suspend_wait_on_readahead", +}; + +static struct suspend2_bdev_info *s2_devinfo; +int need_extra_next; + +/* + * suspend_reset_io_stats + * + * Description: Reset all our sanity-checking statistics. 
+ */ +static void suspend_reset_io_stats(void) +{ + int i; + + max_outstanding_io = 0; + maxinfopages = 0; + + for (i = 0; i < 8; i++) + nr_schedule_calls[i] = 0; +} + +/* + * suspend_check_io_stats + * + * Description: Check that our statistics look right and print + * any debugging info wanted. + */ +static void suspend_check_io_stats(void) +{ + int i; + + BUG_ON(atomic_read(&outstanding_io)); + BUG_ON(infopages); + BUG_ON(!list_empty(&ioinfo_submit_batch)); + BUG_ON(!list_empty(&ioinfo_busy)); + BUG_ON(!list_empty(&ioinfo_ready_for_cleanup)); + BUG_ON(!list_empty(&ioinfo_free)); + BUG_ON(atomic_read(&buffer_allocs) != atomic_read(&buffer_frees)); + + suspend_message(SUSPEND_WRITER, SUSPEND_LOW, 0, + "Maximum outstanding_io was %d.\n", + max_outstanding_io); + suspend_message(SUSPEND_WRITER, SUSPEND_LOW, 0, + "Max info pages was %d.\n", + maxinfopages); + if (atomic_read(&buffer_allocs) != atomic_read(&buffer_frees)) + suspend_message(SUSPEND_WRITER, SUSPEND_MEDIUM, 0, + "Buffer allocs (%d) != buffer frees (%d)", + atomic_read(&buffer_allocs), + atomic_read(&buffer_frees)); + for(i = 0; i < 8; i++) + suspend_message(SUSPEND_WRITER, SUSPEND_MEDIUM, 0, + "Nr schedule calls %s: %lu.\n", sch_caller[i], nr_schedule_calls[i]); +} + +/* + * cleanup_one + * + * Description: Clean up after completing I/O on a page. + * Arguments: struct io_info: Data for I/O to be completed. + */ +static void __suspend_bio_cleanup_one(struct io_info *io_info) +{ + struct page *buffer_page; + struct page *data_page; + char *buffer_address, *data_address; + int reading; + + buffer_page = io_info->buffer_page; + data_page = io_info->data_page; + + reading = test_bit(IO_AWAITING_READ, &io_info->flags); + suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 0, + "Cleanup IO: [%p]\n", + io_info); + + if (reading && io_info->readahead_index == -1) { + /* + * Copy the page we read into the buffer our caller provided. 
+ */ + data_address = (char *) kmap(data_page); + buffer_address = (char *) kmap(buffer_page); + memcpy(data_address, buffer_address, PAGE_SIZE); + kunmap(data_page); + kunmap(buffer_page); + + } + + if (!reading || io_info->readahead_index == -1) { + /* Sanity check */ + if (page_count(buffer_page) != 2) + printk(KERN_EMERG "Cleanup IO: Page count on page %p is %d. Not good!\n", + buffer_page, page_count(buffer_page)); + put_page(buffer_page); + __free_page(buffer_page); + atomic_inc(&buffer_frees); + } else + put_page(buffer_page); + + bio_put(io_info->sys_struct); + io_info->sys_struct = NULL; + io_info->flags = 0; +} + +/* __suspend_io_cleanup + */ + +static int suspend_bio_cleanup_one(void *data) +{ + struct io_info *io_info = (struct io_info *) data; + int readahead_index; + unsigned long flags; + + /* + * If this I/O was a readahead, remember its index. + */ + readahead_index = io_info->readahead_index; + + /* + * Add it to the free list. + */ + list_del_init(&io_info->list); + + /* + * Do the cleanup. + */ + __suspend_bio_cleanup_one(io_info); + + /* + * Record the readahead as done. + */ + if (readahead_index > -1) { + int index = readahead_index/BITS_PER_LONG; + int bit = readahead_index - (index * BITS_PER_LONG); + spin_lock_irqsave(&suspend_readahead_flags_lock, flags); + set_bit(bit, &suspend_readahead_flags[index]); + spin_unlock_irqrestore(&suspend_readahead_flags_lock, flags); + } + + spin_lock_irqsave(&ioinfo_free_lock, flags); + list_add_tail(&io_info->list, &ioinfo_free); + spin_unlock_irqrestore(&ioinfo_free_lock, flags); + + /* Important: Must be last thing we do to avoid a race with + * finish_all_io when using keventd to do the cleanup */ + atomic_dec(&outstanding_io); + + return 0; +} + +/* suspend_cleanup_some_completed_io + * + * NB: This is designed so that multiple callers can be in here simultaneously. 
+ */ + +static void suspend_cleanup_some_completed_io(void) +{ + int num_cleaned = 0; + struct io_info *first; + unsigned long flags; + + spin_lock_irqsave(&ioinfo_ready_lock, flags); + while(!list_empty(&ioinfo_ready_for_cleanup)) { + int result; + first = list_entry(ioinfo_ready_for_cleanup.next, struct io_info, list); + + BUG_ON(!test_and_clear_bit(IO_AWAITING_CLEANUP, &first->flags)); + + list_del_init(&first->list); + + spin_unlock_irqrestore(&ioinfo_ready_lock, flags); + + result = suspend_bio_cleanup_one((void *) first); + + spin_lock_irqsave(&ioinfo_ready_lock, flags); + if (result) + continue; + num_cleaned++; + if (num_cleaned == submit_batch_size) + break; + } + spin_unlock_irqrestore(&ioinfo_ready_lock, flags); +} + +/* do_bio_wait + * + * Actions taken when we want some I/O to get run. + * + * Submit any I/O that's batched up (if we're not already doing + * that), unplug queues, schedule and clean up whatever we can. + */ +static void do_bio_wait(int caller) +{ + int num_submitted = 0; + + nr_schedule_calls[caller]++; + + /* Don't want to wait on I/O we haven't submitted! */ + num_submitted = submit_batched(); + + kblockd_flush(); + + io_schedule(); + + suspend_cleanup_some_completed_io(); +} + +/* + * suspend_finish_all_io + * + * Description: Finishes all IO and frees all IO info struct pages. + */ +static void suspend_finish_all_io(void) +{ + struct io_info *this, *next = NULL; + unsigned long flags; + + /* Wait for all I/O to complete. */ + while (atomic_read(&outstanding_io)) + do_bio_wait(2); + + spin_lock_irqsave(&ioinfo_free_lock, flags); + + /* + * Two stages, to avoid using freed pages. + * + * First unlink all io_info structs on a page except the first.
+ */ + list_for_each_entry_safe(this, next, &ioinfo_free, list) { + if (((unsigned long) this) & ~PAGE_MASK) + list_del(&this->list); + } + + /* + * Now we have only one reference to each page, and can safely + * free pages, knowing we're not going to be trying to access the + * same page after freeing it. + */ + list_for_each_entry_safe(this, next, &ioinfo_free, list) { + list_del(&this->list); + free_page((unsigned long) this); + infopages--; + suspend_message(SUSPEND_MEMORY, SUSPEND_VERBOSE, 0, + "[FreedIOPage %lx]", this); + } + + spin_unlock_irqrestore(&ioinfo_free_lock, flags); +} + +/* + * wait_on_one_page + * + * Description: Wait for a particular I/O to complete. + */ +static void wait_on_one_page(struct io_info *io_info) +{ + do { do_bio_wait(3); } while (io_info->flags); +} + +/* + * wait_on_readahead + * + * Wait until a particular readahead is ready. + */ +static void suspend_wait_on_readahead(int readahead_index) +{ + int index = readahead_index / BITS_PER_LONG; + int bit = readahead_index - index * BITS_PER_LONG; + + /* read_ahead_index is the one we want to return */ + while (!test_bit(bit, &suspend_readahead_flags[index])) + do_bio_wait(6); +} + +/* + * readahead_done + * + * Returns whether the readahead requested is ready. 
+ */ + +static int suspend_readahead_ready(int readahead_index) +{ + int index = readahead_index / BITS_PER_LONG; + int bit = readahead_index - (index * BITS_PER_LONG); + + return test_bit(bit, &suspend_readahead_flags[index]); +} + +/* suspend_readahead_prepare + * Set up for doing readahead on an image */ +static int suspend_prepare_readahead(int index) +{ + unsigned long new_page = get_zeroed_page(GFP_ATOMIC); + + if(!new_page) + return -ENOMEM; + + suspend_readahead_pages[index] = virt_to_page(new_page); + return 0; +} + +/* suspend_readahead_cleanup + * Clean up structures used for readahead */ +static void suspend_cleanup_readahead(int page) +{ + __free_page(suspend_readahead_pages[page]); + suspend_readahead_pages[page] = 0; + return; +} + +/* + * suspend_end_bio + * + * Description: Function called by block driver from interrupt context when I/O + * is completed. This is the reason we use spinlocks in + * manipulating the io_info lists. + * Nearly the fs/buffer.c version, but we want to mark the page as + * done in our own structures too. + */ + +static int suspend_end_bio(struct bio *bio, unsigned int num, int err) +{ + struct io_info *io_info = bio->bi_private; + unsigned long flags; + + spin_lock_irqsave(&ioinfo_busy_lock, flags); + list_del_init(&io_info->list); + spin_unlock_irqrestore(&ioinfo_busy_lock, flags); + + set_bit(IO_AWAITING_CLEANUP, &io_info->flags); + + spin_lock_irqsave(&ioinfo_ready_lock, flags); + list_add_tail(&io_info->list, &ioinfo_ready_for_cleanup); + spin_unlock_irqrestore(&ioinfo_ready_lock, flags); + return 0; +} + +/** + * submit - submit BIO request. + * @rw: READ or WRITE. + * @io_info: IO info structure. + * + * Based on Patrick's pmdisk code from long ago: + * "Straight from the textbook - allocate and initialize the bio. + * If we're writing, make sure the page is marked as dirty. + * Then submit it and carry on." + * + * With a twist, though - we handle block_size != PAGE_SIZE. 
+ * Caller has already checked that our page is not fragmented. + */ + +static int submit(int rw, struct io_info *io_info) +{ + int error = 0; + struct bio *bio = NULL; + unsigned long flags; + + while (!bio) { + bio = bio_alloc(GFP_ATOMIC,1); + if (!bio) + do_bio_wait(4); + } + + bio->bi_bdev = io_info->dev; + bio->bi_sector = io_info->block[0]; + bio->bi_private = io_info; + bio->bi_end_io = suspend_end_bio; + bio->bi_flags |= (1 << BIO_SUSPEND2); + io_info->sys_struct = bio; + if (io_info->printme) + PRINTK("%s dev %p block %ld => sector %ld\n", + rw ? "Write" : "Read", + bio->bi_bdev, io_info->block[0], + (unsigned long) bio->bi_sector); + + if (bio_add_page(bio, io_info->buffer_page, PAGE_SIZE, 0) < PAGE_SIZE) { + printk("ERROR: adding page to bio at %lld\n", + (unsigned long long) io_info->block[0]); + bio_put(bio); + return -EFAULT; + } + + if (rw == WRITE) + bio_set_pages_dirty(bio); + + spin_lock_irqsave(&ioinfo_busy_lock, flags); + list_add_tail(&io_info->list, &ioinfo_busy); + spin_unlock_irqrestore(&ioinfo_busy_lock, flags); + + submit_bio(rw,bio); + + return error; +} + +/* + * submit a batch. The submit function can wait on I/O, so we have + * simple locking to avoid infinite recursion. 
+ */ +static int submit_batched(void) +{ + static int running_already = 0; + struct io_info *first; + unsigned long flags; + int num_submitted = 0; + + if (running_already) + return 0; + running_already = 1; + spin_lock_irqsave(&ioinfo_submit_lock, flags); + while(!list_empty(&ioinfo_submit_batch)) { + first = list_entry(ioinfo_submit_batch.next, struct io_info, list); + + BUG_ON(!test_and_clear_bit(IO_AWAITING_SUBMIT, &first->flags)); + + list_del_init(&first->list); + + atomic_dec(&submit_batch); + + spin_unlock_irqrestore(&ioinfo_submit_lock, flags); + + if (test_bit(IO_AWAITING_READ, &first->flags)) + submit(READ, first); + else + submit(WRITE, first); + + spin_lock_irqsave(&ioinfo_submit_lock, flags); + + num_submitted++; + if (num_submitted == submit_batch_size) + break; + } + spin_unlock_irqrestore(&ioinfo_submit_lock, flags); + running_already = 0; + + return num_submitted; +} + +static void add_to_batch(struct io_info *io_info) +{ + unsigned long flags; + + set_bit(IO_AWAITING_SUBMIT, &io_info->flags); + + /* Put our prepared I/O struct on the batch list. */ + spin_lock_irqsave(&ioinfo_submit_lock, flags); + list_add_tail(&io_info->list, &ioinfo_submit_batch); + spin_unlock_irqrestore(&ioinfo_submit_lock, flags); + + atomic_inc(&submit_batch); + + if ((!suspend_bio_task) && (atomic_read(&submit_batch) >= submit_batch_size)) + submit_batched(); +} + +/* + * get_io_info_struct + * + * Description: Get an I/O struct. + * Returns: Pointer to the struct prepared for use. + */ +static struct io_info *get_io_info_struct(void) +{ + unsigned long newpage = 0, flags; + struct io_info *this = NULL; + int remaining = 0; + + do { + while (atomic_read(&outstanding_io) >= MAX_OUTSTANDING_IO) + do_bio_wait(0); + + /* Can start a new I/O. Is there a free one? */ + if (!list_empty(&ioinfo_free)) { + /* Yes. Grab it. */ + spin_lock_irqsave(&ioinfo_free_lock, flags); + break; + } + + /* No. Need to allocate a new page for I/O info structs.
*/ + newpage = get_zeroed_page(GFP_ATOMIC); + if (!newpage) { + do_bio_wait(1); + continue; + } + + suspend_message(SUSPEND_MEMORY, SUSPEND_VERBOSE, 0, + "[NewIOPage %lx]", newpage); + infopages++; + if (infopages > maxinfopages) + maxinfopages++; + + /* Prepare the new page for use. */ + this = (struct io_info *) newpage; + remaining = PAGE_SIZE; + spin_lock_irqsave(&ioinfo_free_lock, flags); + while (remaining >= (sizeof(struct io_info))) { + list_add_tail(&this->list, &ioinfo_free); + this = (struct io_info *) (((char *) this) + + sizeof(struct io_info)); + remaining -= sizeof(struct io_info); + } + break; + } while (1); + + /* + * We have an I/O info struct. Remove it from the free list. + * It will be added to the submit or busy list later. + */ + this = list_entry(ioinfo_free.next, struct io_info, list); + list_del_init(&this->list); + spin_unlock_irqrestore(&ioinfo_free_lock, flags); + return this; +} + +/* + * start_one + * + * Description: Prepare and start a read or write operation. + * Note that we use our own buffer for reading or writing. + * This simplifies doing readahead and asynchronous writing. + * We can begin a read without knowing the location into which + * the data will eventually be placed, and the buffer passed + * for a write can be reused immediately (essential for the + * plugins system). + * Failure? What's that? + * Returns: The io_info struct created. 
+ */ +static struct io_info *start_one(int rw, struct submit_params *submit_info) +{ + struct io_info *io_info = get_io_info_struct(); + unsigned long buffer_virt = 0; + char *to, *from; + struct page *buffer_page; + + if (!io_info) + return NULL; + + /* Get our local buffer */ + suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 1, + "Start_IO: [%p]", io_info); + + /* Copy settings to the io_info struct */ + io_info->data_page = submit_info->page; + io_info->readahead_index = submit_info->readahead_index; + io_info->printme = submit_info->printme; + + if (io_info->readahead_index == -1) { + while (!(buffer_virt = get_zeroed_page(GFP_ATOMIC))) + do_bio_wait(5); + + atomic_inc(&buffer_allocs); + suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 0, + "[ALLOC BUFFER]->%d", + real_nr_free_pages()); + buffer_page = virt_to_page(buffer_virt); + + io_info->buffer_page = buffer_page; + } else { + unsigned long flags; + int index = io_info->readahead_index / BITS_PER_LONG; + int bit = io_info->readahead_index - index * BITS_PER_LONG; + + spin_lock_irqsave(&suspend_readahead_flags_lock, flags); + clear_bit(bit, &suspend_readahead_flags[index]); + spin_unlock_irqrestore(&suspend_readahead_flags_lock, flags); + + io_info->buffer_page = buffer_page = submit_info->page; + } + + /* If writing, copy our data. The data is probably in + * lowmem, but we cannot be certain. If there is no + * compression/encryption, we might be passed the + * actual source page's address. 
*/ + if (rw == WRITE) { + set_bit(IO_WRITING, &io_info->flags); + + to = (char *) buffer_virt; + from = kmap_atomic(io_info->data_page, KM_USER1); + memcpy(to, from, PAGE_SIZE); + kunmap_atomic(from, KM_USER1); + } + + /* Submit the page */ + get_page(buffer_page); + + io_info->dev = submit_info->dev; + io_info->block[0] = submit_info->block[0]; + + if (rw == READ) + set_bit(IO_AWAITING_READ, &io_info->flags); + else + set_bit(IO_AWAITING_WRITE, &io_info->flags); + + suspend_message(SUSPEND_WRITER, SUSPEND_HIGH, 1, + "-> (PRE BRW) %d\n", + real_nr_free_pages()); + + if (submit_batch_size > 1) + add_to_batch(io_info); + else + submit(rw, io_info); + + atomic_inc(&outstanding_io); + if (atomic_read(&outstanding_io) > max_outstanding_io) + max_outstanding_io++; + + return io_info; +} + +static int suspend_do_io(int rw, + struct submit_params *submit_info, int syncio) +{ + struct io_info *io_info; + + if(!submit_info->dev) { + printk("Suspend do io called with null bdev.\n"); + return 1; + } + + io_info = start_one(rw, submit_info); + + if (!io_info) + return 1; + else if (syncio) + wait_on_one_page(io_info); + + /* If we were the only one, clean everything up */ + if (!atomic_read(&outstanding_io)) + suspend_finish_all_io(); + return 0; +} + +/* We used to use bread here, but it doesn't correctly handle + * blocksize != PAGE_SIZE. Now we create a submit_info to get the data we + * want and use our normal routines (synchronously). 
+ */ + +static int suspend_bdev_page_io(int rw, struct block_device *bdev, long pos, + struct page *page) +{ + struct submit_params submit_info; + + if (!bdev) { + printk("Suspend_bdev_page_io got null device.\n"); + return 0; + } + + submit_info.page = page; + submit_info.dev = bdev; + submit_info.block[0] = pos; + submit_info.readahead_index = -1; + return suspend_do_io(rw, &submit_info, 1); +} + +static unsigned long suspend_bio_memory_needed(void) +{ + /* We want to have at least enough memory so as to have + * MAX_OUTSTANDING_IO transactions on the fly at once. If we + * can do more, fine. */ + return (MAX_OUTSTANDING_IO * (PAGE_SIZE + sizeof(struct request) + + sizeof(struct bio) + sizeof(struct io_info))); +} + +static void suspend_set_devinfo(struct suspend2_bdev_info *info) +{ + s2_devinfo = info; +} + +static int forward_one_page(void) +{ + int i, j; + + for (j = 0; j < need_extra_next + 1; j++) { + extent_state_next(&suspend_writer_posn); + + /* Have to go forward one to ensure we're on the right chain, + * before we can know how many more blocks to skip.*/ + for (i = 1; i < s2_devinfo[suspend_writer_posn.current_chain].blocks_per_page; i++) + extent_state_next(&suspend_writer_posn); + + if (extent_state_eof(&suspend_writer_posn)) { + printk("Extent state eof.\n"); + return -ENODATA; + } + } + + need_extra_next = 0; + + return 0; +} + +static int suspend_rw_page(int rw, struct page *page, + int readahead_index, int sync) +{ + int i, current_chain; + struct submit_params submit_params; + + if (test_action_state(SUSPEND_TEST_FILTER_SPEED)) + return 0; + + submit_params.readahead_index = readahead_index; + submit_params.page = page; + + if (forward_one_page()) + return -ENODATA; + + current_chain = suspend_writer_posn.current_chain; + submit_params.dev = s2_devinfo[current_chain].bdev; + submit_params.block[0] = (suspend_writer_posn.current_offset - + s2_devinfo[current_chain].blocks_per_page + 1) << + s2_devinfo[current_chain].bmap_shift; + + //printk("%s:
%lx:%d.\n", rw ? "Write" : "Read", submit_params.dev->bd_dev, submit_params.block[0]); + + i = suspend_do_io(rw, &submit_params, sync); + + if (i) + return -EIO; + + return 0; +} + +static int suspend_bio_read_chunk(struct page *buffer_page, int sync) +{ + static int last_result; + unsigned long *virt; + + if (sync == SUSPEND_ASYNC) + return suspend_rw_page(READ, buffer_page, -1, sync); + + /* Start new readahead while we wait for our page */ + if (readahead_index == -1) { + last_result = 0; + readahead_index = readahead_submit_index = 0; + } + + /* Start a new readahead? */ + if (last_result) { + /* We failed to submit a read, and have cleaned up + * all the readahead previously submitted */ + if (readahead_submit_index == readahead_index) + return -EPERM; + goto wait; + } + + do { + if (suspend_prepare_readahead(readahead_submit_index)) + break; + + last_result = suspend_rw_page( + READ, + suspend_readahead_pages[readahead_submit_index], + readahead_submit_index, SUSPEND_ASYNC); + if (last_result) { + printk("Begin read chunk for page %d returned %d.\n", + readahead_submit_index, last_result); + suspend_cleanup_readahead(readahead_submit_index); + break; + } + + readahead_submit_index++; + + if (readahead_submit_index == MAX_OUTSTANDING_IO) + readahead_submit_index = 0; + + } while((!last_result) && (readahead_submit_index != readahead_index) && + (!suspend_readahead_ready(readahead_index))); + +wait: + suspend_wait_on_readahead(readahead_index); + + virt = kmap_atomic(buffer_page, KM_USER1); + memcpy(virt, page_address(suspend_readahead_pages[readahead_index]), + PAGE_SIZE); + kunmap_atomic(virt, KM_USER1); + + suspend_cleanup_readahead(readahead_index); + + readahead_index++; + if (readahead_index == MAX_OUTSTANDING_IO) + readahead_index = 0; + + return 0; +} + +static int suspend_read_init(int stream_number) +{ + current_stream = stream_number; + extent_state_restore(&suspend_writer_posn, + &suspend_writer_posn_save[current_stream]); + + 
BUG_ON(!suspend_writer_posn.current_extent); + + suspend_reset_io_stats(); + + readahead_index = readahead_submit_index = -1; + + return 0; +} + +static int suspend_read_cleanup(void) +{ + suspend_finish_all_io(); + while (readahead_index != readahead_submit_index) { + suspend_cleanup_readahead(readahead_index); + readahead_index++; + if (readahead_index == MAX_OUTSTANDING_IO) + readahead_index = 0; + } + suspend_check_io_stats(); + return 0; +} + +static int suspend_write_init(int stream_number) +{ + extent_state_restore(&suspend_writer_posn, + &suspend_writer_posn_save[stream_number]); + current_stream = stream_number; + + BUG_ON(!suspend_writer_posn.current_extent); + + suspend_reset_io_stats(); + + return 0; +} + +static int suspend_write_cleanup(void) +{ + if (current_stream == 2) + extent_state_save(&suspend_writer_posn, + &suspend_writer_posn_save[1]); + + suspend_finish_all_io(); + + suspend_check_io_stats(); + + return 0; +} + +static int suspend_write_chunk(struct page *buffer_page) +{ + return suspend_rw_page(WRITE, buffer_page, -1, 0); +} + +static int suspend_rw_header_chunk(int rw, char *buffer, int buffer_size) +{ + int bytes_left = buffer_size; + + /* Read a chunk of the header */ + while (bytes_left) { + char *source_start = buffer + buffer_size - bytes_left; + char *dest_start = suspend_writer_buffer + suspend_writer_buffer_posn; + int capacity = PAGE_SIZE - suspend_writer_buffer_posn; + char *to = rw ? dest_start : source_start; + char *from = rw ? source_start : dest_start; + + if (bytes_left <= capacity) { + memcpy(to, from, bytes_left); + suspend_writer_buffer_posn += bytes_left; + return rw ? 
0 : buffer_size; + } + + /* Need to read the next page */ + memcpy(to, from, capacity); + bytes_left -= capacity; + + if (rw == READ && test_suspend_state(SUSPEND_TRY_RESUME_RD)) + sys_read(suspend_read_fd, + suspend_writer_buffer, BLOCK_SIZE); + else { + if (suspend_rw_page(rw, + virt_to_page(suspend_writer_buffer), + -1, !rw)) + return -EIO; + } + + suspend_writer_buffer_posn = 0; + check_shift_keys(0, NULL); + } + + return rw ? 0 : buffer_size; +} + +static int read_header_chunk(char *buffer, int buffer_size) +{ + return suspend_rw_header_chunk(READ, buffer, buffer_size); +} + +static int write_header_chunk(char *buffer, int buffer_size) +{ + return suspend_rw_header_chunk(WRITE, buffer, buffer_size); +} + +struct suspend_bio_ops suspend_bio_ops = { + .submit_io = suspend_do_io, + .bdev_page_io = suspend_bdev_page_io, + .rw_page = suspend_rw_page, + .wait_on_readahead = suspend_wait_on_readahead, + .check_io_stats = suspend_check_io_stats, + .reset_io_stats = suspend_reset_io_stats, + .finish_all_io = suspend_finish_all_io, + .prepare_readahead = suspend_prepare_readahead, + .cleanup_readahead = suspend_cleanup_readahead, + .readahead_pages = suspend_readahead_pages, + .readahead_ready = suspend_readahead_ready, + .need_extra_next = &need_extra_next, + .forward_one_page = forward_one_page, + .set_devinfo = suspend_set_devinfo, + .read_init = suspend_read_init, + .read_chunk = suspend_bio_read_chunk, + .read_cleanup = suspend_read_cleanup, + .write_init = suspend_write_init, + .write_chunk = suspend_write_chunk, + .write_cleanup = suspend_write_cleanup, + .read_header_chunk = read_header_chunk, + .write_header_chunk = write_header_chunk, +}; + +static struct suspend_plugin_ops suspend_blockwriter_ops = +{ + .name = "Block I/O", + .type = MISC_PLUGIN, + .module = THIS_MODULE, + .memory_needed = suspend_bio_memory_needed, +}; + +static __init int suspend_block_io_load(void) +{ + return suspend_register_plugin(&suspend_blockwriter_ops); +} + +#ifdef MODULE +static
__exit void suspend_block_io_unload(void) +{ + suspend_unregister_plugin(&suspend_blockwriter_ops); +} + +module_init(suspend_block_io_load); +module_exit(suspend_block_io_unload); +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Nigel Cunningham"); +MODULE_DESCRIPTION("Suspend2 block io functions"); +#else +late_initcall(suspend_block_io_load); +#endif diff -urN oldtree/kernel/power/suspend_checksums.c newtree/kernel/power/suspend_checksums.c --- oldtree/kernel/power/suspend_checksums.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/suspend_checksums.c 2006-02-13 14:51:54.212923640 -0500 @@ -0,0 +1,509 @@ +#include +#include +#ifdef CONFIG_KDB +#include +#include +#endif +#include + +#include "suspend.h" +#include "plugins.h" +#include "pageflags.h" +#include "proc.h" +#include "pagedir.h" +#include "ui.h" + +#define CHECKSUMS_PER_PAGE ((PAGE_SIZE - sizeof(void *)) / sizeof(unsigned long)) +#define NEXT_CHECKSUM_PAGE(page) *((unsigned long *) (((char *) (page)) + PAGE_SIZE - sizeof(void *))) + +static int checksum_pages; +static unsigned long *first_checksum_page, *last_checksum_page; +static int num_reload_pages = 0; + +struct reload_data +{ + int pageset; + int pagenumber; + struct page *page_address; + char *base_version; + char *compared_version; + struct reload_data *next; +}; + +static struct reload_data *first_reload_data, *last_reload_data; + +unsigned long suspend_page_checksum(struct page *page) +{ + unsigned long *virt; + int i; + unsigned long value = 0; + + virt = (unsigned long *) kmap_atomic(page, KM_USER0); + for (i = 0; i < (PAGE_SIZE / sizeof(unsigned long)); i++) + value += *(virt + i); + kunmap_atomic(virt, KM_USER0); + return value; +} + +extern void get_first_pbe(struct pbe *pbe, struct pagedir *pagedir); +extern void get_next_pbe(struct pbe *pbe); + +void __suspend_calculate_checksums(dyn_pageflags_t map, unsigned long **current_checksum_page, + int *page_index) +{ + int page_number; + + BITMAP_FOR_EACH_SET(map, page_number) { + 
*(*current_checksum_page + *page_index) = + suspend_page_checksum(pfn_to_page(page_number)); + (*page_index)++; + if (*page_index == CHECKSUMS_PER_PAGE) { + *page_index = 0; + *current_checksum_page = (unsigned long *) + NEXT_CHECKSUM_PAGE(*current_checksum_page); + } + }; +} + +void suspend_calculate_checksums(void) +{ + int page_index = 0; + unsigned long *current_checksum_page = first_checksum_page; + + if (!first_checksum_page) { + suspend2_prepare_status(1, 0, "Unable to checksum at this point."); + return; + } + + suspend2_prepare_status(1, 0, "Calculating checksums... "); + + __suspend_calculate_checksums(pageset1_map, &current_checksum_page, + &page_index); + + __suspend_calculate_checksums(pageset2_map, &current_checksum_page, + &page_index); + + suspend2_prepare_status(1, 0, "Checksums done."); +} + +int __suspend_check_checksums(int whichpagedir, unsigned long **current_checksum_page, + int *page_index, struct reload_data **next_reload_data) +{ + int page_number, num_differences = 0; + unsigned long sum_now; + dyn_pageflags_t map; + + if (whichpagedir == 1) + map = pageset1_map; + else + map = pageset2_map; + + BITMAP_FOR_EACH_SET(map, page_number) { + /* Also ignore the page containing our variables */ + if (!PageChecksumIgnore(pfn_to_page(page_number))) { + /* Compare with the checksum recorded earlier */ + sum_now = suspend_page_checksum(pfn_to_page(page_number)); + if (sum_now != *(*current_checksum_page + *page_index)) { + num_differences++; + if (next_reload_data) { + char *virt; + struct reload_data *this = *next_reload_data; + this->pageset = whichpagedir; + this->pagenumber = page_number; + this->page_address = pfn_to_page(page_number); + virt = kmap_atomic(pfn_to_page(page_number), KM_USER0); + memcpy(this->compared_version, + virt, PAGE_SIZE); + kunmap_atomic(virt, KM_USER0); + *next_reload_data = this->next; + } + } + } + + (*page_index)++; + if (*page_index == CHECKSUMS_PER_PAGE) { + *page_index = 0; + *current_checksum_page = (unsigned long *) +
NEXT_CHECKSUM_PAGE(*current_checksum_page); + } + } + + return num_differences; +} + +void suspend_check_checksums(void) +{ + int page_index = 0, num_differences = 0; + unsigned long *current_checksum_page = first_checksum_page; + struct reload_data *next_reload_data = first_reload_data; + + if (!first_checksum_page) { + suspend2_prepare_status(1, 0, "Unable to checksum at this point."); + return; + } + + num_differences += __suspend_check_checksums(1, &current_checksum_page, + &page_index, &next_reload_data); + + num_differences += __suspend_check_checksums(2, &current_checksum_page, + &page_index, &next_reload_data); +} + +/* + * free_reload_data. + * + * Reload data begins on a page boundary. + */ +void suspend_free_reload_data(void) +{ + struct reload_data *this_data = first_reload_data; + struct reload_data *prev_reload_data = this_data; + + while (this_data) { + if (this_data->compared_version) + free_pages((unsigned long) this_data->compared_version, 0); + + if (this_data->base_version) + free_pages((unsigned long) this_data->base_version, 0); + + this_data = this_data->next; + + if (!(((unsigned long) this_data) & ~PAGE_MASK)) { + prev_reload_data->next = this_data; + prev_reload_data = this_data; + } + } + + this_data = first_reload_data; + while (this_data) { + prev_reload_data = this_data; + this_data = this_data->next; + free_pages((unsigned long) prev_reload_data, 0); + num_reload_pages--; + } + + first_reload_data = last_reload_data = NULL; + +} + +/* suspend_reread_pages() + * + * Description: Reread pages from an image for diagnosing differences. + * Arguments: page_list: A list containing information on pages + * to be reloaded, sorted by pageset and + * page index. + * Returns: Zero on success or -1 on failure.
+ */ + +int suspend_reread_pages(struct reload_data *page_list) +{ + int result = 0, whichtoread, pageset_offset = -1; + long i = 0; + struct suspend_plugin_ops *this_filter, *first_filter = get_next_filter(NULL); + dyn_pageflags_t *pageflags = &pageset1_map; + + if (!page_list) + return 0; + + for (whichtoread = page_list->pageset; whichtoread <= 2; whichtoread++) { + struct pagedir *pagedir; + + switch (whichtoread) { + case 1: + pagedir = &pagedir1; + break; + case 2: + pagedir = &pagedir2; + pageflags = &pageset2_map; + pageset_offset = -1; + i = -1; + break; + default: + goto out; + } + + suspend_message(SUSPEND_IO, SUSPEND_LOW, 0, + "Reread pages from pagedir %d.\n", whichtoread); + + /* Initialise page transformers */ + list_for_each_entry(this_filter, &suspend_filters, ops.filter.filter_list) { + if (this_filter->disabled) + continue; + if (this_filter->read_init && + this_filter->read_init(whichtoread)) { + abort_suspend("Failed to initialise a filter."); + return 1; + } + } + + /* Initialise writer */ + if (active_writer->read_init(whichtoread)) { + abort_suspend("Failed to initialise the writer."); + result = 1; + goto reread_free_buffers; + } + + /* Read the pages */ + while(i <= page_list->pagenumber) { + /* Read */ + result = first_filter->ops.filter.read_chunk( + virt_to_page(page_list->base_version), + SUSPEND_SYNC); + + if (result) { + abort_suspend("Failed to read a chunk of the image."); + goto reread_free_buffers; + } + + /* Interactivity*/ + check_shift_keys(0, NULL); + + /* Prepare next */ + pageset_offset = get_next_bit_on(*pageflags, pageset_offset); + + /* Got the one we're after? */ + i++; + + if (i == page_list->pagenumber) + page_list = page_list->next; + + if (page_list->pageset != whichtoread) + break; + } + +reread_free_buffers: + + /* Cleanup reads from this pageset. 
*/ + list_for_each_entry(this_filter, &suspend_plugins, plugin_list) { + if (this_filter->disabled) + continue; + if (this_filter->read_cleanup && + this_filter->read_cleanup()) { + abort_suspend("Failed to cleanup a filter."); + result = 1; + } + } + + if (active_writer->read_cleanup()) { + abort_suspend("Failed to cleanup the writer."); + result = 1; + } + } +out: + printk("\n"); + + return result; +} +void suspend_free_checksum_pages(void) +{ + unsigned long *next_checksum_page; + + while(first_checksum_page) { + next_checksum_page = + (unsigned long *) NEXT_CHECKSUM_PAGE(first_checksum_page); + free_pages((unsigned long) first_checksum_page, 0); + first_checksum_page = next_checksum_page; + } + last_checksum_page = NULL; + checksum_pages = 0; +} + +#define PRINTABLE(a) (((a) < 32 || (a) > 122) ? '.' : (a)) +static void local_print_location( + unsigned char *real, + unsigned char *original, + unsigned char *resumetime) +{ + int i; + + for (i = 0; i < 8; i++) + if (*(original + i) != *(resumetime + i)) + break; + if (i == 8) + return; + + suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1, "%p", real); + if (PageChecksumIgnore(virt_to_page(real))) + suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1, + " [NoSave]"); + if (PageSlab(virt_to_page(real))) + suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1, + " [Slab]"); + suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1, "\n"); + +#ifdef CONFIG_KDB + for (i = 0; i < 8; i++) { + static const char *last_sym = NULL; + if (*(original + i) != *(resumetime + i)) { + kdb_symtab_t symtab; + + kdbnearsym((unsigned long) real + i, + &symtab); + + if ((!symtab.sym_name) || + (symtab.sym_name == last_sym)) + continue; + + last_sym = symtab.sym_name; + + suspend_message(SUSPEND_INTEGRITY, SUSPEND_LOW, 1, + "%p = %s\n", + symtab.sym_start, + symtab.sym_name); + } + } +#endif + + for (i = 0; i < 8; i++) + suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1, + "%2x ", *(original + i)); + suspend_message(SUSPEND_INTEGRITY, 
SUSPEND_HIGH, 1, " "); + for (i = 0; i < 8; i++) + suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1, + "%c", PRINTABLE(*(original + i))); + suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1, " "); + + for (i = 0; i < 8; i++) + suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1, + "%2x ", *(resumetime + i)); + suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1, " "); + for (i = 0; i < 8; i++) + suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1, + "%c", PRINTABLE(*(resumetime + i))); + suspend_message(SUSPEND_INTEGRITY, SUSPEND_HIGH, 1, "\n\n"); +} + +int suspend_allocate_reload_data(int pages) +{ + struct reload_data *this_data; + unsigned long data_start; + int i; + + if (num_reload_pages >= pages) + return 0; + + for (i = 1; i <= pages; i++) { + data_start = get_zeroed_page(GFP_ATOMIC); + + if (!data_start) + return -ENOMEM; + + SetPageChecksumIgnore(virt_to_page(data_start)); + this_data = (struct reload_data *) data_start; + num_reload_pages++; + + while (data_start == + ((((unsigned long) (this_data + 1)) - 1) & PAGE_MASK)) { + struct page *page; + unsigned long virt; + + virt = get_zeroed_page(GFP_ATOMIC); + if (!virt) { + printk("Couldn't get a page in which to store " + "a changed page.\n"); + return -ENOMEM; + } + page = virt_to_page(virt); + + this_data->compared_version = (char *) virt; + SetPageChecksumIgnore(page); + + virt = get_zeroed_page(GFP_ATOMIC); + if (!virt) { + printk("Couldn't get a page in which to store " + "a baseline page.\n"); + return -ENOMEM; + } + page = virt_to_page(virt); + + this_data->base_version = (char *) virt; + SetPageChecksumIgnore(page); + + if (last_reload_data) + last_reload_data->next = this_data; + else + first_reload_data = this_data; + + last_reload_data = this_data; + + this_data++; + } + + check_shift_keys(0, NULL); + } + + return 0; +} + +void suspend_print_differences(void) +{ + struct reload_data *this_data = first_reload_data; + int i; + + suspend_reread_pages(first_reload_data); + + while (this_data) { + if 
(this_data->pageset && + this_data->pagenumber) { + suspend_message(SUSPEND_INTEGRITY, SUSPEND_MEDIUM, 1, + "Pagedir %d. Page %d. Address %p." + " Base %p. Copy %p.\n", + this_data->pageset, + this_data->pagenumber, + page_address(this_data->page_address), + this_data->base_version, + this_data->compared_version); + for (i= 0; i < (PAGE_SIZE / 8); i++) { + local_print_location( + page_address(this_data->page_address) + i * 8, + this_data->base_version + i * 8, + this_data->compared_version + i * 8); + check_shift_keys(0, NULL); + } + check_shift_keys(1, NULL); + } else + return; + this_data = this_data->next; + } +} + +int __suspend_allocate_checksum_pages(void) +{ + int pages_required = + (pagedir1.pageset_size + pagedir2.pageset_size) / CHECKSUMS_PER_PAGE; + unsigned long this_page; + + while (checksum_pages <= pages_required) { + this_page = get_zeroed_page(GFP_ATOMIC); + if (!this_page) + return -ENOMEM; + + if (!first_checksum_page) + first_checksum_page = + (unsigned long *) this_page; + else + NEXT_CHECKSUM_PAGE(last_checksum_page) = this_page; + + last_checksum_page = (unsigned long *) this_page; + SetPageChecksumIgnore(virt_to_page(this_page)); + checksum_pages++; + } + + return suspend_allocate_reload_data(2); +} + +int suspend_checksum_init(void) +{ + if (suspend_allocate_dyn_pageflags(&checksum_map)) + return 1; + return 0; +} + + +void suspend_checksum_cleanup(void) +{ + suspend_free_reload_data(); + suspend_free_checksum_pages(); + + suspend_free_dyn_pageflags(&checksum_map); +} diff -urN oldtree/kernel/power/suspend_file.c newtree/kernel/power/suspend_file.c --- oldtree/kernel/power/suspend_file.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/suspend_file.c 2006-02-13 14:51:54.213923488 -0500 @@ -0,0 +1,1069 @@ +/* + * Filewriter.c + * + * Copyright 2005 Nigel Cunningham + * + * Distributed under GPLv2. + * + * This file encapsulates functions for usage of a simple file as a + * backing store. 
It is based upon the swapwriter, and shares the + * same basic working. Here, though, we have nothing to do with + * swapspace, and only one device to worry about. + * + * The user can just + * + * echo Suspend2 > /path/to/my_file + * + * and + * + * echo /path/to/my_file > /proc/software_suspend/filewriter_target + * + * then put what they find in /proc/software_suspend/resume2 + * as their resume2= parameter in lilo.conf (and rerun lilo if using it). + * + * Having done this, they're ready to suspend and resume. + * + * TODO: + * - File resizing. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "suspend2.h" +#include "suspend2_common.h" +#include "version.h" +#include "proc.h" +#include "plugins.h" +#include "ui.h" +#include "extent.h" +#include "io.h" +#include "storage.h" +#include "block_io.h" + +static struct suspend_plugin_ops filewriterops; + +/* Details of our target. */ + +char filewriter_target[256]; +static struct inode *target_inode; +static struct file *target_file; +static struct block_device *target_bdev; +static int used_devt = 0; +static sector_t target_firstblock = 0; +static int target_storage_available = 0; +static int target_claim = 0; + +static char HaveImage[] = "HaveImage\n"; +static char NoImage[] = "Suspend2\n"; +static const int resumed_before_byte = sizeof(HaveImage) + 1; +#define sig_size resumed_before_byte + +extern dev_t ROOT_DEV; +extern char *__initdata root_device_name; + +/* Header_pages must be big enough for signature */ +static int header_pages, main_pages; + +#define target_is_normal_file() (S_ISREG(target_inode->i_mode)) + +static struct suspend2_bdev_info devinfo; + +static void set_devinfo(struct block_device *bdev, int target_blkbits) +{ + devinfo.bdev = bdev; + if (!target_blkbits) { + devinfo.bmap_shift = devinfo.blocks_per_page = 0; + } else { + devinfo.bmap_shift = target_blkbits - 9; + devinfo.blocks_per_page = (1 << (PAGE_SHIFT - 
target_blkbits)); + } +} + +/* Extent chain for blocks */ +static struct extent_chain block_chain; + +/* Signature operations */ +enum { + GET_IMAGE_EXISTS, + INVALIDATE, + MARK_RESUME_ATTEMPTED, +}; + +/* Helpers. */ + +static int filewriter_storage_available(void) +{ + int result = 0; + + if (!target_inode) + return 0; + + switch (target_inode->i_mode & S_IFMT) { + case S_IFSOCK: + case S_IFCHR: + case S_IFIFO: /* Socket, Char, Fifo */ + return -1; + case S_IFREG: /* Regular file: current size - holes + free space on part */ + result = target_storage_available; + break; + case S_IFBLK: /* Block device */ + if (target_bdev->bd_disk) { + if (target_bdev->bd_part) + result = (unsigned long)target_bdev->bd_part->nr_sects >> (PAGE_SHIFT - 9); + else + result = (unsigned long)target_bdev->bd_disk->capacity >> (PAGE_SHIFT - 9); + } else { + printk("bdev->bd_disk null.\n"); + return 0; + } + } + + return result; +} + +static int has_contiguous_blocks(int page_num) +{ + int j; + sector_t last = 0; + + for (j = 0; j < devinfo.blocks_per_page; j++) { + sector_t this = bmap(target_inode, + page_num * devinfo.blocks_per_page + j); + + if (!this || (last && (last + 1) != this)) + break; + + last = this; + } + + return (j == devinfo.blocks_per_page); +} + +/* + * Ramdisk access variables + */ + +static int size_ignoring_sparseness(void) +{ + int mappable = 0, i; + + if (target_is_normal_file()) { + for (i = 0; i < (target_inode->i_size >> PAGE_SHIFT) ; i++) + if (has_contiguous_blocks(i)) + mappable++; + + return mappable; + } else + return filewriter_storage_available(); +} + +static void get_main_pool_phys_params(void) +{ + int i; + + if (block_chain.first) + put_extent_chain(&block_chain); + + if (target_is_normal_file()) { + int extent_min = -1, extent_max = -1; + + for (i = 0; + i < (target_inode->i_size >> PAGE_SHIFT); + i++) { + sector_t new_sector; + + if (!has_contiguous_blocks(i)) + continue; + + new_sector = bmap(target_inode, + (i * devinfo.blocks_per_page)); + + /* 
+ * I'd love to be able to fill in holes and resize + * files, but not yet... + */ + + if (new_sector == extent_max + 1) + extent_max+= devinfo.blocks_per_page; + else { + if (extent_min > -1) { + if (test_action_state(SUSPEND_TEST_BIO)) + printk("Adding extent %d-%d.\n", + extent_min << devinfo.bmap_shift, + ((extent_max + 1) << devinfo.bmap_shift) - 1); + append_extent_to_extent_chain( + &block_chain, + extent_min, + extent_max); + } + extent_min = new_sector; + extent_max = extent_min + devinfo.blocks_per_page - 1; + } + } + if (extent_min > -1) { + append_extent_to_extent_chain(&block_chain, + extent_min, extent_max); + if (test_action_state(SUSPEND_TEST_BIO)) + printk("Adding extent %d-%d.\n", + extent_min << devinfo.bmap_shift, + ((extent_max + 1) << devinfo.bmap_shift) - 1); + } + + } else + if (target_storage_available > 0) { + append_extent_to_extent_chain(&block_chain, + 0, + min(main_pages, target_storage_available) * devinfo.blocks_per_page - 1); + } +} + +static void get_target_info(int get_size) +{ + if (!target_bdev || IS_ERR(target_bdev)) { + target_inode = NULL; + set_devinfo(NULL, 0); + target_storage_available = 0; + } else { + if (!target_inode) + target_inode = target_bdev->bd_inode; + set_devinfo(target_bdev, target_inode->i_blkbits); + if (get_size) + target_storage_available = size_ignoring_sparseness(); + } +} + +static void filewriter_cleanup(int finishing_cycle) +{ + if (target_bdev) { + if (target_claim) { + bd_release(target_bdev); + target_claim = 0; + } + + if (used_devt) { + blkdev_put(target_bdev); + used_devt = 0; + } + target_bdev = NULL; + get_target_info(0); + } + + if (target_file > 0) { + filp_close(target_file, NULL); + target_file = NULL; + } +} + +static void filewriter_get_target_info(char *target, int get_size, + int resume2) +{ + if (target_file) + filewriter_cleanup(0); + + if (!target || !strlen(target)) + return; + + target_file = filp_open(target, O_RDWR, 0); + + if (IS_ERR(target_file) || !target_file) { + dev_t 
resume_dev_t; + + if (!resume2) { + printk("Open file %s returned %p.\n", target, target_file); + target_file = NULL; + return; + } + + target_file = NULL; + resume_dev_t = name_to_dev_t(target); + if (!resume_dev_t) { + printk("Open file %s returned %p and name_to_devt failed.\n", target, target_file); + if (!resume_dev_t) { + struct kstat stat; + int error = vfs_stat(target, &stat); + if (error) { + printk("Stating the file also failed. Nothing more we can do.\n"); + return; + } + resume_dev_t = stat.rdev; + } + return; + } + target_bdev = open_by_devnum(resume_dev_t, FMODE_READ); + if (IS_ERR(target_bdev)) { + printk("Got a dev_num (%lx) but failed to open it.\n", + (unsigned long) resume_dev_t); + return; + } + used_devt = 1; + target_inode = target_bdev->bd_inode; + } else + target_inode = target_file->f_mapping->host; + + if (S_ISLNK(target_inode->i_mode) || + S_ISDIR(target_inode->i_mode) || + S_ISSOCK(target_inode->i_mode) || + S_ISFIFO(target_inode->i_mode)) { + printk("The filewriter works with regular files, character files and block devices.\n"); + goto cleanup; + } + + if (!used_devt) { + if (S_ISBLK(target_inode->i_mode)) { + target_bdev = I_BDEV(target_inode); + if (!bd_claim(target_bdev, &filewriterops)) + target_claim = 1; + } else + target_bdev = target_inode->i_sb->s_bdev; + } + + get_target_info(get_size); + + if (!resume2) + target_firstblock = bmap(target_inode, 0) << devinfo.bmap_shift; + + return; +cleanup: + target_inode = NULL; + if (target_file) { + filp_close(target_file, NULL); + target_file = NULL; + } + get_target_info(0); +} + +int parse_signature(char *header) +{ + int have_image = !memcmp(HaveImage, header, sizeof(HaveImage) - 1); + int no_image_header = !memcmp(NoImage, header, sizeof(NoImage) - 1); + + if (no_image_header) + return 0; + + if (!have_image) + return -1; + + if (header[resumed_before_byte] & 1) + set_suspend_state(SUSPEND_RESUMED_BEFORE); + else + clear_suspend_state(SUSPEND_RESUMED_BEFORE); + + return 1; +} + +/* 
prepare_signature */ + +static int prepare_signature(char *current_header) +{ + /* + * Explicitly put the \0 that clears the 'tried to resume from + * this image before' flag. + */ + strncpy(current_header, HaveImage, sizeof(HaveImage)); + current_header[resumed_before_byte] = 0; + return 0; +} + +static int filewriter_storage_allocated(void) +{ + int result; + + if (!target_inode) + return 0; + + if (target_is_normal_file()) { + result = (int) target_storage_available; + } else + result = header_pages + main_pages; + + return result; +} + +static int filewriter_release_storage(void) +{ + if ((test_action_state(SUSPEND_KEEP_IMAGE)) && + test_suspend_state(SUSPEND_NOW_RESUMING)) + return 0; + + put_extent_chain(&block_chain); + + header_pages = main_pages = 0; + return 0; +} + +static int filewriter_allocate_header_space(int space_requested) +{ + int i; + + /* We only steal pages from the main pool. If it doesn't have any yet... */ + + if (!block_chain.first) + return 0; + + extent_state_goto_start(&suspend_writer_posn); + + for (i = 0; i < space_requested; i++) { + if (suspend_bio_ops.forward_one_page()) + return -ENOSPC; + } + + /* The end of header pages will be the start of pageset 2 */ + extent_state_save(&suspend_writer_posn, &suspend_writer_posn_save[2]); + header_pages = space_requested; + return 0; +} + +static int filewriter_allocate_storage(int space_requested) +{ + int result = 0, prev_header_pages; + /* FIXME This looks wrong */ + int blocks_to_get = (space_requested << devinfo.bmap_shift) - block_chain.size; + + /* Only release_storage reduces the size */ + if (blocks_to_get < 1) + return 0; + + main_pages = space_requested; + + get_main_pool_phys_params(); + + suspend_message(SUSPEND_WRITER, SUSPEND_MEDIUM, 0, + "Finished with block_chain.size == %d.\n", + block_chain.size); + + if (block_chain.size < (header_pages + main_pages)) + result = -ENOSPC; + + prev_header_pages = header_pages; + header_pages = 0; + 
filewriter_allocate_header_space(prev_header_pages); + return result; +} + +static int filewriter_write_header_init(void) +{ + char new_sig[sig_size]; + + suspend_writer_buffer = (char *) get_zeroed_page(GFP_ATOMIC); + suspend_writer_buffer_posn = 0; + + /* We change it once the whole header is written */ + strcpy(new_sig, NoImage); + suspend_bio_ops.write_header_chunk(new_sig, sig_size); + + extent_state_goto_start(&suspend_writer_posn); + + /* Info needed to bootstrap goes at the start of the header. + * First we save the basic info needed for reading, including the number + * of header pages. Then we save the structs containing data needed + * for reading the header pages back. + * Note that even if header pages take more than one page, when we + * read back the info, we will have restored the location of the + * next header page by the time we go to use it. + */ + suspend_bio_ops.write_header_chunk((char *) &suspend_writer_posn_save, + 3 * sizeof(struct extent_iterate_saved_state)); + + serialise_extent_chain(&block_chain); + + return 0; +} + +static int filewriter_write_header_cleanup(void) +{ + /* Write any unsaved data */ + if (suspend_writer_buffer_posn) { + if (suspend_bio_ops.rw_page(WRITE, + virt_to_page(suspend_writer_buffer), + -1, 0)) + return -EIO; + } + + suspend_bio_ops.finish_all_io(); + + extent_state_goto_start(&suspend_writer_posn); + extent_state_next(&suspend_writer_posn); + + /* Adjust image header */ + suspend_bio_ops.bdev_page_io(READ, target_bdev, + target_firstblock, + virt_to_page(suspend_writer_buffer)); + + prepare_signature(suspend_writer_buffer); + + suspend_bio_ops.bdev_page_io(WRITE, target_bdev, + target_firstblock, + virt_to_page(suspend_writer_buffer)); + + free_page((unsigned long) suspend_writer_buffer); + suspend_writer_buffer = NULL; + + suspend_bio_ops.finish_all_io(); + + return 0; +} + +/* HEADER READING */ + +#ifdef CONFIG_DEVFS_FS +int create_dev(char *name, dev_t dev, char *devfs_name); +#else +static int 
create_dev(char *name, dev_t dev, char *devfs_name) +{ + sys_unlink(name); + return sys_mknod(name, S_IFBLK|0600, new_encode_dev(dev)); +} +#endif + +static int rd_init(void) +{ + suspend_writer_buffer_posn = 0; + + create_dev("/dev/root", ROOT_DEV, root_device_name); + create_dev("/dev/ram", MKDEV(RAMDISK_MAJOR, 0), NULL); + + suspend_read_fd = sys_open("/dev/root", O_RDONLY, 0); + if (suspend_read_fd < 0) + goto out; + + sys_read(suspend_read_fd, suspend_writer_buffer, BLOCK_SIZE); + + memcpy(&suspend_writer_posn_save, + suspend_writer_buffer + suspend_writer_buffer_posn, + sizeof(suspend_writer_posn_save)); + + suspend_writer_buffer_posn += sizeof(suspend_writer_posn_save); + + return 0; +out: + sys_unlink("/dev/ram"); + sys_unlink("/dev/root"); + return -EIO; +} + +static int file_init(void) +{ + suspend_writer_buffer_posn = sig_size; + + /* Read filewriter configuration */ + suspend_bio_ops.bdev_page_io(READ, target_bdev, + target_firstblock, + virt_to_page((unsigned long) suspend_writer_buffer)); + + return 0; +} + +/* + * read_header_init() + * + * Ramdisk support based heavily on init/do_mounts_rd.c + * + * Description: + * 1. Attempt to read the device specified with resume2=. + * 2. Check the contents of the header for our signature. + * 3. Warn, ignore, reset and/or continue as appropriate. + * 4. If continuing, read the filewriter configuration section + * of the header and set up block device info so we can read + * the rest of the header & image. + * + * Returns: + * May not return if user choose to reboot at a warning. + * -EINVAL if cannot resume at this time. Booting should continue + * normally. 
+ */ + +static int filewriter_read_header_init(void) +{ + int result; + + *(suspend_bio_ops.need_extra_next) = 1; + + suspend_writer_buffer = (char *) get_zeroed_page(GFP_ATOMIC); + + if (test_suspend_state(SUSPEND_TRY_RESUME_RD)) + result = rd_init(); + else + result = file_init(); + + if (result) + return result; + + suspend_writer_buffer_posn = sig_size; + memcpy(&suspend_writer_posn_save, + suspend_writer_buffer + suspend_writer_buffer_posn, + 3 * sizeof(struct extent_iterate_saved_state)); + + suspend_writer_buffer_posn += 3 * sizeof(struct extent_iterate_saved_state); + + extent_state_goto_start(&suspend_writer_posn); + load_extent_chain(&block_chain); + + return 0; +} + +static int filewriter_read_header_cleanup(void) +{ + free_page((unsigned long) suspend_writer_buffer); + suspend_writer_buffer = NULL; + return 0; +} + +static int filewriter_signature_op(int op) +{ + char *cur; + int result = 0, changed = 0; + + if(target_bdev <= 0) + return -1; + + cur = (char *) get_zeroed_page(GFP_ATOMIC); + if (!cur) { + printk("Unable to allocate a page for reading the image signature.\n"); + return -ENOMEM; + } + + suspend_bio_ops.bdev_page_io(READ, target_bdev, + target_firstblock, + virt_to_page(cur)); + + result = parse_signature(cur); + + switch (op) { + case INVALIDATE: + if (result == -1) + goto out; + + strcpy(cur, NoImage); + cur[resumed_before_byte] = 0; + result = changed = 1; + break; + case MARK_RESUME_ATTEMPTED: + if (result == 1) { + cur[resumed_before_byte] |= 1; + changed = 1; + } + break; + } + + if (changed) + suspend_bio_ops.bdev_page_io(WRITE, target_bdev, + target_firstblock, + virt_to_page(cur)); + +out: + suspend_bio_ops.finish_all_io(); + free_page((unsigned long) cur); + return result; +} + +/* + * workspace_size + * + * Description: + * Returns the number of bytes of RAM needed for this + * code to do its work. (Used when calculating whether + * we have enough memory to be able to suspend & resume). 
+ * + */ +static unsigned long filewriter_memory_needed(void) +{ + return 0; +} + +/* Print debug info + * + * Description: + */ + +static int filewriter_print_debug_stats(char *buffer, int size) +{ + int len = 0; + + if (active_writer != &filewriterops) { + len = snprintf_used(buffer, size, "- Filewriter inactive.\n"); + return len; + } + + len = snprintf_used(buffer, size, "- Filewriter active.\n"); + + len+= snprintf_used(buffer+len, size-len, " Storage available for image: %ld pages.\n", + filewriter_storage_allocated()); + + return len; +} + +/* + * Storage needed + * + * Returns amount of space in the image header required + * for the filewriter's data. + * + * We ensure the space is allocated, but actually save the + * data from write_header_init and therefore don't also define a + * save_config_info routine. + */ +static unsigned long filewriter_storage_needed(void) +{ + return strlen(filewriter_target) + 1; +} + +/* + * filewriter_invalidate_image + * + */ +static int filewriter_invalidate_image(void) +{ + int result; + + if (nr_suspends > 0) + filewriter_release_storage(); + + result = filewriter_signature_op(INVALIDATE); + if (result == 1 && !nr_suspends) + printk(KERN_WARNING name_suspend "Image invalidated.\n"); + + return result; +} + +/* + * Image_exists + * + */ + +static int filewriter_image_exists(void) +{ + return filewriter_signature_op(GET_IMAGE_EXISTS); +} + +/* + * Mark resume attempted. + * + * Record that we tried to resume from this image. 
+ */ + +static void filewriter_mark_resume_attempted(void) +{ + filewriter_signature_op(MARK_RESUME_ATTEMPTED); +} + +static void filewriter_set_resume2(void) +{ + char *buffer = (char *) get_zeroed_page(GFP_ATOMIC); + char *buffer2 = (char *) get_zeroed_page(GFP_ATOMIC); + unsigned long sector = bmap(target_inode, 0); + int offset = 0; + + if (target_bdev) { + set_devinfo(target_bdev, target_inode->i_blkbits); + + bdevname(target_bdev, buffer2); + offset += snprintf(buffer + offset, PAGE_SIZE - offset, + "/dev/%s", buffer2); + + if (sector) + offset += snprintf(buffer + offset, PAGE_SIZE - offset, + ":0x%lx", sector << devinfo.bmap_shift); + } else + offset += snprintf(buffer + offset, PAGE_SIZE - offset, + "%s is not a valid target.", filewriter_target); + + sprintf(resume2_file, "file:%s", buffer); + + free_page((unsigned long) buffer); + free_page((unsigned long) buffer2); + + attempt_to_parse_resume_device(); +} + +static int __test_filewriter_target(char *target, int resume_time) +{ + filewriter_get_target_info(filewriter_target, 0, 0); + if (filewriter_signature_op(GET_IMAGE_EXISTS) > -1) { + printk(name_suspend "Filewriter: File signature found.\n"); + if (!resume_time) + filewriter_set_resume2(); + + suspend_bio_ops.set_devinfo(&devinfo); + suspend_writer_posn.chains = &block_chain; + suspend_writer_posn.num_chains = 1; + + return 0; + } + + if (*filewriter_target) + printk(KERN_ERR name_suspend + "Filewriter: Sorry. No signature found at %s.\n", + filewriter_target); + else + printk(KERN_ERR name_suspend + "Filewriter: Sorry. No signature found.\n"); + + return 1; +} + +static void test_filewriter_target(void) +{ + __test_filewriter_target(filewriter_target, 0); +} + +/* + * Parse Image Location + * + * Attempt to parse a resume2= parameter. + * File Writer accepts: + * resume2=file:DEVNAME[:FIRSTBLOCK][@BLOCKSIZE] + * + * Where: + * DEVNAME is convertible to a dev_t by name_to_dev_t + * FIRSTBLOCK is the location of the first block in the file. 
+ * BLOCKSIZE is the logical blocksize >= SECTOR_SIZE & <= PAGE_SIZE, + * mod SECTOR_SIZE == 0 of the device. + * Data is validated by attempting to read a header from the + * location given. Failure will result in filewriter refusing to + * save an image, and a reboot with correct parameters will be + * necessary. + */ + +static int filewriter_parse_sig_location(char *commandline, int only_writer) +{ + char *thischar, *devstart = NULL, *colon = NULL, *at_symbol = NULL; + int result = -EINVAL, target_blocksize = 0; + + if (strncmp(commandline, "file:", 5)) { + if (!only_writer) + return 1; + } else + commandline += 5; + + /* + * Don't check signature again if we're beginning a cycle. If we already + * did the initialisation successfully, assume we'll be okay when it comes + * to resuming. + */ + if (target_bdev) + return 0; + + devstart = thischar = commandline; + while ((*thischar != ':') && (*thischar != '@') && + ((thischar - commandline) < 250) && (*thischar)) + thischar++; + + if (*thischar == ':') { + colon = thischar; + *colon = 0; + thischar++; + } + + while ((*thischar != '@') && ((thischar - commandline) < 250) && (*thischar)) + thischar++; + + if (*thischar == '@') { + at_symbol = thischar; + *at_symbol = 0; + } + + if (colon) + target_firstblock = (int) simple_strtoul(colon + 1, NULL, 0); + else + target_firstblock = 0; + + if (at_symbol) { + target_blocksize = (int) simple_strtoul(at_symbol + 1, NULL, 0); + if (target_blocksize & (SECTOR_SIZE - 1)) { + printk("Filewriter: Blocksizes are multiples of %d.\n", SECTOR_SIZE); + result = -EINVAL; + goto out; + } + } + + filewriter_get_target_info(commandline, 0, 1); + + if (!target_bdev || IS_ERR(target_bdev)) { + target_bdev = NULL; + result = -1; + goto out; + } + + if (target_blocksize) + set_devinfo(target_bdev, generic_ffs(target_blocksize)); + + result = __test_filewriter_target(commandline, 1); + +out: + if (colon) + *colon = ':'; + if (at_symbol) + *at_symbol = '@'; + + return result; +} + +/* 
filewriter_save_config_info + * + * Description: Save the target's name, not for resume time, but for all_settings. + * Arguments: Buffer: Pointer to a buffer of size PAGE_SIZE. + * Returns: Number of bytes used for saving our data. + */ + +static int filewriter_save_config_info(char *buffer) +{ + strcpy(buffer, filewriter_target); + return strlen(filewriter_target) + 1; +} + +/* filewriter_load_config_info + * + * Description: Reload target's name. + * Arguments: Buffer: Pointer to the start of the data. + * Size: Number of bytes that were saved. + */ + +static void filewriter_load_config_info(char *buffer, int size) +{ + strcpy(filewriter_target, buffer); +} + +static int filewriter_initialise(int starting_cycle) +{ + int result = 0; + + if (starting_cycle) { + if (active_writer != &filewriterops) + return 0; + + if (!*filewriter_target) { + printk("Filewriter is the active writer, but no filename has been set.\n"); + return 1; + } + } + + if (filewriter_target) + filewriter_get_target_info(filewriter_target, starting_cycle, 0); + + if (starting_cycle && (filewriter_image_exists() == -1)) { + printk("%s does not have a valid signature for suspending.\n", + filewriter_target); + result = 1; + } + + return result; +} + +static struct suspend_proc_data filewriter_proc_data[] = { + + { + .filename = "filewriter_target", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_STRING, + .needs_storage_manager = 2, + .data = { + .string = { + .variable = filewriter_target, + .max_length = 256, + } + }, + .write_proc = test_filewriter_target, + }, + + { .filename = "disable_filewriter", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_INTEGER, + .data = { + .integer = { + .variable = &filewriterops.disabled, + .minimum = 0, + .maximum = 1, + } + }, + .write_proc = attempt_to_parse_resume_device2, + } +}; + +static struct suspend_plugin_ops filewriterops = { + .type = WRITER_PLUGIN, + .name = "File Writer", + .module = THIS_MODULE, + .memory_needed = 
filewriter_memory_needed, + .print_debug_info = filewriter_print_debug_stats, + .save_config_info = filewriter_save_config_info, + .load_config_info = filewriter_load_config_info, + .storage_needed = filewriter_storage_needed, + .initialise = filewriter_initialise, + .cleanup = filewriter_cleanup, + + .ops = { + .writer = { + .storage_available = filewriter_storage_available, + .storage_allocated = filewriter_storage_allocated, + .release_storage = filewriter_release_storage, + .allocate_header_space = filewriter_allocate_header_space, + .allocate_storage = filewriter_allocate_storage, + .image_exists = filewriter_image_exists, + .mark_resume_attempted = filewriter_mark_resume_attempted, + .write_header_init = filewriter_write_header_init, + .write_header_cleanup = filewriter_write_header_cleanup, + .read_header_init = filewriter_read_header_init, + .read_header_cleanup = filewriter_read_header_cleanup, + .invalidate_image = filewriter_invalidate_image, + .parse_sig_location = filewriter_parse_sig_location, + } + } +}; + +/* ---- Registration ---- */ +static __init int filewriter_load(void) +{ + int result; + int i, numfiles = sizeof(filewriter_proc_data) / sizeof(struct suspend_proc_data); + + printk("Suspend2 FileWriter loading.\n"); + + filewriterops.read_init = suspend_bio_ops.read_init; + filewriterops.ops.writer.read_chunk = suspend_bio_ops.read_chunk; + filewriterops.read_cleanup = suspend_bio_ops.read_cleanup; + filewriterops.write_init = suspend_bio_ops.write_init; + filewriterops.ops.writer.write_chunk = suspend_bio_ops.write_chunk; + filewriterops.write_cleanup = suspend_bio_ops.write_cleanup; + filewriterops.ops.writer.read_header_chunk = + suspend_bio_ops.read_header_chunk; + filewriterops.ops.writer.write_header_chunk = + suspend_bio_ops.write_header_chunk; + + if (!(result = suspend_register_plugin(&filewriterops))) { + for (i=0; i< numfiles; i++) + suspend_register_procfile(&filewriter_proc_data[i]); + } else + printk("Suspend2 FileWriter unable to 
register!\n"); + + return result; +} + +#ifdef MODULE +static __exit void filewriter_unload(void) +{ + int i, numfiles = sizeof(filewriter_proc_data) / sizeof(struct suspend_proc_data); + + printk("Suspend2 FileWriter unloading.\n"); + + for (i=0; i< numfiles; i++) + suspend_unregister_procfile(&filewriter_proc_data[i]); + suspend_unregister_plugin(&filewriterops); +} + +module_init(filewriter_load); +module_exit(filewriter_unload); +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Nigel Cunningham"); +MODULE_DESCRIPTION("Suspend2 filewriter"); +#else +late_initcall(filewriter_load); +#endif diff -urN oldtree/kernel/power/suspend_swap.c newtree/kernel/power/suspend_swap.c --- oldtree/kernel/power/suspend_swap.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/suspend_swap.c 2006-02-13 14:51:54.214923336 -0500 @@ -0,0 +1,1238 @@ +/* + * Swapwriter.c + * + * Copyright 2004-2005 Nigel Cunningham + * + * Distributed under GPLv2. + * + * This file encapsulates functions for usage of swap space as a + * backing store. + */ + +#include +#include +#include +#include +#include + +#include "suspend2.h" +#include "suspend2_common.h" +#include "version.h" +#include "proc.h" +#include "plugins.h" +#include "io.h" +#include "ui.h" +#include "extent.h" +#include "block_io.h" + +static struct suspend_plugin_ops swapwriterops; + +#define SIGNATURE_VER 6 + +/* --- Struct of pages stored on disk */ + +union diskpage { + union swap_header swh; /* swh.magic is the only member used */ +}; + +union p_diskpage { + union diskpage *pointer; + char *ptr; + unsigned long address; +}; + +/* Devices used for swap */ +static struct suspend2_bdev_info devinfo[MAX_SWAPFILES]; + +/* Extent chains for swap & blocks */ +struct extent_chain swapextents; +struct extent_chain block_chain[MAX_SWAPFILES]; + +static dev_t header_dev_t; +static struct block_device *header_block_device; +static unsigned long headerblock; + +/* For swapfile automatically swapon/off'd. 
*/ +static char swapfilename[SWAP_FILENAME_MAXLENGTH] = ""; +extern asmlinkage long sys_swapon(const char *specialfile, int swap_flags); +extern asmlinkage long sys_swapoff(const char *specialfile); +static int suspend_swapon_status; + +/* Header Page Information */ +static int header_pages_allocated; + +/* User Specified Parameters. */ + +static unsigned long resume_firstblock; +static int resume_blocksize; +static dev_t resume_dev_t; +static struct block_device *resume_block_device; + +struct sysinfo swapinfo; +static int swapwriter_invalidate_image(void); + +/* Block devices open. */ +struct bdev_opened +{ + dev_t device; + struct block_device *bdev; + int set_swapinfo; + int claimed; +}; + +/* + * Entry MAX_SWAPFILES is the resume block device, which may + * not be a swap device enabled when we suspend. + * Entry MAX_SWAPFILES + 1 is the header block device, which + * is needed before we find out which slot it occupies. + */ +static struct bdev_opened *bdev_info_list[MAX_SWAPFILES + 2]; + +static void close_bdev(int i) +{ + struct bdev_opened *this = bdev_info_list[i]; + + if (this->claimed) + bd_release(this->bdev); + + /* Release our reference. */ + blkdev_put(this->bdev); + + if (this->set_swapinfo) + swap_info[i].bdev = NULL; + + /* Free our info. 
+ */ + kfree(this); + + bdev_info_list[i] = NULL; +} + +static void close_bdevs(void) +{ + int i; + + for (i = 0; i < MAX_SWAPFILES; i++) + if (bdev_info_list[i]) + close_bdev(i); + + resume_block_device = header_block_device = NULL; +} + +static struct block_device *open_bdev(int index, dev_t device) +{ + struct bdev_opened *this; + struct block_device *bdev; + + if (bdev_info_list[index] && (bdev_info_list[index]->device == device)) { + bdev = bdev_info_list[index]->bdev; + return bdev; + } + + if (bdev_info_list[index] && bdev_info_list[index]->device != device) + close_bdev(index); + + bdev = open_by_devnum(device, FMODE_READ); + + if (IS_ERR(bdev) || !bdev) { + if (suspend_early_boot_message(1, SUSPEND_CONTINUE_REQ, + "Failed to get access to block device " + "%d.\n You could be " + "booting with a 2.6 kernel when you " + "suspended a 2.4 kernel.", device)) + swapwriter_invalidate_image(); + return ERR_PTR(-EINVAL); + } + + this = kmalloc(sizeof(struct bdev_opened), GFP_KERNEL); + BUG_ON(!this); + + bdev_info_list[index] = this; + this->device = device; + this->bdev = bdev; + if ((index < MAX_SWAPFILES) && !swap_info[index].bdev) { + this->set_swapinfo = 1; + devinfo[index].bdev = swap_info[index].bdev = bdev; + } + + return bdev; +} + +/* Must be silent - might be called from cat /proc/suspend/debug_info + * Returns 0 if was off, -EBUSY if was on, error value otherwise. + */ +static int enable_swapfile(void) +{ + int activateswapresult = -EINVAL; + + if (suspend_swapon_status) + return 0; + + if (swapfilename[0]) { + /* Attempt to swap on with maximum priority */ + activateswapresult = sys_swapon(swapfilename, 0xFFFF); + if ((activateswapresult) && (activateswapresult != -EBUSY)) + printk(name_suspend + "The swapfile/partition specified by " + "/proc/suspend/swapfile (%s) could not" + " be turned on (error %d). 
Attempting " + "to continue.\n", + swapfilename, activateswapresult); + if (!activateswapresult) + suspend_swapon_status = 1; + } + return activateswapresult; +} + +/* Returns 0 if was on, -EINVAL if was off, error value otherwise */ +static int disable_swapfile(void) +{ + int result = -EINVAL; + + if (!suspend_swapon_status) + return 0; + + if (swapfilename[0]) { + result = sys_swapoff(swapfilename); + if (result == -EINVAL) + return 0; /* Wasn't on */ + if (!result) + suspend_swapon_status = 0; + } + + return result; +} + +static int try_to_parse_resume_device(char *commandline) +{ + struct kstat stat; + int error; + + resume_dev_t = name_to_dev_t(commandline); + + if (!resume_dev_t) { + error = vfs_stat(commandline, &stat); + if (!error) + resume_dev_t = stat.rdev; + } + + if (!resume_dev_t) { + if (test_suspend_state(SUSPEND_TRYING_TO_RESUME)) + suspend_early_boot_message(1, SUSPEND_CONTINUE_REQ, + "Failed to translate \"%s\" into a device id.\n", + commandline); + else + printk(name_suspend + "Can't translate \"%s\" into a device id yet.\n", + commandline); + return 1; + } + + if (IS_ERR(resume_block_device = + open_bdev(MAX_SWAPFILES, resume_dev_t))) { + suspend_early_boot_message(1, SUSPEND_CONTINUE_REQ, + "Failed to get access to \"%s\", where" + " the swap header should be found.", + commandline); + return 1; + } + + return 0; +} + +/* + * If we have read part of the image, we might have filled memory with + * data that should be zeroed out. 
+ */ +static void swapwriter_noresume_reset(void) +{ + memset((char *) &devinfo, 0, sizeof(devinfo)); + close_bdevs(); +} + +static int parse_signature(char *header, int restore) +{ + int type = -1; + + if (!memcmp("SWAP-SPACE",header,10)) + return 0; + else if (!memcmp("SWAPSPACE2",header,10)) + return 1; + + else if (!memcmp("S1SUSP",header,6)) + type = 4; + else if (!memcmp("S2SUSP",header,6)) + type = 5; + + else if (!memcmp("z",header,1)) + type = 12; + else if (!memcmp("Z",header,1)) + type = 13; + + /* + * Put bdev of suspend header in last byte of swap header + * (unsigned short) + */ + if (type > 11) { + dev_t *header_ptr = (dev_t *) &header[1]; + unsigned char *headerblocksize_ptr = + (unsigned char *) &header[5]; + u32 *headerblock_ptr = (u32 *) &header[6]; + header_dev_t = *header_ptr; + /* + * We are now using the highest bit of the char to indicate + * whether we have attempted to resume from this image before. + */ + clear_suspend_state(SUSPEND_RESUMED_BEFORE); + if (((int) *headerblocksize_ptr) & 0x80) + set_suspend_state(SUSPEND_RESUMED_BEFORE); + headerblock = (unsigned long) *headerblock_ptr; + } + + if ((restore) && (type > 5)) { + /* We only reset our own signatures */ + if (type & 1) + memcpy(header,"SWAPSPACE2",10); + else + memcpy(header,"SWAP-SPACE",10); + } + + return type; +} + +/* + * prepare_signature + */ + +static int prepare_signature(dev_t bdev, unsigned long block, + char *current_header) +{ + int current_type = parse_signature(current_header, 0); + dev_t *header_ptr = (dev_t *) (&current_header[1]); + unsigned long *headerblock_ptr = + (unsigned long *) (&current_header[6]); + + if ((current_type > 1) && (current_type < 6)) + return 1; + + /* At the moment, I don't have a way to handle the block being + * > 32 bits. Not enough room in the signature and no way to + * safely put the data elsewhere. */ + + if (BITS_PER_LONG == 64 && ffs(block) > 31) { + suspend2_prepare_status(DONT_CLEAR_BAR, + "Header sector requires 33+ bits. 
" + "Would not be able to resume."); + return 1; + } + + if (current_type & 1) + current_header[0] = 'Z'; + else + current_header[0] = 'z'; + *header_ptr = bdev; + /* prev is the first/last swap page of the resume area */ + *headerblock_ptr = (unsigned long) block; + return 0; +} + +static int swapwriter_allocate_storage(int space_requested); + +static int swapwriter_allocate_header_space(int space_requested) +{ + int i; + + if (!swapextents.size) + swapwriter_allocate_storage(space_requested); + + extent_state_goto_start(&suspend_writer_posn); + + for (i = 0; i < space_requested; i++) { + if (suspend_bio_ops.forward_one_page()) { + printk("Out of space while seeking to allocate header pages,\n"); + return -ENOSPC; + } + + header_pages_allocated++; + } + + /* The end of header pages will be the start of pageset 2 */ + extent_state_save(&suspend_writer_posn, &suspend_writer_posn_save[2]); + return 0; +} + +static void get_main_pool_phys_params(void) +{ + struct extent *extentpointer = NULL; + unsigned long address; + int i, extent_min = -1, extent_max = -1, last_chain = -1; + int prev_header_pages_allocated; + + for (i = 0; i < MAX_SWAPFILES; i++) + if (block_chain[i].first) + put_extent_chain(&block_chain[i]); + + extent_for_each(&swapextents, extentpointer, address) { + swp_entry_t swap_address = extent_val_to_swap_entry(address); + unsigned swapfilenum = swp_type(swap_address); + pgoff_t offset = swp_offset(swap_address); + struct swap_info_struct *sis = get_swap_info_struct(swapfilenum); + sector_t new_sector = map_swap_page(sis, offset); + + if ((new_sector == extent_max + 1) && + (last_chain == swapfilenum)) + extent_max++; + else { + if (extent_min > -1) { + if (test_action_state(SUSPEND_TEST_BIO)) + printk("Adding extent %d-%d.\n", + extent_min << + devinfo[last_chain].bmap_shift, + extent_max << + devinfo[last_chain].bmap_shift); + + append_extent_to_extent_chain( + &block_chain[last_chain], + extent_min, extent_max); + } + extent_min = extent_max = 
new_sector; + last_chain = swapfilenum; + } + } + + if (extent_min > -1) { + if (test_action_state(SUSPEND_TEST_BIO)) + printk("Adding extent %d-%d.\n", + extent_min << + devinfo[last_chain].bmap_shift, + extent_max << + devinfo[last_chain].bmap_shift); + append_extent_to_extent_chain( + &block_chain[last_chain], + extent_min, extent_max); + } + + prev_header_pages_allocated = header_pages_allocated; + header_pages_allocated = 0; + swapwriter_allocate_header_space(prev_header_pages_allocated); +} + +static int swapwriter_storage_allocated(void) +{ + return swapextents.size; +} + +static int swapwriter_storage_available(void) +{ + int result; + si_swapinfo(&swapinfo); + result = swapinfo.freeswap + swapwriter_storage_allocated(); + return result; +} + +static int swapwriter_initialise(int starting_cycle) +{ + if (starting_cycle) + enable_swapfile(); + + if (resume_dev_t && !resume_block_device && + IS_ERR(resume_block_device = + open_bdev(MAX_SWAPFILES, resume_dev_t))) + return 1; + + return 0; +} + +static void swapwriter_cleanup(int ending_cycle) +{ + if (ending_cycle) + disable_swapfile(); + + close_bdevs(); +} + +static int swapwriter_release_storage(void) +{ + int i = 0; + + if ((test_action_state(SUSPEND_KEEP_IMAGE)) && + test_suspend_state(SUSPEND_NOW_RESUMING)) + return 0; + + header_pages_allocated = 0; + + if (swapextents.first) { + /* Free swap entries */ + struct extent *extentpointer; + unsigned long extentvalue, start = 0, last = 0; + int first = 1; + swp_entry_t entry; + extent_for_each(&swapextents, extentpointer, + extentvalue) { + if (first) { + start = last = extentvalue; + first = 0; + } else { + if (last + 1 == extentvalue) + last++; + else + start = last = extentvalue; + } + entry = extent_val_to_swap_entry(extentvalue); + swap_free(entry); + } + + put_extent_chain(&swapextents); + + for (i = 0; i < MAX_SWAPFILES; i++) + if (block_chain[i].first) + put_extent_chain(&block_chain[i]); + } + + return 0; +} + +static int 
swapwriter_allocate_storage(int space_requested) +{ + int i, result = 0, first = 1; + int pages_to_get = space_requested - swapextents.size; + unsigned long extent_min = 0, extent_max = 0; + + if (pages_to_get < 1) + return 0; + + for (i=0; i < MAX_SWAPFILES; i++) { + if ((devinfo[i].bdev = swap_info[i].bdev)) + devinfo[i].dev_t = swap_info[i].bdev->bd_dev; + devinfo[i].bmap_shift = 3; + devinfo[i].blocks_per_page = 1; + } + + for(i=0; i < pages_to_get; i++) { + swp_entry_t entry; + unsigned long new_value; + + entry = get_swap_page(); + if (!entry.val) { + printk("Failed to get a swap page.\n"); + result = -ENOSPC; + break; + } + + new_value = swap_entry_to_extent_val(entry); + if (first) { + first = 0; + extent_min = extent_max = new_value; + } else { + if (new_value == extent_max + 1) + extent_max++; + else { + append_extent_to_extent_chain( + &swapextents, + extent_min, extent_max); + extent_min = extent_max = new_value; + } + } + } + + append_extent_to_extent_chain( + &swapextents, + extent_min, extent_max); + + get_main_pool_phys_params(); + return result; +} + +static int swapwriter_write_header_init(void) +{ + int i, result; + + extent_state_goto_start(&suspend_writer_posn); + /* Forward one page will be done prior to the read */ + + for (i = 0; i < MAX_SWAPFILES; i++) + if (swap_info[i].swap_file) + devinfo[i].dev_t = swap_info[i].bdev->bd_dev; + else + devinfo[i].dev_t = (dev_t) 0; + + suspend_writer_buffer = (char *) get_zeroed_page(GFP_ATOMIC); + if (!suspend_writer_buffer) { + printk("Failed to get swapwriter buffer.\n"); + return -ENOMEM; + } + + suspend_writer_buffer_posn = 0; + + /* Info needed to bootstrap goes at the start of the header. + * First we save the positions and devinfo, including the number + * of header pages. Then we save the structs containing data needed + * for reading the header pages back. 
+ * Note that even if header pages take more than one page, when we + * read back the info, we will have restored the location of the + * next header page by the time we go to use it. + */ + if ((result = suspend_bio_ops.write_header_chunk((char *) &suspend_writer_posn_save, + sizeof(suspend_writer_posn_save)))) + return result; + + if ((result = suspend_bio_ops.write_header_chunk((char *) &devinfo, + sizeof(devinfo)))) + return result; + + for (i=0; i < MAX_SWAPFILES; i++) + serialise_extent_chain(&block_chain[i]); + + return 0; +} + +static int swapwriter_write_header_cleanup(void) +{ + int result; + + /* Write any unsaved data */ + if (suspend_writer_buffer_posn) { + struct submit_params submit_params; + int current_chain; + + if (suspend_bio_ops.forward_one_page()) + return -EIO; + + current_chain = suspend_writer_posn.current_chain; + submit_params.readahead_index = -1; + submit_params.dev = swap_info[suspend_writer_posn.current_chain].bdev; + submit_params.block[0] = suspend_writer_posn.current_offset << + devinfo[current_chain].bmap_shift; + submit_params.page = virt_to_page(suspend_writer_buffer); + + suspend_bio_ops.submit_io(WRITE, &submit_params, 0); + } + + extent_state_goto_start(&suspend_writer_posn); + suspend_bio_ops.forward_one_page(); + + /* Adjust swap header */ + suspend_bio_ops.bdev_page_io(READ, resume_block_device, + resume_firstblock, + virt_to_page(suspend_writer_buffer)); + + result = prepare_signature(swap_info[suspend_writer_posn.current_chain].bdev->bd_dev, + suspend_writer_posn.current_offset, + ((union swap_header *) suspend_writer_buffer)->magic.magic); + + if (!result) + suspend_bio_ops.bdev_page_io(WRITE, resume_block_device, + resume_firstblock, + virt_to_page(suspend_writer_buffer)); + + free_page((unsigned long) suspend_writer_buffer); + suspend_writer_buffer = NULL; + + suspend_bio_ops.finish_all_io(); + + return result; +} + +/* ------------------------- HEADER READING ------------------------- */ + +/* + * read_header_init() 
+ *
+ * Description:
+ * 1. Attempt to read the device specified with resume2=.
+ * 2. Check the contents of the swap header for our signature.
+ * 3. Warn, ignore, reset and/or continue as appropriate.
+ * 4. If continuing, read the swapwriter configuration section
+ *    of the header and set up block device info so we can read
+ *    the rest of the header & image.
+ *
+ * Returns:
+ * May not return if the user chooses to reboot at a warning.
+ * -EINVAL if we cannot resume at this time. Booting should continue
+ * normally.
+ */
+
+static int swapwriter_read_header_init(void)
+{
+	int i;
+
+	BUG_ON(!resume_block_device);
+	BUG_ON(!resume_dev_t);
+
+	suspend_writer_buffer = (char *) get_zeroed_page(GFP_ATOMIC);
+
+	BUG_ON(!suspend_writer_buffer);
+
+	if (!header_dev_t) {
+		printk("read_header_init called when we haven't "
+				"verified there is an image!\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * If the header is not on the resume_dev_t, open the header device first.
+	 */
+	if (header_dev_t != resume_dev_t) {
+		header_block_device = open_bdev(MAX_SWAPFILES + 1,
+				header_dev_t);
+
+		if (IS_ERR(header_block_device))
+			return PTR_ERR(header_block_device);
+	} else
+		header_block_device = resume_block_device;
+
+	/*
+	 * Read swapwriter configuration.
+	 * Headerblock size taken into account already.
+ */ + suspend_bio_ops.bdev_page_io(READ, header_block_device, + headerblock << 3, + virt_to_page((unsigned long) suspend_writer_buffer)); + + memcpy(&suspend_writer_posn_save, suspend_writer_buffer, 3 * sizeof(struct extent_iterate_saved_state)); + + suspend_writer_buffer_posn = 3 * sizeof(struct extent_iterate_saved_state); + + memcpy(&devinfo, suspend_writer_buffer + suspend_writer_buffer_posn, sizeof(devinfo)); + + suspend_writer_buffer_posn += sizeof(devinfo); + + /* Restore device info */ + for (i = 0; i < MAX_SWAPFILES; i++) { + dev_t thisdevice = devinfo[i].dev_t; + struct block_device *result; + + devinfo[i].bdev = swap_info[i].bdev = NULL; + + if (!thisdevice) + continue; + + if (thisdevice == resume_dev_t) { + devinfo[i].bdev = swap_info[i].bdev = resume_block_device; + bdev_info_list[i] = bdev_info_list[MAX_SWAPFILES]; + BUG_ON(!bdev_info_list[i]); + bdev_info_list[i]->set_swapinfo = 1; + bdev_info_list[MAX_SWAPFILES] = NULL; + continue; + } + + if (thisdevice == header_dev_t) { + devinfo[i].bdev = swap_info[i].bdev = header_block_device; + bdev_info_list[i] = bdev_info_list[MAX_SWAPFILES + 1]; + BUG_ON(!bdev_info_list[i]); + bdev_info_list[i]->set_swapinfo = 1; + bdev_info_list[MAX_SWAPFILES + 1] = NULL; + continue; + } + + result = open_bdev(i, thisdevice); + if (IS_ERR(result)) { + close_bdevs(); + return PTR_ERR(result); + } + } + + extent_state_goto_start(&suspend_writer_posn); + *(suspend_bio_ops.need_extra_next) = 1; + + for (i = 0; i < MAX_SWAPFILES; i++) + load_extent_chain(&block_chain[i]); + + return 0; +} + +static int swapwriter_read_header_cleanup(void) +{ + free_page((unsigned long) suspend_writer_buffer); + return 0; +} + +/* swapwriter_invalidate_image + * + */ +static int swapwriter_invalidate_image(void) +{ + union p_diskpage cur; + int result = 0; + char newsig[11]; + + cur.address = get_zeroed_page(GFP_ATOMIC); + if (!cur.address) { + printk("Unable to allocate a page for restoring the swap signature.\n"); + return -ENOMEM; + } + + 
/* + * If nr_suspends == 0, we must be booting, so no swap pages + * will be recorded as used yet. + */ + + if (nr_suspends > 0) + swapwriter_release_storage(); + + /* + * We don't do a sanity check here: we want to restore the swap + * whatever version of kernel made the suspend image. + * + * We need to write swap, but swap may not be enabled so + * we write the device directly + */ + + suspend_bio_ops.bdev_page_io(READ, resume_block_device, + resume_firstblock, + virt_to_page(cur.pointer)); + + result = parse_signature(cur.pointer->swh.magic.magic, 1); + + if (result < 4) + goto out; + + strncpy(newsig, cur.pointer->swh.magic.magic, 10); + newsig[10] = 0; + + suspend_bio_ops.bdev_page_io(WRITE, resume_block_device, + resume_firstblock, + virt_to_page(cur.pointer)); + + if (!nr_suspends) + printk(KERN_WARNING name_suspend "Image invalidated.\n"); +out: + suspend_bio_ops.finish_all_io(); + free_page(cur.address); + return 0; +} + +/* + * workspace_size + * + * Description: + * Returns the number of bytes of RAM needed for this + * code to do its work. (Used when calculating whether + * we have enough memory to be able to suspend & resume). 
+ * + */ +static unsigned long swapwriter_memory_needed(void) +{ + return 1; +} + +/* Print debug info + * + * Description: + */ + +static int swapwriter_print_debug_stats(char *buffer, int size) +{ + int len = 0; + struct sysinfo sysinfo; + + if (active_writer != &swapwriterops) { + len = snprintf_used(buffer, size, "- Swapwriter inactive.\n"); + return len; + } + + len = snprintf_used(buffer, size, "- Swapwriter active.\n"); + if (swapfilename[0]) + len+= snprintf_used(buffer+len, size-len, + " Attempting to automatically swapon: %s.\n", swapfilename); + + si_swapinfo(&sysinfo); + + len+= snprintf_used(buffer+len, size-len, " Swap available for image: %ld pages.\n", + sysinfo.freeswap + swapwriter_storage_allocated()); + + return len; +} + +/* + * Storage needed + * + * Returns amount of space in the swap header required + * for the swapwriter's data. This ignores the links between + * pages, which we factor in when allocating the space. + * + * We ensure the space is allocated, but actually save the + * data from write_header_init and therefore don't also define a + * save_config_info routine. 
+ */
+static unsigned long swapwriter_storage_needed(void)
+{
+	return sizeof(suspend_writer_posn_save) + sizeof(devinfo);
+}
+
+/*
+ * Image_exists
+ */
+
+static int swapwriter_image_exists(void)
+{
+	int signature_found;
+	union p_diskpage diskpage;
+
+	if (!resume_dev_t) {
+		printk("Not even trying to read header "
+				"because resume_dev_t is not set.\n");
+		return 0;
+	}
+
+	if (!resume_block_device &&
+	    IS_ERR(open_bdev(MAX_SWAPFILES, resume_dev_t)))
+		return 0;
+
+	diskpage.address = get_zeroed_page(GFP_ATOMIC);
+
+	suspend_bio_ops.bdev_page_io(READ, resume_block_device,
+			resume_firstblock,
+			virt_to_page(diskpage.ptr));
+	suspend_bio_ops.finish_all_io();
+
+	signature_found = parse_signature(diskpage.pointer->swh.magic.magic, 0);
+	free_page(diskpage.address);
+
+	/*
+	 * Test for -1 before the "< 2" range check; otherwise the
+	 * missing-signature case is swallowed by the normal-swap branch.
+	 */
+	if (signature_found == -1) {
+		printk(KERN_ERR name_suspend
+			"Unable to find a signature. Could you have moved "
+			"a swap file?\n");
+		return 0;
+	} else if (signature_found < 2) {
+		return 0;	/* Normal swap space */
+	} else if (signature_found < 6) {
+		if ((!(test_suspend_state(SUSPEND_NORESUME_SPECIFIED)))
+			&& suspend_early_boot_message(1,
+				SUSPEND_CONTINUE_REQ,
+				"Detected the signature of an alternate "
+				"implementation.\n"))
+			set_suspend_state(SUSPEND_NORESUME_SPECIFIED);
+		return 0;
+	} else if ((signature_found >> 1) != SIGNATURE_VER) {
+		if ((!(test_suspend_state(SUSPEND_NORESUME_SPECIFIED))) &&
+			suspend_early_boot_message(1, SUSPEND_CONTINUE_REQ,
+			 "Found a different style suspend image signature."))
+			set_suspend_state(SUSPEND_NORESUME_SPECIFIED);
+	}
+
+	return 1;
+}
+
+/*
+ * Mark resume attempted.
+ *
+ * Record that we tried to resume from this image.
+ */
+
+static void swapwriter_mark_resume_attempted(void)
+{
+	union p_diskpage diskpage;
+	int signature_found;
+
+	if (!resume_dev_t) {
+		printk("Not even trying to record attempt at resuming"
+				" because resume_dev_t is not set.\n");
+		return;
+	}
+
+	diskpage.address = get_zeroed_page(GFP_ATOMIC);
+
+	suspend_bio_ops.bdev_page_io(READ, resume_block_device,
+			resume_firstblock,
+			virt_to_page(diskpage.ptr));
+	signature_found = parse_signature(diskpage.pointer->swh.magic.magic, 0);
+
+	switch (signature_found) {
+		case 12:
+		case 13:
+			diskpage.pointer->swh.magic.magic[5] |= 0x80;
+			break;
+	}
+
+	suspend_bio_ops.bdev_page_io(WRITE, resume_block_device,
+			resume_firstblock,
+			virt_to_page(diskpage.ptr));
+	suspend_bio_ops.finish_all_io();
+	free_page(diskpage.address);
+
+	close_bdevs();
+	return;
+}
+
+/*
+ * Parse Image Location
+ *
+ * Attempt to parse a resume2= parameter.
+ * Swap Writer accepts:
+ * resume2=swap:DEVNAME[:FIRSTBLOCK][@BLOCKSIZE]
+ *
+ * Where:
+ * DEVNAME is convertible to a dev_t by name_to_dev_t
+ * FIRSTBLOCK is the location of the first block in the swap file
+ * (specifying for a swap partition is nonsensical but not prohibited).
+ * Data is validated by attempting to read a swap header from the
+ * location given. Failure will result in swapwriter refusing to
+ * save an image, and a reboot with correct parameters will be
+ * necessary.
+ */ + +static int swapwriter_parse_sig_location(char *commandline, int only_writer) +{ + char *thischar, *devstart, *colon = NULL, *at_symbol = NULL; + union p_diskpage diskpage; + int signature_found, result = -EINVAL, temp_result; + + if (strncmp(commandline, "swap:", 5)) { + if (!only_writer) + return 1; + } else + commandline += 5; + + devstart = thischar = commandline; + while ((*thischar != ':') && (*thischar != '@') && + ((thischar - commandline) < 250) && (*thischar)) + thischar++; + + if (*thischar == ':') { + colon = thischar; + *colon = 0; + thischar++; + } + + while ((*thischar != '@') && ((thischar - commandline) < 250) && (*thischar)) + thischar++; + + if (*thischar == '@') { + at_symbol = thischar; + *at_symbol = 0; + } + + if (colon) + resume_firstblock = (int) simple_strtoul(colon + 1, NULL, 0); + else + resume_firstblock = 0; + + /* Legacy */ + if (at_symbol) { + resume_blocksize = (int) simple_strtoul(at_symbol + 1, NULL, 0); + if (resume_blocksize & (SECTOR_SIZE - 1)) { + printk("Swapwriter: Blocksizes are multiples of %d!\n", SECTOR_SIZE); + return -EINVAL; + } + resume_firstblock = resume_firstblock * (resume_blocksize / SECTOR_SIZE); + } + + temp_result = try_to_parse_resume_device(devstart); + + if (colon) + *colon = ':'; + if (at_symbol) + *at_symbol = '@'; + + if (temp_result) + return -EINVAL; + + diskpage.address = get_zeroed_page(GFP_ATOMIC); + if (!diskpage.address) { + printk(KERN_ERR name_suspend "Swapwriter: Failed to allocate a diskpage for I/O.\n"); + return -ENOMEM; + } + + temp_result = suspend_bio_ops.bdev_page_io(READ, + resume_block_device, + resume_firstblock, + virt_to_page(diskpage.ptr)); + + suspend_bio_ops.finish_all_io(); + + if (temp_result) { + printk(KERN_ERR name_suspend "Swapwriter: Failed to submit I/O.\n"); + goto invalid; + } + + signature_found = parse_signature(diskpage.pointer->swh.magic.magic, 0); + + if (signature_found != -1) { + printk(name_suspend "Swapwriter: Signature found.\n"); + result = 0; + + 
suspend_bio_ops.set_devinfo(devinfo); + suspend_writer_posn.chains = &block_chain[0]; + suspend_writer_posn.num_chains = MAX_SWAPFILES; + } else + printk(KERN_ERR name_suspend "Swapwriter: No swap signature found at specified location.\n"); +invalid: + free_page((unsigned long) diskpage.address); + return result; + +} + +static int header_locations_read_proc(char *page, char **start, off_t off, int count, + int *eof, void *data) +{ + int i, printedpartitionsmessage = 0, len = 0, haveswap = 0; + struct inode *swapf = 0; + int zone; + char *path_page = (char *) __get_free_page(GFP_KERNEL); + char *path; + int path_len; + + *eof = 1; + if (!page) + return 0; + + for (i = 0; i < MAX_SWAPFILES; i++) { + if (!swap_info[i].swap_file) + continue; + + if (S_ISBLK(swap_info[i].swap_file->f_mapping->host->i_mode)) { + haveswap = 1; + if (!printedpartitionsmessage) { + len += sprintf(page + len, + "For swap partitions, simply use the format: resume2=swap:/dev/hda1.\n"); + printedpartitionsmessage = 1; + } + } else { + path_len = 0; + + path = d_path( swap_info[i].swap_file->f_dentry, + swap_info[i].swap_file->f_vfsmnt, + path_page, + PAGE_SIZE); + path_len = snprintf(path_page, 31, "%s", path); + + haveswap = 1; + swapf = swap_info[i].swap_file->f_mapping->host; + if (!(zone = bmap(swapf,0))) { + len+= sprintf(page + len, + "Swapfile %s has been corrupted. 
Reuse mkswap on it and try again.\n", + path_page); + } else { + char name_buffer[255]; + len+= sprintf(page + len, "For swapfile `%s`, use resume2=swap:/dev/%s:0x%x.\n", + path_page, + bdevname(swap_info[i].bdev, name_buffer), + zone << (swapf->i_blkbits - 9)); + } + + } + } + + if (!haveswap) + len = sprintf(page, "You need to turn on swap partitions before examining this file.\n"); + + free_page((unsigned long) path_page); + return len; +} + +static struct suspend_proc_data swapwriter_proc_data[] = { + { + .filename = "swapfilename", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_STRING, + .data = { + .string = { + .variable = swapfilename, + .max_length = 255, + } + } + }, + + { + .filename = "headerlocations", + .permissions = PROC_READONLY, + .type = SUSPEND_PROC_DATA_CUSTOM, + .data = { + .special = { + .read_proc = header_locations_read_proc, + } + } + }, + + { .filename = "disable_swapwriter", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_INTEGER, + .data = { + .integer = { + .variable = &swapwriterops.disabled, + .minimum = 0, + .maximum = 1, + } + }, + .write_proc = attempt_to_parse_resume_device2, + } +}; + +static struct suspend_plugin_ops swapwriterops = { + .type = WRITER_PLUGIN, + .name = "Swap Writer", + .module = THIS_MODULE, + .memory_needed = swapwriter_memory_needed, + .print_debug_info = swapwriter_print_debug_stats, + .storage_needed = swapwriter_storage_needed, + .initialise = swapwriter_initialise, + .cleanup = swapwriter_cleanup, + + .ops = { + .writer = { + .noresume_reset = swapwriter_noresume_reset, + .storage_available = swapwriter_storage_available, + .storage_allocated = swapwriter_storage_allocated, + .release_storage = swapwriter_release_storage, + .allocate_header_space = swapwriter_allocate_header_space, + .allocate_storage = swapwriter_allocate_storage, + .image_exists = swapwriter_image_exists, + .mark_resume_attempted = swapwriter_mark_resume_attempted, + .write_header_init = swapwriter_write_header_init, + 
.write_header_cleanup = swapwriter_write_header_cleanup, + .read_header_init = swapwriter_read_header_init, + .read_header_cleanup = swapwriter_read_header_cleanup, + .invalidate_image = swapwriter_invalidate_image, + .parse_sig_location = swapwriter_parse_sig_location, + } + } +}; + +/* ---- Registration ---- */ +static __init int swapwriter_load(void) +{ + int result; + int i, numfiles = sizeof(swapwriter_proc_data) / sizeof(struct suspend_proc_data); + + printk("Suspend2 Swap Writer loading.\n"); + + swapwriterops.read_init = suspend_bio_ops.read_init; + swapwriterops.ops.writer.read_chunk = suspend_bio_ops.read_chunk; + swapwriterops.read_cleanup = suspend_bio_ops.read_cleanup; + swapwriterops.write_init = suspend_bio_ops.write_init; + swapwriterops.ops.writer.write_chunk = suspend_bio_ops.write_chunk; + swapwriterops.write_cleanup = suspend_bio_ops.write_cleanup; + swapwriterops.ops.writer.read_header_chunk = + suspend_bio_ops.read_header_chunk; + swapwriterops.ops.writer.write_header_chunk = + suspend_bio_ops.write_header_chunk; + + if (!(result = suspend_register_plugin(&swapwriterops))) { + + for (i=0; i< numfiles; i++) + suspend_register_procfile(&swapwriter_proc_data[i]); + } else + printk("Suspend2 Swap Writer unable to register!\n"); + return result; +} + +#ifdef MODULE +static __exit void swapwriter_unload(void) +{ + int i, numfiles = sizeof(swapwriter_proc_data) / sizeof(struct suspend_proc_data); + + printk("Suspend2 Swap Writer unloading.\n"); + + for (i=0; i< numfiles; i++) + suspend_unregister_procfile(&swapwriter_proc_data[i]); + suspend_unregister_plugin(&swapwriterops); +} + +module_init(swapwriter_load); +module_exit(swapwriter_unload); +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Nigel Cunningham"); +MODULE_DESCRIPTION("Suspend2 swap writer"); +#else +late_initcall(swapwriter_load); +#endif diff -urN oldtree/kernel/power/swsusp.c newtree/kernel/power/swsusp.c --- oldtree/kernel/power/swsusp.c 2006-01-02 22:21:10.000000000 -0500 +++ 
newtree/kernel/power/swsusp.c 2006-02-13 14:51:54.215923184 -0500 @@ -49,9 +49,7 @@ #include #include #include -#include #include -#include #include #include #include @@ -72,6 +70,8 @@ #include #include "power.h" +#include "swsusp.h" +#include "suspend.h" #ifdef CONFIG_HIGHMEM int save_highmem(void); diff -urN oldtree/kernel/power/swsusp.h newtree/kernel/power/swsusp.h --- oldtree/kernel/power/swsusp.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/swsusp.h 2006-02-13 14:51:54.215923184 -0500 @@ -0,0 +1,24 @@ + +struct suspend_header { + u32 version_code; + unsigned long num_physpages; + unsigned long orig_mem_free; + char machine[65]; + char version[65]; + int num_cpus; + int page_size; + int pageset_2_size; + int param0; + int param1; + int param2; + int param3; + int progress0; + int progress1; + int progress2; + int progress3; + int io_time[2][2]; + + suspend_pagedir_t *suspend_pagedir; + unsigned int num_pbes; +}; + diff -urN oldtree/kernel/power/ui.c newtree/kernel/power/ui.c --- oldtree/kernel/power/ui.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/ui.c 2006-02-13 14:51:54.216923032 -0500 @@ -0,0 +1,853 @@ +/* + * kernel/power/ui.c + * + * Copyright (C) 1998-2001 Gabor Kuti + * Copyright (C) 1998,2001,2002 Pavel Machek + * Copyright (C) 2002-2003 Florent Chabaud + * Copyright (C) 2002-2005 Nigel Cunningham + * + * This file is released under the GPLv2. + * + * Routines for Suspend2's user interface. + * + * The user interface code talks to a userspace program via a + * netlink socket. 
+ * + * The kernel side: + * - starts the userui program; + * - sends text messages and progress bar status; + * + * The user space side: + * - passes messages regarding user requests (abort, toggle reboot etc) + * + */ + +#define __KERNEL_SYSCALLS__ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "proc.h" +#include "plugins.h" +#include "suspend2.h" +#include "suspend2_common.h" +#include "ui.h" +#include "version.h" +#include "netlink.h" +#include "power.h" + +static char local_printf_buf[1024]; /* Same as printk - should be safe */ + +#ifdef CONFIG_NET +static struct user_helper_data ui_helper_data; +static struct suspend_plugin_ops userui_ops; +static int orig_loglevel; +static int orig_default_message_loglevel; +static int orig_kmsg; + +static char lastheader[512]; +static int lastheader_message_len = 0; + +/* Number of distinct progress amounts that userspace can display */ +static int progress_granularity = 50; + +DECLARE_WAIT_QUEUE_HEAD(userui_wait_for_key); + +static void ui_nl_set_state(int n) +{ + /* Only let them change certain settings */ + static const int suspend_action_mask = + (1 << SUSPEND_REBOOT) | (1 << SUSPEND_PAUSE) | (1 << SUSPEND_SLOW) | + (1 << SUSPEND_LOGALL) | (1 << SUSPEND_SINGLESTEP) | + (1 << SUSPEND_PAUSE_NEAR_PAGESET_END); + + suspend_action = (suspend_action & (~suspend_action_mask)) | + (n & suspend_action_mask); + + if (!test_action_state(SUSPEND_PAUSE) && + !test_action_state(SUSPEND_SINGLESTEP)) + wake_up_interruptible(&userui_wait_for_key); +} + +void userui_redraw(void) +{ + if (ui_helper_data.pid == -1) + return; + + suspend2_send_netlink_message(&ui_helper_data, + USERUI_MSG_REDRAW, NULL, 0); +} + +/* request_abort_suspend + * + * Description: Handle the user requesting the cancellation of a suspend by + * pressing escape. + * Callers: Invoked from a netlink packet from userspace when the user presses + * escape. 
+ */ +void request_abort_suspend(void) +{ + if (test_suspend_state(SUSPEND_NOW_RESUMING) || (test_result_state(SUSPEND_ABORT_REQUESTED))) + return; + + suspend2_prepare_status(CLEAR_BAR, "--- ESCAPE PRESSED :" + " ABORTING PROCESS ---"); + set_result_state(SUSPEND_ABORTED); + set_result_state(SUSPEND_ABORT_REQUESTED); + + wake_up_interruptible(&userui_wait_for_key); +} + +static int userui_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) +{ + int type; + int *data; + + type = nlh->nlmsg_type; + + /* A control message: ignore them */ + if (type < NETLINK_MSG_BASE) + return 0; + + /* Unknown message: reply with EINVAL */ + if (type >= USERUI_MSG_MAX) + return -EINVAL; + + /* All operations require privileges, even GET */ + if (security_netlink_recv(skb)) + return -EPERM; + + /* Only allow one task to receive NOFREEZE privileges */ + if (type == NETLINK_MSG_NOFREEZE_ME && ui_helper_data.pid != -1) + return -EBUSY; + + data = (int*)NLMSG_DATA(nlh); + + switch (type) { + case USERUI_MSG_ABORT: + request_abort_suspend(); + break; + case USERUI_MSG_GET_STATE: + suspend2_send_netlink_message(&ui_helper_data, + USERUI_MSG_GET_STATE, &suspend_action, + sizeof(suspend_action)); + break; + case USERUI_MSG_GET_DEBUG_STATE: + suspend2_send_netlink_message(&ui_helper_data, + USERUI_MSG_GET_DEBUG_STATE, + &suspend_debug_state, + sizeof(suspend_debug_state)); + break; + case USERUI_MSG_SET_STATE: + if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(int))) + return -EINVAL; + ui_nl_set_state(*data); + break; + case USERUI_MSG_SET_DEBUG_STATE: + if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(int))) + return -EINVAL; + suspend_debug_state = (*data); + break; + case USERUI_MSG_SPACE: + wake_up_interruptible(&userui_wait_for_key); + break; + } + + return 1; +} + +static unsigned long userui_storage_needed(void) +{ + return sizeof(ui_helper_data.program); +} + +static int userui_save_config_info(char *buf) +{ + *((int *) buf) = progress_granularity; + memcpy(buf + sizeof(int), 
ui_helper_data.program, sizeof(ui_helper_data.program)); + return sizeof(ui_helper_data.program) + sizeof(int); +} + +static void userui_load_config_info(char *buf, int size) +{ + /* Don't load the saved path if one has already been set */ + if (ui_helper_data.program[0]) + return; + + progress_granularity = *((int *) buf); + size -= sizeof(int); + + if (size > sizeof(ui_helper_data.program)) + size = sizeof(ui_helper_data.program); + + memcpy(ui_helper_data.program, buf + sizeof(int), size); + ui_helper_data.program[sizeof(ui_helper_data.program)-1] = '\0'; +} + +static unsigned long userui_memory_needed(void) +{ + /* ball park figure of 128 pages */ + return (128 * PAGE_SIZE); +} + +unsigned long userui_update_progress(unsigned long value, unsigned long maximum, + const char *fmt, va_list args) +{ + static int last_step = -1; + struct userui_msg_params msg; + int bitshift; + int this_step; + unsigned long next_update; + + if (ui_helper_data.pid == -1) + return 0; + + if ((!maximum) || (!progress_granularity)) + return maximum; + + if (value < 0) + value = 0; + + if (value > maximum) + value = maximum; + + /* Try to avoid math problems - we can't do 64 bit math here + * (and shouldn't need it - anyone got screen resolution + * of 65536 pixels or more?) 
*/ + bitshift = generic_fls(maximum) - 16; + if (bitshift > 0) { + unsigned long temp_maximum = maximum >> bitshift; + unsigned long temp_value = value >> bitshift; + this_step = (int) + (temp_value * progress_granularity / temp_maximum); + next_update = (((this_step + 1) * temp_maximum / + progress_granularity) + 1) << bitshift; + } else { + this_step = (int) (value * progress_granularity / maximum); + next_update = ((this_step + 1) * maximum / + progress_granularity) + 1; + } + + if (this_step == last_step) + return next_update; + + memset(&msg, 0, sizeof(msg)); + + msg.a = this_step; + msg.b = progress_granularity; + + if (fmt) { + vsnprintf(msg.text, sizeof(msg.text), fmt, args); + msg.text[sizeof(msg.text)-1] = '\0'; + } + + suspend2_send_netlink_message(&ui_helper_data, USERUI_MSG_PROGRESS, + &msg, sizeof(msg)); + last_step = this_step; + + return next_update; +} + +/* __suspend_message. + * + * Description: This function is intended to do the same job as printk, but + * without normally logging what is printed. The point is to be + * able to get debugging info on screen without filling the logs + * with "1/534. ^M 2/534^M. 3/534^M" + * + * It may be called from an interrupt context - can't sleep! + * + * Arguments: int mask: The debugging section(s) this message belongs to. + * int level: The level of verbosity of this message. + * int restartline: Whether to output a \r or \n with this line + * (\n if we're logging all output). + * const char *fmt, ...: Message to be displayed a la printk. + */ +void __suspend_message(unsigned long section, unsigned long level, + int normally_logged, + const char *fmt, ...) 
+{ + struct userui_msg_params msg; + + va_list args; + + if ((level) && (level > console_loglevel)) + return; + + if (ui_helper_data.pid == -1) + return; + + memset(&msg, 0, sizeof(msg)); + + msg.a = section; + msg.b = level; + msg.c = normally_logged; + + if (fmt) { + va_start(args, fmt); + vsnprintf(msg.text, sizeof(msg.text), fmt, args); + va_end(args); + msg.text[sizeof(msg.text)-1] = '\0'; + } + + if (test_action_state(SUSPEND_LOGALL)) + printk("%s\n", msg.text); + + suspend2_send_netlink_message(&ui_helper_data, USERUI_MSG_MESSAGE, + &msg, sizeof(msg)); +} + +static void wait_for_key_via_userui(void) +{ + DECLARE_WAITQUEUE(wait, current); + + add_wait_queue(&userui_wait_for_key, &wait); + set_current_state(TASK_INTERRUPTIBLE); + + interruptible_sleep_on(&userui_wait_for_key); + + set_current_state(TASK_RUNNING); + remove_wait_queue(&userui_wait_for_key, &wait); +} + +char suspend_wait_for_keypress(int timeout) +{ + int fd; + char key = '\0'; + struct termios t, t_backup; + + if (ui_helper_data.pid != -1) { + wait_for_key_via_userui(); + key = ' '; + goto out; + } + + /* We should be guaranteed /dev/console exists after populate_rootfs() in + * init/main.c + */ + if ((fd = sys_open("/dev/console", O_RDONLY, 0)) < 0) { + printk("Couldn't open /dev/console.\n"); + goto out; + } + + if (sys_ioctl(fd, TCGETS, (long)&t) < 0) + goto out_close; + + memcpy(&t_backup, &t, sizeof(t)); + + t.c_lflag &= ~(ISIG|ICANON|ECHO); + t.c_cc[VMIN] = 0; + if (timeout) + t.c_cc[VTIME] = timeout*10; + + if (sys_ioctl(fd, TCSETS, (long)&t) < 0) + goto out_restore; + + while (1) { + if (sys_read(fd, &key, 1) <= 0) { + key = '\0'; + break; + } + key = tolower(key); + if (test_suspend_state(SUSPEND_SANITY_CHECK_PROMPT)) { + if (key == 'c') { + set_suspend_state(SUSPEND_CONTINUE_REQ); + break; + } else if (key == ' ') + break; + } else + break; + } + +out_restore: + sys_ioctl(fd, TCSETS, (long)&t_backup); + +out_close: + sys_close(fd); +out: + return key; +} + +/* abort_suspend + * + * 
Description:	Begin to abort a cycle. If this wasn't at the user's request
+ * 		(and we're displaying output), tell the user why and wait for
+ * 		them to acknowledge the message.
+ * Arguments:	A parameterised string (imagine this is printk) to display,
+ *	 	telling the user why we're aborting.
+ */
+
+void abort_suspend(const char *fmt, ...)
+{
+	va_list args;
+	int printed_len = 0;
+
+	if (!test_result_state(SUSPEND_ABORTED)) {
+		if (!test_result_state(SUSPEND_ABORT_REQUESTED)) {
+			va_start(args, fmt);
+			printed_len = vsnprintf(local_printf_buf,
+					sizeof(local_printf_buf), fmt, args);
+			va_end(args);
+
+			/* vsnprintf returns the untruncated length; clamp it */
+			if (printed_len >= (int) sizeof(local_printf_buf))
+				printed_len = sizeof(local_printf_buf) - 1;
+
+			if (ui_helper_data.pid != -1)
+				snprintf(local_printf_buf + printed_len,
+					sizeof(local_printf_buf) - printed_len,
+					" (Press SPACE to continue)");
+
+			/* Pass the text as an argument, never as the format */
+			suspend2_prepare_status(CLEAR_BAR, "%s", local_printf_buf);
+
+			/*
+			 * Make sure the message is seen - wait for the
+			 * user to acknowledge it.
+			 */
+			if (ui_helper_data.pid != -1)
+				suspend_wait_for_keypress(0);
+		}
+		/* Turn on aborting flag */
+		set_result_state(SUSPEND_ABORTED);
+	}
+}
+
+/* suspend2_prepare_status
+ * Description:	Prepare the 'nice display', drawing the header and version,
+ * 		along with the current action and perhaps also resetting the
+ * 		progress bar.
+ * Arguments:
+ * 	int clearbar: Whether to reset the progress bar.
+ * 	const char *fmt, ...: The action to be displayed.
+ */
+void suspend2_prepare_status(int clearbar, const char *fmt, ...)
+{
+	va_list args;
+
+	if (fmt) {
+		va_start(args, fmt);
+		lastheader_message_len = vsnprintf(lastheader, 512, fmt, args);
+		va_end(args);
+	}
+
+	if (clearbar)
+		userui_update_progress(0, 1, NULL, NULL);
+
+	/* lastheader is already formatted, so print it through "%s" */
+	__suspend_message(0, SUSPEND_STATUS, 1, "%s", lastheader);
+
+	if (ui_helper_data.pid == -1)
+		printk(KERN_EMERG "%s\n", lastheader);
+}
+
+/* update_status
+ *
+ * Description: Update the progress bar and (if on) in-bar message.
+ * Arguments:	UL value, maximum: Current progress percentage (value/max).
+ * 		const char *fmt, ...: Message to be displayed in the middle
+ * 		of the progress bar.
+ * 		Note that a NULL message does not mean that any previous
+ * 		message is erased! For that, you need suspend2_prepare_status with
+ * 		clearbar on.
+ * Returns:	Unsigned long: The next value where status needs to be updated.
+ * 		This is to reduce unnecessary calls to update_status.
+ */
+unsigned long suspend2_update_status(unsigned long value, unsigned long maximum,
+		const char *fmt, ...)
+{
+	unsigned long next_update = maximum;
+	va_list args;
+
+	if (!maximum)
+		return maximum;
+
+	/* value is unsigned, so only the upper bound needs clamping */
+	if (value > maximum)
+		value = maximum;
+
+	va_start(args, fmt);
+
+	next_update = userui_update_progress(value, maximum, fmt, args);
+
+	va_end(args);
+
+	return next_update;
+}
+
+/* check_shift_keys
+ *
+ * Description:	Potentially pause and wait for the user to tell us to continue.
+ * 		We normally only pause when @pause is set.
+ * Arguments:	int pause: Whether we normally pause.
+ * 		char *message: The message to display. Not parameterised
+ * 		 because it's normally a constant.
+ */
+
+void check_shift_keys(int pause, char *message)
+{
+#ifdef CONFIG_PM_DEBUG
+	int displayed_message = 0, last_key = 0;
+
+	while (last_key != 32 &&
+		ui_helper_data.pid != -1 &&
+		(!test_result_state(SUSPEND_ABORTED)) &&
+		((test_action_state(SUSPEND_PAUSE) && pause) ||
+		 (test_action_state(SUSPEND_SINGLESTEP)))) {
+		if (!displayed_message) {
+			suspend2_prepare_status(DONT_CLEAR_BAR,
+			   "%s Press SPACE to continue.%s",
+			   message ? message : "",
+			   (test_action_state(SUSPEND_SINGLESTEP)) ?
+			   " Single step on." : "");
+			displayed_message = 1;
+		}
+		last_key = suspend_wait_for_keypress(0);
+	}
+#endif
+	schedule();
+}
+
+extern asmlinkage long sys_ioctl(unsigned int fd, unsigned int cmd,
+		unsigned long arg);
+
+/* suspend2_prepare_console
+ *
+ * Description:	Prepare a console for use, save current settings.
+ * Returns:	Boolean: Whether an error occurred.
Errors aren't + * treated as fatal, but a warning is printed. + */ +void suspend2_prepare_console(void) +{ + orig_loglevel = console_loglevel; + orig_default_message_loglevel = default_message_loglevel; + orig_kmsg = kmsg_redirect; + kmsg_redirect = fg_console + 1; + default_message_loglevel = 1; + console_loglevel = suspend_default_console_level; + + ui_helper_data.pid = -1; + + if (userui_ops.disabled) + return; + + if (!*ui_helper_data.program) { + printk("suspend_userui: program not configured. suspend_userui disabled.\n"); + return; + } + + suspend2_netlink_setup(&ui_helper_data); + + return; +} + +/* suspend2_restore_console + * + * Description: Restore the settings we saved above. + */ + +void suspend2_cleanup_console(void) +{ + suspend_default_console_level = console_loglevel; + + if (ui_helper_data.pid > -1) { + struct task_struct *t; + + suspend2_send_netlink_message(&ui_helper_data, + NETLINK_MSG_CLEANUP, NULL, 0); + + read_lock(&tasklist_lock); + if ((t = find_task_by_pid(ui_helper_data.pid))) + t->flags &= ~PF_NOFREEZE; + read_unlock(&tasklist_lock); + + suspend2_netlink_close(&ui_helper_data); + + ui_helper_data.pid = -1; + } + + console_loglevel = orig_loglevel; + kmsg_redirect = orig_kmsg; + default_message_loglevel = orig_default_message_loglevel; +} +#else +static char suspend_wait_for_keypress(int timeout) +{ + return 0; +} + +unsigned long suspend2_update_status(unsigned long value, unsigned long maximum, + const char *fmt, ...) +{ + return maximum; +} + +void __suspend_message(unsigned long section, unsigned long level, + int normally_logged, + const char *fmt, ...) { } +void suspend2_prepare_status(int clearbar, const char *fmt, ...) { } +void check_shift_keys(int pause, char *message) { } +void abort_suspend(const char *fmt, ...) 
{ } +void suspend2_prepare_console(void) { } +void suspend2_cleanup_console(void) { } +void userui_redraw(void) { } +#endif + +/* suspend_early_boot_message() + * Description: Handle errors early in the process of booting. + * The user may press C to continue booting, perhaps + * invalidating the image, or space to reboot. + * This works from either the serial console or normally + * attached keyboard. + * + * Note that we come in here from init, while the kernel is + * locked. If we want to get events from the serial console, + * we need to temporarily unlock the kernel. + * + * suspend_early_boot_message may also be called post-boot. + * In this case, it simply printks the message and returns. + * + * Arguments: int Whether we are able to erase the image. + * int default_answer. What to do when we timeout. This + * will normally be continue, but the user might + * provide command line options (__setup) to override + * particular cases. + * Char *. Pointer to a string explaining why we're moaning. + */ + +#define say(message, a...) printk(KERN_EMERG message, ##a) +#define message_timeout 25 /* message_timeout * 10 must fit in 8 bits */ + +int suspend_early_boot_message(int message_detail, int default_answer, char *warning_reason, ...) +{ + unsigned long orig_state = get_suspend_state(), continue_req = 0; + va_list args; + int printed_len; + + if (warning_reason) { + va_start(args, warning_reason); + printed_len = vsnprintf(local_printf_buf, + sizeof(local_printf_buf), + warning_reason, + args); + va_end(args); + } + + if (!test_suspend_state(SUSPEND_BOOT_TIME)) { + printk(name_suspend "%s\n", local_printf_buf); + return default_answer; + } + + /* We might be called directly from do_mounts_initrd if the + * user fails to set up their initrd properly. 
We need to + * enable the keyboard handler by setting the running flag */ + set_suspend_state(SUSPEND_RUNNING); + +#if defined(CONFIG_VT) || defined(CONFIG_SERIAL_CONSOLE) + console_loglevel = 7; + + say("=== Suspend2 ===\n\n"); + if (warning_reason) { + say("BIG FAT WARNING!! %s\n\n", local_printf_buf); + switch (message_detail) { + case 0: + say("If you continue booting, note that any image WILL NOT BE REMOVED.\n"); + say("Suspend is unable to do so because the appropriate modules aren't\n"); + say("loaded. You should manually remove the image to avoid any\n"); + say("possibility of corrupting your filesystem(s) later.\n"); + break; + case 1: + say("If you want to use the current suspend image, reboot and try\n"); + say("again with the same kernel that you suspended from. If you want\n"); + say("to forget that image, continue and the image will be erased.\n"); + break; + } + say("Press SPACE to reboot or C to continue booting with this kernel\n\n"); + say("Default action if you don't select one in %d seconds is: %s.\n", + message_timeout, + default_answer == SUSPEND_CONTINUE_REQ ? + "continue booting" : "reboot"); + } else { + say("BIG FAT WARNING!!\n\n"); + say("You have tried to resume from this image before.\n"); + say("If it failed once, it may well fail again.\n"); + say("Would you like to remove the image and boot normally?\n"); + say("This will be equivalent to entering noresume2 on the\n"); + say("kernel command line.\n\n"); + say("Press SPACE to remove the image or C to continue resuming.\n\n"); + say("Default action if you don't select one in %d seconds is: %s.\n", + message_timeout, + !!default_answer ? 
+ "continue resuming" : "remove the image"); + } + + set_suspend_state(SUSPEND_SANITY_CHECK_PROMPT); + clear_suspend_state(SUSPEND_CONTINUE_REQ); + + if (suspend_wait_for_keypress(message_timeout) == 0) /* We timed out */ + continue_req = !!default_answer; + else + continue_req = test_suspend_state(SUSPEND_CONTINUE_REQ); + + if ((warning_reason) && (!continue_req)) + machine_restart(NULL); + + restore_suspend_state(orig_state); + if (continue_req) + set_suspend_state(SUSPEND_CONTINUE_REQ); + +#endif // CONFIG_VT or CONFIG_SERIAL_CONSOLE + return -EPERM; +} +#undef say + +/* + * User interface specific /proc/suspend entries. + */ + +static struct suspend_proc_data proc_params[] = { +#ifdef CONFIG_NET +#ifdef CONFIG_PROC_FS + { .filename = "default_console_level", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_INTEGER, + .data = { + .integer = { + .variable = &suspend_default_console_level, + .minimum = 0, +#ifdef CONFIG_PM_DEBUG + .maximum = 7, +#else + .maximum = 1, +#endif + + } + } + }, + + { .filename = "enable_escape", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_BIT, + .data = { + .bit = { + .bit_vector = &suspend_action, + .bit = SUSPEND_CAN_CANCEL, + } + } + }, + +#ifdef CONFIG_PM_DEBUG + { .filename = "debug_sections", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_UL, + .data = { + .ul = { + .variable = &suspend_debug_state, + .minimum = 0, + .maximum = 2 << 30, + } + } + }, + + { .filename = "log_everything", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_BIT, + .data = { + .bit = { + .bit_vector = &suspend_action, + .bit = SUSPEND_LOGALL, + } + } + }, + + { .filename = "pause_between_steps", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_BIT, + .data = { + .bit = { + .bit_vector = &suspend_action, + .bit = SUSPEND_PAUSE, + } + } + }, +#endif + { .filename = "disable_userui_support", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_INTEGER, + .data = { + .integer = { + .variable = &userui_ops.disabled, + .minimum 
= 0, + .maximum = 1, + } + } + }, + { .filename = "userui_progress_granularity", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_INTEGER, + .data = { + .integer = { + .variable = &progress_granularity, + .minimum = 1, + .maximum = 2048, + } + } + }, + { .filename = "userui_program", + .permissions = PROC_RW, + .type = SUSPEND_PROC_DATA_STRING, + .data = { + .string = { + .variable = ui_helper_data.program, + .max_length = 255, + } + } + } +#endif +#endif +}; + +static struct suspend_plugin_ops userui_ops = { + .type = MISC_PLUGIN, + .name = "Userspace UI Support", + .module = THIS_MODULE, +#ifdef CONFIG_NET + .storage_needed = userui_storage_needed, + .save_config_info = userui_save_config_info, + .load_config_info = userui_load_config_info, + .memory_needed = userui_memory_needed, +#endif +}; + +/* suspend_console_proc_init + * Description: Boot time initialisation for user interface. + */ +static __init int suspend_console_proc_init(void) +{ + int result, i, numfiles = sizeof(proc_params) / sizeof(struct suspend_proc_data); + + if (!(result = suspend_register_plugin(&userui_ops))) + for (i=0; i< numfiles; i++) + suspend_register_procfile(&proc_params[i]); + +#ifdef CONFIG_NET + ui_helper_data.nl = NULL; + ui_helper_data.program[0] = '\0'; +#endif + ui_helper_data.pid = -1; + ui_helper_data.skb_size = sizeof(struct userui_msg_params); + ui_helper_data.pool_limit = 6; + ui_helper_data.netlink_id = NETLINK_SUSPEND2_USERUI; + ui_helper_data.name = "userspace ui"; + ui_helper_data.rcv_msg = userui_user_rcv_msg; + ui_helper_data.interface_version = 6; + ui_helper_data.must_init = 0; + ui_helper_data.not_ready = suspend2_cleanup_console; + init_completion(&ui_helper_data.wait_for_process); + + return result; +} + +late_initcall(suspend_console_proc_init); diff -urN oldtree/kernel/power/ui.h newtree/kernel/power/ui.h --- oldtree/kernel/power/ui.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/ui.h 2006-02-13 14:51:54.216923032 -0500 @@ -0,0 +1,44 @@ 
+/* + * + */ + +extern void suspend2_prepare_console(void); +extern void suspend2_cleanup_console(void); + +extern void check_shift_keys(int pause, char *message); +extern unsigned long suspend2_update_status(unsigned long value, unsigned long maximum, + const char *fmt, ...); + +extern void abort_suspend(const char *fmt, ...); + +extern void userui_redraw(void); + +enum { + DONT_CLEAR_BAR, + CLEAR_BAR +}; + +enum { + /* Userspace -> Kernel */ + USERUI_MSG_ABORT = 0x11, + USERUI_MSG_SET_STATE = 0x12, + USERUI_MSG_GET_STATE = 0x13, + USERUI_MSG_GET_DEBUG_STATE = 0x14, + USERUI_MSG_SET_DEBUG_STATE = 0x15, + USERUI_MSG_SET_PROGRESS_GRANULARITY = 0x17, + USERUI_MSG_SPACE = 0x18, + + /* Kernel -> Userspace */ + USERUI_MSG_MESSAGE = 0x21, + USERUI_MSG_PROGRESS = 0x22, + USERUI_MSG_REDRAW = 0x25, + USERUI_MSG_KEYPRESS = 0x26, + USERUI_MSG_DEBUG_STATE = 0x29, + + USERUI_MSG_MAX, +}; + +struct userui_msg_params { + unsigned long a, b, c, d; + char text[255]; +}; diff -urN oldtree/kernel/power/version.h newtree/kernel/power/version.h --- oldtree/kernel/power/version.h 1969-12-31 19:00:00.000000000 -0500 +++ newtree/kernel/power/version.h 2006-02-13 14:51:54.216923032 -0500 @@ -0,0 +1,2 @@ +#define SUSPEND_CORE_VERSION "2.2-rc16" +#define name_suspend "Suspend2 " SUSPEND_CORE_VERSION ": " diff -urN oldtree/kernel/sched.c newtree/kernel/sched.c --- oldtree/kernel/sched.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/kernel/sched.c 2006-02-13 14:51:54.218922728 -0500 @@ -4549,7 +4549,7 @@ struct list_head *head; migration_req_t *req; - try_to_freeze(); + try_todo_list(); spin_lock_irq(&rq->lock); @@ -4765,7 +4765,6 @@ p = kthread_create(migration_thread, hcpu, "migration/%d",cpu); if (IS_ERR(p)) return NOTIFY_BAD; - p->flags |= PF_NOFREEZE; kthread_bind(p, cpu); /* Must be high prio: stop_machine expects to yield to it. 
*/ rq = task_rq_lock(p, &flags); diff -urN oldtree/kernel/signal.c newtree/kernel/signal.c --- oldtree/kernel/signal.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/kernel/signal.c 2006-02-13 14:51:54.220922424 -0500 @@ -213,7 +213,7 @@ fastcall void recalc_sigpending_tsk(struct task_struct *t) { if (t->signal->group_stop_count > 0 || - (freezing(t)) || + (t->todo) || PENDING(&t->pending, &t->blocked) || PENDING(&t->signal->shared_pending, &t->blocked)) set_tsk_thread_flag(t, TIF_SIGPENDING); @@ -2212,7 +2212,7 @@ timeout = schedule_timeout_interruptible(timeout); - try_to_freeze(); + try_todo_list(); spin_lock_irq(&current->sighand->siglock); sig = dequeue_signal(current, &these, &info); current->blocked = current->real_blocked; diff -urN oldtree/kernel/softirq.c newtree/kernel/softirq.c --- oldtree/kernel/softirq.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/kernel/softirq.c 2006-02-13 14:51:54.220922424 -0500 @@ -350,7 +350,6 @@ static int ksoftirqd(void * __bind_cpu) { set_user_nice(current, 19); - current->flags |= PF_NOFREEZE; set_current_state(TASK_INTERRUPTIBLE); @@ -456,7 +455,7 @@ case CPU_UP_PREPARE: BUG_ON(per_cpu(tasklet_vec, hotcpu).list); BUG_ON(per_cpu(tasklet_hi_vec, hotcpu).list); - p = kthread_create(ksoftirqd, hcpu, "ksoftirqd/%d", hotcpu); + p = kthread_nofreeze_create(ksoftirqd, hcpu, "ksoftirqd/%d", hotcpu); if (IS_ERR(p)) { printk("ksoftirqd for %i failed\n", hotcpu); return NOTIFY_BAD; diff -urN oldtree/kernel/sys.c newtree/kernel/sys.c --- oldtree/kernel/sys.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/kernel/sys.c 2006-02-13 14:51:54.221922272 -0500 @@ -173,15 +173,18 @@ { int ret=NOTIFY_DONE; struct notifier_block *nb = *n; + struct notifier_block *next; while(nb) { - ret=nb->notifier_call(nb,val,v); + /* Determining next here allows the notifier to unregister itself */ + next = nb->next; + ret = nb->notifier_call(nb,val,v); if(ret&NOTIFY_STOP_MASK) { return ret; } - nb=nb->next; + nb = next; } return ret; } @@ -530,12 +533,12 @@
unlock_kernel(); return -EINVAL; -#ifdef CONFIG_SOFTWARE_SUSPEND +#ifdef CONFIG_SUSPEND2 case LINUX_REBOOT_CMD_SW_SUSPEND: { - int ret = software_suspend(); + suspend2_try_suspend(); unlock_kernel(); - return ret; + return 0; } #endif diff -urN oldtree/kernel/workqueue.c newtree/kernel/workqueue.c --- oldtree/kernel/workqueue.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/kernel/workqueue.c 2006-02-13 14:51:54.222922120 -0500 @@ -188,8 +188,6 @@ struct k_sigaction sa; sigset_t blocked; - current->flags |= PF_NOFREEZE; - set_user_nice(current, -5); /* Block and flush all signals */ @@ -210,6 +208,7 @@ schedule(); else __set_current_state(TASK_RUNNING); + try_todo_list(); remove_wait_queue(&cwq->more_work, &wait); if (!list_empty(&cwq->worklist)) @@ -279,7 +278,8 @@ } static struct task_struct *create_workqueue_thread(struct workqueue_struct *wq, - int cpu) + int cpu, + unsigned long freezer_flags) { struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu); struct task_struct *p; @@ -293,10 +293,21 @@ init_waitqueue_head(&cwq->more_work); init_waitqueue_head(&cwq->work_done); - if (is_single_threaded(wq)) - p = kthread_create(worker_thread, cwq, "%s", wq->name); - else - p = kthread_create(worker_thread, cwq, "%s/%d", wq->name, cpu); + if (is_single_threaded(wq)) { + if (freezer_flags) + p = kthread_nofreeze_create(worker_thread, cwq, + "%s", wq->name); + else + p = kthread_create(worker_thread, cwq, + "%s", wq->name); + } else { + if (freezer_flags) + p = kthread_nofreeze_create(worker_thread, cwq, + "%s/%d", wq->name, cpu); + else + p = kthread_create(worker_thread, cwq, + "%s/%d", wq->name, cpu); + } if (IS_ERR(p)) return NULL; cwq->thread = p; @@ -304,7 +315,8 @@ } struct workqueue_struct *__create_workqueue(const char *name, - int singlethread) + int singlethread, + unsigned long freezer_flags) { int cpu, destroy = 0; struct workqueue_struct *wq; @@ -320,7 +332,8 @@ lock_cpu_hotplug(); if (singlethread) { INIT_LIST_HEAD(&wq->list); - p = 
create_workqueue_thread(wq, any_online_cpu(cpu_online_map)); + p = create_workqueue_thread(wq, any_online_cpu(cpu_online_map), + freezer_flags); if (!p) destroy = 1; else @@ -330,7 +343,7 @@ list_add(&wq->list, &workqueues); spin_unlock(&workqueue_lock); for_each_online_cpu(cpu) { - p = create_workqueue_thread(wq, cpu); + p = create_workqueue_thread(wq, cpu, freezer_flags); if (p) { kthread_bind(p, cpu); wake_up_process(p); @@ -502,7 +515,7 @@ case CPU_UP_PREPARE: /* Create a new workqueue thread for it. */ list_for_each_entry(wq, &workqueues, list) { - if (!create_workqueue_thread(wq, hotcpu)) { + if (!create_workqueue_thread(wq, hotcpu, 0)) { printk("workqueue for %i failed\n", hotcpu); return NOTIFY_BAD; } @@ -544,7 +557,7 @@ void init_workqueues(void) { hotcpu_notifier(workqueue_cpu_callback, 0); - keventd_wq = create_workqueue("events"); + keventd_wq = create_nofreeze_workqueue("events"); BUG_ON(!keventd_wq); } diff -urN oldtree/lib/Kconfig newtree/lib/Kconfig --- oldtree/lib/Kconfig 2006-01-02 22:21:10.000000000 -0500 +++ newtree/lib/Kconfig 2006-02-13 14:51:54.222922120 -0500 @@ -38,6 +38,9 @@ require M here. See Castagnoli93. Module will be libcrc32c. +config DYN_PAGEFLAGS + bool + # # compression support is select'ed if needed # diff -urN oldtree/lib/Makefile newtree/lib/Makefile --- oldtree/lib/Makefile 2006-01-02 22:21:10.000000000 -0500 +++ newtree/lib/Makefile 2006-02-13 14:51:54.222922120 -0500 @@ -28,6 +28,8 @@ lib-y += dec_and_lock.o endif +obj-$(CONFIG_DYN_PAGEFLAGS) += dyn_pageflags.o + obj-$(CONFIG_CRC_CCITT) += crc-ccitt.o obj-$(CONFIG_CRC16) += crc16.o obj-$(CONFIG_CRC32) += crc32.o diff -urN oldtree/lib/dyn_pageflags.c newtree/lib/dyn_pageflags.c --- oldtree/lib/dyn_pageflags.c 1969-12-31 19:00:00.000000000 -0500 +++ newtree/lib/dyn_pageflags.c 2006-02-13 14:51:54.223921968 -0500 @@ -0,0 +1,267 @@ +/* + * lib/dyn_pageflags.c + * + * Copyright (C) 2004-2005 Nigel Cunningham + * + * This file is released under the GPLv2. 
+ * + * Routines for dynamically allocating and releasing bitmaps + * used as pseudo-pageflags. + * + * Arrays are not contiguous. The first sizeof(void *) bytes are + * the pointer to the next page in the bitmap. This allows us to + * work under low memory conditions where order 0 might be all + * that's available. In their original use (suspend2), it also + * lets us save the pages at suspend time, reload and relocate them + * as necessary at resume time without much effort. + */ + +#include +#include +#include +#include + +#define page_to_zone_offset(pg) (page_to_pfn(pg) - page_zone(pg)->zone_start_pfn) + +int num_zones(void) +{ + int result = 0; + struct zone *zone; + + for_each_zone(zone) + result++; + + return result; +} + +int pages_for_zone(struct zone *zone) +{ + return (zone->spanned_pages + (PAGE_SIZE << 3) - 1) >> + (PAGE_SHIFT + 3); +} + +int page_zone_number(struct page *page) +{ + struct zone *zone, *zone_sought = page_zone(page); + int zone_num = 0; + + for_each_zone(zone) + if (zone == zone_sought) + return zone_num; + else + zone_num++; + + printk("Was looking for a zone for page %p.\n", page); + BUG_ON(1); + + return 0; +} + +/* + * dyn_pageflags_pages_per_bitmap + * + * Number of pages needed for a bitmap covering all zones. + */ +int dyn_pageflags_pages_per_bitmap(void) +{ + int total = 0; + struct zone *zone; + + for_each_zone(zone) + total += pages_for_zone(zone); + + return total; +} + +/* clear_dyn_pageflags + * + * Description: Clear an array used to store local page flags. + * Arguments: dyn_pageflags_t: The pagemap to be cleared. + */ + +void clear_dyn_pageflags(dyn_pageflags_t pagemap) +{ + int i = 0, zone_num = 0; + struct zone *zone; + + for_each_zone(zone) { + for (i = 0; i < pages_for_zone(zone); i++) + memset((pagemap[zone_num][i]), 0, PAGE_SIZE); + zone_num++; + } +} + +/* allocate_dyn_pageflags + * + * Description: Allocate a bitmap for local page flags. + * Arguments: dyn_pageflags_t *: Pointer to the bitmap.
+ */ +int allocate_dyn_pageflags(dyn_pageflags_t *pagemap) +{ + int i, zone_num = 0; + struct zone *zone; + + BUG_ON(*pagemap); + + *pagemap = kmalloc(sizeof(void *) * num_zones(), GFP_ATOMIC); + + if (!*pagemap) + return 1; + + for_each_zone(zone) { + int zone_pages = pages_for_zone(zone); + (*pagemap)[zone_num] = kmalloc(sizeof(void *) * zone_pages, + GFP_ATOMIC); + + if (!(*pagemap)[zone_num]) { + kfree (*pagemap); + return 1; + } + + for (i = 0; i < zone_pages; i++) { + unsigned long address = get_zeroed_page(GFP_ATOMIC); + (*pagemap)[zone_num][i] = (unsigned long *) address; + if (!(*pagemap)[zone_num][i]) { + printk("Error. Unable to allocate memory for " + "dynamic pageflags."); + free_dyn_pageflags(pagemap); + return 1; + } + } + zone_num++; + } + + return 0; +} + +/* free_dyn_pageflags + * + * Description: Free a dynamically allocated pageflags bitmap. + * Arguments: dyn_pageflags_t *: Pointer to the bitmap being freed. + */ +int free_dyn_pageflags(dyn_pageflags_t *pagemap) +{ + int i = 0, zone_num = 0; + struct zone *zone; + + if (!*pagemap) + return 1; + + for_each_zone(zone) { + int zone_pages = pages_for_zone(zone); + + if (!((*pagemap)[zone_num])) + continue; + for (i = 0; i < zone_pages; i++) + if ((*pagemap)[zone_num][i]) + free_page((unsigned long) (*pagemap)[zone_num][i]); + + if (PageSlab(virt_to_page((*pagemap)[zone_num]))) + kfree((*pagemap)[zone_num]); + else + free_page((unsigned long) (*pagemap)[zone_num]); + + zone_num++; + } + + if (PageSlab(virt_to_page((*pagemap)))) + kfree(*pagemap); + else + free_page((unsigned long) (*pagemap)); + + *pagemap = NULL; + return 0; +} + +/* + * + */ + +unsigned long *dyn_pageflags_ul_ptr(dyn_pageflags_t *bitmap, struct page *pg) +{ + int zone_pfn = page_to_zone_offset(pg); + int zone_num = page_zone_number(pg); + int pagenum = PAGENUMBER(zone_pfn); + int page_offset = PAGEINDEX(zone_pfn); + return ((*bitmap)[zone_num][pagenum]) + page_offset; +} + +int test_dynpageflag(dyn_pageflags_t *bitmap, struct page 
*page) +{ + unsigned long *ul = dyn_pageflags_ul_ptr(bitmap, page); + int zone_offset = page_to_zone_offset(page); + int bit = PAGEBIT(zone_offset); + + return test_bit(bit, ul); +} + +void set_dynpageflag(dyn_pageflags_t *bitmap, struct page *page) +{ + unsigned long *ul = dyn_pageflags_ul_ptr(bitmap, page); + int zone_offset = page_to_zone_offset(page); + int bit = PAGEBIT(zone_offset); + set_bit(bit, ul); +} + +void clear_dynpageflag(dyn_pageflags_t *bitmap, struct page *page) +{ + unsigned long *ul = dyn_pageflags_ul_ptr(bitmap, page); + int zone_offset = page_to_zone_offset(page); + int bit = PAGEBIT(zone_offset); + clear_bit(bit, ul); +} + +int get_next_bit_on(dyn_pageflags_t bitmap, int counter) +{ + struct page *page; + struct zone *zone; + unsigned long *ul; + int zone_offset, pagebit, zone_num, first; + + BUG_ON(counter == max_pfn); + + first = (counter == -1); + + if (first) + counter = pgdat_list->node_zones->zone_start_pfn; + + page = pfn_to_page(counter); + zone = page_zone(page); + zone_num = page_zone_number(page); + + if (!first) + counter++; + + zone_offset = counter - zone->zone_start_pfn; + + do { + if (zone_offset >= zone->spanned_pages) { + do { + zone = next_zone(zone); + if (!zone) + return max_pfn; + zone_num++; + } while(!zone->spanned_pages); + + counter = zone->zone_start_pfn; + zone_offset = 0; + page = pfn_to_page(counter); + } + + /* + * This could be optimised, but there are more + * important things and the code is simple at + * the moment + */ + ul = (bitmap[zone_num][PAGENUMBER(zone_offset)]) + PAGEINDEX(zone_offset); + + pagebit = PAGEBIT(zone_offset); + + counter++; + zone_offset++; + page = pfn_to_page(counter); + + } while((counter <= max_pfn) && (!test_bit(pagebit, ul))); + return counter - 1; +} + diff -urN oldtree/lib/vsprintf.c newtree/lib/vsprintf.c --- oldtree/lib/vsprintf.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/lib/vsprintf.c 2006-02-13 14:51:54.223921968 -0500 @@ -236,6 +236,34 @@ return buf; } +/* + * 
snprintf_used + * + * Functionality : Print a string with parameters to a buffer of a + * limited size. Unlike vsnprintf, we return the number + * of bytes actually put in the buffer, not the number + * that would have been put in if it was big enough. + */ +int snprintf_used(char *buffer, int buffer_size, const char *fmt, ...) +{ + int result; + va_list args; + + if (!buffer_size) { + return 0; + } + + va_start(args, fmt); + result = vsnprintf(buffer, buffer_size, fmt, args); + va_end(args); + + if (result > buffer_size) { + return buffer_size; + } + + return result; +} + /** * vsnprintf - Format a string and place it in a buffer * @buf: The buffer to place the result into diff -urN oldtree/mm/bootmem.c newtree/mm/bootmem.c --- oldtree/mm/bootmem.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/mm/bootmem.c 2006-02-13 14:51:54.224921816 -0500 @@ -301,12 +301,14 @@ page = pfn_to_page(pfn); count += BITS_PER_LONG; __ClearPageReserved(page); + ClearPageNosave(page); order = ffs(BITS_PER_LONG) - 1; set_page_refs(page, order); for (j = 1; j < BITS_PER_LONG; j++) { if (j + 16 < BITS_PER_LONG) prefetchw(page + j + 16); __ClearPageReserved(page + j); + ClearPageNosave(page + j); set_page_count(page + j, 0); } __free_pages(page, order); @@ -320,6 +322,7 @@ if (v & m) { count++; __ClearPageReserved(page); + ClearPageNosave(page); set_page_refs(page, 0); __free_page(page); } @@ -340,6 +343,7 @@ for (i = 0; i < ((bdata->node_low_pfn-(bdata->node_boot_start >> PAGE_SHIFT))/8 + PAGE_SIZE-1)/PAGE_SIZE; i++,page++) { count++; __ClearPageReserved(page); + ClearPageNosave(page); set_page_count(page, 1); __free_page(page); } diff -urN oldtree/mm/memory.c newtree/mm/memory.c --- oldtree/mm/memory.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/mm/memory.c 2006-02-13 14:51:54.225921664 -0500 @@ -950,6 +950,15 @@ return page; } +/* + * We want the address of the page for Suspend2 to mark as being in pageset1.
+ */ + +struct page *suspend2_follow_page(struct mm_struct *mm, unsigned long address) +{ + return follow_page(mm->mmap, address, 0); +} + int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start, int len, int write, int force, struct page **pages, struct vm_area_struct **vmas) diff -urN oldtree/mm/page_alloc.c newtree/mm/page_alloc.c --- oldtree/mm/page_alloc.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/mm/page_alloc.c 2006-02-13 14:51:54.226921512 -0500 @@ -25,6 +25,7 @@ #include #include #include +#include #include #include #include @@ -920,8 +921,8 @@ /* This allocation should allow future memory freeing. */ - if (((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE))) - && !in_interrupt()) { + if ((((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE))) && + !in_interrupt()) || (test_freezer_state(FREEZER_ON))) { if (!(gfp_mask & __GFP_NOMEMALLOC)) { nofail_alloc: /* go through the zonelist yet again, ignoring mins */ @@ -992,6 +993,7 @@ do_retry = 1; } if (do_retry) { + try_todo_list(); blk_congestion_wait(WRITE, HZ/50); goto rebalance; } diff -urN oldtree/mm/pdflush.c newtree/mm/pdflush.c --- oldtree/mm/pdflush.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/mm/pdflush.c 2006-02-13 14:51:54.226921512 -0500 @@ -106,7 +106,7 @@ spin_unlock_irq(&pdflush_lock); schedule(); - if (try_to_freeze()) { + if (try_todo_list()) { spin_lock_irq(&pdflush_lock); continue; } diff -urN oldtree/mm/swapfile.c newtree/mm/swapfile.c --- oldtree/mm/swapfile.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/mm/swapfile.c 2006-02-13 14:51:54.227921360 -0500 @@ -1155,6 +1155,7 @@ swap_file = p->swap_file; p->swap_file = NULL; p->max = 0; + p->bdev = NULL; swap_map = p->swap_map; p->swap_map = NULL; p->flags = 0; diff -urN oldtree/mm/vmscan.c newtree/mm/vmscan.c --- oldtree/mm/vmscan.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/mm/vmscan.c 2006-02-13 14:51:54.227921360 -0500 @@ -1244,7 +1244,8 @@ for ( ; ; ) { 
unsigned long new_order; - try_to_freeze(); + if (try_todo_list()) + pgdat->kswapd_max_order = 0; prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE); new_order = pgdat->kswapd_max_order; diff -urN oldtree/net/rxrpc/krxiod.c newtree/net/rxrpc/krxiod.c --- oldtree/net/rxrpc/krxiod.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/net/rxrpc/krxiod.c 2006-02-13 14:51:54.228921208 -0500 @@ -138,7 +138,7 @@ _debug("### End Work"); - try_to_freeze(); + try_todo_list(); /* discard pending signals */ rxrpc_discard_my_signals(); diff -urN oldtree/net/rxrpc/krxsecd.c newtree/net/rxrpc/krxsecd.c --- oldtree/net/rxrpc/krxsecd.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/net/rxrpc/krxsecd.c 2006-02-13 14:51:54.228921208 -0500 @@ -107,7 +107,7 @@ _debug("### End Inbound Calls"); - try_to_freeze(); + try_todo_list(); /* discard pending signals */ rxrpc_discard_my_signals(); diff -urN oldtree/net/rxrpc/krxtimod.c newtree/net/rxrpc/krxtimod.c --- oldtree/net/rxrpc/krxtimod.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/net/rxrpc/krxtimod.c 2006-02-13 14:51:54.228921208 -0500 @@ -90,7 +90,7 @@ complete_and_exit(&krxtimod_dead, 0); } - try_to_freeze(); + try_todo_list(); /* discard pending signals */ rxrpc_discard_my_signals(); diff -urN oldtree/net/sunrpc/sched.c newtree/net/sunrpc/sched.c --- oldtree/net/sunrpc/sched.c 2006-01-02 22:21:10.000000000 -0500 +++ newtree/net/sunrpc/sched.c 2006-02-13 14:51:54.229921056 -0500 @@ -656,6 +656,9 @@ /* sync task: sleep here */ dprintk("RPC: %4d sync task going to sleep\n", task->tk_pid); + + try_todo_list(); + /* Note: Caller should be using rpc_clnt_sigmask() */ status = out_of_line_wait_on_bit(&task->tk_runstate, RPC_TASK_QUEUED, rpc_wait_bit_interruptible, @@ -698,6 +701,7 @@ { BUG_ON(task->tk_active); + try_todo_list(); task->tk_active = 1; rpc_set_running(task); return __rpc_execute(task); diff -urN oldtree/net/sunrpc/svcsock.c newtree/net/sunrpc/svcsock.c --- oldtree/net/sunrpc/svcsock.c 2006-01-02 
22:21:10.000000000 -0500 +++ newtree/net/sunrpc/svcsock.c 2006-02-13 14:51:54.230920904 -0500 @@ -1177,7 +1177,7 @@ arg->len = (pages-1)*PAGE_SIZE; arg->tail[0].iov_len = 0; - try_to_freeze(); + try_todo_list(); cond_resched(); if (signalled()) return -EINTR; @@ -1219,7 +1219,7 @@ schedule_timeout(timeout); - try_to_freeze(); + try_todo_list(); spin_lock_bh(&serv->sv_lock); remove_wait_queue(&rqstp->rq_wait, &wait);