DragonFly BSD
DragonFly commits List (threaded) for 2004-11

cvs commit: src/sys/vm

From: Matthew Dillon <dillon@xxxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 10 Nov 2004 09:39:20 -0800 (PST)

dillon      2004/11/10 09:39:20 PST

DragonFly src repository

  Modified files:
    sys/vm               vm_contig.c 

  Fix a very serious bug in contigmalloc() which we inherited from FreeBSD-4.x.
  The contigmalloc() code incorrectly assumes that a page in PQ_CACHE can be
  reused without any further checks.  It unconditionally busies and frees such
  pages and assumes that the page becomes PQ_FREE, even though it might
  actually have gone to a PQ_HOLD state.  Additionally, the contigmalloc()
  code unconditionally sets m->object to NULL, ignoring the fact that the page
  will still be in the VM page bucket hash table if object happens to not be
  NULL, leading to page bucket hash table corruption.

  The fix is twofold.  First, we add checks for m->busy, (m->flags & PG_BUSY),
  m->wire_count, and m->hold_count, and do not reuse a page with any of those
  set.  We do this for all pages, not just PQ_CACHE pages, though it is
  believed that it only needs to be done for PQ_CACHE pages.  Second, we
  replace the m->object = NULL assignment with an assertion that it is
  already NULL, since it had better be NULL and we cannot just set it to NULL
  unconditionally without blowing up the VM page hash table.
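
  The two halves of the fix can be sketched in user-space C as follows.  This
  is a simplified model, not the committed code: the struct layout, the PG_BUSY
  value, and the helper names contig_page_reusable() and contig_claim_page()
  are assumptions made for illustration, and plain assert() stands in for a
  kernel assertion.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative value only; the real flag lives in the kernel's vm_page.h. */
#define PG_BUSY 0x0001

struct vm_object;                 /* opaque in this sketch */

/* Reduced stand-in for the kernel's struct vm_page. */
struct vm_page {
    struct vm_object *object;     /* non-NULL means the page is still hashed */
    unsigned short    flags;      /* PG_BUSY and friends */
    unsigned short    busy;       /* soft-busy count */
    unsigned short    wire_count; /* wiring references */
    unsigned short    hold_count; /* hold references */
};

/*
 * Hypothetical helper: returns nonzero only if none of the four
 * "in use" indicators named in the commit message is set.
 */
static int
contig_page_reusable(const struct vm_page *m)
{
    if (m->busy || (m->flags & PG_BUSY))
        return 0;
    if (m->wire_count || m->hold_count)
        return 0;
    return 1;
}

/*
 * Hypothetical helper: where the old code did "m->object = NULL;",
 * assert that the page has already been disassociated instead.
 */
static void
contig_claim_page(struct vm_page *m)
{
    assert(m->object == NULL);
}
```

  A caller scanning a candidate run would skip (or restart at) any page for
  which contig_page_reusable() returns 0, and would only call
  contig_claim_page() on pages that have already been removed from their
  object.
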

  Symptoms of the bug include:
      * Filesystem corruption, in particular with slower disk drivers (e.g.
        the 'twe' driver), or in systems with drivers which use
        contigmalloc() a lot (e.g. those that require bounce buffers).
        This shows up as mangled directory entries, bad indirect blocks
        (containing data instead of indirect block pointers), and files
        containing other files' data.
      * 'page not found in hash' panic.

  This is the last major VM issue in DragonFly, one that has plagued David
  Rhodus in particular (he is a heavy user of the 'twe' driver) for over a
  year.  I would never have found this bug if not for DR's persistence and
  the dozens of kernel cores he was able to provide me over the last year.  We
  finally got a core with a 'smoking gun': after writing a program
  (/usr/src/test/debug/vmpageinfo.c) to run through all the VM pages and check
  their hash table association for correctness, it became obvious that pages
  were being reused without being removed from the hash table, which finally
  led to contigmalloc*().
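
  The kind of consistency check described above can be illustrated with a toy
  model.  This is not vmpageinfo.c: the bucket count, the hash function, and
  every name below (page_hash, hash_insert, page_in_bucket, count_hash_errors)
  are assumptions chosen to keep the sketch self-contained.  It shows only the
  idea of walking every page and verifying that its object association agrees
  with the hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 8               /* illustrative size; must be a power of two */

struct vm_object { int id; };    /* stand-in; only its address matters here */

struct vm_page {
    struct vm_page   *hnext;     /* next page in the same hash bucket */
    struct vm_object *object;    /* owning object, NULL if unassociated */
    unsigned long     pindex;    /* page index within the object */
};

static struct vm_page *buckets[NBUCKETS];

/* Toy hash over (object, pindex); not the kernel's hash function. */
static unsigned
page_hash(const struct vm_object *obj, unsigned long pindex)
{
    return (unsigned)((((uintptr_t)obj >> 4) + pindex) & (NBUCKETS - 1));
}

/* Associate a page with (obj, pindex) and push it onto its bucket chain. */
static void
hash_insert(struct vm_page *m, struct vm_object *obj, unsigned long pindex)
{
    unsigned b = page_hash(obj, pindex);

    m->object = obj;
    m->pindex = pindex;
    m->hnext = buckets[b];
    buckets[b] = m;
}

/* Return nonzero if page m is linked into bucket b. */
static int
page_in_bucket(const struct vm_page *m, unsigned b)
{
    const struct vm_page *scan;

    for (scan = buckets[b]; scan != NULL; scan = scan->hnext) {
        if (scan == m)
            return 1;
    }
    return 0;
}

/*
 * Count pages whose hash state disagrees with their object pointer:
 * a page with an object must be in the matching bucket, and a page
 * without an object must not be in any bucket.
 */
static int
count_hash_errors(const struct vm_page *pages, size_t npages)
{
    int errors = 0;
    size_t i;
    unsigned b;

    for (i = 0; i < npages; i++) {
        const struct vm_page *m = &pages[i];

        if (m->object != NULL) {
            if (!page_in_bucket(m, page_hash(m->object, m->pindex)))
                errors++;
        } else {
            for (b = 0; b < NBUCKETS; b++) {
                if (page_in_bucket(m, b)) {
                    errors++;
                    break;
                }
            }
        }
    }
    return errors;
}
```

  In this model, clearing a page's object pointer without unlinking it from
  its bucket (the pattern the old contigmalloc() code followed) makes
  count_hash_errors() report that page as inconsistent.
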

  Many thanks to: David Rhodus!  Free gift enclosed!

  Revision  Changes    Path
  1.11      +22 -14    src/sys/vm/vm_contig.c

