Owen Smith [Mon, 21 Oct 2024 11:47:08 +0000 (12:47 +0100)]
Allow BLKIF_OP_WRITE_BARRIER or BLKIF_OP_FLUSH_DISKCACHE...
... when a SCSIOP_SYNCHRONIZE_CACHE is issued, which should be the last
operation issued by the crash kernel.
Certain backends do not advertise WriteBarrier support, and use SyncDiskCache
instead. Add support for creating crashdumps on these backends.
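A minimal sketch of the choice this enables, assuming the frontend has already read the backend's advertised features into two booleans. The opcode values mirror Xen's public io/blkif.h. The struct, the function name and the preference for a barrier when both features are present are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Opcode values as defined in Xen's public io/blkif.h. */
#define BLKIF_OP_WRITE_BARRIER   2
#define BLKIF_OP_FLUSH_DISKCACHE 3

/* Hypothetical snapshot of the features the backend advertises. */
struct backend_features {
    bool write_barrier;  /* backend supports BLKIF_OP_WRITE_BARRIER */
    bool flush_cache;    /* backend supports BLKIF_OP_FLUSH_DISKCACHE */
};

/*
 * Pick the blkif operation used to service SCSIOP_SYNCHRONIZE_CACHE.
 * Prefers a barrier when both are available (an assumption of this sketch).
 * Returns -1 if neither is supported; how the caller then completes the
 * SRB is not shown.
 */
static int sync_cache_op(const struct backend_features *f)
{
    assert(f != NULL);
    if (f->write_barrier)
        return BLKIF_OP_WRITE_BARRIER;
    if (f->flush_cache)
        return BLKIF_OP_FLUSH_DISKCACHE;
    return -1;
}
```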
Owen Smith [Tue, 9 Jul 2024 10:43:19 +0000 (11:43 +0100)]
Use UNPLUG v3
Use UnplugBootEmulated to detect whether the boot disk should remain emulated,
to avoid opening an absolute registry path.
Use UnplugReboot to request a reboot from xenbus_monitor. Also removes the
RequestKey property from the INF file, as it is no longer needed.
Bumps the binding in the INF file to match the revision exposed by the XenBus
that implements UNPLUG v3 (0x0900000B).
Owen Smith [Tue, 9 Jul 2024 10:43:18 +0000 (11:43 +0100)]
Add RegistryOpenParametersKey
Server 2025 WHQL tests enable "verifier.exe /onecheck /rc 33 36" on some drivers
under test, which will detect a violation if a driver attempts to access absolute
registry paths.
IoOpenDriverRegistryKey will open the parameters key for a driver, but it is not
available on Server 2016. Use MmGetSystemRoutineAddress to find the function
dynamically, so that a single binary can be used on both Server 2016 and Server 2025.
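The lookup-with-fallback pattern can be sketched in user mode as follows. This is a simulation, not kernel code: the real MmGetSystemRoutineAddress takes a PUNICODE_STRING and returns a PVOID, and the real IoOpenDriverRegistryKey prototype differs from the simplified open_key_fn used here. The export table, the stand-in routine and the return convention are all hypothetical, and the Server 2016 fallback path is not described in the commit, so it is left as a stub.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Signature of the optional routine (simplified; not the real prototype). */
typedef int (*open_key_fn)(const char *driver, void **key);

/* Hypothetical export table standing in for the kernel's exported routines. */
struct export_entry {
    const char *name;
    open_key_fn routine;
};

/* Stand-in for MmGetSystemRoutineAddress: NULL when the routine is absent. */
static open_key_fn lookup_routine(const struct export_entry *table, size_t n,
                                  const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].routine;
    return NULL;
}

/* Pretend implementation that only exists on newer OS builds. */
static int new_open_driver_key(const char *driver, void **key)
{
    static int handle;

    (void)driver;
    *key = &handle;
    return 0;
}

/* Returns 0 if the new routine opened the key, 1 if the fallback path is taken. */
static int open_parameters_key(const struct export_entry *table, size_t n, void **key)
{
    assert(key != NULL);
    open_key_fn fn = lookup_routine(table, n, "IoOpenDriverRegistryKey");
    if (fn != NULL && fn("xenvbd", key) == 0)
        return 0;
    *key = NULL;   /* fallback for older builds (not shown) */
    return 1;
}
```

Because the address is resolved at run time, the binary has no import-table dependency on the routine and still loads on builds that lack it.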
Martin Harvey [Tue, 12 Mar 2024 07:23:52 +0000 (07:23 +0000)]
Asynchronous power handling.
XenDisk requires minimal IRP_MN_SET_POWER/IRP_MN_QUERY_POWER interactions.
No IoWorkItems are required as operations perform no significant work.
Power handlers are limited to tracking state changes and calling PoSetPowerState.
Signed-off-by: Martin Harvey <martin.harvey@citrix.com>
Refactored Signed-off-by: Owen Smith <owen.smith@cloud.com>
Fix gratuitous and unrelated removal of brackets.
Owen Smith [Fri, 29 Sep 2023 08:03:14 +0000 (09:03 +0100)]
Remove CoInstaller from INF
Windows 11 22H2 WHQL requires that INF files pass "InfVerif /k", which highlights
several issues:
- PnpLockdown=1 needs to be specified
- CoInstallers are no longer allowed
The CoInstaller has several functions that will need alternative solutions:
- The AllowUpdate mechanism is no longer possible
- The safety checks that ensure interface versions remain compatible
- The updating of various system config registry values
Interface safety checks need to be handled by changes to child device bindings,
and by assuming that upgrading via emulated devices is safe. The unplug keys are
cleared in the INF to revert to emulated devices on the next boot, in case the
current child drivers rely on an interface that is no longer present (note: in
this case, the child drivers will need updating).
Also updates unplug_interface.h and device bindings so that this driver is only
loaded on a later XenBus that has removed the XenBus CoInstaller.
Owen Smith [Tue, 18 Apr 2023 08:50:45 +0000 (09:50 +0100)]
Rebuild CodeQL builds
CodeQL can sometimes fail to detect any source code if the codebase is
not rebuilt. Use the Rebuild target to force all intermediate build artifacts
to be cleaned beforehand.
Martin Harvey [Mon, 5 Dec 2022 09:01:59 +0000 (09:01 +0000)]
Correct return codes during racy destruction.
Errors in PnP return codes were found when testing under Driver
Verifier with mixed VM lifecycle operations. In some
rare cases, it is possible to get more than one PnP
"remove-like" operation. This results in a PnP remove
operation being processed whilst the device is already
in the deleted state.
This patch fixes the immediate cause of the failures
by correcting the return code. Device destruction is
unchanged. Investigation into the root cause is still
ongoing.
Signed-off-by: Martin Harvey <martin.harvey@citrix.com>
Cosmetic fixes.
Owen Smith [Fri, 18 Nov 2022 10:06:10 +0000 (10:06 +0000)]
Pass SignMode to MSBuild
Allows overriding of SignMode to "Off" to prevent signing binaries with the PFX
file. This is useful if wrapper builds sign binaries with alternative signatures
or when signing is not required.
Signed-off-by: Owen Smith <owen.smith@citrix.com>
Small whitespace fix.
Paul Durrant [Mon, 31 Oct 2022 13:38:04 +0000 (13:38 +0000)]
Add build options for EWDK 22621
VisualStudioVersion = 17.0 maps to Visual Studio 2022
* Adds project files for vs2022
* Adds mapping from VisualStudioVersion 17.0 to "vs2022" project folder
* Adds mapping from VisualStudioVersion 17.0 to "Windows 10" build target
* Adds guard to build.ps1 - EWDK 22621 does not build x86 binaries
* Adds include directive where compiler intrinsics are used
Suggested-by: Owen Smith <owen.smith@citrix.com>
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Owen Smith [Thu, 5 May 2022 07:04:53 +0000 (08:04 +0100)]
Fix compiler options
Adds '/ZH:SHA_256' '/CETCOMPAT' '/sdl' to the compiler and '/SafeSEH' to the x86
linker command lines.
These changes were prompted by binskim https://github.com/microsoft/binskim
Note: Rule BA2004 (Warning_NativeWithInsecureStaticLibraryCompilands) is still
reported for xenvbd_coinst.dll and xencrsh.sys
Rule BA2018 (Error, empty SEH table) is still reported for xencrsh.sys
Owen Smith [Tue, 30 Nov 2021 16:06:22 +0000 (16:06 +0000)]
Interpret "removable" and "info" flags correctly
"removable" relates to the ability to remove the device (not media)
"removable" = "1" is used to indicate the device can be hot unplugged, as
PV devices should support hot plug/unplug in the majority of cases.
"removable" = "0" is used to indicate that the device is being prevented
from hot unplug by the tool stack. This will allow XenVbd to report the
correct device capabilities to the OS to indicate that this disk is not
capable of being removed. This will allow certain policies to be applied
which restrict access to removable disks (for security and to prevent data
exfiltration)
"info" contains various flags for the media (not device)
VDISK_CDROM implies RemovableMedia (a specific case of VDISK_REMOVABLE)
VDISK_REMOVABLE implies RemovableMedia (underlying disk has GENHD_FL_REMOVABLE)
VDISK_READONLY implies a READ_ONLY_DIRECT_ACCESS_DEVICE
'Standard' disks usually set no flags (i.e. media is RW and not removable)
A CDROM will set VDISK_CDROM | VDISK_READONLY, to indicate the media is RO
and removable.
STOR_FEATURE_FULL_PNP_DEVICE_CAPABILITIES must be set, otherwise StorPort will
not use the values provided for EjectSupported and SurpriseRemovalOK in the
STOR_DEVICE_CAPABILITIES_EX structure. Without this, CM_DEVCAP_EJECTSUPPORTED
and CM_DEVCAP_SURPRISEREMOVALOK are left unchanged, which prevents a non-removable
device from being identified correctly as non-removable.
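The mapping described above can be sketched as a pure function. The VDISK_* values match Xen's public io/blkif.h. The disk_caps struct and the function name are hypothetical, and treating any "removable" value other than "0" as hot-unpluggable is an assumption based on PV devices normally supporting hot unplug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* 'info' bits as defined in Xen's public io/blkif.h. */
#define VDISK_CDROM     0x1
#define VDISK_REMOVABLE 0x2
#define VDISK_READONLY  0x4

/* Hypothetical summary of what would be reported to the OS. */
struct disk_caps {
    bool removable_media;   /* RemovableMedia: the media can be removed */
    bool read_only;         /* report a READ_ONLY_DIRECT_ACCESS_DEVICE */
    bool device_removable;  /* EjectSupported / SurpriseRemovalOK */
};

/* info: value of the xenstore "info" node; removable: the "removable" node. */
static struct disk_caps interpret_flags(uint32_t info, const char *removable)
{
    struct disk_caps caps;

    assert(removable != NULL);
    /* VDISK_CDROM is a specific case of removable media. */
    caps.removable_media = (info & (VDISK_CDROM | VDISK_REMOVABLE)) != 0;
    caps.read_only = (info & VDISK_READONLY) != 0;
    /* Only "0" (toolstack blocks hot unplug) makes the device non-removable. */
    caps.device_removable = strcmp(removable, "0") != 0;
    return caps;
}
```

For a CDROM (info = VDISK_CDROM | VDISK_READONLY, removable = "1"), all three capabilities come out true. For a standard disk pinned by the toolstack (info = 0, removable = "0"), all three are false.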
Owen Smith [Thu, 12 Aug 2021 12:41:36 +0000 (13:41 +0100)]
Fix SDV/CodeQL log generation
- SARIF files need to be stored with the SDV logs when generating the DVL file
- Disable PREFast and CodeAnalysis by default
- Run a separate CodeAnalysis build after SDV, but before generating the DVL file
DVL file should contain multiple summary lines for SDV, at least 1 line
for CodeAnalysis and at least 1 line for Semmle (CodeQL)
Owen Smith [Thu, 12 Aug 2021 12:41:35 +0000 (13:41 +0100)]
Fix build with later WDKs
- Adds alias for GetProjectInfoForReference target to version.vcxproj
Later kits appear to have renamed the build target, and the build fails without
this alias target.
- Adds "/fd sha256" to signtool command line
WDK 20344 and later require binaries signed with a SHA256 file digest, or
the build outputs are deleted
- Disables warning 4061 (switch statements on enum types need a case for
all values of the enumeration)
Signed-off-by: Owen Smith <owen.smith@citrix.com>
- Cast enum types used as array indices to avoid bounds check complaint
Owen Smith [Mon, 21 Jun 2021 12:54:44 +0000 (13:54 +0100)]
Match __FreePage with __AllocatePage calls
Replace __FreePages with __FreePage when the memory was allocated with
__AllocatePage. This is a cosmetic change, as __FreePage is an alias for
__FreePages
Owen Smith [Fri, 5 Mar 2021 10:16:05 +0000 (10:16 +0000)]
Add CodeQL build stage
CodeQL logs will be required for future WHQL submissions. Add a stage
that generates the required SARIF files. CodeQL is a semantic code
analysis engine, which highlights vulnerabilities that need fixing.
In order to use CodeQL, the CodeQL binaries must be on the PATH, and the
Windows-Driver-Developer-Supplemental-Tools must be in the folder given
by the CODEQL_QUERY_SUITE environment variable (if defined), or under
the parent folder (if CODEQL_QUERY_SUITE is not defined).
Note: Due to the way the codeql command line is built, quotes cannot be
used in an MSBuild command line, so a batch file is generated to wrap
the command line.
Owen Smith [Wed, 16 Dec 2020 15:36:10 +0000 (15:36 +0000)]
Add XEN:BOOT_EMULATED handler
If XEN:BOOT_EMULATED=TRUE is in the system start options, xen.sys will
issue an unplug for only the AUX disks (by writing 0x0004 instead of
0x0001 to port 0x10 during the unplug). This leads to a target being
created for the boot disk which will not be used (because an emulated
device is present). The non-functioning target will request a reboot
to resolve this, which will return to the current state and request
another reboot.
Paul Durrant [Thu, 19 Nov 2020 14:43:33 +0000 (14:43 +0000)]
Introduce a BlkifRing watchdog
Analogous to similar watchdog threads for XENVIF transmitter and receiver
rings, this patch introduces code to start a watchdog thread for blkif rings.
The thread wakes every 30s and checks for responses remaining pending on the
ring (without the frontend making progress) across two consecutive iterations.
If the ring appears to be 'stuck' in this manner, the ring DebugCallback()
function is triggered, the ring is polled and an event is sent to wake up
the backend.
Inherit versioning info from environment if present
As the drivers stabilize and mature, there is an ever-growing
chance that other opensource virtualization projects will adopt
them. Allow external projects to inject their own versioning
into the drivers instead of hardcoding the latest winpv version.
Signed-off-by: Nicholas Tsirakis <tsirakisn@ainfosec.com>
Acked-by: Owen Smith <owen.smith@citrix.com>
Often we only need to build a driver for a single
targeted architecture. Continue to build both by default,
but allow the user to specify one if desired.
Signed-off-by: Nicholas Tsirakis <tsirakisn@ainfosec.com>
Use [string]::IsNullOrEmpty($Arch)
Paul Durrant [Fri, 28 Aug 2020 16:49:47 +0000 (17:49 +0100)]
Stop mis-interpreting the 'removable' node in xenstore...
... and the VDISK_REMOVABLE bit in the value of the 'info' node.
They both apply to the media and not the device itself. PV devices are always
removable.
The comment in libxl_disk.c concerning the 'removable' flag states that:
"Currently there is only one removable device -- CDROM"
This is not conclusive, but it is reasonable to infer from it that removability
refers to the media and not the drive itself (CDROM drives in typical servers
are not removable).
The code in Linux xen-blkback/xenbus.c sets VDISK_REMOVABLE if the underlying
block device has the GENHD_FL_REMOVABLE flag, and the comment above the
definition of that flag in genhd.h states:
"``GENHD_FL_REMOVABLE`` (0x0001): indicates that the block device gives access
to removable media.
When set, the device remains present even when media is not inserted.
Must not be set for devices which are removed entirely when the media is
removed."
This patch, therefore, stops using these values to indicate the removability
of the target devices and instead uses them as they were intended, to indicate
the removability of the media.
NOTE: The code in XENCRSH is modified to simply ignore the 'removable' node.
The value currently sampled is not used.
The Removable BOOLEAN field currently in XENVBD_CAPS is also moved into
XENVBD_FEATURES for consistency with other values sampled from xenstore
nodes.
These bugchecks have been observed in recent updates of Server 2019.
This patch, rather than replacing calls to MmAllocatePagesForMdlEx() with
calls to MmMapLockedPagesSpecifyCache(), just avoids passing
MM_DONT_ZERO_ALLOCATION to work round the bug.
The patch instead passes MM_ALLOCATE_FULLY_REQUIRED, which arguably should
have always been passed for allocations larger than a single page. It also
fixes a formatting issue.
Reported-by: Jan Bakuwel <jan.bakuwel@gmail.com>
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Owen Smith [Wed, 15 Jan 2020 11:50:44 +0000 (11:50 +0000)]
Increase FrontendPath length for long DeviceIds
The maximum DeviceId value is 1 << 28 | 0xfffff << 8 | 0xff, which is a
9-digit decimal number. Values higher than this are invalid. Ensure the
FrontendPath buffer is large enough to contain all valid DeviceId values.
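The arithmetic can be checked directly: 1 << 28 | 0xfffff << 8 | 0xff is 0x1FFFFFFF, or 536870911, which is 9 decimal digits. The sketch below sizes a path buffer from that bound. The "device/vbd/" prefix is used for illustration, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Largest valid DeviceId, using the expression from the commit message. */
#define MAX_DEVICE_ID ((1u << 28) | (0xfffffu << 8) | 0xffu)

/* Number of decimal digits needed to print v. */
static unsigned decimal_digits(uint32_t v)
{
    unsigned n = 1;

    while (v >= 10) {
        v /= 10;
        n++;
    }
    return n;
}

/* Format a frontend path; returns -1 if the buffer would truncate it. */
static int format_frontend_path(char *buf, size_t len, uint32_t device_id)
{
    assert(device_id <= MAX_DEVICE_ID);
    int n = snprintf(buf, len, "device/vbd/%u", (unsigned)device_id);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

A buffer of sizeof("device/vbd/") + 9 bytes (the prefix, its terminating NUL slot, and 9 digits) is exactly large enough for the largest DeviceId. A 16-byte buffer is not.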
Owen Smith [Fri, 11 Oct 2019 13:51:10 +0000 (14:51 +0100)]
Fix > 2TB disks
In order to determine the size of a disk, Windows will issue a
SCSIOP_READ_CAPACITY. Disks larger than 2TB will respond with a max LBA
of 0xFFFFFFFF, which causes Windows to issue a SCSIOP_READ_CAPACITY16.
SCSIOP_READ_CAPACITY16 is issued with a 12-byte buffer, which must be filled in
using the READ_CAPACITY_DATA_EX structure, not the 16- or 32-byte
(depending on packing) READ_CAPACITY16_DATA structure.
Also adds Error labels to the failure conditions.
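With 512-byte sectors, 2^32 sectors is exactly 2 TiB. This is why the 4-byte last-LBA field of READ CAPACITY (10) saturates at 0xFFFFFFFF for larger disks. The sketch below fills both responses as raw big-endian bytes, as SCSI requires, rather than through the Windows structure definitions. It assumes the 12-byte layout is an 8-byte last LBA followed by a 4-byte block size. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write v as big-endian into n bytes (SCSI data is big-endian). */
static void put_be(uint8_t *p, uint64_t v, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        p[i] = (uint8_t)(v >> (8 * (n - 1 - i)));
}

/* READ CAPACITY (10): 4-byte last LBA (saturated) + 4-byte block size. */
static void fill_read_capacity(uint8_t out[8], uint64_t sectors, uint32_t sector_size)
{
    assert(sectors != 0);
    uint64_t last_lba = sectors - 1;
    put_be(out, last_lba > 0xFFFFFFFFu ? 0xFFFFFFFFu : last_lba, 4);
    put_be(out + 4, sector_size, 4);
}

/* READ CAPACITY (16) into a 12-byte buffer: 8-byte last LBA + 4-byte block size. */
static void fill_read_capacity_ex(uint8_t out[12], uint64_t sectors, uint32_t sector_size)
{
    assert(sectors != 0);
    put_be(out, sectors - 1, 8);
    put_be(out + 8, sector_size, 4);
}
```

For a 3 TiB disk with 512-byte sectors, the 10-byte response reports 0xFFFFFFFF, which prompts Windows to issue READ CAPACITY (16). That response then carries the true last LBA, 0x17FFFFFFF.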
Owen Smith [Thu, 19 Sep 2019 08:44:15 +0000 (09:44 +0100)]
Attempt to process responses on the ring
When disabling the ring, outstanding responses need to be completed.
Poll the ring to complete outstanding responses if the backend is still
connected and valid.
Owen Smith [Thu, 19 Sep 2019 08:30:41 +0000 (09:30 +0100)]
Rework BlkifRingDisable
Clean up all prepared and submitted requests when the ring is disabled,
so that outstanding SRBs are returned to StorPort for queueing. This is
especially important on the return-from-suspend path, as the ring is no
longer valid, and any submitted requests would be lost and trigger a
StorPort target reset.
Also ignores responses that have no matching request.
Owen Smith [Thu, 19 Sep 2019 08:24:28 +0000 (09:24 +0100)]
Attempt to process responses on the ring
When disabling the ring, outstanding responses need to be completed.
Poll the ring to complete outstanding responses if the backend is still
connected and valid.
Owen Smith [Thu, 5 Sep 2019 15:29:04 +0000 (16:29 +0100)]
Rework request submission
Make BlkifRingPostRequests return success for submitting 0 or more requests,
or failure when the ring is full. This prevents the loop in
BlkifRingSchedule() from preparing the next SRB when the ring is already
full.
Also attempt to notify the backend of changes every iteration of the loop in
BlkifRingSchedule(), to trigger the backend as soon as possible.
Owen Smith [Thu, 5 Sep 2019 08:42:38 +0000 (09:42 +0100)]
Update rsp_event during BlkifRingPoll()
Currently, by updating it in __BlkifRingPushRequests(), the code is
attempting to defer events until all of the posted requests have responses
on the ring. This is likely to lead to a cycle of fill...empty...fill...
empty etc., which is bad for performance.
This patch instead updates rsp_event when BlkifRingPoll() completes, such
that the very next response placed on the ring by the backend should
cause an event to be sent.
Signed-off-by: Owen Smith <owen.smith@citrix.com>
[Expanded commit message] Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Owen Smith [Fri, 14 Jun 2019 15:42:49 +0000 (16:42 +0100)]
Add PowerShell build scripts, version.vcxproj
Based on the sequence of commits to xenbus, add PowerShell scripts to
build the solution using the EWDK.
version.vcxproj generates versioned files (version.h and xenvbd.inf) using
scripts/genfiles.ps1.
Strips duplicated functionality from build.py to produce consistent
builds between Python and PowerShell.
Paul Durrant [Fri, 26 Apr 2019 08:53:34 +0000 (09:53 +0100)]
Try to avoid dumping non-RAM pages
When XENCRSH sets up blkif requests they may end up referring to PFNs that
are ballooned out. When these requests reach the backend driver, it will
unsurprisingly encounter failures when trying to map or copy the data from
these PFNs, generally resulting in the request as a whole being failed and
a lot of noise being emitted to various logs.
This patch adds a check into PrepareReadWrite() to check the P2M type of
PFNs being dumped. If the type is found to be anything other than writable
RAM then the PFN is substituted with a buffer PFN, which will just contain
zeroes. The storage backend will be able to map or copy these pages, so
stalls in the dump process and useless log messages will be avoided.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
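The substitution step can be sketched as below. The P2M type enum and the query callback are hypothetical stand-ins for whatever XENCRSH actually uses to look up the P2M type. The demo query simply pretends that high PFNs are ballooned out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical P2M types; only writable RAM is safe to hand to the backend. */
enum p2m_type { P2M_RAM_RW, P2M_RAM_RO, P2M_MMIO, P2M_BALLOONED };

typedef enum p2m_type (*p2m_query_fn)(uint64_t pfn);

/* Demo query for this sketch: pretend PFNs at or above 0x1000 are ballooned out. */
static enum p2m_type demo_query(uint64_t pfn)
{
    return pfn >= 0x1000 ? P2M_BALLOONED : P2M_RAM_RW;
}

/* Replace each non-RAM PFN with the PFN of a zero-filled buffer page.
 * Returns the number of PFNs substituted. */
static size_t sanitize_pfns(uint64_t *pfns, size_t count,
                            p2m_query_fn query, uint64_t zero_pfn)
{
    size_t replaced = 0;

    assert(query != NULL);
    for (size_t i = 0; i < count; i++) {
        if (query(pfns[i]) != P2M_RAM_RW) {
            pfns[i] = zero_pfn;
            replaced++;
        }
    }
    return replaced;
}
```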
Paul Durrant [Tue, 9 Apr 2019 16:03:08 +0000 (17:03 +0100)]
Report disk size and logical sector size in XENDISK...
...rather than XENVBD.
This allows us to use the PDO name rather than the more obscure target
number. Also, report the size in MB rather than in sectors (now
that the sector size may be something other than 512B).
Also fix some whitespace bugs while in the neighbourhood.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Paul Durrant [Fri, 5 Apr 2019 15:51:08 +0000 (16:51 +0100)]
Add 'feature-large-sector-size'
As explained in Xen commit 67e1c050 "public/io/blkif.h: try to fix the
semantics of sector based quantities" [1], frontends that always
supply and interpret sector based quantities in terms of the 'sector-size'
of the backend should declare 'feature-large-sector-size'.
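A small worked example of what "in terms of the 'sector-size'" means for a sector number. With a 4096-byte sector size, byte offset 8192 is sector 2. A frontend that hardcodes 512-byte sectors would compute sector 16 instead. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a byte offset into a sector number in units of 'sector-size'.
 * A frontend that declares feature-large-sector-size uses the backend's
 * advertised sector size here rather than a hardcoded 512.
 */
static uint64_t byte_to_sector(uint64_t byte_offset, uint32_t sector_size)
{
    assert(sector_size != 0);
    assert(byte_offset % sector_size == 0);   /* I/O must be sector aligned */
    return byte_offset / sector_size;
}
```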
Owen Smith [Wed, 26 Sep 2018 09:47:36 +0000 (10:47 +0100)]
Fix BSOD on RingDestroy
Zero Frontend->MaxQueues after calling RingDestroy, as RingDestroy queries
this value to free each BlkifRing; zeroing it first causes an unsigned value
to be decremented below 0.
Also adds an ASSERT to detect if FrontendGetMaxQueues returns 0.
Signed-off-by: Owen Smith <owen.smith@citrix.com>
Test that Index != 0 rather than > 0, since it is an unsigned quantity.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
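A sketch of the teardown ordering and the unsigned loop guard. The frontend struct and function names are hypothetical. The loop counts down with `index != 0`, so a count of 0 simply skips the body. By contrast, `index >= 0` is always true for an unsigned type, and starting from `count - 1` wraps to UINT_MAX when count is 0.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical frontend holding one allocated ring per queue. */
struct frontend {
    unsigned max_queues;
    void   **rings;
};

/* Free each ring, counting down without letting the unsigned index wrap. */
static unsigned ring_destroy(struct frontend *f)
{
    unsigned index;
    unsigned freed = 0;

    assert(f != NULL);
    index = f->max_queues;
    while (index != 0) {
        --index;
        free(f->rings[index]);
        f->rings[index] = NULL;
        freed++;
    }
    return freed;
}

/* Teardown order from the commit: destroy first (it reads max_queues), then zero. */
static void frontend_teardown(struct frontend *f)
{
    ring_destroy(f);
    f->max_queues = 0;
}
```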
Owen Smith [Fri, 24 Aug 2018 16:46:43 +0000 (17:46 +0100)]
Conditionally package DPInst
Since DPInst.exe is not shipped with the Windows Driver Kit 10, an
environment variable must point to local copies. Make the inclusion of
DPInst conditional on DPINST_REDIST being defined and that path
existing. This simplifies building packages which do not require DPInst
for installation, and removes a required step to create a working build.
Paul Durrant [Mon, 6 Aug 2018 09:47:58 +0000 (10:47 +0100)]
Remove bogus ASSERTion
In a checked build the code in BlkifRingSchedule() sometimes hits the
ASSERTion:
ASSERT3U(State->Count, ==, 0);
This check is there because this code was ported across from XENVIF. In
the context of that driver the check is valid because it should never be
possible to post a partial sequence of netif requests (since that would
violate the protocol). However, in the context of XENVBD posting blkif
requests, it is perfectly reasonable for a subset of blkif requests for
a single SRB to be posted, and hence __BlkifRingPostRequests() may exit
before State->Count falls to zero. Thus the ASSERTion is invalid in this
context and needs to be removed.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Owen Smith [Fri, 3 Aug 2018 13:48:17 +0000 (14:48 +0100)]
Add some sensible default overrides
Without overrides, "max-ring-page-order" and "multi-queue-max-queues"
use values that will consume large amounts of grant references.
"max-ring-page-order" will default to 4 (16 pages per ring)
"multi-queue-max-queues" will default to the lowest of guest vCPU count
or backend's vCPU count.
Override "max-ring-page-order" to 1 (2 pages per ring) and
"multi-queue-max-queues" to 2 (2 rings per block device)
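The saving can be estimated by counting one grant reference per shared ring page. This ignores the grants used for data segments. The sketch assumes that the number of queues used is the minimum of the guest vCPU count, the backend limit and the override. The 8-vCPU figures in the usage are illustrative only.

```c
#include <assert.h>

static unsigned min_u(unsigned a, unsigned b)
{
    return a < b ? a : b;
}

/* Shared ring pages (one grant reference each) for one block device. */
static unsigned ring_grant_refs(unsigned ring_page_order, unsigned queues)
{
    assert(ring_page_order < 16);
    return queues * (1u << ring_page_order);
}

/* Queues used: capped by guest vCPUs, the backend limit and an override. */
static unsigned effective_queues(unsigned guest_vcpus, unsigned backend_max,
                                 unsigned override_max)
{
    unsigned q = min_u(min_u(guest_vcpus, backend_max), override_max);

    return q != 0 ? q : 1;   /* always use at least one ring */
}
```

With 8 vCPUs on both sides, the defaults (order 4, 8 queues) use 16 × 8 = 128 ring pages per disk. The overrides (order 1, 2 queues) use 2 × 2 = 4.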
Owen Smith [Mon, 4 Jun 2018 14:37:36 +0000 (15:37 +0100)]
Set StorPortInitializePerfOpts
Sets DPC_REDIRECTION, OPTIMIZE_FOR_COMPLETION_DURING_STARTIO and
CONCURRENT_CHANNELS (to "multi-queue-max-queues") to improve StorPort's
distribution of SRBs to vCPUs.
Owen Smith [Mon, 4 Jun 2018 14:11:26 +0000 (15:11 +0100)]
Implement multi-queues
Splits XENVBD_RING into multiple XENVBD_BLKIF_RINGs, one for each shared
ring. Up to "multi-queue-max-queues" rings are used to pass
blkif_requests and blkif_responses between frontend and backend. Reworks
the ring interactions to remove the locks used by XENVBD_QUEUE,
implementing a queue system similar to XenVif's transmitter queues.
Signed-off-by: Owen Smith <owen.smith@citrix.com>
s/Packets/SRBs in ring.c
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Owen Smith [Mon, 4 Jun 2018 13:57:43 +0000 (14:57 +0100)]
Force single queue in crash/hibernate path
Remove "multi-queue-num-queues" from the frontend area when the crashdump
driver is active. This forces the backend to use a single queue,
which is what the crashdump frontend supplies.
Signed-off-by: Owen Smith <owen.smith@citrix.com>
Add whitespace after cast for consistency with other code.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Owen Smith [Mon, 4 Jun 2018 13:47:15 +0000 (14:47 +0100)]
Don't close ring during FrontendReset
Closing the ring (and destroying the shared pages, etc) is not required
when a HwStorResetBus or SRB_FUNCTION_RESET_DEVICE is triggered.
Disabling the ring will cause any outstanding blkif_requests and SRBs to
be failed.
Owen Smith [Mon, 4 Jun 2018 13:42:23 +0000 (14:42 +0100)]
Remove RingTrigger
RingTrigger calls XENBUS_EVTCHN(Trigger..) during the suspend
callback. Just before this, the ring is recreated and enabled, which
also calls XENBUS_EVTCHN(Trigger..), so the explicit call to RingTrigger
is unnecessary.
Owen Smith [Mon, 4 Jun 2018 13:36:21 +0000 (14:36 +0100)]
Query "multi-queue-max-queues"
Query "multi-queue-max-queues", override it if necessary, and work
out a suitable value for the number of queues to use. Also adds
commented-out code to write "multi-queue-num-queues".
Paul Durrant [Mon, 4 Jun 2018 13:19:52 +0000 (14:19 +0100)]
Update Xen header files
The original patch from Owen contained too much whitespace damage. This
patch replaces it with the results of an invocation of get_xen_headers.py
and includes all resulting header updates.
NOTE: This patch adds one extra header (physdev.h) and a fix to remove
now-duplicate definitions of DOMID_INVALID from frontend.c.
Suggested-by: Owen Smith <owen.smith@citrix.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>