Differences

This shows you the differences between two versions of the page.

docs:techref:flash.layout [2023/10/17 20:21] – [Example flash partitioning] lanchon
docs:techref:flash.layout [2023/10/18 09:06] (current) – [NOR flash vs NAND flash] lanchon
Line 30: Line 30:
   * Some partitions are used as large files that can only be read or written completely and in one go. This is the case of raw bootloaders and kernels in MTD partitions. For these partitions, bad blocks are simply skipped during both reads and writes. Because new defects almost exclusively develop during erases and writes, once written these partitions are mostly trusted to be readable forever. (But newer devices tend to duplicate these partitions to minimize failures.)
   * Some partitions are used as large files that can only be written completely and in one go, but can be read in a random access fashion. This is the case of raw read-only file systems (such as squashfs) in MTD partitions. For these partitions, bad blocks are simply skipped during writes, and a kernel driver is used to read them. The driver reads the complete partition during setup, skipping bad blocks, and builds a logical-block-to-flash-block table in RAM so that it can later access the partition in a random access fashion.
-  * Some large partitions are used as containers for other compartmentalized data. Note that the number of bad blocks in a certain partition is unknown a priori, and thus a raw partition size cannot be taken as its usable size. For smaller partitions this effect is amplified: although there is a manufacturer-defined limit on the number of bad blocks in a flash, nothing precludes all bad blocks from residing in the same partition. Thus, for guaranteed operation, a system designer should allow _in each and every partition_ the maximum number of bad blocks specified for the complete flash. (In practice though, this is almost never done.) Also note that the previous kinds of defect handling do not spread wear produced by erase/write cycles across the whole flash, and thus in general reduce the lifespan of the device. These problems are both solved by UBI. Ideally a single very large UBI partition is created that entirely manages flash defects and wear-leveling for contained volumes, and inside it different UBI volumes are created:
 +  * Some large partitions are used as containers for other compartmentalized data. Note that the number of bad blocks in a certain partition is unknown a priori, and thus a raw partition size cannot be taken as its usable size. For smaller partitions this effect is amplified: although there is a manufacturer-defined limit on the number of bad blocks in a flash, nothing precludes all bad blocks from residing in the same partition. Thus, for guaranteed operation, a system designer should allow //in each and every partition// the maximum number of bad blocks specified for the complete flash. (In practice though, this is almost never done.) Also note that the previous kinds of defect handling do not spread wear produced by erase/write cycles across the whole flash, and thus in general reduce the lifespan of the device. These problems are both solved by UBI. Ideally a single very large UBI partition is created that entirely manages flash defects and wear-leveling for contained volumes, and inside it different UBI volumes are created:
     * Some UBI volumes are used as large files that can only be read or written completely and in one go. This is the case of kernels in UBI partitions.
     * Some UBI volumes are used as large files that can only be written completely and in one go, but can be read in a random access fashion. This is the case of read-only file systems (such as squashfs) in UBI partitions. For these volumes, a ubiblock kernel device is used to read them: the device emulates a read-only block device and maintains a logical-block-to-flash-block table in RAM so that it can access the volume in a random access fashion.
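
The logical-block-to-flash-block table mentioned in the list above can be illustrated with a small shell sketch. It is not the kernel driver's code: the function name ''build_block_map'', the 8-block partition, and the bad-block positions are all made up for illustration. It only shows the idea of skipping bad blocks while numbering the good ones consecutively.

```shell
# build_block_map TOTAL_BLOCKS "BAD BLOCK LIST"   (hypothetical helper)
# Prints one "logical physical" pair per line, skipping bad physical blocks.
build_block_map() {
    total=$1
    bad_list=" $2 "        # pad with spaces so every entry is space-delimited
    logical=0
    physical=0
    while [ "$physical" -lt "$total" ]; do
        case "$bad_list" in
            *" $physical "*)
                ;;         # bad block: no logical block is mapped here
            *)
                echo "$logical $physical"
                logical=$((logical + 1))
                ;;
        esac
        physical=$((physical + 1))
    done
}

# Made-up 8-block partition with bad blocks at physical positions 2 and 5.
build_block_map 8 "2 5"
```

With these inputs, logical block 2 lands on physical block 3, and only six logical blocks are available, which is why a raw partition size cannot be taken as its usable size.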
Line 45: Line 45:
  
 ===== Partitioning of NOR flash-based devices =====
 +
 On these systems, the storage is presented by the kernel as an MTD device, and it is divided into MTD partitions. The device is not partitioned in the traditional way, where you store information about partitions in a [[wp>GUID Partition Table|GPT]] or [[wp>Master boot record|MBR]]. Instead, the partitioning information is directly known by the bootloader and the kernel, either through configuration, or more typically through baking it in at build time. For example, in the kernel it may simply be defined that //"MTD partition **''kernel''** starts at flash block ''X'' and consists of ''Y'' blocks"//. MTD partitions can be accessed by name or number.
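
As a rough sketch of how a partition name maps to its number, the function below parses text in the format of ''/proc/mtd'' and prints the index of the partition with a given name. The helper name ''mtd_index_by_name'' and the sample partition table are assumptions made for illustration, and names containing spaces are not handled.

```shell
# mtd_index_by_name NAME   (hypothetical helper)
# Reads /proc/mtd-style text on stdin and prints the index of the MTD
# partition whose quoted name equals NAME.
mtd_index_by_name() {
    awk -v want="\"$1\"" '
        $4 == want {
            sub(/^mtd/, "", $1)   # "mtd2:" -> "2:"
            sub(/:$/, "", $1)     # "2:"    -> "2"
            print $1
        }'
}

# Made-up sample in the /proc/mtd column layout.
sample='dev:    size   erasesize  name
mtd0: 00040000 00010000 "u-boot"
mtd1: 00010000 00010000 "u-boot-env"
mtd2: 00f90000 00010000 "firmware"'

printf '%s\n' "$sample" | mtd_index_by_name firmware
```

On a running device the same function could be fed the real table with ''mtd_index_by_name firmware < /proc/mtd''.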
  
 The generic flash layout is:
 ^ Layer0 |  raw flash  ||||||
-^ Layer1 |  bootloader \\ partition(s)  |  optional \\ SoC \\ specific \\ partition(s)  |  OpenWrt firmware partition  |||  optional \\ SoC \\ specific \\ partition(s)  |
-^ Layer2 |:::|:::|  Linux Kernel  |  **''rootfs''** \\ mounted: "''/''", [[docs:techref:filesystems#overlayfs|OverlayFS]] with ''/overlay''  ||:::|
-^ Layer3 |:::|:::|:::|  **''/dev/root''** \\ mounted: "''/rom''", [[docs:techref:filesystems#SquashFS|SquashFS]] \\ size depends on selected packages  |  **''rootfs_data''** \\ mounted: "''/overlay''", [[docs:techref:filesystems#SquashFS|JFFS2]] \\ "freespace  |:::|
 +^ Layer1 |  bootloader \\ partition(s)  |  optional \\ SoC \\ specific \\ partition(s)  |  firmware partition  |||  optional \\ SoC \\ specific \\ partition(s)  |
 +^ Layer2 |:::|:::|  OpenWrt firmware image  ||  //(space available for storage)//  |:::|
 +^ Layer3 |:::|:::|  Linux kernel \\ (raw image)  |  **''rootfs''** \\ mounted: "''/rom''", [[docs:techref:filesystems#SquashFS|SquashFS]] \\ size depends on selected packages  |  **''rootfs_data''** \\ mounted: "''/overlay''", [[docs:techref:filesystems#JFFS2|JFFS2]] \\ all remaining free space  |:::|
 +^ Layer4 |:::|:::|:::|  mounted: "''/''", [[docs:techref:filesystems#overlayfs|OverlayFS]] \\ stacking ''/overlay'' on top of ''/rom''  ||:::|
 + 
 +Many NOR devices share this scheme, but the flash layout can differ between devices. Please see the wiki pages for each SoC and device for information about a particular layout. If the flash layout differs for your device, please update the wiki pages.
 + 
 + 
 +==== Sysupgrade and ''rootfs_data'' ==== 
 + 
 +To better use the minimal storage on devices available when OpenWrt was originally being developed, the **''rootfs_data''** partition was placed immediately after the OpenWrt firmware image (which contains the kernel and rootfs), without any padding in-between. This means that during upgrades, the beginning of **''rootfs_data''** might need to be overwritten (either because the OpenWrt image grew, or because the NAND flash developed new defects in the firmware area that need to be skipped during firmware flashing). 
 + 
 +To handle this situation, sysupgrade works in an atypical fashion. During an upgrade OpenWrt reads into RAM the selected content from **''rootfs_data''** that should survive the upgrade, flashes the new firmware, formats the remaining flash space as the new **''rootfs_data''** partition, and writes the selected content back to it from RAM.
 + 
 +Because of this, a failed sysupgrade might not only brick the device, it might also cause the contents of **''rootfs_data''** to be irrevocably lost.
  
-Many NOR devices share this scheme, but the flash layout can differ between the devices. Mostly minor details differ concerning U-Boot and SoC specific firmware images. Please see the wiki pages for each SoC and devices for information about particular layout. In case the flash layout differs for your device please update the wiki pages.
 +Note: Arbitrary files you may choose to store in **''rootfs_data''** are by default **not kept** across sysupgrades (but there is a way to request future sysupgrades to conserve selected files).
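
The text above does not say how to request that files be conserved. One common approach on OpenWrt, assumed here rather than taken from this page, is to list extra paths in ''/etc/sysupgrade.conf''. The sketch below adds a path to such a list only if it is not already present. The helper name ''keep_across_sysupgrade'' and the path ''/etc/example/'' are hypothetical, and the demonstration writes to a temporary file instead of the real configuration.

```shell
# keep_across_sysupgrade PATH [LIST_FILE]   (hypothetical helper)
# Appends PATH to the keep list unless an identical line already exists.
# LIST_FILE defaults to /etc/sysupgrade.conf (an assumption, see above).
keep_across_sysupgrade() {
    keep_path=$1
    list_file=${2:-/etc/sysupgrade.conf}
    grep -qxF -- "$keep_path" "$list_file" 2>/dev/null ||
        printf '%s\n' "$keep_path" >> "$list_file"
}

# Demonstration against a scratch file, so nothing on the system changes.
demo_conf=$(mktemp)
keep_across_sysupgrade /etc/example/ "$demo_conf"
keep_across_sysupgrade /etc/example/ "$demo_conf"   # second call is a no-op
cat "$demo_conf"
rm -f "$demo_conf"
```

Whatever mechanism is used, it is worth confirming on the device which files will actually be preserved before relying on it.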
  
 ==== Example NOR flash partitioning ====
Line 115: Line 128:
  
  
-===== Partitioning of UBIFS-Images =====
 +===== Partitioning of NAND flash-based devices =====
-UBIFS-Images are suitable for devices with //"raw NAND flash memory"//-chips.
 +
 
-TODO
 +On these systems, the storage is presented by the kernel as an MTD device, and it is divided into MTD partitions. The device is not partitioned in the traditional way, where you store information about partitions in a [[wp>GUID Partition Table|GPT]] or [[wp>Master boot record|MBR]]. Instead, the partitioning information is directly known by the bootloader and the kernel, either through configuration, or more typically through baking it in at build time. For example, in the kernel it may simply be defined that //"MTD partition **''kernel''** starts at flash block ''X'' and consists of ''Y'' blocks"//. MTD partitions can be accessed by name or number.
 + 
 +Some NAND devices contain bootloaders that do not understand UBI partitions and thus cannot boot kernels contained in UBI volumes. The generic flash layout for these devices is: 
 +^ Layer0 |  raw flash  ||||||| 
 +^ Layer1 |  bootloader \\ partition(s)  |  optional \\ SoC \\ specific \\ partition(s)  |  Linux kernel \\ (raw image)  |  optional \\ SoC \\ specific \\ partition(s)  |  UBI partition  ||  optional \\ SoC \\ specific \\ partition(s)  |
 +^ Layer2 |:::|:::|:::|:::|  **''rootfs''** \\ mounted: "''/rom''", [[docs:techref:filesystems#SquashFS|SquashFS]] \\ size depends on selected packages  |  **''rootfs_data''** \\ mounted: "''/overlay''", [[docs:techref:filesystems#UBIFS|UBIFS]] \\ all remaining free space  |:::|
 +^ Layer3 |:::|:::|:::|:::|  mounted: "''/''", [[docs:techref:filesystems#overlayfs|OverlayFS]] \\ stacking ''/overlay'' on top of ''/rom''  ||:::|
 + 
 +The generic flash layout for NAND devices that can boot kernels contained in UBI volumes is: 
 +^ Layer0 |  raw flash  |||||| 
 +^ Layer1 |  bootloader \\ partition(s)  |  optional \\ SoC \\ specific \\ partition(s)  |  UBI partition  |||  optional \\ SoC \\ specific \\ partition(s)  |
 +^ Layer2 |:::|:::|  **''kernel''** \\ Linux kernel \\ (raw image)  |  **''rootfs''** \\ mounted: "''/rom''", [[docs:techref:filesystems#SquashFS|SquashFS]] \\ size depends on selected packages  |  **''rootfs_data''** \\ mounted: "''/overlay''", [[docs:techref:filesystems#UBIFS|UBIFS]] \\ all remaining free space  |:::|
 +^ Layer3 |:::|:::|:::|  mounted: "''/''", [[docs:techref:filesystems#overlayfs|OverlayFS]] \\ stacking ''/overlay'' on top of ''/rom''  ||:::|
 + 
 +Many NAND devices share this scheme, but the flash layout can differ between devices. Please see the wiki pages for each SoC and device for information about a particular layout. If the flash layout differs for your device, please update the wiki pages.
 + 
 + 
 +==== Reserving UBI partition space for user-defined UBI volumes ==== 
 + 
 +For [[:docs:techref:flash.layout#sysupgrade_and_rootfs_data|historical reasons]] concerning NOR flash-based devices, sysupgrade works in an atypical fashion. During upgrades OpenWrt reads into RAM the selected content from **''rootfs_data''** that should survive the upgrade, creates an all-new **''rootfs_data''**, and writes the selected content back to it from RAM.
 + 
 +On NAND devices using UBI, sysupgrade performs these steps:
 +  - Partially reads the **''rootfs_data''** volume to RAM.
 +  - Deletes the **''kernel''** (for kernel-in-UBI devices), **''rootfs''** and **''rootfs_data''** volumes.
 +  - Recreates the **''kernel''** (if kernel-in-UBI) and **''rootfs''** volumes, sizing them to fit the new images.
 +  - Recreates the **''rootfs_data''** volume utilizing all remaining free space in the UBI partition.
 +  - Flashes the firmware.
 +  - Writes back data from RAM to **''rootfs_data''**.
 + 
 +While this setup worked well for old space-limited NOR devices, it may not be optimal for today's large NANDs. Nowadays, devices with flash sizes of 1 GiB or more are not uncommon, and for these devices moving all flash data to RAM and back is inefficient, unduly dangerous, and may not even be possible. 
 + 
 +Fortunately the default behavior of sysupgrade on NAND devices using UBI can be modified: instead of recreating the **''rootfs_data''** volume utilizing all the free space in the UBI partition, sysupgrade can restrict the volume to a specific user-defined size. The requested **''rootfs_data''** size must be specified in bytes in the **''rootfs_data_max''** bootloader environment variable. (The variable is evaluated when read, so "128*1024*1024", "0x8000000", "134217728" are all valid and equivalent.) 
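
The equivalence of the three spellings can be checked with ordinary shell arithmetic. This is only an illustration: the helper name ''eval_size'' is made up, and the exact way sysupgrade evaluates the variable is not shown on this page.

```shell
# eval_size EXPR   (hypothetical helper)
# Evaluates EXPR with shell arithmetic and prints the result in bytes.
eval_size() {
    echo $(( $1 ))
}

# The three spellings quoted in the text above.
for expr in "128*1024*1024" "0x8000000" "134217728"; do
    echo "$expr -> $(eval_size "$expr") bytes"
done
```

All three lines report 134217728 bytes, that is, 128 MiB.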
 + 
 +The relevant bootloader variable can be read with this command: 
 + 
 +<code> 
 +fw_printenv -n rootfs_data_max 
 +</code> 
 + 
 +Set with: 
 + 
 +<code> 
 +fw_setenv rootfs_data_max <VALUE> 
 +</code> 
 + 
 +And cleared with: 
 + 
 +<code> 
 +fw_setenv rootfs_data_max 
 +</code> 
 + 
 +Note that sysupgrade will fail if there is not enough space in the UBI partition to create **''rootfs_data''** of the specified size, and the contents of **''rootfs_data''** will then be lost. (The **''rootfs_data_max''** variable would have been better named **''rootfs_data_size''**.) You must make sure that enough free space remains in the UBI partition to accommodate growth of future OpenWrt images and/or custom OpenWrt images with more packages.
 + 
 +==== Example: Creating a UBI volume for persistent storage across sysupgrades ==== 
 + 
 +In an Askey RT4230W REV6 router with 512 MiB flash, the **''rootfs_data''** volume is normally sized at around 370 MiB (the remaining flash space being used for bootloaders, SoC-specific partitions, kernel, rootfs, and recovery). You can check this using: 
 + 
 +<code> 
 +root@router:~# ubinfo -d 0 -N rootfs_data 
 +Volume ID:   2 (on ubi0) 
 +Type:        dynamic 
 +Alignment:   1 
 +Size:        3086 LEBs (391847936 bytes, 373.6 MiB) 
 +State:       OK 
 +Name:        rootfs_data 
 +Character device major/minor: 246:3 
 +</code> 
 + 
 +Given that this volume is routinely wiped by sysupgrade, storing any remotely valuable files here would be ill-advised. For this router you might choose to limit **''rootfs_data''** to a generous 128 MiB, and create a new 192 MiB UBIFS volume for persistent storage, while still reserving roughly 50 MiB as free space to accommodate future growth of OpenWrt images. Let's do just that and name the new volume **''extra''**.
 + 
 +First you need to limit **''rootfs_data''** to 128 MiB for all following sysupgrades: 
 + 
 +<code> 
 +root@router:~# fw_setenv rootfs_data_max 0x8000000 
 +</code> 
 + 
 +Next do a sysupgrade (even if no upgrade is needed) to resize **''rootfs_data''**. After that, verify its new size:
 + 
 +<code> 
 +root@router:~# ubinfo -d 0 -N rootfs_data 
 +Volume ID:   2 (on ubi0) 
 +Type:        dynamic 
 +Alignment:   1 
 +Size:        1058 LEBs (134340608 bytes, 128.1 MiB) 
 +State:       OK 
 +Name:        rootfs_data 
 +Character device major/minor: 246:3 
 +</code> 
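
The reported size is slightly above 128 MiB. This is consistent with the requested size being rounded up to a whole number of logical eraseblocks (LEBs). The sketch below reproduces the figure, assuming a LEB size of 126976 bytes (the value this device reports when a volume is created). The helper name ''lebs_needed'' is made up for illustration.

```shell
# lebs_needed BYTES LEB_SIZE   (hypothetical helper)
# Prints how many whole LEBs are needed to hold BYTES (ceiling division).
lebs_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

leb_size=126976                      # bytes per LEB on this example device
requested=$(( 0x8000000 ))           # the value stored in rootfs_data_max
lebs=$(lebs_needed "$requested" "$leb_size")
echo "$lebs LEBs, $(( lebs * leb_size )) bytes"
```

This prints ''1058 LEBs, 134340608 bytes'', matching the ''ubinfo'' output above.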
 + 
 +You just freed 240+ MiB in the UBI partition. Next, you could manually create, format, and mount a new UBIFS volume. But OpenWrt has a tool to automate this, so let's use it. 
 + 
 +Connect the router to the internet if necessary, and use LuCI to install the ''**uvol**'' package (**System > Software**). You might also want to install your favorite text editor now (''**nano-full**'' is a good option).
 + 
 +Now check the installation (sizes are in bytes): 
 + 
 +<code> 
 +root@router:~# uvol list 
 +root@router:~# uvol total 
 +422576128 
 +root@router:~# uvol free 
 +253317120 
 +</code> 
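
Before creating a volume, it can help to estimate how much free space will remain afterwards. The sketch below does this for a 192 MiB volume using the ''uvol free'' figure shown above and an assumed LEB size of 126976 bytes. The helper name ''remaining_after'' is hypothetical, and any additional UBI overhead is ignored.

```shell
# remaining_after FREE_BYTES WANT_BYTES LEB_SIZE   (hypothetical helper)
# Prints the bytes left after carving out a volume of WANT_BYTES,
# rounded up to whole LEBs.
remaining_after() {
    lebs=$(( ($2 + $3 - 1) / $3 ))
    echo $(( $1 - lebs * $3 ))
}

free=253317120                  # bytes, as reported by "uvol free"
want=$(( 192*1024*1024 ))       # planned size of the new volume
left=$(remaining_after "$free" "$want" 126976)
echo "left: $left bytes (~$(( left / 1048576 )) MiB)"
```

This prints ''left: 51933184 bytes (~49 MiB)'', so roughly 50 MiB stays available for future growth of OpenWrt images.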
 + 
 +Create and enable the ''**extra**'' volume using ''**uvol**'': 
 + 
 +<code> 
 +root@router:~# uvol create extra $(( 192*1024*1024 )) rw 
 +Volume ID 4, size 1586 LEBs (201383936 bytes, 192.0 MiB), LEB size 126976 bytes (124.0 KiB), dynamic, name "uvol-wp-extra", alignment 1 
 +root@router:~# uvol up extra 
 +root@router:~# uvol list 
 +extra rw 201383936 
 +root@router:~# mount | grep extra 
 +/dev/ubi0_4 on /tmp/run/uvol/extra type ubifs (rw,relatime,assert=read-only,ubi=0,vol=4) 
 +</code> 
 + 
 +If you do not like the default mount path (''**/tmp/run/uvol/extra**''), you can change it to ''**/extra**'' using your text editor:
 + 
 +<code> 
 +root@router:~# nano /etc/config/fstab  
 +</code> 
 + 
 +Finally reboot and check that your new volume is mounted where you want it: 
 + 
 +<code> 
 +root@router:~# mount | grep extra 
 +/dev/ubi0_4 on /extra type ubifs (rw,relatime,assert=read-only,ubi=0,vol=4) 
 +</code>
  
 ===== MTD (Memory Technology Device) and MTDSPLIT =====
  • Last modified: 2023/10/17 20:21
  • by lanchon