Differences

This shows you the differences between two versions of the page.

docs:techref:flash.layout [2018/02/20 20:11] – ↷ Links adapted because of a move operation docs:techref:flash.layout [2023/10/18 09:06] (current) – [NOR flash vs NAND flash] lanchon
Line 9: Line 9:
 Non-mechanical wear does not only occur when flash memory is erased! Non-mechanical wear does not only occur when flash memory is erased!
  
-| {{:meta:icons:tango:dialog-information.png?nolink}} | 1. Flash memory is more likely to experience failure then a [[wp>hard disk drive (HDD)]] (the ones with the platters rotating at 5400–15000 [[wp>Revolutions per minute|RPM]])\\ 2. Some types of flash memory seem to experience more non-mechanical wear then other types\\ 3. How do we deal with failure? |+| {{:meta:icons:tango:dialog-information.png?nolink}} | 1. Flash memory is more likely to experience failure than a [[wp>Hard_disk_drive]] (the ones with the platters rotating at 5400–15000 [[wp>Revolutions per minute|RPM]])\\ 2. Some types of flash memory seem to experience more non-mechanical wear than other types\\ 3. How do we deal with failure? |
  
  
 ==== Host-managed vs. self-managed ==== ==== Host-managed vs. self-managed ====
-Based on how the flash memory chip is connected with the [[docs:hardware:soc|SoC]] (i.e. the "host") we at OpenWrt distinguish between **//"raw flash"//** or **//"host-managed"//** and **//"FTL (Flash Translation Layer) flash"//** or **//"self-managed"//**: in case the flash memory chip is connected directly with the SoC we call it "raw flash" / "host-managed" and in case there is an additional controller chip between the flash memory chip and the SoC, we call it "FTL flash" / "self-managed". Primarily the controller chip does [[wp>wear-leveling]] and manages known bad blocks, but it may do other stuff as well. The flash memory cannot be accessed directly, but only through this controller. The controller has to be considered a [[wp>black box]].+Based on how the flash memory chip is connected with the [[docs:techref:hardware:soc|SoC]] (i.e. the "host") we at OpenWrt distinguish between **//"raw flash"//** or **//"host-managed"//** and **//"FTL (Flash Translation Layer) flash"//** or **//"self-managed"//**: in case the flash memory chip is connected directly with the SoC we call it "raw flash" / "host-managed" and in case there is an additional controller chip between the flash memory chip and the SoC, we call it "FTL flash" / "self-managed". Primarily the controller chip does [[wp>wear-leveling]] and manages known bad blocks, but it may do other stuff as well. The flash memory cannot be accessed directly, but only through this controller. The controller has to be considered a [[wp>black box]].
  
 | {{:meta:icons:tango:dialog-information.png?nolink}} | Embedded systems almost exclusively use "raw flash", while [[wp>Solid-state drive|solid-state drives (SSDs)]] and USB memory sticks, almost exclusively use "FTL flash"!\\ | | {{:meta:icons:tango:dialog-information.png?nolink}} | Embedded systems almost exclusively use "raw flash", while [[wp>Solid-state drive|solid-state drives (SSDs)]] and USB memory sticks, almost exclusively use "FTL flash"!\\ |
Line 21: Line 21:
 Additionally we at OpenWrt distinguish between the two basic types of flash memory: [[wp>Flash_memory#NOR_flash|NOR flash]] and [[wp>Flash_memory#NAND_flash|NAND flash]].\\ Additionally we at OpenWrt distinguish between the two basic types of flash memory: [[wp>Flash_memory#NOR_flash|NOR flash]] and [[wp>Flash_memory#NAND_flash|NAND flash]].\\
  
-"Raw NOR flash" in typical routers is generally small (4 MiB – 16 MiB) and __error-free__, i.e. there cannot be bad erase blocks. Because raw NOR flash is error-free, the installed file system(s) do not need to take bad erase blocks into account, and neither SquashFS nor JFFS2 do. The combination of OverlayFS with SquashFS and JFFS2 has been the default OpenWrt setup since the beginning, and it works flawlessly on "raw NOR flash".+"Raw NOR flash" in typical routers is generally small (4 MiB – 16 MiB) and __error-free__: all data blocks are guaranteed to work correctly. Because raw NOR flash is error-free, the installed file system(s) do not need to take bad blocks into account, and neither SquashFS nor JFFS2 do. The combination of OverlayFS with SquashFS and JFFS2 has been the default OpenWrt setup since the beginning, and it works flawlessly on "raw NOR flash". Older routers typically use NOR flash.
  
-"Raw NAND flash" in typical routers is generally larger (32 MiB – 256 MiB) and __not error-free__, i.e. it may contain bad erase blocks. A solution to deal with bad erase blocks comprises three provisions:+"Raw NAND flash" in typical routers is generally much larger (32 MiB – 1 GiB) and __not error-free__: in general the flash contains bad blocks when new and may develop more at any time. Newer routers use NAND flash because it is much cheaper for a given capacity and is also faster for bulk access (disk emulation), but at the cost of the increased complexity required to handle flash defects.
  
-  - the manufacturer of the "raw NAND flash" has to guarantee that certain erase blocks are error-free: +Bad blocks in NAND flash are handled in various ways:
-    * namely the one(s) which the bootloader is to be written to +
-    * but also the ones which the Linux kernel and the SquashFS are to be written to, because the firmware image file is generated on some desktop computer, that cannot know which erase blocks of the "raw NAND flash" of the device are bad. +
-  - the [[doc/howto/obtain.firmware.generate|OpenWrt Firmware Generator]] has be constrained to build only file sizes that are equal or smaller then the size of the area of the "raw NAND flash", that consists of guaranteed error-free erase blocks. +
-  - OpenWrt would replace JFFS2 with [[docs:techref:filesystems#UBIFS|UBIFS]], and the entire area of the "raw NAND flash", that consists of potentially bad erase blocks, would be written to exclusively from an installed OpenWrt system through UBIFS.+
  
-| {{:meta:icons:tango:dialog-information.png?nolink}} | Older routers generally have "raw NOR flash" but many newer routers have "raw NAND flash". |+  * The NAND flash manufacturer guarantees that certain very small areas of the flash are defect-free. The use of such areas is up to the system designer. Some SoCs may store the first stage bootloader there (but since newer SoCs tend to support chain-of-trust booting, they typically store the first stage bootloader on-chip).
 +  * Some partitions are used as large files that can only be read or written completely and in one go. This is the case of raw bootloaders and kernels in MTD partitions. For these partitions, bad blocks are simply skipped during both reads and writes. Because new defects almost exclusively develop during erase and writes, once written these partitions are mostly trusted to be readable forever. (But newer devices tend to duplicate these partitions to minimize failures.) 
 +  * Some partitions are used as large files that can only be written completely and in one go, but can be read in a random access fashion. This is the case of raw read-only file systems (such as squashfs) in MTD partitions. For these partitions, bad blocks are simply skipped during writes, and a kernel driver is used to read them. The driver reads the complete partition during setup skipping bad blocks, and builds a logical-block-to-flash-block table in RAM to be able to later access the partition random-access. 
 +  * Some large partitions are used as containers for other compartmentalized data. Note that the number of bad blocks in a given partition is unknown a priori, and thus the raw size of a partition cannot be taken as its usable size. For smaller partitions this effect is amplified: although there is a manufacturer-defined limit on the number of bad blocks in a flash, nothing precludes all bad blocks from residing in the same partition. Thus, for guaranteed operation, a system designer should allow //in each and every partition// the maximum number of bad blocks specified for the complete flash. (In practice though, this is almost never done.) Also note that the previous kinds of defect handling do not spread wear produced by erase/write cycles across the whole flash, and thus in general reduce the lifespan of the device. These problems are both solved by UBI. Ideally a single very large UBI partition is created that entirely manages flash defects and wear-leveling for contained volumes, and inside it different UBI volumes are created:
 +    * Some UBI volumes are used as large files that can only be read or written completely and in one go. This is the case of kernels in UBI partitions. 
 +    * Some UBI volumes are used as large files that can only be written completely and in one go, but can be read in a random access fashion. This is the case of read-only file systems (such as squashfs) in UBI partitions. For these volumes, an ubiblock kernel device is used to read them: the device emulates a read-only block device and maintains a logical-block-to-flash-block table in RAM to be able to access the volume random-access. 
 +    * Some UBI volumes are used as read-write filesystems. Only the UBIFS filesystem is used for this. (It would be possible to emulate read-write block devices on top of UBI and use regular filesystems on top of that, but such setups would underperform compared to UBIFS, and it seems that the necessary UBI block emulation driver has not been implemented so far.)
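The logical-block-to-flash-block table mentioned in the list above can be illustrated with a minimal shell sketch. It is a toy model, not the code of any kernel driver: the function name ''build_block_map'', the partition size of 8 blocks and the bad block positions are all made up for illustration.

<code bash>
#!/bin/sh
# Toy model: map logical blocks to physical flash blocks, skipping bad ones.
# $1 = number of physical blocks in the partition (hypothetical value)
# $2 = space-separated list of bad physical block numbers (hypothetical)
build_block_map() {
    total=$1
    bad=" $2 "
    logical=0
    phys=0
    while [ "$phys" -lt "$total" ]; do
        case "$bad" in
            *" $phys "*)
                # bad block: it gets no logical number, just skip it
                ;;
            *)
                echo "$logical $phys"
                logical=$((logical + 1))
                ;;
        esac
        phys=$((phys + 1))
    done
}

# Prints one "logical physical" pair per usable block.
build_block_map 8 "2 5"
</code>

With blocks 2 and 5 marked bad, logical block 2 lands on flash block 3, and only six of the eight blocks remain usable. This is why the raw size of a partition overstates what can be stored in it.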
 + 
 +Note that because of these factors, the OpenWrt [[docs:guide-user:additional-software:imagebuilder|Image Generator]] has been constrained to build images that are smaller than the size of the partitions to which they are supposed to be flashed by an arbitrary margin, to maximize the probability that such images can be flashed on all devices.
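The effect of bad blocks on usable partition size can be sketched with a little shell arithmetic. The numbers below (a 32 MiB partition of 256 blocks of 128 KiB each, and a limit of 20 bad blocks) are hypothetical and not taken from any datasheet, and ''worst_case_usable'' is a helper invented for this illustration.

<code bash>
#!/bin/sh
# Worst-case usable size of a raw partition, assuming that every bad block
# the manufacturer allows for the whole flash ends up inside this partition.
# $1 = blocks in the partition, $2 = maximum bad blocks for the whole flash,
# $3 = block size in bytes (all values hypothetical)
worst_case_usable() {
    echo $(( ($1 - $2) * $3 ))
}

# 256 blocks of 128 KiB, up to 20 bad blocks
worst_case_usable 256 20 131072
</code>

An image that must always fit would have to stay below this worst-case figure, which is the kind of margin the paragraph above describes.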
  
  
 ==== MLC vs. SLC flash ==== ==== MLC vs. SLC flash ====
-We can further distinguish between [[wp>Multi-level cell|multi-level cell (MLC)]] and single-level cell (SLC) flash+The main difference between SLC and MLC is durability. 
 +[[wp>Single-level cell|Single-level cell (SLC)]] flash memory may have a lifetime of about 50,000 to 100,000 program/erase cycles, while [[wp>Multi-level cell|multi-level cell (MLC)]] flash may have a lifetime of about 1,000 to 10,000 program/erase cycles.
  
 +Note that it is **NOT RIGHT** to estimate the lifetime of NAND flash in embedded devices using the same method as for SSDs!
  
-===== Partitioning of the Flash ===== +===== Partitioning of NOR flash-based devices =====
-Almost all embedded systems contain //"raw flash"//-chips. The available storage is not partitioned in the traditional way, where you store the data about the partitions in the [[wp>Master boot record|MBR]] and [[wp>Volume boot record|PBR]]s, but it is done in the Linux Kernel (and sometimes independently in the [[docs:techref:bootloader]] as well!). It's simply defined, that "//partition **''kernel''** starts at offset ''x'' and ends at offset ''y''//". Using names allows convenient addressing of partitions by name instead of giving the start offset over and over again.+
  
-The generic Flash layout is:+On these systems, the storage is presented by the kernel as an MTD device, and it is divided into MTD partitions. The device is not partitioned in the traditional way, where you store information about partitions in a [[wp>GUID Partition Table|GPT]] or [[wp>Master boot record|MBR]]. Instead, the partitioning information is directly known by the bootloader and the kernel, either through configuration, or more typically through baking it in at build time. For example, in the kernel it may simply be defined that //"MTD partition **''kernel''** starts at flash block ''X'' and consists of ''Y'' blocks"//. MTD partitions can be accessed by name or number. 
 + 
 +The generic flash layout is:
 ^ Layer0 |  raw flash  |||||| ^ Layer0 |  raw flash  ||||||
-^ Layer1 |  bootloader \\ partition(s)  |  optional \\ SoC \\ specific \\ partition(s)  |  OpenWrt firmware partition  |||  optional \\ SoC \\ specific \\ partition(s) +^ Layer1 |  bootloader \\ partition(s)  |  optional \\ SoC \\ specific \\ partition(s)  |  firmware partition  |||  optional \\ SoC \\ specific \\ partition(s) 
-^ Layer2 |:::|:::|  Linux Kernel  |  **''rootfs''** \\ mounted: "''/''", [[docs:techref:filesystems#overlayfs|OverlayFS]] with ''/overlay''  ||:::| +^ Layer2 |:::|:::|  OpenWrt firmware image  ||  //(space available for storage)//  |:::
-^ Layer3 |:::|:::|:::|  **''/dev/root''** \\ mounted: "''/rom''", [[docs:techref:filesystems#SquashFS|SquashFS]] \\ size depends on selected packages  |  **''rootfs_data''** \\ mounted: "''/overlay''", [[docs:techref:filesystems#SquashFS|JFFS2]] \\ "freespace  |:::|+^ Layer3 |:::|::: Linux kernel \\ (raw image)  |  **''rootfs''** \\ mounted: "''/rom''", [[docs:techref:filesystems#SquashFS|SquashFS]] \\ size depends on selected packages  |  **''rootfs_data''** \\ mounted: "''/overlay''", [[docs:techref:filesystems#JFFS2|JFFS2]] \\ all remaining free space  |:::| 
 +^ Layer4 |:::|:::|::: mounted: "''/''", [[docs:techref:filesystems#overlayfs|OverlayFS]] \\ stacking ''/overlay'' on top of ''/rom''  ||:::| 
 + 
 +Many NOR devices share this scheme, but the flash layout can differ between the devices. Please see the wiki pages for each SoC and devices for information about a particular layout. In case the flash layout differs for your device please update the wiki pages. 
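On a running system the kernel lists its MTD partitions in ''/proc/mtd'', with sizes given in hexadecimal. The following is a minimal sketch of how that listing can be turned into sizes in KiB; the function name ''mtd_sizes'' and the two sample partitions are made up for illustration and do not describe a particular device.

<code bash>
#!/bin/sh
# Convert /proc/mtd style lines (read from stdin) into
# "device name size-in-KiB" lines.
mtd_sizes() {
    while read -r dev size erasesize name; do
        case "$dev" in
            mtd*) ;;          # a partition line
            *) continue ;;    # header or anything else: ignore
        esac
        # strip the double quotes around the partition name
        name=${name#\"}
        name=${name%\"}
        echo "${dev%:} $name $(( 0x$size / 1024 )) KiB"
    done
}

# Sample input (hypothetical values)
mtd_sizes <<'EOF'
dev:    size   erasesize  name
mtd0: 00020000 00010000 "u-boot"
mtd1: 00100000 00010000 "kernel"
EOF
</code>

On a real device the same function can be fed with ''mtd_sizes < /proc/mtd''.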
 + 
 + 
 +==== Sysupgrade and ''rootfs_data'' ==== 
 + 
 +To better use the minimal storage on devices available when OpenWrt was originally being developed, the **''rootfs_data''** partition was placed immediately after the OpenWrt firmware image (which contains the kernel and rootfs), without any padding in-between. This means that during upgrades, the beginning of **''rootfs_data''** might need to be overwritten (either because the OpenWrt image grew, or because the NAND flash developed new defects in the firmware area that need to be skipped during firmware flashing). 
 + 
 +To handle this situation, sysupgrade works in an atypical fashion. During an upgrade, OpenWrt copies into RAM the selected content from **''rootfs_data''** that should survive the upgrade, flashes the new firmware, formats the remaining flash space as the new **''rootfs_data''** partition, and writes the selected content back to it from RAM.
 + 
 +Because of this, a failed sysupgrade might not only brick the device, it might also cause the contents of **''rootfs_data''** to be irrevocably lost.
  
-Many newer devices share this scheme, but the flash layout can differ between the devices! Mostly minor details slightly differ concerning U-Boot and SoC specific firmware images. Please see the wiki pages for each SoC and devices for information about particular layout. In case the flash layout differs for your device please update the wiki pages.\\ +Note: Arbitrary files you may choose to store in **''rootfs_data''** are by default **not kept** across sysupgrades (but there is a way to request future sysupgrades to conserve selected files).
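The save, wipe and restore cycle described in this section can be modelled with plain directories. This is only a toy simulation under stated assumptions: temporary directories stand in for **''rootfs_data''** and for RAM, the ''keep'' list and file names are hypothetical, and nothing here is the actual sysupgrade code.

<code bash>
#!/bin/sh
# Toy simulation of the sysupgrade cycle. No real flash is touched.
simulate_sysupgrade() {
    work=$(mktemp -d) || return 1
    data="$work/rootfs_data"   # stands in for the rootfs_data partition
    ram="$work/ram"            # stands in for RAM

    # State before the upgrade: one selected file, one arbitrary file
    mkdir -p "$data/etc/config" "$data/root" "$ram"
    echo "hostname=router" > "$data/etc/config/system"
    echo "my notes" > "$data/root/notes.txt"

    keep="etc/config/system"   # hypothetical list of files to conserve

    # 1. copy the selected content to "RAM"
    for f in $keep; do
        mkdir -p "$ram/$(dirname "$f")"
        cp "$data/$f" "$ram/$f"
    done

    # 2. flash firmware and reformat rootfs_data (modelled as a wipe)
    rm -rf "$data"
    mkdir -p "$data"

    # 3. write the selected content back
    cp -R "$ram/." "$data/"

    # Report what survived, then clean up
    (cd "$data" && find . -type f | sed 's|^\./||' | sort)
    rm -rf "$work"
}

simulate_sysupgrade
</code>

Only the file on the keep list is reported afterwards; ''root/notes.txt'' is gone. If step 2 or 3 were interrupted, the copy in "RAM" would be the only one left, which mirrors the data loss risk mentioned above.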
-Here are some examples how it looks on actual devices:+
  
-=== Example 1: TP-Link TL-WR1043ND === +==== Example NOR flash partitioning ==== 
-[[docs:hardware:soc:soc.qualcomm|Qualcomm Atheros]]-based [[toh:tp-link:TL-WR1043ND]]. Somebody also provided a [[https://web.archive.org/web/20131021013058/http://ubuntuone.com/2aPBH9pwkxtYzy93S0cS1z|LibreOffice Calc ODS]].+[[docs:techref:hardware:soc:soc.qualcomm|Qualcomm Atheros]]-based [[toh:tp-link:TL-WR1043ND]]. Somebody also provided a [[https://web.archive.org/web/20131021013058/http://ubuntuone.com/2aPBH9pwkxtYzy93S0cS1z|LibreOffice Calc ODS]].
  
 SquashFS-Images are suitable for devices with //"raw NOR flash memory"//-chips and it is not recommended to install them onto devices with //"raw NAND flash memory"//-chips. SquashFS-Images comprise both, a SquashFS partition and an JFFS2 partition. JFFS2-Images omit the SquashFS partition. SquashFS-Images are suitable for devices with //"raw NOR flash memory"//-chips and it is not recommended to install them onto devices with //"raw NAND flash memory"//-chips. SquashFS-Images comprise both, a SquashFS partition and an JFFS2 partition. JFFS2-Images omit the SquashFS partition.
Line 66: Line 84:
 ^ <color magenta>mountpoint</color>    //none//                    |  //none//                      <color magenta>''/rom''</color>  |  <color magenta>''/overlay''</color>    //none//  | ^ <color magenta>mountpoint</color>    //none//                    |  //none//                      <color magenta>''/rom''</color>  |  <color magenta>''/overlay''</color>    //none//  |
 ^ filesystem    //none//                    |  //none//                      [[docs:techref:filesystems#SquashFS]]  |  [[docs:techref:filesystems#JFFS2]]  |  //none//  | ^ filesystem    //none//                    |  //none//                      [[docs:techref:filesystems#SquashFS]]  |  [[docs:techref:filesystems#JFFS2]]  |  //none//  |
 +
 +=== Another Flash layout example  ===
 +[[:toh:tp-link:archer_c6_v2#flash_layout|TP-Link Archer C6 V2 (EU/RU/JP)]]
  
 ==== Explanations ==== ==== Explanations ====
Line 71: Line 92:
  
 Since the partitions are nested we look at this whole thing in layers: Since the partitions are nested we look at this whole thing in layers:
-  - Layer0: So we have the Flashchip, 8 MiB in size, which is soldered to the PCB and connected to the [[docs:hardware:soc]] over [[wp>Serial Peripheral Interface Bus|SPI (Serial Peripheral Interface Bus)]].+  - Layer0: So we have the Flashchip, 8 MiB in size, which is soldered to the PCB and connected to the [[docs:techref:hardware:soc]] over [[wp>Serial Peripheral Interface Bus|SPI (Serial Peripheral Interface Bus)]].
   - Layer1: We "partition" the space into mtd0 for the bootloader, mtd5 for OpenWrt and, in this case, mtd4 for the ART (Atheros Radio Test) - it contains calibration data for the wifi (EEPROM). If it is missing or corrupt, ''ath9k'' (wireless driver) won't come up anymore. The bootloader (128 KiB) consists of the u-boot 64 KiB block AND a data section which contains the MAC, WPS-PIN and type description. If no MAC is configured ath9k will not work correctly due to a faulty MAC.   - Layer1: We "partition" the space into mtd0 for the bootloader, mtd5 for OpenWrt and, in this case, mtd4 for the ART (Atheros Radio Test) - it contains calibration data for the wifi (EEPROM). If it is missing or corrupt, ''ath9k'' (wireless driver) won't come up anymore. The bootloader (128 KiB) consists of the u-boot 64 KiB block AND a data section which contains the MAC, WPS-PIN and type description. If no MAC is configured ath9k will not work correctly due to a faulty MAC.
-  - Layer2: we subdivide mtd5 (firmware) into mtd1 (kernel) and mtd2 (rootfs); In the generation process of the firmware (see [[doc:howto:obtain.firmware.generate]]) the Kernel binary file is first packed with [[wp>Lempel–Ziv–Markov chain algorithm|LZMA]], then the obtained file is packed with [[wp>gzip]] and then this file will be written onto the raw flash (mtd1) without being part of any filesystem! During boot, u-boot copies this entire section into RAM and executes it. From there on, the Linux kernel bootstraps itself…+  - Layer2: we subdivide mtd5 (firmware) into mtd1 (kernel) and mtd2 (rootfs); In the generation process of the firmware (see [[docs:guide-user:additional-software:imagebuilder]]) the Kernel binary file is first packed with [[wp>Lempel–Ziv–Markov chain algorithm|LZMA]], then the obtained file is packed with [[wp>gzip]] and then this file will be written onto the raw flash (mtd1) without being part of any filesystem! During boot, u-boot copies this entire section into RAM and executes it. From there on, the Linux kernel bootstraps itself…
   - Layer3: we subdivide rootfs even further into mtd3 for rootfs_data and the rest for an unnamed partition which will accommodate the SquashFS-partition.   - Layer3: we subdivide rootfs even further into mtd3 for rootfs_data and the rest for an unnamed partition which will accommodate the SquashFS-partition.
  
 === Mount Points === === Mount Points ===
   * <color magenta>''/''</color> this is your entire root filesystem, it comprises ''/rom'' and ''/overlay''. Please ignore ''/rom'' and ''/overlay'' and use exclusively ''/'' for your daily routines!   * <color magenta>''/''</color> this is your entire root filesystem, it comprises ''/rom'' and ''/overlay''. Please ignore ''/rom'' and ''/overlay'' and use exclusively ''/'' for your daily routines!
-  * <color magenta>''/rom''</color>  contains all the basic files, like ''busybox'', ''dropbear'' or ''iptables''. It also includes default configuration files used when booting into [[docs:user-guide:troubleshooting:failsafe_and_factory_reset|OpenWrt Failsafe mode]]. It does not contain the Linux kernel. All files in this directory are located on the SqashFS partition, and thus cannot be altered or deleted. But, because we use overlay_fs filesystem, //overlay-whiteout//-symlinks can be created on the JFFS2 partition. +  * <color magenta>''/rom''</color>  contains all the basic files, like ''busybox'', ''dropbear'' or ''iptables''. It also includes default configuration files used when booting into [[docs:guide-user:troubleshooting:failsafe_and_factory_reset|OpenWrt Failsafe mode]]. It does not contain the Linux kernel. All files in this directory are located on the SquashFS partition, and thus cannot be altered or deleted. But, because we use overlay_fs filesystem, //overlay-whiteout//-symlinks can be created on the JFFS2 partition. 
-  * <color magenta>''/overlay''</color>  is the writable part of the file system that gets merged with ''/rom'' to create a uniform ''/''-tree. It contains anything that was written to the router after [[docs:user-guide:installation:generic.flashing|installation]], e.g. changed configuration files, additional packages installed with ''[[docs:user-guide:additional-software:opkg]]'', etc. It is formated with JFFS2.+  * <color magenta>''/overlay''</color>  is the writable part of the file system that gets merged with ''/rom'' to create a uniform ''/''-tree. It contains anything that was written to the router after [[docs:guide-user:installation:generic.flashing|installation]], e.g. changed configuration files, additional packages installed with ''[[docs:guide-user:additional-software:opkg]]'', etc. It is formatted with JFFS2.
  
 Whenever the system is asked to look for an existing file in ''/'', it first looks in ''/overlay'', and if not there, then in ''/rom''. In this way ''/overlay'' overrides ''/rom'' and creates the effect of a writable ''/'' while much of the content is safely and efficiently stored in the read-only ''/rom''. Whenever the system is asked to look for an existing file in ''/'', it first looks in ''/overlay'', and if not there, then in ''/rom''. In this way ''/overlay'' overrides ''/rom'' and creates the effect of a writable ''/'' while much of the content is safely and efficiently stored in the read-only ''/rom''.
  
-When the system is asked to delete a file that is in ''/rom'', it instead creates a corresponding entry in ''/overlay'', a whiteout.  A whiteout is a symlink to ''(overlay-whiteout)'' that mostly behaves like a file that doesn't exist.+When the system is asked to delete a file that is in ''/rom'', it instead creates a corresponding entry in ''/overlay'', a whiteout.  A whiteout is a symlink to ''(overlay-whiteout)'' that mostly behaves like a file that doesn't exist. In newer versions, the whiteout is created as a character device with 0/0 device number instead.
  
 <code bash> <code bash>
 #!/bin/sh #!/bin/sh
 # shows all overlay-whiteout symlinks # shows all overlay-whiteout symlinks
-# 2018 dont think this script works anymore.  overlay-whiteouts are a character device on CC +# 2018: overlay-whiteouts are a character device on CC; 'find /overlay -type c' seems to work
-'find /overlay -type -c'  seems to work+
 #  https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt  put me on the right track #  https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt  put me on the right track
  
- +find /overlay -type c; find /overlay -type l -exec sh -c \
-find /overlay -type l -exec sh -c \+
     'for x; do [ "$(readlink -n -- "$x")" = "(overlay-whiteout)" ] && printf %s\\n "$x"; done' -- {} +     'for x; do [ "$(readlink -n -- "$x")" = "(overlay-whiteout)" ] && printf %s\\n "$x"; done' -- {} +
 </code> </code>
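The lookup order described above (first ''/overlay'', then ''/rom'', with whiteouts hiding files) can be sketched as a small shell function. This is an illustration of the rule only, not how OverlayFS is implemented in the kernel; the function name ''resolve'' and the demo files are made up, and a character device is treated as a whiteout without checking its device number.

<code bash>
#!/bin/sh
# Resolve a path the way the merged "/" behaves:
# $1 = upper directory (like /overlay), $2 = lower directory (like /rom),
# $3 = path relative to the root
resolve() {
    upper=$1
    lower=$2
    path=$3
    if [ -L "$upper/$path" ] &&
       [ "$(readlink -n -- "$upper/$path")" = "(overlay-whiteout)" ]; then
        return 1                  # old-style whiteout: file is deleted
    elif [ -c "$upper/$path" ]; then
        return 1                  # newer whiteout (device number not checked)
    elif [ -e "$upper/$path" ]; then
        echo "$upper/$path"       # the upper layer wins
    elif [ -e "$lower/$path" ]; then
        echo "$lower/$path"       # fall back to the read-only layer
    else
        return 1
    fi
}

# Demo with temporary directories standing in for /rom and /overlay
demo=$(mktemp -d)
mkdir -p "$demo/rom" "$demo/overlay"
echo rom > "$demo/rom/a"
echo rom > "$demo/rom/b"
echo rom > "$demo/rom/c"
echo new > "$demo/overlay/b"                   # b was modified
ln -s "(overlay-whiteout)" "$demo/overlay/c"   # c was deleted
for p in a b c; do
    resolve "$demo/overlay" "$demo/rom" "$p" || echo "$p: no such file"
done
rm -rf "$demo"
</code>

In the demo, ''a'' resolves to the copy in the lower directory, ''b'' to the modified copy in the upper directory, and ''c'' is reported as missing because of the whiteout.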
  
 === Example 2: Hoo Too HT-TM02 === === Example 2: Hoo Too HT-TM02 ===
-[[docs:hardware:soc:soc.ralink|Ralink RT5350F]]-based [[toh:hwdata:hootoo:hootoo_tripmatenano_v15|Hoo Too HT-TM02]].+[[docs:techref:hardware:soc:soc.ralink|Ralink RT5350F]]-based [[toh:hwdata:hootoo:hootoo_tripmatenano_v15|Hoo Too HT-TM02]].
  
 ^ Layer0 |  raw flash, 8192 KiB  |||||| ^ Layer0 |  raw flash, 8192 KiB  ||||||
Line 106: Line 125:
  
 === Example 3: D-Link DIR-300 === === Example 3: D-Link DIR-300 ===
-For some devices, the OpenWrt partition ''firmware'' may not exist at all. The [[toh:d-link:DIR-300#flash.layout|DIR-300 flash layout]] is such an example.+For some devices, the OpenWrt partition ''firmware'' may not exist at all. The [[toh:d-link:DIR-300#flash_layout|DIR-300 flash layout]] is such an example.
  
  
-===== Partitioning of UBIFS-Images ===== +===== Partitioning of NAND flash-based devices =====
-UBIFS-Images are suitable for devices with //"raw NAND flash memory"//-chips.+
  
-TODO+On these systems, the storage is presented by the kernel as an MTD device, and it is divided into MTD partitions. The device is not partitioned in the traditional way, where you store information about partitions in a [[wp>GUID Partition Table|GPT]] or [[wp>Master boot record|MBR]]. Instead, the partitioning information is directly known by the bootloader and the kernel, either through configuration, or more typically through baking it in at build time. For example, in the kernel it may simply be defined that //"MTD partition **''kernel''** starts at flash block ''X'' and consists of ''Y'' blocks"//. MTD partitions can be accessed by name or number.
  
-===== MTD (Memory Technology Device) =====+Some NAND devices contain bootloaders that do not understand UBI partitions and thus cannot boot kernels contained in UBI volumes. The generic flash layout for these devices is: 
 +^ Layer0 |  raw flash  ||||||| 
 +^ Layer1 |  bootloader \\ partition(s)  |  optional \\ SoC \\ specific \\ partition(s)  |  Linux kernel \\ (raw image)  |  optional \\ SoC \\ specific \\ partition(s)  |  UBI partition  ||  optional \\ SoC \\ specific \\ partition(s) 
 +^ Layer2 |:::|:::|:::|::: **''rootfs''** \\ mounted: "''/rom''", [[docs:techref:filesystems#SquashFS|SquashFS]] \\ size depends on selected packages  |  **''rootfs_data''** \\ mounted: "''/overlay''", [[docs:techref:filesystems#UBIFS|UBIFS]] \\ all remaining free space  |:::| 
 +^ Layer3 |:::|:::|:::|::: mounted: "''/''", [[docs:techref:filesystems#overlayfs|OverlayFS]] \\ stacking ''/overlay'' on top of ''/rom''  ||:::| 
 + 
 +The generic flash layout for NAND devices that can boot kernels contained in UBI volumes is: 
 +^ Layer0 |  raw flash  |||||| 
 +^ Layer1 |  bootloader \\ partition(s)  |  optional \\ SoC \\ specific \\ partition(s)  |  UBI partition  |||  optional \\ SoC \\ specific \\ partition(s) 
 +^ Layer2 |:::|::: **''kernel''** \\ Linux kernel \\ (raw image)  |  **''rootfs''** \\ mounted: "''/rom''", [[docs:techref:filesystems#SquashFS|SquashFS]] \\ size depends on selected packages  |  **''rootfs_data''** \\ mounted: "''/overlay''", [[docs:techref:filesystems#UBIFS|UBIFS]] \\ all remaining free space  |:::| 
 +^ Layer3 |:::|:::|::: mounted: "''/''", [[docs:techref:filesystems#overlayfs|OverlayFS]] \\ stacking ''/overlay'' on top of ''/rom''  ||:::| 
 + 
 +Many NAND devices share this scheme, but the flash layout can differ between the devices. Please see the wiki pages for each SoC and devices for information about a particular layout. In case the flash layout differs for your device please update the wiki pages. 
 + 
 + 
 +==== Reserving UBI partition space for user-defined UBI volumes ==== 
 + 
 +For [[:docs:techref:flash.layout#sysupgrade_and_rootfs_data|historical reasons]] concerning NOR flash-based devices, sysupgrade works in an atypical fashion. During upgrades, OpenWrt copies into RAM the selected content from **''rootfs_data''** that should survive the upgrade, creates an all-new **''rootfs_data''**, and writes the selected content back to it from RAM.
 + 
 +On NAND devices using UBI, sysupgrade partially reads the **''rootfs_data''** volume to RAM, deletes **''kernel''** (for kernel-in-UBI devices), **''rootfs''** and **''rootfs_data''** volumes, recreates **''kernel''** (if kernel-in-UBI) and **''rootfs''** volumes sizing them to fit the new images, recreates the **''rootfs_data''** volume utilizing all remaining free space in the UBI partition, flashes the firmware, and writes back data from RAM to **''rootfs_data''**. 
 + 
 +While this setup worked well for old space-limited NOR devices, it may not be optimal for today's large NANDs. Nowadays, devices with flash sizes of 1 GiB or more are not uncommon, and for these devices moving all flash data to RAM and back is inefficient, unduly dangerous, and may not even be possible. 
 + 
 +Fortunately the default behavior of sysupgrade on NAND devices using UBI can be modified: instead of recreating the **''rootfs_data''** volume utilizing all the free space in the UBI partition, sysupgrade can restrict the volume to a specific user-defined size. The requested **''rootfs_data''** size must be specified in bytes in the **''rootfs_data_max''** bootloader environment variable. (The variable is evaluated when read, so "128*1024*1024", "0x8000000", "134217728" are all valid and equivalent.) 
 + 
 +The relevant bootloader variable can be read with this command: 
 + 
 +<code> 
 +fw_printenv -n rootfs_data_max 
 +</code> 
 + 
 +Set with: 
 + 
 +<code> 
 +fw_setenv rootfs_data_max <VALUE> 
 +</code> 
 + 
 +And cleared with: 
 + 
 +<code> 
 +fw_setenv rootfs_data_max 
 +</code> 
 + 
 +Note that sysupgrade will fail if there is not enough space in the UBI partition to create **''rootfs_data''** of the specified size, and the contents of **''rootfs_data''** will then be lost. (The **''rootfs_data_max''** variable should have better been named **''rootfs_data_size''**.) The user must make sure that enough free space exists in UBI to accommodate growth of future OpenWrt images and/or custom OpenWrt images with more packages. 
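The three equivalent spellings of the **''rootfs_data_max''** value quoted above can be checked with shell arithmetic. This sketch assumes that the evaluation behaves like ordinary shell arithmetic expansion, which is an assumption made for illustration; ''to_bytes'' is a hypothetical helper, not part of sysupgrade.

<code bash>
#!/bin/sh
# Evaluate a size expression the way shell arithmetic expansion would.
to_bytes() {
    echo $(( $1 ))
}

for v in "128*1024*1024" "0x8000000" "134217728"; do
    echo "$v -> $(to_bytes "$v")"
done
</code>

All three lines end in the same number of bytes, 134217728, which is 128 MiB.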
 + 
 +==== Example: Creating a UBI volume for persistent storage across sysupgrades ==== 
 + 
 +In an Askey RT4230W REV6 router with 512 MiB flash, the **''rootfs_data''** volume is normally sized at around 370 MiB (the remaining flash space being used for bootloaders, SoC-specific partitions, kernel, rootfs, and recovery). You can check this using: 
 + 
 +<code> 
 +root@router:~# ubinfo -d 0 -N rootfs_data 
 +Volume ID:   2 (on ubi0) 
 +Type:        dynamic 
 +Alignment:   1 
 +Size:        3086 LEBs (391847936 bytes, 373.6 MiB) 
 +State:       OK 
 +Name:        rootfs_data 
 +Character device major/minor: 246:3 
 +</code> 
 + 
 +Given that this volume is routinely wiped by sysupgrade, storing any remotely valuable files here would be ill-advised. For this router you might choose to limit **''rootfs_data''** to a generous 128 MiB, and create a new 192 MiB UBIFS volume for persistent storage, while still reserving 50+ MiB as free space to accommodate future growth of OpenWrt images. Let's do just that and name the new volume **''extra''**. 
 + 
 +First you need to limit **''rootfs_data''** to 128 MiB for all following sysupgrades: 
 + 
 +<code> 
 +root@router:~# fw_setenv rootfs_data_max 0x8000000 
 +</code> 
 + 
 +Next do a sysupgrade (even if no upgrade is needed) to resize **''rootfs_data''**. After that, verify its new size: 
 + 
 +<code> 
 +root@router:~# ubinfo -d 0 -N rootfs_data 
 +Volume ID:   2 (on ubi0) 
 +Type:        dynamic 
 +Alignment:   1 
 +Size:        1058 LEBs (134340608 bytes, 128.1 MiB) 
 +State:       OK 
 +Name:        rootfs_data 
 +Character device major/minor: 246:3 
 +</code> 
 + 
 +You just freed 240+ MiB in the UBI partition. Next, you could manually create, format, and mount a new UBIFS volume. But OpenWrt has a tool to automate this, so let's use it. 
 + 
 +Connect the router to the internet if necessary, and use LuCI to install the ''**uvol**'' package (**System > Software**). You might also want to install your favorite text editor now (''**nano-full**'' is a good option). 
 + 
 +Now check the installation (sizes are in bytes): 
 + 
 +<code> 
 +root@router:~# uvol list 
 +root@router:~# uvol total 
 +422576128 
 +root@router:~# uvol free 
 +253317120 
 +</code> 
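
The free-space figure above can be used to sanity-check the plan before creating anything. This is a minimal sketch that assumes the 192 MiB target from this example; the helper name ''remaining_bytes'' is hypothetical and not part of ''uvol''.

```shell
# Hypothetical helper: bytes left after carving a volume out of the free space.
remaining_bytes() {
    echo $(( $1 - $2 ))
}

free_bytes=253317120                # value printed by "uvol free" above
want_bytes=$(( 192*1024*1024 ))     # requested size of the new volume
left=$(remaining_bytes "$free_bytes" "$want_bytes")
echo "$left bytes ($(( left / 1024 / 1024 )) MiB) would remain free"
```

On a live system the hard-coded value could be replaced with ''$(uvol free)''. The real remainder will be slightly lower, because the volume size is rounded up to a whole number of LEBs.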
 + 
 +Create and enable the ''**extra**'' volume using ''**uvol**'': 
 + 
 +<code> 
 +root@router:~# uvol create extra $(( 192*1024*1024 )) rw 
 +Volume ID 4, size 1586 LEBs (201383936 bytes, 192.0 MiB), LEB size 126976 bytes (124.0 KiB), dynamic, name "uvol-wp-extra", alignment 1 
 +root@router:~# uvol up extra 
 +root@router:~# uvol list 
 +extra rw 201383936 
 +root@router:~# mount | grep extra 
 +/dev/ubi0_4 on /tmp/run/uvol/extra type ubifs (rw,relatime,assert=read-only,ubi=0,vol=4) 
 +</code> 
 + 
 +You do not like the default mount path (''**/tmp/run/uvol/extra**''), so you change it to ''**/extra**'' using your text editor: 
 + 
 +<code> 
 +root@router:~# nano /etc/config/fstab  
 +</code> 
 + 
 +Finally reboot and check that your new volume is mounted where you want it: 
 + 
 +<code> 
 +root@router:~# mount | grep extra 
 +/dev/ubi0_4 on /extra type ubifs (rw,relatime,assert=read-only,ubi=0,vol=4) 
 +</code> 
 + 
 +===== MTD (Memory Technology Device) and MTDSPLIT =====
 The Linux kernel treats "raw/host-managed" flash memory (NOR and NAND alike) as an MTD (Memory Technology Device). An MTD is different to a [[wp>block device]] or a [[wp>character device]].
  
-  * e.g. a [[wp>hard disk drive]] with platters spinning at 7200rpm is treated as a block device:
 +On a common block device such as a hard drive, the storage space is split up into "blocks", which are also named "sectors", of a size of 512 Bytes or 4096 Bytes. Blocks do not get corrupted during common operation, but only exceptionally. In the very rare case this happens, the LBA hard disk controller takes care that accesses to such a bad block are redirected to a replacement block. Block devices support 2 main operations - read a whole block and write a whole block. When a block device is partitioned, the information is stored in the [[wp>Master boot record|MBR]] or the [[wp>GUID Partition Table|GPT]].
  
-The storage space of a block device is split up into "blocks", which are also named "sectors", of a size of 512 Bytes or 4096 Bytes. Blocks do not get corrupted during common operation, but only exceptionally. In the very rare case this happens, the LBA hard disk controller takes care, that accesses to such a bad block are redirected to a replacement block. Block devices support 2 main operations - read a whole block and write a whole block. When a block device is partitioned, the information is stored in the [[wp>Master boot record|MBR]] or the [[wp>GUID Partition Table|GPT]].
 +Flash memory using MTD is different from this.
  
-  * e.g. flash memory is treated as MTD (Memory Technology Device):
 +The storage space of an MTD is split up into "erase-blocks", of a size of e.g. 64 KiB, 128 KiB or much more, which themselves are split up into "blocks", which are more correctly named "pages", of smaller sizes.
  
-The storage space of a MTD is split up into "erase-blocks", of a size of 64 KiB or much more, which themselves are split up into "blocks", which are more correctly named "pages", of smaller sizes. A single "page" can be written to, but it cannot be overwritten, but instead the entire "erase block" that page is part of, has to be 
-erased before it becomes possible to re-write its "pages". Erase-blocks do become worn out after some number of erase cycles – typically 100K-1G for SLC NAND and NOR flashes, and 1K-10K for MLC NAND flashes. Erase-blocks may become bad (only NAND). In case of "FTL flash", the controller should notice and avoid further access to bad erase-blocks. In case of "raw flash", the operating system should deal with such cases.
 +A single "page" can be written to, but it cannot be overwritten; instead, the entire erase-block that the page is part of has to be erased before its "pages" can be rewritten. Erase-blocks wear out after some number of erase cycles – typically 100K-1G for SLC NAND and NOR flashes, and 1K-10K for MLC NAND flashes. Erase-blocks may become bad (only NAND). In case of "FTL flash", the controller should notice and avoid further access to bad erase-blocks. In case of "raw flash", the operating system should deal with such cases.
  
 MTD devices support 3 main operations - read from some offset within an erase-block, write to some offset within an erase-block, and erase a whole erase-block.
  
-The partitioning is not stored in the MBR/GPT, but it is done in the Linux Kernel (and sometimes independently in the [[docs:techref:bootloader]] as well!).
 +The utility program [[docs:techref:mtd]] can be used to manage MTD devices. 
 + 
 +==== MTD partitions ==== 
 + 
 +The MTD device is often subdivided into logical chunks of memory called partitions. Each partition starts at the beginning of an erase-block and ends at the end of an erase-block. 
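
One way to see this on a running system is to read ''/proc/mtd'', which lists each partition's size and erase-block size in hexadecimal. The sketch below only checks that each size is a whole number of erase-blocks (''/proc/mtd'' does not show start offsets); the function name ''check_mtd_alignment'' is hypothetical.

```shell
# Hypothetical helper: reads /proc/mtd-style lines on stdin and reports whether
# each partition's size is a whole number of erase-blocks.
check_mtd_alignment() {
    while read -r dev size erasesize name; do
        # Skip the "dev: size erasesize name" header line.
        [ "$dev" = "dev:" ] && continue
        if [ $(( 0x$size % 0x$erasesize )) -eq 0 ]; then
            echo "$dev whole erase-blocks"
        else
            echo "$dev NOT a whole number of erase-blocks"
        fi
    done
}

# Only run against the real file where it exists (i.e. on a device with MTD).
if [ -r /proc/mtd ]; then
    check_mtd_alignment < /proc/mtd
fi
```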
 + 
 +The partitioning of MTD devices is not stored in an MBR/GPT; it is done in the Linux kernel, where MTD-specific partition parsers determine the location and size of these partitions. (Sometimes the partitioning is implemented independently in the [[docs:techref:bootloader]] as well!)
  
 The kernel boot process involves discovering the partitions within the NOR flash, which can be done by various target-dependent means:
   * some bootloaders store a partition table at a known location
   * some pass the partition layout via kernel command line
 +  * some pass the partition layout using Device Tree
   * some targets require specifying the kernel command line at compile time (thus overriding the one provided by the bootloader).
  
-Either way, if there is a partition named ''rootfs'' and ''MTD_ROOTFS_ROOT_DEV'' kernel config option is set to ''yes'', this partition is automatically used for the root filesystem.
 +Some, but not all, of these schemes are implemented in the mainline Linux kernel. The standard kernel can usually detect the top-level coarse partitioning scheme, but not the more fine-grained sub-partitions.
  
-After that, if ''MTD_ROOTFS_SPLIT'' is enabled, the kernel adjusts the ''rootfs'' partition size to the minimum required by the particular SquashFS image and automatically adds ''rootfs_data'' to the list of the available mtd partitions setting its beginning to the first appropriate address after the SquashFS end and size to the remainder of the original ''rootfs'' partition. The resulting list is stored in RAM only, so no partition table of any kind gets actually modified.
 +==== MTDSPLIT ====
  
-For more details please refer to the actual patch at
-[[https://dev.openwrt.org/browser/trunk/target/linux/generic/patches-2.6.37/065-rootfs_split.patch]]
 +In order to deal with some of the custom flash partitioning schemes directly in the kernel, OpenWrt has developed ''mtdsplit'', a set of patches currently maintained separately from the mainline kernel but used in OpenWrt to parse different flash layouts and split them into further "logical" partitions. 
 + 
 +This is done recursively, so that a further split of a new "child" partition may be attempted. Whether an attempt is made to split a partition depends on the partition name. 
 + 
 +  * ''rootfs'' is hardcoded to be split. 
 +  * ''CONFIG_MTD_SPLIT_FIRMWARE'' can be used to control whether an attempt is made to split the ''firmware'' partition. The most common layout here is kernel, followed by padding, followed by the SquashFS root filesystem, followed by padding, followed by free space. 
 + 
 +During splitting, the kernel walks the erase blocks and detects magic bytes via parsers. Each partition type (usually determined from name) has its own list of parsers. 
 + 
 +New partitions usually start at some offset from the start of the original partition. The size and number of the "children" depend on what is detected. For example, if a SquashFS image is found then the ''rootfs'' partition is added. For a SquashFS image the splitter also automatically adds ''rootfs_data'' to the list of available MTD partitions, setting this partition's beginning to the first appropriate address after the SquashFS end and its size to the remainder of the ''rootfs'' partition. 
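
The "first appropriate address" can be pictured as rounding the end of the SquashFS image up to the next boundary. The sketch below assumes that boundary is the erase-block size, which is an assumption made for illustration rather than a statement about what every mtdsplit parser does; the helper name ''align_up'' and the sample numbers are made up.

```shell
# Hypothetical helper: round an offset up to the next multiple of a block size.
align_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

squashfs_end=$(( 0x2A1234 ))    # made-up end offset of the SquashFS image
erase_size=$(( 64*1024 ))       # made-up erase-block size of 64 KiB
start=$(align_up "$squashfs_end" "$erase_size")
printf 'rootfs_data would start at 0x%X\n' "$start"
```

An offset that already sits on a boundary is returned unchanged, so no erase-block is wasted in that case.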
 + 
 +The resulting list of split-off partitions is stored in RAM only, so no partition table of any kind actually gets modified. This also applies to the detection and creation of the ''ubi'' partition and others, as well as to vendor-specific layouts. 
 + 
 +For more details please refer to the mtdsplit code at
 +[[https://github.com/openwrt/openwrt/tree/master/target/linux/generic/files/drivers/mtd/mtdsplit]]
  
 For overlaying a special ''mini_fo'' filesystem is used, the ''README'' is available from the sources at
 [[https://dev.openwrt.org/browser/trunk/target/linux/generic/patches-2.6.37/209-mini_fo.patch]]
- 
-The utility program [[docs:techref:mtd]] can be used. 
  
 ===== UBI (Unsorted Block Images) =====
Line 197: Line 352:
 The partition or partitions containing so-called //Special Configuration Data// differ very much from each other. Example: The ''ART''-partition you will meet in conjunction with Atheros-Wireless and U-Boot contains only data regarding the wireless driver, while the ''NVRAM''-partition of Broadcom devices is used for much more than only that. There are special utilities to access and modify special configuration partitions. For Broadcom devices this is the ''nvram'' utility. To find out what is written in ''NVRAM'' you can run ''nvram show''.
  
-Note that clearing these special configuration data partitions like ''ART, NVRAM'' and ''FIS'' does not clear much of OpenWrt's configuration, unlike other router software which keep configuration data solely in e.g. ''NVRAM''. Instead, as a consequence of using the overlay_fs filesystem configuration with JFFS2 flash partition, the whole file system is writable and allows the flexibility of extending your OpenWrt installation in any way you want. OpenWrt's main configuration is therefore just kept in the root file system, using [[inbox:uci|UCI]] configuration files. For convenience, many other packages are made UCI compatible. If you want to reset your complete installation you should use OpenWrt's built-in functionality such as [[docs:user-guide:generic.sysupgrade|sysupgrade]] to restore settings, by clearing the JFFS2 partition. Or, if you cannot boot normally, you can wipe or change the JFFS2 partition using OpenWrt's [[docs:user-guide:troubleshooting:failsafe_and_factory_reset|failsafe mode]] (look in your device's dedicated page for information how to boot into failsafe). 
 +Note that clearing these special configuration data partitions like ''ART, NVRAM'' and ''FIS'' does not clear much of OpenWrt's configuration, unlike other router software which keep configuration data solely in e.g. ''NVRAM''. Instead, as a consequence of using the overlay_fs filesystem configuration with JFFS2 flash partition, the whole file system is writable and allows the flexibility of extending your OpenWrt installation in any way you want. OpenWrt's main configuration is therefore just kept in the root file system, using [[docs:guide-user:base-system:uci|UCI]] configuration files. For convenience, many other packages are made UCI compatible. If you want to reset your complete installation you should use OpenWrt's built-in functionality such as [[docs:guide-user:installation:generic.sysupgrade|sysupgrade]] to restore settings, by clearing the JFFS2 partition. 
 +Or, if you cannot boot normally, you can wipe or change the JFFS2 partition using OpenWrt's [[docs:guide-user:troubleshooting:failsafe_and_factory_reset|failsafe mode]] (look in your device's dedicated page for information how to boot into failsafe). 
  
  
Line 246: Line 401:
  
  
-The difference is, that OpenWrt-Image-File are not created that way ;-) They are being generated with the **Image Generator** (former called Image Builder). You can read about the
-  * [[docs:techref:flash.layout]] 
-  * [[docs:techref:header]] 
-  * [[doc/howto/obtain.firmware.generate]] If you want to read about the **Image Generator**, go back to [[doc:howto:obtain.firmware]] and choose the second way.
 +The difference is that OpenWrt image files are not created that way ;-) They are generated with the [[docs:guide-user:additional-software:imagebuilder|Image Generator]] (formerly called Image Builder). Other resources:
 +  * [[docs:techref:headers]] 
 +  * back to [[:downloads]]
   * About [[http://skaya.enix.org/wiki/FirmwareFormat|Broadcom Firmware Format]]
  
  • Last modified: 2018/02/20 20:11