Parsing XML in Bash

Author: JIYIK Last Updated: 2025/03/21

XML is a popular markup language that is widely used to structure and transmit data, and most developers have to work with it sooner or later.

This article will show how we can parse XML with Bash.

We will discuss two command-line tools here. The first is xmllint and the second is XMLStarlet.

Before you can use them, you need to install them.
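Before installing anything, it can help to check which of the two tools are already present. The following is a minimal sketch; the helper name missing_tools is hypothetical and not part of either tool.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print every given command name that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools xmllint xmlstarlet)
if [ -z "$missing" ]; then
    echo "xmllint and xmlstarlet are both installed"
else
    # Left unquoted on purpose so the names print on one line.
    echo "Not installed yet:" $missing
fi
```

If a tool is reported as missing, install it with the commands shown in its section.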


Parsing XML in Bash using xmllint

xmllint is a command-line XML parser that ships with the libxml2 library, and it is one of the most widely available tools for parsing XML files. On many systems, you must install it before you can use it.

On Debian or Ubuntu, you can install it with the following commands.

sudo apt-get update -qq
sudo apt-get install -y libxml2-utils

We use apt-get to install the libxml2-utils package, which provides the xmllint command.

If you have an XML file named MyXML.xml, you can parse it and print its contents using the following command.

xmllint MyXML.xml

After executing the above command, you will get the output as follows.

<?xml version="1.0"?>
<specification>
        <type>Laptop</type>
        <model>Macbook</model>
        <screenSizeInch>14</screenSizeInch>
</specification>
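To try this out without an existing file, the sample document can be created from the shell first. The sketch below assumes the file contents match the output shown above. It relies on xmllint returning a non-zero exit status when a document is not well-formed; the variable name parse_status is only illustrative.

```shell
#!/usr/bin/env bash

# Create the sample document (contents taken from the output shown above).
cat > MyXML.xml <<'EOF'
<?xml version="1.0"?>
<specification>
        <type>Laptop</type>
        <model>Macbook</model>
        <screenSizeInch>14</screenSizeInch>
</specification>
EOF

# Use the exit status of xmllint to decide whether the file parsed cleanly.
# --noout suppresses the document itself, so only errors are printed.
if command -v xmllint >/dev/null 2>&1; then
    if xmllint --noout MyXML.xml; then
        parse_status="well-formed"
    else
        parse_status="not well-formed"
    fi
else
    parse_status="unknown (xmllint is not installed)"
fi

echo "MyXML.xml: $parse_status"
```

Checking the exit status this way is convenient in scripts, because the parsed document does not have to be inspected by hand.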

xmllint accepts a number of options, or flags. The available options are listed below.

  1. --auto - This flag is used to generate a small document for testing purposes.
  2. --catalogs - This flag is used to use the catalogs from SGML_CATALOG_FILES. Otherwise, the default is to use /etc/xml/catalog.
  3. --chkregister - This flag is used to turn on node registration.
  4. --compress - This flag turns on gzip compression of the output.
  5. --copy - This flag is used to test the internal copy implementation.
  6. --c14n - This flag is used to serialize the results of the parsing to stdout using W3C XML Canonicalization (C14N). It also preserves comments in the results.
  7. --dtdvalid URL - This flag is used to validate using the DTD specified by the URL.
  8. --dtdvalidfpi FPI - This flag is used to specify the DTD using the public identifier FPI for validation; note that this flag requires a catalog that exports that public identifier to work.
  9. --debug - This flag is used to parse the file. It also outputs an annotated tree which is an in-memory version of the document.
  10. --debugent - This flag is used to debug the entities defined in the document.
  11. --dropdtd - This flag is used to remove the DTD from the output.
  12. --dtdattr - This flag will fetch an external DTD. It also populates the tree with inherited attributes.
  13. --encode ENCODING - This flag will provide output in the given encoding.
  14. --format - This flag will reformat and re-indent the output.
  15. --help - This flag will print out a summary of xmllint usage.
  16. --html - This flag is used to use the HTML parser.
  17. --htmlout - This flag will display the results as an HTML file. It will output the necessary HTML tags around the result tree output so that the results can be displayed/viewed in a browser.
  18. --insert - This flag is used to test for valid insertions.
  19. --loaddtd - This flag is used to fetch an external DTD.
  20. --load-trace - This flag will display all documents loaded as they are processed to stderr.
  21. --maxmem NNBYTES - This flag is used to test parser memory support. Here, NNBYTES is the maximum number of bytes that the library can allocate.
  22. --memory - This flag is used to parse from memory.
  23. --noblanks - This flag will remove ignorable whitespace.
  24. --nocatalogs - This flag specifies not to use any catalogs.
  25. --nocdata - This flag will replace CDATA sections by equivalent text nodes.
  26. --noent - This flag will replace entity references with entity values.
  27. --nonet - This flag specifies not to use the Internet to retrieve the DTD or entities.
  28. --noout - This flag will suppress the output. By default, xmllint will display the output of the result tree.
  29. --nowarning - This flag specifies that no warnings should be emitted from the validator and/or parser.
  30. --nowrap - This flag specifies that the HTML document wrapper should not be output.
  31. --noxincludenode - This flag is used to perform XInclude processing, but specifies that the XInclude start and end nodes are not generated.
  32. --nsclean - This flag is used to remove redundant namespace declarations.
  33. --output FILE - This flag defines the file path where xmllint saves the parsing results.
  34. --path "PATH(S)" - This flag is used to load DTDs or entities using the colon-separated or space-separated list of file system paths specified by PATHS. Here, the space-separated list is enclosed in quotes.
  35. --pattern PATTERNVALUE - This flag is used to exercise the pattern recognition engine that can be used with the reader interface. It is also used for debugging.
  36. --postvalid - This flag is used to validate after parsing is complete.
  37. --push - This flag enables push mode.
  38. --recover - This flag is used to output any parsable portions of the invalid document.
  39. --relaxng SCHEMA - This flag will use a RelaxNG file named SCHEMA for validation.
  40. --repeat - This flag is used to repeat 100 times for timing or profiling purposes.
  41. --schema SCHEMA - This flag will use the W3C XML Schema file named SCHEMA for validation.
  42. --shell - Run the navigation shell.
  43. --stream - This flag is for the streaming API.
  44. --testIO - This flag will test user input/output support.
  45. --timing - This flag will output information about how long xmllint takes to perform various steps.
  46. --valid - This flag will check the validity of the document.
  47. --version - This flag will display the version of the library.
  48. --walker - This flag will test the walker module.
  49. --xinclude - This flag will perform XInclude processing.
  50. --xmlout - This flag is mainly used in conjunction with --html. It will save the document using the XML serializer. It is mainly used for conversion from HTML to XHTML.
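A few of these flags can be combined in a short script. The sketch below is only an illustration: the file names compact.xml and pretty.xml are made up for this example, and --xpath is an additional xmllint option that is not in the list above.

```shell
#!/usr/bin/env bash

# Write the sample document on a single line so that --format has work to do.
printf '%s\n' '<specification><type>Laptop</type><model>Macbook</model><screenSizeInch>14</screenSizeInch></specification>' > compact.xml

if command -v xmllint >/dev/null 2>&1; then
    # --format re-indents the document; --output saves the result to a file
    # instead of printing it to stdout.
    xmllint --format --output pretty.xml compact.xml

    # --noout suppresses the result tree, so only parse errors would be shown.
    xmllint --noout pretty.xml && echo "pretty.xml is well-formed"

    # --xpath (not in the list above) evaluates an XPath expression;
    # string() returns the text content of the first matching node.
    model=$(xmllint --xpath 'string(/specification/model)' pretty.xml)
    echo "model: $model"
else
    echo "xmllint is not installed" >&2
fi
```

Capturing the value in a shell variable, as done with model here, is usually the step that makes the parsed data usable in the rest of a Bash script.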

Parsing XML in Bash using XMLStarlet

Another popular tool for parsing XML documents is XMLStarlet. Its main command is xmlstarlet.

On Fedora and other dnf-based distributions, you can install it with the following command. On Debian or Ubuntu, the package is also named xmlstarlet and can be installed with apt-get instead.

sudo dnf install xmlstarlet

It provides subcommands that make it easier to validate, transform, or query XML files. The format subcommand is one of the simplest: it parses an XML file and prints it back re-indented.

xmlstarlet format MyXML.xml

After executing the above command, you will see the contents of the XML file as the output below.

<?xml version="1.0"?>
<specification>
        <type>Laptop</type>
        <model>Macbook</model>
        <screenSizeInch>14</screenSizeInch>
</specification>
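Beyond formatting, XMLStarlet can validate, query, and edit a document. The following sketch assumes the sample file shown above and uses the val, sel, and ed subcommands; the replacement value ThinkPad is made up for illustration.

```shell
#!/usr/bin/env bash

# Recreate the sample document so the example is self-contained.
cat > MyXML.xml <<'EOF'
<?xml version="1.0"?>
<specification>
        <type>Laptop</type>
        <model>Macbook</model>
        <screenSizeInch>14</screenSizeInch>
</specification>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
    # val checks well-formedness; -q reports the result through the exit status only.
    xmlstarlet val -q MyXML.xml && echo "MyXML.xml is well-formed"

    # sel -t -v prints the value of an XPath expression.
    model=$(xmlstarlet sel -t -v '/specification/model' MyXML.xml)
    screen=$(xmlstarlet sel -t -v '/specification/screenSizeInch' MyXML.xml)
    echo "${model} has a ${screen}-inch screen"

    # ed -u ... -v updates the matching node and prints the edited document
    # to stdout; the original file is left unchanged.
    updated=$(xmlstarlet ed -u '/specification/model' -v 'ThinkPad' MyXML.xml)
    echo "$updated"
else
    echo "xmlstarlet is not installed" >&2
fi
```

Because ed writes to stdout by default, the edited document can be inspected or redirected to a new file before anything is overwritten.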

All the commands used in this article are run from Bash. They were written for a Linux shell environment.

