Slicing up XML files is best done with an XML parser. (Regular expressions, csplit, etc. are too easily confused by arbitrary strings in CDATA sections.) xml_split, which is installed along with the XML::Twig module from CPAN, mostly does the trick. Given a file like:
<?xml version="1.0" encoding="UTF-8"?>
<foo:Root xmlns:foo="http://www.foo.bar/fnarf/foo">
<foo:child>
...
</foo:child>
<foo:child>
...
</foo:child>
</foo:Root>
…xml_split can create many files, each containing:
<?xml version="1.0" encoding="UTF-8"?>
<foo:child>
...
</foo:child>
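For reference, a minimal sketch of the split itself. It assumes xml_split is on the PATH and uses a hypothetical input.xml holding the sample document; -l 1 asks for a cut at level 1, that is, one output file per child of the root.

```shell
#!/bin/sh
# Sketch: split a sample document into one file per child of the root.
# Assumptions: xml_split (from XML::Twig) is installed; input.xml is a made-up name.
work=$(mktemp -d)
cd "$work" || exit 1

cat > input.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<foo:Root xmlns:foo="http://www.foo.bar/fnarf/foo">
<foo:child>
...
</foo:child>
<foo:child>
...
</foo:child>
</foo:Root>
EOF

if command -v xml_split >/dev/null 2>&1; then
    # -l 1 cuts at level 1: each child of the root goes to its own file
    xml_split -l 1 input.xml
    ls
else
    echo "xml_split not found; install XML::Twig first" >&2
fi
```

According to the XML::Twig documentation, the pieces are named after the input by default (input-01.xml, input-02.xml, and so on), with input-00.xml holding what is left of the main document.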
However, this loses the namespace declaration and the enclosing root element. Luckily, a little sed magic can bring those back:
find . -name '*.xml' | xargs -n1 sed -e '1 a\
<foo:Root xmlns:foo="http://www.foo.bar/fnarf/foo">
' -e '$ a\
</foo:Root>
' -i ''
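find lists all the files, xargs invokes sed on them one by one (-n1), and sed appends the opening tag with the namespace declaration after the first line (1 a) and the closing tag after the last line ($ a). The trailing -i '' makes sed edit each file in place without keeping a backup; that spelling is for BSD/macOS sed, while GNU sed expects a bare -i. Now each file looks like this: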
<?xml version="1.0" encoding="UTF-8"?>
<foo:Root xmlns:foo="http://www.foo.bar/fnarf/foo">
<foo:child>
...
</foo:child>
</foo:Root>
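If the -i difference between sed flavours gets in the way, the same wrapping can be sketched without in-place editing. This is an alternative under stated assumptions, not the command used above: wrap_in_root is a hypothetical helper name, and it assumes line 1 of each file is the XML declaration.

```shell
#!/bin/sh
# Hypothetical helper: put the root element back around one split-out file.
# Assumes the first line of the file is the XML declaration.
wrap_in_root() {
    file=$1
    tmp=$(mktemp) || return 1
    {
        sed -n '1p' "$file"     # keep the XML declaration as line 1
        printf '%s\n' '<foo:Root xmlns:foo="http://www.foo.bar/fnarf/foo">'
        sed '1d' "$file"        # everything after the declaration
        printf '%s\n' '</foo:Root>'
    } > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo on a throwaway file (part-01.xml is a made-up name)
demo_dir=$(mktemp -d)
printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>' \
    '<foo:child>' '...' '</foo:child>' > "$demo_dir/part-01.xml"
wrap_in_root "$demo_dir/part-01.xml"
cat "$demo_dir/part-01.xml"
```

A loop such as for f in part-*.xml; do wrap_in_root "$f"; done (the part-* pattern is hypothetical) applies it to every piece. A pattern narrower than *.xml also avoids wrapping the original, unsplit file a second time.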