Solr has a handy ability to ingest local CSV files. The neatest part is that you can populate multi-valued fields by sub-parsing an individual field. For example, the following ingests /tmp/input.csv and splits SomeField into multiple values on semicolons:
curl http://localhost:80/solr/my_core/update\?stream.file\=/tmp/input.csv\&stream.contentType\=text/csv\;charset\=utf-8\&commit\=true\&f.SomeField.split\=true\&f.SomeField.separator\=%3B
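Hand-escaping that query string is fiddly, and a typo in a field name is easy to make, so it can help to build the URL programmatically. Here is a minimal Python sketch, assuming the same host, core, and field names as the curl command; build_csv_update_url is a hypothetical helper of my own, not part of Solr or any client library.

```python
from urllib.parse import urlencode


def build_csv_update_url(base_url, csv_path, multi_valued_fields):
    """Build a Solr CSV update URL.

    multi_valued_fields maps a field name to the separator used to
    split that field into multiple values (hypothetical helper).
    """
    params = {
        "stream.file": csv_path,
        "stream.contentType": "text/csv;charset=utf-8",
        "commit": "true",
    }
    for field, separator in multi_valued_fields.items():
        # The field name is written once, so the two per-field
        # parameters cannot drift apart.
        params[f"f.{field}.split"] = "true"
        params[f"f.{field}.separator"] = separator
    # urlencode percent-encodes reserved characters such as ';' and '/'.
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_csv_update_url(
        "http://localhost:80/solr/my_core/update",
        "/tmp/input.csv",
        {"SomeField": ";"},
    ))
```

The printed URL can be passed to curl in single quotes, which removes the need for the backslash escapes.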
When running an ingest, I got the following response, which was confusing since myField was, in fact, defined in my schema:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">400</int>
  <int name="QTime">1</int>
</lst>
<lst name="error">
  <str name="msg">undefined field: "myField"</str>
  <int name="code">400</int>
</lst>
</response>
A peek in the log provided a clue (note the leading question mark):
SEVERE: org.apache.solr.common.SolrException: undefined field: "?myField"
Examining a hex dump of the CSV file revealed that it started with a UTF-8 Byte Order Mark:
xxd /tmp/input.csv | head
0000000: efbb bf...
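The same check can be scripted. The sketch below is a rough Python equivalent of eyeballing the hex dump, and it also shows why the first column name stops matching the schema: decoded as plain UTF-8, the BOM stays attached to the first header cell. The function names and the sample data are my own, made up for illustration.

```python
import codecs
import csv
import io


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def file_has_utf8_bom(path: str) -> bool:
    """Check only the first three bytes of a file on disk."""
    with open(path, "rb") as f:
        return has_utf8_bom(f.read(len(codecs.BOM_UTF8)))


def first_header_name(data: bytes) -> str:
    """Return the first CSV header cell, decoding as plain UTF-8."""
    text = data.decode("utf-8")
    return next(csv.reader(io.StringIO(text)))[0]


if __name__ == "__main__":
    # Made-up sample: a BOM followed by a two-column CSV.
    sample = codecs.BOM_UTF8 + b"myField,otherField\nfoo,bar\n"
    print(has_utf8_bom(sample))
    # repr() makes the otherwise invisible U+FEFF character visible.
    print(repr(first_header_name(sample)))
```

The header comes back with U+FEFF glued to the front of myField, which is presumably what the log rendered as a leading question mark.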
One way to strip the BOM is with Bomstrip, a collection of BOM-stripping implementations in various languages, including a Perl one-liner. Alternatively, just open the file in Vim, do :set nobomb and save. Done!
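If neither tool is to hand, a few lines of Python do the same job. This is a minimal sketch of my own, not the Bomstrip implementation. It reads the whole file into memory, so it assumes the CSV is of modest size, and strip_bom_in_place is a hypothetical helper name.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_place(path: str) -> bool:
    """Rewrite the file without its BOM. Returns True if it changed."""
    with open(path, "rb") as f:
        data = f.read()
    stripped = strip_utf8_bom(data)
    if len(stripped) == len(data):
        return False  # no BOM found, leave the file untouched
    with open(path, "wb") as f:
        f.write(stripped)
    return True


if __name__ == "__main__":
    # Path taken from the ingest example; adjust as needed.
    print(strip_bom_in_place("/tmp/input.csv"))
```

Working on raw bytes rather than decoded text leaves the rest of the file byte-for-byte unchanged.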