Solr has a handy ability to ingest local CSV files. The neatest part is that you can populate multi-valued fields by sub-parsing an individual field. For example, the following ingests /tmp/input.csv and splits SomeField into multiple values on semicolons:
curl http://localhost:80/solr/my_core/update\?stream.file\=/tmp/input.csv\&stream.contentType\=text/csv\;charset\=utf-8\&commit\=true\&f.SomeField.split\=true\&f.SomeField.separator\=%3B
When running an ingest, I got the following response, which was confusing since myField was, in fact, defined in my schema:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">400</int>
    <int name="QTime">1</int></lst>
  <lst name="error">
    <str name="msg">undefined field: "myField"</str>
    <int name="code">400</int>
  </lst>
</response>
A peek in the log provided a clue (note the leading question mark):
SEVERE: org.apache.solr.common.SolrException: undefined field: "?myField"
Examining a hex dump of the CSV file revealed that it started with a UTF-8 Byte Order Mark:
xxd /tmp/input.csv | head
0000000: efbb bf...
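If you want to check for this in a script rather than by eye, something like the following might do. It is only a sketch: has_bom is a made-up helper name, and it assumes head -c, od and tr are available, as they are on most Linux and macOS systems. It compares the first three bytes of the file against EF BB BF, the UTF-8 encoding of the Byte Order Mark.

```shell
# has_bom FILE: exit status 0 if FILE begins with the UTF-8 BOM (bytes EF BB BF).
# The function name is illustrative; it is not part of Solr or any standard tool.
has_bom() {
    # Read the first three bytes and render them as lowercase hex without whitespace.
    first_bytes=$(head -c 3 "$1" | od -An -tx1 | tr -d '[:space:]')
    [ "$first_bytes" = "efbbbf" ]
}
```

With that defined, has_bom /tmp/input.csv && echo "BOM found" would flag the file before it ever reaches Solr.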
One way to strip the BOM is with Bomstrip, a collection of BOM-stripping implementations in various languages, including a Perl one-liner. Alternatively, just open the file in Vim, do :set nobomb and save. Done!
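If neither of those is to hand, a plain shell approach is also possible. The sketch below is not Bomstrip's implementation; strip_bom is a hypothetical helper, and it assumes a tail that supports the POSIX -c +N form plus head -c, od and tr. It rewrites the file only when the first three bytes really are the BOM, so running it on a clean file should leave it unchanged.

```shell
# strip_bom FILE: remove a leading UTF-8 BOM from FILE in place, if one is present.
# The function name and the ".nobom" temporary suffix are illustrative assumptions.
strip_bom() {
    # Render the first three bytes as lowercase hex without whitespace.
    first_bytes=$(head -c 3 "$1" | od -An -tx1 | tr -d '[:space:]')
    if [ "$first_bytes" = "efbbbf" ]; then
        # tail -c +4 starts output at byte 4, skipping the three BOM bytes.
        tail -c +4 "$1" > "$1.nobom" && mv "$1.nobom" "$1"
    fi
}
```

Calling strip_bom /tmp/input.csv before the curl ingest should then give Solr a header row that starts with the field name itself.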