List Indexes
GET /indexes
List indexes
Example
Marqo Open Source
Marqo Cloud
cURL
Python
curl http://localhost:8882/indexes
Response: 200 OK
{
"results": [
{
"indexName": "Book Collection"
},
{
"indexName": "Animal facts"
}
]
}
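The response body above can be consumed with any HTTP client. As an illustrative sketch (using only the Python standard library, with the JSON shown above inlined rather than fetched from a live server):

```python
import json

# The JSON body returned by GET /indexes, as shown above.
raw = '''
{
  "results": [
    {"indexName": "Book Collection"},
    {"indexName": "Animal facts"}
  ]
}
'''

def index_names(body):
    """Extract the index names from a list-indexes response body."""
    return [entry["indexName"] for entry in json.loads(body)["results"]]

print(index_names(raw))  # ['Book Collection', 'Animal facts']
```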
---
Delete index
Delete an index.
Index creation and deletion cannot be called concurrently. If you try to delete an index while another index creation or deletion is in progress, the request will fail and you will receive a 409 OperationConflictError.
Note: This operation cannot be undone, and the deleted index cannot be recovered.
DELETE /indexes/{index_name}
Example
Marqo Open Source
Marqo Cloud
cURL
Python
curl -XDELETE http://localhost:8882/indexes/my-first-index
Response: 200 OK
{"acknowledged": true}
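Because creation and deletion cannot run concurrently, a caller may want to retry on a 409. A minimal retry sketch; the send_delete callable here is a hypothetical stand-in for whatever HTTP client issues the DELETE request:

```python
import time

def delete_with_retry(send_delete, attempts=3, backoff_s=1.0):
    """Retry an index deletion while a concurrent create/delete is in progress.

    `send_delete` is any callable returning an HTTP status code; 409 means
    an OperationConflictError (another create/delete is ongoing).
    """
    for attempt in range(attempts):
        status = send_delete()
        if status != 409:
            return status
        time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    return 409

# Simulated server: busy for the first two calls, then succeeds.
responses = iter([409, 409, 200])
print(delete_with_retry(lambda: next(responses), backoff_s=0.0))  # 200
```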
---
Modify index
In Marqo Cloud you can modify the number of inference nodes (numberOfInferences), the type of inference node (inferenceType), the number of storage shards (numberOfShards), and the number of replicas (numberOfReplicas). Support for updating the storage class (storageClass) will be added soon. These body parameters are specific to Marqo Cloud, so this page applies to Marqo Cloud only.
If you're looking to modify documents in either Marqo Open Source or Marqo Cloud, please see Update Documents.
Marqo Cloud
You can modify the settings of an existing index, such as the number of inference nodes or the type of inference node.
PUT https://api.marqo.ai/api/v2/indexes/{index_name}
Example
cURL
curl -XPUT 'https://api.marqo.ai/api/v2/indexes/my-first-index' \
-H 'x-api-key: XXXXXXXXXXXXXXX' \
-H 'Content-type:application/json' -d '
{
"numberOfInferences": 1,
"inferenceType": "marqo.CPU.large",
"numberOfShards": 2,
"numberOfReplicas": 1,
"storageClass": "marqo.basic"
}'
Response: 200 OK
{"acknowledged":true}
Path parameters
Name Type Description
index_name String name of the index
Body Parameters
The settings for the index. The settings are represented as a nested JSON object.
Name Type Default value Description
inferenceType String marqo.CPU.small Type of inference for the index. Options are "marqo.CPU.small" (deprecated), "marqo.CPU.large", "marqo.GPU".
numberOfInferences Integer 1 Defines the number of inference nodes for the index. The minimum value is 0, and the maximum value is 5 by default, but this is dependent on your account limits.
numberOfShards Integer 1 Defines the number of shards for the index. The minimum value is equal to the current number of shards for the index, and the maximum value is 5 by default, but this is dependent on your account limits.
numberOfReplicas Integer 1 Defines the number of replicas for the index. The minimum value is equal to the current number of replicas for the index, and the maximum value is 1 by default, but this is dependent on your account limits.
storageClass String marqo.basic Defines the storage class for the index. Permissible values are marqo.basic, marqo.balanced, and marqo.performance. The value must be equal to the current storage class for the index; support for changing this value is coming soon.
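The constraints in the table above can be checked client-side before issuing the PUT. A sketch; the numeric bounds below are the documented defaults, and your account limits may differ:

```python
VALID_INFERENCE_TYPES = {"marqo.CPU.small", "marqo.CPU.large", "marqo.GPU"}
VALID_STORAGE_CLASSES = {"marqo.basic", "marqo.balanced", "marqo.performance"}

def validate_modify_payload(payload, current_shards, current_replicas):
    """Return a list of problems with a modify-index body (empty if valid)."""
    problems = []
    if payload.get("inferenceType") not in VALID_INFERENCE_TYPES:
        problems.append("unknown inferenceType")
    if not 0 <= payload.get("numberOfInferences", 1) <= 5:
        problems.append("numberOfInferences out of range (0-5 by default)")
    # Shards and replicas cannot go below their current values.
    if payload.get("numberOfShards", current_shards) < current_shards:
        problems.append("numberOfShards below current shard count")
    if payload.get("numberOfReplicas", current_replicas) < current_replicas:
        problems.append("numberOfReplicas below current replica count")
    if payload.get("storageClass") not in VALID_STORAGE_CLASSES:
        problems.append("unknown storageClass")
    return problems

payload = {
    "numberOfInferences": 1,
    "inferenceType": "marqo.CPU.large",
    "numberOfShards": 2,
    "numberOfReplicas": 1,
    "storageClass": "marqo.basic",
}
print(validate_modify_payload(payload, current_shards=2, current_replicas=1))  # []
```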
---
Create Index
Create index with (optional) settings. This endpoint accepts the application/json content type.
POST /indexes/{index_name}
Index creation and deletion cannot be called concurrently. If you try to create an index while another index creation or deletion is in progress, the request will fail and you will receive a 409 OperationConflictError.
Marqo Cloud creates dedicated infrastructure for each index. Using the create index endpoint, you can specify the type of storage for the index storageClass and the type of inference inferenceType. The number of storage instances is defined by numberOfShards, the number of replicas numberOfReplicas and the number of Marqo inference nodes by numberOfInferences. This is only supported for Marqo Cloud, not Marqo Open Source.
Example
Marqo Open Source
Marqo Cloud
This is an example of creating an index with Marqo Open Source:
cURL
Python
curl -X POST 'http://localhost:8882/indexes/my-first-index' \
-H "Content-Type: application/json" \
-d '{
"model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
"type": "unstructured"
}'
Response: 200 OK
{"acknowledged":true, "index":"my-first-index"}
Path parameters
Name Type Description
index_name String Name of the index
Body Parameters
The settings for the index are represented as a nested JSON object that contains the default settings for the index. The parameters are as follows:
Name Type Default value Description
treatUrlsAndPointersAsImages Boolean False Fetch images from pointers
treatUrlsAndPointersAsMedia* Boolean False Fetch images, videos, and audio from pointers
model String hf/e5-base-v2 The model to use to vectorise doc content in add_documents() calls for the index. To create an index with a Marqtuned model, please specify the model name as marqtune/{model_id}/{released_checkpoint_name}. More details can be found here.
modelProperties Dictionary "" The model properties object corresponding to model (for custom models). Check here on how to bring your own models.
normalizeEmbeddings Boolean true Normalize the embeddings to have unit length
textPreprocessing Dictionary "" The text preprocessing object
imagePreprocessing Dictionary "" The image preprocessing object
videoPreprocessing Dictionary "" The video preprocessing object
audioPreprocessing Dictionary "" The audio preprocessing object
annParameters Dictionary "" The ANN algorithm parameter object
type String unstructured Type of the index
vectorNumericType String float Numeric type for vector encoding
filterStringMaxLength Int 50 Specifies the maximum character length allowed for strings used in filtering queries within unstructured indexes. This means that any string field you intend to use as a filter in these indexes should not exceed 50 characters in length.
textChunkPrefix String "" or model default The prefix added to indexed text document chunks when embedding.
textQueryPrefix String "" or model default The prefix added to text queries when embedding.
* treatUrlsAndPointersAsMedia is a new parameter introduced in Marqo 2.12 to support the new modalities of video and audio. Here is how it interacts with treatUrlsAndPointersAsImages:
Both False: All content is processed as text only.
treatUrlsAndPointersAsImages True, treatUrlsAndPointersAsMedia False:
Processes URLs and pointers as images
Does not process other media types (video, audio)
treatUrlsAndPointersAsImages False, treatUrlsAndPointersAsMedia True:
Invalid state, since this combination is a conflict.
Both True:
Processes URLs and pointers as various media types (images, videos, audio)
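The four combinations above can be encoded directly. A sketch; the ValueError mirrors the invalid state described, though the exact error Marqo raises may differ:

```python
def media_handling(as_images, as_media):
    """Resolve the treatUrlsAndPointersAsImages / treatUrlsAndPointersAsMedia pair."""
    if as_media and not as_images:
        # Conflict: the invalid state described above.
        raise ValueError(
            "treatUrlsAndPointersAsMedia requires treatUrlsAndPointersAsImages"
        )
    if as_images and as_media:
        return "images, video and audio fetched from pointers"
    if as_images:
        return "images fetched from pointers; other media not processed"
    return "all content treated as text"

print(media_handling(True, True))  # images, video and audio fetched from pointers
```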
Note: these body parameters are used in both Marqo Open Source and Marqo Cloud. Marqo Cloud also has additional body parameters. Let's take a look at those now.
Additional Marqo Cloud Body Parameters
As noted above, these parameters configure the dedicated infrastructure Marqo Cloud creates for each index. They are supported on Marqo Cloud only, not Marqo Open Source.
Name Type Default value Description Open Source Cloud
inferenceType String marqo.CPU.small Type of inference for the index. Options are "marqo.CPU.small" (deprecated), "marqo.CPU.large", "marqo.GPU". ❌ ✅
storageClass String marqo.basic Type of storage for the index. Options are "marqo.basic", "marqo.balanced", "marqo.performance". ❌ ✅
numberOfShards Integer 1 The number of shards for the index. ❌ ✅
numberOfReplicas Integer 0 The number of replicas for the index. ❌ ✅
numberOfInferences Integer 1 The number of inference nodes for the index. ❌ ✅
Text Preprocessing Object
The textPreprocessing object contains the specifics of how you want the index to preprocess text. The parameters are as follows:
Name Type Default value Description
splitLength Integer 2 The length of the chunks after splitting by split_method
splitOverlap Integer 0 The length of overlap between adjacent chunks
splitMethod String sentence The method by which text is chunked (character, word, sentence, or passage)
Image Preprocessing Object
The imagePreprocessing object contains the specifics of how you want the index to preprocess images. The parameters are as follows:
Name Type Default value Description
patchMethod String null The method by which images are chunked (simple or frcnn)
Video Preprocessing Object
The videoPreprocessing object contains the specifics of how you want the index to preprocess videos. The last chunk in the video file will have a start time of the total length of the video file minus the split length.
The parameters are as follows:
Name Type Default value Description
splitLength Integer 20 The length of the video chunks in seconds after splitting by split_method
splitOverlap Integer 3 The length of overlap in seconds between adjacent chunks
Audio Preprocessing Object
The audioPreprocessing object contains the specifics of how you want the index to preprocess audio. The last chunk in the audio file will have a start time of the total length of the audio file minus the split length.
The parameters are as follows:
Name Type Default value Description
splitLength Integer 10 The length of the audio chunks in seconds after splitting by split_method
splitOverlap Integer 3 The length of overlap in seconds between adjacent chunks
ANN Algorithm Parameter object
The annParameters object contains hyperparameters for the approximate nearest neighbour algorithm used for tensor storage within Marqo. The parameters are as follows:
Name Type Default value Description
spaceType String prenormalized-angular The function used to measure the distance between two points in ANN (angular, euclidean, dotproduct, geodegrees, hamming, or prenormalized-angular).
parameters Dict "" The hyperparameters for the ANN method (which is always hnsw for Marqo).
HNSW Method Parameters Object
parameters can have the following values:
Name Type Default value Description
efConstruction int 512 The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph but slower indexing speed. It is recommended to keep this between 2 and 800 (maximum is 4096)
m int 16 The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2 and 100.
Model Properties Object
This flexible object, passed as modelProperties, is used to set up models that aren't available in Marqo by default (the models available by default are listed here). The structure of this object will vary depending on the model.
For OpenCLIP models, see here for modelProperties format and example usage.
For Generic SBERT models, see here for modelProperties format and example usage.
Prefixes in Index Settings
Parameters: textChunkPrefix, textQueryPrefix
Expected value: A string.
Default value: ""
These fields override the model's default prefixes for text documents and queries. URLs pointing to images are not affected by these prefixes. If these fields are left undefined, Marqo will use the model's default prefixes. Currently, only the e5 series models have default prefixes defined.
Indexes built on Marqo 2.5 and below will not have prefixes added to any new documents, embeddings, or queries when read with Marqo 2.6 and above, even if the index’s model has default prefixes set.
Currently, Marqo adds the prefixes by default for e5 models, since these were trained on data with prefixes; adding them to text chunks before embedding therefore improves the quality of the embeddings. For more information, refer to the model card here.
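Resolution of the effective prefix can be sketched as follows. The default prefixes shown are the conventional "passage: "/"query: " strings from the e5 model card; treat them as illustrative:

```python
# Illustrative defaults for an e5-style model (from the model card convention).
MODEL_DEFAULT_PREFIXES = {"chunk": "passage: ", "query": "query: "}

def effective_prefix(kind, index_setting):
    """An index-level textChunkPrefix/textQueryPrefix overrides the model
    default; if the setting is left undefined, the model default applies
    (which is "" for models without default prefixes)."""
    if index_setting is not None:
        return index_setting
    return MODEL_DEFAULT_PREFIXES.get(kind, "")

# Index created without explicit prefixes: model defaults are used.
print(effective_prefix("query", None))                    # query:
# Index created with textChunkPrefix: the override wins.
print(effective_prefix("chunk", "override passage: "))    # override passage:
```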
Example: Setting text chunk and query prefixes during index creation
Marqo Open Source
Marqo Cloud
cURL
Python
curl -X POST 'http://localhost:8882/indexes/my-first-index' \
-H "Content-Type: application/json" \
-d '{
"textChunkPrefix": "override passage: ",
"textQueryPrefix": "override query: ",
"model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
"type": "unstructured"
}'
Example Settings Object
Below is a sample index settings JSON object. When using the Python client, pass this dictionary as the settings_dict parameter for the create_index method.
{
"type": "unstructured",
"vectorNumericType": "float",
"treatUrlsAndPointersAsImages": true,
"model": "open_clip/ViT-L-14/laion2b_s32b_b82k",
"normalizeEmbeddings": true,
"textPreprocessing": {
"splitLength": 2,
"splitOverlap": 0,
"splitMethod": "sentence"
},
"imagePreprocessing": {
"patchMethod": null
},
"annParameters": {
"spaceType": "prenormalized-angular",
"parameters": {
"efConstruction": 512,
"m": 16
}
},
"filterStringMaxLength": 20
}
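When building such a settings object programmatically, it can help to start from the documented defaults and override selectively. A sketch using the defaults from the body-parameter table above:

```python
# Documented defaults from the body-parameter table above.
DEFAULT_SETTINGS = {
    "type": "unstructured",
    "vectorNumericType": "float",
    "treatUrlsAndPointersAsImages": False,
    "model": "hf/e5-base-v2",
    "normalizeEmbeddings": True,
    "filterStringMaxLength": 50,
}

def build_settings(**overrides):
    """Return an index settings dict: documented defaults plus overrides."""
    settings = dict(DEFAULT_SETTINGS)
    settings.update(overrides)
    return settings

settings = build_settings(
    treatUrlsAndPointersAsImages=True,
    model="open_clip/ViT-L-14/laion2b_s32b_b82k",
    filterStringMaxLength=20,
)
# With the Python client, a dict like this can be passed as the
# settings_dict parameter of create_index.
```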
---
Create Structured Index
Structured indexes in Marqo are tailored for datasets with a defined schema and are particularly effective for complex queries like sorting, grouping, and filtering. They are designed for fast, in-memory operations.
To create your structured index:
POST /indexes/{index_name}
Create index with (optional) settings. This endpoint accepts the application/json content type.
Path parameters
Name Type Description
index_name String Name of the index
Body Parameters
The settings for the index are represented as a nested JSON object that contains the default settings for the index. The parameters are as follows:
Name Type Default value Description
allFields List - List of fields that might be indexed or queried. Valid only if type is structured
tensorFields List [] List of fields that are treated as tensors
model String hf/e5-base-v2 The model to use to vectorise doc content in add_documents() calls for the index
modelProperties Dictionary "" The model properties object corresponding to model (for custom models)
normalizeEmbeddings Boolean true Normalize the embeddings to have unit length
textPreprocessing Dictionary "" The text preprocessing object
imagePreprocessing Dictionary "" The image preprocessing object
videoPreprocessing Dictionary "" The video preprocessing object
audioPreprocessing Dictionary "" The audio preprocessing object
annParameters Dictionary "" The ANN algorithm parameter object
type String unstructured Type of the index. The default value is unstructured, but for the structured index this needs to be structured
vectorNumericType String float Numeric type for vector encoding
Note: these body parameters are used in both Marqo Open-Source and Marqo Cloud. Marqo Cloud also has additional body parameters. Let's take a look at those now.
Additional Marqo Cloud Body Parameters
Marqo Cloud creates dedicated infrastructure for each index. Using the create index endpoint, you can specify the type of storage for the index storageClass and the type of inference inferenceType. The number of storage instances is defined by numberOfShards, the number of replicas numberOfReplicas and the number of Marqo inference nodes by numberOfInferences. This is only supported for Marqo Cloud, not Marqo Open-Source.
Name Type Default value Description Open Source Cloud
inferenceType String marqo.CPU.small Type of inference for the index. Options are "marqo.CPU.small" (deprecated), "marqo.CPU.large", "marqo.GPU". ❌ ✅
storageClass String marqo.basic Type of storage for the index. Options are "marqo.basic", "marqo.balanced", "marqo.performance". ❌ ✅
numberOfShards Integer 1 The number of shards for the index. ❌ ✅
numberOfReplicas Integer 0 The number of replicas for the index. ❌ ✅
numberOfInferences Integer 1 The number of inference nodes for the index. ❌ ✅
Fields
The allFields object contains the fields that might be indexed or queried. Each field has the following parameters:
Name Type Default value Description
name String - Name of the field
type String - Type of the field
features List [] List of features that the field supports
Available types are:
Field Type Description Supported Features
text Text field lexical_search, filter
int 32-bit integer filter, score_modifier
float 32-bit float filter, score_modifier
long 64-bit integer filter, score_modifier
double 64-bit float filter, score_modifier
array<text> Array of text lexical_search, filter
array<int> Array of 32-bit integers filter
array<float> Array of 32-bit floats filter
array<long> Array of 64-bit integers filter
array<double> Array of 64-bit floats filter
bool Boolean filter
multimodal_combination Multimodal combination field None
image_pointer Image URL. Must only be used with a multimodal model such as CLIP None
video_pointer Video URL. Must only be used with a multimodal model such as LanguageBind None
audio_pointer Audio URL. Must only be used with a multimodal model such as LanguageBind None
custom_vector Custom vector, with optional text for lexical/filtering lexical_search, filter
map<text, int> Map of text to integers score_modifier
map<text, long> Map of text to longs score_modifier
map<text, float> Map of text to floats score_modifier
map<text, double> Map of text to doubles score_modifier
Available features are:
lexical_search: The field can be used for lexical search
filter: The field can be used for exact and range (numerical fields) filtering
score_modifier: The field can be used to modify the score of the document
When using multimodal_combination fields, the dependentFields object defines the weights for the multimodal combination field and is required. It is a dictionary whose keys are the names of the fields used to create the multimodal combination field and whose values are the weights for each field. Field names must refer to fields that are defined in allFields. See the example below for more details.
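The constraint that dependentFields must reference fields declared in allFields can be sketched as a client-side check (a simplified illustration, not Marqo's own validation):

```python
def validate_multimodal_fields(all_fields):
    """Check that every multimodal_combination field's dependentFields
    refer to names declared elsewhere in allFields."""
    declared = {f["name"] for f in all_fields}
    problems = []
    for f in all_fields:
        if f.get("type") == "multimodal_combination":
            deps = f.get("dependentFields")
            if not deps:
                problems.append(f"{f['name']}: dependentFields is required")
                continue
            for dep in deps:
                if dep not in declared:
                    problems.append(f"{f['name']}: unknown dependent field {dep!r}")
    return problems

fields = [
    {"name": "text_field", "type": "text", "features": ["lexical_search"]},
    {"name": "image_field", "type": "image_pointer"},
    {"name": "multimodal_field", "type": "multimodal_combination",
     "dependentFields": {"image_field": 0.9, "text_field": 0.1}},
]
print(validate_multimodal_fields(fields))  # []
```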
Text Preprocessing Object
The textPreprocessing object contains the specifics of how you want the index to preprocess text. The parameters are as follows:
Name Type Default value Description
splitLength Integer 2 The length of the chunks after splitting by split_method
splitOverlap Integer 0 The length of overlap between adjacent chunks
splitMethod String sentence The method by which text is chunked (character, word, sentence, or passage)
Image Preprocessing Object
The imagePreprocessing object contains the specifics of how you want the index to preprocess images. The parameters are as follows:
Name Type Default value Description
patchMethod String null The method by which images are chunked (simple or frcnn)
Video Preprocessing Object
The videoPreprocessing object contains the specifics of how you want the index to preprocess videos. The last chunk in the video file will have a start time of the total length of the video file minus the split length.
The parameters are as follows:
Name Type Default value Description
splitLength Integer 20 The length of the video chunks in seconds after splitting by split_method
splitOverlap Integer 3 The length of overlap in seconds between adjacent chunks
Audio Preprocessing Object
The audioPreprocessing object contains the specifics of how you want the index to preprocess audio. The last chunk in the audio file will have a start time of the total length of the audio file minus the split length.
The parameters are as follows:
Name Type Default value Description
splitLength Integer 10 The length of the audio chunks in seconds after splitting by split_method
splitOverlap Integer 3 The length of overlap in seconds between adjacent chunks
ANN Algorithm Parameter object
The annParameters object contains hyperparameters for the approximate nearest neighbour algorithm used for tensor storage within Marqo. The parameters are as follows:
Name Type Default value Description
spaceType String prenormalized-angular The function used to measure the distance between two points in ANN (angular, euclidean, dotproduct, geodegrees, hamming, or prenormalized-angular).
parameters Dict "" The hyperparameters for the ANN method (which is always hnsw for Marqo).
HNSW Method Parameters Object
parameters can have the following values:
Name Type Default value Description
efConstruction int 512 The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph but slower indexing speed. It is recommended to keep this between 2 and 800 (maximum is 4096)
m int 16 The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2 and 100.
Model Properties Object
This flexible object, passed as modelProperties, is used to set up models that aren't available in Marqo by default (the models available by default are listed here). The structure of this object will vary depending on the model.
For OpenCLIP models, see here for modelProperties format and example usage.
For Generic SBERT models, see here for modelProperties format and example usage.
Example 1: Creating a structured index for combining text and images
Marqo Open-Source
Marqo Cloud
cURL
python
curl -X POST 'http://localhost:8882/indexes/my-first-structured-index' \
-H "Content-Type: application/json" \
-d '{
"type": "structured",
"vectorNumericType": "float",
"model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
"normalizeEmbeddings": true,
"textPreprocessing": {
"splitLength": 2,
"splitOverlap": 0,
"splitMethod": "sentence"
},
"allFields": [
{"name": "text_field", "type": "text", "features": ["lexical_search"]},
{"name": "caption", "type": "text", "features": ["lexical_search", "filter"]},
{"name": "tags", "type": "array<text>", "features": ["filter"]},
{"name": "image_field", "type": "image_pointer"},
{"name": "my_int", "type": "int", "features": ["score_modifier"]},
{
"name": "multimodal_field",
"type": "multimodal_combination",
"dependentFields": {"image_field": 0.9, "text_field": 0.1}
}
],
"tensorFields": ["multimodal_field"],
"annParameters": {
"spaceType": "prenormalized-angular",
"parameters": {"efConstruction": 512, "m": 16}
}
}'
Example 2: Creating a structured index with no model for use with custom vectors
Marqo Open-Source
Marqo Cloud
cURL
python
curl -X POST 'http://localhost:8882/indexes/my-hybrid-index' \
-H "Content-Type: application/json" \
-d '{
"model": "no_model",
"modelProperties": {
"type": "no_model",
"dimensions": 3072
},
"type": "structured",
"allFields": [
{"name": "title", "type": "custom_vector", "features": ["lexical_search"]},
{"name": "description", "type": "text", "features": ["lexical_search", "filter"]},
{"name": "time_added_epoch", "type": "int", "features": ["score_modifier"]}
],
"tensorFields": ["title"]
}'
---
Add or Replace Documents
Add an array of documents or replace them if they already exist.
If you send a document with an _id that corresponds to an existing document, the new document will overwrite the existing document.
This endpoint accepts the application/json content type.
POST /indexes/{index_name}/documents
Path parameters
Name Type Description
index_name String name of the index
Query parameters
Query Parameter Type Default Value Description
device String null The device used to index the documents. If device is not specified and CUDA devices are available to Marqo (see here for more info), Marqo will speed up the indexing process by using available CUDA devices. Otherwise, the CPU will be used. Options include cpu and cuda, cuda1, cuda2 etc. The cuda option tells Marqo to use any available cuda devices.
telemetry Boolean False If true, the telemetry object is returned in the add documents response body. This includes information like latency metrics. This is set at client instantiation time in the Python client: mq = marqo.Client(return_telemetry=True)
Body
In the REST API (and for curl users) these parameters are in lowerCamelCase, as presented in the following table. The Python client uses the pythonic snake_case equivalents.
Add documents parameters Value Type Default Value Description
documents Array of objects n/a An array of documents. Each document is represented as a JSON object. You can optionally set a document's ID with the special _id field; the _id must be a string. If an ID is not specified, Marqo will generate one.
tensorFields Array of Strings [] The fields within these documents that will be tensor fields, and therefore have vectors generated for them. Structured indexes only support tensor fields defined at index creation, while unstructured indexes can specify them when adding documents. Tensor search can only be performed on these fields for these documents. Pre-filtering and lexical search remain available on text fields not included in tensorFields. For the best recall and speed performance, we recommend minimising the number of different tensor fields for your index. For production use cases where speed and recall are critical, we recommend only a single tensor field for the entire index.
useExistingTensors Boolean false Setting this to true will get existing tensors for unchanged fields in documents that are indexed with an id. Note: Marqo analyses the field string for updates, so Marqo can't detect a change if a URL points to a different image.
imageDownloadHeaders (deprecated) Dict null An object that consists of key-value pair headers for image download. Can be used to authenticate the images for download.
mediaDownloadHeaders Dict null An object that consists of key-value pair headers for media download. Can be used to authenticate all types of media for download.
mappings Dict null An object to handle object fields in documents. Check mappings for more information. Mappings are required to create multimodal combination and custom vector fields - see here for more information
modelAuth Dict null An object that consists of authorisation details used by Marqo to download non-publicly available models. Check here for more information.
clientBatchSize Integer null A Python client only helper parameter that splits up very large lists of documents into batches of a more manageable size for Marqo.
textChunkPrefix String null The prefix added to indexed text document chunks when embedding. Setting this field overrides the textChunkPrefix set in the index settings during index creation. If left unset, it defaults to the prefixes defined in the index settings. For more information on default values for index settings, see create_index.
Response
The response of the add_or_replace_documents endpoint in Marqo operates on two levels. Firstly, a status code of 200 in the overall response indicates that the batch request has been successfully received and processed by Marqo. The response has the following fields:
Field Name Type Description
errors Boolean Indicates whether any errors occurred during the processing of the batch request.
items Array An array of objects, each representing the processing status of an individual document in the batch.
processingTimeMs Integer The time taken to process the batch request, in milliseconds.
index_name String The name of the index to which the documents were added.
However, a 200 status does not necessarily imply that each individual document within the batch was processed without issues. For each document in the batch, there will be an associated response code that specifies the status of that particular document's processing. These individual response codes provide granular feedback, allowing users to discern which documents were successfully processed, which encountered errors, and the nature of any issues encountered.
Each item in the items array has the following fields:
Field Name Type Description
_id String The ID of the document that was processed.
status Integer The status code of the document processing.
message String A message that provides additional information about the processing status of the document. This field only exists when the status is not 200.
Here is the HTTP status code of the individual document responses (non-exhaustive list of status codes):
Status Code Description
200 The document is successfully added to the index.
400 Bad request. Returned for invalid input (e.g., invalid field types). Inspect message for details.
429 The Marqo vector store received too many requests. Please try again later.
500 Internal error.
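Because a 200 on the batch does not guarantee per-document success, a client should scan the items array for failures. A minimal sketch, using a response dict that mirrors the fields documented above (its values are illustrative):

```python
# Sketch: find per-document failures in an add_documents response.
# The response dict below mirrors the documented fields; values are examples.
response = {
    "errors": True,
    "items": [
        {"_id": "article_591", "status": 200},
        {"_id": "article_592", "status": 400, "message": "Invalid field type"},
    ],
    "processingTimeMs": 12,
    "index_name": "my-first-index",
}

failed = [item for item in response["items"] if item["status"] != 200]
for item in failed:
    # message only exists when status is not 200
    print(item["_id"], item["status"], item.get("message", ""))
```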
Example
For unstructured index:
Marqo Open Source
Marqo Cloud
cURL
Python
curl -XPOST 'http://localhost:8882/indexes/my-first-index/documents' \
-H 'Content-type:application/json' -d '
{
"documents": [
{
"Title": "The Travels of Marco Polo",
"Description": "A 13th-century travelogue describing the travels of Polo",
"Genre": "History"
},
{
"Title": "Extravehicular Mobility Unit (EMU)",
"Description": "The EMU is a spacesuit that provides environmental protection",
"_id": "article_591",
"Genre": "Science"
}
],
"tensorFields": ["Description"]
}'
For structured index:
Marqo Open Source
Marqo Cloud
cURL
Python
curl -XPOST 'http://localhost:8882/indexes/my-first-structured-index/documents' \
-H 'Content-type:application/json' -d '
{
"documents": [
{
"Title": "The Travels of Marco Polo",
"Description": "A 13th-century travelogue describing the travels of Polo",
"Genre": "History"
},
{
"Title": "Extravehicular Mobility Unit (EMU)",
"Description": "The EMU is a spacesuit that provides environmental protection",
"_id": "article_591",
"Genre": "Science"
}
]
}'
Response: 200 OK
{
"errors": false,
"items": [
{
"_id": "5aed93eb-3878-4f12-bc92-0fda01c7d23d",
"status": 200
},
{
"_id": "article_591",
"status": 200
}
],
"processingTimeMs": 6,
"index_name": "my-first-index"
}
The first document in this example had its _id generated by Marqo. A document with _id = article_591 already existed in the index, so it was updated rather than created. In the unstructured index we want Description to be searchable with tensor search (Marqo's default search), so we explicitly declare it as a tensor field. In the structured index, tensor fields are specified during index creation, so we don't need to specify them here. Tensor fields are stored alongside vector representations of the data, allowing for multimodal and semantic search.
If you would like to see an example of adding video and audio documents, please visit this section.
Documents
Parameter: documents
Expected value: An array of documents (default maximum length: 128). Each document is a JSON object that is to be added to the index. Each key is the name of a document's field and its value is the content for that field. See here for the allowed field data types. The optional _id key can be used to specify a string as the document's ID.
Map Fields
Only flat numeric dictionaries with int, long, float, and double values are currently supported as document fields.
[
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing Polo's travels"
    },
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "Description": "The EMU is a spacesuit that provides environmental protection",
        "_id": "article_591"
    },
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing Polo's travels",
        "map_numeric_field": {
            "popularity": 56.4,
            "availability": 0.9,
            "year_published": 1300
        }
    }
]
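A typical use of a numeric map field is score modification at search time. The sketch below shows one possible score_modifiers payload referencing a nested key; the dot notation for the nested key and the exact payload shape are assumptions, so check the search documentation for your Marqo version:

```python
# Hedged sketch: referencing a numeric map field's nested key in score
# modifiers at search time. The dot notation and payload shape are assumptions.
score_modifiers = {
    "multiply_score_by": [
        {"field_name": "map_numeric_field.popularity", "weight": 1.0}
    ]
}
# results = mq.index("my-first-index").search(
#     "travelogue", score_modifiers=score_modifiers
# )
```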
Mappings
Parameter: mappings
Expected value: JSON object with field names as keys, mapped to objects with type (currently only multimodal_combination and custom_vector are supported). Multimodal combination mappings also have weights, which is an object that maps each nested field to a relative weight.
Default value: null
The mappings object allows adding special fields, such as multimodal fields, custom vector fields, and map score modifiers.
With multimodal fields, child fields are vectorised and combined into a single tensor via a weighted-sum approach using the weights object. The combined tensor is used for tensor search.
With custom vector fields, vectors can be inserted directly into documents. This is useful if you generate your vectors outside of Marqo.
With map score modifiers, child field values can be of type int, long, float, or double. These values are used in the score-modifier computation during search.
All multimodal combination and custom vector fields must be included in tensor_fields.
Dependent fields can be used for lexical search or vector search with filtering. Dependent fields can only have content of type str, representing text or a pointer (URL) to an image.
The mappings object is optional with structured indexes and is only needed to override the default multimodal weights defined at index creation time. Additionally, custom vector fields should not be declared in mappings for structured indexes.
Read more about using mappings and special fields here
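The weighted-sum combination can be illustrated with plain Python (this is a toy sketch on 4-dimensional vectors, not Marqo's internals; real embeddings have the model's dimension, e.g. 512 for ViT-B-32):

```python
# Illustrative sketch of a multimodal_combination weighted sum on toy vectors.
# The weights match the example mapping; the child vectors are made up.
weights = {"img": 0.9, "caption": 0.1}
child_vectors = {  # pretend outputs of the embedding model, 4-d for brevity
    "img": [0.2, 0.4, 0.1, 0.3],
    "caption": [0.5, 0.1, 0.2, 0.2],
}

combined = [
    sum(weights[field] * vec[i] for field, vec in child_vectors.items())
    for i in range(4)
]
print(combined)  # first component: 0.9*0.2 + 0.1*0.5 = 0.23
```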
Example: Multimodal Combination
Unstructured Index (Default)
Marqo Open Source
Marqo Cloud
cURL
Python
# Create an unstructured index (default)
curl -X POST 'http://localhost:8882/indexes/my-first-index' \
-H "Content-Type: application/json" \
-d '{
"treatUrlsAndPointersAsImages": true,
"model": "open_clip/ViT-B-32/laion2b_s34b_b79k"
}'
# Add documents with mappings to specify multimodal combination fields
curl -X POST 'http://localhost:8882/indexes/my-first-index/documents' \
-H "Content-Type: application/json" \
-d '{
"documents":[
{
"img": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image1.jpg",
"caption": "A man riding horse"
},
{
"img": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image2.jpg",
"caption": "An airplane flying in the sky"
}
],
"mappings": {
"my_combination_field": {
"type": "multimodal_combination",
"weights": {
"img": 0.9, "caption": 0.1
}
}
},
"tensorFields": ["my_combination_field"]
}'
Structured Index
Marqo Open Source
Marqo Cloud
cURL
Python
# Alternatively you can create a structured index with multimodal combination fields
curl -X POST 'http://localhost:8882/indexes/my-first-structured-index' \
-H "Content-Type: application/json" \
-d '{
"model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
"type": "structured",
"allFields": [
{"name": "caption", "type": "text"},
{"name": "img", "type": "image_pointer"},
{"name": "my_combination_field", "type": "multimodal_combination",
"dependentFields": {"caption": 0.5, "img": 0.5}}
],
"tensorFields": ["my_combination_field"]
}'
# Add documents
# The mappings object is optional with structured indexes and is only needed if the user needs to
# override default multimodal weights defined at index creation time.
curl -X POST 'http://localhost:8882/indexes/my-first-structured-index/documents' \
-H "Content-Type: application/json" \
-d '{
"documents":[
{
"img": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image1.jpg",
"caption": "A man riding horse"
},
{
"img": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image2.jpg",
"caption": "An airplane flying in the sky"
}
],
"mappings": {
"my_combination_field": {
"type": "multimodal_combination",
"weights": {
"img": 0.6, "caption": 0.4
}
}
}
}'
Example: Custom Vectors
(Replace the 'vector' field values with your own vectors!)
Unstructured Index (Default)
Marqo Open Source
Marqo Cloud
cURL
Python
# Create an index with the model that has the dimensions of your custom vectors. For example: "open_clip/ViT-B-32/laion2b_s34b_b79k" (dimension is 512).
# Only the model dimension matters, as we are not vectorising anything when using custom vector fields.
# Space type CANNOT be 'prenormalized-angular' for custom vectors, as they are not normalized.
curl -X POST 'http://localhost:8882/indexes/my-first-index' \
-H "Content-Type: application/json" \
-d '{
"treatUrlsAndPointersAsImages": true,
"model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
"annParameters": {
"spaceType": "angular",
"parameters": {"efConstruction": 512, "m": 16}
}
}'
# We add the custom vector documents into our index (with mappings)
curl -X POST 'http://localhost:8882/indexes/my-first-index/documents' \
-H "Content-Type: application/json" \
-d '{
"documents":[
{
"_id": "doc1",
"my_custom_vector": {
"vector": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 
419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511],
"content": "Singing audio file"
}
},
{
"_id": "doc2",
"my_custom_vector": {
"vector": [1.0, 0.5, 0.3333, 0.25, 0.2, 0.1667, 0.1429, 0.125, 0.1111, 0.1, 0.0909, 0.0833, 0.0769, 0.0714, 0.0667, 0.0625, 0.0588, 0.0556, 0.0526, 0.05, 0.0476, 0.0455, 0.0435, 0.0417, 0.04, 0.0385, 0.037, 0.0357, 0.0345, 0.0333, 0.0323, 0.0312, 0.0303, 0.0294, 0.0286, 0.0278, 0.027, 0.0263, 0.0256, 0.025, 0.0244, 0.0238, 0.0233, 0.0227, 0.0222, 0.0217, 0.0213, 0.0208, 0.0204, 0.02, 0.0196, 0.0192, 0.0189, 0.0185, 0.0182, 0.0179, 0.0175, 0.0172, 0.0169, 0.0167, 0.0164, 0.0161, 0.0159, 0.0156, 0.0154, 0.0152, 0.0149, 0.0147, 0.0145, 0.0143, 0.0141, 0.0139, 0.0137, 0.0135, 0.0133, 0.0132, 0.013, 0.0128, 0.0127, 0.0125, 0.0123, 0.0122, 0.012, 0.0119, 0.0118, 0.0116, 0.0115, 0.0114, 0.0112, 0.0111, 0.011, 0.0109, 0.0108, 0.0106, 0.0105, 0.0104, 0.0103, 0.0102, 0.0101, 0.01, 0.0099, 0.0098, 0.0097, 0.0096, 0.0095, 0.0094, 0.0093, 0.0093, 0.0092, 0.0091, 0.009, 0.0089, 0.0088, 0.0088, 0.0087, 0.0086, 0.0085, 0.0085, 0.0084, 0.0083, 0.0083, 0.0082, 0.0081, 0.0081, 0.008, 0.0079, 0.0079, 0.0078, 0.0078, 0.0077, 0.0076, 0.0076, 0.0075, 0.0075, 0.0074, 0.0074, 0.0073, 0.0072, 0.0072, 0.0071, 0.0071, 0.007, 0.007, 0.0069, 0.0069, 0.0068, 0.0068, 0.0068, 0.0067, 0.0067, 0.0066, 0.0066, 0.0065, 0.0065, 0.0065, 0.0064, 0.0064, 0.0063, 0.0063, 0.0063, 0.0062, 0.0062, 0.0061, 0.0061, 0.0061, 0.006, 0.006, 0.006, 0.0059, 0.0059, 0.0058, 0.0058, 0.0058, 0.0057, 0.0057, 0.0057, 0.0056, 0.0056, 0.0056, 0.0056, 0.0055, 0.0055, 0.0055, 0.0054, 0.0054, 0.0054, 0.0053, 0.0053, 0.0053, 0.0053, 0.0052, 0.0052, 0.0052, 0.0052, 0.0051, 0.0051, 0.0051, 0.0051, 0.005, 0.005, 0.005, 0.005, 0.0049, 0.0049, 0.0049, 0.0049, 0.0048, 0.0048, 0.0048, 0.0048, 0.0047, 0.0047, 0.0047, 0.0047, 0.0047, 0.0046, 0.0046, 0.0046, 0.0046, 0.0045, 0.0045, 0.0045, 0.0045, 0.0045, 0.0044, 0.0044, 0.0044, 0.0044, 0.0044, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0042, 0.0042, 0.0042, 0.0042, 0.0042, 0.0041, 0.0041, 0.0041, 0.0041, 0.0041, 0.0041, 0.004, 0.004, 0.004, 0.004, 0.004, 0.004, 0.004, 0.0039, 
0.0039, 0.0039, 0.0039, 0.0039, 0.0039, 0.0038, 0.0038, 0.0038, 0.0038, 0.0038, 0.0038, 0.0038, 0.0037, 0.0037, 0.0037, 0.0037, 0.0037, 0.0037, 0.0037, 0.0036, 0.0036, 0.0036, 0.0036, 0.0036, 0.0036, 0.0036, 0.0036, 0.0035, 0.0035, 0.0035, 0.0035, 0.0035, 0.0035, 0.0035, 0.0035, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0032, 0.0032, 0.0032, 0.0032, 0.0032, 0.0032, 0.0032, 0.0032, 0.0032, 0.0032, 0.0031, 0.0031, 0.0031, 0.0031, 0.0031, 0.0031, 0.0031, 0.0031, 0.0031, 0.0031, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.0029, 0.0029, 0.0029, 0.0029, 0.0029, 0.0029, 0.0029, 0.0029, 0.0029, 0.0029, 0.0029, 0.0029, 0.0028, 0.0028, 0.0028, 0.0028, 0.0028, 0.0028, 0.0028, 0.0028, 0.0028, 0.0028, 0.0028, 0.0028, 0.0028, 0.0027, 0.0027, 0.0027, 0.0027, 0.0027, 0.0027, 0.0027, 0.0027, 0.0027, 0.0027, 0.0027, 0.0027, 0.0027, 0.0027, 0.0026, 0.0026, 0.0026, 0.0026, 0.0026, 0.0026, 0.0026, 0.0026, 0.0026, 0.0026, 0.0026, 0.0026, 0.0026, 0.0026, 0.0026, 0.0025, 0.0025, 0.0025, 0.0025, 0.0025, 0.0025, 0.0025, 0.0025, 0.0025, 0.0025, 0.0025, 0.0025, 0.0025, 0.0025, 0.0025, 0.0025, 0.0024, 0.0024, 0.0024, 0.0024, 0.0024, 0.0024, 0.0024, 0.0024, 0.0024, 0.0024, 0.0024, 0.0024, 0.0024, 0.0024, 0.0024, 0.0024, 0.0024, 0.0023, 0.0023, 0.0023, 0.0023, 0.0023, 0.0023, 0.0023, 0.0023, 0.0023, 0.0023, 0.0023, 0.0023, 0.0023, 0.0023, 0.0023, 0.0023, 0.0023, 0.0023, 0.0023, 0.0022, 0.0022, 0.0022, 0.0022, 0.0022, 0.0022, 0.0022, 0.0022, 0.0022, 0.0022, 0.0022, 0.0022, 0.0022, 0.0022, 0.0022, 0.0022, 0.0022, 0.0022, 0.0022, 0.0022, 0.0022, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.002, 0.002, 0.002, 0.002, 0.002, 0.002, 0.002, 0.002, 0.002, 0.002, 0.002, 0.002, 0.002, 0.002, 0.002, 0.002, 0.002, 0.002, 0.002, 0.002, 0.002, 
0.002, 0.002, 0.002, 0.002],
"content": "Podcast audio file"
}
}
],
"mappings": {
"my_custom_vector": {
"type": "custom_vector"
}
},
"tensorFields": ["my_custom_vector"]
}'
---
Get multiple documents
Gets a selection of documents based on their IDs.
This endpoint accepts the application/json content type.
GET /indexes/{index_name}/documents
Path parameters
Name Type Description
index_name String name of the index
Query parameters
Search parameter Type Default value Description
expose_facets Boolean False If true, the documents' tensor facets are returned. This is a list of objects; each facet object contains document data and its associated embedding (found in the facet's _embedding field).
Response
The response of the get_multiple_documents endpoint in Marqo operates on two levels. Firstly, a status code of 200 in the overall response indicates that the batch request has been successfully received and processed by Marqo.
The response has the following fields:
Field Type Description
results Array An array of objects, each representing a document. Each object contains the document's data.
However, a 200 status does not necessarily imply that each individual document within the batch was retrieved without issues. For each document in the batch, there is an associated response code that specifies the status of that particular document's retrieval. These individual response codes provide granular feedback, allowing users to discern which documents were found, which encountered errors, and the nature of any issues. If Marqo finds a document, it is returned with its _found field set to true. For documents not found, the _found field is set to false, with the document ID returned in the _id field and details of the error in the message field.
For this endpoint, a 200 status code is not used to indicate successful document retrieval, as we aim to avoid adding extra fields to the returned documents. Here is the HTTP status code of the individual document responses (non-exhaustive list of status codes):
Status Code Description
400 Bad request. Returned for invalid input (e.g., invalid field types). Inspect message for details.
404 The target document is not in the index.
429 The Marqo vector store received too many requests. Please try again later.
500 Internal error.
Body
An array of IDs. Each ID is a string.
["article_152", "article_490", "article_985"]
Example
Marqo Open Source
Marqo Cloud
cURL
Python
curl -XGET http://localhost:8882/indexes/my-first-index/documents -H 'Content-Type: application/json' -d '
["article_152", "article_490", "article_985"]
'
Response 200 OK
{
    "results": [
        {
            "Blurb": "A rocket car is a car powered by a rocket engine. This treatise proposes that rocket cars are the inevitable future of land-based transport.",
            "Title": "Treatise on the viability of rocket cars",
            "_found": true,
            "_id": "article_152"
        },
        {
            "_found": false,
            "_id": "article_490"
        },
        {
            "Blurb": "One must maintain one's space suit. It is, after all, the tool that will help you explore distant galaxies.",
            "Title": "Your space suit and you",
            "_found": true,
            "_id": "article_985"
        }
    ]
}
In this response, the index has no document with an ID of article_490. As a result, its _found field is false.
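A caller can split such a response into found and missing IDs using the _found flag. A minimal sketch, with the response dict trimmed from the example above:

```python
# Sketch: separate found and missing document IDs in a get-multiple-documents
# response using the _found flag. Response trimmed from the example above.
response = {
    "results": [
        {"_id": "article_152", "_found": True},
        {"_id": "article_490", "_found": False},
        {"_id": "article_985", "_found": True},
    ]
}

found = [doc["_id"] for doc in response["results"] if doc["_found"]]
missing = [doc["_id"] for doc in response["results"] if not doc["_found"]]
```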
---
Mappings
The mappings object is a parameter (mappings) for an add_documents call. Mappings can be used for granular control over a field. Currently, it is only supported for the multimodal_combination and custom_vector field types.
When creating a structured index, you define weights for a multimodal field under dependentFields. When adding documents, mappings is optional with structured indexes and is only needed to override the default multimodal weights defined at index creation time.
The mappings object is used to define custom_vector fields for unstructured indexes only. For structured indexes, do not include custom_vector fields in mappings; instead, declare them as fields during index creation.
Mappings object
Multimodal Combination Mappings
Defining the mapping for multimodal_combination fields:
my_mappings = {
"my_combination_field": {
"type": "multimodal_combination",
"weights": {"My_image": 0.5, "Some_text": 0.5},
},
"my_2nd_combination_field": {
"type": "multimodal_combination",
"weights": {"Title": -2.5, "Description": 0.3},
},
}
Custom Vector Mappings
Defining the mapping for custom_vector fields (in an unstructured index):
my_mappings = {
"my_custom_audio_vector_1": {"type": "custom_vector"},
"my_custom_audio_vector_2": {"type": "custom_vector"},
}
Adding custom vector documents using that mapping object:
Unstructured Index
Marqo Open Source
Marqo Cloud
# Random vectors for example purposes. Replace these with your own.
example_vector_1 = [i for i in range(512)]
example_vector_2 = [1 / (i + 1) for i in range(512)]
# Create the unstructured index
mq = marqo.Client("http://localhost:8882", api_key=None)
settings = {
"type": "unstructured",
"model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
}
mq.create_index("my-custom-vector-index", settings_dict=settings)
# Add the custom vectors
mq.index("my-custom-vector-index").add_documents(
documents=[
{
"_id": "doc1",
"my_custom_audio_vector_1": {
# Put your own vector (of correct length) here.
"vector": example_vector_1,
"content": "Singing audio file",
},
},
{
"_id": "doc2",
"my_custom_audio_vector_2": {
# Put your own vector (of correct length) here.
"vector": example_vector_2,
"content": "Podcast audio file",
},
},
],
tensor_fields=["my_custom_audio_vector_1", "my_custom_audio_vector_2"],
mappings=my_mappings,
)
Structured Index
Marqo Open Source
Marqo Cloud
# Random vectors for example purposes. Replace these with your own.
example_vector_1 = [i for i in range(512)]
example_vector_2 = [1 / (i + 1) for i in range(512)]
# Create the structured index
mq = marqo.Client("http://localhost:8882", api_key=None)
settings = {
"type": "structured",
"model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
"allFields": [
{"name": "my_custom_audio_vector_1", "type": "custom_vector"},
{"name": "my_custom_audio_vector_2", "type": "custom_vector"},
],
"tensorFields": ["my_custom_audio_vector_1", "my_custom_audio_vector_2"],
}
mq.create_index("my-structured-custom-vector-index", settings_dict=settings)
# Add the custom vectors
mq.index("my-structured-custom-vector-index").add_documents(
documents=[
{
"_id": "doc1",
"my_custom_audio_vector_1": {
# Put your own vector (of correct length) here.
"vector": example_vector_1,
"content": "Singing audio file",
},